Guus Sliepen [Thu, 28 Nov 2019 21:24:05 +0000 (22:24 +0100)]
Sync the base configuration directory at the end of meshlink_join().
While joining a mesh, we create a new current/ subdirectory. While the
contents were already synced to disk, we need to ensure the subdirectory
itself is also synced before returning.
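
Syncing a directory itself, as opposed to the files inside it, can be
sketched as follows. This is a minimal illustration assuming a POSIX system;
sync_directory() is a hypothetical helper name, not necessarily the function
MeshLink uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: flush a directory's own metadata (its list of
 * entries) to disk. Returns false if the directory cannot be opened or
 * the sync fails. */
bool sync_directory(const char *path) {
	assert(path);

	/* O_DIRECTORY makes open() fail if path is not a directory. */
	int fd = open(path, O_RDONLY | O_DIRECTORY);

	if(fd == -1) {
		return false;
	}

	bool ok = fsync(fd) == 0;

	if(close(fd) != 0) {
		ok = false;
	}

	return ok;
}
```

A caller that has just created a subdirectory would sync the parent
directory this way before reporting success.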
Guus Sliepen [Thu, 14 Nov 2019 20:48:02 +0000 (21:48 +0100)]
Fix logic error preventing fast update of reflexive address.
When we are trying to communicate with peers that don't know our
reflexive address, and we just learned our own one, we want to inform
those peers of it immediately, so they can send PMTU probes to the right
address. A logic error prevented this from happening in the common case.
Guus Sliepen [Sat, 9 Nov 2019 16:57:55 +0000 (17:57 +0100)]
Fix __warn_unused_result__, add more of it and fix the resulting warnings.
Due to a bug in the autoconf test for function attributes, we were always
disabling __warn_unused_result__. Fix this, and add this function attribute
to a lot more functions whose results are definitely important.
This change makes it clear where we ignore the results of a function that
might fail. The proper fix in most cases is to propagate the result to the
caller. For meshlink_blacklist() and meshlink_whitelist(), we were not
returning an error condition, even if we might fail to commit the blacklist
operation to permanent storage. So we now make these functions return a
bool.
Guus Sliepen [Sat, 9 Nov 2019 16:52:19 +0000 (17:52 +0100)]
Use a separate lockfile to lock the configuration directory.
We can't use meshlink.conf as the lock file, since it can move around.
Instead, create meshlink.lock right below confbase, and keep it always
locked while a meshlink handle is valid.
meshlink_destroy() will also use this lockfile to ensure we don't destroy
a directory that is still in use, and prevent race conditions between
meshlink_destroy() and meshlink_open().
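
A dedicated lock file of this kind can be sketched as below. This assumes
flock()-style advisory locking; the commit message does not say which locking
primitive is used, and lock_confbase() is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open <confbase>/meshlink.lock and take an exclusive,
 * non-blocking advisory lock on it. Returns the locked file descriptor, or
 * -1 if the file cannot be opened or the lock is already held. */
int lock_confbase(const char *confbase) {
	assert(confbase);

	char path[4096];
	int len = snprintf(path, sizeof(path), "%s/meshlink.lock", confbase);

	if(len < 0 || (size_t)len >= sizeof(path)) {
		return -1;
	}

	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if(fd == -1) {
		return -1;
	}

	if(flock(fd, LOCK_EX | LOCK_NB) != 0) {
		close(fd);
		return -1;
	}

	/* Keep fd open for as long as the handle is valid; closing it
	 * releases the lock. */
	return fd;
}
```

Because the lock file never moves, both an open and a destroy operation can
try to take the same lock first and back off if it is already held.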
Guus Sliepen [Tue, 5 Nov 2019 19:46:58 +0000 (20:46 +0100)]
Refuse invitees if we can't delete the invitation file.
If the call to unlink() or a subsequent sync of the invitation directory
fails, don't allow the invitee access, to prevent an invitation from
being used twice.
Guus Sliepen [Thu, 31 Oct 2019 21:26:25 +0000 (22:26 +0100)]
Allow nodes to learn their own reflexive UDP address.
When a node gets a successful UDP probe reply, it informs the peer of the
UDP address and port that it has. The peer can then use this information to
inform other nodes it wants to communicate with.
Currently, this is only done to close the time window where two nodes have
established an SPTPS session via a third node, while the third node hasn't
learned of the two nodes' UDP addresses, in which case it was too late for
the third node to assist with hole punching.
Guus Sliepen [Thu, 31 Oct 2019 19:46:36 +0000 (20:46 +0100)]
Try to get a new reflexive UDP address if UDP probes failed.
If we are sending data to a node, but we don't have working UDP, it could
be because we don't have a good reflexive UDP address. Send a dummy ANS_KEY
once in a while as long as we still want to exchange data.
Guus Sliepen [Mon, 28 Oct 2019 20:14:17 +0000 (21:14 +0100)]
Don't call terminate_connection() from meshlink_blacklist().
If meshlink_blacklist() is called from a callback function, this can result
in a use-after-free bug. Instead, shut down the socket, so the event loop
will take care of it.
Guus Sliepen [Mon, 28 Oct 2019 20:12:14 +0000 (21:12 +0100)]
Set meshlink_errno when trying to create a channel to a blacklisted node.
Create a new errno value MESHLINK_EBLACKLISTED, which is used when trying
to send something or create a channel to a blacklisted node. Also improve
some log messages.
Guus Sliepen [Sun, 27 Oct 2019 13:02:47 +0000 (14:02 +0100)]
Restart UDP SPTPS when a node reconnects with a new session ID.
If a node restarts, but its old connection was not considered closed
yet, the graph algorithm never saw it go down. We still want to ensure
we restart UDP SPTPS in this case.
Guus Sliepen [Mon, 14 Oct 2019 19:34:54 +0000 (21:34 +0200)]
Remove support for broadcast packets.
This is not used in MeshLink. Removing support for this also means we no
longer have to calculate a minimum spanning tree whenever the graph is
updated.
Guus Sliepen [Sun, 13 Oct 2019 21:32:03 +0000 (23:32 +0200)]
Allow the mesh to detect when a node has completely restarted.
When calling meshlink_open(), a node creates a unique ID that is passed
along ADD_EDGE messages. When a node becomes unreachable and then reachable
again, this allows other nodes to detect whether it was just a temporary
network issue, or whether the node completely restarted (either because
meshlink_close() was called or because it crashed).
At the moment, when this is detected, other nodes close all open channels
with the restarted node.
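
The detection logic can be sketched as a comparison against the last ID seen
for a peer. The struct, the function name and the 32-bit width of the ID are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer bookkeeping. */
struct peer_state {
	bool seen;           /* have we received an ID from this peer yet? */
	uint32_t session_id; /* last ID the peer announced (width assumed) */
};

/* Record the ID announced by a peer. Returns true if the peer had
 * announced a different ID before, meaning it restarted in between. */
bool peer_restarted(struct peer_state *peer, uint32_t announced_id) {
	assert(peer);

	bool restarted = peer->seen && peer->session_id != announced_id;

	peer->seen = true;
	peer->session_id = announced_id;

	return restarted;
}
```

A peer that becomes reachable again with the same ID only had a temporary
network issue; a changed ID is the signal to close its open channels.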
Guus Sliepen [Thu, 10 Oct 2019 20:12:36 +0000 (22:12 +0200)]
Correctly handle incoming retransmissions of SYN packets.
If the SYNACK got lost, the peer that initiated a channel would retransmit
the SYN packet. However, the responder would ignore the retransmitted SYN
packet, causing the channel to stall and eventually time out.
Guus Sliepen [Thu, 10 Oct 2019 19:23:58 +0000 (21:23 +0200)]
Don't load config files partially.
There are various points where we update node information and store a new
host config file to disk. However, at startup we read only part of the host
config files. This allowed a corner case where we never read the full host
config file, but did write it to disk, losing information in the process.
Guus Sliepen [Thu, 10 Oct 2019 18:50:43 +0000 (20:50 +0200)]
Fix spurious channel closures after meshlink_stop()+meshlink_start().
When restarting the mesh without fully closing the handle, channels should
continue to work if MeshLink was only stopped briefly. However, we were
accidentally always setting the connection timeout when a node's UDP status
changed, which would cause channels that didn't have any unsent data to
time out.
Guus Sliepen [Mon, 7 Oct 2019 19:00:01 +0000 (21:00 +0200)]
Add an error callback.
Normally, API functions report potential errors. However, it might happen
that the background thread runs into a serious error that prevents
MeshLink from operating as expected. Add a callback that is called in those
cases.
Currently, the only time it is called is when it can not create or modify
config files in the background thread.
Guus Sliepen [Mon, 7 Oct 2019 10:53:18 +0000 (12:53 +0200)]
Refactor the non-blackbox test suite.
- Add a default log callback function to utils.[ch]
- Use the functions from utils.[ch] where appropriate.
- Use assert() where possible.
- Make functions and variables static where possible.
- Remove the need for wrapper scripts.
Guus Sliepen [Sat, 5 Oct 2019 17:40:15 +0000 (19:40 +0200)]
Replace rand() by xoshiro256** with per-mesh state.
We need a reasonably fast PRNG that doesn't have to be secure, for things
like timer randomization, port number generation and so on. Already some
platforms warn about the use of rand(). Also, when calling fork(), both
parent and child have the same PRNG seed, which causes some inefficiencies
in the test suite.
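
For reference, the xoshiro256** step function with caller-owned state looks
like the sketch below. The struct and function names are hypothetical, and
how the state is seeded is not covered here; the state must not be all zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the generator state, one per mesh handle. */
struct prng_state {
	uint64_t s[4]; /* must not be all zero */
};

static inline uint64_t rotl64(uint64_t x, int k) {
	return (x << k) | (x >> (64 - k));
}

/* One step of xoshiro256**: returns 64 pseudo-random bits and advances
 * the caller's state. Not suitable for cryptographic use. */
uint64_t prng_next(struct prng_state *state) {
	assert(state);

	uint64_t *s = state->s;
	const uint64_t result = rotl64(s[1] * 5, 7) * 9;
	const uint64_t t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];
	s[2] ^= t;
	s[3] = rotl64(s[3], 45);

	return result;
}
```

Since the state lives in a struct rather than in a process-wide global, each
handle can be seeded independently, including after fork().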
Guus Sliepen [Sat, 5 Oct 2019 12:15:35 +0000 (14:15 +0200)]
Add assert() calls to the library.
To aid in debugging, start using assert() to ensure preconditions hold.
At the moment, we assume that NULL-pointer dereferences will always cause
segfaults, so we don't add assert(ptr) statements in those cases, but that
might change in the future.
Guus Sliepen [Fri, 4 Oct 2019 14:55:48 +0000 (16:55 +0200)]
Have meshlink_get_node() and _submesh() set MESHLINK_ENOENT when appropriate.
These functions can return NULL either when the parameters are invalid or
when the node or submesh does not exist, so meshlink_errno must be set
correctly to distinguish between the two cases.
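
The pattern can be sketched with a stand-in lookup function. Everything
named demo_* or DEMO_* below is hypothetical and only mirrors the idea; it is
not the MeshLink API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes and error variable, standing in for the
 * library's own. */
enum demo_errno { DEMO_OK, DEMO_EINVAL, DEMO_ENOENT };
enum demo_errno demo_errno_value = DEMO_OK;

static const char *known_nodes[] = {"foo", "bar"};

/* Returns the stored name, or NULL with demo_errno_value set to say why. */
const char *demo_get_node(const char *name) {
	if(!name || !*name) {
		demo_errno_value = DEMO_EINVAL; /* bad parameter */
		return NULL;
	}

	for(size_t i = 0; i < sizeof(known_nodes) / sizeof(*known_nodes); i++) {
		if(!strcmp(known_nodes[i], name)) {
			return known_nodes[i];
		}
	}

	demo_errno_value = DEMO_ENOENT; /* valid parameter, no such node */
	return NULL;
}
```

A caller that gets NULL back can then check the error value to decide
whether it made a programming mistake or simply asked for an unknown node.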
We were calling fclose() inside config_read_file(), which never called
fopen() itself. It is the caller's responsibility to close the file on
error. Also fix two error cases where the caller forgot to call fclose().
Update mesh->loop.now right after select() returns.
It can be that select() waits for a few seconds before an event or
timeout arrives. We didn't update mesh->loop.now before calling any of
the callback functions, which meant they could use a time a few seconds
in the past. In particular, the last_ping_time of connections could be
set to a value such that they would immediately be considered timed out.
Reset UTCP timers when learning a node's public key.
If we try to open a channel to a node whose public key we do not yet
have, a REQ_KEY request is sent out. When the answer comes back we know
the key, but we have to tell UTCP to reset the timers, otherwise we have
to wait for a retransmission timeout to occur before resending the SYN
packet.
Simulate three simultaneous UDP channels transmitting 40 Mbps. Check that
>95% of the packets arrive, and that there is no TCP-like merging or
splitting of packets. This also checks that channels can be opened and
closed like regular TCP channels.
Normally the appname is used as the service name as well. However,
Catta's rules are that the service name must be at least two characters,
and only contain alphanumeric characters, dashes and underscores.
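
The rules as stated above can be checked with a few lines of C. The function
name is hypothetical, and the check implements only the two rules given in
this message, not any other constraints Catta may impose.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: at least two characters, and only alphanumeric
 * characters, dashes and underscores. */
bool is_valid_service_name(const char *name) {
	assert(name);

	if(strlen(name) < 2) {
		return false;
	}

	for(const char *p = name; *p; p++) {
		/* Cast to unsigned char: isalnum() is undefined for negative values. */
		if(!isalnum((unsigned char)*p) && *p != '-' && *p != '_') {
			return false;
		}
	}

	return true;
}
```

An appname that fails this check cannot be used unchanged as the service
name.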
Correctly update our own host config file after meshlink_set_port().
We wrote the host config file before we updated mesh->self, causing the
old port number to remain in the host config file. This would cause
subsequent calls to meshlink_invite() to have the wrong port number in
the invitation.
Call fsync() on the configuration directories where appropriate.
Just ensuring individual config files are written atomically is good enough
to keep internal consistency, but we want to make sure directory metadata
is also synced to disk when returning from functions that expect the changes
to have been made permanent, such as meshlink_import() and
meshlink_blacklist(). Also do this when we create the initial directory
structure.
We already took care of syncing directory metadata when rotating keys for
encrypted storage.
Guus Sliepen [Thu, 29 Aug 2019 21:26:29 +0000 (23:26 +0200)]
Add support for AIO using filedescriptors.
This adds support to enqueue transmits between channels and filedescriptors.
Currently, it requires that read() and write() calls on the filedescriptors
are non-blocking and always succeed, which limits it to reading from and
writing to files.
Guus Sliepen [Sun, 18 Aug 2019 14:54:58 +0000 (16:54 +0200)]
Remove redundant call to graph().
When we receive an ADD_EDGE for an existing edge but with new information,
don't call graph() between removing the old edge and adding the new one.
This prevents a node from temporarily being considered unreachable, which
in turn would cause the SPTPS state for that node to be reset.
Guus Sliepen [Thu, 15 Aug 2019 20:41:15 +0000 (22:41 +0200)]
Test concurrent AIO and non-AIO transfers.
Create 5 channels; 4 transfer a large amount of data via AIO, the 5th does
a regular meshlink_channel_send(). Verify that there is concurrency between
all channels.
Guus Sliepen [Mon, 12 Aug 2019 14:46:02 +0000 (16:46 +0200)]
Add meshlink_channel_aio_receive().
This function allows handing over a large buffer to MeshLink which will be
used to receive data without needing intervention from the application.
A callback is called when MeshLink has filled the buffer with the data.
Guus Sliepen [Mon, 12 Aug 2019 11:43:01 +0000 (13:43 +0200)]
Add meshlink_channel_aio_send().
This function allows handing over a large amount of data to MeshLink
which will be sent without needing intervention from the application.
A callback is called when MeshLink is done with the data, so the
application can call any cleanup function it needs to call.
Guus Sliepen [Sun, 4 Aug 2019 20:22:10 +0000 (22:22 +0200)]
Use condition variables to wait for threads to finish initializing.
To prevent a race condition when calling meshlink_stop() right after
meshlink_start(), we need to wait for the MeshLink and Catta threads to
finish initializing before we can return from meshlink_start().
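
The waiting pattern can be sketched with POSIX threads as follows. The
struct and function names are hypothetical and the thread body is reduced to
placeholders; only the handshake between starter and thread is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker handle. */
struct worker {
	pthread_t thread;
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	bool initialized; /* protected by mutex */
};

static void *worker_main(void *arg) {
	struct worker *w = arg;

	/* ... thread-specific setup would happen here ... */

	pthread_mutex_lock(&w->mutex);
	w->initialized = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->mutex);

	/* ... the thread's main loop would run here ... */
	return NULL;
}

/* Start the thread and return only after it has finished initializing. */
bool worker_start(struct worker *w) {
	assert(w);

	pthread_mutex_init(&w->mutex, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->initialized = false;

	if(pthread_create(&w->thread, NULL, worker_main, w) != 0) {
		pthread_cond_destroy(&w->cond);
		pthread_mutex_destroy(&w->mutex);
		return false;
	}

	pthread_mutex_lock(&w->mutex);

	/* Loop to guard against spurious wakeups. */
	while(!w->initialized) {
		pthread_cond_wait(&w->cond, &w->mutex);
	}

	pthread_mutex_unlock(&w->mutex);
	return true;
}

/* Wait for the thread to exit and release the synchronization objects. */
void worker_stop(struct worker *w) {
	assert(w);
	pthread_join(w->thread, NULL);
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->mutex);
}
```

With this handshake, a stop call issued immediately after a start call
always finds a fully initialized thread.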