Guus Sliepen [Thu, 31 Oct 2019 21:26:25 +0000 (22:26 +0100)]
Allow nodes to learn their own reflexive UDP address.
When a node gets a successful UDP probe reply, it informs the peer of the
UDP address and port that it has. The peer can then use this information to
inform other nodes it wants to communicate with.
Currently, this is only done to close the time window where two nodes have
established an SPTPS session via a third node, while the third node hasn't
learned of the two nodes' UDP addresses, in which case it was too late for
the third node to assist with hole punching.
Guus Sliepen [Thu, 31 Oct 2019 19:46:36 +0000 (20:46 +0100)]
Try to get a new reflexive UDP address if UDP probes failed.
If we are sending data to a node, but we don't have working UDP, it could
be because we don't have a good reflexive UDP address. Send a dummy ANS_KEY
once in a while as long as we still want to exchange data.
Guus Sliepen [Mon, 28 Oct 2019 20:14:17 +0000 (21:14 +0100)]
Don't call terminate_connection() from meshlink_blacklist().
If meshlink_blacklist() is called from a callback function, this can result
in a use-after-free bug. Instead, shut down the socket, so the event loop
will take care of it.
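
A minimal sketch of the idea in the commit above, not MeshLink's actual code: instead of freeing a connection from inside a callback, only shut down its socket, and let the event loop notice the EOF and do the teardown. The `toy_connection` type and both helper functions are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a connection object. */
struct toy_connection {
    int fd;
    bool closed;
};

/* Safe to call from a callback: nothing is freed here. */
static void toy_blacklist(struct toy_connection *c) {
    assert(c->fd >= 0);
    shutdown(c->fd, SHUT_RDWR);
}

/* Called by the event loop when the socket becomes readable. */
static void toy_handle_io(struct toy_connection *c) {
    char buf[256];
    ssize_t n = read(c->fd, buf, sizeof(buf));

    if(n <= 0) {
        /* EOF or error: the event loop owns the teardown. */
        close(c->fd);
        c->fd = -1;
        c->closed = true;
    }
}
```

Because the connection object is only released from the event loop, a callback that triggered the blacklisting can keep using its pointers until it returns.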
Guus Sliepen [Mon, 28 Oct 2019 20:12:14 +0000 (21:12 +0100)]
Set meshlink_errno when trying to create a channel to a blacklisted node.
Create a new errno value MESHLINK_EBLACKLISTED, which is used when trying
to send something or create a channel to a blacklisted node. Also improve
some log messages.
Guus Sliepen [Sun, 27 Oct 2019 13:02:47 +0000 (14:02 +0100)]
Restart UDP SPTPS when a node reconnects with a new session ID.
If a node restarts, but its old connection was not considered closed
yet, the graph algorithm never saw it go down. We still want to ensure
we restart UDP SPTPS in this case.
Guus Sliepen [Mon, 14 Oct 2019 19:34:54 +0000 (21:34 +0200)]
Remove support for broadcast packets.
This is not used in MeshLink. Removing support for this also means we no
longer have to calculate a minimum spanning tree whenever the graph is
updated.
Guus Sliepen [Sun, 13 Oct 2019 21:32:03 +0000 (23:32 +0200)]
Allow the mesh to detect when a node has completely restarted.
When calling meshlink_open(), a node creates a unique ID that is passed
along ADD_EDGE messages. When a node becomes unreachable and then reachable
again, this allows other nodes to detect whether it was just a temporary
network issue, or whether the node completely restarted (either because
meshlink_close() was called or because it crashed).
At the moment, when this is detected, other nodes close all open channels
with the restarted node.
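
The restart detection described above can be sketched as a comparison of session IDs. This is an illustration under assumptions: the `toy_node` structure, the 32-bit ID width, and the use of 0 as "no ID seen yet" are all hypothetical and not taken from the MeshLink source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node bookkeeping. */
struct toy_node {
    uint32_t session_id;  /* last session ID seen; 0 means none yet (assumption) */
    int open_channels;
};

/* Called when a node becomes reachable and announces its session ID.
 * Returns true if the node has completely restarted since we last saw it. */
static bool toy_node_reachable(struct toy_node *n, uint32_t session_id) {
    assert(session_id != 0);

    bool restarted = n->session_id != 0 && n->session_id != session_id;

    if(restarted) {
        /* The peer lost all channel state, so close our side too. */
        n->open_channels = 0;
    }

    n->session_id = session_id;
    return restarted;
}
```

A node that reappears with the same ID is treated as a temporary network issue and keeps its channels.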
Guus Sliepen [Thu, 10 Oct 2019 20:12:36 +0000 (22:12 +0200)]
Correctly handle incoming retransmissions of SYN packets.
If the SYNACK got lost, the peer that initiated a channel would retransmit
the SYN packet. However, the responder would ignore the retransmitted SYN
packet, causing the channel to stall and eventually time out.
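
A simplified sketch of the responder-side fix, assuming a three-state handshake; the state names and functions are hypothetical and much smaller than the real UTCP state machine. The point is that a SYN arriving while we are already waiting for the final ACK triggers a new SYNACK instead of being dropped.

```c
#include <assert.h>

enum toy_state { TOY_LISTEN, TOY_SYN_RECEIVED, TOY_ESTABLISHED };

struct toy_conn {
    enum toy_state state;
    int synacks_sent;  /* counts SYNACKs instead of sending real packets */
};

static void toy_send_synack(struct toy_conn *c) {
    assert(c->state == TOY_SYN_RECEIVED);
    c->synacks_sent++;
}

static void toy_handle_syn(struct toy_conn *c) {
    switch(c->state) {
    case TOY_LISTEN:
        c->state = TOY_SYN_RECEIVED;
        toy_send_synack(c);
        break;

    case TOY_SYN_RECEIVED:
        /* Retransmitted SYN: our SYNACK was probably lost, so resend it. */
        toy_send_synack(c);
        break;

    case TOY_ESTABLISHED:
        /* Handshake already completed; nothing to do in this sketch. */
        break;
    }
}

static void toy_handle_ack(struct toy_conn *c) {
    if(c->state == TOY_SYN_RECEIVED) {
        c->state = TOY_ESTABLISHED;
    }
}
```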
Guus Sliepen [Thu, 10 Oct 2019 19:23:58 +0000 (21:23 +0200)]
Don't load config files partially.
There are various points where we update node information and store a new
host config file to disk. However, at startup we read only part of the host
config files. This allowed a corner case where we never read the full host
config file, but did write it to disk, losing information in the process.
Guus Sliepen [Thu, 10 Oct 2019 18:50:43 +0000 (20:50 +0200)]
Fix spurious channel closures after meshlink_stop()+meshlink_start().
When restarting the mesh without fully closing the handle, channels should
continue to work if MeshLink was only stopped briefly. However, we were
accidentally always setting the connection timeout when a node's UDP status
changed, which would cause channels that didn't have any unsent data to
time out.
Guus Sliepen [Mon, 7 Oct 2019 19:00:01 +0000 (21:00 +0200)]
Add an error callback.
Normally, API functions report potential errors. However, it might happen
that the background thread runs into a serious error that prevents
MeshLink from operating as expected. Add a callback that is called in those
cases.
Currently, the only time it is called is when it cannot create or modify
config files in the background thread.
Guus Sliepen [Mon, 7 Oct 2019 10:53:18 +0000 (12:53 +0200)]
Refactor the non-blackbox test suite.
- Add a default log callback function to utils.[ch]
- Use the functions from utils.[ch] where appropriate.
- Use assert() where possible.
- Make functions and variables static where possible.
- Remove the need for wrapper scripts.
Guus Sliepen [Sat, 5 Oct 2019 17:40:15 +0000 (19:40 +0200)]
Replace rand() by xoshiro256** with per-mesh state.
We need a reasonably fast PRNG that doesn't have to be secure, for things
like timer randomization, port number generation and so on. Already some
platforms warn about the use of rand(). Also, when calling fork(), both
parent and child have the same PRNG seed, which causes some inefficiencies
in the test suite.
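
For illustration, here is a sketch of xoshiro256** with the state kept in a structure rather than in globals, so each mesh handle could own its own generator. The `toy_prng` names, the splitmix64 seeding, and the range helper are assumptions for this example; the update step follows the public reference algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-mesh PRNG state. */
struct toy_prng {
    uint64_t s[4];
};

static inline uint64_t rotl(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* splitmix64, used only to expand a single seed into four state words. */
static uint64_t splitmix64(uint64_t *x) {
    uint64_t z = (*x += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

static void toy_prng_seed(struct toy_prng *p, uint64_t seed) {
    for(int i = 0; i < 4; i++) {
        p->s[i] = splitmix64(&seed);
    }
}

/* One step of xoshiro256**. */
static uint64_t toy_prng_next(struct toy_prng *p) {
    uint64_t *s = p->s;
    const uint64_t result = rotl(s[1] * 5, 7) * 9;
    const uint64_t t = s[1] << 17;

    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 45);

    return result;
}

/* Value in [0, max). The slight modulo bias is acceptable for
 * non-security uses such as timer jitter. */
static uint64_t toy_prng_range(struct toy_prng *p, uint64_t max) {
    assert(max > 0);
    return toy_prng_next(p) % max;
}
```

With per-handle state, a child created by fork() can simply reseed its own structure, which is one way to avoid parent and child producing identical sequences.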
Guus Sliepen [Sat, 5 Oct 2019 12:15:35 +0000 (14:15 +0200)]
Add assert() calls to the library.
To aid in debugging, start using assert() to ensure preconditions hold.
At the moment, we assume that NULL-pointer dereferences will always cause
segfaults, so we don't add assert(ptr) statements in those cases, but that
might change in the future.
Guus Sliepen [Fri, 4 Oct 2019 14:55:48 +0000 (16:55 +0200)]
Have meshlink_get_node() and _submesh() set MESHLINK_ENOENT when appropriate.
These functions can return NULL either when the parameters are invalid or
when the node or submesh does not exist. meshlink_errno must be set
correctly to distinguish between the two cases.
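
A toy lookup that shows the distinction, not the real API: the `toy_` enum, error variable, and node table are hypothetical stand-ins for meshlink_errno and the mesh's node list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes mirroring the two failure cases. */
enum toy_errno { TOY_EOK, TOY_EINVAL, TOY_ENOENT };
static enum toy_errno toy_errno_value = TOY_EOK;

struct toy_node {
    const char *name;
};

static struct toy_node toy_nodes[] = { { "foo" }, { "bar" } };

static struct toy_node *toy_get_node(const char *name) {
    if(!name || !*name) {
        /* The caller passed bad parameters. */
        toy_errno_value = TOY_EINVAL;
        return NULL;
    }

    for(size_t i = 0; i < sizeof(toy_nodes) / sizeof(*toy_nodes); i++) {
        if(!strcmp(toy_nodes[i].name, name)) {
            return &toy_nodes[i];
        }
    }

    /* Parameters were fine, but there is no such node. */
    toy_errno_value = TOY_ENOENT;
    return NULL;
}
```

A caller that gets NULL can then check the error value to decide whether it has a bug (invalid argument) or should simply wait for the node to become known.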
We were calling fclose() inside config_read_file(), which never called
fopen() itself. It is the caller's responsibility to close the file on
error. Also fix two error cases where the caller forgot to call fclose().
Update mesh->loop.now right after select() returns.
It can be that select() waits for a few seconds before an event or
timeout arrives. We didn't update mesh->loop.now before calling any of
the callback functions, which meant they could use a time a few seconds
in the past. In particular, the last_ping_time of connections could be
set to a value such that they would immediately be considered timed out.
Reset UTCP timers when learning a node's public key.
If we try to open a channel to a node whose public key we do not yet
have, a REQ_KEY request is sent out. When the answer comes back we know
the key, but we have to tell UTCP to reset the timers, otherwise we have
to wait for a retransmission timeout to occur before resending the SYN
packet.
Simulate three simultaneous UDP channels transmitting 40 Mbps. Check that
>95% of the packets arrive, and that there is no TCP-like merging or
splitting of packets. This also checks that channels can be opened and
closed like regular TCP channels.
Normally the appname is used as the service name as well. However,
Catta's rules are that the service name must be at least two characters,
and only contain alphanumeric characters, dashes and underscores.
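
The rule as stated above can be checked with a few lines of C. This is a sketch of the stated rule only; `toy_service_name_ok` is a hypothetical helper, and how MeshLink derives a valid service name from an invalid appname is not shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name is at least two characters long and contains only
 * alphanumeric characters, dashes and underscores. */
static bool toy_service_name_ok(const char *name) {
    if(!name || strlen(name) < 2) {
        return false;
    }

    for(const char *p = name; *p; p++) {
        if(!isalnum((unsigned char)*p) && *p != '-' && *p != '_') {
            return false;
        }
    }

    return true;
}
```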
Correctly update our own host config file after meshlink_set_port().
We wrote the host config file before we updated mesh->self, causing the
old port number to remain in the host config file. This would cause
subsequent calls to meshlink_invite() to have the wrong port number in
the invitation.
Call fsync() on the configuration directories where appropriate.
Just ensuring individual config files are written atomically is good enough
to keep internal consistency, but we want to make sure directory metadata
is also synced to disk when returning from functions that expect the changes
to have been made permanent, such as meshlink_import() and
meshlink_blacklist(). Also do this when we create the initial directory
structure.
We already took care of syncing directory metadata when rotating keys for
encrypted storage.
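
Syncing directory metadata on POSIX systems is usually done by opening the directory itself and calling fsync() on that descriptor. The helper below is a sketch with a hypothetical name, not the function used in MeshLink; some filesystems may not support fsync() on directories, in which case it reports failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Flushes the directory entry metadata of path to disk.
 * Returns true on success. */
static bool toy_sync_dir(const char *path) {
    assert(path && *path);

    int fd = open(path, O_RDONLY);

    if(fd < 0) {
        return false;
    }

    bool ok = fsync(fd) == 0;

    if(close(fd) != 0) {
        ok = false;
    }

    return ok;
}
```

A typical sequence would be: write the new file under a temporary name, fsync() it, rename() it into place, then sync the containing directory so the rename itself survives a crash.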
Guus Sliepen [Thu, 29 Aug 2019 21:26:29 +0000 (23:26 +0200)]
Add support for AIO using filedescriptors.
This adds support to enqueue transmits between channels and filedescriptors.
Currently, it requires that read() and write() calls on the filedescriptors
are non-blocking and always succeed, which limits it to reading from and
writing to files.
Guus Sliepen [Sun, 18 Aug 2019 14:54:58 +0000 (16:54 +0200)]
Remove redundant call to graph().
When we receive an ADD_EDGE for an existing edge but with new information,
don't call graph() between removing the old edge and adding the new one.
This prevents a node from temporarily being considered unreachable, which
in turn would cause the SPTPS state to that node to be reset.
Guus Sliepen [Thu, 15 Aug 2019 20:41:15 +0000 (22:41 +0200)]
Test concurrent AIO and non-AIO transfers.
Create 5 channels; 4 transfer a large amount of data via AIO, the 5th does
a regular meshlink_channel_send(). Verify that there is concurrency between
all channels.
Guus Sliepen [Mon, 12 Aug 2019 14:46:02 +0000 (16:46 +0200)]
Add meshlink_channel_aio_receive().
This function allows handing over a large buffer to MeshLink which will be
used to receive data without needing intervention from the application.
A callback is called when MeshLink has filled the buffer with the data.
Guus Sliepen [Mon, 12 Aug 2019 11:43:01 +0000 (13:43 +0200)]
Add meshlink_channel_aio_send().
This function allows handing over a large amount of data to MeshLink
which will be sent without needing intervention from the application.
A callback is called when MeshLink is done with the data, so the
application can call any cleanup function it needs to call.
Guus Sliepen [Sun, 4 Aug 2019 20:22:10 +0000 (22:22 +0200)]
Use condition variables to wait for threads to finish initializing.
To prevent a race condition when calling meshlink_stop() right after
meshlink_start(), we need to wait for the MeshLink and Catta threads to
finish initializing before we can return from meshlink_start().
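
The pattern can be sketched with a single worker thread; the `toy_mesh` structure and function names are hypothetical, and the real code waits for both the MeshLink and Catta threads. The start function does not return until the thread has signalled that it is initialized, so an immediate stop cannot race with thread startup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical handle with one background thread. */
struct toy_mesh {
    pthread_t thread;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool initialized;
    bool stop;
};

static void *toy_main_loop(void *arg) {
    struct toy_mesh *mesh = arg;

    pthread_mutex_lock(&mesh->mutex);
    /* Thread-specific setup would happen before this point. */
    mesh->initialized = true;
    pthread_cond_broadcast(&mesh->cond);

    /* Stand-in for the event loop: sleep until asked to stop. */
    while(!mesh->stop) {
        pthread_cond_wait(&mesh->cond, &mesh->mutex);
    }

    pthread_mutex_unlock(&mesh->mutex);
    return NULL;
}

static bool toy_start(struct toy_mesh *mesh) {
    pthread_mutex_init(&mesh->mutex, NULL);
    pthread_cond_init(&mesh->cond, NULL);
    mesh->initialized = false;
    mesh->stop = false;

    if(pthread_create(&mesh->thread, NULL, toy_main_loop, mesh) != 0) {
        pthread_cond_destroy(&mesh->cond);
        pthread_mutex_destroy(&mesh->mutex);
        return false;
    }

    /* Block until the thread reports that it finished initializing. */
    pthread_mutex_lock(&mesh->mutex);

    while(!mesh->initialized) {
        pthread_cond_wait(&mesh->cond, &mesh->mutex);
    }

    pthread_mutex_unlock(&mesh->mutex);
    return true;
}

static void toy_stop(struct toy_mesh *mesh) {
    pthread_mutex_lock(&mesh->mutex);
    assert(mesh->initialized);
    mesh->stop = true;
    pthread_cond_broadcast(&mesh->cond);
    pthread_mutex_unlock(&mesh->mutex);

    pthread_join(mesh->thread, NULL);
    pthread_cond_destroy(&mesh->cond);
    pthread_mutex_destroy(&mesh->mutex);
}
```

The `while` loops around pthread_cond_wait() guard against spurious wakeups, which POSIX explicitly permits.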
Guus Sliepen [Sat, 3 Aug 2019 20:56:40 +0000 (22:56 +0200)]
Correctly update device class when receiving an ADD_EDGE message.
In some cases we didn't update the device class information when receiving
an ADD_EDGE message. This could cause autoconnect to fail to work as
expected.
Catta only works on Ethernet interfaces. However, if there is a
non-Ethernet interface, MeshLink might still be able to connect to
peers. So just rely on getifaddrs() for checking whether to terminate
connections or not.
Close connections if the local address is no longer valid.
When we detect that there are changes on the network interfaces, check for
each active connection whether the local side of the connection has an
address that exists on at least one network interface. If not, then
communication via that connection is not possible. Instead of waiting for
a timeout, immediately terminate those connections.
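
A sketch of the per-connection check using getifaddrs(), assuming the local side of the connection is available as a `struct sockaddr`; the function name and return convention are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 1 if the address is assigned to some local interface,
 * 0 if it is not, and -1 if the interfaces could not be enumerated. */
static int toy_address_is_local(const struct sockaddr *sa) {
    assert(sa->sa_family == AF_INET || sa->sa_family == AF_INET6);

    struct ifaddrs *list;

    if(getifaddrs(&list) != 0) {
        return -1;
    }

    int found = 0;

    for(struct ifaddrs *ifa = list; ifa && !found; ifa = ifa->ifa_next) {
        /* Interfaces without an address, or of another family, cannot match. */
        if(!ifa->ifa_addr || ifa->ifa_addr->sa_family != sa->sa_family) {
            continue;
        }

        if(sa->sa_family == AF_INET) {
            const struct sockaddr_in *a = (const struct sockaddr_in *)sa;
            const struct sockaddr_in *b = (const struct sockaddr_in *)ifa->ifa_addr;
            found = a->sin_addr.s_addr == b->sin_addr.s_addr;
        } else {
            const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)sa;
            const struct sockaddr_in6 *b = (const struct sockaddr_in6 *)ifa->ifa_addr;
            found = !memcmp(&a->sin6_addr, &b->sin6_addr, sizeof(a->sin6_addr));
        }
    }

    freeifaddrs(list);
    return found;
}
```

A connection whose local address yields 0 can be terminated right away; treating -1 as "keep the connection" is the conservative choice, since the check itself failed.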
Speed up reconnections on network interface changes.
Catta informs us whenever an interface comes online or goes offline. If we
detect that there are no online interfaces, immediately terminate all meta-
connections. Otherwise, reset the ping timers and reconnection timers for
outgoing connections.
Inform UTCP when a node is offline, so it will start connection timeouts.
When there are open channels to a node that is offline for longer than the
connection timeout, the channels will be marked closed, and callbacks will
be fired.