Guus Sliepen [Thu, 10 Oct 2019 18:50:43 +0000 (20:50 +0200)]
Fix spurious channel closures after meshlink_stop()+meshlink_start().
When restarting the mesh without fully closing the handle, channels should
continue to work if MeshLink was only stopped briefly. However, we were
accidentally always setting the connection timeout when a node's UDP status
changed, which would cause channels that didn't have any unsent data to
time out.
Guus Sliepen [Mon, 7 Oct 2019 19:00:01 +0000 (21:00 +0200)]
Add an error callback.
Normally, API functions report potential errors. However, it might happen
that the background thread runs into a serious error that prevents
MeshLink from operating as expected. Add a callback that is called in those
cases.
Currently, the only time it is called is when it cannot create or modify
config files in the background thread.
Guus Sliepen [Mon, 7 Oct 2019 10:53:18 +0000 (12:53 +0200)]
Refactor the non-blackbox test suite.
- Add a default log callback function to utils.[ch]
- Use the functions from utils.[ch] where appropriate.
- Use assert() where possible.
- Make functions and variables static where possible.
- Remove the need for wrapper scripts.
Guus Sliepen [Sat, 5 Oct 2019 17:40:15 +0000 (19:40 +0200)]
Replace rand() by xoshiro256** with per-mesh state.
We need a reasonably fast PRNG that doesn't have to be secure, for things
like timer randomization, port number generation and so on. Some platforms
already warn about the use of rand(). Also, when calling fork(), both
parent and child have the same PRNG seed, which causes some inefficiencies
in the test suite.
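
As a rough illustration of the idea, here is a minimal sketch of xoshiro256**
with explicit state that could live inside a per-mesh structure. The names
prng_state_t, prng_seed(), prng_next() and prng_below() are hypothetical, and
seeding through splitmix64 is an assumption made for this sketch, not
necessarily what MeshLink does.

```c
#include <stdint.h>

/* Hypothetical per-mesh PRNG state; names are illustrative, not MeshLink's. */
typedef struct prng_state {
	uint64_t s[4];
} prng_state_t;

static inline uint64_t rotl64(uint64_t x, int k) {
	return (x << k) | (x >> (64 - k));
}

/* splitmix64, used here only to expand one seed into four state words
 * (an assumption of this sketch). */
static uint64_t splitmix64(uint64_t *x) {
	uint64_t z = (*x += UINT64_C(0x9e3779b97f4a7c15));
	z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
	z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
	return z ^ (z >> 31);
}

void prng_seed(prng_state_t *st, uint64_t seed) {
	for(int i = 0; i < 4; i++) {
		st->s[i] = splitmix64(&seed);
	}
}

/* One step of xoshiro256**. Not cryptographically secure. */
uint64_t prng_next(prng_state_t *st) {
	uint64_t *s = st->s;
	const uint64_t result = rotl64(s[1] * 5, 7) * 9;
	const uint64_t t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];
	s[2] ^= t;
	s[3] = rotl64(s[3], 45);

	return result;
}

/* Value in [0, max); the slight modulo bias is acceptable for things like
 * timer jitter. Returns 0 when max is 0. */
uint64_t prng_below(prng_state_t *st, uint64_t max) {
	return max ? prng_next(st) % max : 0;
}
```

Because each state is seeded separately, two handles (or a parent and a child
that reseeds after fork()) do not have to share a sequence the way callers of
rand() do.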
Guus Sliepen [Sat, 5 Oct 2019 12:15:35 +0000 (14:15 +0200)]
Add assert() calls to the library.
To aid in debugging, start using assert() to ensure preconditions hold.
At the moment, we assume that NULL-pointer dereferences will always cause
segfaults, so we don't add assert(ptr) statements in those cases, but that
might change in the future.
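
A small sketch of the convention described above, using a hypothetical helper
copy_name() that is not part of MeshLink: value preconditions are asserted,
while pointer arguments are left unchecked on the assumption that a NULL
dereference would fault anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration: copy a name into a fixed-size buffer
 * and return its length. */
static size_t copy_name(char *dst, size_t dstlen, const char *src) {
	/* Preconditions the caller must honour. These vanish with -DNDEBUG. */
	assert(dstlen > 0);
	assert(strlen(src) < dstlen);

	/* No assert(dst) or assert(src): a NULL pointer is assumed to segfault
	 * on first use, which is already easy to spot in a debugger. */
	size_t len = strlen(src);
	memcpy(dst, src, len + 1);
	return len;
}
```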
Guus Sliepen [Fri, 4 Oct 2019 14:55:48 +0000 (16:55 +0200)]
Have meshlink_get_node() and _submesh() set MESHLINK_ENOENT when appropriate.
These functions can return NULL either when the parameters are invalid or
when the node or submesh does not exist, so meshlink_errno must be set
correctly to distinguish between the two cases.
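
The distinction can be sketched with stand-in types. Everything prefixed with
demo_ or DEMO_ below is hypothetical and only mirrors the pattern; it is not
MeshLink's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the library's error codes and node type. */
typedef enum { DEMO_OK, DEMO_EINVAL, DEMO_ENOENT } demo_errno_t;
demo_errno_t demo_errno = DEMO_OK;

typedef struct demo_node {
	const char *name;
} demo_node_t;

static demo_node_t demo_nodes[] = { {"alice"}, {"bob"} };

/* Returns NULL in two different situations; demo_errno tells them apart. */
demo_node_t *demo_get_node(const char *name) {
	if(!name) {
		demo_errno = DEMO_EINVAL; /* invalid parameter */
		return NULL;
	}

	for(size_t i = 0; i < sizeof(demo_nodes) / sizeof(*demo_nodes); i++) {
		if(!strcmp(demo_nodes[i].name, name)) {
			return &demo_nodes[i];
		}
	}

	demo_errno = DEMO_ENOENT; /* valid parameter, but no such node */
	return NULL;
}
```

A caller that gets NULL back can then check the error variable to decide
whether it passed bad arguments or simply asked for an unknown node.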
We were calling fclose() inside config_read_file(), which never called
fopen() itself. It is the caller's responsibility to close the file on
error. Also fix two error cases where the caller forgot to call fclose().
Update mesh->loop.now right after select() returns.
It can be that select() waits for a few seconds before an event or
timeout arrives. We didn't update mesh->loop.now before calling any of
the callback functions, which meant they could use a time a few seconds
in the past. In particular, the last_ping_time of connections could be
set to a value such that they would immediately be considered timed out.
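
The ordering can be sketched as follows. demo_loop_t and demo_loop_wait() are
hypothetical and much simpler than MeshLink's real event loop; the use of
gettimeofday() is an assumption of this sketch.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical, simplified loop state. */
typedef struct demo_loop {
	struct timeval now;
} demo_loop_t;

/* Sleep in select() for up to timeout_ms (no file descriptors are watched
 * in this sketch), then refresh loop->now. Returns select()'s result. */
int demo_loop_wait(demo_loop_t *loop, long timeout_ms) {
	struct timeval tv = {
		.tv_sec = timeout_ms / 1000,
		.tv_usec = (timeout_ms % 1000) * 1000,
	};

	int n = select(0, NULL, NULL, NULL, &tv);

	/* Refresh the cached time *before* any I/O or timeout callbacks run,
	 * so they never see a timestamp from before the wait. */
	gettimeofday(&loop->now, NULL);

	return n;
}
```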
Reset UTCP timers when learning a node's public key.
If we try to open a channel to a node whose public key we do not yet
have, a REQ_KEY request is sent out. When the answer comes back we know
the key, but we have to tell UTCP to reset the timers, otherwise we have
to wait for a retransmission timeout to occur before resending the SYN
packet.
Simulate three simultaneous UDP channels transmitting 40 Mbps. Check that
>95% of the packets arrive, and that there is no TCP-like merging or
splitting of packets. This also checks that channels can be opened and
closed like regular TCP channels.
Normally the appname is used as the service name as well. However,
Catta's rules are that the service name must be at least two characters,
and only contain alphanumeric characters, dashes and underscores.
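
One way to check the rules as stated above is sketched below. The function
name demo_service_name_ok() is hypothetical, and the sketch only tests a
candidate name; what to do with an appname that fails the check is left open.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name is at least two characters long and consists only of
 * alphanumeric characters, dashes and underscores. */
bool demo_service_name_ok(const char *name) {
	if(!name || strlen(name) < 2) {
		return false;
	}

	for(const char *p = name; *p; p++) {
		unsigned char c = (unsigned char)*p;

		if(!isalnum(c) && c != '-' && c != '_') {
			return false;
		}
	}

	return true;
}
```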
Correctly update our own host config file after meshlink_set_port().
We wrote the host config file before we updated mesh->self, causing the
old port number to remain in the host config file. This would cause
subsequent calls to meshlink_invite() to have the wrong port number in
the invitation.
Call fsync() on the configuration directories where appropriate.
Just ensuring individual config files are written atomically is good enough
to keep internal consistency, but we want to make sure directory metadata
is also synced to disk when returning from functions that expect the changes
to have been made permanent, such as meshlink_import() and
meshlink_blacklist(). Also do this when we create the initial directory
structure.
We already took care of syncing directory metadata when rotating keys for
encrypted storage.
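
Syncing directory metadata on POSIX systems generally means opening the
directory itself and calling fsync() on that descriptor. A minimal sketch,
with the hypothetical helper name demo_sync_dir():

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Flush a directory's metadata (new, renamed or removed entries) to disk.
 * Returns false if the directory cannot be opened or synced. */
bool demo_sync_dir(const char *path) {
	int fd = open(path, O_RDONLY);

	if(fd < 0) {
		return false;
	}

	bool ok = fsync(fd) == 0;
	close(fd);
	return ok;
}
```

Calling such a helper after an atomic rename of a config file ensures the
rename itself survives a crash, not just the file contents.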
Guus Sliepen [Thu, 29 Aug 2019 21:26:29 +0000 (23:26 +0200)]
Add support for AIO using filedescriptors.
This adds support to enqueue transmits between channels and filedescriptors.
Currently, it requires that read() and write() calls on the filedescriptors
are non-blocking and always succeed, which limits it to reading from and
writing to files.
Guus Sliepen [Sun, 18 Aug 2019 14:54:58 +0000 (16:54 +0200)]
Remove redundant call to graph().
When we receive an ADD_EDGE for an existing edge but with new information,
don't call graph() between removing the old edge and adding the new one.
This prevents a node from temporarily being considered unreachable, which
in turn would cause the SPTPS state to that node to be reset.
Guus Sliepen [Thu, 15 Aug 2019 20:41:15 +0000 (22:41 +0200)]
Test concurrent AIO and non-AIO transfers.
Create 5 channels; 4 transfer a large amount of data via AIO, the 5th does
a regular meshlink_channel_send(). Verify that there is concurrency between
all channels.
Guus Sliepen [Mon, 12 Aug 2019 14:46:02 +0000 (16:46 +0200)]
Add meshlink_channel_aio_receive().
This function allows handing over a large buffer to MeshLink which will be
used to receive data without needing intervention from the application.
A callback is called when MeshLink has filled the buffer with the data.
Guus Sliepen [Mon, 12 Aug 2019 11:43:01 +0000 (13:43 +0200)]
Add meshlink_channel_aio_send().
This function allows handing over a large amount of data to MeshLink
which will be sent without needing intervention from the application.
A callback is called when MeshLink is done with the data, so the
application can call any cleanup function it needs to call.
Guus Sliepen [Sun, 4 Aug 2019 20:22:10 +0000 (22:22 +0200)]
Use condition variables to wait for threads to finish initializing.
To prevent a race condition when calling meshlink_stop() right after
meshlink_start(), we need to wait for the MeshLink and Catta threads to
finish initializing before we can return from meshlink_start().
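
The pattern can be sketched with plain pthreads. The demo_worker_* names are
hypothetical; the real code has to coordinate two threads (MeshLink and
Catta), while this sketch shows a single one.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct demo_worker {
	pthread_t thread;
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	bool initialized;
} demo_worker_t;

static void *demo_worker_main(void *arg) {
	demo_worker_t *w = arg;

	/* ... thread-specific setup would go here ... */

	/* Tell the starter that initialization is complete. */
	pthread_mutex_lock(&w->mutex);
	w->initialized = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->mutex);

	/* ... the event loop would run here ... */
	return NULL;
}

/* Start the thread and return only once it has finished initializing. */
bool demo_worker_start(demo_worker_t *w) {
	w->initialized = false;
	pthread_mutex_init(&w->mutex, NULL);
	pthread_cond_init(&w->cond, NULL);

	pthread_mutex_lock(&w->mutex);

	if(pthread_create(&w->thread, NULL, demo_worker_main, w) != 0) {
		pthread_mutex_unlock(&w->mutex);
		pthread_cond_destroy(&w->cond);
		pthread_mutex_destroy(&w->mutex);
		return false;
	}

	/* Loop to guard against spurious wakeups. */
	while(!w->initialized) {
		pthread_cond_wait(&w->cond, &w->mutex);
	}

	pthread_mutex_unlock(&w->mutex);
	return true;
}

/* Safe to call immediately after demo_worker_start() returns. */
void demo_worker_stop(demo_worker_t *w) {
	pthread_join(w->thread, NULL);
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->mutex);
}
```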
Guus Sliepen [Sat, 3 Aug 2019 20:56:40 +0000 (22:56 +0200)]
Correctly update device class when receiving an ADD_EDGE message.
In some cases we didn't update the device class information when receiving
an ADD_EDGE message. This could cause autoconnect to fail to work as
expected.
Catta only works on Ethernet interfaces. However, if there is a
non-Ethernet interface, MeshLink might still be able to connect to
peers. So just rely on getifaddrs() for checking whether to terminate
connections or not.
Close connections if the local address is no longer valid.
When we detect that there are changes on the network interfaces, check for
each active connection whether the local side of the connection has an
address that exists on at least one network interface. If not, then
communication via that connection is not possible. Instead of waiting for
a timeout, immediately terminate those connections.
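
The per-connection check could look roughly like this. demo_address_is_local()
is a hypothetical helper that compares one local socket address against the
list returned by getifaddrs(); only IPv4 and IPv6 are handled in this sketch.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <ifaddrs.h>
#include <string.h>

/* Returns 1 if addr is assigned to some local interface, 0 if it is not,
 * and -1 if the interface list could not be obtained. */
int demo_address_is_local(const struct sockaddr *addr) {
	struct ifaddrs *list = NULL;

	if(getifaddrs(&list) != 0) {
		return -1;
	}

	int found = 0;

	for(struct ifaddrs *ifa = list; ifa && !found; ifa = ifa->ifa_next) {
		/* Skip interfaces without an address or of a different family. */
		if(!ifa->ifa_addr || ifa->ifa_addr->sa_family != addr->sa_family) {
			continue;
		}

		if(addr->sa_family == AF_INET) {
			const struct sockaddr_in *a = (const struct sockaddr_in *)addr;
			const struct sockaddr_in *b = (const struct sockaddr_in *)ifa->ifa_addr;
			found = a->sin_addr.s_addr == b->sin_addr.s_addr;
		} else if(addr->sa_family == AF_INET6) {
			const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)addr;
			const struct sockaddr_in6 *b = (const struct sockaddr_in6 *)ifa->ifa_addr;
			found = memcmp(&a->sin6_addr, &b->sin6_addr, sizeof(a->sin6_addr)) == 0;
		}
	}

	freeifaddrs(list);
	return found;
}
```

A connection whose local address yields 0 can be terminated right away; a
result of -1 is best treated as "unknown" rather than as grounds for closing.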
Speed up reconnections on network interface changes.
Catta informs us whenever an interface comes online or goes offline. If we
detect that there are no online interfaces, immediately terminate all meta-
connections. Otherwise, reset the ping timers and reconnection timers for
outgoing connections.
Inform UTCP when a node is offline, so it will start connection timeouts.
When there are open channels to a node that is offline for longer than the
connection timeout, the channels will be marked closed, and callbacks will
be fired.
Guus Sliepen [Thu, 30 May 2019 21:34:35 +0000 (23:34 +0200)]
Speed up initial autoconnect after joining a mesh.
When we just joined a mesh, we quickly want to establish redundant
connections. We do this by resetting the outgoing timer if we receive a
public key for a node that we are trying to connect to, and by speeding up
the autoconnect algorithm if we don't have 3 connections (in progress) yet.
Guus Sliepen [Thu, 23 May 2019 21:20:01 +0000 (23:20 +0200)]
Autoconnect to reachable nodes without known public keys.
We must allow the autoconnect algorithm to try connections to nodes that
are online but for which we don't have a public key, otherwise we risk
that no connections are formed at all, except to the inviting node.
Guus Sliepen [Wed, 13 Mar 2019 22:13:06 +0000 (23:13 +0100)]
Various fixes for the encrypted storage support.
- create_initial_config_files() and node_write_config() are now the only
functions that generate the content of new config files from scratch.
- All public API functions that change config files now immediately
write them out.
- Config files of nodes that join using an invitation file are immediately
written out.
- Ensure nodes marked dirty have their config files written out in
periodic_handler(), and on meshlink_stop().
- Fix some memory leaks.
- Write out updated config files, and recreate mesh->self in meshlink_set_port().
Guus Sliepen [Fri, 14 Dec 2018 21:21:17 +0000 (22:21 +0100)]
Add support for encrypted storage.
This is a large overhaul of how configuration files are handled. All files
are now in PackMessage format, and are read from disk into memory in one
go, and also saved to disk from memory in one go, using functions in conf.c.
Guus Sliepen [Sun, 17 Mar 2019 21:01:43 +0000 (22:01 +0100)]
Add functions to get the number of bytes in channel send and receive buffers.
meshlink_channel_get_sendq() and meshlink_channel_get_recvq() call the
underlying UTCP connection's utcp_get_sendq() and utcp_get_recvq().
These return the number of bytes waiting in the send and receive buffers.
In particular, a non-zero value for sendq means that sent data has not been
ACKed by the peer yet.