Guus Sliepen [Tue, 4 Feb 2020 22:11:34 +0000 (23:11 +0100)]
Clear reachability times in imported host config files.
This mirrors what we do with host config files received during a join.
Guus Sliepen [Mon, 3 Feb 2020 20:24:50 +0000 (21:24 +0100)]
Force -fPIC when compiling libcatta.
Guus Sliepen [Mon, 3 Feb 2020 16:43:50 +0000 (17:43 +0100)]
Clear reachability times in host config files received during a join.
When a node joins an existing mesh, it gets passed one or more host config
files from the inviter. These might contain non-zero reachability times,
but the invitee itself has never seen those nodes, so clear them before
storing the host config files.
Guus Sliepen [Mon, 3 Feb 2020 16:03:07 +0000 (17:03 +0100)]
Prevent meshlink_errno from being set incorrectly by meshlink_invite()
We called a public API function inside meshlink_invite() to check that we
don't try to invite a node that's already known. That causes it to set
meshlink_errno to MESHLINK_ENOENT. Fix this by calling lookup_node()
instead.
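The difference between an internal lookup and a public API call can be sketched as follows. This is a minimal illustration, assuming a simplified node table: `example_errno`, `EX_ENOENT`, `public_get_node()` and `invite()` are hypothetical stand-ins for `meshlink_errno`, `MESHLINK_ENOENT`, `meshlink_get_node()` and `meshlink_invite()`. Only `lookup_node()` shares its name with the function mentioned above, and its body here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for MeshLink's error codes and errno variable. */
enum { EX_OK = 0, EX_ENOENT = 1 };
static int example_errno = EX_OK;

struct node {
	const char *name;
};

static struct node nodes[] = {{"alice"}, {"bob"}};

/* Internal lookup: never touches example_errno. */
static struct node *lookup_node(const char *name) {
	for (size_t i = 0; i < sizeof(nodes) / sizeof(nodes[0]); i++) {
		if (strcmp(nodes[i].name, name) == 0) {
			return &nodes[i];
		}
	}

	return NULL;
}

/* Public API wrapper: reports "not found" through example_errno. */
static struct node *public_get_node(const char *name) {
	struct node *n = lookup_node(name);

	if (!n) {
		example_errno = EX_ENOENT;
	}

	return n;
}

/* Inviting an unknown name is the normal case and must not set EX_ENOENT. */
static bool invite(const char *name) {
	if (lookup_node(name)) {
		return false; /* already known: refuse (error reporting omitted) */
	}

	/* ... create the invitation here ... */
	return true;
}
```

Because `invite()` uses the internal lookup, a successful invitation leaves `example_errno` untouched, while a direct call to the public wrapper still reports `EX_ENOENT` for unknown names.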
Guus Sliepen [Mon, 3 Feb 2020 15:24:41 +0000 (16:24 +0100)]
Fix spelling errors.
Found by codespell.
Guus Sliepen [Mon, 3 Feb 2020 15:11:36 +0000 (16:11 +0100)]
Fix reachability queries for blacklisted nodes.
Guus Sliepen [Mon, 3 Feb 2020 15:10:26 +0000 (16:10 +0100)]
Fix compiling with GCC 10.
Guus Sliepen [Wed, 29 Jan 2020 08:28:25 +0000 (09:28 +0100)]
Fix potential segmentation fault on iOS.
The PONG handler could call freeaddrinfo() on a struct that was not
allocated with getaddrinfo(). On most platforms this apparently works
fine, but on iOS it will try to free memory that wasn't allocated. Fix
this by moving the code to reset an outgoing_t to a separate function,
and calling that from the PONG handler.
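The underlying rule is that memory must be released with the deallocator that matches its allocator. A minimal sketch, assuming a simplified `outgoing_t` that records where its address list came from: the `ai_from_getaddrinfo` flag, `free_manual_ai()` and this `reset_outgoing()` are hypothetical, not MeshLink's actual fields or code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for MeshLink's outgoing_t. */
typedef struct outgoing {
	struct addrinfo *ai;      /* address list currently being tried */
	bool ai_from_getaddrinfo; /* true only if ai came from getaddrinfo() */
	int timeout;
} outgoing_t;

/* Free a list that was built by hand with calloc(). */
static void free_manual_ai(struct addrinfo *ai) {
	while (ai) {
		struct addrinfo *next = ai->ai_next;
		free(ai->ai_addr);
		free(ai);
		ai = next;
	}
}

/* Reset an outgoing connection, using the deallocator matching the allocator. */
static void reset_outgoing(outgoing_t *outgoing) {
	if (outgoing->ai) {
		if (outgoing->ai_from_getaddrinfo) {
			freeaddrinfo(outgoing->ai);
		} else {
			free_manual_ai(outgoing->ai);
		}

		outgoing->ai = NULL;
	}

	outgoing->ai_from_getaddrinfo = false;
	outgoing->timeout = 0;
}
```

With a single reset function, a handler such as the PONG handler can call `reset_outgoing()` instead of calling `freeaddrinfo()` directly on a list it did not obtain from `getaddrinfo()`.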
Guus Sliepen [Mon, 27 Jan 2020 14:07:35 +0000 (15:07 +0100)]
Only let mesh->self be reachable when the mesh is started.
This ensures meshlink_node_get_reachability(mesh->self) returns true only
if the mesh has been started. It also handles reachability of self in
graph.c just like any other node. This means there will now also be a
node status callback generated when the mesh is started and stopped.
Guus Sliepen [Fri, 24 Jan 2020 20:08:01 +0000 (21:08 +0100)]
Sync host config file immediately after initial connect.
Guus Sliepen [Sun, 19 Jan 2020 23:45:09 +0000 (00:45 +0100)]
Add meshlink_get_node_reachability().
This function returns the current state of a node's reachability, as
well as the last time the node became reachable and the last time it
became unreachable.
Guus Sliepen [Mon, 13 Jan 2020 13:23:15 +0000 (14:23 +0100)]
Add a configurable fast connection retry period.
If no nodes are reachable, allow connections to retry once every second for a
configurable amount of time, which can be set per device class.
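The retry decision can be sketched as a small pure function. This is an illustration under stated assumptions: `struct retry_policy`, `next_retry_delay()` and the choice to measure the fast-retry window from the moment the last node became unreachable are all hypothetical; the commit does not say exactly when the window starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical parameters; in MeshLink these are configured per device class. */
struct retry_policy {
	time_t fast_retry_period; /* seconds during which 1 s retries are allowed */
	time_t normal_interval;   /* regular retry interval in seconds */
};

/*
 * Return how many seconds to wait before the next connection attempt.
 * unreachable_since is when we last lost all reachable nodes (an assumption).
 */
static time_t next_retry_delay(const struct retry_policy *p, bool any_reachable,
                               time_t unreachable_since, time_t now) {
	if (!any_reachable && now - unreachable_since < p->fast_retry_period) {
		return 1; /* fast retry: once every second */
	}

	return p->normal_interval;
}
```

For example, with a 10 second fast-retry period, a node that has been isolated for 5 seconds retries after 1 second, while one isolated for 20 seconds falls back to the normal interval.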
Guus Sliepen [Fri, 6 Dec 2019 22:01:46 +0000 (23:01 +0100)]
Remember the address used by an invitee.
When a new node uses an invitation successfully, store the address it
used to connect.
Guus Sliepen [Fri, 6 Dec 2019 21:58:29 +0000 (22:58 +0100)]
Remember the address used when connecting to an inviting node.
The inviter sends us its own host config file, which should be populated
with its known addresses. However, if a symbolic hostname is in the
invitation URL and it can resolve to multiple IP addresses, or if the IP
address associated with it is currently different from when the invitation
was generated, the address used to connect to the inviter might not be
present in its host config file. This could cause the invitation to succeed,
but then the nodes would fail to make a regular MeshLink connection.
Guus Sliepen [Fri, 6 Dec 2019 21:42:59 +0000 (22:42 +0100)]
Ensure all addresses in the invitation URL are also in the invitation file.
Guus Sliepen [Fri, 6 Dec 2019 20:50:02 +0000 (21:50 +0100)]
Prefer sockaddr_t over struct sockaddr_*.
This avoids a lot of pointer casts, and also fixes some problems with the
sockaddr length potentially being smaller than necessary.
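A union of the socket address types lets code pass `&addr.sa` to socket calls without casts, and choose the length from the address family. The member names below and the `sockaddr_len()` helper are assumptions for illustration, not MeshLink's exact definition of `sockaddr_t`.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* A union in the spirit of MeshLink's sockaddr_t (member names are assumed). */
typedef union {
	struct sockaddr sa;
	struct sockaddr_in in;
	struct sockaddr_in6 in6;
	struct sockaddr_storage storage;
} sockaddr_t;

/* Length to pass to bind(), connect() or sendto() for this address. */
static socklen_t sockaddr_len(const sockaddr_t *addr) {
	switch (addr->sa.sa_family) {
	case AF_INET:
		return sizeof(addr->in);

	case AF_INET6:
		return sizeof(addr->in6);

	default:
		return sizeof(addr->storage);
	}
}
```

A call then looks like `sendto(fd, buf, len, 0, &addr.sa, sockaddr_len(&addr))`. Since the union is as large as `struct sockaddr_storage`, it can always hold any address the kernel returns.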
Guus Sliepen [Fri, 6 Dec 2019 20:47:11 +0000 (21:47 +0100)]
Don't add duplicates to the list of recently seen addresses.
Duplicate addresses would be appended to the list, and could push out other
addresses. If the address already exists, only move it to the top if it is
not already there.
Also don't force an immediate write of the host config file when trying to
add an address that already exists.
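The "move to front without duplicates" behaviour can be sketched with a fixed-size array. This is a minimal sketch: `MAX_RECENT`, `ADDR_LEN`, `struct recent_list` and `add_recent_address()` are hypothetical names, and addresses are stored as strings for simplicity. The boolean result models the second paragraph above: the caller only needs to write the host config file when the list actually changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_RECENT 5 /* hypothetical; MeshLink has its own #define for this */
#define ADDR_LEN 64

struct recent_list {
	char addr[MAX_RECENT][ADDR_LEN];
	int count;
};

/*
 * Put addr at the front of the list. An existing entry is moved instead of
 * duplicated. Returns true if the list changed.
 */
static bool add_recent_address(struct recent_list *l, const char *addr) {
	int pos = -1;

	for (int i = 0; i < l->count; i++) {
		if (strcmp(l->addr[i], addr) == 0) {
			pos = i;
			break;
		}
	}

	if (pos == 0) {
		return false; /* already at the top: nothing to write */
	}

	if (pos < 0) {
		/* New address: grow the list, or drop the oldest entry if full. */
		pos = l->count < MAX_RECENT ? l->count++ : MAX_RECENT - 1;
	}

	/* Shift entries [0, pos) down by one, then store addr at the front. */
	memmove(l->addr[1], l->addr[0], (size_t)pos * ADDR_LEN);
	strncpy(l->addr[0], addr, ADDR_LEN - 1);
	l->addr[0][ADDR_LEN - 1] = '\0';
	return true;
}
```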
Guus Sliepen [Sun, 1 Dec 2019 23:32:57 +0000 (00:32 +0100)]
Destroy new/ and old/ subdirectories when creating a new instance.
Guus Sliepen [Sun, 1 Dec 2019 22:56:10 +0000 (23:56 +0100)]
Add meshlink_get_all_nodes_by_last_reachable().
MeshLink now keeps track of when a node was last reachable. This can be
used by an application to detect nodes that were never reachable or which
have not been reachable for a certain amount of time.
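An application-side query over such timestamps can be sketched as a range filter. This does not reproduce the signature or exact semantics of `meshlink_get_all_nodes_by_last_reachable()`: `struct node_info`, `nodes_by_last_reachable()` and the convention that `0` means "never reachable" are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical node record; last_reachable == 0 means "never reachable". */
struct node_info {
	const char *name;
	time_t last_reachable;
};

/*
 * Store pointers to nodes whose last_reachable lies in [start, end] in out
 * (capacity max_out). Returns the number of matches, which may exceed max_out.
 */
static size_t nodes_by_last_reachable(const struct node_info *nodes, size_t n,
                                      time_t start, time_t end,
                                      const struct node_info **out, size_t max_out) {
	size_t matches = 0;

	for (size_t i = 0; i < n; i++) {
		if (nodes[i].last_reachable >= start && nodes[i].last_reachable <= end) {
			if (matches < max_out) {
				out[matches] = &nodes[i];
			}

			matches++;
		}
	}

	return matches;
}
```

Under these assumptions, the range `[0, 0]` selects nodes that were never reachable, and `[1, now - 3600]` selects nodes that were reachable at some point but not within the last hour.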
Guus Sliepen [Sun, 1 Dec 2019 22:29:39 +0000 (23:29 +0100)]
Add a #define for the maximum number of tracked recently seen addresses.
Guus Sliepen [Thu, 28 Nov 2019 21:24:05 +0000 (22:24 +0100)]
Sync the base configuration directory at the end of meshlink_join().
While joining a mesh, we create a new current/ subdirectory. While the
contents were already synced to disk, we need to ensure the subdirectory
itself is also synced before returning.
Guus Sliepen [Thu, 28 Nov 2019 21:21:19 +0000 (22:21 +0100)]
Sync the base configuration directory after each subdirectory rename operation.
This ensures the proper ordering of the renames in the event of a crash.
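On POSIX systems, a rename only becomes durable once the directory containing the entries is synced, which is done by opening the directory and calling `fsync()` on its descriptor. A minimal sketch: `sync_dir()` and `rename_and_sync()` are hypothetical helper names, not MeshLink's functions, and the extra includes are used by the usage example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* fsync() a directory so that renames and creations inside it are durable. */
static bool sync_dir(const char *dirname) {
	int fd = open(dirname, O_RDONLY);

	if (fd < 0) {
		return false;
	}

	bool ok = fsync(fd) == 0;
	close(fd);
	return ok;
}

/* Rename inside base, then sync base before any later rename is attempted. */
static bool rename_and_sync(const char *base, const char *from, const char *to) {
	return rename(from, to) == 0 && sync_dir(base);
}
```

Calling `rename_and_sync()` for each step (for example `current/` to `old/`, then `new/` to `current/`) means that after a crash, the second rename can never be on disk without the first.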
Guus Sliepen [Thu, 28 Nov 2019 21:20:05 +0000 (22:20 +0100)]
Sync the base configuration directory after each call to config_destroy().
This guarantees proper ordering when deleting the current/, new/ and old/
subdirectories.
Guus Sliepen [Thu, 14 Nov 2019 20:48:02 +0000 (21:48 +0100)]
Fix logic error preventing fast update of reflexive address.
When we are trying to communicate with peers that don't know our
reflexive address, and we have just learned our own, we want to inform
those peers of it immediately, so they can send PMTU probes to the right
address. A logic error prevented this from happening in the common case.
Guus Sliepen [Mon, 11 Nov 2019 21:54:46 +0000 (22:54 +0100)]
Assert that nodes black/whitelisted by name persist after closing the mesh.
Guus Sliepen [Mon, 11 Nov 2019 21:49:05 +0000 (22:49 +0100)]
Add support for black/whitelisting by name, and forgetting nodes.
Guus Sliepen [Sat, 9 Nov 2019 16:57:55 +0000 (17:57 +0100)]
Fix __warn_unused_result__, add more of it and fix the resulting warnings.
Due to a bug in the autoconf test for function attributes, we were always
disabling __warn_unused_result__. Fix this, and add this function attribute
to a lot more functions whose results are definitely important.
This change makes it clear where we ignore the results of a function that
might fail. The proper fix in most cases is to propagate the result to the
caller. For meshlink_blacklist() and meshlink_whitelist(), we were not
returning an error condition, even if we might fail to commit the blacklist
operation to permanent storage. So we now make these functions return a
bool.
Guus Sliepen [Sat, 9 Nov 2019 16:52:19 +0000 (17:52 +0100)]
Use a separate lockfile to lock the configuration directory.
We can't use meshlink.conf as the lock file, since it can move around.
Instead, create meshlink.lock right below confbase, and keep it always
locked while a meshlink handle is valid.
meshlink_destroy() will also use this lockfile to ensure we don't destroy
a directory that is still in use, and prevent race conditions between
meshlink_destroy() and meshlink_open().
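A lock file that stays locked for the lifetime of a handle can be sketched with `flock()`. Whether MeshLink uses `flock()` or `fcntl()` locks is an assumption here, as are the helper names `lock_confbase()` and `unlock_confbase()`. Only the file name `meshlink.lock` below confbase comes from the text above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open confbase/meshlink.lock and take an exclusive, non-blocking lock.
 * Returns the descriptor (keep it open while the handle is valid), or -1 if
 * the directory is already in use or the file cannot be opened.
 */
static int lock_confbase(const char *confbase) {
	char path[4096];
	snprintf(path, sizeof(path), "%s/meshlink.lock", confbase);

	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0) {
		return -1;
	}

	if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
		close(fd);
		return -1;
	}

	return fd;
}

static void unlock_confbase(int fd) {
	close(fd); /* closing the descriptor releases the flock() lock */
}
```

In this sketch, a destroy operation would call `lock_confbase()` first and refuse to delete anything if it returns -1, which also serializes it against a concurrent open.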
Guus Sliepen [Tue, 5 Nov 2019 19:57:31 +0000 (20:57 +0100)]
Sync the host config directory after accepting an invitee.
Guus Sliepen [Tue, 5 Nov 2019 19:46:58 +0000 (20:46 +0100)]
Refuse invitees if we can't delete the invitation file.
If the call to unlink() or a subsequent sync of the invitation directory
fails, don't allow the invitee access, to prevent an invitation from
being used twice.
Guus Sliepen [Tue, 5 Nov 2019 19:41:50 +0000 (20:41 +0100)]
Sync invitation directory when calling meshlink_invite().
Guus Sliepen [Tue, 5 Nov 2019 18:33:04 +0000 (19:33 +0100)]
Don't fail to start MeshLink if some host config files couldn't be read.
Guus Sliepen [Tue, 5 Nov 2019 18:27:11 +0000 (19:27 +0100)]
Handle host config files without a public key.
Commit fa05f996c5500c056a36c1d43e33a407f876643c broke reading config files
for hosts for which no public key is known.
Guus Sliepen [Tue, 5 Nov 2019 18:22:09 +0000 (19:22 +0100)]
Add missing calls to fflush().
We need to ensure the stdio stream buffers have been flushed with fflush()
before calling fsync(), since fsync() only syncs data the kernel already has.
Also remove redundant calls to fsync(), config_write_file() already
takes care of it.
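The required order is `fflush()` (stdio buffer to kernel), then `fsync()` (kernel to disk), then `fclose()`. A minimal sketch using standard POSIX calls; `write_file_durably()` is a hypothetical helper, not MeshLink's `config_write_file()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write data to path and make it durable before returning. */
static bool write_file_durably(const char *path, const char *data) {
	FILE *f = fopen(path, "w");

	if (!f) {
		return false;
	}

	bool ok = fputs(data, f) >= 0
	          && fflush(f) == 0          /* move stdio's buffer into the kernel */
	          && fsync(fileno(f)) == 0;  /* then have the kernel write it to disk */

	if (fclose(f) != 0) {
		ok = false;
	}

	return ok;
}
```

Without the `fflush()`, `fsync()` could succeed while part of the data still sits in the `FILE` buffer, and a crash at that point would lose it.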
Guus Sliepen [Thu, 31 Oct 2019 21:26:25 +0000 (22:26 +0100)]
Allow nodes to learn their own reflexive UDP address.
When a node gets a successful UDP probe reply, it informs the peer of the
UDP address and port that it has. The peer can then use this information to
inform other nodes it wants to communicate with.
Currently, this is only done to close the time window where two nodes have
established an SPTPS session via a third node, while the third node hasn't
learned of the two nodes' UDP addresses, in which case it was too late for
the third node to assist with hole punching.
Guus Sliepen [Thu, 31 Oct 2019 19:46:36 +0000 (20:46 +0100)]
Try to get a new reflexive UDP address if UDP probes failed.
If we are sending data to a node, but we don't have working UDP, it could
be because we don't have a good reflexive UDP address. Send a dummy ANS_KEY
once in a while as long as we still want to exchange data.
Guus Sliepen [Thu, 31 Oct 2019 19:42:10 +0000 (20:42 +0100)]
Avoid compiler warnings when compiling with -DNDEBUG.
Guus Sliepen [Thu, 31 Oct 2019 18:44:27 +0000 (19:44 +0100)]
Only add confirmed reflexive UDP addresses to ANS_KEY messages.
Guus Sliepen [Wed, 30 Oct 2019 18:45:30 +0000 (19:45 +0100)]
Ensure NDEBUG is not set in the test suite.
We rely on assert() in the test suite, so it should not be compiled out
when doing a release build with -DNDEBUG.
Guus Sliepen [Wed, 30 Oct 2019 18:21:09 +0000 (19:21 +0100)]
Fix another case of assert() with side-effects.
Guus Sliepen [Wed, 30 Oct 2019 17:19:31 +0000 (18:19 +0100)]
Fix signal pipe creation when compiling with -DNDEBUG.
The argument of a call to assert() should never have side effects.
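The pattern is to perform the call unconditionally and only check its result inside `assert()`, because with `-DNDEBUG` the entire argument expression is compiled out. A minimal sketch; `create_signal_pipe()` is a hypothetical helper, not the library's actual code.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Wrong: with -DNDEBUG the whole expression, including pipe(), disappears:
 *     assert(pipe(fds) == 0);
 *
 * Right: always make the call, and only check the result in assert().
 */
static int create_signal_pipe(int fds[2]) {
	int result = pipe(fds);
	assert(result == 0);
	return result; /* callers can still handle failure in release builds */
}
```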
Guus Sliepen [Mon, 28 Oct 2019 20:14:17 +0000 (21:14 +0100)]
Don't call terminate_connection() from meshlink_blacklist().
If meshlink_blacklist() is called from a callback function, this can result
in a use-after-free bug. Instead, shut down the socket, so the event loop
will take care of it.
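The idea is to defer freeing to the event loop: the callback only calls `shutdown()`, and the loop later sees the socket become readable with EOF and cleans up. A minimal sketch, assuming a simplified connection record; `struct connection`, `blacklist_connection()` and `handle_readable()` are hypothetical, and normal data handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical connection record. */
struct connection {
	int fd;
	bool alive;
};

/*
 * Safe to call from inside a callback: the connection is not freed here,
 * because the caller may still be using it. Only the socket is shut down.
 */
static void blacklist_connection(struct connection *c) {
	shutdown(c->fd, SHUT_RDWR);
}

/* Event loop side: EOF or an error means the connection can be cleaned up. */
static void handle_readable(struct connection *c) {
	char buf[256];
	ssize_t len = read(c->fd, buf, sizeof(buf));

	if (len <= 0) {
		close(c->fd); /* stand-in for terminate_connection() */
		c->fd = -1;
		c->alive = false;
	}

	/* Handling of len > 0 is omitted in this sketch. */
}
```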
Guus Sliepen [Mon, 28 Oct 2019 20:12:14 +0000 (21:12 +0100)]
Set meshlink_errno when trying to create a channel to a blacklisted node.
Create a new errno value MESHLINK_EBLACKLISTED, which is used when trying
to send something or create a channel to a blacklisted node. Also improve
some log messages.
Guus Sliepen [Sun, 27 Oct 2019 13:47:38 +0000 (14:47 +0100)]
Ensure an invitation timeout of 0 means no invitations are valid.
Guus Sliepen [Sun, 27 Oct 2019 13:38:18 +0000 (14:38 +0100)]
Don't close active connections when a node is discovered via Catta.
When a node was discovered by Catta, but we already had an active
meta-connection with it, it would erroneously close that connection.
Guus Sliepen [Sun, 27 Oct 2019 13:04:29 +0000 (14:04 +0100)]
Don't call graph() twice when a new connection replaces an older one.
This prevents a node from briefly being considered unreachable, even
though we know it is reachable.
Guus Sliepen [Sun, 27 Oct 2019 13:02:47 +0000 (14:02 +0100)]
Restart UDP SPTPS when a node reconnects with a new session ID.
If a node restarts, but its old connection was not considered closed
yet, the graph algorithm never saw it go down. We still want to ensure
we restart UDP SPTPS in this case.
Guus Sliepen [Sun, 27 Oct 2019 12:47:38 +0000 (13:47 +0100)]
Drop severity of log messages regarding ADD/DEL_EDGE messages.
Guus Sliepen [Mon, 14 Oct 2019 20:36:04 +0000 (22:36 +0200)]
Fix retransmission timeout calculation in UTCP.
This could result in unnecessary retransmissions and lower throughput of
channels.
Guus Sliepen [Mon, 14 Oct 2019 19:34:54 +0000 (21:34 +0200)]
Remove support for broadcast packets.
This is not used in MeshLink. Removing support for this also means we no
longer have to calculate a minimum spanning tree whenever the graph is
updated.
Guus Sliepen [Sun, 13 Oct 2019 21:32:03 +0000 (23:32 +0200)]
Allow the mesh to detect when a node has completely restarted.
When calling meshlink_open(), a node creates a unique ID that is passed
along ADD_EDGE messages. When a node becomes unreachable and then reachable
again, this allows other nodes to detect whether it was just a temporary
network issue, or whether the node completely restarted (either because
meshlink_close() was called or because it crashed).
At the moment, when this is detected, other nodes close all open channels
with the restarted node.
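The detection logic amounts to comparing the session ID carried in ADD_EDGE with the one seen previously. A minimal sketch, assuming a 32-bit session ID and modelling "close all open channels" as resetting a counter; `struct peer_state` and `node_became_reachable()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node state kept by a peer. */
struct peer_state {
	uint32_t session_id; /* session ID seen in the node's last ADD_EDGE */
	int open_channels;
};

/*
 * Called when a node becomes reachable again, with the session ID from its
 * ADD_EDGE. Returns true if the node restarted since we last saw it.
 */
static bool node_became_reachable(struct peer_state *p, uint32_t session_id) {
	if (p->session_id == session_id) {
		return false; /* same session: only a temporary network issue */
	}

	p->session_id = session_id;
	p->open_channels = 0; /* restarted: close all channels with this node */
	return true;
}
```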
Guus Sliepen [Sun, 13 Oct 2019 19:46:01 +0000 (21:46 +0200)]
Set the priv pointer at channel open time in the channels-cornercases test.
Guus Sliepen [Sun, 13 Oct 2019 19:39:11 +0000 (21:39 +0200)]
Add a way to set the channel's priv pointer when opening it.
This prevents a race condition where the receive callback could be called
between opening a channel and setting the priv pointer.
Guus Sliepen [Sun, 13 Oct 2019 12:31:14 +0000 (14:31 +0200)]
Set a very small channel timeout in channels-failure test.
This tests both the meshlink_set_node_channel_timeout() function and greatly
speeds up the test.
Guus Sliepen [Sun, 13 Oct 2019 12:24:03 +0000 (14:24 +0200)]
Remove some more superfluous parentheses.
Guus Sliepen [Sun, 13 Oct 2019 12:16:52 +0000 (14:16 +0200)]
Rename mesh_mutex to mutex.
Guus Sliepen [Sun, 13 Oct 2019 12:05:07 +0000 (14:05 +0200)]
Add meshlink_set_node_channel_timeout().
This function allows setting the user timeout for UTCP connections.
Guus Sliepen [Thu, 10 Oct 2019 20:16:20 +0000 (22:16 +0200)]
Fix waiting for two nodes to become reachable in the test suite.
Guus Sliepen [Thu, 10 Oct 2019 20:12:36 +0000 (22:12 +0200)]
Correctly handle incoming retransmissions of SYN packets.
If the SYNACK got lost, the peer that initiated a channel would retransmit
the SYN packet. However, the responder would ignore the retransmitted SYN
packet, causing the channel to stall and eventually time out.
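On the responder side, the fix amounts to answering a duplicate SYN in the SYN_RECEIVED state with another SYNACK instead of dropping it. This is a heavily simplified sketch with hypothetical names (`struct conn`, `handle_syn()`, `send_synack()`); it does not reproduce UTCP's actual code.

```c
#include <assert.h>
#include <stdint.h>

enum conn_state { STATE_LISTEN, STATE_SYN_RECEIVED, STATE_ESTABLISHED };

/* Hypothetical, heavily simplified responder-side connection. */
struct conn {
	enum conn_state state;
	uint32_t irs;     /* initial sequence number received in the SYN */
	int synacks_sent;
};

static void send_synack(struct conn *c) {
	c->synacks_sent++; /* stand-in for actually transmitting a SYNACK */
}

/* Handle an incoming SYN carrying sequence number seq. */
static void handle_syn(struct conn *c, uint32_t seq) {
	switch (c->state) {
	case STATE_LISTEN:
		c->irs = seq;
		c->state = STATE_SYN_RECEIVED;
		send_synack(c);
		break;

	case STATE_SYN_RECEIVED:
		if (seq == c->irs) {
			/* Retransmitted SYN: our SYNACK was probably lost, so resend it. */
			send_synack(c);
		}

		break;

	default:
		break; /* SYNs in other states are not handled in this sketch */
	}
}
```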
Guus Sliepen [Thu, 10 Oct 2019 19:23:58 +0000 (21:23 +0200)]
Don't load config files partially.
There are various points where we update node information and store a new
host config file to disk. However, at startup we read only part of the host
config files. This allowed a corner case where we never read the full host
config file, but did write it to disk, losing information in the process.
Guus Sliepen [Thu, 10 Oct 2019 18:50:43 +0000 (20:50 +0200)]
Fix spurious channel closures after meshlink_stop()+meshlink_start().
When restarting the mesh without fully closing the handle, channels should
continue to work if MeshLink was only stopped briefly. However, we were
accidentally always setting the connection timeout when a node's UDP status
changed, which would cause channels that didn't have any unsent data to
time out.
Guus Sliepen [Thu, 10 Oct 2019 05:58:21 +0000 (07:58 +0200)]
Check the return value of node_write_config() while handling invitations.
Guus Sliepen [Mon, 7 Oct 2019 19:00:01 +0000 (21:00 +0200)]
Add an error callback.
Normally, API functions report potential errors. However, it might happen
that the background thread runs into a serious error that prevents
MeshLink from operating as expected. Add a callback that is called in those
cases.
Currently, the only time it is called is when it cannot create or modify
config files in the background thread.
Guus Sliepen [Mon, 7 Oct 2019 12:23:15 +0000 (14:23 +0200)]
Fix a double call to pthread_mutex_unlock().
Found by ThreadSanitizer.
Guus Sliepen [Mon, 7 Oct 2019 12:13:42 +0000 (14:13 +0200)]
Don't use static variables when choosing a broadcast address.
Found by ThreadSanitizer.
Guus Sliepen [Mon, 7 Oct 2019 11:38:37 +0000 (13:38 +0200)]
Unlock the mesh_mutex before destroying it.
Found by ThreadSanitizer.
Guus Sliepen [Mon, 7 Oct 2019 10:53:18 +0000 (12:53 +0200)]
Refactor the non-blackbox test suite.
- Add a default log callback function to utils.[ch]
- Use the functions from utils.[ch] where appropriate.
- Use assert() where possible.
- Make functions and variables static where possible.
- Remove the need for wrapper scripts.
Guus Sliepen [Sat, 5 Oct 2019 17:40:15 +0000 (19:40 +0200)]
Replace rand() by xoshiro256** with per-mesh state.
We need a reasonably fast PRNG that doesn't have to be secure, for things
like timer randomization, port number generation and so on. Some platforms
already warn about the use of rand(). Also, when calling fork(), both
parent and child have the same PRNG seed, which causes some inefficiencies
in the test suite.
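xoshiro256** is a published algorithm by Blackman and Vigna; the version below follows their reference code, with the 256-bit state moved into a struct so that each mesh can own one, and splitmix64 used for seeding as its authors recommend. The struct and function names (`struct prng`, `prng_seed()`, `prng_next()`) are hypothetical, and how MeshLink seeds its per-mesh state is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Per-mesh PRNG state (a hypothetical field of the mesh handle). */
struct prng {
	uint64_t s[4];
};

static inline uint64_t rotl(uint64_t x, int k) {
	return (x << k) | (x >> (64 - k));
}

/* splitmix64: expands one 64-bit seed into well-mixed state words. */
static uint64_t splitmix64(uint64_t *x) {
	uint64_t z = (*x += 0x9e3779b97f4a7c15ULL);
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

static void prng_seed(struct prng *p, uint64_t seed) {
	for (int i = 0; i < 4; i++) {
		p->s[i] = splitmix64(&seed);
	}
}

/* xoshiro256**: fast and statistically good, but not cryptographically secure. */
static uint64_t prng_next(struct prng *p) {
	uint64_t *s = p->s;
	const uint64_t result = rotl(s[1] * 5, 7) * 9;
	const uint64_t t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];
	s[2] ^= t;
	s[3] = rotl(s[3], 45);

	return result;
}
```

Because the state lives in the handle rather than in a global, two meshes (or a parent and child process that each open their own mesh) draw from independent sequences as long as they are seeded differently.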
Guus Sliepen [Sat, 5 Oct 2019 12:34:40 +0000 (14:34 +0200)]
Add the /whitelist command to the chat examples.
This allows undoing a /kick.
Guus Sliepen [Sat, 5 Oct 2019 12:32:19 +0000 (14:32 +0200)]
Fix the channels-no-partial test case.
Guus Sliepen [Sat, 5 Oct 2019 12:16:43 +0000 (14:16 +0200)]
Remove unused functions, and make more functions static.
Guus Sliepen [Sat, 5 Oct 2019 12:15:35 +0000 (14:15 +0200)]
Add assert() calls to the library.
To aid in debugging, start using assert() to ensure preconditions hold.
At the moment, we assume that NULL-pointer dereferences will always cause
segfaults, so we don't add assert(ptr) statements in those cases, but that
might change in the future.
Guus Sliepen [Fri, 4 Oct 2019 19:10:10 +0000 (21:10 +0200)]
Clean up resources in the test cases.
Not doing so prevents tools such as AddressSanitizer and Valgrind from
declaring the tests free of memory leaks.
Guus Sliepen [Fri, 4 Oct 2019 19:08:59 +0000 (21:08 +0200)]
Fix potential memory leaks in the autoconnect algorithm.
Guus Sliepen [Fri, 4 Oct 2019 19:08:34 +0000 (21:08 +0200)]
Avoid casting function pointers.
Guus Sliepen [Fri, 4 Oct 2019 19:06:33 +0000 (21:06 +0200)]
Fix memory leaks from timers.
Guus Sliepen [Fri, 4 Oct 2019 14:55:48 +0000 (16:55 +0200)]
Have meshlink_get_node() and _submesh() set MESHLINK_ENOENT when appropriate.
These functions can return NULL both when the parameters are invalid and when
the node or submesh does not exist, so meshlink_errno must be set correctly
to distinguish between the two cases.
Guus Sliepen [Fri, 4 Oct 2019 14:53:51 +0000 (16:53 +0200)]
Fix memory leaks in the outgoing packet queue.
Found by AddressSanitizer.
Guus Sliepen [Fri, 4 Oct 2019 14:08:51 +0000 (16:08 +0200)]
Fix several memory leaks found by AddressSanitizer.
Guus Sliepen [Sun, 29 Sep 2019 09:38:09 +0000 (11:38 +0200)]
Ensure only valid hostnames end up in the invitation URL.
Guus Sliepen [Thu, 26 Sep 2019 20:43:45 +0000 (22:43 +0200)]
Fix errors found by Clang's static analyzer.
Guus Sliepen [Thu, 26 Sep 2019 20:01:24 +0000 (22:01 +0200)]
Fix winerror() returning a pointer to a stack-allocated array.
Found by cppcheck.
Guus Sliepen [Thu, 26 Sep 2019 18:50:11 +0000 (20:50 +0200)]
Fix potential double fclose().
We were calling fclose() inside config_read_file(), which never called
fopen() itself. It is the caller's responsibility to close the file on
error. Also fix two error cases where the caller forgot to call fclose().
Guus Sliepen [Wed, 25 Sep 2019 19:47:44 +0000 (21:47 +0200)]
Update mesh->loop.now right after select() returns.
select() can wait for a few seconds before an event or timeout
arrives. We didn't update mesh->loop.now before calling any of
the callback functions, which meant they could use a time a few seconds
in the past. In particular, the last_ping_time of connections could be
set to a value such that they would immediately be considered timed out.
Guus Sliepen [Wed, 25 Sep 2019 19:09:51 +0000 (21:09 +0200)]
Add missing mutex locks.
Guus Sliepen [Wed, 25 Sep 2019 05:42:00 +0000 (07:42 +0200)]
Add missing mutex locks.
meshlink_channel_close() and meshlink_channel_shutdown() did not lock
the mutex, which could cause some race conditions.
Guus Sliepen [Mon, 23 Sep 2019 20:08:00 +0000 (22:08 +0200)]
Avoid using typedefs in meshlink.h.
Use struct meshlink_foo instead of meshlink_foo_t in meshlink.h, so
links to function parameter types go to the struct definitions.
Guus Sliepen [Mon, 23 Sep 2019 19:44:03 +0000 (21:44 +0200)]
Add \memberof annotations to the C API documentation.
Guus Sliepen [Mon, 23 Sep 2019 19:17:43 +0000 (21:17 +0200)]
Fix doxygen warnings.
Guus Sliepen [Mon, 23 Sep 2019 19:15:31 +0000 (21:15 +0200)]
Add missing parts of meshlink_set_node_pmtu_cb().
Guus Sliepen [Mon, 23 Sep 2019 18:54:16 +0000 (20:54 +0200)]
Add a callback for PMTU changes.
This can be used to detect changes in UDP reachability of peers, and
allows the application to change the maximum packet size for UDP
channels.
Guus Sliepen [Sun, 15 Sep 2019 11:22:17 +0000 (13:22 +0200)]
Reset UTCP timers when learning a node's public key.
If we try to open a channel to a node whose public key we do not yet
have, a REQ_KEY request is sent out. When the answer comes back we know
the key, but we have to tell UTCP to reset the timers, otherwise we have
to wait for a retransmission timeout to occur before resending the SYN
packet.
Guus Sliepen [Mon, 23 Sep 2019 10:40:17 +0000 (12:40 +0200)]
Fix compiler warnings from Clang 10 and GCC 9.
Guus Sliepen [Sun, 22 Sep 2019 20:26:00 +0000 (22:26 +0200)]
Test UDP channels.
Simulate three simultaneous UDP channels transmitting 40 Mbps. Check that
>95% of the packets arrive, and that there is no TCP-like merging or
splitting of packets. This also checks that channels can be opened and
closed like regular TCP channels.
Guus Sliepen [Sun, 22 Sep 2019 14:08:48 +0000 (16:08 +0200)]
Ensure Catta gets a valid service name.
Normally the appname is used as the service name as well. However,
Catta requires the service name to be at least two characters long, and to
contain only alphanumeric characters, dashes and underscores.
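The two rules above translate directly into a validity check. The sanitizing fallback (replace invalid characters with `_`, pad short names) is a hypothetical strategy chosen for illustration; the commit does not say how MeshLink derives a valid name, and both function names are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Check the rules: at least 2 characters, only [A-Za-z0-9_-]. */
static bool valid_service_name(const char *name) {
	size_t len = strlen(name);

	if (len < 2) {
		return false;
	}

	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (!isalnum(c) && c != '-' && c != '_') {
			return false;
		}
	}

	return true;
}

/*
 * Hypothetical fallback: replace invalid characters with '_' and pad names
 * shorter than 2 characters. out needs room for max(strlen(appname), 2) + 1.
 */
static void make_service_name(const char *appname, char *out) {
	size_t len = strlen(appname);

	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)appname[i];
		out[i] = (isalnum(c) || c == '-' || c == '_') ? (char)c : '_';
	}

	while (len < 2) {
		out[len++] = '_';
	}

	out[len] = '\0';
}
```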
Guus Sliepen [Tue, 17 Sep 2019 19:50:38 +0000 (21:50 +0200)]
Ensure the channel poll callback is called with len 0 on error.
Guus Sliepen [Mon, 9 Sep 2019 19:35:14 +0000 (21:35 +0200)]
Correctly update our own host config file after meshlink_set_port().
We wrote the host config file before we updated mesh->self, causing the
old port number to remain in the host config file. This would cause
subsequent calls to meshlink_invite() to have the wrong port number in
the invitation.
sairoop-elear [Fri, 6 Sep 2019 10:36:16 +0000 (16:06 +0530)]
Fix deadlock during discovery failure.
Guus Sliepen [Thu, 5 Sep 2019 17:57:57 +0000 (19:57 +0200)]
Fix starting the channels-fork test.
The test started by cleaning the wrong config files, and could potentially
fail when started at the same time as the non-forking channels test.
Guus Sliepen [Thu, 5 Sep 2019 17:56:29 +0000 (19:56 +0200)]
Call fsync() on the configuration directories where appropriate.
Just ensuring individual config files are written atomically is good enough
to keep internal consistency, but we want to make sure directory metadata
is also synced to disk when returning from functions that expect the changes
to have been made permanent, such as meshlink_import() and
meshlink_blacklist(). Also do this when we create the initial directory
structure.
We already took care of syncing directory metadata when rotating keys for
encrypted storage.