Guus Sliepen [Fri, 27 Mar 2020 21:52:46 +0000 (22:52 +0100)]
Reduce how often we have to poll the packet queue.
Packets are moved to the MeshLink thread via the packet queue. However,
each packet required a trigger byte to be sent to the event loop, requiring
more calls to select() than necessary. Now we make event loop signals level
triggered, and dequeue all enqueued packets at once.
This also adds debug log statements for the packet queue.
Guus Sliepen [Tue, 10 Mar 2020 21:42:33 +0000 (22:42 +0100)]
Add all recent addresses resolved from a hostname in meshlink_invite().
When a canonical hostname or an invitation address resolves to multiple
numeric addresses, add all of them as recent addresses for ourself, so
they are all part of the host config file we send to the invitee.
Guus Sliepen [Tue, 10 Mar 2020 21:37:28 +0000 (22:37 +0100)]
Update the invite-join test.
- Check that duplicate addresses get culled correctly.
- Check that we can add lots of extra invitation addresses, and that
they are in the expected order in the invitation URL.
Guus Sliepen [Fri, 6 Mar 2020 23:19:57 +0000 (00:19 +0100)]
Handle not being able to bind to the configured port at startup.
When starting a MeshLink node that has already been configured to run on a
certain port, but that port is in use (for one or more of the supported
address families), it would either ignore some address families, or would
try to bind to port 0 if all address families failed. However, this is
problematic, because it makes discovery and invitation URL generation much
harder.
Fix this by checking if any port binding fails for a supported address
family, and if so, try to find another port that does support binding on
all address families. If it fails to find any available port, it will fall
back to binding to port 0, so that outgoing connections are still possible.
Guus Sliepen [Fri, 6 Mar 2020 22:24:49 +0000 (23:24 +0100)]
Don't abort on empty lines in receive_request().
Remove the assertion that lines are not empty, since this could lead to
a DoS attack. Empty lines are already handled correctly by the rest of
the logic in receive_request().
This adds a function to add one or more application-controlled address and
port combinations to invitation URLs. It is meant to replace
meshlink_add_address(), which is too limited because it only allows one
address to be set, and doesn't allow a different port number to be set.
Guus Sliepen [Fri, 28 Feb 2020 19:09:11 +0000 (20:09 +0100)]
Avoid ports that are in use by not all address families.
It could happen that a port is bound by another application, but only
for some of the supported address families (ie, only IPv4 but not IPv6).
We don't want MeshLink to then bind to the other address familie(s), but
rather have it try another port altogether.
Guus Sliepen [Fri, 28 Feb 2020 18:25:52 +0000 (19:25 +0100)]
Further improve try_bind().
Make try_bind() do the same checks as add_listen_address() does: try to
create both a TCP and UDP socket on a given port for all address
families. If one address family succeeds for both TCP and UDP, consider
this a valid port.
Guus Sliepen [Tue, 25 Feb 2020 19:39:48 +0000 (20:39 +0100)]
Fix logic in try_bind().
Fix the check for successful socket creation. Also make sure we only
return success if we can bind to IPv4 and IPv6, but ignore other
network protocols.
Guus Sliepen [Tue, 11 Feb 2020 21:28:24 +0000 (22:28 +0100)]
Make the join commit order configurable.
By default, when an invitee joins a mesh, it will commit its configuration
to disk first, then the inviter. This adds a function to reverse that order.
Guus Sliepen [Tue, 11 Feb 2020 20:37:33 +0000 (21:37 +0100)]
Move join state out of meshlink_handle_t, and ensure proper cleanup on errors.
Move the state we keep when calling meshlink_join() out of meshlink_handle_t
and just put it on the stack of meshlink_join(). Also make sure we properly
release allocated resources in all error conditions during a join.
Guus Sliepen [Sat, 8 Feb 2020 13:55:21 +0000 (14:55 +0100)]
Fall back to getifaddrs() to get an interface address if there is no default route.
When generating invitations, we try to find a suitable local interface
address by faking an outgoing connection to the Internet. However,
that doesn't work if there is no default route. In this case, fall back
to using getifaddrs() if that function is available, and filter out any
link-local and loopback addresses.
Guus Sliepen [Thu, 6 Feb 2020 20:34:43 +0000 (21:34 +0100)]
Use bind() to check if a local address is still valid.
Some platforms don't support getifaddrs(). We use this to check if the
local address of a socket is still available on any network interface.
Instead, try to bind() a new socket to the same address (but port 0) as
existing sockets. If it returns EADDRNOTAVAIL, we know that this address
is no longer valid.
Guus Sliepen [Mon, 3 Feb 2020 16:43:50 +0000 (17:43 +0100)]
Clear reachability times in host config files received during a join.
When a node joins an existing mesh, it gets passed one or more host config
files from the inviter. However, these might contain non-zero reachability
times, but the invitee has never seen those nodes, so clear them before
storing the host config files.
Guus Sliepen [Mon, 3 Feb 2020 16:03:07 +0000 (17:03 +0100)]
Prevent meshlink_errno from being set incorrectly by meshlink_invite()
We called a public API function inside meshlink_invite() to check that we
don't try to invite a node that's already known. That causes it to set
meshlink_errno to MESHLINK_ENOENT. Fix this by calling lookup_node()
instead.
Guus Sliepen [Wed, 29 Jan 2020 08:28:25 +0000 (09:28 +0100)]
Fix potential segmentation fault on iOS.
The PONG handler could call freeaddrinfo() on a struct that was not
allocated with getaddrinfo(). On most platforms this apparently works
fine, but on iOS it will try to free memory that wasn't allocated. Fix
this by moving the code to reset an outgoing_t to a separate function,
and calling that from the PONG handler.
Guus Sliepen [Mon, 27 Jan 2020 14:07:35 +0000 (15:07 +0100)]
Only let mesh->self be reachable when the mesh is started.
This ensures meshlink_node_get_reachability(mesh->self) returns true only
if the mesh has been started. It also handles reachability of self in
graph.c just like any other node. This means there will now also be a
node status callback generated when the mesh is started and stopped.
Guus Sliepen [Sun, 19 Jan 2020 23:45:09 +0000 (00:45 +0100)]
Add meshlink_get_node_reachability().
This function returns the current state of a node's reachability, as
well as the last time the node became reachable and the last time it
became unreachable.
Guus Sliepen [Fri, 6 Dec 2019 21:58:29 +0000 (22:58 +0100)]
Remember the address used when connecting to an inviting node.
The inviter sends us its own host config file, which should be populated
with its known addresses. However, if a symbolic hostname is in the
invitation URL and it can resolve to multiple IP addresses, or if the IP
address associated with it is currently different from when the invitation
was generated, the address used to connect to the inviter might not be
present in its host config file. This could cause the invitation to succeed,
but then the nodes would fail to make a regular MeshLink connection.
Guus Sliepen [Fri, 6 Dec 2019 20:47:11 +0000 (21:47 +0100)]
Don't add duplicates to the list of recently seen addresses.
Duplicate addresses would be appended to the list, and could push out other
addresses. If the address already exists, only move it to the top if it is
not already there.
Also don't force an immediate write of the host config file when trying to
add an address that already exists.
Guus Sliepen [Sun, 1 Dec 2019 22:56:10 +0000 (23:56 +0100)]
Add meshlink_get_all_nodes_by_last_reachable().
MeshLink now keeps track of when a node was last reachable. This can be
used by an application to detect nodes that were never reachable or which
have not been reachable for a certain amount of time.
Guus Sliepen [Thu, 28 Nov 2019 21:24:05 +0000 (22:24 +0100)]
Sync the base configuration directory at the end of meshlink_join().
While joining a mesh, we create a new current/ subdirectory. While the
contents were already synced to disk, we need to ensure the subdirectory
itself is also synced before returning.
Guus Sliepen [Thu, 14 Nov 2019 20:48:02 +0000 (21:48 +0100)]
Fix logic error preventing fast update of reflexive address.
When we are trying to communicate with peers that don't know our
reflexive address, and we just learned our own one, we want to inform
those peers of it immediately, so they can send PMTU probes to the right
address. A logic error prevented this from happening in the common case.
Guus Sliepen [Sat, 9 Nov 2019 16:57:55 +0000 (17:57 +0100)]
Fix __warn_unused_result__, add more of it and fix the resulting warnings.
Due to a bug in the autoconf test for function attributes, we were always
disabling __warn_unused_result__. Fix this, and add this function attribute
to a lot more functions whose results are definitely important.
This change makes it clear where we ignore the results of a function that
might fail. The proper fix in most cases is to propagate the result to the
caller. For meshlink_blacklist() and meshlink_whitelist(), we were not
return an error condition, even if we might fail to commit the blacklist
operation to permanent storage. So we now make these functions return a
bool.
Guus Sliepen [Sat, 9 Nov 2019 16:52:19 +0000 (17:52 +0100)]
Use a separate lockfile to lock the configuration directory.
We can't use meshlink.conf as the lock file, since it can move around.
Instead, create meshlink.lock right below confbase, and keep it always
locked while a meshlink handle is valid.
meshlink_destroy() will also use this lockfile to ensure we don't destroy
a directory that is still in use, and prevent race conditions between
meshlink_destroy() and meshlink_open().
Guus Sliepen [Tue, 5 Nov 2019 19:46:58 +0000 (20:46 +0100)]
Refuse invitees if we can't delete the invitation file.
If the call to unlink() or a subsequent sync of the invitation directory
fails, don't allow the invitee access, to prevent an invitation from
being used twice.
Guus Sliepen [Thu, 31 Oct 2019 21:26:25 +0000 (22:26 +0100)]
Allow nodes to learn their own reflexive UDP address.
When a node gets a succesful UDP probe reply, it informs the peer of the
UDP address and port that it has. The peer can then use this information to
inform other nodes it wants to communicate with.
Currently, this is only done to close the time window where two nodes have
established an SPTPS session via a third node, while the third node hasn't
learned of the two nodes' UDP addresses, in which case it was too late for
the third node to assist with hole punching.
Guus Sliepen [Thu, 31 Oct 2019 19:46:36 +0000 (20:46 +0100)]
Try to get a new reflexive UDP address if UDP probes failed.
If we are sending data to a node, but we don't have working UDP, it could
be because we don't have a good reflexive UDP address. Send a dummy ANS_KEY
once in a while as long as we still want to exchange data.
Guus Sliepen [Mon, 28 Oct 2019 20:14:17 +0000 (21:14 +0100)]
Don't call terminate_connection() from meshlink_blacklist().
If meshlink_blacklist() is called from a callback function, this can result
in a use-after-free bug. Instead, shut down the socket, so the event loop
will take care of it.
Guus Sliepen [Mon, 28 Oct 2019 20:12:14 +0000 (21:12 +0100)]
Set meshlink_errno when trying to create a channel to a blacklisted node.
Create a new errno value MESHLINK_EBLACKLISTED, which is used when trying
to send something or create a channel to a blacklisted node. Also improve
some log messages.
Guus Sliepen [Sun, 27 Oct 2019 13:02:47 +0000 (14:02 +0100)]
Restart UDP SPTPS when a node reconnects with a new session ID.
If a node restarts, but its old connection was not considered closed
yet, the graph algorithm never saw it go down. We still want to ensure
we restart UDP SPTPS in this case.
Guus Sliepen [Mon, 14 Oct 2019 19:34:54 +0000 (21:34 +0200)]
Remove support for broadcast packets.
This is not used in MeshLink. Removing support for this also means we no
longer have to calculate a minimum spanning tree whenever the graph is
updated.
Guus Sliepen [Sun, 13 Oct 2019 21:32:03 +0000 (23:32 +0200)]
Allow the mesh to detect when a node has completely restarted.
When calling meshlink_open(), a node creates a unique ID that is passed
along ADD_EDGE messages. When a node becomes unreachable and then reachable
again, this allows other nodes to detect whether it was just a temporary
network issue, or whether the node completely restarted (either because
meshlink_close() was called or because it crashed).
At the moment, when this is detected, other nodes close all open channels
with the restarted node.
Guus Sliepen [Thu, 10 Oct 2019 20:12:36 +0000 (22:12 +0200)]
Correctly handle incoming retransmissions of SYN packets.
If the SYNACK got lost, the peer that initiated a channel would retransmit
the SYN packet. However, the responder would ignore the retransmitted SYN
packet, causing the channel to stall and eventually time out.
Guus Sliepen [Thu, 10 Oct 2019 19:23:58 +0000 (21:23 +0200)]
Don't load config files partially.
There are various points where we update node information and store a new
host config file to disk. However, at startup we read only part of the host
config files. This allowed a corner case where we never read the full host
config file, but did write it to disk, losing information in the process.
Guus Sliepen [Thu, 10 Oct 2019 18:50:43 +0000 (20:50 +0200)]
Fix spurious channel closures after meshlink_stop()+meshlink_start().
When restarting the mesh without fully closing the handle, channels should
continue to work MeshLink was only stopped briefly. However, we were
accidentily always setting the connection timeout when a node's UDP status
changed, which would cause channels that didn't have any unsent data to
time out.
Guus Sliepen [Mon, 7 Oct 2019 19:00:01 +0000 (21:00 +0200)]
Add an error callback.
Normally, API functions report potential errors. However, it might happen
that the background thread runs into a serious error that prevents
MeshLink from operating as expected. Add a callback that is called in those
cases.
Currently, the only time it is called is when it can not create or modify
config files in the background thread.
Guus Sliepen [Mon, 7 Oct 2019 10:53:18 +0000 (12:53 +0200)]
Refactor the non-blackbox test suite.
- Add a default log callback function to utils.[ch]
- Use the functions from utils.[ch] where appropriate.
- Use assert() where possible.
- Make functions and variables static where possible.
- Remove the need for wrapper scripts.
Guus Sliepen [Sat, 5 Oct 2019 17:40:15 +0000 (19:40 +0200)]
Replace rand() by xoshiro256** with per-mesh state.
We need a reasonable fast PRNG that doesn't have to be secure, for things
like timer randomization, port number generation and so on. Already some
platforms warn about the use of rand(). Also, when calling fork(), both
parent and child have the same PRNG seed, which causes some inefficiencies
in the test suite.