We don't actually support the full mDNS spec; we just send something that
passes for a valid mDNS packet, and expect other nodes to send packets back
with exactly the same format. All other mDNS packets will be ignored.
Guus Sliepen [Sat, 27 Jun 2020 21:19:17 +0000 (23:19 +0200)]
Parse Netlink NEW/DELLINK and NEW/DELADDR messages.
Parse link and address information. Also send GETLINK and GETADDR
messages to the Netlink socket, so we get the current state of the
interfaces and addresses.
Guus Sliepen [Thu, 18 Jun 2020 20:41:12 +0000 (22:41 +0200)]
Monitor a PF_ROUTE socket on *BSD and macOS.
Catta is not handling network changes correctly on *BSD and macOS. In
particular, after the initial startup, interfaces that go up and down
do not cause a callback to be generated, so MeshLink is not notified of
the changes.
To ensure MeshLink responds rapidly to network changes on these
platforms, we open a PF_ROUTE socket and monitor it ourselves. We only
check the message type, and don't track exactly what addresses get added
or removed.
Don't set the maximum size to low values; keep it at the minimum of the
previous size or of the default maximum size. If it is set to something
smaller than one MTU, this would prevent receiving packets from the
peer, and channel traffic would not progress and not close properly.
We also don't move memory when shrinking the internal buffer, so we
should keep the size large enough so the last byte in the buffer is
covered when the offset is non-zero.
Don't allocate a send buffer by default, since an application might only
want to receive data. Also try to shrink the send and receive buffers on
channel close, or even free them if the buffers are empty.
Guus Sliepen [Wed, 17 Mar 2021 20:19:45 +0000 (21:19 +0100)]
Fix potential channel buffer corruption when using an application-provided buffer.
If data was still in MeshLink's internal buffer in a wrapped state, and
the application called meshlink_set_channel_*buf_storage(), the wrong
amount of data was copied into the new buffer.
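For illustration, a minimal sketch of moving wrapped ring-buffer data into
new storage. The struct and function names below are hypothetical and this is
not MeshLink's actual code; it only shows why a wrapped buffer needs two
copies, one for the part up to the end of the old storage and one for the
part that wrapped around to the start.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ring buffer, for illustration only. */
struct ring {
    char *data;    /* backing storage */
    size_t size;   /* capacity of the storage in bytes */
    size_t offset; /* index of the first used byte */
    size_t used;   /* number of bytes currently stored */
};

/* Move the contents of r into new_storage, linearising wrapped data.
 * Returns 0 on success, -1 if the new storage is too small. */
int ring_set_storage(struct ring *r, char *new_storage, size_t new_size) {
    assert(r != NULL && r->used <= r->size);

    if (r->used > new_size) {
        return -1;
    }

    if (r->used > 0) {
        size_t tail = r->size - r->offset; /* bytes before the wrap point */

        if (r->used <= tail) {
            /* Not wrapped: one contiguous copy. */
            memcpy(new_storage, r->data + r->offset, r->used);
        } else {
            /* Wrapped: copy the tail first, then the part at the start. */
            memcpy(new_storage, r->data + r->offset, tail);
            memcpy(new_storage + tail, r->data, r->used - tail);
        }
    }

    r->data = new_storage;
    r->size = new_size;
    r->offset = 0;
    return 0;
}
```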
Guus Sliepen [Mon, 15 Feb 2021 22:48:09 +0000 (23:48 +0100)]
Allow a different location for the lock file.
In order to reduce the write load on the configuration directory as much as
possible, allow the location of the lock file to be set by the application
to somewhere outside the configuration directory. On Linux, it can be put
in /dev/shm.
Guus Sliepen [Mon, 8 Feb 2021 17:14:38 +0000 (18:14 +0100)]
Add missing config file update on receiving a proactive REQ_KEY request.
A REQ_KEY request from a node that wants our key can contain their key as
well. The node status was not set to dirty in this case. With the default
settings this could prevent their key from being written to disk, and with
the storage policy set to KEYS_ONLY the key was never written to disk.
Guus Sliepen [Thu, 4 Feb 2021 21:46:58 +0000 (22:46 +0100)]
Don't allow meshlink_join() when the storage policy is DISABLED.
It does not make sense to allow this, since we need to write host config
files during a join, otherwise MeshLink's directory would be left in an
invalid state.
Guus Sliepen [Wed, 3 Feb 2021 22:30:36 +0000 (23:30 +0100)]
Don't clear node dirty flag in meshlink_stop().
It's node_write_config() itself that should unset the dirty flag, depending
on the storage policy and on whether storage succeeded. Otherwise, we risk
that setting the storage policy back to ENABLED after calling
meshlink_stop() will not cause pending updates to be written out by
meshlink_close().
Guus Sliepen [Fri, 29 Jan 2021 21:59:57 +0000 (22:59 +0100)]
Add meshlink_set_storage_policy().
This allows control over when MeshLink stores configuration files:
- MESHLINK_STORAGE_ENABLED: on all updates
- MESHLINK_STORAGE_KEYS_ONLY: only on new keys and black/whitelist updates
- MESHLINK_STORAGE_DISABLED: never
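The three policies above can be modelled as a small decision function. This
is only a sketch: the enum and function names are hypothetical stand-ins, not
the identifiers or the logic from MeshLink's sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three storage policies. */
enum policy { POLICY_ENABLED, POLICY_KEYS_ONLY, POLICY_DISABLED };

/* Hypothetical classification of what changed in a node's config. */
enum update_kind { UPDATE_NEW_KEY, UPDATE_BLACKWHITELIST, UPDATE_OTHER };

/* Decide whether an update should be written to disk under a policy. */
bool should_store(enum policy policy, enum update_kind kind) {
    switch (policy) {
    case POLICY_ENABLED:
        return true; /* store on all updates */

    case POLICY_KEYS_ONLY:
        /* store only new keys and black/whitelist updates */
        return kind == UPDATE_NEW_KEY || kind == UPDATE_BLACKWHITELIST;

    case POLICY_DISABLED:
        return false; /* never store */
    }

    assert(!"unknown storage policy");
    return false;
}
```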
Guus Sliepen [Mon, 15 Feb 2021 21:16:04 +0000 (22:16 +0100)]
Never call timeout_set() outside callbacks if no callback is set.
We should never try to update a timer if it was never added to the event
loop. There were a few places that didn't have an explicit check to
prevent this from happening.
Guus Sliepen [Tue, 9 Feb 2021 20:00:35 +0000 (21:00 +0100)]
Try to recover from select() returning an error.
If we get an error back from select(), it could be because one of the
file descriptors is in a bad state. This shouldn't occur, but if it does,
just call all the registered I/O callbacks to have them check for errors.
Only quit the event loop if the error persists.
Also fix some I/O callbacks that didn't expect to be called when the
file descriptor isn't signalled.
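A rough sketch of this recovery strategy, assuming a simplified event loop
that only watches for readability. All names (struct io, loop_once, on_io)
are hypothetical, and "persists" is interpreted here as two failing select()
calls in a row; the real event loop may differ on both points.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

struct io;
typedef void (*io_cb_t)(struct io *io, bool signalled);
struct io {
    int fd;     /* a negative fd marks an unused slot */
    io_cb_t cb;
};

/* Example callback that copes with being called while not signalled. */
void on_io(struct io *io, bool signalled) {
    if (!signalled) {
        /* Error recovery: only probe the fd, never block in read(). */
        if (fcntl(io->fd, F_GETFD) == -1) {
            io->fd = -1;
        }
        return;
    }

    char buf[64];
    if (read(io->fd, buf, sizeof buf) <= 0) {
        io->fd = -1; /* EOF or error: stop watching this fd */
    }
}

/* Run one loop iteration. Returns false if the loop should quit. */
bool loop_once(struct io *ios, size_t n, struct timeval *timeout, int *errors) {
    fd_set readfds;
    int maxfd = -1;
    FD_ZERO(&readfds);

    for (size_t i = 0; i < n; i++) {
        if (ios[i].fd < 0) continue;
        assert(ios[i].fd < FD_SETSIZE);
        FD_SET(ios[i].fd, &readfds);
        if (ios[i].fd > maxfd) maxfd = ios[i].fd;
    }

    if (select(maxfd + 1, &readfds, NULL, NULL, timeout) < 0) {
        if (errno == EINTR) return true;
        if (++*errors > 1) return false; /* the error persists: give up */

        /* Let every callback check its own fd for errors. */
        for (size_t i = 0; i < n; i++) {
            if (ios[i].fd >= 0) ios[i].cb(&ios[i], false);
        }
        return true;
    }

    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (ios[i].fd >= 0 && FD_ISSET(ios[i].fd, &readfds)) {
            ios[i].cb(&ios[i], true);
        }
    }
    return true;
}
```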
Guus Sliepen [Mon, 1 Feb 2021 13:07:54 +0000 (14:07 +0100)]
Fix a socket fd leak.
Due to a logic error, whenever an interface went down and up, one or
more socket fds could be leaked. A long-running process on a roaming
device or an unstable network could therefore run out of fds.
Guus Sliepen [Wed, 30 Dec 2020 13:50:56 +0000 (14:50 +0100)]
Use the canonical address during UDP probes.
It is possible that a node has a canonical address, but due to NAT or other
reasons, the meta-connections with that node use other addresses. If UDP is
only possible to the canonical address, then we need to include that during
the initial UDP probing phase.
Guus Sliepen [Sun, 22 Nov 2020 10:59:54 +0000 (11:59 +0100)]
Fix potential NULL pointer dereference.
It is possible that an attempt is made to forward a request to a node
that is not reachable via meta-connections. This would trigger an assert
in debug builds, or cause a segmentation fault in release builds. Add
checks before attempts to send a request to node->nexthop->connection.
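The kind of check described here can be sketched as follows. The structs are
reduced, hypothetical versions that keep only the fields named above, and
forward_request() is an illustrative stand-in, not a MeshLink function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical versions of the structures involved. */
struct connection {
    int sent; /* counts requests handed to this connection */
};

struct node {
    struct node *nexthop;          /* NULL if the node is unreachable */
    struct connection *connection; /* NULL if there is no meta-connection */
};

/* Forward a request via the next hop, or refuse if there is no route. */
bool forward_request(const struct node *to, const char *request) {
    assert(request != NULL);

    /* Check every pointer on the path before dereferencing it. */
    if (!to || !to->nexthop || !to->nexthop->connection) {
        return false; /* not reachable via meta-connections: drop it */
    }

    to->nexthop->connection->sent++; /* placeholder for the actual send */
    return true;
}
```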
Guus Sliepen [Mon, 16 Nov 2020 19:57:46 +0000 (20:57 +0100)]
Don't try to renew SPTPS keys for unreachable nodes.
Other functions incorrectly used status.validkey to check whether they
could send data to a node. Renewing keys for an unreachable node therefore
made them pass a NULL pointer to send_request(), causing a crash about an
hour after a node went offline.
Guus Sliepen [Tue, 10 Nov 2020 20:10:00 +0000 (21:10 +0100)]
Only reset UDP SPTPS sessions if the session ID changed.
Previously we reset the SPTPS session if we detected that a node was
unreachable. However, that node might not think it was unreachable,
leading to only one side resetting the SPTPS connection. This would then
take some time to resolve itself.
We already had code to detect whether a node was restarted, so we use
that to detect if, once a node becomes reachable again, it remembers the
old SPTPS session or whether we have to start a new one. This should be
deterministic and not depend on the exact timing of events.
Guus Sliepen [Thu, 29 Oct 2020 22:38:22 +0000 (23:38 +0100)]
Also send the blacklist notification when we already have a connection.
Instead of just closing the connection, and having to wait for the
reconnection to happen to send the blacklist notification, we do it
immediately when meshlink_blacklist() is called.
Guus Sliepen [Sun, 25 Oct 2020 21:17:29 +0000 (22:17 +0100)]
Check blacklist status before committing an invitation.
Although we delete invitation files when blacklisting a node, there is a
race condition where an invitation connection is created right before the
invitee is blacklisted. So check whether the node is blacklisted right
before committing the node config file to disk.
Guus Sliepen [Sun, 11 Oct 2020 14:16:31 +0000 (16:16 +0200)]
When a new connection is activated, terminate any pending connections to the same peer.
This prevents issues, mainly in the test suite, where peers try to connect
to each other simultaneously and have to terminate one of the connections.
Before, both connections would succeed and both would be terminated, leading
to a loop of reconnections until enough randomness got in to break the tie.
Guus Sliepen [Sun, 11 Oct 2020 13:40:34 +0000 (15:40 +0200)]
Don't reset the UDP SPTPS session when a node becomes reachable.
Only do this when it becomes unreachable. This fixes an issue where right
after a meta-connection is established, the initiator sends a proactive
REQ_KEY, before the peer really becomes reachable according to the graph.
When the peer then became reachable, the session established so far was
reset, causing a new REQ_KEY to be sent, which could cross the ANS_KEY from
the peer. This would resolve itself after a few seconds, but caused an
unnecessary delay that is easy to trigger.
Closing a channel while there was data in the receive buffer would cause a
RST to be sent instead of a FIN. We now always send a FIN, and leave any data
in the receive buffer to later packet handling (which will then send a RST
if necessary).
The RST could be dropped if the ACK seqno was not in the correct range.
We now always accept RSTs for established connections.
Finally, when receiving more data after closing the channel, we would just
accept the data but discard it, instead of sending a RST back. Now we do
send a RST back.
Send RST packets when receiving data after we closed a UDP channel.
If the application closed a channel, we keep the UTCP connection alive for
a bit longer to handle resends of FIN packets. However, if this is missed
for some reason, either because the FIN got lost or the peer ignored the
receive callback, and the peer is sending new data, we need to inform it
that we are no longer listening. To do this, send a RST back.
Don't use fast timeouts for fully established connections.
During the fast retry period, we want to have a fast ping timeout until we have
a fully working connection. However, the code still used fast timeouts during
the fast retry window even if the connection was fully established.
Allow sptps_force_kex() while a key exchange is in progress.
We should not do anything if we are already exchanging a new key, and
just return true. This change prevents higher layers in MeshLink from
terminating a connection between two nodes if both peers call
sptps_force_kex() at nearly the same time.
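A simplified model of this behaviour, for illustration only. The session
struct, the state names and force_kex() are hypothetical and much smaller
than the real SPTPS state machine; the point is that a second request during
an ongoing key exchange reports success without starting another one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced session state. */
enum kex_state { KEX_IDLE, KEX_IN_PROGRESS };

struct session {
    bool established;     /* initial handshake completed */
    enum kex_state state; /* whether a re-key is currently running */
    int kex_started;      /* number of key exchanges actually started */
};

/* Request a new key exchange. Returns false only if it is not possible. */
bool force_kex(struct session *s) {
    assert(s != NULL);

    if (!s->established) {
        return false; /* cannot re-key before the first handshake */
    }

    if (s->state == KEX_IN_PROGRESS) {
        return true; /* already exchanging a new key: nothing to do */
    }

    s->state = KEX_IN_PROGRESS;
    s->kex_started++; /* placeholder for sending the KEX record */
    return true;
}
```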
Use the canonical address exclusively for making outgoing meta-connections.
If we have a node's canonical address, we now always use that as a source
for addresses for outgoing meta-connection attempts. This commit also adds
the function meshlink_clear_canonical_address() to ensure the canonical
address can be removed if it is no longer valid.
Guus Sliepen [Tue, 4 Aug 2020 13:24:07 +0000 (15:24 +0200)]
Remove temporary files at startup.
When something goes wrong while a host config file is being written, a
temporary file might be left over. Clean these up when we find them while
starting MeshLink.
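A sketch of such a cleanup pass, assuming temporary files can be recognised
by a ".tmp" suffix. The suffix and the function name are assumptions made for
this example; the actual naming scheme and directory layout may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Remove all entries in dir whose name ends in ".tmp" (assumed suffix).
 * Returns the number of files removed, or -1 if dir cannot be opened. */
int remove_tmp_files(const char *dir) {
    assert(dir != NULL);

    DIR *d = opendir(dir);
    if (!d) {
        return -1;
    }

    int removed = 0;
    size_t dirlen = strlen(dir);
    struct dirent *ent;

    while ((ent = readdir(d)) != NULL) {
        size_t len = strlen(ent->d_name);

        /* Skip everything that does not end in ".tmp". */
        if (len <= 4 || strcmp(ent->d_name + len - 4, ".tmp") != 0) {
            continue;
        }

        /* Build "dir/name" with room for the slash and the terminator. */
        size_t pathlen = dirlen + 1 + len + 1;
        char *path = malloc(pathlen);
        if (!path) {
            break;
        }

        snprintf(path, pathlen, "%s/%s", dir, ent->d_name);
        if (unlink(path) == 0) {
            removed++;
        }
        free(path);
    }

    closedir(d);
    return removed;
}
```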
The accept callback is called when the peer has already fully established a
connection. The listen callback is called earlier, when there is no
fully established channel yet. However, the listen callback itself does not
get a channel handle; it can only decide whether to accept the channel based
on the peer node and port number. If it accepts, the accept callback will be
called later.
Always let the initiator send a REQ_KEY once a connection is activated.
Before, the logic was to do this when the graph reported a bidirectional
edge. However, if two nodes connected to each other simultaneously, a second
connection could be activated while the first was still active, which caused
the REQ_KEY to not be sent.
This function is similar to meshlink_set_node_status_cb(), except that
this callback will only be called when a meta-connection to a node is
activated or terminated. This is mainly useful for the test suite.
Fix invitation URL generation when running in a network namespace.
MeshLink could call getifaddrs() in the namespace of the caller instead of
the MeshLink thread, causing the wrong addresses to be put in the invitation
URL.
Don't use assert() to check the results of pthread_*() calls.
This was done to debug the code, but it fails when MeshLink is compiled with
-DNDEBUG. Remove all assert()s from calls to pthread functions, and instead
add explicit checks to only those functions that can fail.
Guus Sliepen [Sun, 14 Jun 2020 12:45:18 +0000 (14:45 +0200)]
React faster to network changes, including point-to-point links.
Tell Catta to also include point-to-point links, and when we get an
update from the Catta thread, wake up the main MeshLink thread so we
react to it immediately.
Guus Sliepen [Thu, 11 Jun 2020 19:52:00 +0000 (21:52 +0200)]
Use atomic operations to check whether to write to the signal pipe.
We need to do an atomic test-and-set operation to check whether we can
avoid writing to the signal pipe. Use C11 atomics to do this in a portable
way (hopefully).
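A minimal sketch of the idea using the C11 atomic_flag type. The struct and
function names are hypothetical and the real signal code is organised
differently; the sketch only shows how an atomic test-and-set lets all but
the first caller skip the write().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

struct signal_pipe {
    int fds[2];          /* [0] is the read end, [1] the write end */
    atomic_flag pending; /* set while a wake-up has been requested */
};

bool signal_pipe_init(struct signal_pipe *sp) {
    assert(sp != NULL);
    atomic_flag_clear(&sp->pending);
    return pipe(sp->fds) == 0;
}

/* Callable from any thread. Returns true if a byte was actually written. */
bool signal_pipe_trigger(struct signal_pipe *sp) {
    /* Atomically set the flag and learn whether it was already set. */
    if (atomic_flag_test_and_set(&sp->pending)) {
        return false; /* a wake-up is already pending: skip the write */
    }

    char byte = 0;
    if (write(sp->fds[1], &byte, 1) != 1) {
        atomic_flag_clear(&sp->pending); /* allow a later retry */
        return false;
    }
    return true;
}

/* Called by the event loop when the read end becomes readable. */
void signal_pipe_acknowledge(struct signal_pipe *sp) {
    /* Clear the flag before reading, so a trigger racing with this call
     * writes a new byte instead of being silently dropped. */
    atomic_flag_clear(&sp->pending);

    char byte;
    ssize_t n = read(sp->fds[0], &byte, 1);
    (void)n;
}
```

Clearing the flag before the read can leave one extra byte in the pipe if a
trigger races with the acknowledgement, which costs one spurious wake-up but
never loses one.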