Guus Sliepen [Thu, 28 May 2020 19:02:32 +0000 (21:02 +0200)]
Wake up the MeshLink thread if framed channel data is pending to be flushed.
When sending data on framed UDP channels, if there is a partial frame in
the send buffer waiting to be flushed after the flush timer expires, this
data was added by the application thread. The MeshLink thread does not know
that a timer was updated, and might use an old timeout value and not
respond in time. So if we detect that this might happen, we signal the
MeshLink thread so it can calculate a new timeout and call select() again.
Guus Sliepen [Wed, 27 May 2020 19:09:00 +0000 (21:09 +0200)]
Fix reception of a trailing, zero-length frame.
If the last (or only) frame received on a framed UDP channel had zero length,
we would not send it to the application, but keep it in the receive buffer
until more frames had been received.
Guus Sliepen [Wed, 27 May 2020 19:07:32 +0000 (21:07 +0200)]
Ensure the flush timer is started if we never had any full packets to send.
The flush timer ensures that any partial data left in the send buffer for
framed UDP channels is sent after the flush timeout.
This was done correctly if we had previously sent full packets, but if there
never was a full packet the timer wouldn't be started, and the small frames
wouldn't be sent unless the application sent more data on the channel.
Guus Sliepen [Wed, 27 May 2020 19:00:32 +0000 (21:00 +0200)]
Ensure the poll callback is called when a channel is fully established.
The optimization that reduced how often the poll callback is called when
the application did not write anything to the channel in the callback also
inadvertently stopped it from being called right when the channel is
fully established.
Guus Sliepen [Wed, 27 May 2020 18:57:43 +0000 (20:57 +0200)]
Update the channels-*-framed tests to test for isolated packets.
The test sent a lot of packets in succession, and only at the end checked
if everything was received. This prevented it from detecting the case where
sending a single small frame would not cause the frame to be received.
Guus Sliepen [Sun, 24 May 2020 22:11:10 +0000 (00:11 +0200)]
Implement MESHLINK_CHANNEL_FRAMED.
Both UDP and TCP style channels can now be set to use message framing.
Allowed message sizes are 0 to 65535 bytes.
For TCP, this means every meshlink_channel_send() will cause exactly one
receive callback on the other end, with the same size as the sent data.
For UDP style channels, this was already the normal behaviour (absent
packet loss), but with framing enabled UTCP will now concatenate multiple
messages in a single packet if possible.
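
A minimal sketch of how several framed messages could share one packet. The wire format here (a 2-byte big-endian length prefix per message) is an assumption chosen to match the 0 to 65535 byte limit, not necessarily what UTCP uses, and frame_append() and frame_next() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one frame (2-byte length prefix + payload) to a packet buffer.
 * Returns the new number of used bytes, or 0 if the frame does not fit. */
size_t frame_append(uint8_t *pkt, size_t used, size_t cap, const void *data, uint16_t len) {
    assert(used <= cap);

    if(cap - used < (size_t)len + 2) {
        return 0;
    }

    pkt[used] = (uint8_t)(len >> 8);
    pkt[used + 1] = (uint8_t)(len & 0xff);

    if(len) {
        memcpy(pkt + used + 2, data, len);
    }

    return used + 2 + len;
}

/* Read the frame starting at *off. Returns a pointer to its payload, stores
 * its length in *len and advances *off, or returns NULL if no complete
 * frame is left. A zero-length frame is still reported as a frame. */
const uint8_t *frame_next(const uint8_t *pkt, size_t used, size_t *off, uint16_t *len) {
    if(*off > used || used - *off < 2) {
        return NULL;
    }

    uint16_t n = (uint16_t)((pkt[*off] << 8) | pkt[*off + 1]);

    if(used - *off - 2 < n) {
        return NULL;
    }

    const uint8_t *payload = pkt + *off + 2;
    *len = n;
    *off += 2 + (size_t)n;
    return payload;
}
```

With a prefix like this, the receiver can hand each message to the application separately, including empty ones, regardless of how they were packed.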
Guus Sliepen [Thu, 21 May 2020 12:48:02 +0000 (14:48 +0200)]
Explicitly set the stack size for the MeshLink thread.
Different libcs have different default sizes for newly created threads. In
particular, Musl defaults to 80 kB, which is too small for MeshLink. We now
request 1 MB, which should be more than enough to handle the deepest call
stacks.
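
As an illustration, requesting an explicit stack size with POSIX threads might look like the sketch below. The function names and the empty thread body are hypothetical; only the pthread calls are standard:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread body; stands in for the real main loop. */
static void *worker(void *arg) {
    (void)arg;
    return NULL;
}

/* Start a thread with an explicit stack size instead of the libc default.
 * Returns 0 on success, or a pthread error code. */
int start_thread_with_stack(pthread_t *thread, size_t stack_size) {
    assert(thread != NULL);

    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);

    if(err) {
        return err;
    }

    err = pthread_attr_setstacksize(&attr, stack_size);

    if(!err) {
        err = pthread_create(thread, &attr, worker, NULL);
    }

    pthread_attr_destroy(&attr);
    return err;
}
```

Calling start_thread_with_stack(&thread, 1024 * 1024) would request the 1 MB mentioned above.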
Guus Sliepen [Fri, 15 May 2020 21:12:34 +0000 (23:12 +0200)]
Include our own key in REQ_PUBKEY requests.
If we don't know a peer's public key, it most likely means the peer
doesn't know our public key, so proactively send it along with the
REQ_PUBKEY request.
Previously, we allowed buf->offset to be equal to buf->size. This caused an
issue where buffer_call() would call the callback twice, once for 0
bytes at the end of the buffer, and once for len bytes at the start of
the buffer. This would cause the callback function to think the channel
had encountered an error.
If the data in the ringbuffer wraps around, and we call the receive
callback for the first part of the data, the callback function might
close the channel, so we must not call the callback for the second part
of the data.
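
The two ring buffer rules above can be sketched as follows. This is a simplified model, not the actual buffer_call(): the struct layout, the names, and the convention that the callback returns false when it has closed the channel are all assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified ring buffer: `used` bytes of data starting at `offset`. */
struct ring {
    const char *data;
    size_t offset;
    size_t used;
    size_t size;
};

/* Hypothetical callback type: returns false if it closed the channel. */
typedef bool (*ring_cb)(const char *data, size_t len, void *arg);

/* Deliver up to len bytes to cb. Returns the number of callback calls. */
int ring_call(const struct ring *buf, size_t len, ring_cb cb, void *arg) {
    /* offset == size is not allowed, so the first part is never 0 bytes. */
    assert(buf->offset < buf->size);

    if(len > buf->used) {
        len = buf->used;
    }

    if(len == 0) {
        return 0;
    }

    size_t first = buf->size - buf->offset;

    if(first >= len) {
        cb(buf->data + buf->offset, len, arg);
        return 1;
    }

    /* Data wraps around: stop if the first callback closed the channel. */
    if(!cb(buf->data + buf->offset, first, arg)) {
        return 1;
    }

    cb(buf->data, len - first, arg);
    return 2;
}

/* Example callbacks that add up the bytes they have seen. */
bool keep_open_cb(const char *data, size_t len, void *arg) {
    (void)data;
    *(size_t *)arg += len;
    return true;
}

bool close_cb(const char *data, size_t len, void *arg) {
    (void)data;
    *(size_t *)arg += len;
    return false;
}
```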
Guus Sliepen [Mon, 11 May 2020 17:52:00 +0000 (19:52 +0200)]
Move UTCP into the MeshLink repository.
UTCP is not used outside of MeshLink at the moment, and there is a tight
coupling between the two, so it makes more sense to have it as part of
MeshLink itself.
Guus Sliepen [Fri, 8 May 2020 10:48:44 +0000 (12:48 +0200)]
Handle meshlink_channel_close() being called in callbacks.
When it's called in a callback, we can't free the channel until the
function that called the callback has a chance to safely complete. This
is not a problem for regular receive and poll callbacks, but it is for AIO,
where there can be multiple outstanding AIO buffers that each need their
callback called to signal completion, and each of them could potentially
call meshlink_channel_close().
This also ensures that when the channel is explicitly closed by the
application, it will not receive any further callbacks.
The event loop was assuming that a timespec value of {0, 0} meant that the
timer was not added to the timer tree. However, it was possible for other
parts of the code to set the value to {0, 0}, which could result in a
segmentation fault. Use the splay_node_t data pointer to check whether a
timeout is linked into the tree instead.
Several fixes for channel AIO send and receive functions.
- Process multiple buffers if possible
- Better handling of error conditions
- fd errors now cancel the AIO buffer
- channel errors cancel all outstanding AIO buffers
- Don't call the poll callback with a length larger than the remaining
UTCP send buffer.
Make UTCP retransmissions trigger PMTU probes immediately.
If there are network problems while data is being transferred over a
channel, we want to react to this as soon as possible. Set the retransmission
callback to trigger the next PMTU probe immediately if there is none in
progress.
Allow meshlink_open() to be called with a NULL name.
This will use the name used last time the MeshLink instance was initialized.
If there is no initialized instance at the given confbase, it will return
an error.
Opening an instance with a different name than the one in the configuration
files will now also result in an error.
When resetting timers that use CLOCK_MONOTONIC, use a negative value.
CLOCK_MONOTONIC might be implemented as the time since the CPU booted, so
if MeshLink starts soon after booting, setting timers to "0" might not
actually be far enough in the past to trigger a timeout.
This has almost no effect in practice, since most timeouts are a minute or
less, but it might affect running tests in virtual machines.
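
A small sketch of why "0" can be too recent. The helper timed_out() is hypothetical and only compares two timespec values; the one hour offset used in the usage note is an arbitrary choice for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Return true if at least timeout_sec whole seconds lie between last and now. */
bool timed_out(struct timespec now, struct timespec last, time_t timeout_sec) {
    assert(timeout_sec >= 0);

    time_t elapsed = now.tv_sec - last.tv_sec;

    if(now.tv_nsec < last.tv_nsec) {
        elapsed--; /* borrow one second for the nanosecond part */
    }

    return elapsed >= timeout_sec;
}
```

If CLOCK_MONOTONIC reads 5 seconds after boot, a timer reset to {0, 0} has only "waited" 5 seconds and a 60 second timeout does not fire, whereas a timer reset to {-3600, 0} is already an hour overdue.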
These block for a limited amount of time, preventing lookups from taking
too long. Because these requests can be done without the main MeshLink
thread running, we don't use the request queue, but instead spawn a
thread for each blocking request.
Guus Sliepen [Thu, 23 May 2019 21:02:43 +0000 (23:02 +0200)]
Add an asynchronous DNS thread.
Add a thread dedicated to making DNS lookups. There are two queues, one
for pending DNS requests and one for done DNS requests. The async DNS
thread reads from the pending request queue, checks for each request whether
its deadline has not passed yet, and if so calls getaddrinfo(). Once
the result is obtained, it adds that to the done request queue and
signals the main meshlink thread, which will then call the callback
function associated with the DNS request.
Update UTCP to support fragmenting packets on UDP style channels.
This allows the application to send packets of arbitrary size (up to 64 kiB)
without worrying about the path MTU to the destination node, which might
vary, especially at the start of a connection.
If the application doesn't want packets to fragment, it should use
meshlink_channel_get_mss() to query the maximum size for unfragmented
packets.
Roop [Tue, 4 Feb 2020 10:46:29 +0000 (16:16 +0530)]
Add meshlink_get_all_nodes_by_last_reachable API, meshlink_get_node_reachability API and its test vectors
MeshLink now keeps track of when a node was last reachable. This can be
used by an application to detect nodes that were never reachable or which
have not been reachable for a certain amount of time.
Guus Sliepen [Fri, 27 Mar 2020 21:52:46 +0000 (22:52 +0100)]
Reduce how often we have to poll the packet queue.
Packets are moved to the MeshLink thread via the packet queue. However,
each packet required a trigger byte to be sent to the event loop, requiring
more calls to select() than necessary. Now we make event loop signals level
triggered, and dequeue all enqueued packets at once.
This also adds debug log statements for the packet queue.
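
The idea can be modelled with a toy, single-threaded queue. This is only a sketch: the real queue is shared between threads and signals the event loop through a file descriptor, and all names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 16

/* Toy packet queue with a level-triggered "data pending" signal. */
struct packet_queue {
    int items[QUEUE_CAP];
    size_t count;
    bool signalled;   /* stays set while the queue is non-empty */
    unsigned wakeups; /* how often the consumer had to be woken up */
};

/* Enqueue one item; only wake the consumer if it is not already signalled. */
bool queue_push(struct packet_queue *q, int item) {
    assert(q != NULL);

    if(q->count == QUEUE_CAP) {
        return false;
    }

    q->items[q->count++] = item;

    if(!q->signalled) {
        q->signalled = true;
        q->wakeups++;
    }

    return true;
}

/* Dequeue up to max items in one go; clear the signal once empty. */
size_t queue_drain(struct packet_queue *q, int *out, size_t max) {
    size_t n = q->count < max ? q->count : max;

    memcpy(out, q->items, n * sizeof(*out));
    memmove(q->items, q->items + n, (q->count - n) * sizeof(*q->items));
    q->count -= n;

    if(q->count == 0) {
        q->signalled = false;
    }

    return n;
}
```

Pushing three packets before the consumer runs causes a single wakeup, and one drain call removes all three.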
Guus Sliepen [Tue, 10 Mar 2020 21:42:33 +0000 (22:42 +0100)]
Add all recent addresses resolved from a hostname in meshlink_invite().
When a canonical hostname or an invitation address resolves to multiple
numeric addresses, add all of them as recent addresses for ourself, so
they are all part of the host config file we send to the invitee.
Guus Sliepen [Tue, 10 Mar 2020 21:37:28 +0000 (22:37 +0100)]
Update the invite-join test.
- Check that duplicate addresses get culled correctly.
- Check that we can add lots of extra invitation addresses, and that
they are in the expected order in the invitation URL.
Guus Sliepen [Fri, 6 Mar 2020 23:19:57 +0000 (00:19 +0100)]
Handle not being able to bind to the configured port at startup.
When starting a MeshLink node that has already been configured to run on a
certain port, but that port is in use (for one or more of the supported
address families), it would either ignore some address families, or would
try to bind to port 0 if all address families failed. However, this is
problematic, because it makes discovery and invitation URL generation much
harder.
Fix this by checking if any port binding fails for a supported address
family, and if so, try to find another port that does support binding on
all address families. If it fails to find any available port, it will fall
back to binding to port 0, so that outgoing connections are still possible.
Guus Sliepen [Fri, 6 Mar 2020 22:24:49 +0000 (23:24 +0100)]
Don't abort on empty lines in receive_request().
Remove the assertion that lines are not empty, since this could lead to
a DoS attack. Empty lines are already handled correctly by the rest of
the logic in receive_request().
This adds a function to add one or more application-controlled address and
port combinations to invitation URLs. It is meant to replace
meshlink_add_address(), which is too limited because it only allows one
address to be set, and doesn't allow a different port number to be set.
Guus Sliepen [Fri, 28 Feb 2020 19:09:11 +0000 (20:09 +0100)]
Avoid ports that are in use by not all address families.
It could happen that a port is bound by another application, but only
for some of the supported address families (i.e., only IPv4 but not IPv6).
We don't want MeshLink to then bind to the other address family or families,
but rather have it try another port altogether.
Guus Sliepen [Fri, 28 Feb 2020 18:25:52 +0000 (19:25 +0100)]
Further improve try_bind().
Make try_bind() do the same checks as add_listen_address() does: try to
create both a TCP and UDP socket on a given port for all address
families. If one address family succeeds for both TCP and UDP, consider
this a valid port.
Guus Sliepen [Tue, 25 Feb 2020 19:39:48 +0000 (20:39 +0100)]
Fix logic in try_bind().
Fix the check for successful socket creation. Also make sure we only
return success if we can bind to IPv4 and IPv6, but ignore other
network protocols.
Guus Sliepen [Tue, 11 Feb 2020 21:28:24 +0000 (22:28 +0100)]
Make the join commit order configurable.
By default, when an invitee joins a mesh, it will commit its configuration
to disk first, then the inviter. This adds a function to reverse that order.
Guus Sliepen [Tue, 11 Feb 2020 20:37:33 +0000 (21:37 +0100)]
Move join state out of meshlink_handle_t, and ensure proper cleanup on errors.
Move the state we keep when calling meshlink_join() out of meshlink_handle_t
and just put it on the stack of meshlink_join(). Also make sure we properly
release allocated resources in all error conditions during a join.
Guus Sliepen [Sat, 8 Feb 2020 13:55:21 +0000 (14:55 +0100)]
Fall back to getifaddrs() to get an interface address if there is no default route.
When generating invitations, we try to find a suitable local interface
address by faking an outgoing connection to the Internet. However,
that doesn't work if there is no default route. In this case, fall back
to using getifaddrs() if that function is available, and filter out any
link-local and loopback addresses.
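
A sketch of such a fallback on a platform that provides getifaddrs(). The function names are hypothetical and the filter is a simplification: it only recognises 127.0.0.0/8 and 169.254.0.0/16 for IPv4, and uses the standard IN6_IS_ADDR_* macros for IPv6:

```c
#include <assert.h>
#include <ifaddrs.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return true for IPv4/IPv6 addresses that are neither loopback nor link-local. */
bool is_usable_address(const struct sockaddr *sa) {
    if(!sa) {
        return false;
    }

    if(sa->sa_family == AF_INET) {
        uint32_t a = ntohl(((const struct sockaddr_in *)sa)->sin_addr.s_addr);

        if((a >> 24) == 127) {
            return false; /* 127.0.0.0/8 loopback */
        }

        if((a >> 16) == 0xA9FE) {
            return false; /* 169.254.0.0/16 link-local */
        }

        return true;
    }

    if(sa->sa_family == AF_INET6) {
        const struct in6_addr *a6 = &((const struct sockaddr_in6 *)sa)->sin6_addr;
        return !IN6_IS_ADDR_LOOPBACK(a6) && !IN6_IS_ADDR_LINKLOCAL(a6);
    }

    return false;
}

/* Count the usable interface addresses; returns -1 if getifaddrs() fails. */
int count_usable_addresses(void) {
    struct ifaddrs *list;

    if(getifaddrs(&list) != 0) {
        return -1;
    }

    int n = 0;

    for(struct ifaddrs *ifa = list; ifa; ifa = ifa->ifa_next) {
        if(is_usable_address(ifa->ifa_addr)) {
            n++;
        }
    }

    freeifaddrs(list);
    return n;
}
```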
Guus Sliepen [Thu, 6 Feb 2020 20:34:43 +0000 (21:34 +0100)]
Use bind() to check if a local address is still valid.
Some platforms don't support getifaddrs(), which we used to check whether the
local address of a socket is still available on any network interface.
Instead, try to bind() a new socket to the same address (but port 0) as
existing sockets. If it returns EADDRNOTAVAIL, we know that this address
is no longer valid.
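
A minimal IPv4-only sketch of this check. The function name is hypothetical, and treating every outcome other than EADDRNOTAVAIL as "still valid" is an assumption made for the example:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return false only if bind() reports that the address is not available
 * on any local interface. addr_str must be a numeric IPv4 address. */
bool ipv4_address_still_valid(const char *addr_str) {
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = 0; /* let the kernel pick any free port */

    int ok = inet_pton(AF_INET, addr_str, &sa.sin_addr);
    assert(ok == 1);
    (void)ok;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if(fd < 0) {
        return true; /* cannot tell; assume the address is still valid */
    }

    bool valid = true;

    if(bind(fd, (const struct sockaddr *)&sa, sizeof sa) != 0 && errno == EADDRNOTAVAIL) {
        valid = false;
    }

    close(fd);
    return valid;
}
```

Because the probe socket binds to port 0, it does not conflict with the existing socket that already uses the address.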
Guus Sliepen [Mon, 3 Feb 2020 16:43:50 +0000 (17:43 +0100)]
Clear reachability times in host config files received during a join.
When a node joins an existing mesh, it gets passed one or more host config
files from the inviter. These might contain non-zero reachability times,
but the invitee has never seen those nodes, so clear them before
storing the host config files.