Add utcp_set_clock_granularity().

[utcp] / README
diff --git a/README b/README

index d5edf564fb93533662533529eba3da378f1fcd21..8451b6aa142fc1665a98d07c2ee47cff447a084c 100644 (file)
--- a/README
+++ b/README
@@ -21,10 +21,11 @@ TODO v1.0:
  * Implement send buffer
  * Window scaling
  * Handle retransmission
+  - Proper timeout handling
  
  TODO v2.0:
  
-* Nagle
+* Nagle (add PSH back to signal receiver that now we want an immediate ACK?)
  * NAK and SACK
  * Congestion window scaling
  * Timestamps?
@@ -43,3 +44,206 @@ Fast transaction:
         ACK ->
  
  Does this need special care or can we rely on higher level MACs?
+
+RFCs
+----
+
+793  Transmission Control Protocol (Functional Specification)
+2581 TCP Congestion Control
+2988 Computing TCP's Retransmission Timer
+
+
+
+INVARIANTS
+----------
+
+- snd.una: the sequence number of the first byte we did not receive an ACK for
+- snd.nxt: the sequence number of the first byte after the last packet we sent (due to retransmission, this may go backwards)
+- snd.wnd: the number of bytes we have left in our (UTCP/application?) input buffer
+- snd.last: the sequence number of the last byte that was enqueued in the TCP stream (increases only monotonically)
+
+- rcv.nxt: the sequence number of the first byte after the last one we passed up to the application
+- rcv.wnd: the number of bytes the receiver has left in its input buffer (may be more/less than our send buffer size)
+
+- The only packets that do not have ACK set must either have SYN or RST set
+- Only packets received with rcv.nxt <= hdr.seq <= rcv.nxt + rcv.wnd are valid, drop others.
+- If it has ACK set, and it's higher than snd.una, update snd.una.
+  But don't update it past c->snd.next. (RST in that case?)
+
+- SYN and FIN each count as one byte for the sequence numbering, but no actual byte is transferred in the payload.
+
+CONNECTION TIMEOUT
+------------------
+
+This timer is intended to catch the case when we are waiting very long for a response but nothing happens.
+The timeout is in the order of minutes.
+
+- The conn timeout is set whenever there is unacknowledged data, or when we are in the TIME_WAIT status.
+- If snd.una is advanced while the timeout is set, we re-set the timeout.
+- If the conn timeout expires, close the connection immediately.
+
+RETRANSMIT TIMEOUT
+------------------
+
+(See RFC 6298.)
+
+This timer is intended to catch the case where we didn't get an ACK from the peer.
+In principle, the timeout should be slightly longer than the maximum latency along the path.
+
+- The rtrx timer is set whenever we send a packet that must be ACKed by the peer:
+  - when it contains data
+  - when SYN or FIN is set
+- The rtrx timer is reset when we receive a packet that advances snd.una.
+  - it is cleared when snd.una == snd.last
+  - otherwise the timeout is set to the value of utcp->rto
+- If the rtrx timer expires, retransmit at least one packet, multiply the timeout by two, and rearm the timeout.
+
+The value of RTO is calculated according to the RFC. At the moment, no
+timestamps are added to packets. When the RTT timer is not set, start it when
+sending a packet. When the ACK arrives, stop the timer and use the time
+difference as a measured RTT value.  Use the algorithm from RFC 6298 to update
+RTO.
+
+STATES
+------
+
+CLOSED: this connection is closed, all packets received will result in RST.
+  RX: RST
+  TX: return error
+  RT: clear timers
+  RST: ignore
+
+LISTEN: (= no connection yet): only allow SYN packets, it application does not accept, return RST|ACK, else SYN|ACK.
+  RX: on accept, send SYNACK, go to SYN_RECEIVED
+  TX: cannot happen
+  RT: cannot happen
+  RST: ignore
+
+SYN_SENT: we sent a SYN, now expecting SYN|ACK
+  RX: must be valid SYNACK, send ACK, go to ESTABLISHED
+  TX: put in send buffer (TODO: send SYN again with data?)
+  RT: send SYN again
+
+SYN_RECEIVED: we received a SYN, sent back a SYN|ACK, now expecting an ACK
+  RX: must be valid ACK, go to ESTABLISHED
+  TX: put in send buffer (TODO: send SYNACK again with data?)
+  RT: send SYNACK again
+
+ESTABLISHED: SYN is acked, we can now send/receive normal data.
+  RX: process data, return ACK. If FIN set, go to CLOSE_WAIT
+  TX: put in send buffer, segmentize and send
+  RT: send unACKed data again
+
+FIN_WAIT_1: we want to close the connection, and just sent a FIN, waiting for it to be ACKed.
+  RX: process data, return ACK. If our FIN is acked, go to FIN_WAIT_2, if a FIN was also received, go to CLOSING
+  TX: return error
+  RT: send unACKed data or else FIN again
+
+FIN_WAIT_2: our FIN is ACKed, just waiting for more data or FIN from the peer.
+  RX: process data, return ACK. If a FIN was also received, go to CLOSING
+  TX: return error
+  RT: should not happen, clear timeouts
+
+CLOSE_WAIT: we received a FIN, we sent back an ACK
+  RX: only return an ACK.
+  TX: put in send buffer, segmentize and send
+  RT: send unACKed data again
+
+CLOSING: we had already sent a FIN, and we received a FIN back, now waiting for it to be ACKed.
+  RX: if it's ACKed, set conn timeout, go to TIME_WAIT
+  TX: return an error
+  RT: send unACKed data or else FIN again
+
+LAST_ACK: we are waiting for the last ACK before we can CLOSE
+  RX: if it's ACKed, go to CLOSED
+  TX: return an error
+  RT: send FIN again
+
+TIME_WAIT: connection is in princple closed, but our last ACK might not have been received, so just wait a while to see if a FIN gets retransmitted so we can resend the ACK.
+  RX: if we receive anything, reset conn timeout.
+  TX: return an error
+  RT: should not happen, clear rtrx timeout
+
+SEND PACKET
+-----------
+
+- Put the packet in the send buffer.
+- Decide how much to send:
+  - Not more than receive window allows
+  - Not more that congestion window allows
+- Segmentize and send the packets
+- At the end, snd.nxt is advanced with the number of bytes sent
+- Set the rtrx and conn timers if they have not been set
+
+RETRANSMIT
+----------
+
+- Decide how much to send:
+  - Not more than we have in the send buffer
+  - Not more than receive window allows
+  - Not more that congestion window allows
+- Segmentize and send packets
+- No advancement of sequence numbers happen
+- Reset the rtrx timers
+
+RECEIVE PACKET
+--------------
+
+1 Drop invalid packets:
+  a Invalid flags or state
+  b ACK always set
+  c hdr.seq not within our receive window
+  d hdr.ack ahead of snd.nxt or behind snd.una
+2 Handle RST packets
+3 Advance snd.una?
+  a reset conn timer if so
+  b check if our SYN or FIN has been acked
+  c check if any data been acked
+    - remove ACKed data from send buffer
+    - increase cwnd
+  d no advance? NewReno
+4 If snd.una == snd.nxt, clear rtrx and conn timer
+5 Process state changes due to SYN
+6 Send new data to application
+7 Process state changes due to FIN
+
+CONGESTION AVOIDANCE
+--------------------
+
+We want to send as much packets as possible that won't cause any packets to be
+dropped.  So we should not send more than the available bandwidth, and not more
+in one go than buffers along the path can handle.
+
+To start, we use "self-clocking". We send one packet, and wait for an ACK
+before sending another packet. On a network with a finite bandwidth but zero
+delay (latency), this will send packets as efficiently as possible. We don't
+need any timers to control the outgoing packet rate, that's why we call this
+self-clocked. However, latency is non-zero, and this means a number of packets
+is always on the way between the sender and receiver. The amount of packets
+"inbetween" is in principle the bandwidth times the delay (bandwidth-delay
+product, or BDP).
+
+Delay is fairly easy to measure (equal to half the round-trip time of a packet,
+which in TCP is easily obtained from the SYN and SYNACK pair, or the ACK in
+response of a segment), however bandwidth is more difficult and might change
+more rapidly than the latency.
+
+Back to the "inbetween" packets: ideally we would like to fill the available
+inbetween space completely. It should be easy to see that in that case,
+self-clocking will still work as intended. Our estimate of the amount of
+packets in the inbetween space is called the congestion window (CWND).  If we
+know the BDP, we can set the CWND to it, however if we don't know it, we can
+start with a small CWND and gradually increase it (for example, every time we
+receive an ACK, send the next 2 segments). At some point, we will start sending
+at a higher rate than the available bandwidth, in which case packets will
+inevitably be lost. We detect that because we do not receive an ACK for our
+data, and then we have to reduce the CWND (for example, by half).
+
+The trick is to choose an algorithm that best keeps the CWND to the effective
+BDP.
+
+A nice introduction is RFC 2001.
+
+snd.cwnd: size of the congestion window.
+snd.nxt - snd.una: number of unacknowledged bytes, = number of bytes in flight.
+snd.cwnd - (snd.nxt - snd.una): unused size of congestion window