X-Git-Url: http://git.meshlink.io/?p=utcp;a=blobdiff_plain;f=README;h=8451b6aa142fc1665a98d07c2ee47cff447a084c;hp=1c9baf62e2917fa1466b9f53bff1706fe9180988;hb=HEAD;hpb=bfde83247774e40b3d178852d286b00e6be75132
diff --git a/README b/README
index 1c9baf6..8451b6a 100644
--- a/README
+++ b/README
@@ -21,6 +21,7 @@ TODO v1.0:
 * Implement send buffer
 * Window scaling
 * Handle retransmission
+  - Proper timeout handling

 TODO v2.0:

@@ -44,15 +45,25 @@ Fast transaction:

 Does this need special care or can we rely on higher level MACs?

+RFCs
+----
+
+793 Transmission Control Protocol (Functional Specification)
+2581 TCP Congestion Control
+2988 Computing TCP's Retransmission Timer
+
+
+

 INVARIANTS
 ----------

 - snd.una: the sequence number of the first byte we did not receive an ACK for
-- snd.nxt: the sequence number of the first byte after the last one we ever sent
+- snd.nxt: the sequence number of the first byte after the last packet we sent (due to retransmission, this may go backwards)
 - snd.wnd: the number of bytes we have left in our (UTCP/application?) input buffer
+- snd.last: the sequence number of the last byte that was enqueued in the TCP stream (increases only monotonically)
 - rcv.nxt: the sequence number of the first byte after the last one we passed up to the application
-- rcv.wnd: the number of bytes the receives has left in its input buffer (may be more/less than our send buffer size)
+- rcv.wnd: the number of bytes the receiver has left in its input buffer (may be more/less than our send buffer size)

 - The only packets that do not have ACK set must either have SYN or RST set
 - Only packets received with rcv.nxt <= hdr.seq <= rcv.nxt + rcv.wnd are valid, drop others.
@@ -61,30 +72,97 @@ INVARIANTS

 - SYN and FIN each count as one byte for the sequence numbering, but no actual byte is transferred in the payload.

+CONNECTION TIMEOUT
+------------------
+
+This timer is intended to catch the case when we are waiting very long for a response but nothing happens.
+The timeout is on the order of minutes.
+
+- The conn timeout is set whenever there is unacknowledged data, or when we are in the TIME_WAIT state.
+- If snd.una is advanced while the timeout is set, we re-set the timeout.
+- If the conn timeout expires, close the connection immediately.
+
+RETRANSMIT TIMEOUT
+------------------
+
+(See RFC 6298.)
+
+This timer is intended to catch the case where we didn't get an ACK from the peer.
+In principle, the timeout should be slightly longer than the maximum round-trip time along the path.
+
+- The rtrx timer is set whenever we send a packet that must be ACKed by the peer:
+  - when it contains data
+  - when SYN or FIN is set
+- The rtrx timer is reset when we receive a packet that advances snd.una.
+  - it is cleared when snd.una == snd.last
+  - otherwise the timeout is set to the value of utcp->rto
+- If the rtrx timer expires, retransmit at least one packet, multiply the timeout by two, and rearm the timeout.
+
+The value of RTO is calculated according to the RFC. At the moment, no
+timestamps are added to packets. When the RTT timer is not set, start it when
+sending a packet. When the ACK arrives, stop the timer and use the time
+difference as a measured RTT value. Use the algorithm from RFC 6298 to update
+RTO.

 STATES
 ------

-CLOSED: this connection is cloed, all packets received will result in RST.
+CLOSED: this connection is closed, all packets received will result in RST.
+  RX: RST
+  TX: return error
+  RT: clear timers
+  RST: ignore

 LISTEN: (= no connection yet): only allow SYN packets, if the application does not accept, return RST|ACK, else SYN|ACK.
+  RX: on accept, send SYNACK, go to SYN_RECEIVED
+  TX: cannot happen
+  RT: cannot happen
+  RST: ignore

 SYN_SENT: we sent a SYN, now expecting SYN|ACK
+  RX: must be valid SYNACK, send ACK, go to ESTABLISHED
+  TX: put in send buffer (TODO: send SYN again with data?)
+  RT: send SYN again

 SYN_RECEIVED: we received a SYN, sent back a SYN|ACK, now expecting an ACK
+  RX: must be valid ACK, go to ESTABLISHED
+  TX: put in send buffer (TODO: send SYNACK again with data?)
+  RT: send SYNACK again

 ESTABLISHED: SYN is acked, we can now send/receive normal data.
+  RX: process data, return ACK. If FIN set, go to CLOSE_WAIT
+  TX: put in send buffer, segmentize and send
+  RT: send unACKed data again

 FIN_WAIT_1: we want to close the connection, and just sent a FIN, waiting for it to be ACKed.
+  RX: process data, return ACK. If our FIN is acked, go to FIN_WAIT_2, if a FIN was also received, go to CLOSING
+  TX: return error
+  RT: send unACKed data or else FIN again

-FIN_WAIT_2: FIXME
+FIN_WAIT_2: our FIN is ACKed, just waiting for more data or FIN from the peer.
+  RX: process data, return ACK. If a FIN is received, set conn timeout, go to TIME_WAIT
+  TX: return error
+  RT: should not happen, clear timeouts

 CLOSE_WAIT: we received a FIN, we sent back an ACK
+  RX: only return an ACK.
+  TX: put in send buffer, segmentize and send
+  RT: send unACKed data again

 CLOSING: we had already sent a FIN, and we received a FIN back, now waiting for it to be ACKed.
+  RX: if it's ACKed, set conn timeout, go to TIME_WAIT
+  TX: return an error
+  RT: send unACKed data or else FIN again

 LAST_ACK: we are waiting for the last ACK before we can CLOSE
+  RX: if it's ACKed, go to CLOSED
+  TX: return an error
+  RT: send FIN again

 TIME_WAIT: connection is in principle closed, but our last ACK might not have been received, so just wait a while to see if a FIN gets retransmitted so we can resend the ACK.
+  RX: if we receive anything, reset conn timeout.
+  TX: return an error
+  RT: should not happen, clear rtrx timeout

 SEND PACKET
 -----------
@@ -111,15 +189,61 @@ RETRANSMIT
 RECEIVE PACKET
 --------------

-- Drop invalid packets:
-  - Invalid flags or state
-  - hdr.seq not within our receive window
-  - hdr.ack ahead of snd.nxt
-- Handle RST packets
-- Advance snd.una?
-  - reset conn timer if so
-  - remove ACKed data from send buffer
-- If snd.una == snd.nxt, clear rtrx and conn timer
-- Process state changes due to SYN
-- Send new data to application
-- Process state changes due to FIN
+1 Drop invalid packets:
+  a Invalid flags or state
+  b ACK always set
+  c hdr.seq not within our receive window
+  d hdr.ack ahead of snd.nxt or behind snd.una
+2 Handle RST packets
+3 Advance snd.una?
+  a reset conn timer if so
+  b check if our SYN or FIN has been acked
+  c check if any data has been acked
+    - remove ACKed data from send buffer
+    - increase cwnd
+  d no advance? NewReno
+4 If snd.una == snd.nxt, clear rtrx and conn timer
+5 Process state changes due to SYN
+6 Send new data to application
+7 Process state changes due to FIN
+
+CONGESTION AVOIDANCE
+--------------------
+
+We want to send as many packets as possible without causing any packets to be
+dropped. So we should not send more than the available bandwidth, and not more
+in one go than buffers along the path can handle.
+
+To start, we use "self-clocking". We send one packet, and wait for an ACK
+before sending another packet. On a network with a finite bandwidth but zero
+delay (latency), this will send packets as efficiently as possible. We don't
+need any timers to control the outgoing packet rate; that's why we call this
+self-clocked. However, latency is non-zero, and this means a number of packets
+are always on the way between the sender and receiver. The number of packets
+"inbetween" is in principle the bandwidth times the delay (bandwidth-delay
+product, or BDP).
+
+Delay is fairly easy to measure (roughly half the round-trip time of a packet,
+which in TCP is easily obtained from the SYN and SYNACK pair, or the ACK sent
+in response to a segment), but bandwidth is harder to measure and might change
+more rapidly than the latency.
+
+Back to the "inbetween" packets: ideally we would like to fill the available
+inbetween space completely.
+It should be easy to see that in that case,
+self-clocking will still work as intended. Our estimate of the number of
+packets in the inbetween space is called the congestion window (CWND). If we
+know the BDP, we can set the CWND to it; if we don't know it, we can
+start with a small CWND and gradually increase it (for example, every time we
+receive an ACK, send the next 2 segments). At some point, we will start sending
+at a higher rate than the available bandwidth, in which case packets will
+inevitably be lost. We detect that because we do not receive an ACK for our
+data, and then we have to reduce the CWND (for example, by half).
+
+The trick is to choose an algorithm that keeps the CWND as close as possible
+to the effective BDP.
+
+A nice introduction is RFC 2001.
+
+snd.cwnd: size of the congestion window.
+snd.nxt - snd.una: number of unacknowledged bytes, i.e. the number of bytes in flight.
+snd.cwnd - (snd.nxt - snd.una): unused size of congestion window
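As a rough illustration of the three window quantities above, here is a minimal C sketch. The struct, the fixed MSS value and the function names are hypothetical and not taken from utcp. It computes the bytes in flight and the unused part of the congestion window with unsigned 32-bit arithmetic, so it stays correct when sequence numbers wrap around. It also applies the example rules from the text: grow the CWND by one segment per ACK (so each ACK lets us send 2 new segments, assuming every ACK covers one full segment), and halve it on loss.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of per-connection send state. The field names mirror
 * the snd.* variables in this README; this is not utcp's actual struct. */
struct snd_state {
	uint32_t una;  /* first byte we did not receive an ACK for */
	uint32_t nxt;  /* first byte after the last packet we sent */
	uint32_t cwnd; /* congestion window, in bytes */
};

#define MSS 1000u /* assumed fixed segment size, for illustration only */

/* snd.nxt - snd.una: unsigned subtraction handles sequence number wraparound. */
static uint32_t in_flight(const struct snd_state *s) {
	return s->nxt - s->una;
}

/* snd.cwnd - (snd.nxt - snd.una), clamped at zero: the bytes in flight can
 * exceed cwnd, for example right after cwnd has been halved. */
static uint32_t cwnd_available(const struct snd_state *s) {
	uint32_t flight = in_flight(s);
	return flight < s->cwnd ? s->cwnd - flight : 0;
}

/* Called for an ACK that advances snd.una. Growing cwnd by one segment per
 * ACK means an ACK for one full segment frees room for two new segments. */
static void on_ack(struct snd_state *s, uint32_t ack) {
	/* Signed differences compare sequence numbers modulo 2^32. */
	assert((int32_t)(ack - s->una) > 0);  /* ack advances snd.una */
	assert((int32_t)(s->nxt - ack) >= 0); /* ack is not ahead of snd.nxt */
	s->una = ack;
	s->cwnd += MSS;
}

/* Called when loss is detected: halve cwnd, but keep at least one segment. */
static void on_loss(struct snd_state *s) {
	s->cwnd /= 2;
	if (s->cwnd < MSS)
		s->cwnd = MSS;
}
```

The clamp in cwnd_available() is the main design point: computed naively with unsigned integers, snd.cwnd - (snd.nxt - snd.una) would wrap to a huge value instead of going negative after the window shrinks.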