This is a light-weight, user-space implementation of RFC 793 (TCP), without any
reliance on an IP layer. It can be used to provide multiple in-order, reliable
streams on top of any datagram layer.

UTCP does not rely on a specific event system. Instead, the application feeds
it with incoming packets using utcp_recv(), and outgoing data for the streams
using utcp_send(). Most of the rest is handled by callbacks. The application
must however call utcp_timeout() regularly to have UTCP handle packet loss.

The application should run utcp_init() for every peer it wants to communicate
with.

DIFFERENCES FROM RFC 793:

* No checksum. UTCP requires the application to handle packet integrity.
* 32-bit window size. Big window sizes are the default.

* Implement send buffer
* Handle retransmission
  - Proper timeout handling
* Nagle (add PSH back to signal the receiver that we now want an immediate ACK?)
* Congestion window scaling

SYN|FIN + request data ->
<- SYN|ACK|FIN + response data

Does this need special care or can we rely on higher-level MACs?

793  Transmission Control Protocol (Functional Specification)
2581 TCP Congestion Control
2988 Computing TCP's Retransmission Timer

- snd.una: the sequence number of the first byte we did not receive an ACK for
- snd.nxt: the sequence number of the first byte after the last one we ever sent
- snd.wnd: the number of bytes we have left in our (UTCP/application?) input buffer

- rcv.nxt: the sequence number of the first byte after the last one we passed up to the application
- rcv.wnd: the number of bytes the receiver has left in its input buffer (may be more or less than our send buffer size)

- The only packets that do not have ACK set must have either SYN or RST set
- Only packets received with rcv.nxt <= hdr.seq <= rcv.nxt + rcv.wnd are valid; drop others.
  Sequence numbers wrap around, so these comparisons are done modulo 2^32.
- If a packet has ACK set, and hdr.ack is higher than snd.una, update snd.una.
  But don't update it past snd.nxt. (RST in that case?)

- SYN and FIN each count as one byte for the sequence numbering, but no actual byte is transferred in the payload.
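
Because sequence numbers are 32 bits and wrap around, the comparisons in the validity rule cannot use plain < and >. A minimal C sketch, assuming a hypothetical `struct seq_state` that holds only rcv.nxt and rcv.wnd; `seqdiff()`, `seq_acceptable()` and `segment_seq_len()` are illustrative names, not UTCP functions. The signed-difference trick only works while the window stays below 2^31 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the connection state used by the validity check. */
struct seq_state {
    uint32_t rcv_nxt; /* rcv.nxt */
    uint32_t rcv_wnd; /* rcv.wnd, assumed to be below 2^31 */
};

/* Signed distance from b to a, modulo 2^32: negative if a comes "before" b.
 * Relies on the usual two's complement conversion to int32_t. */
static int32_t seqdiff(uint32_t a, uint32_t b) {
    return (int32_t)(a - b);
}

/* rcv.nxt <= hdr.seq <= rcv.nxt + rcv.wnd, evaluated with wrap-around. */
static bool seq_acceptable(const struct seq_state *s, uint32_t seq) {
    return seqdiff(seq, s->rcv_nxt) >= 0 &&
           seqdiff(seq, s->rcv_nxt + s->rcv_wnd) <= 0;
}

/* Sequence space used by a segment: its payload, plus one for SYN and one for FIN. */
static uint32_t segment_seq_len(uint32_t payload_len, bool syn, bool fin) {
    return payload_len + (syn ? 1 : 0) + (fin ? 1 : 0);
}
```

For example, with rcv.nxt = 4294967290 and rcv.wnd = 100, sequence number 10 is accepted (it lies 16 bytes past rcv.nxt after wrapping), while 95 is not.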

This timer is intended to catch the case where we have been waiting a very long time for a response but nothing happens.
The timeout is on the order of minutes.

- The conn timeout is set whenever there is unacknowledged data, or when we are in the TIME_WAIT state.
- If snd.una is advanced while the timeout is set, we re-set the timeout.
- If the conn timeout expires, close the connection immediately.
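
The three conn timer rules can be sketched as follows. The `struct conn_timer` type, the function names and the 60-second value are assumptions for illustration (the text only says "on the order of minutes"); times are in seconds from a caller-supplied clock.

```c
#include <stdbool.h>
#include <time.h>

/* Example value only; the text just says "on the order of minutes". */
#define CONN_TIMEOUT_SECONDS 60

/* Hypothetical per-connection timer state. */
struct conn_timer {
    bool armed;
    time_t deadline;
};

static void conn_timer_set(struct conn_timer *t, time_t now) {
    t->armed = true;
    t->deadline = now + CONN_TIMEOUT_SECONDS;
}

/* Keep the timer armed while data is unacknowledged or we are in TIME_WAIT,
 * otherwise disarm it. An already armed timer keeps its deadline. */
static void conn_timer_update(struct conn_timer *t, time_t now,
                              bool unacked_data, bool time_wait) {
    if (unacked_data || time_wait) {
        if (!t->armed)
            conn_timer_set(t, now);
    } else {
        t->armed = false;
    }
}

/* snd.una advanced: the peer is making progress, so push the deadline back. */
static void conn_timer_on_ack_progress(struct conn_timer *t, time_t now) {
    if (t->armed)
        conn_timer_set(t, now);
}

/* Checked from the periodic timeout handler: true means close the connection now. */
static bool conn_timer_expired(const struct conn_timer *t, time_t now) {
    return t->armed && now >= t->deadline;
}
```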

This timer is intended to catch the case where we didn't get an ACK from the peer.
In principle, the timeout should be slightly longer than the maximum latency along the path.

- The rtrx timeout is set whenever snd.nxt is advanced.
- If the rtrx timeout expires, retransmit at least one packet, and re-set the timeout.
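
One way to choose that timeout is the estimator from RFC 2988 (listed above): keep a smoothed round-trip time (SRTT) and its variation (RTTVAR), set RTO = SRTT + max(G, 4 * RTTVAR), and double the RTO each time the rtrx timer expires. The sketch follows the RFC's constants; whether UTCP uses this exact estimator is an assumption, and the struct, the function names, the 1 ms clock granularity G and the 60-second cap are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Times in microseconds. alpha = 1/8, beta = 1/4, K = 4, an initial RTO of 3 s
 * and a 1 s minimum come from RFC 2988; G and the maximum are example values. */
#define INITIAL_RTO_US  3000000
#define MIN_RTO_US      1000000
#define MAX_RTO_US     60000000
#define GRANULARITY_US     1000 /* G: assumed 1 ms clock resolution */

/* Hypothetical estimator state. */
struct rtt_estimator {
    bool has_sample;
    int64_t srtt;   /* smoothed round-trip time */
    int64_t rttvar; /* round-trip time variation */
    int64_t rto;    /* current retransmission timeout */
};

static int64_t clamp_rto(int64_t rto) {
    if (rto < MIN_RTO_US)
        return MIN_RTO_US;
    if (rto > MAX_RTO_US)
        return MAX_RTO_US;
    return rto;
}

static void rtt_init(struct rtt_estimator *e) {
    e->has_sample = false;
    e->srtt = 0;
    e->rttvar = 0;
    e->rto = INITIAL_RTO_US;
}

/* Feed one measured RTT r. Per Karn's algorithm, only time segments that
 * were not retransmitted. */
static void rtt_sample(struct rtt_estimator *e, int64_t r) {
    if (!e->has_sample) {
        e->srtt = r;
        e->rttvar = r / 2;
        e->has_sample = true;
    } else {
        int64_t err = e->srtt > r ? e->srtt - r : r - e->srtt;
        e->rttvar = (3 * e->rttvar + err) / 4; /* uses the old SRTT, as the RFC requires */
        e->srtt = (7 * e->srtt + r) / 8;
    }
    int64_t var_term = 4 * e->rttvar;
    e->rto = clamp_rto(e->srtt + (var_term > GRANULARITY_US ? var_term : GRANULARITY_US));
}

/* The rtrx timer expired: double the timeout before re-arming it. */
static void rtt_backoff(struct rtt_estimator *e) {
    e->rto = clamp_rto(2 * e->rto);
}
```

For instance, a first sample of 200 ms gives an RTO of 600 ms, which the 1-second minimum raises to 1 s.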

CLOSED: this connection is closed, all packets received will result in RST.

LISTEN: no connection yet. Only allow SYN packets; if the application does not accept, return RST|ACK, else SYN|ACK.
  RX: on accept, send SYNACK, go to SYN_RECEIVED

SYN_SENT: we sent a SYN, now expecting SYN|ACK
  RX: must be valid SYNACK, send ACK, go to ESTABLISHED
  TX: put in send buffer (TODO: send SYN again with data?)

SYN_RECEIVED: we received a SYN, sent back a SYN|ACK, now expecting an ACK
  RX: must be valid ACK, go to ESTABLISHED
  TX: put in send buffer (TODO: send SYNACK again with data?)
  RT: send SYNACK again

ESTABLISHED: SYN is acked, we can now send/receive normal data.
  RX: process data, return ACK. If FIN set, go to CLOSE_WAIT
  TX: put in send buffer, segmentize and send
  RT: send unACKed data again

FIN_WAIT_1: we want to close the connection, and just sent a FIN, waiting for it to be ACKed.
  RX: process data, return ACK. If our FIN is ACKed, go to FIN_WAIT_2; if a FIN was also received, go to CLOSING
  RT: send unACKed data or else FIN again

FIN_WAIT_2: our FIN is ACKed, just waiting for more data or FIN from the peer.
  RX: process data, return ACK. If a FIN was also received, set conn timeout, go to TIME_WAIT (our FIN is already ACKed, so there is nothing left to wait for in CLOSING)
  RT: should not happen, clear timeouts

CLOSE_WAIT: we received a FIN and sent back an ACK
  RX: only return an ACK.
  TX: put in send buffer, segmentize and send
  RT: send unACKed data again

CLOSING: we had already sent a FIN, and we received a FIN back, now waiting for our FIN to be ACKed.
  RX: if it's ACKed, set conn timeout, go to TIME_WAIT
  RT: send unACKed data or else FIN again

LAST_ACK: we are waiting for the last ACK before we can go to CLOSED
  RX: if it's ACKed, go to CLOSED

TIME_WAIT: the connection is in principle closed, but our last ACK might not have been received, so just wait a while to see if a FIN gets retransmitted so we can resend the ACK.
  RX: if we receive anything, reset conn timeout.
  RT: should not happen, clear rtrx timeout
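
The FIN-related transitions can be condensed into three small functions: one for the moment the peer ACKs our SYN or FIN, one for an incoming FIN, and one for the application closing its side. The enum mirrors the state names in this file; the function names are hypothetical. The transitions on close (ESTABLISHED to FIN_WAIT_1, CLOSE_WAIT to LAST_ACK) and FIN_WAIT_2 to TIME_WAIT on receiving a FIN follow RFC 793.

```c
/* Connection states, named as in this file. */
enum utcp_state {
    CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED,
    FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, CLOSING, LAST_ACK, TIME_WAIT
};

/* Hypothetical helper: the peer has ACKed our SYN or our FIN.
 * In SYN_SENT this assumes the segment was a valid SYN|ACK. */
static enum utcp_state on_our_syn_or_fin_acked(enum utcp_state s) {
    switch (s) {
    case SYN_SENT:
    case SYN_RECEIVED:
        return ESTABLISHED;
    case FIN_WAIT_1:
        return FIN_WAIT_2;
    case CLOSING:
        return TIME_WAIT; /* the caller also sets the conn timer */
    case LAST_ACK:
        return CLOSED;
    default:
        return s;
    }
}

/* Hypothetical helper: the peer's FIN has arrived (after all data preceding it). */
static enum utcp_state on_fin_received(enum utcp_state s) {
    switch (s) {
    case ESTABLISHED:
        return CLOSE_WAIT;
    case FIN_WAIT_1:
        return CLOSING;   /* our own FIN is not ACKed yet */
    case FIN_WAIT_2:
        return TIME_WAIT; /* both FINs sent, ours already ACKed */
    default:
        return s;         /* e.g. a retransmitted FIN: just ACK it again */
    }
}

/* Hypothetical helper: the application closes its side, so a FIN will be sent. */
static enum utcp_state on_app_close(enum utcp_state s) {
    switch (s) {
    case ESTABLISHED:
        return FIN_WAIT_1;
    case CLOSE_WAIT:
        return LAST_ACK;
    default:
        return s;
    }
}
```

If the ACK is processed before the FIN, as in the receive procedure in this file, a single segment in FIN_WAIT_1 that both ACKs our FIN and carries the peer's FIN moves through FIN_WAIT_2 to TIME_WAIT.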

- Put the packet in the send buffer.
- Decide how much to send:
  - Not more than the receive window allows
  - Not more than the congestion window allows
- Segmentize and send the packets
- At the end, snd.nxt is advanced by the number of bytes sent
- Set the rtrx and conn timers if they have not been set
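
The "how much to send" decision amounts to taking the minimum of the unsent data, the space left in the receiver's window and the unused part of the congestion window, where the data already in flight is snd.nxt - snd.una. A sketch with a hypothetical `struct sender`; it measures the receiver's window from snd.una, as in standard TCP, and assumes a fixed maximum segment size `mss`. Actual transmission and timer handling are left out.

```c
#include <stdint.h>

/* Hypothetical sender state; names mirror snd.una, snd.nxt and snd.cwnd. */
struct sender {
    uint32_t una;      /* snd.una */
    uint32_t nxt;      /* snd.nxt */
    uint32_t cwnd;     /* snd.cwnd, in bytes */
    uint32_t peer_wnd; /* window advertised by the receiver, counted from snd.una */
    uint32_t buffered; /* bytes in the send buffer, counted from snd.una */
    uint32_t mss;      /* assumed maximum payload per packet, > 0 */
};

static uint32_t min_u32(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

/* New bytes that may be sent now. */
static uint32_t sendable(const struct sender *s) {
    uint32_t in_flight = s->nxt - s->una; /* correct across wrap-around */
    uint32_t unsent = s->buffered - in_flight;
    uint32_t rwnd_left = s->peer_wnd > in_flight ? s->peer_wnd - in_flight : 0;
    uint32_t cwnd_left = s->cwnd > in_flight ? s->cwnd - in_flight : 0;
    return min_u32(unsent, min_u32(rwnd_left, cwnd_left));
}

/* Segmentize: returns the number of packets "sent" and advances snd.nxt. */
static unsigned send_new_data(struct sender *s) {
    uint32_t left = sendable(s);
    unsigned packets = 0;
    while (left > 0) {
        uint32_t len = min_u32(left, s->mss);
        /* A real implementation would transmit bytes [nxt, nxt + len) here. */
        s->nxt += len;
        left -= len;
        packets++;
    }
    return packets;
}
```

With 5000 buffered bytes, a 3000-byte congestion window and a 1000-byte MSS, this sends three packets and then stops until an ACK opens the window again, which is the self-clocking behaviour described under congestion control.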

- Decide how much to send:
  - Not more than we have in the send buffer
  - Not more than the receive window allows
  - Not more than the congestion window allows
- Segmentize and send packets
- Sequence numbers are not advanced
- Reset the rtrx timer
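
For a retransmission, the data to resend starts at snd.una and ends at snd.nxt, since snd.nxt must not move. This sketch only plans the segments (`struct segment` and `plan_retransmit()` are hypothetical) and clips the range to the receiver's window and the congestion window; resending all unacknowledged data at once is just one possible policy, and a sender could equally resend only the first segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one packet to resend. */
struct segment {
    uint32_t seq; /* sequence number of the first byte */
    uint32_t len; /* payload length */
};

static uint32_t min_u32(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

/* Plan a retransmission starting at snd.una. Only bytes below snd.nxt are
 * resent, so snd.nxt does not move; the amount is further limited by the
 * receiver's window and the congestion window. Fills at most max entries
 * of out and returns how many were used. mss must be > 0. */
static size_t plan_retransmit(uint32_t snd_una, uint32_t snd_nxt,
                              uint32_t peer_wnd, uint32_t cwnd, uint32_t mss,
                              struct segment *out, size_t max) {
    uint32_t limit = min_u32(snd_nxt - snd_una, min_u32(peer_wnd, cwnd));
    uint32_t seq = snd_una;
    size_t n = 0;
    while (limit > 0 && n < max) {
        uint32_t len = min_u32(limit, mss);
        out[n].seq = seq;
        out[n].len = len;
        n++;
        seq += len;
        limit -= len;
    }
    return n;
}
```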

1 Drop invalid packets:
  a Invalid flags or state
  c hdr.seq not within our receive window
  d hdr.ack ahead of snd.nxt or behind snd.una
  a reset conn timer if so
  b check if our SYN or FIN has been acked
  c check if any data has been acked
    - remove ACKed data from send buffer
  d no advance? NewReno
4 If snd.una == snd.nxt, clear rtrx and conn timer
5 Process state changes due to SYN
6 Send new data to application
7 Process state changes due to FIN
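
The ACK-related steps (dropping an ACK ahead of snd.nxt or behind snd.una, removing ACKed data from the send buffer, and clearing both timers once snd.una == snd.nxt) can be sketched together. The `struct conn` layout with a fixed 4 KiB buffer and the function name are assumptions. A FIN occupies one sequence number but no buffer space, so the sketch never removes more than the buffer holds; it also assumes our SYN has already been acknowledged.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical connection state; sndbuf holds unacknowledged and unsent
 * bytes, starting with the byte at snd.una. */
struct conn {
    uint32_t snd_una;
    uint32_t snd_nxt;
    char sndbuf[4096];
    uint32_t sndbuf_used;
    bool rtrx_armed;
    bool conn_armed;
};

static int32_t seqdiff(uint32_t a, uint32_t b) {
    return (int32_t)(a - b);
}

/* Returns false if the packet must be dropped because of its ACK number. */
static bool handle_ack(struct conn *c, uint32_t ack) {
    /* Step 1d: hdr.ack ahead of snd.nxt or behind snd.una. */
    if (seqdiff(ack, c->snd_nxt) > 0 || seqdiff(ack, c->snd_una) < 0)
        return false;

    uint32_t acked = ack - c->snd_una;
    if (acked > 0) {
        /* Remove ACKed data from the send buffer; the clamp covers a FIN,
         * which takes a sequence number but no buffer space. */
        uint32_t data = acked < c->sndbuf_used ? acked : c->sndbuf_used;
        memmove(c->sndbuf, c->sndbuf + data, c->sndbuf_used - data);
        c->sndbuf_used -= data;
        c->snd_una = ack;
        /* This is also where the conn timer would be re-set. */
    }

    /* Step 4: everything acknowledged, nothing left to time out. */
    if (c->snd_una == c->snd_nxt) {
        c->rtrx_armed = false;
        c->conn_armed = false;
    }
    return true;
}
```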

We want to send as many packets as possible without causing any packets to be
dropped. So we should not send more than the available bandwidth, and not more
in one go than buffers along the path can handle.

To start, we use "self-clocking". We send one packet, and wait for an ACK
before sending another packet. On a network with a finite bandwidth but zero
delay (latency), this will send packets as efficiently as possible. We don't
need any timers to control the outgoing packet rate, which is why this is
called self-clocking. However, latency is non-zero, and this means a number of
packets is always on the way between the sender and receiver. The number of
packets "in between" is in principle the bandwidth times the delay
(bandwidth-delay product, or BDP).

Delay is fairly easy to measure (it equals half the round-trip time of a
packet, which in TCP is easily obtained from the SYN and SYNACK pair, or from
the ACK sent in response to a segment); bandwidth, however, is more difficult
to measure and might change more rapidly than the latency.
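
A quick worked example of the bandwidth-delay product, with made-up numbers (a 100 Mbit/s path, 40 ms round-trip time, 1460-byte segments) and hypothetical function names. The sketch multiplies by the full round-trip time: with self-clocking, the next packet is only released once an ACK has come back, so keeping the path busy needs roughly bandwidth times RTT (twice the one-way delay) worth of unacknowledged data.

```c
#include <stdint.h>

/* Hypothetical helper: bytes that must be in flight to keep the path busy,
 * i.e. bandwidth (bytes/s) times round-trip time (microseconds). */
static uint64_t bdp_bytes(uint64_t bandwidth_bytes_per_s, uint64_t rtt_us) {
    return bandwidth_bytes_per_s * rtt_us / 1000000;
}

/* Hypothetical helper: the same amount in full-sized segments, rounded up. */
static uint64_t bdp_segments(uint64_t bdp, uint64_t mss) {
    return (bdp + mss - 1) / mss;
}
```

100 Mbit/s is 12,500,000 bytes/s, so `bdp_bytes(12500000, 40000)` gives 500,000 bytes, and `bdp_segments(500000, 1460)` gives 343 segments. A CWND much smaller than that leaves the path partly idle.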

Back to the "in-between" packets: ideally we would like to fill the available
in-between space completely. It should be easy to see that in that case,
self-clocking will still work as intended. Our estimate of the number of
packets in the in-between space is called the congestion window (CWND). If we
know the BDP, we can set the CWND to it; if we don't, we can start with a
small CWND and gradually increase it (for example, every time we receive an
ACK, send the next 2 segments). At some point, we will start sending at a
higher rate than the available bandwidth, in which case packets will
inevitably be lost. We detect that because we do not receive an ACK for our
data, and then we have to reduce the CWND (for example, by half).
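
The grow-then-halve behaviour as a sketch. Growth follows RFC 2581 (listed above): in slow start the CWND grows by one segment per ACK, which is why each ACK lets the sender transmit two new segments, and above the slow-start threshold `ssthresh` it grows by roughly one segment per round trip. The reaction to loss is simplified to halving, as in the example in the text (with a floor of two segments); RFC 2581 itself also resets the CWND to one segment after a retransmission timeout. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical congestion-control state, all sizes in bytes. */
struct cc {
    uint32_t mss;      /* maximum segment size, > 0 */
    uint32_t cwnd;     /* snd.cwnd */
    uint32_t ssthresh; /* slow-start threshold */
};

static void cc_init(struct cc *c, uint32_t mss) {
    c->mss = mss;
    c->cwnd = mss;            /* start small: one segment, pure self-clocking */
    c->ssthresh = UINT32_MAX; /* RFC 2581: may start arbitrarily high */
}

/* Called for each ACK that acknowledges new data. */
static void cc_on_ack(struct cc *c) {
    uint32_t inc;
    if (c->cwnd < c->ssthresh) {
        inc = c->mss;                    /* slow start: +1 segment per ACK */
    } else {
        inc = c->mss * c->mss / c->cwnd; /* congestion avoidance: ~+1 segment per RTT */
        if (inc == 0)
            inc = 1;
    }
    if (c->cwnd <= UINT32_MAX - inc)
        c->cwnd += inc;
}

/* Called when loss is detected: halve the window, keeping at least two segments. */
static void cc_on_loss(struct cc *c) {
    uint32_t half = c->cwnd / 2;
    c->ssthresh = half > 2 * c->mss ? half : 2 * c->mss;
    c->cwnd = c->ssthresh;
}
```

Starting from one 1000-byte segment, three ACKs grow the CWND to 4000 bytes; a loss then halves it to 2000, after which each ACK only adds about 1000 * 1000 / 2000 = 500 bytes.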

The trick is to choose an algorithm that best keeps the CWND to the effective

A nice introduction is RFC 2001.

snd.cwnd: size of the congestion window.
snd.nxt - snd.una: number of unacknowledged bytes, i.e. the number of bytes in flight.
snd.cwnd - (snd.nxt - snd.una): unused part of the congestion window.