WebSocket Protocol: Closing Handshake

An interesting insight into how the protocol manages to close connections gracefully
Published on 2024/04/17

Continuing the exploration of the Websocket RFC, we previously covered:

This might be a quick one since the Closing Handshake section is small. Don't let that fool you: a proper way to close a connection is key to preventing data loss. First, I'll take a quick detour to explain a sentence that you'll find at the beginning of many sections in this RFC:

This section is non-normative.

This sentence tells the reader that the section is informational and does NOT define rules that must be followed to comply with the WebSocket standard. Non-normative sections are there to build an understanding of what the protocol is about without diving into the technical requirements yet.

That said, let's dig in.

Either peer can send a control frame with data containing a specified control sequence to begin the closing handshake (detailed in Section 5.5.1).

The first thing to recall here is the "control frame". We mentioned the different types of data exchanged in the WebSocket Protocol Overview. Control frames do not carry the data the peers actually want to exchange. Instead, they are used for protocol-level signaling, such as communicating the intention to close the connection. Typically a Close frame doesn't carry any significant data other than a status code and a reason (most likely this is covered by Section 5.5.1).
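To make the "status code and a reason" part concrete, here is a minimal sketch of how a Close frame body could be packed and unpacked. It assumes the layout described in Section 5.5.1 (a 2-byte unsigned integer in network byte order, optionally followed by a UTF-8 reason) and the 125-byte limit on control frame payloads. The function names are my own and don't come from any library.

```python
import struct
from typing import Optional, Tuple

CLOSE_OPCODE = 0x8  # opcode that marks a frame as a Close control frame


def encode_close_payload(code: int, reason: str = "") -> bytes:
    """Pack a status code and a UTF-8 reason into a Close frame body."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    return body


def decode_close_payload(body: bytes) -> Tuple[Optional[int], str]:
    """Unpack a Close frame body; an empty body carries no status code."""
    if not body:
        return None, ""
    if len(body) < 2:
        raise ValueError("a non-empty Close body must start with a 2-byte code")
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")


payload = encode_close_payload(1000, "done")  # 1000 means normal closure
print(payload)                        # b'\x03\xe8done'
print(decode_close_payload(payload))  # (1000, 'done')
```

This only covers the payload. A real frame also needs the header (FIN bit, opcode, length, and masking for client-to-server frames), which is left out here.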

Upon receiving such a frame, the other peer sends a Close frame in response, if it hasn't already sent one. Upon receiving that control frame, the first peer then closes the connection, safe in the knowledge that no further data is forthcoming.

This sounds familiar. This is what a handshake is all about! Both peers have to acknowledge the intention to close the connection. It doesn't matter who starts this as long as they acknowledge one another in doing so. It's a way to notify the other peer that no other data should be sent over since they are planning to close up shop anyway.

After sending a control frame indicating the connection should be closed, a peer does not send any further data; after receiving a control frame indicating the connection should be closed, a peer discards any further data received.
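The rules quoted so far can be sketched as a tiny state machine. This is a toy model, not a real WebSocket implementation: the Peer class, its method names, and the direct method calls standing in for the network are all assumptions made for illustration.

```python
class Peer:
    """Tracks only the closing-handshake state of one endpoint (toy model)."""

    def __init__(self, name):
        self.name = name
        self.close_sent = False
        self.close_received = False
        self.inbox = []  # application data that was accepted

    def send_data(self, other, message):
        # Rule: after sending a Close frame, a peer sends no further data.
        if self.close_sent:
            raise RuntimeError(f"{self.name} already sent a Close frame")
        other.receive_data(message)

    def receive_data(self, message):
        # Rule: after receiving a Close frame, further data is discarded.
        if self.close_received:
            return
        self.inbox.append(message)

    def send_close(self, other):
        if self.close_sent:
            return  # a Close frame is sent at most once
        self.close_sent = True
        other.receive_close(self)

    def receive_close(self, sender):
        self.close_received = True
        self.send_close(sender)  # reply only if we haven't sent one already

    @property
    def closed(self):
        return self.close_sent and self.close_received


client, server = Peer("client"), Peer("server")
client.send_data(server, "hello")
client.send_close(server)             # server replies with its own Close
server.receive_data("late frame")     # arrives after the Close: discarded
print(server.inbox)                   # ['hello']
print(client.closed, server.closed)   # True True
```

Calling client.send_data(server, "more") at this point raises a RuntimeError, which mirrors the "does not send any further data" rule.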

Our intuition was correct and confirmed in this paragraph.

It is safe for both peers to initiate this handshake simultaneously.

This seems fair and we somewhat inferred that from the previous paragraphs. It doesn't matter who starts the closing handshake as long as it's acknowledged by the other peer. What's important here is that the simultaneous initiation is handled by the protocol and doesn't cause any data loss.
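Here is a small simulation of the simultaneous case, where both Close frames are already in flight before either peer has received one. Again, this is a sketch under my own assumptions: the Endpoint class, the "CLOSE" string, and the deque standing in for the wire are all hypothetical.

```python
from collections import deque


class Endpoint:
    """One side of the connection, with a queue of frames still in flight."""

    def __init__(self, name):
        self.name = name
        self.close_sent = False
        self.close_received = False
        self.outbox = deque()  # frames sent but not yet delivered

    def start_close(self):
        if not self.close_sent:
            self.close_sent = True
            self.outbox.append("CLOSE")

    def on_frame(self, frame):
        if frame == "CLOSE":
            self.close_received = True
            self.start_close()  # queues a reply only if none was sent yet

    @property
    def done(self):
        return self.close_sent and self.close_received


def deliver(src, dst):
    """Hand every in-flight frame from src to dst."""
    while src.outbox:
        dst.on_frame(src.outbox.popleft())


client, server = Endpoint("client"), Endpoint("server")
client.start_close()
server.start_close()      # both Close frames are now in flight
deliver(client, server)
deliver(server, client)
frames_left = len(client.outbox) + len(server.outbox)
print(client.done, server.done, frames_left)  # True True 0
```

Each side treats the Close frame it receives as the acknowledgment of its own, so exactly two Close frames cross the wire and nobody waits forever for a reply.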

The closing handshake is intended to complement the TCP closing handshake (FIN/ACK), on the basis that the TCP closing handshake is not always reliable end-to-end, especially in the presence of intercepting proxies and other intermediaries.

Even if you're not familiar with how a TCP connection is closed and how segments are exchanged, what you should get out of this paragraph is that the TCP closing handshake is not always reliable end-to-end. Several scenarios can make it so, such as early closure by an intermediary (e.g. a proxy) or a connection that is dropped somewhere along the network path.

What's not clear here is that the whole WebSocket closing handshake happens on an open TCP connection. The protocol doesn't rely on TCP alone to signal the closure. This way both peers learn that the WebSocket connection is being closed before the underlying connection goes away. Once the handshake is completed, they can proceed with the regular closing of the TCP connection.

By sending a Close frame and waiting for a Close frame in response, certain cases are avoided where data may be unnecessarily lost. For instance, on some platforms, if a socket is closed with data in the receive queue, a RST packet is sent, which will then cause recv() to fail for the party that received the RST, even if there was data waiting to be read.

This might need a bit of untangling. The idea is to provide an example of what could happen if we didn't have a process to gracefully close a WebSocket connection. Here's the scenario we avoid:

  • Peer 1 closes its TCP socket without sending a Close frame, while data from Peer 2 is still sitting unread in Peer 1's receive queue
  • On some platforms, closing a socket in that state makes Peer 1 send an RST packet instead of closing the connection normally
  • Peer 2 still has some data in its own receive queue that was sent previously by Peer 1
  • Peer 2 receives the RST packet, which causes recv() to fail even though that data was waiting to be read

Since the Close frames are exchanged while the TCP connection is still open, each peer knows the other has finished sending before the socket is torn down, and this kind of loss is avoided.
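For the TCP side of things, here is a minimal sketch of an orderly socket close: stop writing, drain whatever is left to read until the other side signals end-of-stream, then close. As far as I can tell, Section 7.1.1 of the RFC describes a similar sequence for Berkeley sockets. The orderly_close helper is my own name, and socket.socketpair() stands in for a real network connection.

```python
import socket


def orderly_close(sock: socket.socket) -> bytes:
    """Half-close, drain until EOF, then close. Returns the data still unread."""
    sock.shutdown(socket.SHUT_WR)  # tell the other side we are done writing
    leftover = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:              # b"" means the other side is done writing too
            break
        leftover += chunk
    sock.close()
    return leftover


a, b = socket.socketpair()         # two connected sockets in one process
b.sendall(b"last message")
b.shutdown(socket.SHUT_WR)         # b has nothing more to send
received = orderly_close(a)
b.close()
print(received)                    # b'last message'
```

Because the receive queue is emptied before close() is called, the socket is never closed with unread data in it, which is the condition the RFC's RST example warns about.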

Thoughts

I thought this section would be a breeze, but it had some gotchas here and there. I didn't know the TCP connection was kept open while the WebSocket protocol handles the closing handshake. Hopefully, my digging through this brought some more clarity.
