WebSocket Protocol RFC: Background

Continuing the exploration of this RFC focusing on the why
Published on 2024/03/06

As I shared in "RFCs Are Not Mystical", I want to keep exploring the WebSocket Protocol RFC and dissect every section that I find relevant. The Abstract was a quick intro, but I think learning the pain points this RFC is trying to address is very important. Generally, when writing, it's a good habit to explain the problem first and then describe the solution. So let's dig in with section 1.1, Background.

Historically, creating web applications that need bidirectional communication between a client and a server (e.g., instant messaging and gaming applications) has required an abuse of HTTP to poll the server for updates while sending upstream notifications as distinct HTTP calls [RFC6202].

First of all, I'm pretty thrilled that there's a whole RFC about the nature of this problem. For the sake of what we need to know about this protocol, this paragraph covers enough. If you didn't have a way to keep communication open between a client and a server, intuitively you would probably just keep asking the server for updates. This is what polling is! You just keep asking at regular intervals if there's any update. Additionally, if you need to send an update to the server, you also have to do so as individual HTTP calls. So it's a lot of back and forth both for polling and for sending updates.
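To make the back and forth concrete, here is a minimal sketch of a polling session in Python. Nothing here touches the network: `FakeServer`, `run_polling_session`, and the tick numbers are all made up for illustration, and each method call stands in for one full HTTP request.

```python
from collections import deque


class FakeServer:
    """Hypothetical stand-in for an HTTP server; one method call = one HTTP request."""

    def __init__(self):
        self.pending = deque()      # updates waiting to be picked up by the client
        self.requests_handled = 0   # every poll or send counts as a separate request

    def poll(self):
        # The client asks "anything new?" and gets whatever has queued up.
        self.requests_handled += 1
        updates = list(self.pending)
        self.pending.clear()
        return updates

    def send(self, text):
        # Upstream messages travel as their own, distinct request.
        self.requests_handled += 1
        return "ok"


def run_polling_session(ticks=10, arrivals=(3, 7), sends=(2, 5)):
    """Poll once per tick; return (total requests, empty polls, received updates)."""
    server = FakeServer()
    received = []
    empty_polls = 0
    for tick in range(ticks):
        if tick in arrivals:
            server.pending.append(f"update at tick {tick}")
        if tick in sends:
            server.send(f"hello at tick {tick}")
        updates = server.poll()
        if updates:
            received.extend(updates)
        else:
            empty_polls += 1
    return server.requests_handled, empty_polls, received
```

With these made-up defaults, ten polls deliver only two updates, so eight of them come back empty, and the two upstream messages cost two more requests on top.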

We could naively think that, at the end of the day, it's just a bunch of calls. Sure, it's not ideal to have that many, but maybe it's not that bad either. Thankfully, the RFC goes into further detail about the problems this entails.

  • The server is forced to use a number of different underlying TCP connections for each client: one for sending information to the client and a new one for each incoming message.

This is what our first intuition translates to: a lot of different TCP connections between one server and the same client. Can you imagine this at scale? Every client would cost the server several connections, each with its own setup and teardown. Keeping a single connection per client would be far more reasonable, and that's what a WebSocket offers. Let's see the second problem.

  • The wire protocol has a high overhead, with each client-to-server message having an HTTP header.

That's a fair point. To meet the requirements of the wire protocol, each request comes with an HTTP header. This means that for every piece of data exchanged, there's additional work to send and process the HTTP header on top of the message content. Most of that header is redundant when it's the same ongoing conversation between the server and the client. If we could keep a connection open, we would only need to exchange the metadata once, and the rest could be mostly data. That leaves one last problem on the list.
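A rough way to see the overhead is to build one small request by hand and count bytes. This is only a sketch: the path, host, cookie, and the `build_request` and `header_overhead` helpers are invented for the example, and real browsers typically send more headers than this.

```python
def build_request(payload: bytes) -> bytes:
    """Assemble a minimal HTTP/1.1 POST by hand (hypothetical endpoint and headers)."""
    header_lines = [
        "POST /chat/send HTTP/1.1",
        "Host: chat.example.com",
        "Content-Type: application/json",
        f"Content-Length: {len(payload)}",
        "Cookie: session=abc123",
    ]
    # Header lines end with CRLF; a blank line separates the headers from the body.
    head = ("\r\n".join(header_lines) + "\r\n\r\n").encode("ascii")
    return head + payload


def header_overhead(payload: bytes):
    """Return (header bytes, payload bytes) for a single request."""
    request = build_request(payload)
    return len(request) - len(payload), len(payload)
```

Calling `header_overhead(b'{"text":"hi"}')` shows the header outweighing the 13-byte message several times over, and that cost is paid again on every single request.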

  • The client-side script is forced to maintain a mapping from the outgoing connections to the incoming connection to track replies.

Alright, we get it: having plenty of HTTP requests for the same communication between two hosts is a lot. The mapping referred to here is necessary mostly because each HTTP request is stateless. So how would the client know whether a response belongs to conversation A or conversation B? It has to maintain this mapping itself, so the state lives on the client. It's quite a challenge if you think about it. Just imagine good old AOL. For every chat you initiate, the client would have to keep an internal mapping, since it's going to send, for example, 10 polls for each of the 10 chats you have running. Then it has to make sure that when a response is received, it is linked to the correct conversation.
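Here is a minimal sketch of what that bookkeeping could look like on the client. The `PollingChatClient` class and its method names are hypothetical, and no real requests are sent: the point is only the dictionary that ties each outgoing request to its conversation.

```python
import itertools


class PollingChatClient:
    """Hypothetical client that tracks which request belongs to which chat."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.in_flight = {}  # request id -> conversation name
        self.inbox = {}      # conversation name -> messages received so far

    def start_poll(self, conversation):
        # Remember which conversation this outgoing request was made for.
        request_id = next(self._ids)
        self.in_flight[request_id] = conversation
        return request_id

    def handle_response(self, request_id, messages):
        # Responses can arrive in any order, so look the conversation up by id.
        conversation = self.in_flight.pop(request_id)
        self.inbox.setdefault(conversation, []).extend(messages)
        return conversation
```

If a poll for "alice" and a poll for "bob" are both in flight and the "bob" response arrives first, the lookup by request id is the only thing that keeps the messages out of the wrong chat window.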

A simpler solution would be to use a single TCP connection for traffic in both directions. This is what the WebSocket Protocol provides. Combined with the WebSocket API [WSAPI], it provides an alternative to HTTP polling for two-way communication from a web page to a remote server.

The same technique can be used for a variety of web applications: games, stock tickers, multiuser applications with simultaneous editing, user interfaces exposing server-side services in real time, etc.

This makes a lot of sense, and it's the conclusion you get to once you understand the nature of all the problems we listed. This is all nice and dandy but we will have to learn about the protocol itself and the contract between a client and a server. One step at a time, we'll get there!
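Without getting into the protocol yet, the "single connection, both directions" idea can be sketched with Python's standard `socket.socketpair()`. This is not WebSocket: it is a plain connected socket pair standing in for one TCP connection, and `single_connection_demo` and the newline-delimited messages are assumptions made for the example.

```python
import socket


def single_connection_demo():
    """Send data both ways over one connected socket pair (a stand-in for one TCP connection)."""
    client, server = socket.socketpair()
    with client, server, client.makefile("rb") as client_in, server.makefile("rb") as server_in:
        # Client -> server: no new connection, no fresh HTTP header.
        client.sendall(b"hello from client\n")
        upstream = server_in.readline()

        # Server -> client: the server pushes updates without being polled.
        server.sendall(b"update 1\n")
        server.sendall(b"update 2\n")
        pushed = [client_in.readline(), client_in.readline()]
    return upstream, pushed
```

Both directions share the same connection, so there is nothing to poll and no request-to-reply mapping to maintain. How WebSocket actually sets up and frames such a connection is a topic for the later sections of the RFC.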

Thoughts

My 5-minute max read time rule forces me to cut this here, but I think this is enough to understand the problem and start thinking about how we would solve it. The technical details are where things become more challenging, and I'm looking forward to getting there. This initial part is important, though, so we understand why the WebSocket Protocol exists.
