RFCs Are Not Mystical

These deeply technical documents are surprisingly clear and you should read them
Published on 2024/02/23

I've been interested in web technologies (saying that makes me feel old) since the beginning of the internet (not getting better, is it). Sometimes I reflect on my learning journey and my introduction to computers. The resources available today are such a privilege; back when I started, I was stuck with old books and more confusion than anything. I just couldn't wrap my head around some fundamental concepts, and I kept giving up and coming back to them. I picked up some C, then some Python, then some minor scripting with JS. I was a bit all over the place. A more formal introduction started only when I joined college, and even then it was all so formal and abstract that I felt I was learning things in a vacuum, just for the sake of it.

Without derailing too much: with time, as I got more comfortable with the fundamentals, I slowly found out that some of the "deeply" technical documents are not as scary as they sound. I remember being skeptical about reading an RFC until I had to. I won't digress on why, but I found myself having to read the HTTP/2 RFC. I dreaded every minute BEFORE starting, and as I jumped around the parts I was interested in, I realized it was a much nicer and more understandable read than I had expected.

I occasionally push people to read an RFC and, admittedly, I should do this more often. Just do it after you have used whichever technology you're trying to learn more about. Bridging the gap between what the document describes and its practical application is how you go from abstract to real examples and consolidate your understanding. So here we are, exploring the WebSocket Protocol RFC (RFC 6455), because why not? Let me see if I can convince you it is not too wild of a read. Let's start with the Abstract.

The WebSocket Protocol enables two-way communication between a client running untrusted code in a controlled environment to a remote host that has opted-in to communications from that code.

This is intuitive. The goal is to allow communication between two entities. The client cannot be trusted by either the protocol or the server. This is expected: neither has control over what code the client runs, and that code can't be vetted in any way, so it is considered "untrusted". A typical example of a controlled environment is the web browser, which enforces security policies and whatnot. The subtle part is that this is more of an expectation than a guarantee. The protocol per se cannot guarantee that the environment is controlled, so that responsibility is left to the developers. The remote host can be anything that has agreed to start that communication; the way to agree is detailed later.

The security model used for this is the origin-based security model commonly used by web browsers.

If you've ever battled with CORS in your career then you know what this is about. This is a security measure typically used by web browsers to limit communication between two origins, where an origin is the combination of protocol (scheme), domain (host), and port. A simple example can help clarify this. Say your server has origin protocol://my-domain:port and a page whose origin is protocol://other-domain:port requests a resource from it. By default the browser blocks that request, because it breaks the origin-based security model: the origins differ, so the request is not considered trusted unless the server explicitly allows it. This all makes sense; the protocol leverages an existing security model.
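To make the origin comparison above concrete, here is a minimal sketch in Python. The helper names (`origin_of`, `same_origin`) and the default-port table are my own assumptions for illustration, not something defined by the RFC, and a real browser applies more rules than this.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed default ports for the schemes used in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def origin_of(url: str) -> Tuple[str, str, Optional[int]]:
    """Reduce a URL to its (scheme, host, port) triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fall back to the scheme's default port when none is given explicitly.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin_of(a) == origin_of(b)


print(same_origin("https://my-domain/app", "https://my-domain:443/api"))  # True
print(same_origin("https://my-domain/app", "https://other-domain/app"))   # False
```

Note that the path plays no role: only the scheme, host, and port decide whether two URLs belong to the same origin.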

The protocol consists of an opening handshake followed by basic message framing, layered over TCP.

Remember earlier when we mentioned that we're enabling two-way communication between a client and a host that has opted in? The opening handshake is how that opt-in happens. It's a way for two hosts to agree on initiating a communication. When it comes to message framing, you don't need to know much at this point of the RFC. Intuitively, it sounds like a specific way to package the messages sent between hosts. I am familiar with frames because they are referenced in the HTTP/2 RFC, but this RFC explains them in more detail as they apply to WebSockets. If you've made it this far I hope you're familiar with TCP. If not, that's OK, you can go read its RFC and then come back! The main point here is that the WebSocket Protocol builds on TCP rather than UDP. This means that it prioritizes reliability over speed. It's more important that messages are exchanged reliably and in order than that they arrive faster but unreliably, as with UDP, where the application has to tolerate losses.
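As a small preview of how the server "opts in" during the opening handshake: the client sends a Sec-WebSocket-Key header, and the server proves it understood the request by replying with a Sec-WebSocket-Accept header derived from that key and a fixed GUID defined in the RFC. The sketch below shows only that derivation. The function name `accept_value` is hypothetical, and the rest of the handshake (the HTTP Upgrade request and response) is left out.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    # Concatenate the key with the GUID, hash with SHA-1, then base64-encode.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the RFC's own handshake example.
print(accept_value("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client that receives any other value is expected to treat the handshake as failed, which is what makes this an explicit agreement rather than an accident.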

The goal of this technology is to provide a mechanism for browser-based applications that need two-way communication with servers that does not rely on opening multiple HTTP connections (e.g., using XMLHttpRequest or