WebSocket Protocol: Opening Handshake - Part 1

Introducing a dissection of the protocol initial handshake
Published on 2024/03/22

Back at it with the WebSocket Protocol; this time we'll go into more detail about the opening handshake. Here is how RFC 6455 introduces it:

The opening handshake is intended to be compatible with HTTP-based server-side software and intermediaries, so that a single port can be used by both HTTP clients talking to that server and WebSocket clients talking to that server.

I believe the gist of it is that we don't want to build anything too special for WebSockets; instead, we leverage the existing HTTP infrastructure for this type of communication. As we mentioned in the protocol overview, an intermediary can be any device on the network that sits between the client and the server (a proxy, for example). I feel like this is important because these devices already know how to speak HTTP to one another, so we don't need special devices to introduce a WebSocket. In fact, the handshake uses the same port, so regular HTTP clients and WebSocket clients can talk to the same server.
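
To make the "same port" idea a bit more concrete, here is a minimal sketch of how a server might tell a WebSocket handshake apart from a regular HTTP request. The function name is_websocket_upgrade and the plain dict of headers are my own assumptions for illustration, and the check is simplified (it does not validate every header the handshake requires):

```python
def is_websocket_upgrade(method: str, headers: dict) -> bool:
    """Return True if the request asks to switch from HTTP to WebSocket."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Connection may carry several tokens, e.g. "keep-alive, Upgrade".
    connection_tokens = [
        token.strip().lower()
        for token in lowered.get("connection", "").split(",")
    ]
    return (
        method == "GET"
        and lowered.get("upgrade", "").strip().lower() == "websocket"
        and "upgrade" in connection_tokens
    )
```

Anything that fails this check can simply be handled as an ordinary HTTP request on the same port.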

We already saw this client request in the overview:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
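
As a rough sketch, the same request could be assembled in Python like this. The function name build_handshake_request and its parameters are hypothetical. The one detail worth noticing is the key: it is a random 16-byte value, base64-encoded (the sample key above decodes to "the sample nonce").

```python
import base64
import os


def build_handshake_request(host: str, path: str, origin: str, subprotocols: list) -> str:
    """Build the client's opening handshake as raw HTTP text (illustrative only)."""
    # The key is a random 16-byte nonce, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        f"Origin: {origin}",
        f"Sec-WebSocket-Protocol: {', '.join(subprotocols)}",
        "Sec-WebSocket-Version: 13",
    ]
    # Header lines end with CRLF, and an empty line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling build_handshake_request("server.example.com", "/chat", "http://example.com", ["chat", "superchat"]) yields the request shown above, except with a fresh key each time.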

We explained the first line a bit, but there's a detail about it that tripped me up because I took it for granted: the Request-URI (i.e. "/chat"). Specifically:

The "Request-URI" of the GET method [RFC2616] is used to identify the endpoint of the WebSocket connection, both to allow multiple domains to be served from one IP address and to allow multiple WebSocket endpoints to be served by a single server.

Don't worry, I had to read it twice as well, but then I realized that I was overthinking this. One server with a given IP address can serve multiple domains (e.g. mytodoapp.com, mycryptoapp.com), so it is the combination of the Host header and the Request-URI that gives you the flexibility to serve multiple domains AND multiple WebSocket endpoints from a single domain. The Request-URI by itself doesn't give you this full flexibility, though, which is why I find the wording a bit misleading.
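
Here is a small sketch of that idea: a lookup keyed by both Host and Request-URI. The ENDPOINTS table, the handler names, and resolve_endpoint are all made up for illustration; a real server would do more careful parsing.

```python
from typing import Optional

# Hypothetical handler names, keyed by (Host, Request-URI).
ENDPOINTS = {
    ("mytodoapp.com", "/chat"): "todo_chat",
    ("mytodoapp.com", "/notifications"): "todo_notifications",
    ("mycryptoapp.com", "/chat"): "crypto_chat",
}


def resolve_endpoint(host: str, request_uri: str) -> Optional[str]:
    """Pick a WebSocket endpoint from the Host header and the Request-URI."""
    # Host names are case-insensitive; drop an explicit port such as ":80".
    hostname = host.lower().split(":")[0]
    # Ignore any query string when matching the path.
    path = request_uri.split("?")[0]
    return ENDPOINTS.get((hostname, path))
```

Note how "/chat" appears twice in the table: the Request-URI alone can't tell the two apps apart, but together with Host it can.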

There are a few optional header fields available; in this case, the client specifies subprotocols with Sec-WebSocket-Protocol. The client is saying that it can speak either chat or superchat, and the server will pick one and communicate that in its response.
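
On the server side, picking a subprotocol could be sketched as follows. The function name select_subprotocol and the supported list are assumptions of mine; the only thing taken from the protocol is that the client lists its subprotocols in order of preference.

```python
from typing import Optional


def select_subprotocol(header_value: str, supported: list) -> Optional[str]:
    """Return the first client-offered subprotocol the server supports, if any."""
    # The header is a comma-separated list, e.g. "chat, superchat".
    offered = [token.strip() for token in header_value.split(",") if token.strip()]
    # The client lists subprotocols in order of preference, so honor that order.
    for name in offered:
        if name in supported:
            return name
    # No overlap: the server would leave the header out of its response.
    return None
```

For example, select_subprotocol("chat, superchat", ["superchat"]) returns "superchat", because that is the only offered subprotocol this hypothetical server understands.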

The |Origin| header field [RFC6454] is used to protect against unauthorized cross-origin use of a WebSocket server by scripts using the WebSocket API in a web browser. The server is informed of the script origin generating the WebSocket connection request. If the server does not wish to accept connections from this origin, it can choose to reject the connection by sending an appropriate HTTP error code. This header field is sent by browser clients; for non-browser clients, this header field may be sent if it makes sense in the context of those clients.

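This is similar to what happens with regular HTTP requests, where the server can specify which origins it accepts requests from. When the client's origin differs from the server's, that's when you run into the good old CORS issue. One difference worth keeping in mind: browsers don't apply CORS to the WebSocket handshake, so it's up to the server to inspect Origin and reject the connections it doesn't want.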
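
A server-side Origin check might look something like this. ALLOWED_ORIGINS and is_origin_allowed are hypothetical names, and letting requests without an Origin through is a policy choice I made for this sketch, not something the protocol mandates.

```python
from typing import Optional

# Hypothetical allowlist of origins this server is willing to talk to.
ALLOWED_ORIGINS = {"http://example.com"}


def is_origin_allowed(origin: Optional[str]) -> bool:
    """Decide whether to continue the handshake based on the Origin header."""
    if origin is None:
        # Non-browser clients may not send Origin at all.
        # Accepting them here is an assumption of this sketch.
        return True
    return origin in ALLOWED_ORIGINS
```

When this returns False, the server would answer with an HTTP error code (403 Forbidden, for example) instead of completing the handshake.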

Thoughts

I was hoping to get further ahead, but that's all I've got for today! I think the way the protocol wants the handshake to be confirmed is interesting enough that I didn't want to squeeze it in here; I'd rather take our time to cover it. I also think I'm about to get really sick tonight; the aches have started, so I'm taking it easy. Cheers and take care!
