Data Alignment

Creating a mental model that is easier to digest
Published on 2024/06/04

Continuing my journey through "Understanding Software Dynamics", I stumbled upon an innocent-looking section called "Data Alignment". This was not the first time I had read about this topic. If you haven't picked up "100 Go Mistakes and How to Avoid Them", you should! No matter your level of experience, it's very well put together, and I'm due for a re-read. At the time I couldn't go in-depth, so I had to skim through some sections (being a new dad and all, with little sleep).

Data alignment is subtle but can make a difference. I would recommend being aware of it and only digging in when you are operating at a large scale. I'll go over it very briefly; hopefully it will help you out.

When we talk about an aligned reference, it's always with respect to a specific item, for example an integer value. For simplicity, let's say integers are 4-byte items. For a reference to be aligned, it has to start at a multiple of 4, i.e. at position 0, 4, 8, 12, and so on. Thanks to my wonderful visual skills, here is an integer stored at position 0:

0    4    8    12
|xxxx|----|----| ...
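The rule above fits in a one-line check. Here is a minimal Go sketch, assuming positions are plain byte offsets from the start of memory; `isAligned` is a hypothetical helper for illustration, not part of any library.

```go
package main

import "fmt"

// isAligned is a hypothetical helper: it reports whether an item of the
// given size (in bytes) starting at offset is aligned, i.e. whether offset
// is a multiple of size. Assumes size is positive.
func isAligned(offset, size int) bool {
	return offset%size == 0
}

func main() {
	// Print every position in 0..12 where a 4-byte integer would be aligned.
	for offset := 0; offset <= 12; offset++ {
		if isAligned(offset, 4) {
			fmt.Println(offset)
		}
	}
	// prints 0, 4, 8 and 12, one per line
}
```

The same helper answers the question for other sizes too: `isAligned(2, 2)` is true while `isAligned(2, 4)` is false, which is exactly the distinction the rest of the post relies on.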

It's easy to imagine that if the integer you are trying to retrieve from memory is stored at position 0 (beautifully represented by the xs above), a single memory access is enough. This is because memory (and cache) accesses are done in aligned quantities: any other fetch will start at 4, 8, 12, and so on. The "problem" comes when the integer is not stored at an aligned reference, meaning it does not start at position 0, 4, 8, you get the idea.

0    4    8    12
|--xx|xx--|----| ...

In this case, fetching the 4-byte block at position 0 is not enough. The CPU knows we are trying to load a 4-byte integer, so it needs a second access to fetch the block at position 4, which contains the rest of the value. After fetching both blocks, it does some shifting and merging to reconstruct the 4-byte integer. I feel like this is also pretty intuitive: more work is required, so the operation is more expensive.

Some confusion for me came from a different example. Let's say we have an architecture with 8-byte words. My integer is located at position 2 as follows:

0  2345  8        16
|--xxxx--|--------| ...

Fetching it requires only one access, since the CPU reads whole 8-byte words and the integer sits entirely within the first one. The CPU now has all the data it needs to extract the integer. We clearly saved an access compared to the previous example, but this is still not an aligned reference: it's a 4-byte item at position 2, and 2 is not a multiple of 4. The challenge for me (since I recall very little from college) was understanding the benefit of alignment in cases like this. I thought that as long as the data I need is within a single word, the CPU can fetch it more efficiently (and this is correct). What I didn't understand was the role alignment plays here, because no matter where the integer is positioned within the word, the CPU still has to do some work to extract it.

Here's what helped me a bit more: the optimizations the CPU can apply depend on both the size of the data type (e.g. 2-byte vs 4-byte) and where the item is positioned. If we are fetching a 2-byte item at position 2:

0  23    8        16
|--xx----|--------| ...

Given the data type and its position, the CPU recognizes this as an aligned reference (2 is a multiple of 2), and that knowledge allows it to use optimized paths. On the other hand, if the same position held a 4-byte item (like the previous example), the CPU would recognize that, for that type, the data is not an aligned reference. This requires some shifting and masking to extract the value correctly and get it properly aligned, rather than taking the optimized path.

Thoughts

This is as far as I can push it today, and it should be enough to continue my reading. What still bothers me is these "optimized paths": I wasn't able to explore them in more detail, and I might ask on the book club Google group what they mean. I'm happy I spent this time trying to clarify a 7-line paragraph; it does feel like these are all building blocks that will make it easier to reason about later chapters. Either way, I have a better picture in mind now, and I hope you do too!
