L1 Cache Interactions

Why it can matter to know more about your system cache
Published on 2024/06/12

I'm both happy and frustrated that I'm reading Understanding Software Dynamics. It's been a while since I last dug into these topics. Chapter 4 starts with an example of interaction with an L1 cache. The example given is a 32KB 8-way L1 data cache. The first obvious step is to visualize it.

         Way 0  Way 1  Way 2   ...   Way 6  Way 7
       +-----------------------------------------+
Set 0  |  0   |  0   |  0   |  0   |  0   |  0   |
       +-----------------------------------------+
Set 1  |  64  |  64  |  64  |  64  |  64  |  64  |
       +-----------------------------------------+
...
       +-----------------------------------------+
Set 63 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 |
       +-----------------------------------------+

Sorry, this is as good as it gets for now! To access a specific byte in an L1 cache I'll need a few things from the address itself, so here is another wonderful visual:

Address: 0x12345678
Binary : 0001 0010 0011 0100 0101 0110 0111 1000

+----------------------+--------+--------+
| Tag                  |  Set   | Offset |
+----------------------+--------+--------+
| 00010010001101000101 | 011001 | 111000 |
+----------------------+--------+--------+

Given an address in hex, once it is converted to binary every bit serves a purpose, specifically:

  • <5:0> is the offset: once I figure out which line I'm interested in, this tells me which byte within that line I'm looking for
  • <11:6> tells me which set of the 64 available I'm targeting (for the lazy, 2^6 is 64, so 6 bits is all I need)
  • <63:12> is the tag used to match against the L1 cache tags; without getting into too much detail, this is how I know which "Way" I'm trying to access (in the 32-bit example above, the tag is <31:12>)
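The field extraction above can be sketched with a few shifts and masks. This is a minimal sketch for the cache described in the text (64-byte lines, 64 sets); the function name `decode` is mine, not from the book:

```python
# Decode an address into (tag, set, offset) for a 32KB, 8-way,
# 64-byte-line L1 data cache (64 sets).
# offset = bits <5:0>, set = bits <11:6>, tag = the remaining high bits.

OFFSET_BITS = 6  # 2^6 = 64 bytes per line
SET_BITS = 6     # 2^6 = 64 sets

def decode(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset

tag, set_index, offset = decode(0x12345678)
print(f"tag=0x{tag:x} set={set_index} offset={offset}")
# tag=0x12345 set=25 offset=56
```

Running it on 0x12345678 reproduces the breakdown in the diagram: tag 0x12345, set 25 (011001), offset 56 (111000).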

Don't fret, there's a reason why this is useful to know. Consecutive memory locations are spread across all 64 sets. This means that 4KB of consecutive data will use all of Way 0 for example. Visually this means:

         Way 0  Way 1  Way 2   ...   Way 6  Way 7
       +-----------------------------------------+
Set 0  |  x   |  0   |  0   |  0   |  0   |  0   |
       +-----------------------------------------+
Set 1  |  x   |  64  |  64  |  64  |  64  |  64  |
       +-----------------------------------------+
...
       +-----------------------------------------+
Set 63 |  x   | 4032 | 4032 | 4032 | 4032 | 4032 |
       +-----------------------------------------+
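A quick sanity check of the claim that consecutive data spreads across all the sets. This small sketch (my own, not from the book) walks 4KB of consecutive cache lines and counts the distinct sets they land in:

```python
# 64 consecutive cache lines (64 bytes apart) land in 64 distinct
# sets, so 4KB of consecutive data touches every set exactly once.
LINE = 64  # bytes per cache line
SETS = 64  # sets in this 32KB, 8-way cache

sets_hit = {(addr // LINE) % SETS for addr in range(0, 64 * LINE, LINE)}
print(len(sets_hit))  # 64 -- every set used once
```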

If you continue with this pattern, you will fill out way 2, and so on. This can be an ideal scenario because you are utilizing the cache to its fullest. Unfortunately, that's not always the case. Imagine now an access pattern where accesses are 4KB apart. Every one of those addresses maps to set 0. Say you start at set 0, way 0, and then access the next address 4KB away. Since a whole way of this L1 cache is exactly 4KB, that address lands in set 0 of the next way. This continues until you have no more ways (specifically, when you reach way 7 after eight cache lines).

         Way 0  Way 1  Way 2   ...   Way 6  Way 7
       +-----------------------------------------+
Set 0  |  x   |  x   |  x   |  x   |  x   |  x   |
       +-----------------------------------------+
Set 1  |  64  |  64  |  64  |  64  |  64  |  64  |
       +-----------------------------------------+
...
       +-----------------------------------------+
Set 63 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 |
       +-----------------------------------------+

After the 8th cache line access, you start over from set 0, way 0, evicting what was there. Effectively you're only using 8 cache lines (8 x 64 = 512B out of 32KB). What a waste! If you didn't know how your L1 cache was designed, this would remain a mystery and you could have rejoiced in your blissful ignorance. I'm here to rain on your parade.
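The thrashing scenario is easy to demonstrate with the same set arithmetic. This sketch (mine, under the same cache parameters as above) generates sixteen addresses 4KB apart and shows they all compete for one set:

```python
# Addresses 4KB apart all map to the same set, so an 8-way cache can
# only hold 8 of them before it starts evicting earlier lines.
LINE, SETS, WAYS = 64, 64, 8
STRIDE = SETS * LINE  # 4096 bytes: exactly one "way" of this cache

addresses = [i * STRIDE for i in range(16)]
sets_hit = {(a // LINE) % SETS for a in addresses}
print(sets_hit)  # {0} -- every access targets set 0
print(len(addresses) > WAYS)  # True: more lines than ways -> evictions
```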

Thoughts

The book provides a practical example of having to deal with a large matrix. In that example, each value happens to be large enough that when your access pattern goes column by column, you are faced with the same wasteful cache interaction that we just went over. You don't need to lose any sleep over this. There are a few techniques that can help you prevent these types of access patterns so you can make better use of your caches. In most applications this likely won't matter much, but if you need to squeeze as much as possible out of your system, it might be worth exploring and optimizing with this newfound knowledge (but pretty please make sure to measure first!).
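As a hypothetical illustration of one such technique (this is my sketch, not the book's exact example): if a matrix's row stride is a multiple of 4KB, every element of a column maps to the same set; padding the stride by one cache line spreads the column across all 64 sets.

```python
# Hypothetical illustration: distinct sets touched when walking one
# column of a matrix, for a given row stride in bytes.
LINE, SETS = 64, 64

def column_sets(row_stride, rows=64):
    # Address of the column element in row r is r * row_stride.
    return {(r * row_stride // LINE) % SETS for r in range(rows)}

print(len(column_sets(4096)))       # 1  -- all rows collide in one set
print(len(column_sets(4096 + 64)))  # 64 -- padded stride uses every set
```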
