How CPU cache works

Steve Mu
4 min read · Feb 11, 2018

In this article we’re going to discuss some lower-level concepts in Computer Science. Why talk about this stuff? After a few years of working with higher-level programming languages, I believe understanding the underlying structure of the computer can improve my problem-solving abilities.

CPU caches are very fast memory that resides inside the CPU chip. It is faster for the CPU to get information from the cache than from main memory, so when the CPU finds data in the cache, it waits less than it would if it had to fetch that data from main memory.

As the cache is small compared to main memory, it can only store a small amount of information. There are several ways of organizing a cache: fully associative, direct-mapped and set-associative. I will only talk about the direct-mapped way of organizing a cache in this article.

How does direct mapping work? It means one location in memory always maps to exactly one location in the cache. Since the cache is small, multiple locations in main memory will map to the same location in the cache.

How does the mapping work?

Each memory address can be divided into 3 parts: tag, slot number and offset.

So a 32-bit memory address can be divided like this:

0000 0000 0000 0000 | 0000 0000 0000 | 0000
        tag                slot        offset

Why is that? Because this is related to how the cache is organized.

The cache is like a big table. Each row has three fields: a valid bit, a tag, and a block of data. Notice that the tag also appears inside the memory address itself.

The number of tag and slot bits varies depending on the cache size and the block size. What is block size, you may ask? Good question. When the CPU wants to read some information, it first looks in the cache. If the information is not in the cache, it goes to main memory, copies the content into the cache, and then uses it. Where does the block come into play? The trick is that when the computer copies information from main memory into the cache, it does not copy just one byte, it copies a BLOCK of bytes. Why? Because the bytes close to the one we requested are likely to be requested in the near future.
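As a rough sketch in C, one slot (row) of such a cache table could be modeled like this. The 16-byte block size is just the example value used later in the article:

#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 16  /* bytes copied from main memory into one slot (example value) */

/* One row ("slot") of the cache table: valid bit, tag, and a block of data. */
struct cache_line {
    bool     valid;             /* has this slot been filled yet?            */
    uint32_t tag;               /* identifies which memory region the block came from */
    uint8_t  data[BLOCK_SIZE];  /* the cached block itself                   */
};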

How does all this work?

Let’s use some numbers. Suppose we make the block 16 bytes, and we have a cache with 4096 slots, so it holds 4096 × 16 = 65536 bytes of data.

First question: how many slots are in the cache? 4096. The number of slots is the cache’s data size divided by the block size: 65536 / 16 = 4096.
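Here is the same arithmetic as a small C sketch, using the sizes assumed in this example:

#include <stdio.h>

#define BLOCK_SIZE  16                        /* bytes per block                  */
#define NUM_SLOTS   4096                      /* slots (rows) in the cache        */
#define CACHE_BYTES (NUM_SLOTS * BLOCK_SIZE)  /* 4096 * 16 = 65536 bytes of data  */

#define OFFSET_BITS 4    /* 2^4  = 16   -> picks the byte inside a block */
#define SLOT_BITS   12   /* 2^12 = 4096 -> picks the slot in the cache   */
#define TAG_BITS    (32 - SLOT_BITS - OFFSET_BITS)  /* the remaining 16 bits */

int main(void) {
    printf("%d slots x %d bytes = %d bytes of cached data\n",
           NUM_SLOTS, BLOCK_SIZE, CACHE_BYTES);
    return 0;
}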

The computer knows a slot’s position in the table, so the slot number doesn’t need to be stored in the cache itself.

The first slot is slot number 0, the next is 1, then 2, and so on.

Suppose the CPU needs to access this memory address: 0x00071a38. First, it extracts the tag, slot number and offset from it: 0007 is the tag, 1a3 is the slot number, and the block offset is 8. In this case, the rightmost hex digit of the address is always the block offset, because 1 hex digit = 4 binary digits (2⁴ = 16 bytes, our block size). The block size determines how many bits on the right are used for the block offset.

Then it extracts the slot number, which is 1a3. Why 1a3? 3 hex digits = 12 binary digits, and 2¹² = 4096, which is the number of slots in the cache. See the relation?

Finally, the tag. The tag is the rest of the bits (remember, hex is just a representation of binary digits), so 0007 is the tag in this case. The computer uses the tag to check whether the cache currently holds the right information for the requested address: memory addresses with different tags but the same slot number compete for the same slot.
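Here is a small sketch of that extraction with shifts and masks, using the bit widths from this example (4 offset bits, 12 slot bits, 16 tag bits):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0x00071a38;

    uint32_t offset = addr & 0xF;           /* low 4 bits      -> 0x8   */
    uint32_t slot   = (addr >> 4) & 0xFFF;  /* next 12 bits    -> 0x1a3 */
    uint32_t tag    = addr >> 16;           /* top 16 bits     -> 0x0007*/

    printf("tag=0x%04x slot=0x%03x offset=0x%x\n",
           (unsigned)tag, (unsigned)slot, (unsigned)offset);
    return 0;
}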

So far we haven’t talked about where a block starts and ends. Let’s illustrate with an example: when you request memory address 0x00071a38, the block that contains this address runs from 0x00071a30 to 0x00071a3f. That whole block is copied into the cache.
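With 16-byte blocks, those boundaries fall out of masking the 4 offset bits, as in this small sketch:

#include <stdint.h>

uint32_t block_start(uint32_t addr) { return addr & ~0xFu; }  /* 0x00071a38 -> 0x00071a30 */
uint32_t block_end(uint32_t addr)   { return addr |  0xFu; }  /* 0x00071a38 -> 0x00071a3f */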

The computer uses the slot number to go to the right slot in the cache, uses the tag to determine whether the right information is there, and if it is, uses the offset to pick the requested byte out of that block.
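Putting the pieces together, here is a minimal lookup sketch in C. The cache array, the cache_read_byte function and its hit/miss return value are my own illustration of the idea, not how any particular CPU exposes it; real hardware does all of this in parallel logic, not in software:

#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 16
#define NUM_SLOTS  4096

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct cache_line cache[NUM_SLOTS];

/* Try to read one byte through the cache. Returns true on a hit. */
bool cache_read_byte(uint32_t addr, uint8_t *out) {
    uint32_t offset = addr & 0xF;           /* which byte inside the block */
    uint32_t slot   = (addr >> 4) & 0xFFF;  /* which slot to look in       */
    uint32_t tag    = addr >> 16;           /* which region the block must belong to */

    struct cache_line *line = &cache[slot];  /* go straight to the slot    */
    if (line->valid && line->tag == tag) {   /* is the right block here?   */
        *out = line->data[offset];           /* pick the byte out of the block */
        return true;                         /* cache hit                  */
    }
    return false;  /* miss: the block must first be fetched from main memory */
}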

Hope this helps. Let me know if you have any questions.
