Computational Fairy Tales: caching

Monday, March 28, 2011

Computer Memory and Making Dinner

Memory in a computer is a hierarchy. At the very top you have the ultra-expensive and ultra-fast memory called registers. And at the bottom you have mind-numbingly slow but large storage such as a hard disk. Why does this matter? From a programmer's point of view, understanding the memory hierarchy can allow the programmer to optimize the program to work efficiently within this hierarchy. The correct bits of information are kept at the lowest (and fastest) levels. And from a non-programmers point of view, these concepts provide insight into how a computer works.

It has been rumored* that the entire basis of the modern computer memory hierarchy is based off a single late night conversation between Dr. Simon Babbage XI and Chef Louis Von-Register IX (aka the Iron Cache). The two had been arguing late into the night about the most efficient way to organize ingredients in a kitchen. It was a heated debate between empirical evidence and learned rules of thumb. Unfortunately, the argument became so heated that neither of the men realized that they were both proposing the exact same scheme. Ultimately, the night marked both the end of a years' long friendship and a major step forward in computer architecture.

Thus, a good analogy for understanding how computer memory works is naturally to view it in terms of food storage and dinner preparation:

Registers - Registers hold values that you are using at this moment. For example, these could be the inputs into an addition operation in the CPU. This storage is analogous to your own hands or maybe the current mixing bowl. It is literally the "stuff" that you are working with at that moment.

L1 Cache - The L1 Cache holds data that is cached from main memory, but may not be in use at the moment. It is relatively small, but very fast to access. This cache is analogous to the kitchen counter right next to where you are working. You can put stuff down there that you know you will need in just one minute. Just do not leave open containers of milk there all night.

L2 Cache - The L2 Cache is a larger cache between the L1 cache and main memory. It is like the far end of the counter - still easy to get things, but somehow a little more annoying.

RAM - RAM is your kitchen cabinets: larger, but further. You can store a lot more ingredients there than you will need for just the current meal. But, when you are carefully stirring the soup, while trying to keep it at the perfect simmer, it can be a little annoying to run to the cabinet to get more salt.

Hard Drive / Network / Flashdrive - These represent the slow, but very large storage devices. They are the neighborhood supermarket of the memory system. There is a lot of food there, but if you need to go there to get something for a particular recipe, you will probably miss dinner. Ideally, you want to limit your trips to the supermarket and not dash out at every step of the recipe.

Of course most of these levels are conveniently hidden from even low-level programmers. The obvious analogy here is the assistant chef that almost reads your mind by: helping you get the correct ingredients, putting things on the counter, cleaning up unused items from the counter, and running simple errands. Fortunately, the computer's memory system does not get cranky and talk about you behind you back (I think).

*Actually, no such rumor existed until I made up that story. There is probably a much better (or at least more accurate) story for how the computer memory hierarchy came into existence, but I doubt it involves a thrown souffle.

Wednesday, March 23, 2011

Caching and the Librarian of Alexandria

Caching data means storing a copy of the data in a quickly accessible location to speedup future accesses of that data. The key trade-off is that the faster storage locations for data tend to be smaller and thus hold less data. For example, accessing data from RAM is much faster than reading it off a hard drive, but RAM is both smaller and more expensive per-unit of storage. By caching data that is used often, you can greatly speed up a program.

The Library of Alexandria is still famous to this day for the size and breadth of its collection. However, few people know that the true magic of the library was the librarian’s ability to find relevant resources with amazing speed. Within a few seconds he could often locate the canonical reference to answer a patron’s question.

For young Peter, this magical ability had always been a thing of wonder. He had grown up in a small house four blocks from the library and had spent many afternoons sitting at the tables listening to the librarian answer questions. “What should I feed my new pet yak?” “Where can I find the sweetest prunes in Alexandria?” The scope of knowledge commanded by the librarian was almost legendary. Peter dreamed of one day learning the magic. And, after ten years of training, Peter got his wish: he was accepted as the librarian’s apprentice.

On the first day of his apprenticeship, Peter arrived three hours early. As the librarian took him on a tour of the library, Peter eagerly followed him around, scribbling notes into his new notebook. Although Peter knew the public areas by heart, he soaked in every word. After the tour was complete, the librarian and his new apprentice took their places behind the reference desk and waited for the first patron of the day.

The first question came from a small man holding a rather large rock. “I’m trying to build a wheel. And, well … I’m not sure what shape to go with. Do you have any good references?”

The librarian disappeared into the stacks, only to reappear a few seconds later with a scroll entitled “Wheels and How to Design Them.” The patron was delighted. He took the scroll over to a nearby desk and started to quickly skim through it.

“Oh … I see … a circle … hmm … brilliant,” the patron quietly mumbled to himself.

“How did you do that?” The apprentice asked. “How did you find the scroll so quickly? Please teach me the magic of the library.”

“Magic?” The librarian asked. “I’m not sure about that, but I can show you my system.”

The librarian led Peter into the back room. The room was dimly lit by two small oil lamps. At the back was a single door leading to the vaults where the thousands of scrolls were kept. The librarian walked over to the a small table that held a coffee pot and a small stack of scrolls. He pointed at the scrolls. “I usually find the answers in one of these.”

Peter was confused. “Twenty scrolls? The library has tens of thousands of scrolls. How does having twenty in the back room help? Are they magic?”

“Not at all,” chuckled the librarian. “It turns out that people are pretty boring. They all ask the same types of questions. In fact, I’ve found that 95% of the questions in a single day can be answered with only twenty scrolls.”

“Is that all?” Peter asked, visibly disappointed.

“Yes,” replied the librarian. “What did you hope for?”

“I’m not sure. But I was certainly expecting something magical. Maybe you were summoning the scrolls or could teleport yourself to the correct shelf. But … this seems like cheating.”

“Cheating?” The librarian laughed. “Do you realize how much effort goes into making this efficient enough to appear magical? You can’t pick any twenty scrolls. They have to be the right twenty scrolls.”

At the news of this challenge, the apprentice perked up a bit. “And how do you pick them? Telepathy?”

“When I first started, I kept the most frequently read scrolls in my cache. Every time someone read a scroll, I would note that. I kept a count for each scroll.”

“I see why that would work,” said the apprentice, sounding slightly bored.

“Aha! But it didn’t work!” exclaimed the librarian. “Everything seemed to be going fine, until the same scrolls got stuck in my cache. Remember that massive flood we had two years ago? During the entire flood, I still had three scrolls on how to survive a drought. It was a disaster.”

“Then I had a new idea,” he continued. “I could keep more data and use statistics. I meticulously kept track of which scrolls people were viewing and correlated it to the season, weather, and fads of the day. I created complex predictive models of which scrolls would be most requested on a given day. I could pre-fill my shelf every morning!”

“Did that work?” asked Peter.

“Yes. But was a lot of work. So I stopped doing it.”

Peter stared at the floor. Disappointment flowed through him. He wondered if he should have become a blacksmith instead.

“Eventually, I got lazy and left the most recent scrolls on the shelf. When I needed to go down to the stacks, I’d grab whatever scroll had been used least recently and put it back. It turned out to be a pretty good system.”

“Really? So the magic behind the Library of Alexandria is that you keep the twenty most recently viewed scrolls on a shelf in the back room?”

“Yes. It allows me to answer 95% of people’s questions without going past this room.

The apprentice nodded numbly. Now he was certain that he should have become a blacksmith.

Pages

Monday, March 28, 2011

Computer Memory and Making Dinner

Wednesday, March 23, 2011

Caching and the Librarian of Alexandria