Unlocking System Efficiency: The Art of Caching 🚀

sonu kushwaha
Dec 15, 2023

Caching is a pivotal technique in modern computing, enhancing system performance and reducing response time across various applications. From front-end to back-end, caching plays a crucial role in improving overall system efficiency.

1. Layers of Caching in a Typical System:

In the computer itself, we find three layers of cache (a short demonstration of why they matter follows the list):

  1. L1 Cache: Extremely fast cache embedded directly in the processor chip, closest to the CPU cores.
  2. L2 Cache: Larger than L1; it may be embedded on the CPU or sit on a separate chip, with a high-speed alternative system bus connecting the cache and CPU.
  3. L3 Cache: Specialized memory shared between multiple CPU cores that works to improve the performance of L1 and L2.
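These layers matter because data already sitting in a nearby cache line is far cheaper to access than data fetched from main memory. Below is a minimal sketch that compares sequential and strided traversal of the same array; the exact timings and the size of the gap are machine-dependent (and CPython's interpreter overhead masks much of it), but the pattern illustrates cache-line reuse:

```python
import time
from array import array

N = 1 << 22                       # ~4M contiguous 8-byte integers
data = array("q", range(N))
STRIDE = 4096                     # jump far enough to defeat cache-line reuse

start = time.perf_counter()
total = 0
for i in range(N):                # sequential: neighbours share cache lines
    total += data[i]
seq = time.perf_counter() - start

start = time.perf_counter()
total = 0
for s in range(STRIDE):           # strided: each access touches a fresh line
    for i in range(s, N, STRIDE):
        total += data[i]
strided = time.perf_counter() - start

print(f"sequential: {seq:.3f}s  strided: {strided:.3f}s")
```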

1.1 Cache Memory Mapping Configurations:

Cache memory can be organized under three primary mapping schemes (a sketch of the address arithmetic follows the list):

  1. Direct Mapped Cache: each block of main memory maps to exactly one cache location. Conceptually, a direct mapped cache is like rows in a table with three columns: the cache block that holds the actual data fetched from memory, a tag containing all or part of the address that data came from, and a flag bit indicating whether the row entry holds valid data.
  2. Fully Associative Cache: structurally similar to direct mapping, but a memory block may be placed in any cache location rather than the single prespecified location used by direct mapping.
  3. Set Associative Cache: a compromise between direct mapping and fully associative mapping in which each block maps to a small subset of cache locations. It is often called N-way set associative mapping, since a location in main memory can be cached in any of “N” slots within its set.
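To make the three schemes concrete, here is a small sketch of the address arithmetic a cache performs. The block size, set count, and associativity below are illustrative assumptions, not fixed standards:

```python
# Split a memory address into (tag, set index, block offset).
BLOCK_SIZE = 32      # bytes per cache block (power of two, assumed)
NUM_SETS = 256       # number of sets in the cache (assumed)
WAYS = 4             # N in "N-way set associative"; 1 would be direct mapped

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(32) = 5
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(256) = 8

def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set index, block offset) for a memory address."""
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x1234ABCD)
print(f"tag={tag:#x} set={index} offset={offset}")

# Direct mapped: the block may live only at set `index`.
# N-way set associative: it may occupy any of the WAYS slots in set `index`.
# Fully associative: there are no index bits; the tag is compared against
# every line in the cache.
```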

1.2 Data Writing Policies:

Two main policies govern how data is written to cache memory (a toy comparison follows the list):

  1. Write-Through: Data is written to both the cache and main memory simultaneously, keeping the two consistent at the cost of slower writes.
  2. Write-Back: Data is written to the cache first and flushed to main memory later, typically when the block is evicted.
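The toy class below contrasts the two policies. The dict-based “cache” and “main memory” stores are illustrative stand-ins, not a real memory hierarchy:

```python
class Cache:
    def __init__(self, policy: str):
        self.policy = policy          # "write-through" or "write-back"
        self.cache = {}
        self.main_memory = {}
        self.dirty = set()            # blocks modified but not yet flushed

    def write(self, key, value):
        self.cache[key] = value
        if self.policy == "write-through":
            self.main_memory[key] = value    # both stores updated at once
        else:                                # write-back
            self.dirty.add(key)              # defer the memory write

    def evict(self, key):
        # On eviction, write-back must flush dirty data to main memory.
        if key in self.dirty:
            self.main_memory[key] = self.cache[key]
            self.dirty.discard(key)
        self.cache.pop(key, None)

wb = Cache("write-back")
wb.write("x", 42)
print(wb.main_memory)   # {} -- main memory not updated yet
wb.evict("x")
print(wb.main_memory)   # {'x': 42} -- flushed on eviction
```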

2. Hardware and OS Level Cache:

  • TLB (Translation Lookaside Buffer): Stores recently used virtual-to-physical address translations, cutting the cost of walking the page table on every memory access (a toy sketch follows this list).
  • OS Level Cache: Includes the inode cache and the page cache, which save time by serving file metadata and file contents from memory instead of fetching them directly from the disk.
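Here is a minimal TLB sketch: a small LRU map from virtual page numbers to physical frame numbers. The page size and entry count are illustrative assumptions, and of course a real TLB is a hardware structure, not Python code:

```python
from collections import OrderedDict

PAGE_SIZE = 4096      # assumed page size
TLB_ENTRIES = 64      # assumed TLB capacity

class TLB:
    def __init__(self):
        self.entries = OrderedDict()   # vpn -> pfn, kept in LRU order

    def translate(self, vaddr: int, page_table: dict) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:                # TLB hit: fast path
            self.entries.move_to_end(vpn)
            pfn = self.entries[vpn]
        else:                                  # TLB miss: walk the page table
            pfn = page_table[vpn]
            self.entries[vpn] = pfn
            if len(self.entries) > TLB_ENTRIES:
                self.entries.popitem(last=False)   # evict least recently used
        return pfn * PAGE_SIZE + offset

tlb = TLB()
page_table = {5: 42}   # hypothetical virtual page 5 -> physical frame 42
print(hex(tlb.translate(5 * PAGE_SIZE + 0x10, page_table)))  # 0x2a010
```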

3. Application System Architecture Level Caching:

  • Web Browser: Browsers can cache HTTP responses for faster retrieval of data. When we request data over HTTP for the first time, the response arrives with an expiration policy in its HTTP headers; if the same request is made again, the browser returns the cached data, provided it is still present and valid (a minimal sketch of this expiration logic follows the list).
  • CDN: CDNs are widely used to improve the delivery of static content such as images, videos, and other web assets. Their power lies in caching: when a client request hits a CDN edge server that has the data cached, the CDN responds directly; if not, the edge server fetches the data from the origin server on behalf of the client, fulfills the request, and caches the data at the edge. The next user asking for the same content is served straight from the edge cache, eliminating another round trip to the origin server.
  • Load Balancers: Contribute to improved content delivery and reduced server load.
  • Messaging Infrastructure (e.g., Kafka): Relies on caching, notably the OS page cache, to store and serve massive volumes of messages quickly.
  • Database Levels: Databases cache at multiple levels, from buffer pools to query-result caches, to optimize performance and reduce response time.
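The sketch below shows the HTTP-style expiration caching a browser or CDN edge might apply. The `fetch_from_origin` function is a hypothetical stand-in for a real network request, and the 60-second max-age is an illustrative assumption:

```python
import time

cache = {}   # url -> (response body, expiry timestamp)

def fetch_from_origin(url: str) -> tuple[str, int]:
    """Hypothetical origin fetch; returns a body plus a max-age in seconds."""
    return f"<html>content of {url}</html>", 60   # Cache-Control: max-age=60

def get(url: str) -> str:
    entry = cache.get(url)
    if entry and time.time() < entry[1]:
        return entry[0]                          # hit: served without the origin
    body, max_age = fetch_from_origin(url)       # miss: go to the origin
    cache[url] = (body, time.time() + max_age)   # cache it for next time
    return body

print(get("https://example.com/"))   # miss: fetched from origin and cached
print(get("https://example.com/"))   # hit: served straight from the cache
```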

In conclusion, caching is a linchpin for system optimization and efficiency, reducing response time across diverse applications and systems.
