HTTP caching for engineers who forgot the sharp edges

HTTP caching looks like a small performance feature until it decides to preserve the wrong thing, ignore the right thing, or make your carefully fixed bug appear haunted for another ten minutes. Then it stops looking like an optimization and starts looking like distributed state with opinions.

The trick is to stop thinking of the cache as a dumb folder full of files. A useful cache is a contract between the origin server, browsers, shared caches, CDNs, and whoever is staring at production traffic wondering why the origin is suddenly so quiet. The contract says what may be stored, how long it may be reused, what has to be checked again, and which request details make one response different from another.

Caches are optimistic by design

HTTP caching is not built around a server explicitly saying "please cache this" on every response. It is more subtle than that. The protocol describes when a cache is allowed to store and reuse a response, and then gives the server a vocabulary for narrowing or expanding that freedom.

That is why a missing Cache-Control header is not automatically the same thing as "never cache this". A cache can still make decisions based on status code, method, validators, Expires, Last-Modified, and heuristic freshness. Modern applications should not rely on those defaults, because they are easy to misunderstand and vary between private browser caches and shared infrastructure. But the optimism matters. If you do not say what you mean, some cache in the path may still make a reasonable decision that is very different from the decision you expected.

A browser cache and a CDN also have different responsibilities. The browser is private to one user agent. A shared cache sits between many users and has to be much more careful with personalized responses. A header that is harmless in a browser can become a data leak when interpreted by a shared cache.

Freshness is the first question

The first reuse question is freshness. If the cached response is still fresh, the cache can serve it without contacting the origin. If it is stale, the cache needs a reason to reuse it anyway or a way to revalidate it.

Cache-Control: max-age=60 means the response can be considered fresh for 60 seconds after the origin generates it, so time already spent in an upstream cache counts against that budget. In shared caches, s-maxage can override that lifetime so a CDN can keep a response longer or shorter than the browser does. Expires is the older absolute-date version of the same idea, but Cache-Control is easier to reason about because a relative lifetime depends far less on the origin's and the cache's clocks agreeing.

The Age header is the cache telling you how long it believes the response has already been sitting around. When debugging a CDN, Age is often the difference between "the origin is wrong" and "the origin fixed this eight minutes ago but the edge still has a valid response".
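
To make the freshness rule concrete, here is a minimal Python sketch of how a cache might compare Age against max-age or s-maxage. The function names are hypothetical, and the parser is deliberately simplified: it assumes no quoted directive arguments that contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None}.

    Simplification: splits on commas, so quoted arguments that
    contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if value else None
    return directives


def freshness_lifetime(cache_control: str, shared: bool) -> Optional[int]:
    """Return the explicit freshness lifetime in seconds, or None if unstated."""
    directives = parse_cache_control(cache_control)
    # s-maxage applies only to shared caches and takes precedence there.
    if shared and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None


def is_fresh(cache_control: str, age_seconds: int, shared: bool) -> bool:
    """A response is fresh while its age is below its freshness lifetime."""
    lifetime = freshness_lifetime(cache_control, shared)
    return lifetime is not None and age_seconds < lifetime
```

With "public, max-age=60, s-maxage=600" and an Age of 480, this sketch reports the response as fresh for a shared cache and stale for a browser, which is exactly the "fixed eight minutes ago but still served" situation.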

For fingerprinted assets, the rule can be beautifully aggressive because the URL changes when the content changes.

HTTP/1.1 200 OK
Cache-Control: public, max-age=31536000, immutable
ETag: "app-css-341f0"
Vary: Accept-Encoding

That header says the asset is public, fresh for a year, and not expected to change at the same URL. The immutable directive is a promise. Use it for filenames with a content hash. Do not use it for /app.css because someone will eventually hotfix that file and wonder why browsers refuse to care.
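
One way to earn the right to use immutable is to derive the filename from the content. The sketch below is an illustration only: the function names, the eight-character digest, and the name.hash.ext layout are assumptions, and most build tools already do something equivalent.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict


def fingerprinted_name(path: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a short content hash before the file extension.

    Any change to the bytes produces a different URL, so the old URL
    can safely be cached for a very long time.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def fingerprinted_asset_headers() -> Dict[str, str]:
    """Headers that are only appropriate for content-hashed URLs."""
    return {"Cache-Control": "public, max-age=31536000, immutable"}
```

A stylesheet at /static/app.css would then be served from a path like /static/app.<hash>.css, and a hotfix ships under a new URL instead of fighting cached copies of the old one.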

Stale does not always mean useless

Stale is not the same as invalid. It only means the cache can no longer reuse the response silently under the ordinary freshness rules.

Validators are what make stale responses cheap. An ETag lets the client ask whether its exact representation is still current. Last-Modified lets the client ask a weaker time-based version of the same question. When both exist, ETag is usually the better validator because it can represent content identity instead of timestamp precision.

The happy path is a small conditional request and a 304 Not Modified response.

GET /reports/42 HTTP/1.1
If-None-Match: "report-42-v7"

HTTP/1.1 304 Not Modified
ETag: "report-42-v7"
Cache-Control: private, no-cache

That 304 response does not send the report again. It freshens the cached copy and updates the relevant metadata. This is why no-cache is such a badly named directive. It does not mean "do not store". It means the response may be stored, but it must be revalidated before reuse.
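
On the origin side, that exchange might be handled roughly as follows. This is a sketch under assumptions, not a framework API: respond and etag_matches are hypothetical names, headers are passed as plain dictionaries, and the comma split assumes entity tags that do not themselves contain commas.

```python
from typing import Dict, Tuple


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Weak comparison, as used for If-None-Match: ignore any W/ prefix."""
    if if_none_match.strip() == "*":
        return True

    def strip_weak(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [strip_weak(tag) for tag in if_none_match.split(",")]
    return strip_weak(current_etag) in candidates


def respond(
    request_headers: Dict[str, str], current_etag: str, body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on a revalidated resource."""
    headers = {"ETag": current_etag, "Cache-Control": "private, no-cache"}
    lowered = {name.lower(): value for name, value in request_headers.items()}
    if_none_match = lowered.get("if-none-match")
    if if_none_match is not None and etag_matches(if_none_match, current_etag):
        # The client's copy is still current: send metadata only, no body.
        return 304, headers, b""
    return 200, headers, body
```

The important property is that the expensive part, the body, is skipped whenever the validator still matches.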

If you really mean "do not store this response", the directive is no-store. Even then, treat it as a protocol instruction, not a magic privacy eraser. Intermediaries, browser features, extensions, screenshots, logs, and memory all exist. Use no-store for genuinely sensitive or one-time responses, but do not pretend it replaces careful data handling.

Stale responses can also be intentionally useful. stale-while-revalidate lets a cache serve an old response briefly while it refreshes in the background. stale-if-error lets it serve an old response when the origin is having a bad day. Those directives are not an excuse to cache nonsense forever. They are a graceful degradation policy.
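
The stale windows can be pictured as a small decision function. The sketch below is a simplified model with hypothetical names, not the behavior of any particular CDN: it takes the response's age, its freshness lifetime, and the two stale windows in seconds, and treats the window boundaries as exclusive.

```python
from enum import Enum


class Decision(Enum):
    SERVE_FRESH = "serve the cached copy"
    SERVE_STALE_AND_REVALIDATE = "serve stale, refresh in the background"
    SERVE_STALE_ON_ERROR = "serve stale because the origin failed"
    REVALIDATE_FIRST = "revalidate before responding"
    PASS_ERROR = "pass the origin error through"


def decide(
    age: int,
    max_age: int,
    stale_while_revalidate: int = 0,
    stale_if_error: int = 0,
    origin_error: bool = False,
) -> Decision:
    """Pick a cache behavior from age and the stale-* windows."""
    if age < max_age:
        return Decision.SERVE_FRESH
    staleness = age - max_age  # seconds past the freshness lifetime
    if origin_error:
        if staleness < stale_if_error:
            return Decision.SERVE_STALE_ON_ERROR
        return Decision.PASS_ERROR
    if staleness < stale_while_revalidate:
        return Decision.SERVE_STALE_AND_REVALIDATE
    return Decision.REVALIDATE_FIRST
```

Both windows are measured from the moment the response went stale, which is why they bound how old "gracefully degraded" content can get.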

Vary is part of the cache key

The URL is not always enough to identify a response. A compressed response is different from an uncompressed one. A Polish page is different from an English page. A JSON representation can be different from an HTML representation. The Vary header tells caches which request headers are part of that identity.

Vary: Accept-Encoding is normal because the same URL may be served with gzip, br, or no compression. Vary: Accept-Language can be valid for language-negotiated pages, although it can create a lot of variants. Vary: User-Agent is usually a warning sign because it explodes the cache key across a messy, high-cardinality header. Vary: Cookie is almost always a shared-cache killer, because every distinct Cookie header value becomes a separate variant of the page.

The safest way to think about Vary is this: every field you add buys correctness at the cost of reuse. If the field truly changes the representation, add it. If it merely correlates with something you were too tired to model directly, it will punish your hit rate.
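
A rough way to see the cost is to build the key by hand. In this sketch the cache_key function is hypothetical and compares header values as exact strings after trimming; real caches may normalize values further before matching.

```python
from typing import Dict, Optional, Tuple


def cache_key(
    method: str, url: str, vary: str, request_headers: Dict[str, str]
) -> Optional[Tuple]:
    """Build a lookup key from the URL plus the request headers named in Vary.

    Returns None for "Vary: *", which means no stored response can be
    selected without going back to the origin.
    """
    names = sorted(name.strip().lower() for name in vary.split(",") if name.strip())
    if "*" in names:
        return None
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    # Headers not listed in Vary are ignored, so they cannot split the cache.
    variant = tuple((name, lowered.get(name, "")) for name in names)
    return (method.upper(), url, variant)
```

Two requests that differ only in User-Agent share a key under Vary: Accept-Encoding, while adding User-Agent to Vary would give nearly every browser build its own entry.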

Query strings have a similar effect because they are part of the URL. /articles/intro?page=1 and /articles/intro?utm_source=newsletter are different cache keys unless some layer is configured to normalize or ignore selected parameters. That can be good when the parameter changes content and wasteful when it only carries tracking data.
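
If a layer you control does normalize URLs, the logic can be as small as the sketch below. The list of tracking parameters is an assumption for illustration, and sorting by name is a design choice that only makes sense when parameter order does not change the response.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters assumed to carry tracking data only.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Drop tracking parameters and order the rest by name."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    ]
    # Stable sort by name keeps repeated parameters in their original order.
    kept.sort(key=lambda pair: pair[0])
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Under these assumptions, /articles/intro?utm_source=newsletter collapses to /articles/intro, while /articles/intro?page=1 keeps its own key because the parameter changes content.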

Private content needs boring headers

Most application bugs around caching come from personalized content being treated like public content, or public content being treated so cautiously that the cache never gets a chance to help.

For user-specific HTML, boring is good.

HTTP/1.1 200 OK
Cache-Control: private, no-cache
ETag: "user-account-42-v17"

This says the browser may store the response, but it has to revalidate before reuse. Shared caches should stay away because the response is private. That is a reasonable default for account dashboards, settings screens, admin pages, and anything that depends on the signed-in user.

For public pages that are expensive to generate but acceptable to serve briefly stale from a CDN, the contract should say that explicitly.

HTTP/1.1 200 OK
Cache-Control: public, max-age=60, s-maxage=600, stale-while-revalidate=30
ETag: "article-index-v20260520"
Vary: Accept-Encoding

Here the browser gets a 60-second freshness window, the shared cache gets 600 seconds, and for 30 seconds after that the CDN may serve the stale copy while it refreshes in the background. That is a very different contract from a logged-in dashboard, and it should look different in headers.

Authenticated requests add another layer of caution. A shared cache should not reuse a response to a request with Authorization unless the response explicitly allows it through directives such as public or s-maxage. In product code, I prefer to make personalized responses plainly private anyway. It is less clever, and less clever is exactly what you want near account data.
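
The storage side of that caution can be sketched as a small check. This is a partial model with a hypothetical function name: it looks only at Cache-Control and the presence of Authorization, and ignores the status code, method, and other conditions a real cache also considers. It includes must-revalidate, which HTTP also treats as explicitly permitting shared storage of authenticated responses.

```python
from typing import Dict

# Response directives that explicitly allow a shared cache to store
# a response to a request that carried Authorization.
AUTH_OVERRIDES = {"public", "s-maxage", "must-revalidate"}


def shared_cache_storage_allowed(
    request_headers: Dict[str, str], response_cache_control: str
) -> bool:
    """Partial check: may a shared cache store this response?"""
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in response_cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives or "private" in directives:
        return False
    has_authorization = any(name.lower() == "authorization" for name in request_headers)
    if has_authorization:
        return bool(directives & AUTH_OVERRIDES)
    return True
```

Note that max-age alone is not enough once Authorization is present, which is the protocol's way of making shared reuse of authenticated responses an explicit choice.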

Reload does not mean what you think

Browser reload behavior can make caching feel inconsistent because a reload is also an HTTP request with cache directives. A normal navigation, a reload button, a hard reload, and DevTools with "Disable cache" are not the same experiment.

When debugging, look at both sides of the exchange. The response tells you what the server allowed. The next request tells you what the browser asked for. A request with Cache-Control: no-cache is the client forcing revalidation. A response with Cache-Control: no-cache is the server allowing storage but requiring revalidation. Same words, different direction, different effect.

This is also why "I refreshed and it changed" is not proof that ordinary users are seeing the new version. You may have forced validation. They may still have a fresh response. The only reliable path is to inspect the response headers, request headers, Age, status code, and cache status from the layer you are actually testing.

A practical mental model

When setting cache headers, ask four questions in order.

First, may any cache store this response at all? If the answer is no, use no-store, and reserve it for genuinely sensitive data. If the answer is yes but only for the current browser, use private. If shared infrastructure may store it, use public or a shared-cache lifetime that makes the intent obvious.

Second, how long may the response be reused without talking to the origin? Use max-age for general freshness and s-maxage when shared caches deserve their own rule. Avoid relying on Expires alone unless you are supporting legacy behavior deliberately.

Third, how should stale responses recover? Add ETag or Last-Modified so the cache can revalidate cheaply. Consider stale-while-revalidate and stale-if-error for public content where slightly old is much better than slow or unavailable.

Fourth, what makes one representation different from another? Keep Vary honest and small. The fastest cache is the one with a key that matches real product behavior instead of accidental request noise.
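
The first three questions map fairly directly onto a Cache-Control value; the fourth belongs in Vary. The sketch below shows one possible encoding. The CachePolicy class and its field names are hypothetical, and the choice to emit no-cache whenever no lifetime is given is a design assumption of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachePolicy:
    storable: bool = True                          # question 1: may anything store it?
    shared: bool = False                           # question 1: may shared caches store it?
    max_age: Optional[int] = None                  # question 2: general freshness
    s_maxage: Optional[int] = None                 # question 2: shared-cache freshness
    stale_while_revalidate: Optional[int] = None   # question 3: stale recovery
    stale_if_error: Optional[int] = None           # question 3: stale recovery


def cache_control(policy: CachePolicy) -> str:
    """Render a policy as a Cache-Control header value."""
    if not policy.storable:
        return "no-store"
    parts = ["public" if policy.shared else "private"]
    if policy.max_age is None:
        # No lifetime given: allow storage but require revalidation.
        parts.append("no-cache")
    else:
        parts.append(f"max-age={policy.max_age}")
    if policy.shared and policy.s_maxage is not None:
        parts.append(f"s-maxage={policy.s_maxage}")
    if policy.stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={policy.stale_while_revalidate}")
    if policy.stale_if_error is not None:
        parts.append(f"stale-if-error={policy.stale_if_error}")
    return ", ".join(parts)
```

A default CachePolicy() renders as "private, no-cache", so the boring, safe answer is what you get when nobody has made a decision yet.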

The headers are the design

Caching problems rarely come from one magical missing directive. They come from a mismatched model. The product thinks a page is public, the controller treats it as personalized, the CDN sees no explicit rule, the browser has a validator, and the engineer debugging it only looks at the final status code.

The fix is to make the model visible in the headers. Static fingerprinted assets should look fearless. Public pages should say how stale they may be. Private pages should be boring and explicit. Sensitive flows should avoid storage. Variant-producing pages should declare the real variant inputs and no more.

HTTP caching is not old trivia. It is still one of the highest-leverage performance tools on the web, and it rewards engineers who can be precise about time, identity, and ownership.

Happy caching!