HTTP caching semantics

IMPORTANT: The content on this page uses the following versions of Compute SDKs: JavaScript SDK: 3.30.1 (current is 3.34.0, see changes), Rust SDK: 0.11.2 (current is 0.11.5, see changes), Go SDK: 1.4.2 (current)

One of the most common uses of the Fastly edge cache is to store HTTP resources, such as webpages, JavaScript, CSS, images, and video. The HTTP Caching specification describes how to store a response associated with a request and reuse the stored response for subsequent requests.

Fastly's readthrough cache interface interprets and processes the instructions encoded into HTTP responses. For example, the most common (and best practice) way to control cache lifetime is to set an appropriate Cache-Control header on a backend response.

This page describes how long HTTP resources are cached, and how you can effectively control the caching behavior.

WARNING: The before-send and after-send callbacks discussed on this page are part of customized readthrough (HTTP) cache behavior. For the Compute JavaScript and Go SDKs, this is an opt-in feature. See this note for details.

Response processing

When a response is received from a backend, the readthrough cache interface parses relevant response headers to determine whether it can be cached, and for how long.

In a VCL service, response processing results can be inspected and overridden during the vcl_fetch subroutine, which is executed once the response has been parsed (unless the request is a revalidation).

Parsing cache controls

HTTP responses are parsed for the following cache semantics:

| Property | Parsing logic | Default |
| --- | --- | --- |
| Is response cacheable? | If the fetch is a result of an earlier explicit pass on the request, then no; otherwise if the fetch is a result of a hit-for-pass, then no; otherwise if HTTP status is `200`, `203`, `300`, `301`, `302`, `404`, or `410`, then yes; otherwise no | N/A |
| Cache TTL | Response headers in order of preference: `Surrogate-Control: max-age={n}`, otherwise `Cache-Control: s-maxage={n}`, otherwise `Cache-Control: max-age={n}`, otherwise `Expires: {date}` | 2 minutes |
| Stale-while-revalidate TTL | Response headers in order of preference: `Surrogate-Control: stale-while-revalidate={n}`, otherwise `Cache-Control: stale-while-revalidate={n}` | 0 |
| Stale-if-error TTL | Response headers in order of preference: `Surrogate-Control: stale-if-error={n}`, otherwise `Cache-Control: stale-if-error={n}` | 0 |

For example, an HTTP 200 (OK) response with no cache-freshness indicators in the response headers is cacheable and will have a TTL of 2 minutes. A 500 Internal Server Error response with Cache-Control: max-age=300 is not cacheable, because of its HTTP status code, and therefore the 5 minute TTL (300 seconds) indicated in the Cache-Control header is irrelevant.
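The parsing rules in the table can be modeled in a few lines of Python. This is only an illustrative sketch, not Fastly's implementation: the function names, the `is_pass` flag (standing in for both an explicit pass and a hit-for-pass), and the lowercase-keyed `headers` dict are all assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

CACHEABLE_STATUSES = {200, 203, 300, 301, 302, 404, 410}
DEFAULT_TTL = 120  # seconds; the 2 minute default from the table


def directives(value):
    """Parse 'max-age=60, private' into {'max-age': '60', 'private': ''}."""
    result = {}
    for part in (value or "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            result[name.strip().lower()] = arg.strip().strip('"')
    return result


def first_seconds(candidates):
    """Return the first directive that holds a whole number of seconds."""
    for parsed, name in candidates:
        if parsed.get(name, "").isdecimal():
            return int(parsed[name])
    return None


def parse_cache_controls(status, headers, is_pass=False, now=None):
    """Model the table above. `headers` is keyed by lowercase header name."""
    sc = directives(headers.get("surrogate-control"))
    cc = directives(headers.get("cache-control"))

    # TTL: Surrogate-Control max-age, then s-maxage, then max-age, then Expires.
    ttl = first_seconds([(sc, "max-age"), (cc, "s-maxage"), (cc, "max-age")])
    if ttl is None and "expires" in headers:
        # A malformed date raises here; this page does not say how
        # such a header is treated, so the sketch does not guess.
        expires = parsedate_to_datetime(headers["expires"])
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        ttl = max(0, int((expires - now).total_seconds()))
    if ttl is None:
        ttl = DEFAULT_TTL

    swr = first_seconds([(sc, "stale-while-revalidate"),
                         (cc, "stale-while-revalidate")])
    sie = first_seconds([(sc, "stale-if-error"), (cc, "stale-if-error")])
    return {
        "cacheable": not is_pass and status in CACHEABLE_STATUSES,
        "ttl": ttl,
        "stale_while_revalidate": swr or 0,
        "stale_if_error": sie or 0,
    }
```

Calling `parse_cache_controls(200, {})` reproduces the first example above (cacheable, TTL of 120 seconds), and `parse_cache_controls(500, {"cache-control": "max-age=300"})` reproduces the second (a parsed TTL of 300 seconds, but not cacheable).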

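In a VCL service, once the response has been parsed, the status code, the headers received with the response, and the cache controls derived from those headers are available as VCL variables during vcl_fetch, for example beresp.status, beresp.http.{NAME}, beresp.cacheable, beresp.ttl, beresp.stale_while_revalidate, and beresp.stale_if_error.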

Age

A backend can set the Age HTTP response header to indicate that an object has already spent some time in a cache upstream before being served to Fastly. If the response includes an Age header with a positive value, that value will be subtracted from the response's max-age, if it has one. If the resulting TTL is negative, it is considered to be zero. If the TTL of a response is derived from an Expires header, any Age header also present on the response will not affect the TTL calculation.

Age does not affect the initial values of stale-while-revalidate or stale-if-error TTLs. If a response includes a Cache-Control: max-age=60, stale-while-revalidate=300 and also Age: 90, then the object's TTL will be set to 0 (because Age is higher than 60) but the separate stale-while-revalidate TTL will still be 300 seconds.
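As a worked version of the rules above, the following Python sketch applies an Age value to an initial TTL. The function name and the `source` parameter are hypothetical. Treating every max-age-style directive (including s-maxage) the same way is an assumption, since the text only says "max-age".

```python
def ttl_after_age(ttl, age, source):
    """Apply an upstream Age value to an initial TTL (all values in seconds).

    `source` records where the TTL came from: "max-age", "expires",
    or "default". Only a max-age-derived TTL is reduced by Age.
    """
    if source != "max-age" or age <= 0:
        return ttl
    # A negative result is treated as zero.
    return max(0, ttl - age)


# The example from the text: max-age=60, stale-while-revalidate=300, Age: 90.
ttl = ttl_after_age(60, 90, "max-age")
stale_while_revalidate = 300  # not affected by Age
```

Here `ttl` ends up as 0 while `stale_while_revalidate` remains 300, matching the description above.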

In a VCL service, it's possible to change or remove the Age header on the response during the vcl_fetch subroutine. However, this will not affect the TTL that the object will receive in the cache, as the TTL will have already been calculated by that point.

If you need to modify the TTL, see overriding semantics below.

Fastly's readthrough cache interface also sets the Age header each time it returns a response. Each response receives a new value for the Age header, equal to the amount of time that the object has spent in the Fastly cache, plus (if set) the value of the Age header on the cached object. This mechanism is used to ensure that objects cached in multiple tiers of the Fastly platform as a result of shielding will not accrue more cache freshness than was originally intended.
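To see why this prevents extra freshness from accruing under shielding, consider this small Python model. It is a sketch under stated assumptions: both function names are hypothetical, and the second function simply reuses the "subtract Age from max-age" rule described earlier in this section.

```python
def outgoing_age(seconds_in_cache, cached_age=0):
    """Age header value set on a response served from cache:
    time spent in this cache plus any Age stored with the object."""
    return int(seconds_in_cache) + int(cached_age)


def downstream_ttl(max_age, received_age):
    """TTL a downstream cache derives from max-age and the Age it received."""
    return max(0, max_age - received_age)


# An object with max-age=60 has been in the shield POP for 45 seconds.
age_from_shield = outgoing_age(45)
edge_ttl = downstream_ttl(60, age_from_shield)
```

In this example the shield responds with an Age of 45, so the edge POP caches the object for only the remaining 15 seconds rather than a fresh 60.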

In VCL services, the Age header is set in this way just before the response is delivered to the client.

Surrogate control

The Surrogate-Control: max-age and Cache-Control: s-maxage header directives express a desired TTL for server-based caches (such as Fastly's readthrough cache). Therefore, these will be given preference over Cache-Control: max-age when calculating the initial value of the response object's TTL.

Additionally, Fastly will remove any Surrogate-Control header before a response is sent to an end user. Fastly does not, however, remove the s-maxage directive from any Cache-Control header.

IMPORTANT: If your service uses shielding, then the 'end user' making the request to the Fastly edge may be another Fastly POP. In this situation Fastly does not strip the Surrogate-Control header, so that both POPs will parse and respect the Surrogate-Control instructions.
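The stripping rule can be summarized in a short Python sketch. The function name and the `downstream_is_fastly_pop` flag are hypothetical and only model the behavior described above.

```python
def headers_for_downstream(headers, downstream_is_fastly_pop):
    """Return the response headers that would be sent downstream."""
    if downstream_is_fastly_pop:
        # Shielding: the next POP still needs the Surrogate-Control instructions.
        return dict(headers)
    # End user: Surrogate-Control is removed; Cache-Control (including
    # any s-maxage directive) is left untouched.
    return {k: v for k, v in headers.items() if k.lower() != "surrogate-control"}
```

For example, a response carrying both `Surrogate-Control: max-age=3600` and `Cache-Control: s-maxage=600` reaches an end user with only the Cache-Control header, and reaches another Fastly POP with both.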

Overriding semantics

During the vcl_fetch subroutine, you can affect the caching behavior in a number of ways:

Modifying Fastly cache TTL
To change the amount of time the readthrough cache interface will cache an object, override the value of beresp.ttl, beresp.stale_while_revalidate, and beresp.stale_if_error:
```
set beresp.ttl = 300s;
```
HINT: This entirely overrides the TTL that Fastly has determined by parsing the response's freshness semantics. If your service uses shielding, you may want to subtract Age manually. See the beresp.ttl docs for more information.
Modifying downstream (browser) cache TTL
To change the way that downstream caches (including browsers) treat the resource, override the value of the caching headers attached to the object. Take care if you use shielding since you may also be changing the caching policy of a downstream Fastly cache:
```
if (req.backend.is_origin) {
  set beresp.http.Cache-Control = "max-age=86400"; # Rules for browsers
  set beresp.http.Surrogate-Control = "max-age=31536000"; # Rules for downstream Fastly caches
  unset beresp.http.Expires;
}
```

The standard VCL boilerplate (which is also included in any Fastly VCL service that does not use custom VCL) applies some logic that affects freshness:

If the response has a Cache-Control: private header, execute a return(pass).
If the response has a Set-Cookie header, execute a return(pass).
If the response does not have any of Cache-Control: max-age, Cache-Control: s-maxage or Surrogate-Control: max-age headers, set beresp.ttl to the fallback TTL configured for your Fastly service.

WARNING: If you are using custom VCL, the fallback TTL configured via the web interface or API will not be applied. Instead, the fallback TTL is whatever value is hard-coded into your VCL boilerplate. You are free to remove any of the default interventions, including the fallback TTL logic, if you wish.
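The three boilerplate rules listed above can be modeled as follows. This Python sketch is illustrative only and is not the boilerplate itself: the function names, the return shape, and the `fallback_ttl` parameter are assumptions, and `headers` is assumed to be keyed by lowercase header name.

```python
def directive_names(value):
    """Return the set of directive names in a header value, without arguments."""
    return {
        part.split("=")[0].strip().lower()
        for part in (value or "").split(",")
        if part.strip()
    }


def boilerplate_fetch(headers, parsed_ttl, fallback_ttl):
    """Return (action, ttl) after applying the default interventions."""
    cc = directive_names(headers.get("cache-control"))
    sc = directive_names(headers.get("surrogate-control"))

    if "private" in cc:
        return ("pass", parsed_ttl)
    if "set-cookie" in headers:
        return ("pass", parsed_ttl)
    # No explicit freshness lifetime: the fallback TTL applies.
    if not ({"max-age", "s-maxage"} & cc or "max-age" in sc):
        return ("deliver", fallback_ttl)
    return ("deliver", parsed_ttl)
```

With a hypothetical fallback of 3600 seconds, `boilerplate_fetch({}, 120, 3600)` returns `("deliver", 3600)`, whereas a response with `Cache-Control: max-age=60` keeps its parsed TTL of 60.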

Cache outcome

After parsing the response for freshness information and executing the vcl_fetch subroutine, the readthrough cache decides whether to save the object based on the following criteria, in this order of priority:

| # | Outcome | Trigger | Result |
| --- | --- | --- | --- |
| 1 | Deliver stale | `return(deliver_stale)` is executed in `vcl_fetch` (see more about stale content for details). | An existing, stale object is served from the cache. The downloaded response is discarded, regardless of its cacheability or proposed TTL. No changes are made to the cache. |
| 2 | Deliver uncached | The content is deemed uncacheable or has a total TTL¹ of zero. Fastly's cache deems a response uncacheable based on its HTTP status and other factors, following the HTTP Caching RFC. The default behavior of the readthrough cache also excludes responses that include a `set-cookie` header. This behavior can be overridden using `beresp.cacheable`. | The new response is served to the end user, and no record is made in the cache. Requests queued up due to request collapsing are dequeued and forwarded individually to the backend. |
| 3 | Cache and pass | `return(pass)` is executed in `vcl_fetch`. | The new response is served to the end user, and an empty hit-for-pass object is saved into the cache. This object exists to allow subsequent requests to proceed directly to a backend fetch without being queued by request collapsing. The hit-for-pass object is stored for the duration specified by its TTL, but subject to a minimum of 120 and a maximum of 3690 seconds. |
| 4 | Cache and deliver | All other cases (`return(deliver)` either explicitly or implicitly). | The new response is served to the end user, used to satisfy queued requests, and stored in cache for up to the duration specified by its TTL. |

IMPORTANT: Objects may not be stored for the full TTL requested, as they may be evicted earlier in favor of more popular objects, especially if they are large. Objects are not automatically evicted when they reach their TTL; they simply become stale.

If you are experiencing a slow request rate or timeouts on uncacheable resources, the requests may be forming queues as a result of request collapsing, which you can resolve by creating a hit-for-pass object. For more details, see request collapsing.
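The priority order of the four cache outcomes can be expressed as a small decision function. This Python sketch is a simplified model, not Fastly's code: the `action` strings stand in for the `return(...)` statement executed in vcl_fetch, and the function name and return shape are hypothetical.

```python
HIT_FOR_PASS_MIN = 120   # seconds
HIT_FOR_PASS_MAX = 3690  # seconds


def cache_outcome(action, cacheable, ttl, swr=0, sie=0):
    """Return (outcome, seconds_stored) following the priority order above.

    `action` is "deliver_stale", "pass", or "deliver".
    `seconds_stored` is None when nothing is written to the cache.
    """
    if action == "deliver_stale":
        return ("deliver stale", None)
    # Total TTL for a VCL service: ttl + stale-while-revalidate + stale-if-error.
    if not cacheable or ttl + swr + sie <= 0:
        return ("deliver uncached", None)
    if action == "pass":
        # Hit-for-pass lifetime is clamped to the documented bounds.
        clamped = min(max(ttl, HIT_FOR_PASS_MIN), HIT_FOR_PASS_MAX)
        return ("cache and pass", clamped)
    return ("cache and deliver", ttl)
```

For instance, `cache_outcome("pass", True, 10)` yields a hit-for-pass object kept for 120 seconds, because the requested 10 seconds is below the minimum.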

Stale objects and revalidation

An object that has reached its TTL becomes stale. If an object is requested while it is stale, it may trigger a revalidation request to the backend. Learn more about staleness and revalidation.
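As a rough illustration of how an object moves from fresh to stale, the following Python sketch classifies a cached object by the time elapsed since it was stored. It follows the general stale-while-revalidate semantics (the window starts when the TTL expires) and deliberately ignores stale-if-error. The function name and state labels are hypothetical.

```python
def freshness_state(seconds_since_cached, ttl, stale_while_revalidate=0):
    """Classify a cached object as fresh, revalidatable-while-stale, or stale."""
    if seconds_since_cached < ttl:
        return "fresh"
    if seconds_since_cached < ttl + stale_while_revalidate:
        # May be served immediately while a revalidation happens in the background.
        return "stale-while-revalidate"
    return "stale"
```

An object with a TTL of 0 and a stale-while-revalidate TTL of 300 is therefore never "fresh", but remains in the "stale-while-revalidate" state for its first 300 seconds in cache.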

Preventing content from being cached

Since Fastly respects HTTP caching semantics in the readthrough cache, the best way to avoid caching content is to set the appropriate Cache-Control header on responses at the backend.

Preventing caching at the edge and in browsers

Responding with the following header will ensure that the object will not be cached by Fastly (the private directive), and that it will not be cached by any other downstream cache, such as a browser (both private and no-store directives):

Cache-Control: private, no-store

Cache at the edge, not in browsers

You may want the content to be cached by Fastly but not by browsers. You can do this purely in the initial HTTP response header from the backend:

Cache-Control: s-maxage=3600, max-age=0

In a VCL service, you can apply an override in vcl_fetch:

```
set beresp.http.Cache-Control = "private, no-store"; # Don't cache in the browser
set beresp.ttl = 3600s; # Cache in Fastly
set beresp.ttl -= std.atoi(beresp.http.Age);
return(deliver);
```

Cache in browsers, not at the edge

Fastly will not cache responses marked private, which makes the private directive a good way to apply this kind of differentiated caching policy with a single header attached to the response from your origin server:

Cache-Control: private, max-age=3600

In a VCL service, you can also apply the same logic in vcl_fetch:

```
set beresp.http.Cache-Control = "max-age=3600"; # Cache in the browser
return(pass); # Don't cache in Fastly
```
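The three header-only recipes in this section can be compared side by side with a small Python model. It is a deliberately simplified sketch: the function name is hypothetical, it considers only the private, no-store, max-age, and s-maxage directives, it ignores VCL overrides and browser heuristic caching, and it treats a max-age of 0 as "not cached".

```python
def where_cached(cache_control):
    """Return whether the edge and the browser would cache a response
    that carries only this Cache-Control header value."""
    cc = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            cc[name.lower()] = arg.strip()

    def seconds(name):
        return int(cc[name]) if cc.get(name, "").isdecimal() else None

    max_age = seconds("max-age")
    s_maxage = seconds("s-maxage")

    # The edge prefers s-maxage over max-age and refuses private content.
    # With neither directive present, the default TTL would apply.
    edge_ttl = s_maxage if s_maxage is not None else max_age
    edge = "private" not in cc and (edge_ttl is None or edge_ttl > 0)

    # The browser ignores s-maxage and honors no-store.
    browser = "no-store" not in cc and max_age is not None and max_age > 0
    return {"edge": edge, "browser": browser}
```

Under these assumptions, `private, no-store` is cached nowhere, `s-maxage=3600, max-age=0` is cached only at the edge, and `private, max-age=3600` is cached only in the browser.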

Overriding cache behavior on requests

Sometimes you may know what cache behavior you'd like for the response before forwarding a request to the backend.

For details, see the following sections.

IMPORTANT: As noted in cache outcome above, where requests are flagged to bypass the readthrough cache or have an override TTL of 0, the response will never be cached.

Best practices

Here are some general best practices to apply when caching resources with Fastly's readthrough cache:

Set long TTLs at the edge

It's easy to purge a Fastly service, whether for a single URL, a group of tagged resources, or an entire service cache, and it takes only a few seconds at most. To increase your cache hit ratio and the responsiveness of your site for end users, consider setting a long cache lifetime when saving things into the Fastly cache. When content changes, send a purge request to clear the old content.

Serve stale

Serving a slightly stale response may be preferable to paying the cost of a trip to a backend, and it's almost certainly better than serving an error page to the user.

Consider using the stale-while-revalidate and stale-if-error caching directives in your Cache-Control headers, or consider setting the beresp.stale_while_revalidate and beresp.stale_if_error variables in VCL services.

Learn more about staleness and revalidation.

Reduce origin first byte timeout

When making a request to a backend server, Fastly waits for a configurable interval before deciding that the backend request has failed. This is the first byte timeout and by default is fairly conservative. If you expect your backend server to be more responsive, you can choose to 'fail faster' by decreasing this value, in conjunction with serving stale objects from the cache.

Don't allow the fallback TTL to apply

Fallback TTLs are a primitive solution, and very unlikely to be an ideal TTL for any specific resource. Try to configure an appropriate Cache-Control header on all responses you send from your backend servers, or if that isn't possible, include logic in your VCL to address those responses more explicitly.

¹ In VCL services, "Total TTL" is beresp.ttl + beresp.stale_while_revalidate + beresp.stale_if_error. In Compute services, it is resp.get_ttl() + resp.get_stale_while_revalidate().