HTTP caching semantics
The most common use of the Fastly edge cache is to store HTTP resources, such as webpages, JavaScript, CSS, images, and video. The HTTP Caching specification describes how to store a response associated with a request and reuse the stored response for subsequent requests.
Fastly's readthrough cache interface interprets and processes the instructions encoded into HTTP responses. This page describes the amount of time that HTTP resources are cached, and how you can effectively control the caching behavior.
Response processing
The most common (and best practice) means of controlling cache lifetime is by setting an appropriate Cache-Control header on a backend response. When a response is received from a backend, the readthrough cache interface parses relevant response headers to determine whether it can be cached, and for how long.
In a VCL service, response processing results can be inspected and overridden in the vcl_fetch subroutine.
Parsing cache controls
HTTP responses are parsed for the following cache semantics:
Property | Parsing logic | Default |
---|---|---|
Is response cacheable? | If the fetch is the result of an earlier explicit pass on the request, then no; otherwise, if the fetch is the result of a hit-for-pass, then no; otherwise, if the HTTP status is 200, 203, 300, 301, 302, 404, or 410, then yes; otherwise no | N/A |
Cache TTL | Response headers in order of preference: Surrogate-Control: max-age={n}, otherwise Cache-Control: s-maxage={n}, otherwise Cache-Control: max-age={n}, otherwise Expires: {date} | 2 min |
Stale-while-revalidate TTL | Response headers in order of preference: Surrogate-Control: stale-while-revalidate={n}, otherwise Cache-Control: stale-while-revalidate={n} | 0 |
Stale-if-error TTL | Response headers in order of preference: Surrogate-Control: stale-if-error={n}, otherwise Cache-Control: stale-if-error={n} | 0 |
For example, an HTTP 200 (OK) response with no cache-freshness indicators in the response headers is cacheable and will have a TTL of 2 minutes. A 500 Internal Server Error response with Cache-Control: max-age=300 is not cacheable, because of its HTTP status code, and therefore the 5 minute TTL (300 seconds) indicated in the Cache-Control header is irrelevant.
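The parsing rules in the table can be modeled in a few lines. The following Python sketch is illustrative only and is not Fastly's implementation: the function names, the `passed` flag, and the dict-of-headers input (keyed by canonical header names) are all assumptions made for this example, and clamping a past Expires date to zero is likewise an assumption.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Status codes the table lists as cacheable by default.
CACHEABLE_STATUSES = {200, 203, 300, 301, 302, 404, 410}
DEFAULT_TTL = 120  # seconds (the 2 min default from the table)


def directive(value, name):
    """Return the integer in `name=<n>` from a header value, or None."""
    if not value:
        return None
    pattern = r"(?:^|[\s,])" + re.escape(name) + r"\s*=\s*(\d+)"
    match = re.search(pattern, value, re.IGNORECASE)
    return int(match.group(1)) if match else None


def first_match(headers, candidates):
    """Return the first directive found, checking candidates in order."""
    for header_name, directive_name in candidates:
        found = directive(headers.get(header_name), directive_name)
        if found is not None:
            return found
    return None


def parse_cache_semantics(status, headers, passed=False, now=None):
    """Model the table: cacheability, TTL, and the two stale TTLs.

    `passed` (hypothetical flag) means the fetch was caused by an
    explicit pass or a hit-for-pass.
    """
    cacheable = (not passed) and status in CACHEABLE_STATUSES

    ttl = first_match(headers, [
        ("Surrogate-Control", "max-age"),
        ("Cache-Control", "s-maxage"),
        ("Cache-Control", "max-age"),
    ])
    if ttl is None and "Expires" in headers:
        now = now or datetime.now(timezone.utc)
        expires = parsedate_to_datetime(headers["Expires"])
        # Assumption: an Expires date in the past yields a TTL of zero.
        ttl = max(0, int((expires - now).total_seconds()))
    if ttl is None:
        ttl = DEFAULT_TTL

    swr = first_match(headers, [
        ("Surrogate-Control", "stale-while-revalidate"),
        ("Cache-Control", "stale-while-revalidate"),
    ])
    sie = first_match(headers, [
        ("Surrogate-Control", "stale-if-error"),
        ("Cache-Control", "stale-if-error"),
    ])
    return {
        "cacheable": cacheable,
        "ttl": ttl,
        "stale_while_revalidate": swr or 0,
        "stale_if_error": sie or 0,
    }


# The two examples from the text above.
print(parse_cache_semantics(200, {}))
print(parse_cache_semantics(500, {"Cache-Control": "max-age=300"}))
```

In the second call the sketch still reports a parsed TTL of 300 seconds, but `cacheable` is false, which is why that TTL is irrelevant.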
In a VCL service, the TTLs resulting from parsing the response headers are available in vcl_fetch as the VCL variables beresp.ttl, beresp.stale_while_revalidate, and beresp.stale_if_error.
Age
A backend can set the Age HTTP response header to indicate that an object has already spent some time in a cache upstream before being served to Fastly. If the response includes an Age header with a positive value, that value will be subtracted from the response's max-age, if it has one. If the resulting TTL is negative, it is considered to be zero. If the TTL of a response is derived from an Expires header, any Age header also present on the response will not affect the TTL calculation.
Age does not affect the initial values of stale-while-revalidate or stale-if-error TTLs. If a response includes a Cache-Control: max-age=60, stale-while-revalidate=300 and also Age: 90, then the object's TTL will be set to 0 (because Age is higher than 60) but the separate stale-while-revalidate TTL will still be 300 seconds.
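The Age arithmetic above is small enough to check by hand, but a short Python sketch makes the rule explicit. The function name and signature are hypothetical; the sketch only covers TTLs derived from max-age, since Age is not applied to TTLs derived from Expires.

```python
def initial_ttls(max_age, stale_while_revalidate=0, stale_if_error=0, age=0):
    """Return (ttl, swr, sie) after applying a positive Age to max-age."""
    ttl = max_age - age if age > 0 else max_age
    ttl = max(0, ttl)  # a negative result is considered to be zero
    # Age leaves the two stale TTLs untouched.
    return ttl, stale_while_revalidate, stale_if_error


# The example from the text: max-age=60, stale-while-revalidate=300, Age: 90
print(initial_ttls(60, stale_while_revalidate=300, age=90))  # (0, 300, 0)
```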
In a VCL service, it's possible to change or remove the Age header on the response in the vcl_fetch subroutine.
However, this will not affect the TTL that the object will receive in the cache, as the TTL will have already been calculated by that point. If you need to modify the TTL, assign a value to beresp.ttl. See overriding semantics below for details.
Fastly's readthrough cache interface also sets the Age header each time it returns a response. Each response receives a new value for the Age header, equal to the amount of time that the object has spent in the Fastly cache, plus (if set) the value of the Age header on the cached object. This mechanism is used to ensure that objects cached in multiple tiers of the Fastly platform as a result of shielding will not accrue more cache freshness than was originally intended.
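To see why this keeps total freshness bounded across tiers, consider a worked example in Python. The helper names and the timeline (a shield POP holding an object for 120 seconds before an edge POP fetches it) are hypothetical and chosen only to illustrate the arithmetic.

```python
def delivered_age(seconds_in_cache, stored_age=0):
    """Age header on a response served from cache."""
    return seconds_in_cache + stored_age


def downstream_ttl(max_age, received_age):
    """TTL a downstream cache derives from max-age and the received Age."""
    return max(0, max_age - received_age)


# Hypothetical timeline: the origin sends max-age=300 with no Age header.
shield_age = delivered_age(seconds_in_cache=120)  # shield held it for 120 s
edge_ttl = downstream_ttl(300, shield_age)        # edge may keep it 180 s more
print(shield_age, edge_ttl)  # 120 180
```

The 120 seconds spent at the shield plus the 180 seconds allowed at the edge add up to the 300 seconds the origin asked for.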
In VCL services, the Age header is set in this way just before the response is delivered to the client.
Surrogate control
The Surrogate-Control: max-age and Cache-Control: s-maxage header directives express a desired TTL for server-based caches (such as Fastly's readthrough cache). Therefore, these will be given preference over Cache-Control: max-age when calculating the initial value of the response object's TTL.
Additionally, Fastly will remove any Surrogate-Control header before a response is sent to an end user. Fastly does not, however, remove the s-maxage directive from any Cache-Control header.
IMPORTANT: If your service uses shielding, then the 'end user' making the request to the Fastly edge may be another Fastly POP. In this situation Fastly does not strip the Surrogate-Control header, so that both POPs will parse and respect the Surrogate-Control instructions.
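The header handling described in this section can be summarized with a small Python model. This is a sketch under stated assumptions, not Fastly's code: the function name and the `client_is_fastly_pop` flag are hypothetical stand-ins for however the platform detects a shielding request.

```python
def headers_for_client(headers, client_is_fastly_pop):
    """Return a copy of the response headers as they would be sent on."""
    outgoing = dict(headers)
    if not client_is_fastly_pop:
        # End users never see Surrogate-Control.
        outgoing.pop("Surrogate-Control", None)
    # Cache-Control, including any s-maxage directive, passes through as-is.
    return outgoing


response = {
    "Cache-Control": "s-maxage=600, max-age=60",
    "Surrogate-Control": "max-age=3600",
}
print(headers_for_client(response, client_is_fastly_pop=False))
print(headers_for_client(response, client_is_fastly_pop=True))
```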
Overriding semantics
In VCL services, once the response has been parsed, the vcl_fetch subroutine is executed (unless the request is a revalidation). The headers received with the response are populated into beresp.http.{NAME} VCL variables and the freshness information is populated into the beresp.ttl, beresp.stale_while_revalidate, and beresp.stale_if_error variables.
Within the vcl_fetch subroutine, you can affect the caching behavior in a number of ways:
Modifying Fastly cache TTL
To change the amount of time for which Fastly will cache an object, override the value of beresp.ttl, beresp.stale_while_revalidate, and beresp.stale_if_error:

    set beresp.ttl = 300s;

HINT: This will override entirely the TTL that Fastly has determined by parsing the response's freshness semantics. If your service uses shielding, you may want to subtract Age manually. See the beresp.ttl docs for more information.
Modifying downstream (browser) cache TTL
To change the way that downstream caches (including browsers) treat the resource, override the value of the caching headers attached to the object. Take care if you use shielding since you may also be changing the caching policy of a downstream Fastly cache:

    if (req.backend.is_origin) {
      set beresp.http.Cache-Control = "max-age=86400"; # Rules for browsers
      set beresp.http.Surrogate-Control = "max-age=31536000"; # Rules for downstream Fastly caches
      unset beresp.http.Expires;
    }

The standard VCL boilerplate (which is also included in any Fastly VCL service that does not use custom VCL) applies some logic that affects freshness:
- If the response has a Cache-Control: private header, execute a return(pass).
- If the response has a Set-Cookie header, execute a return(pass).
- If the response does not have any of Cache-Control: max-age, Cache-Control: s-maxage or Surrogate-Control: max-age headers, set beresp.ttl to the fallback TTL configured for your Fastly service.
WARNING: If you are using custom VCL, the fallback TTL configured via the web interface or API will not be applied; the fallback TTL will instead be whatever is hard-coded in your VCL boilerplate. You're free to remove any of the default interventions, including the fallback TTL logic, if you wish.
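The three boilerplate rules listed above can be expressed as a small decision function. The Python below is a rough model for illustration, not the boilerplate itself: the function name, the `(action, ttl)` return shape, and the 3600-second default for `fallback_ttl` are assumptions of this sketch, and it follows only the conditions named in the list.

```python
import re


def boilerplate_fetch(headers, parsed_ttl, fallback_ttl=3600):
    """Return (action, ttl) according to the three listed rules.

    `fallback_ttl` is a placeholder value; use your service's setting.
    """
    cache_control = headers.get("Cache-Control", "")
    surrogate_control = headers.get("Surrogate-Control", "")

    # Rule 1: private responses are passed.
    if re.search(r"\bprivate\b", cache_control, re.IGNORECASE):
        return "pass", parsed_ttl
    # Rule 2: responses that set cookies are passed.
    if "Set-Cookie" in headers:
        return "pass", parsed_ttl

    # Rule 3: with no explicit freshness directive, use the fallback TTL.
    has_explicit_ttl = (
        re.search(r"(?:^|[\s,])(?:max-age|s-maxage)\s*=", cache_control, re.I)
        or re.search(r"(?:^|[\s,])max-age\s*=", surrogate_control, re.I)
    )
    if not has_explicit_ttl:
        return "deliver", fallback_ttl
    return "deliver", parsed_ttl


print(boilerplate_fetch({"Cache-Control": "private, max-age=3600"}, 3600))
print(boilerplate_fetch({}, 120, fallback_ttl=900))
```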
Cache outcome
After parsing the response for freshness information and executing the vcl_fetch subroutine, the readthrough cache decides whether to save the object based on the following criteria, in this order of priority:
Priority | Outcome | Trigger | Result |
---|---|---|---|
1 | Deliver stale | return(deliver_stale) is executed in vcl_fetch (see more about stale content for details). | An existing, stale object is served from the cache. The downloaded response is discarded, regardless of its cacheability or proposed TTL. No changes are made to the cache. |
2 | Deliver uncached | The content is deemed uncacheable or has a total TTL [1] of zero. Fastly's cache deems a response uncacheable based on its HTTP status and other factors, following the HTTP Caching RFC. The default behavior of the readthrough cache also excludes responses that include a Set-Cookie header. This behavior can be overridden using beresp.cacheable. | The new response is served to the end user, and no record is made in the cache. Requests queued up due to request collapsing are dequeued and forwarded individually to the backend. |
3 | Cache and pass | return(pass) is executed in vcl_fetch . | The new response is served to the end user, and an empty hit-for-pass object is saved into the cache. This object exists to allow subsequent requests to proceed directly to a backend fetch without being queued by request collapsing. The hit-for-pass object is stored for the duration specified by its TTL, but subject to a minimum of 120 and a maximum of 3690 seconds. |
4 | Cache and deliver | All other cases (return(deliver) either explicitly or implicitly). | The new response is served to the end user, used to satisfy queued requests, and stored in cache for up to the duration specified by its TTL. |
IMPORTANT: Objects may not be stored for the full TTL requested, as they may be evicted earlier in favor of more popular objects, especially if they are large. Objects are not automatically evicted when they reach their TTL; they simply become stale.
If you are experiencing a slow request rate or timeouts on uncacheable resources, it may be because requests for them are forming queues, which can be resolved by creating a hit-for-pass object. For more details, see request collapsing.
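The priority order in the table, together with the hit-for-pass bounds, can be sketched as follows. This Python model is hypothetical: `action` stands in for the return value of vcl_fetch, the function returns the outcome name plus the lifetime recorded for the stored object (or None when nothing is stored), and "total TTL" is taken as the sum of the TTL and both stale TTLs.

```python
HFP_MIN, HFP_MAX = 120, 3690  # hit-for-pass lifetime bounds, in seconds


def cache_outcome(action, cacheable, ttl, swr=0, sie=0):
    """Return (outcome, lifetime) following the table's priority order.

    `action` is one of "deliver_stale", "pass", or "deliver".
    """
    # 1. Deliver stale: the cache is left untouched.
    if action == "deliver_stale":
        return "deliver stale", None
    # 2. Deliver uncached: uncacheable, or a total TTL of zero.
    if not cacheable or ttl + swr + sie == 0:
        return "deliver uncached", None
    # 3. Cache and pass: store a hit-for-pass object with a clamped TTL.
    if action == "pass":
        return "cache and pass", min(max(ttl, HFP_MIN), HFP_MAX)
    # 4. Cache and deliver: everything else.
    return "cache and deliver", ttl


print(cache_outcome("pass", True, 10))      # ('cache and pass', 120)
print(cache_outcome("pass", True, 86400))   # ('cache and pass', 3690)
print(cache_outcome("deliver", True, 300))  # ('cache and deliver', 300)
```

Because rule 2 is checked before rule 3, a pass on a response whose total TTL is zero creates no hit-for-pass object in this model.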
Stale objects and revalidation
An object that has reached its TTL becomes stale. If an object is requested while it is stale, it may trigger a revalidation request to the backend. Learn more about staleness and revalidation.
Preventing content from being cached
Since Fastly respects HTTP caching semantics in the readthrough cache, the best way to avoid caching content is to set the appropriate Cache-Control header on responses at the backend.
Preventing caching at the edge and in browsers
Responding with the following header will ensure that the object will not be cached by Fastly, and that it will not be cached by any other downstream cache, such as a browser:
Cache-Control: private, no-store
Cache at the edge, not in browsers
You may want the content to be cached by Fastly but not by browsers. You can do this purely in the initial HTTP response header from the backend:
Cache-Control: s-maxage=3600, max-age=0
In a VCL service, you can apply an override in vcl_fetch:

    set beresp.http.Cache-Control = "private, no-store"; # Don't cache in the browser
    set beresp.ttl = 3600s; # Cache in Fastly
    set beresp.ttl -= std.atoi(beresp.http.Age);
    return(deliver);

Cache in browsers, not at the edge
Fastly will not cache private content, so the private directive is a good way to apply this kind of differentiated caching policy via a single header attached to the response from your origin server:
Cache-Control: private, max-age=3600
In a VCL service, you can also apply the same logic in vcl_fetch:

    set beresp.http.Cache-Control = "max-age=3600"; # Cache in the browser
    return(pass); # Don't cache in Fastly

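The three Cache-Control recipes in this section differ only in which cache each directive addresses. The Python sketch below compares them side by side. It is a simplified model with hypothetical function names: it treats a missing max-age as zero rather than applying any default, and it considers only the private, no-store, max-age, and s-maxage directives.

```python
import re


def directive(cache_control, name):
    """Return the integer in `name=<n>`, or None if the directive is absent."""
    pattern = r"(?:^|[\s,])" + re.escape(name) + r"\s*=\s*(\d+)"
    match = re.search(pattern, cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def has_token(cache_control, name):
    """True if a valueless directive such as `private` is present."""
    return name in [token.strip().lower() for token in cache_control.split(",")]


def cache_lifetimes(cache_control):
    """Return (edge_seconds, browser_seconds) for a Cache-Control value."""
    max_age = directive(cache_control, "max-age") or 0
    s_maxage = directive(cache_control, "s-maxage")

    # Edge: private content is not cached; s-maxage wins over max-age.
    if has_token(cache_control, "private"):
        edge = 0
    else:
        edge = s_maxage if s_maxage is not None else max_age

    # Browser: no-store forbids storage; s-maxage targets shared caches only.
    browser = 0 if has_token(cache_control, "no-store") else max_age
    return edge, browser


for value in ("private, no-store",
              "s-maxage=3600, max-age=0",
              "private, max-age=3600"):
    print(value, "->", cache_lifetimes(value))
```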
Overriding cache behavior on requests
Sometimes you may know what cache behavior you'd like for the response before forwarding a request to the backend.
For details, see the following sections.
IMPORTANT: As noted in cache outcome above, where requests are flagged to bypass the readthrough cache or have an override TTL of 0, the response will never be cached.
Best practices
Here are some general best practices to apply when caching resources with Fastly's readthrough cache:
Set long TTLs at the edge
It's easy to purge a Fastly service, whether for a single URL, a group of tagged resources, or an entire service cache, and it takes only a few seconds at most. To increase your cache hit ratio and the responsiveness of your site for end users, consider setting a long cache lifetime when saving things into the Fastly cache. When content changes, send a purge request to clear the old content.
Serve stale
Serving a slightly stale response may be preferable to paying the cost of a trip to a backend, and it's almost certainly better than serving an error page to the user.
Consider using the stale-while-revalidate and stale-if-error caching directives in your Cache-Control headers, or consider setting the beresp.stale_while_revalidate and beresp.stale_if_error variables in VCL services.
Learn more about staleness and revalidation.
Reduce origin first byte timeout
When making a request to a backend server, Fastly waits for a configurable interval before deciding that the backend request has failed. This is the first byte timeout and by default is fairly conservative. If you expect your backend server to be more responsive, you can choose to 'fail faster' by decreasing this value, in conjunction with serving stale objects from the cache.
Don't allow the fallback TTL to apply
Fallback TTLs are a primitive solution, and very unlikely to be an ideal TTL for any specific resource. Try to configure an appropriate Cache-Control header on all responses you send from your backend servers, or if that isn't possible, include logic in your VCL to address those responses more explicitly.
[1] "Total TTL" is beresp.ttl + beresp.stale_while_revalidate + beresp.stale_if_error in VCL services, or resp.get_ttl() + resp.get_stale_while_revalidate() in Compute services.