Caching with CORS
Quick shout-out to JSONP
Before diving into CORS (Cross-Origin Resource Sharing), I need to mention JSONP, which is the other solution to getting data from a different “Origin.” In Using ESI, Part 2: Leveraging VCL and ESI to Use JSONP, Simon explains what JSONP is and how to cache it with Fastly, using one Fastly-specific feature, req.topurl. Now, with Varnish 4.1, req_top.url (note the different punctuation) is available, and it allows you to do the same thing with vanilla Varnish.
The problem
So what is the problem that JSONP and CORS are trying to solve? Getting data (usually JSON) from a third party and using it from JavaScript. For security reasons, AJAX (XMLHttpRequest) requests are not allowed to retrieve data from (or send data to) another origin. So if your website is http://www.example.com/, JavaScript on your site is not allowed to load data from https://api.thirdparty.example/. JSONP gets around this by wrapping the data in a callback function, and CORS accomplishes it by explicitly granting access through response headers.
Origin in this context is the scheme, hostname, and port part of the URL (the port is optional; when it is left out, the scheme's default port is used). So for both http://www.example.com/ and http://www.example.com/js/foo.js the origin is http://www.example.com. For https://api.thirdparty.example/v1/data the origin is https://api.thirdparty.example.
The advantages of CORS over JSONP
Even though JSONP is cacheable using ESI, it’s a rather complex setup in Varnish, and requires more CPU because of the ESI sub-request. As I will show in this post, CORS only requires some header manipulation.
JSONP only allows GET requests, since it makes use of the <script> tag. CORS allows the server to specify which methods are allowed, so that PUT, POST, DELETE, and more become available.
Caching CORS responses
If you don’t care which domains access your API, all you need to do is add the following header to its responses:
Access-Control-Allow-Origin: *
Since there’s no variance in this header, there’s nothing special about caching these responses. You can just set the TTL as you normally would, using beresp.ttl or Cache-Control: max-age.
The same principle applies if you’re only allowing a single origin, except you would list the origin instead of the *:
Access-Control-Allow-Origin: http://www.example.com
And caching is just as simple as with allowing all origins; set a TTL and you’re done.
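For these two simple cases, a minimal sketch in the same Fastly-style VCL (vcl_fetch) used in the examples below could look like this. It assumes the backend does not set Access-Control-Allow-Origin itself, and the one-hour TTL is only an example value.

```vcl
sub vcl_fetch {
  # One fixed value for every request, so the cache needs no Vary on Origin.
  # Use "*" instead to allow any origin.
  set beresp.http.Access-Control-Allow-Origin = "http://www.example.com";
  # Example TTL of one hour; leave this out to let the backend's
  # Cache-Control: max-age decide instead.
  set beresp.ttl = 1h;
  ...
}
```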
Where things get tricky is when you want to allow multiple origins, say both http://www.example.com and https://www.example.com. The W3C Recommendation for CORS specifies that the Access-Control-Allow-Origin header can take a space-separated list of origins, but immediately warns that, in practice, browsers only allow a single origin to be listed. This means that Access-Control-Allow-Origin needs to be set depending on the value of the Origin header in the request.
To still be able to cache these responses, you will have to use the Vary header. If you are not familiar with how this header works, I refer you to a blog post about Vary that I wrote a while ago, which explains it in depth.
The easiest implementation would be to just add Origin to the Vary header. A typical request and response would look something like this:
GET /v1/data HTTP/1.1
Host: api.example.com
Origin: http://www.example.com

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 4365
Access-Control-Allow-Origin: http://www.example.com
Vary: Origin
Cache-Control: max-age=3600
This would cache the response for an hour and serve it from cache for any request that has http://www.example.com as its origin. The problem with this approach is that every distinct Origin value gets its own cached variant, so any request from an origin you do not yet have a cached response for goes to your backend. So normalizing the Origin header is key.
You should normalize to either one of the values that are allowed, or nothing. Basically you’re whitelisting certain values, and deleting everything else. Here’s what the VCL would look like:
sub vcl_recv {
  if (req.http.Origin != "https://www.example.com"
      && req.http.Origin != "http://www.example.com"
      && req.http.Origin != "http://www.friends.example") {
    unset req.http.Origin;
  }
  ...
}
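If the whitelist grows, the chained comparisons can be collapsed into a single anchored regular expression. This is an alternative sketch, not a required change. The ^ and $ anchors and the escaped dots matter: without them, an origin such as http://www.example.com.evil.example would also match.

```vcl
sub vcl_recv {
  # Keep Origin only if it exactly matches an allowed origin; otherwise drop it,
  # so all disallowed origins share the same (Origin-less) cache variant.
  # https? covers both http://www.example.com and https://www.example.com.
  if (req.http.Origin !~ "^(https?://www\.example\.com|http://www\.friends\.example)$") {
    unset req.http.Origin;
  }
  ...
}
```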
And if your backend doesn’t send a Vary header with Origin in it:
sub vcl_fetch {
  if (beresp.http.Vary) {
    set beresp.http.Vary = beresp.http.Vary + ",Origin";
  } else {
    set beresp.http.Vary = "Origin";
  }
  ...
}
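If you can’t be sure whether the backend already lists Origin in Vary, a slightly more defensive sketch only appends it when it is missing. The regular expression is an assumption about how your backend formats Vary (a comma-separated list with optional spaces); header names are case-insensitive, hence the (?i).

```vcl
sub vcl_fetch {
  if (!beresp.http.Vary) {
    # No Vary from the backend: vary on Origin only.
    set beresp.http.Vary = "Origin";
  } elsif (beresp.http.Vary !~ "(?i)(^|,)\s*Origin\s*(,|$)") {
    # Backend varies on other headers, but not on Origin: append it once.
    set beresp.http.Vary = beresp.http.Vary + ", Origin";
  }
  ...
}
```

Matching on whole list entries means a hypothetical header such as X-Origin in Vary is not mistaken for Origin.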
However, the list of allowed origins is now in both your VCL and your application. And for each allowed origin, there’s a separate copy of the response in the cache, which uses up space, and each copy is the result of its own backend request.
Luckily, setting headers is something Varnish is really good at. :)
So here’s some VCL that does not make the cached object vary on Origin, so you have a single copy of the response in your cache. It then sets the Access-Control-Allow-Origin header if the Origin in the request is on your whitelist.
sub vcl_deliver {
  if (req.http.Origin == "https://www.example.com"
      || req.http.Origin == "http://www.example.com"
      || req.http.Origin == "http://www.friends.example") {
    set resp.http.Access-Control-Allow-Origin = req.http.Origin;
  }
  if (resp.http.Vary) {
    set resp.http.Vary = resp.http.Vary + ",Origin";
  } else {
    set resp.http.Vary = "Origin";
  }
  ...
}
You might have noticed that Vary is still set, but in this case we’re setting it on resp, not on beresp. This is to make sure that any caches between your Varnish and the browser, which you have no control over, still do the right thing: cache the response, but serve different variations based on the Origin header.
The value of beresp.http.Vary (which you can only set in vcl_fetch, before the object enters the cache) is used to determine how the object should be cached. You can only set resp.http.Vary in vcl_deliver, which is after the object has been inserted into the cache, but before the response is sent downstream, i.e. to the browser or an intermediate cache.
The VCL example above does assume that your backend does not know about CORS, and sends neither Vary: Origin nor Access-Control-Allow-Origin. If it does, for some reason, you will have to take that into account, like so:
sub vcl_recv {
  # Save Origin in a custom header
  set req.http.X-Saved-Origin = req.http.Origin;
  # Remove Origin from the request so that the backend
  # doesn't add CORS headers.
  unset req.http.Origin;
  ...
}
sub vcl_deliver {
  if (req.http.X-Saved-Origin == "https://www.example.com"
      || req.http.X-Saved-Origin == "http://www.example.com"
      || req.http.X-Saved-Origin == "http://www.friends.example") {
    set resp.http.Access-Control-Allow-Origin =
      req.http.X-Saved-Origin;
  }
  if (resp.http.Vary) {
    set resp.http.Vary = resp.http.Vary + ",Origin";
  } else {
    set resp.http.Vary = "Origin";
  }
  ...
}
Other CORS headers
Access-Control-Allow-Origin is the most used header, but other response headers, such as Access-Control-Allow-Credentials and Access-Control-Expose-Headers, have similar ramifications. If they differ depending on Origin, make sure that Origin is in your Vary header, or add them to the response using VCL.
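As an illustration of the second option, here is a sketch that extends the whitelist check from the vcl_deliver example above. Which extra headers you need depends on your API; the Access-Control-Allow-Credentials value and the X-Request-Id header name are hypothetical placeholders.

```vcl
sub vcl_deliver {
  if (req.http.Origin == "https://www.example.com"
      || req.http.Origin == "http://www.example.com"
      || req.http.Origin == "http://www.friends.example") {
    set resp.http.Access-Control-Allow-Origin = req.http.Origin;
    # Hypothetical: allow credentialed requests. Browsers reject this
    # combined with Access-Control-Allow-Origin: *, so only set it here.
    set resp.http.Access-Control-Allow-Credentials = "true";
    # Hypothetical response header that scripts on the allowed origin may read.
    set resp.http.Access-Control-Expose-Headers = "X-Request-Id";
  }
  # Downstream caches must still vary on Origin.
  if (resp.http.Vary) {
    set resp.http.Vary = resp.http.Vary + ",Origin";
  } else {
    set resp.http.Vary = "Origin";
  }
  ...
}
```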
CORS also defines two more request headers, Access-Control-Request-Method and Access-Control-Request-Headers, which browsers might use in a pre-flight request before doing something like a PUT or DELETE request. I will discuss those in a future Varnish tip.
Recap
To deal with caches that you do not control, it’s important that Vary contains Origin. For the most efficient caching with Varnish, it is best to put all CORS logic in VCL.