Instant cache invalidation is essential for dynamic content delivery. Breaking news, live event updates, and e-commerce promotions all rely on the ability to quickly purge outdated information.
So why is instant purge just as hard to solve today as it was when we tackled it over 10 years ago? Well, modern edge networks are distributed across the globe. Ensuring near-simultaneous invalidation across all those cache nodes is no small feat. We need to communicate invalidation events to every active node within milliseconds. Plus, there's a constant juggling act between reliability, speed, scale, and coverage. We need to minimize latency without sacrificing cache consistency while making it performant enough to truly be "instant." That's a tall order. No wonder it’s taken over a decade for others to attempt to catch up.
To Centralize or Decentralize?
When we approached this problem many years ago, we realized there were two main ways to tackle it: centralized or decentralized. The typical solution involves sending purge requests to a central system, which then coordinates removing content from caches worldwide. While this method may seem straightforward, it introduces a single point of failure and higher latency, hardly ideal for achieving instant results.
We started looking at this challenge differently. Instead of relying on a centralized command structure, we asked ourselves: What if we treated purging as a distributed messaging problem that could be solved in a decentralized way? Instead of bringing purge requests back to a central location, why not intercept them at the edge and use logic right there to distribute them rapidly?
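To make that concrete, here's a minimal sketch in Go, with entirely hypothetical names, of what intercepting a purge at the edge might look like: the node that receives the request evicts the object from its own cache and fans the message out to its peers itself, with no round trip to a central coordinator.

```go
// Illustrative only: a toy edge node that handles purges in place.
package main

import (
	"fmt"
	"net/http"
)

// purgeLocally stands in for evicting the object from this node's cache.
func purgeLocally(key string) {
	fmt.Printf("evicted %q from local cache\n", key)
}

// fanout stands in for the distribution mechanism described in the next
// section: broadcasting the purge to peer nodes.
func fanout(key string) {
	fmt.Printf("broadcasting purge of %q to peers\n", key)
}

func main() {
	// The edge node that receives the purge request acts on it directly;
	// there is no forwarding to, or waiting on, a central coordinator.
	http.HandleFunc("/purge/", func(w http.ResponseWriter, r *http.Request) {
		key := r.URL.Path[len("/purge/"):] // illustrative key extraction
		purgeLocally(key)
		fanout(key)
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe("127.0.0.1:8080", nil) // toy server
}
```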
Now, this decentralized approach isn't without its challenges. It's trickier to build and adds some complexity. Servers might temporarily lose contact with each other, or messages could get lost or delayed in transit. But we found a way to solve for that.
How does Instant Purge work?
We based our system on an algorithm called Bimodal Multicast. It's fast, understandable, and guarantees that messages will eventually be delivered. Practically speaking, here's how it works: When a cache server receives a purge request, it immediately broadcasts the message to all other servers using UDP. With typical packet loss rates below 0.1% between our Points of Presence (POPs), purge latency is often limited only by network delay.
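Here's a simplified sketch of that first hop, assuming a toy wire format and a hardcoded peer list (a real node would learn its peers from configuration or a membership service). The purge is written to every peer over UDP, and the sender moves on without waiting for replies:

```go
// Illustrative only: the optimistic first phase of Bimodal Multicast.
package main

import (
	"fmt"
	"log"
	"net"
)

// broadcastPurge sends one purge message to every peer, best-effort.
// UDP gives no delivery guarantee, but with packet loss below ~0.1%,
// almost every peer sees the message after a single network delay.
func broadcastPurge(conn *net.UDPConn, peers []*net.UDPAddr, surrogateKey string) {
	msg := []byte("PURGE " + surrogateKey) // toy wire format
	for _, peer := range peers {
		if _, err := conn.WriteToUDP(msg, peer); err != nil {
			// Failures are logged, not retried here: eventual delivery
			// comes from a separate recovery phase (see below).
			log.Printf("send to %v failed: %v", peer, err)
		}
	}
}

func main() {
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4zero, Port: 0})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Hypothetical peer list for demonstration.
	peers := []*net.UDPAddr{
		{IP: net.IPv4(127, 0, 0, 1), Port: 9001},
		{IP: net.IPv4(127, 0, 0, 1), Port: 9002},
	}
	broadcastPurge(conn, peers, "product-123")
	fmt.Println("purge broadcast; no acknowledgments awaited")
}
```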
One key concept in our system: we don't require acknowledgment of purge messages. By skipping this step, we can cut latency by up to half. We also don't try to be too clever about routing. Instead, we broadcast the purge everywhere. This might seem inefficient, but it actually simplifies things. All our nodes are equally capable of serving the same traffic. This means we can send purge requests everywhere without maintaining complex maps of content location. It's a simpler, more robust approach that scales well as our network grows.
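Without acknowledgments, the eventual-delivery guarantee has to come from somewhere else. In the Bimodal Multicast papers, that's an anti-entropy phase: nodes periodically gossip summaries of the messages they've seen and re-send whatever a peer turns out to be missing. The toy sketch below illustrates the digest-comparison idea; it is not our production protocol.

```go
// Illustrative only: gossip-based recovery of lost purge messages.
package main

import "fmt"

// node tracks the purge messages it has seen, keyed by message ID.
type node struct {
	seen map[int]string
}

// digest summarizes which message IDs this node has seen.
func (n *node) digest() map[int]bool {
	d := make(map[int]bool, len(n.seen))
	for id := range n.seen {
		d[id] = true
	}
	return d
}

// gossipWith compares digests and pushes anything the peer is missing.
// Repeated rounds against random peers turn the loss of any single UDP
// packet into a transient gap rather than a permanent one.
func (n *node) gossipWith(peer *node) {
	theirs := peer.digest()
	for id, msg := range n.seen {
		if !theirs[id] {
			peer.seen[id] = msg // retransmit the missed purge
		}
	}
}

func main() {
	a := &node{seen: map[int]string{1: "PURGE /a", 2: "PURGE /b"}}
	b := &node{seen: map[int]string{1: "PURGE /a"}} // message 2 was lost
	a.gossipWith(b)
	fmt.Println(b.seen[2]) // "PURGE /b": the gap has been repaired
}
```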
Our approach to network architecture also gives us an edge: we've built our POPs for maximum edge cache storage and smarter programmatic control. By intentionally running fewer but more capable POPs, we reduce the complexity of our network and minimize the number of servers that need to receive purge messages. This architectural decision offers significant advantages for instant purging. Plus, our software-defined network allows us to adapt and optimize our purging system on the fly, without being constrained by hardware limitations.
How has this panned out?
Now I’m sure you’re wondering: “Has this system scaled well? Has it continued to perform as we’ve added more and more customers and traffic? Did your approach pan out?” The short answer is yes, though, as we expected from our initial design, we’ve had to keep iterating on our purging system to handle more traffic and continue to scale. Over time, we’ve gone from 2-3k purges per second in 2018 to around 60k purges per second today. When we first implemented Bimodal Multicast, it was almost straight out of the academic papers, but there were scaling limits. Our system has since become our own and isn’t really pure Bimodal Multicast anymore.
There have been three points over the years where we’ve made optimizations and changes to the way we do purging. A few years after we originally introduced our purging capability, we started to run into scale and reliability issues as we expanded our network and dealt with the ever-present internet weather. We originally set up all nodes in a POP as peers with no hierarchy: every node received every purge request and was responsible for distributing packets to every other node. To scale as the number of nodes grew, and to stay reliable in the face of packet loss from internet weather, we asked ourselves: instead of sending purge messages to every node in a POP, what if we sent them to one node and let it rebroadcast to the rest of the POP?
This dramatically increased our ability to scale and reduced the overall number of operations. Purge requests are encapsulated in UDP packets delivered to at least two healthy nodes at each POP, which then re-broadcast the packets to neighboring nodes on the local network. This "double delivery" mechanism is our first layer of defense for reliability, and it significantly lowers the odds of a request being lost in transit due to internet weather.
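Here's a toy model of that two-tier fan-out, with invented POP and node names: pick two healthy nodes per POP as wide-area targets, and let those nodes handle the local re-broadcast.

```go
// Illustrative only: choosing wide-area targets for double delivery.
package main

import "fmt"

type pop struct {
	name  string
	nodes []string // nodes in the POP, assumed healthy for this sketch
}

// wideAreaTargets picks two nodes per POP to receive each purge over
// the WAN. Losing either packet to internet weather still leaves one
// copy to seed the local re-broadcast.
func wideAreaTargets(pops []pop) []string {
	var targets []string
	for _, p := range pops {
		n := 2
		if len(p.nodes) < n {
			n = len(p.nodes)
		}
		targets = append(targets, p.nodes[:n]...)
	}
	return targets
}

func main() {
	pops := []pop{
		{name: "ams", nodes: []string{"ams1", "ams2", "ams3", "ams4"}},
		{name: "sjc", nodes: []string{"sjc1", "sjc2", "sjc3"}},
	}
	targets := wideAreaTargets(pops)
	// 4 WAN sends instead of 7; each recipient re-broadcasts to the
	// rest of its POP over the local network, where loss is rare.
	fmt.Printf("WAN sends per purge: %d (instead of %d)\n", len(targets), 7)
}
```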
And finally, last year we reworked batch purging, with enormous effects on scalability (and it eliminated alerts that were waking some of us up at night!). Our API has always allowed customers to purge a series of surrogate keys in one call, but this wasn’t truly “batch purging”: the system would slice the batch into individual purges, which at high volumes could flood the system, triggering alerts and forcing us to rate limit to avoid degrading performance. Not ideal for our customers or for us. So we modified the system to handle true batch purging end to end, improving scalability and eliminating the alerts and manual rate limiting. We are now confident we can scale to hundreds of thousands of purges per second.
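To illustrate the difference, here's a minimal sketch with invented types: the batch of surrogate keys travels through the system and is applied as one unit, instead of being sliced into independent purges that each consume capacity on their own.

```go
// Illustrative only: applying a batch purge as a single operation.
package main

import "fmt"

// batchPurge carries many surrogate keys in a single message, so a
// thousand-key API call moves through the system as one unit rather
// than a thousand independent purges that can flood it.
type batchPurge struct {
	keys []string
}

// apply evicts every key in one pass over the local cache.
func (b batchPurge) apply(cache map[string]bool) {
	for _, k := range b.keys {
		delete(cache, k)
	}
}

func main() {
	cache := map[string]bool{"sku-1": true, "sku-2": true, "sku-3": true}
	batchPurge{keys: []string{"sku-1", "sku-3"}}.apply(cache)
	fmt.Println(len(cache)) // 1: both keys evicted in a single operation
}
```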
The Future of Instant Purge
Looking ahead, we're working on ways to make our purging system even more performant and scalable, and to do so before we need it: we aim to stay ahead of demand with 10x the capacity we currently use. Our goal is to increase capacity from hundreds of thousands of purge requests per second to millions, and we believe we can build on the batching optimizations and techniques we’ve already implemented to reach the next level of performance and scale. It's a continuous process of innovation to stay ahead of growing demands.
In the end, it's about building a dynamic architecture that can adapt and scale as needs evolve. By approaching the problem differently and embracing decentralization, we've created a system that delivers on the promise of truly instant purging.
This article contains “forward-looking” statements that are based on Fastly’s beliefs and assumptions and on information currently available to Fastly on the date of this article. Forward-looking statements may involve known and unknown risks, uncertainties, and other factors that may cause our actual results, performance, or achievements to be materially different from those expressed or implied by the forward-looking statements. These statements include, but are not limited to, statements regarding future product performance and our vision and objectives for future operations. Except as required by law, Fastly assumes no obligation to update these forward-looking statements publicly, or to update the reasons actual results could differ materially from those anticipated in the forward-looking statements, even if new information becomes available in the future. Important factors that could cause Fastly’s actual results to differ materially are detailed from time to time in the reports Fastly files with the Securities and Exchange Commission (SEC), including in our Annual Report on Form 10-K for the fiscal year ended December 31, 2024, and our Quarterly Reports on Form 10-Q. Copies of reports filed with the SEC are posted on Fastly’s website and are available from Fastly without charge.