Regular expressions in Fastly VCL

Fastly VCL supports regular expressions as an operand to the ~ comparison operator and as a parameter to certain functions, such as regsub and regsuball.

We support PCRE expressions, with some minor exceptions, using the PCRE2 library. Patterns are processed when your VCL is compiled, so they cannot be built dynamically or converted from any other type. Where an operator or function expects a regex, you must provide a literal pattern in your code. See regular-expressions.info for a good introduction to pattern matching using regex.
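As a rough illustration of the literal-pattern requirement, the fragment below contrasts a literal pattern with a pattern held in a variable. It is assumed to sit inside vcl_recv, and the header name X-Is-Api and the variable var.pattern are hypothetical. The second form is commented out because, under the rule above, it is expected to be rejected when the VCL is compiled.

```vcl
# Literal pattern: accepted, because the regex is known at compile time.
if (req.url.path ~ "^/api/") {
  set req.http.X-Is-Api = "1";
}

# Pattern held in a variable: not a literal, so this is expected to fail compilation.
# declare local var.pattern STRING;
# set var.pattern = "^/api/";
# if (req.url.path ~ var.pattern) { ... }
```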

HINT: You can use the freely available Regex101 tool to test out your patterns, but be aware of potential minor syntax differences with Fastly's engine. Fastly Fiddle can be used to test expressions on the actual Fastly platform.

Example usage

Regular expressions are a very common way to match path prefixes or segments in VCL, along with many other use cases:

Example usage                           Description
var.my_str ~ "foo"                      Variable contains "foo"
req.url.path ~ "^/admin(/.*)?\z"        URL path starts with /admin segment
req.url.ext ~ "^(jpe?g|png|gif)\z"      File extension match
req.url.path ~ "^/bins/([0-9a-f]+)"     Path slug with hex encoding
req.url.path ~ "^/([^/])+/foo"          Path segment containing any character except /
req.http.host ~ "^www\."                Hostname starting with www.

Capture groups and replacement

Every time a regular expression is evaluated as part of a conditional expression involving the ~ operator, the re.group.{N} variables will be populated with the matched text and any capturing subgroups in the order that they are matched:

if (req.url.path ~ "/products/(uk|us|au|jp)/(\d+)") {
  set req.http.product-region = re.group.1;
  set req.http.product-id = re.group.2;
}

The regsub and regsuball functions take a replacement parameter whose value is used to replace pattern matches in the source data. These replacement values may include references to the capture groups in the pattern using a \{n} syntax:

// /12345-blue-unisex-stripe-t-shirt => /products/12345
// [\w-]+ is needed because \w alone does not match the hyphens in the slug
set req.url = regsub(req.url, "^/(\d+)-[\w-]+\z", "/products/\1");

Regular expressions used as function parameters, as in regsub, don't populate or affect the value of the re.group.{N} capture variables.
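The interaction between the ~ operator and regsub can be sketched as follows. This is a minimal fragment assumed to run in vcl_recv, and the header name X-Trimmed-Path is hypothetical. The regsub call in the middle performs its own match, yet the capture variables read afterwards still come from the condition.

```vcl
if (req.url.path ~ "^/products/(uk|us|au|jp)/(\d+)") {
  # regsub runs its own match but leaves re.group.* untouched.
  set req.http.X-Trimmed-Path = regsub(req.url.path, "/+\z", "");

  # These still refer to the ~ match in the if condition above.
  set req.http.product-region = re.group.1;
  set req.http.product-id = re.group.2;
}
```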

Use (?: ... ) to prevent the grouping meta-characters () from capturing into a re.group.{N} variable. This usage is preferred whenever possible, for efficiency reasons.

We do not support named capture groups in any regular expressions.

Pattern modifiers

Fastly VCL doesn't provide a way to set regex modifiers outside of the pattern, but they can be prefixed to the pattern using the (?_) syntax. We support standard PCRE2 modifiers. The most common ones used in Fastly customer code are:

  • (?i): ignore case. Makes the pattern case-insensitive.
  • (?s): dot all. Allows . to match any character, including newlines.
  • (?m): multi-line. Makes ^ and $ match at the beginning and end of lines (\z continues to match only the end of the string).
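For instance, a modifier can be placed at the very start of a pattern. The sketch below assumes it runs in vcl_recv and uses a hypothetical header name, X-Is-Image. It reuses the file extension pattern from the earlier table with (?i) added.

```vcl
# (?i) applies to the whole pattern, so "JPG", "Jpeg", and "png" all match.
if (req.url.ext ~ "(?i)^(?:jpe?g|png|gif)\z") {
  set req.http.X-Is-Image = "1";
}
```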

Text encoding and multi-byte characters

VCL source code, including any contained regular expressions, is interpreted as UTF-8, which means that one character of text can be a variable number of bytes. It is possible to match multi-byte characters using regex, but the regex parser will see only sequences of bytes, not characters or code points. The following patterns will all match a 👋 (waving hand emoji), which is represented by a 4-byte sequence, at the start of a string:

  • ^👋
  • ^....
  • ^\xF0\x9F\x91\x8B

Notice that a single . will not match a multi-byte character (because . matches one byte, not one character), and multi-byte characters in URL paths (along with anything else that is not RFC 3986 compliant) will be automatically URL-encoded. So, when matching on req.url, use the encoded form or pass the URL through urldecode first. See the following examples:

if (req.url ~ {"^/foo/👋"}) { ... } // No match
if (req.url ~ {"^/foo/%F0%9F%91%8B"}) { ... } // Matches
if (urldecode(req.url) ~ {"^/foo/\xF0\x9F\x91\x8B"}) { ... } // Matches
if (urldecode(req.url) ~ {"^/foo/👋"}) { ... } // Matches
if (urldecode(req.url) ~ "^/foo/%F0%9F%91%8B") { ... } // Matches

To complicate matters, regular expressions in VCL are written using STRING syntax, which means URL-escape notation (e.g., "%20") is decoded by the string parser and not by the regex engine. As a result, the final example above matches: on the left, req.url starts out with the emoji in encoded form and urldecode decodes it; on the right, the URL-escape sequences are decoded by the STRING type before the pattern reaches the regex engine.

As a result, we recommend that any regular expression that includes URL-escape notation should be expressed as a long string (e.g., {"%20"}). The long string notation does not decode URL escape notation, so it will be passed to the regex engine unmodified.

Since the regex engine has no concept of a multi-byte character, we do not support \uXXXX notation for Unicode escapes.
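The byte-oriented behavior described in this section can be sketched with a two-byte character. The fragment below assumes it runs in vcl_recv, that the request path is /café (é is the two bytes C3 A9 in UTF-8), and that the X-* header names are hypothetical. The expectations in the comments follow from the rules stated above rather than from a recorded test run.

```vcl
declare local var.path STRING;
# The path arrives URL-encoded, so decode it before matching raw bytes.
set var.path = urldecode(req.url.path);

# One . covers only one of the two bytes, so this is not expected to match.
if (var.path ~ "^/caf.\z") {
  set req.http.X-One-Dot = "1";
}

# Two dots cover both bytes, so this is expected to match.
if (var.path ~ "^/caf..\z") {
  set req.http.X-Two-Dots = "1";
}

# Spelling out the bytes in a long string is also expected to match.
if (var.path ~ {"^/caf\xC3\xA9\z"}) {
  set req.http.X-Explicit-Bytes = "1";
}
```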

Best practices and common mistakes

Here is some of our most common advice to customers who are writing regular expressions in VCL:

  • Anchor the pattern: Often you will want to find a match at the beginning or end of a URL path or hostname. Don't forget to include ^ at the beginning or \z at the end, otherwise you may find a match anywhere in the string.

    ✅ ^web\d+\.example\.com\z   ❌ web\d+\.example\.com

  • Prefer \z over $: \z always matches the end of the string. $ will also match a trailing newline at the end of the string, so if you use this in combination with capturing groups, you may not be capturing what you expect. Also, \z is more efficient, so it is better to use it in places where \n cannot appear.

    ✅ req.url ~ "/foo\z"   ❌ req.url ~ "/foo$"

  • Escape dots: The . pattern matches any character, so remember to escape it if you want to match a dot:

    ✅ example\.com   ❌ example.com

  • Don't escape slashes: In some languages regular expressions are bounded by a delimiter character, commonly a slash (e.g., /abc/). This isn't the case in VCL and there's therefore no need to escape forward slashes:

    ✅ /foo/bar   ❌ \/foo\/bar

  • Don't use regsub for extraction: To extract a substring into a variable, use the if function. If you use regsub, and there is no match, you would assign the full source string to the target variable, which probably isn't what you want.

    ✅ set var.lang = if(req.url ~ "^/(\w{2})/", re.group.1, "en");
    ❌ set var.lang = regsub(req.url, "^/(\w{2})/.*\z", "\1");

  • Use long strings to avoid double encoding: Strings expressed in VCL using double quotes (e.g., "foo") automatically decode URL-escape sequences, such as %20 (which is a space). To ensure characters are processed by the regular expression parser and not by the string parser, use a long string. For example:

    ✅ req.url ~ {"/%2ehidden"}   ❌ req.url ~ "/%2ehidden"

  • RFC 3986 non-compliant URLs get URL-encoded: If Fastly receives a URL path containing characters not allowed by RFC 3986, we will URL-encode them, which means a regex that attempts to match the original form will fail. Use a case-insensitive regex in a long string to match the URL-encoded version:

    ✅ req.url ~ {"(?i)^/foo/%3C%%20\w+%20%%3E"}
    ❌ req.url ~ "^/foo/<% \w+ %>"

  • Don't use regular expressions to match query parameters: It's easy to make a mistake when trying to match or filter a query string parameter with a regular expression, but VCL has a whole set of query string-related functions to help with these use cases.

    ✅ set req.url = querystring.filter(req.url, "foo");
    ❌ set req.url = regsub(req.url, "([?&])foo=[^&]*&?", "\1");

  • Use non-capturing groups when possible: These are more efficient. You can make a group non-capturing by placing ?: immediately after its opening parenthesis:

    ✅ if (beresp.http.Cache-Control ~ "(?:private|no-store)") {
    ❌ if (beresp.http.Cache-Control ~ "(private|no-store)") {
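Several of these recommendations can be combined in one place. The following is a minimal sketch assumed to run in vcl_recv. The hostnames, the X-Lang header, and the foo query parameter are hypothetical and only serve to illustrate the practices listed above.

```vcl
declare local var.lang STRING;

# Anchored at both ends, dots escaped, non-capturing alternation, \z rather than $.
if (req.http.host ~ "^(?:www|shop)\.example\.com\z") {
  # Extraction with if(): falls back to "en" when the path has no two-letter prefix.
  set var.lang = if(req.url.path ~ "^/(\w{2})/", re.group.1, "en");
  set req.http.X-Lang = var.lang;

  # Query parameter removal with a querystring function instead of regsub.
  set req.url = querystring.filter(req.url, "foo");
}
```

Because the host pattern uses a non-capturing group, the only capture in play is the language prefix from the path match, which keeps re.group.1 unambiguous.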