Why isn't serve stale working as expected?
Here are some things to consider when troubleshooting why a stale object wasn’t served from cache while your origins were weathering temporary issues:
-
Custom VCL: The most effective way to set up serving stale is to follow this updated document and add the required code to each subroutine. This can only be done by uploading custom VCL, not through the UI.
-
Cache: Stale objects are only available for cacheable content.
-
Shielding: If you don’t have shielding enabled, a datacenter can only serve stale if a request for that cacheable object has previously passed through that same datacenter. Enabling shielding increases the probability that a stale object exists, and it’s a good way to refill the cache a little faster after a Purge All.
-
Requests: As traffic to your site increases, you’re more likely to see those stale objects available (even if shielding is disabled) as it’s reasonable to assume requests for a hot asset will come into various datacenters and get cached at multiple locations.
-
LRU: Fastly maintains an LRU (least recently used) list, so objects are not guaranteed to stay in cache for their entire TTL. Eviction depends on many factors, including the object’s request frequency, its TTL, and the POP from which it’s being served. For instance, objects with a TTL longer than 3700s get written to disk, whereas objects with shorter TTLs end up in transient, in-memory-only storage. Set your TTL above 3700s when possible.
-
Purges: Limiting the scope of purges (purging by URL or surrogate key rather than issuing a Purge All) and using our new soft purge feature can help ensure that your content remains in our caches to serve stale.
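To make the interaction between TTL, the stale window, and eviction easier to reason about, here is a minimal Python sketch. It is only a simplified model, not Fastly's implementation: the names `CachedObject`, `storage_tier`, and `state_on_origin_error` are hypothetical, the 3700s threshold is the figure quoted above, and whether an object has survived eviction is passed in as an input because this model cannot predict it. Age is assumed to be the time since the object was last fetched from origin.

```python
from dataclasses import dataclass

# Threshold quoted in the article: objects with a TTL longer than this
# are written to disk; shorter TTLs stay in transient in-memory storage.
DISK_TTL_THRESHOLD_S = 3700


@dataclass
class CachedObject:
    ttl_s: int             # freshness lifetime, in seconds
    stale_if_error_s: int  # extra window in which stale may be served on error


def storage_tier(obj: CachedObject) -> str:
    """Return where the article says an object with this TTL is stored."""
    return "disk" if obj.ttl_s > DISK_TTL_THRESHOLD_S else "transient"


def state_on_origin_error(obj: CachedObject, age_s: int,
                          still_in_cache: bool = True) -> str:
    """Classify what a cache could serve while the origin is failing.

    age_s is the time since the object was fetched from origin.
    still_in_cache is an assumption supplied by the caller: eviction
    depends on factors this simplified model does not capture.
    """
    if not still_in_cache:
        return "miss"      # evicted or purged: nothing to serve stale from
    if age_s < obj.ttl_s:
        return "fresh"     # still within TTL, served normally
    if age_s < obj.ttl_s + obj.stale_if_error_s:
        return "stale"     # past TTL but inside the stale-if-error window
    return "expired"       # past both windows
```

For example, with an illustrative 10-minute TTL and a 30-day stale window, `CachedObject(ttl_s=600, stale_if_error_s=30 * 86400)`, an object aged 660 seconds or 7 days is classified as `"stale"` as long as it is still in cache, while `storage_tier` reports `"transient"` for it. That is the situation in which eviction before the stale window ends is more likely, which is why the longer TTL is recommended.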
-
Hi, it's not clear whether setting the TTL to >3700s is an absolute must for stale to work, or whether a shorter TTL just lowers the probability of the object still being cached long after the TTL ends.
Can you explain exactly how it will work if I set the TTL to 10m and stale-if-error to 30d, and the origin then goes down for a few hours? I'd guess it depends on when the object was last accessed (and thus refreshed). So, three cases: 1. The last user accessed it a minute ago, so the object is still within its TTL. 2. The last user accessed it 11 minutes ago, so the object has just passed its TTL. 3. The last user accessed it a week ago, so the object passed its TTL long ago but is still within the stale-if-error window.
Thanks.
-
Do you plan on changing this functionality in the future? For example, writing an object to disk if it needs to be evicted from memory but is still within its grace period.
It seems to me that the case in my example is a very common one: a page is cached with a reasonable refresh rate (10m-1h, less than 3700s), but with the desire to serve stale in case of errors.
-
Hey Ilya,
At this time, we have no plans to change the setup. An object with a low TTL is one that needs to be refreshed constantly (whether or not the object is changing), and isn't suited to being stored on SSD.
If you have an object that isn't accessed very often, and that doesn't change so much that a synthetic error page would be preferable to the stale object, then I would recommend increasing the standard TTL and using our soft purge feature when the object does change (I specify soft purging so that you keep your stale cache in case of error).
Best, Amanda