Inconsistent cache lookup
Hi all,
We have a fairly complicated setup running with multiple backends and URL rewrites and fallback in case one of our backends returns an error. The gist of it is that when a request to our main backend fails with a 5xx error, we rewrite the request url and set the req.host and backend to another backend which serves up some static backup content, so things can still run smoothly. Another thing to note is that we have shielding enabled. The last thing is that to enable more consistent PURGE behavior, we’ve rewritten the hash function to use the host and url from the initial request (we save these as headers on the request and use them later in vcl_hash. Our hash function looks like:
```
sub vcl_hash {
  call log_hash;
  set req.hash += req.http.Appcues-Original-Url;
  set req.hash += req.http.Appcues-Original-Host;
#FASTLY hash
  return (hash);
}
```
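For context, the fallback itself works via a restart. A rough sketch of the logic is below; note that the backend name `F_backup`, the host `backup.example.com`, the `/fallback` URL prefix, and the `Appcues-Fallback` marker header are all illustrative placeholders, not our literal config:

```vcl
sub vcl_recv {
  # On the very first pass, save the original URL/host for use in vcl_hash.
  if (req.restarts == 0) {
    set req.http.Appcues-Original-Url = req.url;
    set req.http.Appcues-Original-Host = req.http.host;
  }
  # After a restart triggered by a 5xx, route to the static backup backend.
  if (req.restarts > 0 && req.http.Appcues-Fallback) {
    set req.url = "/fallback" req.url;
    set req.http.host = "backup.example.com";
    set req.backend = F_backup;
  }
#FASTLY recv
  return (lookup);
}

sub vcl_fetch {
  # If the main backend errored, mark the request and restart into the fallback path.
  if (beresp.status >= 500 && req.restarts == 0) {
    set req.http.Appcues-Fallback = "1";
    restart;
  }
#FASTLY fetch
}
```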
I’m seeing inconsistent caching behavior when we do this fallback. Because we have shielding enabled, I’m expecting the content to be cached on the shield after we make our first request. This does happen sometimes, but only occasionally; normally we get a MISS. This isn’t the behavior when we’re not in the fallback case, which seems very odd. Here is a sequence of HTTP requests/responses (irrelevant headers removed) that illustrates what’s going on:
This is a sequence of requests for the fallback case (doesn’t work as expected):
GET /timeout/timeout.js HTTP/1.1
Connection: keep-alive
Fastly-Debug: True
HTTP/1.1 200 OK
Age: 0
Cache-Control: max-age=10
Date: Wed, 11 Mar 2015 17:25:20 GMT
ETag: "b9f7f844934f40cdc6e66918a32052ac"
Fastly-Debug-Path: (D cache-atl6231-ATL 1426094720) (F cache-atl6225-ATL 1426094720) (D cache-jfk1023-JFK 1426094720) (F cache-jfk1023-JFK 1426094720)
Fastly-Debug-TTL: (M cache-atl6231-ATL - - 0) (M cache-jfk1023-JFK - - 0)
Fastly-Restarts: 1
Surrogate-Control: max-age=2592000
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0
X-Served-By: cache-jfk1023-JFK, cache-atl6231-ATL
X-Timer: S1426094720.491773,VS0,VE101
GET /timeout/timeout.js HTTP/1.1
Connection: keep-alive
Fastly-Debug: True
HTTP/1.1 200 OK
Age: 0
Cache-Control: max-age=10
Date: Wed, 11 Mar 2015 17:25:23 GMT
ETag: "b9f7f844934f40cdc6e66918a32052ac"
Fastly-Debug-Path: (D cache-jfk1030-JFK 1426094723) (F cache-jfk1030-JFK 1426094723)
Fastly-Debug-TTL: (M cache-jfk1030-JFK - - 0)
Fastly-Restarts: 1
Surrogate-Control: max-age=2592000
X-Cache: MISS
X-Cache-Hits: 0
X-Served-By: cache-jfk1030-JFK
X-Timer: S1426094723.587333,VS0,VE70
This is a sequence of requests for the normal case (works as expected):
GET /12.js HTTP/1.1
Connection: keep-alive
Fastly-Debug: True
HTTP/1.1 200 OK
Age: 0
Date: Wed, 11 Mar 2015 17:38:03 GMT
Etag: W/"ude9ClBZbGhd+qyfhXZaNQ=="
Fastly-Debug-Path: (D cache-atl6223-ATL 1426095483) (F cache-atl6230-ATL 1426095483) (D cache-jfk1024-JFK 1426095483) (F cache-jfk1030-JFK 1426095483)
Fastly-Debug-TTL: (M cache-atl6223-ATL - - 0) (M cache-jfk1024-JFK - - 0)
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0
X-Served-By: cache-jfk1024-JFK, cache-atl6223-ATL
X-Timer: S1426095482.819359,VS0,VE261
GET /12.js HTTP/1.1
Connection: keep-alive
Fastly-Debug: True
HTTP/1.1 200 OK
Age: 0
Date: Wed, 11 Mar 2015 17:38:06 GMT
Etag: W/"ude9ClBZbGhd+qyfhXZaNQ=="
Fastly-Debug-Path: (D cache-iad2133-IAD 1426095486) (F cache-iad2140-IAD 1426095486) (D cache-jfk1031-JFK 1426095486) (F cache-jfk1030-JFK 1426095483)
Fastly-Debug-TTL: (M cache-iad2133-IAD - - 0) (H cache-jfk1031-JFK - - 3)
X-Cache: HIT, MISS
X-Cache-Hits: 1, 0
X-Served-By: cache-jfk1031-JFK, cache-iad2133-IAD
X-Timer: S1426095486.263026,VS0,VE20
EDIT: A bit of further debugging: I’m logging the fastly_info.state variable, and in the broken case it shows MISS as the state for any node in my shield cluster, while the edge nodes show MISS-CLUSTER. In the case that works properly, however, the shield cluster nodes show MISS-CLUSTER as the state. This definitely seems indicative of the problem, but I don’t know what’s causing it. Could the fact that req.restarts > 0 be causing this somehow?
I can’t seem to determine why the caching works inconsistently in the fallback case. Any thoughts?
Thanks!
-
Hi,
If you look at your generated VCL from within the app, you will see that the expanded `#FASTLY hash` macro adds the following code:
```
#--FASTLY HASH CODE WITH GENERATION FOR PURGE ALL
#
# if unspecified fall back to normal
{
  set req.hash += req.url;
  set req.hash += req.http.host;
  set req.hash += "#####GENERATION#####";
  return (hash);
}
#--FASTLY END HASH CODE
```
So the edits below are most likely adding to a hash which already includes `req.http.host` and `req.url`:
```
sub vcl_hash {
  call log_hash;
  set req.hash += req.http.Appcues-Original-Url;
  set req.hash += req.http.Appcues-Original-Host;
#FASTLY hash
  return (hash);
}
```
Do you think this might be explaining the behavior you're seeing?
-
Unfortunately I don't think that's it. Because I'm overwriting the entire vcl_hash function in my custom VCL and just including the `#FASTLY hash` macro (which in that position only adds the generation part), I get this in my generated VCL:
```
sub vcl_hash {
  call log_hash;
  set req.hash += req.http.Appcues-Original-Url;
  set req.hash += req.http.Appcues-Original-Host;
  #--FASTLY HASH start
  # support purge all
  set req.hash += "#####GENERATION#####";
  #--FASTLY HASH end
  return (hash);
}
```
I really think it's somehow related to `req.restarts > 0`, but it doesn't seem like that should affect things. Unless the restart causes the hash to occur twice? Is that what you're suggesting?
-
And following up a bit: did you get a chance to look at this article? https://community.fastly.com/t/redirect-to-another-backend-based-on-response-status/51
-
Hi Ryad,
What I think is getting in your way is that we disable clustering on restarts. Clustering is where, for any specific object, one machine in a datacenter (cluster) is primary and one is secondary, and all the other nodes go to them instead of to your origin.
Because restarts are often caused by errors, and the error might have been in the clustering itself (we are reevaluating this), clustering is disabled after a restart. You can, however, force clustering back on by setting the `Fastly-Force-Shield` header. That should ensure it is the primary or secondary node that makes the requests to origin, and that your objects are cached for the whole cluster. Let us know how this goes.
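For example, something like this in vcl_recv should work (a sketch; exactly where you set it depends on your VCL, and any non-empty value will do):

```vcl
sub vcl_recv {
  # Re-enable clustering/shielding after a restart caused by the 5xx fallback.
  if (req.restarts > 0) {
    set req.http.Fastly-Force-Shield = "yes";
  }
#FASTLY recv
  return (lookup);
}
```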
Cheers,
Doc
-
Hi Doc,
We ended up changing our VCL completely, and this became a non-issue at that point (we decided not to cache the fallback content at all). But your explanation certainly seems to account for the behavior I was seeing, and I think it would have fixed it. So for anyone else in a similar situation, take a look at Doc's answer above!
Thanks a lot for your help!