Fastly and S3 - dealing with 'read after update' eventual consistency



  • Amir Yalon

    Here is a workaround. Set your desired TTL for stale-while-revalidate, and use a shorter TTL for the cache, so Fastly will continue to always serve from cache, but newer objects will be picked up eventually. When the objects don’t change, Fastly should get a 304 Not Modified response from S3, so the added bandwidth in this setup should be minimal, even for large objects.
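    As a rough illustration of the two TTLs described above, here is a minimal sketch that builds a Cache-Control value with a short shared-cache TTL and a long stale-while-revalidate window. The function name and the example numbers are made up for illustration, and it assumes Fastly reads `s-maxage` and `stale-while-revalidate` from the origin response; where the header actually gets set (S3 object metadata or VCL) is left out.

```python
def build_cache_control(cache_ttl_s: int, stale_ttl_s: int) -> str:
    """Return a Cache-Control value with a short cache TTL and a long stale window.

    cache_ttl_s: how long the shared cache treats the object as fresh.
    stale_ttl_s: how long a stale copy may still be served while it is
                 revalidated against the origin in the background.
    """
    if cache_ttl_s < 0 or stale_ttl_s < 0:
        raise ValueError("TTLs must be non-negative")
    return f"s-maxage={cache_ttl_s}, stale-while-revalidate={stale_ttl_s}"


# Hypothetical values: fresh for one minute, servable stale for 30 days.
header_value = build_cache_control(60, 30 * 24 * 3600)
print(header_value)
```

    With values like these, an updated object would normally be picked up about a minute after it changes, while requests in between are still answered from cache.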

  • marcosscriven

    Thanks @iopov

    I'm not sure that covers my requirement, unfortunately. I have a bunch of HTML pages in S3. The vast majority of pages won't change for months, but individual pages might be updated, and users would expect to see those updates within a few seconds. Hence the cache lifetimes we need span a massive range.

    The flow I wanted was:

    • Page uploaded to S3 for an existing key
    • Single page purge submitted to Fastly
    • User requests page (set for no caching on client side)
    • Fastly gets the page, caches for a very long period (months)

    By all accounts 'eventual' read-after-update in S3 can be within 100ms, so it's only an edge case here. The problem only arises if Fastly gets a request just after the purge, but just before the S3 object is consistent.

    I believe your workaround is for relatively short TTLs?

  • marcosscriven

    I wondered if Edge dictionaries might work?

    Instead of submitting a purge, a key+hash is submitted to the edge dictionary, and Fastly checks for that hash in the S3 response headers. If the hash matches, it caches the object and removes the entry from the dictionary.
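    To make the idea concrete, here is a small Python simulation of the decision logic only. It is not VCL and not a Fastly API; `pending` stands in for the edge dictionary and `response_hash` for whatever hash header S3 returns (for example an ETag), and both names are hypothetical. As far as I know, VCL can only read edge dictionaries, so removing the entry would in practice be a separate API call.

```python
from typing import Dict, Optional


def should_cache(pending: Dict[str, str], key: str,
                 response_hash: Optional[str]) -> bool:
    """Decide whether a freshly fetched object may be cached.

    pending maps an object key to the hash of the version we expect
    after an upload. An entry means "an update is in flight".
    """
    expected = pending.get(key)
    if expected is None:
        # No update pending for this key: cache as usual.
        return True
    if response_hash == expected:
        # The origin served the new version: cache it and clear the entry.
        del pending[key]
        return True
    # The origin still served the old version: pass it through uncached.
    return False


pending = {"/page.html": "new-hash"}
print(should_cache(pending, "/page.html", "old-hash"))  # stale read
print(should_cache(pending, "/page.html", "new-hash"))  # consistent read
```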

  • Ilya Kaplan

    Why not do a delayed flush? I.e. purge immediately, then, using resque or any other scheduler, purge again after a second and once more after a minute. Is there any guarantee on S3 update timing?
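    A minimal sketch of that schedule, assuming the offsets are measured from the first purge. The `purge` callable is a placeholder for whatever actually sends the purge to Fastly, and a real setup would hand this to a job scheduler rather than block in a loop.

```python
import time
from typing import Callable, Iterable


def purge_repeatedly(purge: Callable[[str], None], url: str,
                     offsets_s: Iterable[float] = (0.0, 1.0, 60.0),
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call purge(url) at each offset (seconds after the first call).

    Returns the number of purges issued.
    """
    issued = 0
    previous = 0.0
    for offset in sorted(offsets_s):
        # Wait only for the gap since the previous purge.
        sleep(offset - previous)
        purge(url)
        previous = offset
        issued += 1
    return issued


# Dry run with a fake sleep so nothing actually waits.
sent = []
purge_repeatedly(sent.append, "https://www.example.com/page.html",
                 sleep=lambda seconds: None)
print(len(sent))
```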

  • marcosscriven

    The author of the post I linked to claimed that reading from a bucket in the same region after an update was consistent. However, Amazon's own documentation claims otherwise.

    Someone did a test of 100,000 writes, and in one region found just one read after overwrite inconsistent:

    This idea of waiting some indeterminate time comes up all the time in software, and I never like it. In this instance it would mean blindly doing something for 99,999 writes that was actually only needed once.

    It would be good if someone from Fastly would pitch in, as their documentation mentions S3 backends, but not this consistency issue.

  • Leon Brocard

    I've been testing this particular use case recently, so let me pitch in ;-)

    It's probably best to read about the Amazon S3 Data Consistency Model:

    Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions... (If) A process replaces an existing object and immediately attempts to read it. Until the change is fully propagated, Amazon S3 might return the prior data.

    So unfortunately there is an undefined amount of time between the object being replaced on S3 and Fastly being able to pull the new version. If you replace an object and then purge on Fastly, occasionally Fastly will pull the old version of the object after the purge request, which would be bad.

    Lowering the TTL seems to be the best solution at this moment.

    Changing the object store might be a large undertaking, but GCS does not have this limitation. Google Cloud Storage Consistency says:

    When you upload an object to Cloud Storage, and you receive a success response, the object is immediately available for download and metadata operations from any location where Google offers service. This is true whether you create a new object or overwrite an existing object. Because uploads are strongly consistent, you will never receive a 404 Not Found response or stale data for a read-after-write or read-after-metadata-update operation.
