Snowplow Analytics and log streaming into BigQuery for real-time dashboards?

Comments


  • Justin

    Hi Ian

    Yes, it'll be possible using VCL. From what you've said, all that's needed is to extract the value of each parameter you want to store and add it to the log format. The only prerequisite is knowing the parameter names in advance, and looking at the protocol, that appears to be the case.

    You can use VCL like this to achieve it (either via the UI, as a VCL snippet, or using Custom VCL):

    ```
    # set a header with the value of the 'tv' parameter
    set req.http.snowplow-tv = subfield(req.url.qs, "tv", "&");
    ```

    You can then pick this up in the log by adding it to the format that produces your JSON output. The syntax differs depending on whether the logging endpoint is configured to use log format version 1 or 2.
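
    For example, with a version 2 log format you can reference the header directly in the JSON format string. This is only a rough sketch, the field names are illustrative and you'd merge this into whatever format your endpoint already uses:

    ```
    {
      "url": "%{json.escape(req.url)}V",
      "snowplow_tv": "%{json.escape(req.http.snowplow-tv)}V"
    }
    ```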

  • Florian Scheel

    Hi Justin,

    Thank you very much for your quick response, that's awesome news!

    Snowplow requests can contain a large number of URL parameters. Is there a limit on the number of headers?

    As Snowplow's URL parameter values can contain JSON objects, is there a character limit for a header's value?

    Or is there an overall limit on the total header payload, especially in conjunction with log streaming requests?

    It wouldn't be a problem if there were certain limits, because the idea is to use log streaming only for a few key metrics, e.g. pageviews, sessions, and visitors. In addition to the log streaming, we would set it up so that the entire URL payload also reaches our upstream server, the Snowplow Collector.

    Best regards, Ian

  • Justin

    Hi Ian

    Yes, there is a limit on the number of headers. We've listed them here. The maximum total number of headers is 96 (including internal ones), so in practice you have 80+ to work with. I don't believe there is a limit on a single header, but there is a 64K limit on the total size of all headers, which is the same as the log line limit.

  • Florian Scheel

    Hi Justin,

    Basic log streaming to BigQuery is working great; now I'd like to set up the headers. Because we will have quite a large number of URL parameters that we want to set as headers, I'd like to use a VCL snippet.

    When we take a request URL like https://...global.ssl.fastly.net/i?...&duid=d0f6acca-c6cb-41c5-a768-8f296bd69d8b&..., we can use [\?|\&]duid\=([^&]+) to get the value of the duid parameter.

    I added your code example from above (set req.http.tv = subfield(req.url.qs, "tv", "&");) to a VCL snippet that runs in the hit subroutine (vcl_hit), but unfortunately the value in BigQuery is (null), which might be a -1 from VCL?

    I also tried this: set req.http.tv = regsub(req.url, "[\?|\&]tv\=([^&]+)", "\1");

    Thanks in advance, Ian

  • Justin

    Hi Ian

    Apologies for the delay in getting back to you here. I just tested the solution end-to-end and it's working for me.

    One difference is that your snippet is in vcl_hit (which won't run on every request) and mine is in vcl_recv.

    If changing it to the above doesn't fix it for you, I'd recommend opening a support ticket so we can look into the specifics of your case, and if there's a wider issue that we need to address I'll swing back here and update this post.
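
    For reference, here's roughly what the body of a recv snippet could look like with a few of the Snowplow parameters pulled out (duid and e are just example parameter names from the tracker protocol; use whichever ones you need):

    ```
    # body of a VCL snippet attached to recv:
    # copy each Snowplow query parameter into its own request header
    set req.http.snowplow-tv   = subfield(req.url.qs, "tv", "&");
    set req.http.snowplow-duid = subfield(req.url.qs, "duid", "&");
    set req.http.snowplow-e    = subfield(req.url.qs, "e", "&");
    ```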

  • Florian Scheel

    Hi Justin,

    Thank you very much for your response and sorry for not updating you on this. I had already moved the code to vcl_recv and it is now working perfectly, we love it!

    Best regards, Ian

  • oaustegard

    Hi Ian, I'd love to learn more about your implementation if you'd be able and willing to share?

    Thanks! Oskar (add @gannett.com to my username for email)
