Filtering out Synthetic Monitoring Traffic from Google Analytics

We have been using automated scripts to monitor website performance and availability. These scripts usually launch a real browser and navigate through pre-defined steps to capture timings and report back at least one checkpoint showing a pass/fail. The issue that recently came up was that the Google Analytics reports showed a few spots on the map with high traffic, and those were a direct result of these scripts running every 15 minutes from 3 locations in the USA. The request came in to find a way to filter out this monitoring traffic so that the reports would reflect real users only.

We explored a few ideas based on what Google Analytics (GA) offered. The first idea was to modify the Accept-Language header to include a test value like zzzz. We tried both adding the value to the header and replacing the entire string with the new value, and neither approach worked: GA just reported the same language, en-us.

We also tried adding a custom header name/value pair. By convention, custom headers start with X- followed by a name of our choosing, so we tried X-synthetic-monitor along with a value such as the name of the script. For some reason, these custom headers could not be viewed in the GA tool.

We then tried a custom cookie name/value pair, hoping it would be passed along to GA. Alas, it was not.

We tried resizing the browser to a known value like 1234×987; however, the JavaScript running on the page was resizing the viewport based on its own algorithms. The viewport size was also changing during the life of the script as the code stepped through a workflow and the JS responded to those inputs. This is how most sites are set up today, so we ended up with a range of values for the browser width and height instead of fixed values we could pick out of the noise. Resizing was therefore not a good candidate for filtering traffic.

We also took a look at filtering by the IP addresses of the machines that run the synthetic monitoring scripts, but we realized those IPs were dynamic and we had no way of knowing what they were at any given time. Even if we looked up the current IP of a monitoring machine, it could change quickly due to a VM migration, a reboot, or even load balancing to another VM. IPv4 addresses are an unreliable way to identify a particular user, even a synthetic one.

Ultimately we settled on blocking the Google Analytics domains completely in the script's runtime settings. GA depends on JavaScript code running in the browser, gathering up data, and then sending that data to Google. If we prevent those requests from being made by blacklisting the domains, the data never makes it to the GA system. This achieved the goal for the team, as they no longer had to tweak the filters and could trust that the scripts we deployed were not included in the analytics data.

These are the domain suffixes we chose to block:
– google-analytics.com
– googletagmanager.com
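
To illustrate what a host-suffix filter does with this list, here is a minimal Python sketch. It is not LoadRunner code; the function name is_blocked and the rule of matching only on a dot boundary are assumptions made for illustration, and the tool's own suffix matching may behave differently.

```python
from urllib.parse import urlsplit

# The two suffixes from the list above.
BLOCKED_SUFFIXES = ("google-analytics.com", "googletagmanager.com")


def is_blocked(url: str, suffixes=BLOCKED_SUFFIXES) -> bool:
    """Return True if the URL's host equals or is a subdomain of a blocked suffix.

    Assumption: matching happens on a dot boundary, so an unrelated host
    that merely ends with the same characters is not blocked.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in suffixes)


if __name__ == "__main__":
    for url in (
        "https://www.google-analytics.com/collect",
        "https://www.googletagmanager.com/gtm.js",
        "https://example.com/index.html",
    ):
        print(url, "->", "blocked" if is_blocked(url) else "allowed")
```

Because the filter works on suffixes, subdomains such as www.google-analytics.com are covered without having to list each host separately.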

The other approaches that we tried would probably work for web servers that the customer controls, especially the one at the origin. We could probably investigate the logs there and see the custom headers and cookies that are being sent along. We chose to leave those in place just in case the team wanted to analyze logs or traffic at other points in the system outside of what Google Analytics shows.
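
As a rough sketch of that kind of analysis, the Python below separates monitor requests from real ones using the custom header. We did not actually run this; the record layout (a dict with a "headers" mapping) and the function names are hypothetical, and it assumes the origin server records request headers at all.

```python
# Header name taken from the script; HTTP header names are case-insensitive.
MONITOR_HEADER = "x-synthetic-monitor"


def is_synthetic(headers: dict) -> bool:
    """True if the request carries the custom monitoring header."""
    return any(name.lower() == MONITOR_HEADER for name in headers)


def split_traffic(requests: list) -> tuple:
    """Partition parsed request records into (real, synthetic) lists."""
    real, synthetic = [], []
    for record in requests:
        target = synthetic if is_synthetic(record["headers"]) else real
        target.append(record)
    return real, synthetic


if __name__ == "__main__":
    # Hypothetical, hand-written records standing in for parsed log entries.
    sample = [
        {"path": "/", "headers": {"Accept-Language": "en-us"}},
        {"path": "/", "headers": {"X-synthetic-monitor": "mymonitoringscript"}},
    ]
    real, synthetic = split_traffic(sample)
    print(len(real), "real,", len(synthetic), "synthetic")
```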

Technical Implementation Details

Tool: LoadRunner 12.60
Protocol: TruClient Web
Domain Blacklisting: Runtime Settings > Download Filters > Exclude only addresses in list
HostSfx: google-analytics.com
HostSfx: googletagmanager.com

Headers

Modifying an existing header and merging in the new value:

truclient_step("1", "Execute Utils.addAutoHeader ( 'Accept-Language' , 'Zzzz' , true )", "snapshot=Action_1.inf");

Adding in a completely new custom header name and value:

truclient_step("2", "Execute Utils.addAutoHeader ( 'X-synthetic-monitor' , 'mymonitoringscript' , false )", "snapshot=Action_2.inf");

Resizing the browser to a specific width and height:

truclient_step("6", "Resize browser window to 1234 px / 987 px", "snapshot=Action_6.inf");


Written by JJ Welch