Filtering out Synthetic Monitoring Traffic from Google Analytics
We use automated scripts to monitor the performance and availability of websites. These scripts typically launch a real browser and navigate through predefined steps, capturing timings and reporting at least one checkpoint as pass or fail. An issue came up recently: the Google Analytics reports showed a few spots on the map with unusually high traffic, and those were a direct result of these scripts running every 15 minutes from 3 locations in the USA. We were asked to find a way to filter out this monitoring traffic so that the reports would reflect real users only.
We explored a few ideas based on what Google Analytics (GA) offers. The first idea was to modify the Accept-Language header to include a test value such as zzzz. We tried both adding the value to the existing header and replacing the entire string with the new value. Neither approach worked: GA still reported the language as en-us. The most likely explanation is that GA's JavaScript tracker takes the language from the browser's own settings rather than from the HTTP request header.
We also tried adding a custom header name/value pair. By convention, custom header names start with X- followed by a name of our choosing, so we tried X-synthetic-monitor with a value such as the name of the script. For some reason, these custom headers could not be viewed in the GA tool.
We then tried a custom cookie name/value pair, hoping it would be passed along to GA. Alas, it was not.
We also looked at filtering by the IP addresses of the machines that run the synthetic monitoring scripts, but those IPs are dynamic and we have no way of knowing what they are at any given time. Even if we looked up the current IP of a monitoring machine, it could change quickly due to a VM migration, a reboot, or load balancing to another VM. IPv4 addresses are an unreliable way to identify a particular user, even a synthetic one.
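What worked in the end was domain blacklisting: the script is told not to download anything from certain domains. These are the domain suffixes we chose to block (blacklist):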
The other approaches we tried would probably still work for web servers that the customer controls, especially the one at the origin. The server logs there should show the custom headers and cookies that the scripts send along. We chose to leave those in place in case the customer wants to analyze logs or traffic at other points in the system, outside of what Google Analytics shows.
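As a rough illustration of the kind of log analysis mentioned above, here is a minimal Python sketch that separates synthetic requests from real ones. It rests on assumptions that are not part of our setup: that the origin server has been configured to write the X-synthetic-monitor value as the last double-quoted field of each access-log line (or "-" when the header is absent), and the function names and sample lines are made up for the example.

```python
import re

# Assumed (hypothetical) log layout: the X-synthetic-monitor value is the
# last double-quoted field on each line, or "-" when the header was absent.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_synthetic(log_line: str) -> bool:
    """Return True if the line carries a non-empty X-synthetic-monitor value."""
    match = LAST_QUOTED_FIELD.search(log_line)
    return bool(match) and match.group(1) not in ("", "-")


def split_traffic(lines):
    """Partition log lines into (real_user_lines, synthetic_lines)."""
    real, synthetic = [], []
    for line in lines:
        (synthetic if is_synthetic(line) else real).append(line)
    return real, synthetic


# Made-up sample lines using documentation-only IP addresses.
sample = [
    '203.0.113.7 "GET / HTTP/1.1" 200 "mymonitoringscript"',
    '198.51.100.4 "GET / HTTP/1.1" 200 "-"',
]
real, synthetic = split_traffic(sample)
```

Matching on a header value rather than on the client IP sidesteps the dynamic-IP problem, since the marker travels with every request the script makes.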
Technical Implementation Details
Tool: LoadRunner 12.60
Protocol: TruClient Web
Domain Blacklisting: Runtime Settings > Download Filters > Exclude only addresses in list
Modifying an existing header and merging in the new value:
truclient_step("1", "Execute Utils.addAutoHeader ( 'Accept-Language' , 'Zzzz' , true )", "snapshot=Action_1.inf");
Adding in a completely new custom header name and value:
truclient_step("2", "Execute Utils.addAutoHeader ( 'X-synthetic-monitor' , 'mymonitoringscript' , false )", "snapshot=Action_2.inf");
Resizing the browser to a specific width and height:
truclient_step("6", "Resize browser window to 1234 px / 987 px", "snapshot=Action_6.inf");
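For readers who want to reason about which hosts a suffix-based exclusion list would catch, the following is a minimal Python sketch of suffix matching. It is not LoadRunner's actual implementation of the Download Filters setting, and the suffix analytics.example is a placeholder, not one of the suffixes we used.

```python
def host_is_blocked(host: str, blocked_suffixes) -> bool:
    """Return True if host equals a blocked suffix or is a subdomain of one."""
    host = host.lower().rstrip(".")
    for suffix in blocked_suffixes:
        suffix = suffix.lower().lstrip(".")
        # Require a dot boundary so "notanalytics.example" is not caught
        # by the suffix "analytics.example".
        if host == suffix or host.endswith("." + suffix):
            return True
    return False


# Placeholder suffix for illustration only (hypothetical, not our real list).
BLOCKED = ["analytics.example"]

blocked_www = host_is_blocked("www.analytics.example", BLOCKED)
blocked_lookalike = host_is_blocked("notanalytics.example", BLOCKED)
```

Checking for a dot boundary, rather than a plain string suffix, keeps look-alike domains from being excluded by accident.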