We have been using automated scripts to monitor web site performance and availability. These scripts usually launch a real browser and navigate through pre-defined steps to capture timings, reporting back at least one checkpoint that shows a pass/fail. The issue that recently came up was that the Google Analytics reports showed a few spots on the map with high traffic, and those were a direct result of these scripts running every 15 minutes from 3 locations in the USA. The request came in to find a way to filter this monitoring traffic so that the reports would reflect real users only and exclude the monitors.
We explored a few ideas based on what Google Analytics (GA) offered. The first idea was to modify the Accept-Language header to include a test value like zzzz. We tried both adding the value to the existing header and replacing the entire string with the new value; neither approach worked, and GA just reported the same language of en-us.
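The two header variants we tried can be sketched with Python's standard library. The URL is a placeholder and the exact language strings are illustrative, not the customer's actual values:

```python
import urllib.request

# Variant 1: append the test value to the existing Accept-Language list.
req_append = urllib.request.Request("https://www.example.com/")
req_append.add_header("Accept-Language", "en-US,en;q=0.9,zzzz;q=0.1")

# Variant 2: replace the entire header with the test value alone.
req_replace = urllib.request.Request("https://www.example.com/")
req_replace.add_header("Accept-Language", "zzzz")
```

In our case, neither variant changed the language that GA reported.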
We also tried adding in a custom header name/value pair. Custom header names conventionally start with X-, followed by a name of our choosing, so we tried X-synthetic-monitor with a value like the name of the script. For some reason, these custom headers could not be viewed in the GA tool.
We then tried a custom cookie name/value pair, hoping those would be passed along to GA, and alas they were not.
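Taken together, the custom-header and custom-cookie tagging attempts look roughly like this (Python standard library; the header name follows the article, while the cookie name and URL are hypothetical):

```python
import urllib.request

# Tag the synthetic request with a custom header and a custom cookie so it
# can, in principle, be distinguished from real-user traffic downstream.
req = urllib.request.Request("https://www.example.com/")
req.add_header("X-synthetic-monitor", "mymonitoringscript")
req.add_header("Cookie", "synthetic_monitor=mymonitoringscript")
```

GA surfaced neither the header nor the cookie, but both still travel with every request, which is why log analysis at the server remains an option.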
We also took a look at filtering by the IP addresses of the machines that run the synthetic monitoring scripts, but we realized those IPs were dynamic and we had no way of knowing what they were at any given time. Even if we looked up the current IP of a monitoring machine, it could change quickly due to a VM migration, a reboot, or even load balancing to another VM. IPv4 addresses are an unreliable way to identify a particular user, even a synthetic one.
These are the domain suffixes we chose to block, or blacklist:
The other approaches that we tried would probably work for web servers that the customer controls, especially the one at the origin. We could investigate those logs and see the custom headers and cookies being sent along. We chose to leave them in place just in case they wanted to analyze logs or traffic at other points in the system, outside of what Google Analytics shows.
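As a sketch of that log-analysis idea: if the origin server is configured to write the custom header into its access log, filtering the monitors out is straightforward. The log format and field position here are assumptions, not the customer's actual configuration:

```python
# Hypothetical access-log lines: "<client-ip> <path> <x-synthetic-monitor>",
# where servers conventionally log "-" when a header is absent.
sample_log = [
    "203.0.113.7 /checkout -",
    "198.51.100.2 /home mymonitoringscript",
]

def is_synthetic(line: str) -> bool:
    # The monitoring script sends its name in the header; real users do not.
    return line.rsplit(" ", 1)[-1] != "-"

# Keep only real-user traffic for analysis.
real_traffic = [line for line in sample_log if not is_synthetic(line)]
```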
Tool: LoadRunner 12.60
Protocol: TruClient Web
Domain Blacklisting: Runtime Settings > Download Filters > Exclude only addresses in list
Modifying an existing header and merging in the new value:
truclient_step("1", "Execute Utils.addAutoHeader ( 'Accept-Language' , 'Zzzz' , true )", "snapshot=Action_1.inf");
Adding in a completely new custom header name and value:
truclient_step("2", "Execute Utils.addAutoHeader ( 'X-synthetic-monitor' , 'mymonitoringscript' , false )", "snapshot=Action_2.inf");
Resizing the browser to a specific width and height:
truclient_step("6", "Resize browser window to 1234 px / 987 px", "snapshot=Action_6.inf");
Excellent software has been produced by development teams that had total control over what tools they used and also by teams that had their tools mandated by management. On the other hand, poor software has also been delivered by both approaches.
Here are some thoughts to consider as you decide how your tooling choices are made.
Development teams are more autonomous and have more tool choice
The teams are less autonomous and have less tool choice
Who is responsible for the maintenance of the code, the Site Reliability Engineers or the development team?
The team responsible for the viability of the tools should have influence.
Who pays for the tool and its administrative upkeep?
The group fiscally responsible for the tool should have influence.
How well does a particular tool integrate with your existing tooling ecosystem?
Tools that integrate more easily and completely should carry more weight than those that don’t.
Does a tool have a killer feature that your team MUST have?
This should be considered and weighted appropriately.
How comfortable are you with staff turnover?
If you have a well-defined requisition that is easily filled, then you’re in a position to accommodate a higher turnover rate than if you are trying to replace a specialist. In other words, let specialists have more influence than less specialized roles.
The easier it is to collect the data required for your quality and delivery KPIs, the less time is spent on normalization, which reduces your data’s time to market.
Normalization is typically a highly manual effort. Reducing the amount of manual intervention increases the accuracy of your governance metrics.
By applying control at the point where you perform evaluations, you provide the maximum opportunity for individual tool choice.
A happy developer makes useful software and happy users. We don’t want your engineers feeling like Bill Parcells.
There will be a proper balance of accuracy, time to market, and staffing for your situation.
Your business has constructed a variety of application environments designed to pursue and expand your mission-critical business plans and gain market share. Over time, security concerns, maintenance, and user convenience steered you toward building a unified, shared authentication infrastructure, such as Identity and Access Management (IAM).
In this situation your financials system uses IAM, as do your time recording, expense, sales tracking, ticketing, and development systems. They all tie in to a single system for authentication and datacenter asset access control. This has saved your business countless hours in maintenance, unified your account management efforts, and put a singular, controllable, and effective system in place to manage this part of your business. So what’s wrong? Doesn’t this solve your problems?
Foulk Consulting can help you identify and avoid the type of problems that the following case study highlights. Here’s how we did it with a high-profile Internet E-Retailer.
An E-Retailer built multiple external, customer-facing applications and invested a lot of time and effort in performance testing them all. The testing goals were clear and refined: they were based on each application’s usage model during the preceding peak season, combined with other business requirements and usage projections that had to be met for the upcoming peak season. In the course of these performance testing efforts, each test was focused on a specific customer-facing application, testing and validating that application’s ability to sustain the expected load and perform within its defined success criteria.
Through their hard work they built a case internally that they were very well prepared for the upcoming peak season. However, when the peak season hit, the reality of the situation became clear: something was terribly wrong, and the systems could not handle the actual peak customer loads.
The IAM infrastructure was tested with each application independently and it performed as expected.
During their annual peak event, user sessions were dropped at an unexpected and unacceptable level. Most new users could not even register accounts, nor could most returning customers log in to make their purchases. The failures to convert shopping activity into sales caused them to miss their projected revenue numbers by a large margin; in fact, they faced crippling losses for the year. It is a story that has played out too many times in recent years.
So how did this happen? Didn’t we say that they validated this load before the peak season? It sure seems that they did their due diligence and stressed the machines beyond the traffic levels that they expected, which were even higher than the year before. How is that possible? Did they test the wrong things? No. Did they fail to test dependent subsystems to the right levels? Not exactly. How can you create good tests, find defects and configuration problems, resolve them, validate your efforts, demonstrate that your testing exceeded the loads you were expecting, and still watch your systems fail under real user load?
As a result of this missed business opportunity and the public hit to customer confidence, the business hired Foulk Consulting to analyze the planning and testing leading up to the event, as well as the forensic data from the event itself. We came in and worked with all the involved teams: we reviewed their various testing practices, their planning and assumptions, and dove into the peak load logs, analytics, and event data that had been collected. Our goal was not only to illuminate where the problem originated and how it manifested itself, but also to help this valued customer devise an approach to prevent it from ever happening again, solving the actual reasons they were losing customers and market share and following up on all the hard work they had done before.
After thorough investigation and analysis, what emerged from the piles of data surprised everyone in the organization, despite each application having been tested thoroughly and shown to meet its performance objectives, maintaining stability and performance under load.
The root cause was a complete lack of preparation on the IAM tier and in the dependent login integrations across the various applications. When real customers loaded multiple applications at the same time during peak usage, it created a classic bottleneck: the request demand overloaded the system’s ability to respond in a performant manner.
When each application was tested in isolation, the IAM tier was capable of handling that application load without any problems. When the actual peak production load hit all the applications at the same time, however, the IAM infrastructure became a complete bottleneck and customers could not interact with the systems. This was found to be the root of the peak usage collapse.
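The arithmetic behind this failure mode is worth spelling out. The numbers below are purely illustrative, not the retailer's actual figures, but they show how every isolated test can pass while the combined peak still saturates the shared tier:

```python
# Hypothetical IAM throughput limit (logins/sec).
iam_capacity = 500

# Hypothetical per-application peak login rates, each validated in isolation.
app_peak_rates = {
    "storefront": 300,
    "accounts": 150,
    "ticketing": 120,
    "sales": 90,
}

# Each isolated test stays under the IAM tier's capacity...
each_passes_alone = all(rate <= iam_capacity for rate in app_peak_rates.values())

# ...but the real peak presents the shared tier with the SUM of the demand.
combined_peak = sum(app_peak_rates.values())
iam_overloaded = combined_peak > iam_capacity
```

In this toy model no single application demands more than 60% of IAM capacity on its own, yet the combined peak is 132% of capacity: a bottleneck that no per-application test could reveal.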
Finding the root cause was only part of the puzzle. Foulk then took that information and applied a performance engineering approach to testing that would ensure that the business was able to identify, isolate and test the systems that demanded attention. The plan was designed to focus the customer’s efforts and identify exactly who was responsible for resolving the issues during their next preparation and deployment cycle.
Ultimately we were able to work with the various application teams and construct a performance test that stressed the IAM tier realistically, reproducing the production peak load failure model.
Armed with this repeatable test structure, the customer was then able to work with the application support, operations, and vendor teams to work through the issues, build out the IAM infrastructure to pass the load test criteria, and remove the significant production performance bottleneck in the IAM tier.
Foulk Consulting is an IT solutions provider specializing in delivering consulting services geared toward optimizing your business.
Foulk Consulting Services Inc.
38777 W 6 Mile Rd
Livonia, MI, 48152