How Much Influence Should Developers Have Over the Tools They Use?

Overview

Excellent software has been produced by development teams that had total control over what tools they used and also by teams that had their tools mandated by management.  On the other hand, poor software has also been delivered by both approaches.

Here are some thoughts to consider as you decide how your tooling choices are made.


Pros

Development teams are more autonomous and have more tool choice

  • The team becomes more invested in the process of building code.
  • The more invested your teams are, the better their product should be.
  • This reduces the need for a systems administration team and the number of employees in it.

The teams are less autonomous and have less tool choice

  • Fewer tools to rely on for producing output means less variance to consider in effort projections.
  • Fewer pockets of tooling knowledge reduce knowledge fracturing.
  • Less license sprawl reduces cost.
  • Teams that use the same tools can cross-pollinate the lessons they learn.

Cons

Development teams are more autonomous and have more tool choice

  • Tool sprawl, which leads to consolidated pockets of tool-centric knowledge and devalues remediation groups such as SREs.
  • Lack of standardization in the development process, which makes the work hard to untangle when remediation is done by groups that were not involved in the initial development.
  • An individual developer becomes a chokepoint, should they become unavailable.

The teams are less autonomous and have less tool choice

  • There may be better tools to use.
  • Your team may feel devalued and lose interest.

Additional Influencers and Considerations

Who is responsible for the maintenance of the code, the Site Reliability Engineers or the development team?

The team responsible for the viability of the tools should have influence.

Who pays for the tool and its administrative upkeep?

The group fiscally responsible for the tool should have influence.

How well does a particular tool integrate with your existing tooling ecosystem?

Tools that integrate more easily and more completely should carry more weight than those that don’t.

Does a tool have a killer feature that your team MUST have?

This should be considered and weighted appropriately.

How comfortable are you with staff turnover?

If you have a well-defined requisition that is easily filled, then you’re in a position to accommodate a higher turnover rate than if you are trying to replace a specialist.  In other words, let specialists have more influence than less specialized roles.
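One way to make "weighted appropriately" concrete is a simple weighted scoring matrix over the considerations above. The sketch below is only an illustration: the criteria names, the weights, the 1 to 5 scores and the two candidate tools are all hypothetical, and your own weights should reflect who maintains the tool, who pays for it, and how specialized your staff is.

```python
# Hypothetical weights for the considerations discussed above (they sum to 1.0).
WEIGHTS = {
    "maintainer_preference": 0.25,    # team responsible for the tool's viability
    "budget_owner_preference": 0.20,  # group that pays for the tool and its upkeep
    "integration_fit": 0.25,          # how well it fits the existing tooling ecosystem
    "killer_feature": 0.20,           # a feature the team MUST have
    "specialist_preference": 0.10,    # extra influence for hard-to-replace roles
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted average of the 1-5 scores for one candidate tool."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical candidates, each scored 1 (poor) to 5 (excellent) per consideration.
candidates = {
    "tool_a": {
        "maintainer_preference": 4,
        "budget_owner_preference": 3,
        "integration_fit": 5,
        "killer_feature": 2,
        "specialist_preference": 3,
    },
    "tool_b": {
        "maintainer_preference": 3,
        "budget_owner_preference": 4,
        "integration_fit": 2,
        "killer_feature": 5,
        "specialist_preference": 4,
    },
}

# Rank the candidates from highest to lowest weighted score.
ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)

for name in ranked:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

The point of the exercise is less the final number than the conversation it forces: each group with a stake in the tool has to agree on the weights before anyone argues about the scores.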


The easier it is to collect the data required for your quality and delivery KPIs, the less time is spent on normalization, which reduces your data’s time to market.

Normalization is typically a highly manual effort.  Reducing the amount of manual intervention increases the accuracy of your governance metrics.

Applying control at the point where you perform evaluations leaves the maximum room for individual tool choice.

A happy developer makes useful software and happy users.  We don’t want your engineers feeling like Bill Parcells.

There will be a proper balance of accuracy, time to market, and staffing for your situation.

Shared Application Infrastructure Can Be A Critical Performance Constraint and Blindspot To Your Business

The Scenario

Your business has constructed a variety of application environments designed to pursue and expand your mission-critical business plans and gain market share. Over time, security concerns, maintenance and user convenience steered you to building a unified, shared authentication infrastructure such as Identity and Access Management (IAM). 

In this situation your financials system uses IAM, as do your time recording, expense, sales tracking, ticketing and development systems. They all tie in to a single system for authentication and datacenter asset access control. This has saved your business countless hours in maintenance, unified your account management efforts, and put a single, controllable and effective system in place to manage this part of your business. So what’s wrong? Doesn’t this solve your problems?

Foulk Consulting can help you identify and avoid the type of problems that the following case study highlights. Here’s how we did it with a high-profile Internet E-Retailer.

Case Study: E-Retailer Preparing for Peak Usage Season

An E-Retailer built multiple external, customer-facing applications and invested a lot of time and effort in performance testing all of them. The testing goals were clear and refined: they were based on each application’s usage model during the preceding peak season, combined with other business requirements and the usage projections that had to be met for the upcoming peak season. Each test focused on a specific customer-facing application, validating that application’s ability to sustain the expected load and perform within its defined success criteria.

Through their hard work they built a case internally that they were very well prepared for the upcoming peak season. However, when the peak season hit, it became clear that something was terribly wrong: the systems could not handle the actual peak customer loads.

The IAM infrastructure was tested with each application independently and it performed as expected.  

During their annual peak event, user sessions were dropped at an unexpected and unacceptable level. Most new users could not even register accounts, nor could most returning customers log in to make their purchases. The failure to convert shopping activity into sales caused them to miss their projected revenue numbers by a large margin; in fact, they faced crippling losses for the year. It is a story that has played out too many times in recent years.

So how did this happen? Didn’t we say that they validated this load before the peak season? It sure seems that they did their due diligence and stressed the machines beyond the traffic levels that they expected, which were even higher than the year before. How is that possible? Did they test the wrong things? No. Did they fail to test dependent subsystems to the right levels? Not exactly. How can you create good tests, find defects and configuration problems, resolve them, validate your efforts, demonstrate that your testing exceeded the loads you were expecting, and still watch your systems fail under real user load? 

As a result of this missed business opportunity and the public hit to customer confidence, Foulk Consulting was contacted to assist in analyzing the situation. We came in and worked with all the involved teams. We reviewed their testing practices, planning and assumptions. We dove into the peak load logs, the analytics and the event data that had been collected to find the keys to how this situation manifested itself and, more importantly, to help this valued customer devise an approach to prevent this sort of thing from ever happening again.

Critical Analysis & Planning

Coming out of the peak season with this black eye, the business hired Foulk Consulting to assist them in analyzing the planning and testing leading up to the event as well as the forensic data from the event itself. Our goal was not only to illuminate where the problem originated and how it manifested itself, but also to show how they could prevent it in the future: to help them solve the actual reasons they were losing customers and market share, building on all the hard work they had done before.

After thorough investigation and analysis, what emerged from the piles of data surprised everyone in the organization, because each application had been tested thoroughly and had proven that it met its performance objectives and remained stable under load.

The root cause appeared to be a complete lack of preparation on the IAM tier and on the dependent login integrations across the various applications. When real customers loaded multiple applications at the same time during peak usage, it created a classic bottleneck: the request demand overwhelmed the system’s ability to respond in a performant manner.

 

When each application was tested in isolation, the IAM tier was capable of handling that application load without any problems. When the actual peak production load hit all the applications at the same time, however, the IAM infrastructure became a complete bottleneck and customers could not interact with the systems. This was found to be the root of the peak usage collapse. 
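The arithmetic behind this kind of blind spot can be sketched in a few lines. The example below is a simplified model, not the customer's data: the application names, the per-application login rates and the IAM capacity figure are all hypothetical, and it assumes the IAM tier simply has a fixed ceiling on logins per second.

```python
def iam_utilization(per_app_logins_per_sec, iam_capacity_per_sec):
    """Return (utilization with each app tested alone, utilization with all apps at once).

    A utilization above 1.0 means the shared IAM tier is asked for more
    logins per second than it can serve.
    """
    isolated = {
        app: rate / iam_capacity_per_sec
        for app, rate in per_app_logins_per_sec.items()
    }
    combined = sum(per_app_logins_per_sec.values()) / iam_capacity_per_sec
    return isolated, combined


# Hypothetical peak login rates per application (logins per second).
peak_logins = {"storefront": 400, "mobile_api": 350, "gift_registry": 300}

# Hypothetical ceiling of the shared IAM tier (logins per second).
IAM_CAPACITY = 500

isolated, combined = iam_utilization(peak_logins, IAM_CAPACITY)

# Every isolated test stays under the ceiling, so every test "passes".
passes_in_isolation = all(u <= 1.0 for u in isolated.values())

# The same traffic arriving together exceeds the ceiling.
passes_combined = combined <= 1.0

for app, u in isolated.items():
    print(f"{app} alone: {u:.0%} of IAM capacity")
print(f"all applications together: {combined:.0%} of IAM capacity")
```

With these made-up numbers each application uses at most 80% of the IAM tier on its own, yet the three together demand 210% of it. Every isolated test can be green while the shared tier is more than twice oversubscribed.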

Finding the root cause was only part of the puzzle. Foulk then took that information and applied a performance engineering approach to testing that would ensure the business could identify, isolate and test the systems that demanded attention. The plan was designed to focus the customer’s efforts and identify exactly who was responsible for resolving the issues during their next preparation and deployment cycle.

Ultimately we were able to work with the various application teams and construct a performance test that stressed the IAM tier realistically, reproducing the production peak load failure model.

Armed with this repeatable test structure, the customer was able to work with the application support, operations and vendor teams to resolve the issues, build out the IAM infrastructure until it passed the load test criteria, and remove the significant production performance bottleneck in the IAM tier.