The Human Side of Observability: Why Socializing Alerts Matters More Than Writing Them

By Nate Rich, Principal Engineer at Foulk Consulting

In my last post, I touched on the lifecycle of a high-performing observability strategy. We often spend 90% of our time on the technical “how”: perfecting queries, fine-tuning thresholds, and tweaking dashboard elements. But if there’s one thing I’ve learned at Foulk Consulting, it’s that the most sophisticated alert in the world is a failure if the person receiving it doesn’t know why it’s firing or how to respond to it.

We need to stop talking about “writing” alerts and start talking about socializing them.

The technical syntax of an alert is easy; the social contract behind it is hard. If you want to reduce on-call anxiety and ensure your observability stack actually improves reliability, you have to move the conversation from the IDE to the conference room (or the Zoom call).

Here is how we move observability from a technical task to a human-centric practice.

The “Silent Killer” of DevOps: Alert Anxiety

We’ve all been there: It’s 3:00 AM, the pager goes off, and you stare at a notification that says High CPU Usage – Service A. Your heart rate spikes, not because the CPU is high, but because you have no idea whether this is a “the building is on fire” situation or a “this happens every Tuesday during a cron job” situation.

This is the result of alerts written in a vacuum. When SREs or Platform Engineers write alerts for developers without collaboration, they aren’t just sending data; they’re sending stress.

Strategy 1: The Alert Review Session (The “Pre-Flight” Check)

Before any service hits production, I advocate for a formal Alert Review Session. This isn’t a code review; it’s a “context review.”

Gather the developers who wrote the code and the engineers who will support it. For every alert, ask three questions:

  1. Is this actionable? If the responder’s first step is “check the logs to see if this is real,” the alert is noisy.
  2. What is the “So What?” If this alert fires, what is the specific user impact? If we can’t define the impact, it’s a metric, not an alert.
  3. Is there a Runbook? Does the alert link to a living document that tells the responder what to do next?

The Goal: Buy-in. When a developer agrees that a specific threshold represents a failure, they aren’t just “on-call” anymore; they are owners of a standard they helped set.
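
If you want these three questions to be more than a meeting ritual, you can encode them as a lint over your alert definitions. Here is a minimal sketch in Python; the AlertDefinition fields and the review checks are illustrative assumptions, not the API of any particular tool:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertDefinition:
    """Hypothetical alert spec; adapt the fields to your own tooling."""
    name: str
    query: str                   # the alerting expression, e.g. a PromQL string
    user_impact: Optional[str]   # the answer to "So what?" -- None means undefined
    runbook_url: Optional[str]   # link to a living runbook, if one exists
    first_action: Optional[str]  # the responder's concrete first step


def review(alert: AlertDefinition) -> list[str]:
    """Return the review questions this alert fails, if any."""
    problems = []
    if not alert.first_action or "check the logs" in alert.first_action.lower():
        problems.append("Not actionable: responders have no concrete first step.")
    if not alert.user_impact:
        problems.append("No 'So what?': undefined impact makes this a metric, not an alert.")
    if not alert.runbook_url:
        problems.append("No runbook: the alert doesn't link to a living document.")
    return problems


if __name__ == "__main__":
    candidate = AlertDefinition(
        name="HighCPUUsage-ServiceA",
        query='avg(cpu_usage{service="a"}) > 0.9',
        user_impact=None,
        runbook_url=None,
        first_action="Check the logs to see if this is real",
    )
    for problem in review(candidate):
        print(f"[{candidate.name}] {problem}")
```

Running a check like this in CI means no alert reaches the review session, let alone production, without the context the responder will need at 3:00 AM.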

Strategy 2: The Game Day (Testing the Human, Not the Code)

If an Alert Review is the theory, a Game Day is the practice. At Foulk, we encourage teams to run controlled “Chaos Engineering” experiments in a staging environment specifically to test the observability of the system.

During a Game Day, you intentionally break a component and watch the dashboards.

  • Did the alert fire?
  • Did it go to the right person?
  • Did the person feel empowered to act, or were they confused by the terminology?

Game Days turn “on-call anxiety” into “muscle memory.” They let the team fail in a safe environment where the stakes are zero. By the time the code hits production, the team has already “lived” through the failure once. The fear of the unknown is gone.
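
The mechanical half of a Game Day can even be scripted: break the thing, then verify the alert actually fired before you get to the human questions. Here is a minimal sketch; inject_failure and alert_fired are placeholder hooks you would wire to your own chaos tooling and alerting backend, not real APIs:

```python
import time


def inject_failure(component: str) -> None:
    """Placeholder: e.g. kill a staging pod, block a dependency, fill a disk."""
    print(f"[game day] breaking {component} in staging...")


def alert_fired(alert_name: str) -> bool:
    """Placeholder: e.g. poll your alert manager for an active alert by name."""
    return False  # replace with a real check against your backend


def run_game_day(component: str, alert_name: str, timeout_s: int = 300) -> None:
    inject_failure(component)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if alert_fired(alert_name):
            print(f"[game day] {alert_name} fired -- now ask the human questions:")
            print("  did it page the right person, and did they know what to do?")
            return
        time.sleep(10)
    print(f"[game day] {alert_name} never fired within {timeout_s}s.")


if __name__ == "__main__":
    run_game_day(component="service-a", alert_name="HighCPUUsage-ServiceA")
```

Note what the script can and cannot test: it proves the alert fired and routed, but only a human in the room can tell you whether the person paged felt empowered to act. That conversation is the real deliverable of the exercise.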

Moving from “Monitoring” to “Socializing”

Socializing alerts changes the culture of an organization in three specific ways:

  1. Reduced Burnout: People don’t quit because they are on-call; they quit because they feel helpless while on-call. Socialized alerts provide clarity, which reduces fatigue.
  2. Higher Signal-to-Noise Ratio: When teams have to “defend” the alerts they write in a group setting, they tend to delete the “just in case” alerts that clutter Slack channels.
  3. True Shared Responsibility: Observability stops being a “task” handled by the Ops team and becomes a shared language between Dev and Ops.

The Bottom Line

At the end of the day, observability isn’t about the data in your backend; it’s about the mental model in your engineers’ heads.

If you spend all your time writing alerts and zero time socializing them, you aren’t building a resilient system; you’re just building a loud one. Let’s start focusing on the human on the other side of the pager.

Want to learn more about how to run an effective Game Day or audit your current alerting strategy? Contact us today—we help teams bridge the gap between complex systems and the people who run them.
