How to fix your event data like a grownup

Opinion

Klara Arnalds, Product Designer

February 24, 2025

Applying data mesh principles to achieve data quality at scale.

This post is adapted from my talk at Malmö Measurecamp in January 2025. 

To build a scalable data culture, each team needs to be empowered to run with their own data without data governance bottlenecks. This calls for systems and workflows that enable everyone to define and deliver consistent, high quality data.

We at Avo have been obsessing over this entire process for over a decade. Beyond building tools, we’ve written extensively about how to leverage frameworks like data mesh principles and data contracts to build a data culture that scales.

We are also very lucky to have incredibly engaged customers, who let us dig into their culture and processes and understand what it takes to get great event data. Through this, we’ve seen some clear patterns in their journey from reactive damage control to proactive data management.

So I’d love to share what this journey actually looks like – to help you meet your own goals for event data quality. Let’s dig in.

The stages of data culture maturity

A mature data culture rarely falls from the sky. Most companies reach it through trial and error, so understanding the stages along the way can help you accelerate your own journey. Roughly, the journey breaks down into four stages:

  • Winging it
  • The Wild West
  • Centralized Governance
  • Self-Serve Governance

Winging it

Typically when companies are starting out, they don’t collect data in a structured way. They might be relying on qualitative insights, intuition, and experience. If you’re reading this post, I’ll bet you a cupcake you’re not in “Winging it”, so let’s move on.

The Wild West

As the user base grows, and with it the company’s complexity, so does the appetite for more reliable data. At this point there is no formalized governance in place; each product team (hopefully) ships the events they need to meet their own data needs. We call this stage the Wild West because of two defining characteristics:

  • There are no rules.
  • Everyone is out for themselves.

Here’s how this may play out.

You might have a Product team shipping a new feature on a tight deadline. The Product Manager has some concept of the purpose of the feature, but the process for translating that purpose into actual metrics and events is pretty informal. At the PM’s request, a developer implements some form of tracking for the feature. There’s probably no standardized process for validation either; the tracking is simply shipped.

Reactive damage control in the wild west

The Wild West workflow is fine in the context of one isolated feature for one single team. But once you need to do some sort of central analysis that incorporates multiple feature releases, from multiple teams, the real damage surfaces.

If some poor soul is tasked with doing a cross company analysis, they must first assess what data already exists. This is hard (remember—there’s no formalized way to define or document how the feature goal was converted into analytics events).

They probably discover that the data is somehow incomplete or broken. So they need to reach out to the product team to get the data fixed, and that request has to compete for developer time with everything else the product team has on their plate. 

And it gets worse. Let’s not forget that we’re talking about company-wide requests. So it isn’t just data from one team we’re talking about, but probably several. All of whom have no visibility into what data structures already exist, and who all have their own ways to define and track common features.

That means the analyst must negotiate this priority across multiple teams. And the data that does exist isn’t consistent. There are likely seven events lying around for the same thing that need to be reconciled through downstream transformations. Or maybe it’s all lumped together in some kind of Frankenstein SQL query.
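To make that concrete, here’s a minimal sketch of the kind of downstream cleanup an analyst ends up writing when several teams track the same user action under different names. The event names here are hypothetical, not taken from any real tracking plan:

```python
# Hypothetical cleanup an analyst writes when multiple teams track the
# same user action (signing up) under different event names.

# Each team shipped its own version of "user signed up".
CANONICAL_NAMES = {
    "signup": "signup_completed",
    "sign_up": "signup_completed",
    "SignUpCompleted": "signup_completed",
    "user_registered": "signup_completed",
}

def normalize_event(event: dict) -> dict:
    """Map a raw event onto its canonical name, leaving unknown events as-is."""
    name = event.get("event_name", "")
    return {**event, "event_name": CANONICAL_NAMES.get(name, name)}

raw_events = [
    {"event_name": "sign_up", "user_id": "u1"},
    {"event_name": "SignUpCompleted", "user_id": "u2"},
    {"event_name": "page_view", "user_id": "u3"},
]

cleaned = [normalize_event(e) for e in raw_events]
```

Every such mapping is pure overhead: it exists only because the teams never agreed on a shared definition up front.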

If that sounds like a nightmare, it’s because it is. And it’s extremely inefficient. Aside from painful processes for analysts, there’s little chance for the organization to leverage data for exciting use cases at scale, like building GenAI products, personalization, or advanced analysis for business decisions.

So that brings us to the next step, where a data team attempts to tame the chaos via a centralized function. 

Where most of us live: Centralized governance

The fact that you’re reading this and you’re even ready to have a conversation about improving data quality means you’re likely at least at this point: in the Centralized Governance phase. If so, congrats—you’re making concerted efforts to ensure accurate, reliable data. That’s a good thing. 

What sets you apart from the Wild-Westers is that you’ve added two formal processes before new data structures get released: defining metrics and defining events. (Remember how Wild West rogues would skip these stages).

In other words, you take on the task of liaising with the team who requests new tracking, align with them on what they need, and hand off to developers to implement the tracking code. 

Nothing gets past the central team. This is great as a means to ensure high quality data. But it’s slow, and in reality those alignment sessions are tiresome. The data team spends most of their time in these tedious review stages and playing catch up on the context of new tracking requests, which isn’t fun or effective. 

What if there were a way to recreate the autonomy of the Wild West stage without compromising the quality of centralized governance?

Best of both worlds: Self-Serve governance

The last stage on the maturity journey–the holy grail, if you will–is self-serve governance. Good data, fast. 

In short: teams can run with their own data (fast) but there are guardrails and other measures in place to ensure quality (good). This is ideal for everyone involved. 

There are a few main reasons why you want to aim for this stage:

  • It scales; you can onboard more and more requests without exhausting a central data team.
  • It allows teams across the organization to ship data faster, without sacrificing data quality. 
  • It results in more people being empowered to work with the data because they took part in creating it. They now understand how the sausage is made and, wouldn’t you know it, care more about delivering high quality data.
    • This uplift in data quality means data teams (and the whole company) can leverage their data in more impactful ways, faster.

How to get there

I won’t make an already long post even longer by going into depth here on how to achieve self-serve governance.

But here’s the TL;DR: you’re going to need to apply data mesh principles to your data governance, and to enforce them with some kind of data contract. 

Those guardrails I mentioned earlier are one example. It boils down to this: Put systems in place that enable domains to run with their own data in a way that's scalable and consistent with the whole organization.
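One way such a guardrail might look in practice is a lightweight data-contract check: before an event ships, it is validated against a schema the owning domain has registered. The schema shape and event names below are hypothetical, a sketch of the idea rather than any particular implementation:

```python
# A minimal, hypothetical data-contract guardrail: domains register event
# schemas, and any event that doesn't match its contract is flagged
# before it ships downstream.

SCHEMA_REGISTRY = {
    "signup_completed": {
        "required": {"user_id": str, "plan": str},
        "optional": {"referrer": str},
    },
}

def validate_event(name: str, properties: dict) -> list[str]:
    """Return a list of contract violations (empty means the event passes)."""
    schema = SCHEMA_REGISTRY.get(name)
    if schema is None:
        return [f"unknown event '{name}': no contract registered"]
    errors = []
    for prop, prop_type in schema["required"].items():
        if prop not in properties:
            errors.append(f"missing required property '{prop}'")
        elif not isinstance(properties[prop], prop_type):
            errors.append(f"'{prop}' should be {prop_type.__name__}")
    allowed = set(schema["required"]) | set(schema["optional"])
    for prop in properties:
        if prop not in allowed:
            errors.append(f"unexpected property '{prop}'")
    return errors

ok = validate_event("signup_completed", {"user_id": "u1", "plan": "pro"})
bad = validate_event("signup_completed", {"user_id": "u1", "tier": "pro"})
```

The point isn’t this particular code; it’s that the check runs automatically, so a product team can ship new tracking without waiting on a central reviewer while the organization still gets consistent data.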

Pictured as a flow, the process isn’t dramatically different from the centralized governance model described above. The key difference is that systems, not humans, are doing the work. And that’s huge.

The recipe for self-serve governance 

If you’re ready to break out of the centralized governance plateau and run towards a better data culture, applying the recipe for self-serve governance should be your first port of call.

We will cover these in greater depth in future content, but for now, we recommend familiarizing yourself with the recipe. In particular, get acquainted with the ingredients you need to make self-serve governance possible and the secret sauce that makes it reality.

If the key ingredients seem familiar, it’s because they directly correspond to data mesh principles. 

Yes, that’s right—we’re advocates for data mesh principles as a framework for better data governance. Just as data mesh can unlock decentralized data management, it can also unlock a path to self-serve governance, lessening the burden on a central team to wade through data structure reviews.

In the meantime, if you’re on this journey and found any of this interesting or helpful, I’d love to hear from you. What challenges are you facing as you transition to a higher maturity stage? How can I help you get there? Let me know!

