You’re so vain (you’re so vain), I bet you think this metric is about you. Don’t you? Don’t you?
What are vanity metrics?
Vanity metrics make us feel good but don’t help us do better work or make better decisions. Vanity metrics put optics before rigor, learning, and transparency. The metric and/or an outcome is heralded as a win, but things don’t add up. Most of the time, it boils down to a lack of experience with data storytelling, selecting meaningful KPIs, and communicating outcomes. In some cases, vanity metrics are the only metrics available.
But everyone, at some point, has been lured in by good news and has let their guard down.
"Hey everyone, check out the unique user count from yesterday!"
"Hey everyone, look at registration for the event!"
It’s easy to criticize vanity metrics, but we’ve all been there.
In this post, I will describe three common problems that lead us to vanity metrics. Then I will share The Vanity Metric Test, a way to review metrics and know if you are veering into vanity metric territory. If you’re short on time, skip ahead to The Vanity Metric Test section.
Vanity metric problems
In chatting with teams about vanity metrics, I’ve noticed three fundamental problems.
- Vanity metrics lack context.
- Vanity metrics have unclear intent.
- Vanity metrics do not guide action and learning.
Problem 1: Vanity metrics lack context
First, we have the problem of missing context. Page Views, Daily Active Users, and Sign-Ups mean something but aren’t very helpful in isolation. The problems arise when we communicate these metrics without referencing the bigger picture. It’s not what we say but what we don’t say: “compared to,” “as an input into,” “balanced by,” “an early signal of,” “part of the…,” “as a ratio of,” “with the following caveats,” etc.
Missing context impacts everyone:
- Marketing: There are many ways to boost content views in the short term. It is much harder to create a piece of evergreen content that attracts potential buyers for weeks and years. Getting a boost of initial traffic is a positive early signal, but it needs a footnote.
- Sales: Hitting a quarterly sales goal is a huge accomplishment. It is noteworthy for a variety of reasons. But how did the team hit the goal? Did they bend on pricing? Did they pull deals forward from next quarter? Did they rob Peter to pay Paul? More context is required (e.g., comparing pricing to prior quarters).
- Product: Launching a new feature is a huge milestone. Early feature adoption metrics are a positive signal. But customers aren’t necessarily using the feature. They may just be trying it. In fact, all of the in-app pop-ups suggesting people try the feature may be inflating curiosity clicks. Trying the feature is only an input into the probability of longer-term use.
Other examples of potentially missing context: Average purchases are up, but so are order returns. Conversions are up from ads that don’t speak to your value proposition. One channel is cannibalizing another channel. The app is easier for new users but harder for experienced users. Time spent in the app is up, but your goal is to save people time. People are querying the data more, but that’s because they are having trouble understanding the results. Customers are more active in the app, but they’ve shifted to wasting time instead of valuable networking.
Note how in each of these examples, context is everything. The lack of counterbalancing information makes it hard to make sense of the big picture and where the metric fits.
In addition to the surrounding context, we need to ensure people understand the Why.
Problem 2: Vanity metrics have unclear intent
Second, we have confusion about the intent of the metric. The definition of the metric may be explicit, but what we are trying to measure is unclear. A classic example here is Return Visits. Did I return to the product because I liked the product? Or because the product was hard to use, and I needed to take a break? Or because I needed customer service’s help?
Many classic web “engagement” metrics like Page Views, Time on Page, and Average Session Duration are remnants of a pre-mobile, pre-device-swapping, pre-30-browser-tab, pre-single-page-app era. They were the best proxies for engagement and value exchange available at the time, but aren’t the best measures we have available now.
The connection between what we are attempting to measure and the “proxy” we’ve chosen is extremely clear with some metrics. Or so we think! For example, I tell a friend that I was able to sleep eight hours last night. My friend interprets my intent as, “John is trying to communicate that he had a good night of sleep.”
But hours of sleep is only one of many variables. This study mentions ~23 sleep variables used when studying sleep quality, including REM latency, REM sleep, small movements in sleep, the timings of different sleep cycles, the number of cycles, etc. This study mentions that sleep duration may have a “direct association with mortality.” Yikes!
Its authors introduce the Pittsburgh Sleep Quality Index and clearly outline the intent of the metric.
The Pittsburgh Sleep Quality index was developed with several goals: (1) to provide a reliable, valid, and standardized measure of sleep quality; (2) to discriminate between “good” and “poor” sleepers; (3) to provide an index that is easy for subjects to use and for clinicians and researchers to interpret; and (4) to provide a brief, clinically useful assessment of a variety of sleep disturbances that may affect sleep quality.
Communicating intent is critical. These authors likely faced trade-offs. Ease of use for subjects may not immediately equal depth of use for researchers. Standardization is helpful for comparability but often involves reducing contextual factors. The assessment is “brief”, which involves a trade-off between assessment completion rates and the depth of the assessment.
A great statement of intent covers the fundamental trade-offs and goals.
What does effectively stating a metric’s intent look like? Here are a few examples.
Relaying the facts. Seeking theories/insights:
Here is the number of outages we had in the last 30 days and how that compares to past periods. Note the increase. What’s going on here, do you think? What are we seeing?
As a proxy for something not directly measurable:
Our North Star Metric is “Loyal DIYers,” defined as the number of users who performed high-value DIY project actions combined with their community involvement. It is a proxy for a combination of loyalty, satisfaction, and using our product in ways congruent with our community-oriented strategy. The data suggests—but does not prove (yet)—that this is a leading indicator of higher customer lifetime value and viral acquisition.
We want to find an actionable metric that 1) a team can move and 2) will contribute to the mid-term success of the business.
The Hex Pistols are going to focus on improving the effectiveness of the onboarding workflow. It is a juggling act. We know we can rush people through and not set them up for success. Or we can make it very comprehensive, reducing the probability of them seeing the product in action. To guide our work, we will focus on decreasing the 90th percentile time to project sharing. Project sharing is an early signal that users are comfortable and able to use the product.
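A percentile target like the Hex Pistols’ is easy to state and easy to compute inconsistently. Here is a minimal sketch, assuming hypothetical per-user records with a signup time and a first-share time, and using the nearest-rank percentile method (analytics tools may interpolate instead, so it helps to state which method you use):

```python
import math
from datetime import datetime, timedelta


def p90_hours_to_first_share(users):
    """90th percentile (nearest-rank) of hours from signup to first project share.

    `users` is a hypothetical list of dicts with 'signed_up' and 'first_shared'
    datetimes. Users who have not shared yet ('first_shared' is None) are
    excluded, which is a choice worth footnoting next to the metric.
    """
    hours = sorted(
        (u["first_shared"] - u["signed_up"]).total_seconds() / 3600
        for u in users
        if u["first_shared"] is not None
    )
    if not hours:
        return None  # Undefined when nobody has shared yet.
    # Nearest rank: the smallest value with at least 90% of observations at or below it.
    rank = math.ceil(len(hours) * 9 / 10)
    return hours[rank - 1]


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    cohort = [
        {"signed_up": start, "first_shared": start + timedelta(hours=h)}
        for h in (2, 5, 8, 12, 30)
    ]
    cohort.append({"signed_up": start, "first_shared": None})
    print(p90_hours_to_first_share(cohort))
```

Excluding users who never shared keeps the percentile defined, but it also means a drop in the share rate would not show up in this number. That is exactly the kind of caveat that belongs next to the metric.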
Problem 3: Vanity metrics do not guide action and learning
I put two questions to my network:
- What is your test for when something is a vanity metric? (Twitter)
- How do you know when a metric is a vanity metric? (LinkedIn)
One of the highest-ranking “tests” was whether the metric guided actions and decisions.
Ola Berg: “When no one can act in a meaningful way upon what it shows us. When no possible value for the metric will prompt us to actually improve anything.”
Chris Lukassen: “The result is not actionable. Regardless [of whether] the metric goes up or down, we don’t change what we do.”
Heidi Atkinson: “When nobody gets worried if it stops rising/plateaus/or declines. ex: ‘Our NPS score is 90!’ one month followed by ‘Our NPS score is 50!’ next month.”
Action, decisions, and learning are a big deal.
If a number keeps going up, and the only action it inspires is a furrowed brow in an all-hands meeting, you probably have a vanity metric on your hands. If a team trots out a metric to celebrate, but when it drops, they don’t shift their strategy or tactics, you’re probably looking at a vanity metric.
Examples of not-very-actionable metrics include:
- Average Session Length. It goes up or down. What do you do?
- New Users (without a breakdown by acquisition channel). It goes up or down. What do you do?
- New Followers. It goes up or down. What do you do?
There are a couple of caveats here.
A metric can be meaningful but not immediately actionable.
In our North Star Workshops, we stress that the North Star Metric should ideally be a bit out of reach. It is the output of teams influencing the various North Star Inputs. Why wouldn’t you want an actionable North Star Metric? The NSM is intended to act as a leading indicator of sustainable business performance over a multi-year timeframe. Almost by definition, it will be a bit distant from day-to-day work. We need the inputs to serve as the “bridge” between everyday work and the NSM, which is itself a meaningful input into business success.
We track our North Star Metric, and if it stalls, it will force us to reconsider our strategy, but a team doesn’t wake up each morning hoping to influence it directly.
A metric can be exploratory. We don’t know what to do with it yet.
Teams are generally aware of the “actionability” test, but almost to a fault. They will spend months and months trying to figure out a “magic metric” or set of magic metrics that do it all—actionable, predictive, explanatory, etc. Product leaders get seriously stressed when handed a metric to “own” but are unsure whether they can “control” movements in the metric.
The result? Teams use vanity metrics that are “safe” because they convey good news. They aren’t helpful, but they don’t pretend to be actionable, so they don’t ruffle any feathers. We don’t want this.
It is OK to use exploratory metrics instead. Just call them out.
A slight reduction in uncertainty may be enough to inspire action.
Product work is about making decisions under conditions of uncertainty. If you wait until you are 100% certain about something, you will be acting too late. Therefore, we shouldn’t shoot for perfect metrics that reduce all uncertainty about the actions we take.
Goodhart’s Law and the tension between good measurement and good targets
Goodhart’s Law states that:
“When a measure becomes a target, it ceases to be a good measure.”
Contrast this with my co-worker Adam Greco’s guidance about Vanity Metrics:
If someone isn’t going to be promoted or fired if a metric goes up or down, it is probably a vanity metric.
Here we have a tension/paradox. Once a metric becomes a target and becomes a signal of doing a good/bad job, you risk it becoming a vanity metric because people will make sure it goes up. And yet we want our metrics to mean something—to be relevant, to be good proxies, and to inform relevant decisions.
Examples of Goodhart’s Law:
- If a team has a target of predictably shipping features, they will be less likely to process disconfirming new feedback that might appear “unpredictable.”
- If a team has a target of increasing average order size, they will be more likely to push it up at the expense of future outcomes, brand loyalty, etc.
- If a manager has a target of hiring a certain number of people in a quarter, they will be more likely to hire someone who isn’t the best candidate.
So what can this tell us about using more effective metrics and fewer vanity metrics? First, we are responsible for selecting meaningful goals and targets and for defining effective “guardrails” that surface any adverse second- or third-order effects. We can’t defeat Goodhart’s Law completely (you have to assume that people will play the game you insist they play), but we can strive to establish checks and balances.
Using Adam’s tip, you can also ask yourself, “What do we want to reward here?” Being accountable for business results makes sense. But you don’t want to promote people for hitting arbitrary metrics or for success theater. I’m a big believer in Bill Walsh’s idea of The Score Takes Care of Itself. Targets should encourage positive habits and routines.
We described three common problems associated with vanity metrics:
- Vanity metrics lack context
- Vanity metrics have unclear intent
- Vanity metrics do not guide action and learning
The effective use of metrics includes providing context, stating your intent, and picking metrics that guide action and learning. A metric is rarely a vanity metric in itself; what matters is how we use it. Pointing to a metric and saying “that is a vanity metric” is really saying “you are using that metric as a vanity metric.”
The Vanity Metric Test
We’ve discussed various problems that contribute to using vanity metrics and problems associated with vanity metrics. Now it is time to put your metrics to the test.
In this section, we present ten statements that describe the healthy and effective use of metrics. You’ll notice the themes we explored earlier in this post: context, intent, responsible action, and learning.
For each statement, we suggest you:
- Discuss the prompt with your team
- Seek diverse perspectives
- Flag items that need attention
S1: The team understands the underlying rationale for tracking the metric.
Tip: Include metrics orientation in your employee onboarding plan. Amplitude customers frequently use our Notebooks feature to provide context around key metrics.
S2: We present the metric alongside related metrics that add necessary context. When presented in isolation, we add required footnotes and references.
Tip: Normalize displaying guardrail and related metrics in presentations.
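As a rough illustration of this tip, the sketch below renders a headline metric together with its guardrails and flags any guardrail that moved in the wrong direction. The `MetricReading` class, the `summarize` function, and the sample numbers are hypothetical; the point is that the footnote is generated alongside the headline instead of being left to memory.

```python
from dataclasses import dataclass


@dataclass
class MetricReading:
    """One metric's value this period and last period (hypothetical structure)."""
    name: str
    previous: float
    current: float
    higher_is_better: bool = True

    def pct_change(self) -> float:
        # Percent change vs. the prior period; assumes previous is non-zero.
        return (self.current - self.previous) / self.previous * 100

    def worsened(self) -> bool:
        # "Worse" depends on direction: a rising return rate is bad news.
        delta = self.current - self.previous
        return delta < 0 if self.higher_is_better else delta > 0


def summarize(headline: MetricReading, guardrails: list) -> str:
    """Render the headline metric with its guardrails, flagging any that worsened."""
    lines = [f"{headline.name}: {headline.current:g} ({headline.pct_change():+.1f}% vs. prior period)"]
    for g in guardrails:
        flag = "  <- worsened; add a footnote" if g.worsened() else ""
        lines.append(f"  guardrail {g.name}: {g.current:g} ({g.pct_change():+.1f}%){flag}")
    return "\n".join(lines)


if __name__ == "__main__":
    orders = MetricReading("Orders", previous=1000, current=1200)
    returns = MetricReading("Return rate (%)", previous=5.0, current=8.0, higher_is_better=False)
    print(summarize(orders, [returns]))
```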
S3: The hypotheses (and assumptions) connecting the metric to meaningful outcomes and impact are clearly articulated, available, and open to challenge/discussion.
Tip: Use tree diagrams (driver trees, North Star Framework, assumption trees, etc.) and causal relationship diagrams to communicate hypothesized causal relationships. Consider playing the “Random Jira Ticket” game. Can you randomly pick a Jira ticket and “walk the tree” up from that item to something that will matter in the long term?
S4: The metric calculation/definition is inspectable, checkable, and decomposable. Its various components, clauses, features, etc., can be separated. Someone with good domain knowledge can understand how it works.
Tip: Whenever possible, share the metric so that someone can “click in” to see how it is calculated. For example, if the metric involves a filter like “shared with more than 7 users within 7 days”, it should be possible to adjust that clause and see how that number compares to the total number of users. Build trust by enabling people to recreate the metric.
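Here is a minimal sketch of a decomposable metric in that spirit, assuming a hypothetical event log of (user, recipient, timestamp) share events. Each clause (the time window and the recipient threshold) is a parameter, so a reviewer can loosen or tighten it and compare the result with the total user count:

```python
from datetime import datetime, timedelta


def power_sharers(users, share_events, min_recipients=7, window_days=7):
    """Count users who shared with more than `min_recipients` distinct people
    within `window_days` of signing up; return (qualifying_users, total_users).

    `users` maps user_id -> signup datetime and `share_events` is a list of
    (user_id, recipient_id, timestamp) tuples. Both shapes are hypothetical.
    """
    recipients = {}
    for user_id, recipient_id, ts in share_events:
        signup = users.get(user_id)
        # Clause 1: only shares inside the window that starts at signup count.
        if signup is not None and signup <= ts < signup + timedelta(days=window_days):
            recipients.setdefault(user_id, set()).add(recipient_id)
    # Clause 2: strictly "more than" N distinct recipients.
    qualifying = sum(1 for r in recipients.values() if len(r) > min_recipients)
    return qualifying, len(users)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    users = {"a": start, "b": start}
    events = [("a", f"r{i}", start + timedelta(days=1)) for i in range(8)]
    events += [("b", f"r{i}", start + timedelta(days=1)) for i in range(3)]
    # Loosen the recipient clause and compare against the default definition.
    print(power_sharers(users, events), power_sharers(users, events, min_recipients=2))
```

Returning the total alongside the count is deliberate: a reader can see the metric as a share of all users, not just as a raw number that only ever goes up.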
S5: The metric is part of a regularly reviewed and discussed dashboard, scorecard, or report. It has survived healthy scrutiny. If the metric is more exploratory and untested (or an “I was curious whether….”), that context is clear from the outset.
Tip: Scrutiny is a good thing. The more eyes you can get on a metric, the better. Invite criticism. Record questions as they come up. Make each “showing” of the metric (e.g., at all-hands or product review) successively better.
S6: The team has a working theory about what changes in the metric indicate.
Tip: Here’s a basic prompt to get you thinking: “An increase in this metric is a signal that _______ , and a decrease in this metric is a signal that _______.”
S7: Over time, the metric provides increasing value and confidence. We can point to specific decisions and actions resulting from using the metric (and those actions are reviewable). The company would invest in continuing to track and communicate it.
Tip: Indicate confidence levels when displaying metrics, and keep a decision/action log. Try to normalize not being 100% sure at first, and balance high-confidence metrics with new candidate metrics that have lower confidence levels.
S8: The team establishes clear thresholds of action (e.g., “if it exceeds X, then we may consider Y”). The metric can go down. And if it goes down, it will likely inspire inspection/action.
Tip: Conduct a scenario planning workshop to understand better how movements in the metric will dictate future behavior. Set monitors in your analytics tool to warn you when you have reached a threshold.
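Thresholds of action are easier to hold yourself to when they are written down before the metric moves. The sketch below is tool-agnostic and hypothetical (it is not any analytics product's monitor API): each rule pairs a limit with the action the team agreed to consider if the limit is crossed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A hypothetical 'if it crosses X, we may consider Y' rule agreed in advance."""
    limit: float
    action: str
    trigger_above: bool = False  # False: fire when the value drops below the limit.


def triggered_actions(value, thresholds):
    """Return the pre-agreed actions whose thresholds the current value has crossed."""
    return [
        t.action
        for t in thresholds
        if (value > t.limit if t.trigger_above else value < t.limit)
    ]


if __name__ == "__main__":
    # Example: weekly activation rate for a hypothetical onboarding flow.
    rules = [
        Threshold(0.30, "Review the onboarding funnel with design"),
        Threshold(0.60, "Check instrumentation before celebrating", trigger_above=True),
    ]
    for weekly_rate in (0.25, 0.45, 0.70):
        print(weekly_rate, triggered_actions(weekly_rate, rules))
```

Including a rule for an unexpectedly high value is a small nod to S8: a metric worth tracking should be able to prompt inspection in either direction.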
S9: The metric is comparative (over time, vs. similar metrics, etc.). Put more broadly, if we track it for a protracted period, it is possible to make apples-to-apples comparisons between periods.
Tip: Include period-over-period views in your dashboards to get more eyes on comparisons.
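An apples-to-apples comparison means equal-length, adjacent periods computed the same way. A minimal sketch, assuming daily values keyed by date and treating missing days as zero (an assumption to call out when you present it):

```python
from datetime import date, timedelta


def period_over_period(daily, end, days=7):
    """Compare the `days`-long period ending on `end` (inclusive) with the
    equal-length period immediately before it.

    `daily` maps date -> value. Missing dates count as 0, an assumption to
    state when presenting the comparison. Returns (current, previous, pct_change),
    with pct_change None when the previous period is 0.
    """
    def total(period_end):
        return sum(daily.get(period_end - timedelta(days=i), 0) for i in range(days))

    current = total(end)
    previous = total(end - timedelta(days=days))
    change = None if previous == 0 else (current - previous) / previous * 100
    return current, previous, change


if __name__ == "__main__":
    signups = {date(2024, 1, d): (10 if d <= 7 else 12) for d in range(1, 15)}
    print(period_over_period(signups, end=date(2024, 1, 14)))
```

Keeping the period length fixed is what makes the comparison fair; comparing a holiday week with a normal one still needs a footnote about seasonality.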
S10: The team uses the metric to communicate challenges AND wins. Not just wins.
Tip: Leaders set the tone here. Discuss situations that didn’t work out as you expected and how you used data to figure that out.
Vanity metrics are metrics that make us feel good, but don’t help us do better work or make better decisions. No one is immune to using vanity metrics! The key is ensuring you provide context, state the intent of the metrics you use, and clarify the actions and decisions that the metric (or metrics) will drive.
To define meaningful metrics, check out the North Star Playbook. Establishing a North Star Metric and constellation of actionable inputs is a powerful way to avoid using vanity metrics.