One of the most common requests we receive from clients is: "Create a dashboard with business KPIs for us. We want to find some insights in our data to understand the growth points." The problem with this framing is that dashboards are good for monitoring business KPIs, but they actively hinder finding insights in the data.
There's an old joke:
A policeman approaches a man on the street who is crawling under a lamp.
— What are you doing here? — asks the policeman.
— Looking for my keys, — the man replies.
They both search for the keys unsuccessfully for a while, and finally, the policeman asks:
— Are you sure you lost them here?
— No, — says the man. — I lost them in the park!
— Then why are you looking here?
— Because it's brighter here!
The phenomenon of searching under the lamp for what is lost elsewhere is known as the "Streetlight Effect" (or the Drunkard's Search), and it's surprisingly common. People tend to look for answers where information is easily accessible, rather than where the answers actually lie.
In data analysis, this effect shows up as follows: business users look at the dashboard and see that, for instance, the average order value dropped last week. They ask a perfectly valid question: "Why did the average order value drop?"
But chances are that no single dashboard can answer that question. The answer can only be found by exploring the full set of available data, not just the slice shown on dashboards.
However, due to the "Streetlight Effect", business users often:
As a result, the real reasons why the average order value dropped elude them, and the decision about how to respond is made on intuition.
That's why we are not very fond of designing dashboards, but very fond of designing datasets that business users can independently explore. This approach is called self-service analytics.
On every client project, we make sure to prepare data for self-service use. Over five years of practice, we have learned what makes the self-service approach easier for business users.
Typical Data Problems in the Self-Service Approach
1. Only web analytics data is available in self-service mode.
The most common example of data available for independent exploration by business users is web analytics data. Often, product managers and other business users are adept at tracking metrics in Google Analytics or Amplitude. However, by only looking at web analytics events, one can miss important things that cannot be logged as web events.
For instance, a product manager might optimize the conversion to order on the site but overlook the fact that each order from the site eventually causes a loss to the company due to poor logistics and warehouse processes.
2. Only the most obvious data is available in self-service.
Often, the answer to why the average order value dropped lies in peripheral datasets related, for instance, to warehouses, logistics operations, or even weather conditions. This data is often missing from the self-service layer, so the true causes are missed during analysis.
When designing the self-service layer, we strive to identify all known datasets, including peripheral ones, and prepare them for independent use. With this approach, users at least know that certain data exists and is just a click away.
3. Multiple entry points in self-service, so it is unclear where to start exploring the data.
When designing analytical data marts, the first instinct is to build a separate table for each concept in the subject area (for example, USERS, ORDERS, SUBSCRIPTIONS, etc.). The relationships between these tables quickly become complex, and users have to fall back on SQL or ask analysts for help.
Now we try to consolidate the entire self-service layer into a single data mart, which we call a timeline, to which all other tables (users, orders, subscriptions, etc.) are “attached.” This gives business users a single entry point to all the data, which makes it much easier to get started.
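To make the timeline idea more concrete, here is a minimal sketch in Python. It assumes a timeline is a single chronological table of events with the attributes of related tables attached to each row. The table contents, column names, and the `build_timeline` helper are all hypothetical and only illustrate the shape of the result, not an actual implementation.

```python
from datetime import date

# Hypothetical source tables; every name and value here is illustrative.
users = {1: {"country": "DE"}, 2: {"country": "FR"}}
orders = [
    {"user_id": 1, "date": date(2024, 1, 5), "amount": 40.0},
    {"user_id": 2, "date": date(2024, 1, 7), "amount": 25.0},
]
subscriptions = [
    {"user_id": 1, "date": date(2024, 1, 6), "plan": "pro"},
]


def build_timeline(users, orders, subscriptions):
    """Flatten per-concept tables into one chronological table of events."""
    timeline = []
    for order in orders:
        timeline.append({
            "date": order["date"],
            "user_id": order["user_id"],
            "event": "order",
            "amount": order["amount"],
            "plan": None,
        })
    for sub in subscriptions:
        timeline.append({
            "date": sub["date"],
            "user_id": sub["user_id"],
            "event": "subscription",
            "amount": None,
            "plan": sub["plan"],
        })
    # "Attach" user attributes to every event so no join is needed later.
    for row in timeline:
        row["country"] = users[row["user_id"]]["country"]
    timeline.sort(key=lambda row: row["date"])
    return timeline


timeline = build_timeline(users, orders, subscriptions)
for row in timeline:
    print(row["date"], row["event"], row["country"])
```

With a table like this, a business user can filter or group by any attached attribute without knowing how the underlying tables relate to each other.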
4. “Technical noise” in data.
In any dataset, there is always a lot of partially duplicated information and technical columns, such as partition keys and metadata. Some attributes are duplicated across multiple tables.
Such “technical noise” distracts attention and hinders data exploration.
Only clear and meaningful data attributes should remain in the self-service layer.
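As a rough illustration of removing technical noise, the sketch below drops an explicit list of technical columns before rows reach the self-service layer. The column names (`partition_key`, `etl_loaded_at`, `source_file`) and the `strip_technical_noise` helper are assumptions made for this example; in practice the list would come from your own data model.

```python
# Hypothetical list of technical columns; adjust it to your own data model.
TECHNICAL_COLUMNS = {"partition_key", "etl_loaded_at", "source_file"}


def strip_technical_noise(rows, technical_columns=TECHNICAL_COLUMNS):
    """Return copies of the rows without the technical columns."""
    return [
        {name: value for name, value in row.items() if name not in technical_columns}
        for row in rows
    ]


raw_rows = [
    {
        "order_id": 1,
        "amount": 40.0,
        "partition_key": "2024-01",
        "etl_loaded_at": "2024-01-06T02:00:00",
    },
]
clean_rows = strip_technical_noise(raw_rows)
print(clean_rows)
```

An explicit deny list is only one option. An allow list of documented attributes is stricter, because a new technical column cannot slip into the self-service layer unnoticed.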
5. Lack of documentation.
For business users to work with the self-service layer, each attribute brought into it must be:
Since there's never time for documentation, we have switched to a document-first approach: we first model and document the attributes that should appear in the self-service layer, and only then implement them.
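One possible way to enforce a document-first workflow is to keep the attribute documentation as the source of truth and check every implemented table against it. The sketch below assumes the documentation is a simple mapping from attribute name to description. The `ATTRIBUTE_SPEC` contents and the `check_against_spec` helper are hypothetical.

```python
# Hypothetical documentation, written before any implementation exists.
ATTRIBUTE_SPEC = {
    "order_id": "Unique identifier of the order.",
    "amount": "Order total in the order currency, including discounts.",
}


def check_against_spec(columns, spec):
    """Compare implemented columns with the documented attributes."""
    implemented = set(columns)
    documented = set(spec)
    return {
        # Columns that exist in the table but were never documented.
        "undocumented": sorted(implemented - documented),
        # Attributes that were documented but are not implemented yet.
        "not_implemented": sorted(documented - implemented),
        # Attributes whose description was left empty.
        "missing_description": sorted(
            name for name, text in spec.items() if not text.strip()
        ),
    }


report = check_against_spec(["order_id", "amount", "partition_key"], ATTRIBUTE_SPEC)
print(report)
```

A check like this could run in a build pipeline so that an undocumented column never reaches business users.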
To summarize, we would like to put forward a few theses:
1. Companies have a lot of data now, but often little insight. The lack of insight is largely due to the “streetlight effect”: everyone looks for insights where it’s brighter (on dashboards), not where they actually are (in the full dataset).
2. Full datasets in BI systems are currently available only to the “elite”: analysts and data scientists. They are not available to the people who should be making data-driven business decisions: product managers, marketers, and finance specialists. Often, people making specific project decisions don't even know what data exists, and therefore make business decisions intuitively.
3. The primary task of analysts should not be “preparing reports at the request of the business,” but preparing an environment in which the business can independently get answers to its questions.
4. Insight cannot be generated on-demand, but it is possible to create an environment that leads to the emergence of insights. If investment is made to make data accessible to business users, then the quality of insights from data, and consequently the quality of decisions made in the company, will significantly improve.