This article originally appeared on Financial Express
E-commerce brands generate a huge amount of data every day. Generated at such unprecedented scale and speed, this data has become the key to unlocking growth. As brands start to scale, they can no longer rely on gut instinct or personal experience alone to make daily decisions. Data, and the insights derived from it, have become increasingly critical for decision-making. However, the challenge with data-driven decision-making is that unless the data is accurate and reliable, it can lead to costly mistakes and poor outcomes.
Before we go into how brands can ensure data accuracy and reliability, it is important to remember one thing: when it comes to data-driven decision-making, there is no one-size-fits-all. Every brand has a different way of looking at its data. For example, here is what we call the eCommerce equation:
Revenue = Traffic x Conversion Rate x AOV
It might seem simple, but what is interesting is that different brands view each of these elements differently. How a brand defines what goes into each element influences how it interprets the data. For one brand, Revenue could mean revenue excluding discounts; for another, it could mean revenue minus returned and cancelled orders. So every brand has to first consider its specific needs and goals and decide what to focus on.
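To make this concrete, here is a rough sketch in Python of the eCommerce equation under two hypothetical definitions of Revenue. Every number and both definitions are purely illustrative; the point is only that the same equation can yield different answers depending on what a brand decides Revenue means.

```python
# Illustrative sketch of the eCommerce equation; all numbers are hypothetical.
traffic = 100_000        # sessions in the period
conversion_rate = 0.02   # 2% of sessions result in an order
aov = 1_500              # average order value

gross_revenue = traffic * conversion_rate * aov   # 100,000 * 0.02 * 1,500 = 3,000,000

# Brand A counts Revenue net of discounts
discounts_given = 200_000
revenue_brand_a = gross_revenue - discounts_given            # 2,800,000

# Brand B counts Revenue net of returned and cancelled orders
returned_and_cancelled = 350_000
revenue_brand_b = gross_revenue - returned_and_cancelled     # 2,650,000

print(revenue_brand_a, revenue_brand_b)  # same equation, two different answers
```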
Today, brands have access to a lot of first-party data – for example, order and revenue data on Shopify, Amazon and Flipkart, website analytics and user behaviour data on their customer data platform, or ads data on Google, Facebook, WhatsApp and email. However, all this data sits on different platforms, and the way brands collect, store and process it can have a big impact on the outcomes.
Let’s take the example of a mid-sized brand. Typically, such a brand, with limited resources, would have to do everything manually. They would have to check every platform individually, multiple times a day. They would have to download the data periodically from each platform, perhaps use Microsoft Excel or Google Sheets to analyse each data set, manually connect the dots and finally create reports that can be consumed by a larger audience. This process, however, is prone to many issues.
Firstly, data is not homogeneous: some of it is structured (tables), while some is not (PDF reports, for example). Secondly, pulling the data and then processing it is laborious, time-consuming and prone to human error. Last, but not least, data can quickly become stale. Not having access to real-time data, or worse, using stale data, can lead to bad decisions with catastrophic results.
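As a rough illustration of what "connecting the dots" across platforms involves, here is a minimal Python sketch that stitches two hypothetical order exports into one view. The platform names match the article, but the column names and figures are made up; real exports differ per platform, which is exactly why the manual version of this work is so error-prone.

```python
import pandas as pd

# Hypothetical platform exports; real exports have many more columns and different names.
shopify = pd.DataFrame({
    "order_id": ["S-1001", "S-1002"],
    "total_price": [1499, 2398],
    "created_at": ["2023-05-01", "2023-05-02"],
})
amazon = pd.DataFrame({
    "amazon-order-id": ["A-9001"],
    "item-price": [999],
    "purchase-date": ["2023-05-01"],
})

# Normalise each export into one common schema before combining.
shopify_norm = shopify.rename(
    columns={"total_price": "revenue", "created_at": "order_date"}
).assign(channel="shopify")
amazon_norm = amazon.rename(
    columns={"amazon-order-id": "order_id", "item-price": "revenue", "purchase-date": "order_date"}
).assign(channel="amazon")

orders = pd.concat([shopify_norm, amazon_norm], ignore_index=True)
orders["order_date"] = pd.to_datetime(orders["order_date"])

# Daily revenue by channel: the kind of summary that is otherwise stitched together by hand.
print(orders.groupby([orders["order_date"].dt.date, "channel"])["revenue"].sum())
```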
The situation may be different for a larger brand, which might choose to build its own system to manage its big data. First, data must be extracted. Next, it must be prepped to make it usable. Then come analysis, deriving insights and visualisation. Only then does the data reach the point where it can be used for decision-making. This requires an investment in large teams: a data engineering team with expertise in cloud, infrastructure, scaling, security and governance; a team of data scientists and analysts to prep the data, build models and analyse the results; and an execution team to act on the insights. And, of course, it requires investments in infrastructure and tools.
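A highly simplified sketch of those stages is shown below. The function names and bodies are placeholders, not a description of any particular system; a production pipeline would wrap orchestration, monitoring and governance around every step.

```python
# Skeleton of the extract -> prep -> analyse -> visualise flow described above.
# All function bodies are placeholders; a real pipeline would be far more involved.

def extract(source: str) -> list[dict]:
    """Pull raw records from a platform API or export (placeholder)."""
    return []

def prep(records: list[dict]) -> list[dict]:
    """Deduplicate, fix types and standardise fields so the data is usable (placeholder)."""
    return [r for r in records if r.get("order_id") is not None]

def analyse(records: list[dict]) -> dict:
    """Compute the metrics the brand has decided to focus on (placeholder)."""
    return {"orders": len(records)}

def visualise(metrics: dict) -> None:
    """Push results to a dashboard or report (placeholder)."""
    print(metrics)

if __name__ == "__main__":
    for source in ("shopify", "amazon", "flipkart"):
        visualise(analyse(prep(extract(source))))
```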
A robust system built like this is less likely to have accuracy and reliability issues, but building it is not without challenges. There could be code-level issues that cause the system to break down completely. There could be operational issues that require constant monitoring. Not to mention the costs involved: not just the cost of building an expert team, but also the ongoing cost of running the system. Something as seemingly simple as scalability could be a landmine. How brands set up and manage their infrastructure, especially to cope with events such as sale days, when the load on the system goes up exponentially, can significantly impact their bottom line.
The other point to consider when relying on data is that your output is only as good as your input: garbage in, garbage out, as they say. You may have fantastic algorithmic models, but unless the input data is properly collected and prepped, the analysis and conclusions will be compromised. It is equally important to identify the right data source for each metric. For example, your order data is best retrieved from your own store platform. Data also changes with time. For example, different ad platforms have different attribution windows, ranging from seven to thirty days, and it is important to take this into account when analysing and interpreting the data.
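Here is one small, illustrative example of why attribution windows matter. The platform names, window lengths and dates below are hypothetical; the point is simply that a report pulled "as of today" is still moving for different lengths of time on different platforms.

```python
from datetime import date, timedelta

# Hypothetical attribution windows per ad platform (in days).
attribution_windows = {"platform_a": 7, "platform_b": 30}

def within_attribution_window(report_date: date, click_date: date, platform: str) -> bool:
    """A conversion reported on report_date can still be credited to a click
    that happened up to the platform's attribution window in the past."""
    window = timedelta(days=attribution_windows[platform])
    return report_date - click_date <= window

today = date.today()
click_ten_days_ago = today - timedelta(days=10)

# The same 10-day-old click is outside one platform's window but inside the other's,
# so comparing the two platforms' "final" numbers on the same day is misleading.
print(within_attribution_window(today, click_ten_days_ago, "platform_a"))  # False
print(within_attribution_window(today, click_ten_days_ago, "platform_b"))  # True
```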
Besides first-party data, brands also have access to second- and third-party data. The former is essentially data shared by partners in the ecosystem who collected it first-hand. While this information reaches the brand second-hand, there is a lot to learn from others' first-party data. Typical use cases include building buyer personas for new products, segmenting audiences into cohorts to target new users, or deciding pricing and promotion strategies to improve conversions. When combined with the intelligence gathered from a brand's own first-party data, such as customer profiles, user behaviour and buyer journeys, it can be very powerful. And then, of course, there is third-party data, which is essentially publicly available information and is best used as a benchmark.
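As one illustration of what cohorting a brand's own first-party data might look like, here is a minimal Python sketch that splits a tiny, made-up order history into simple buyer segments. The data, thresholds and segment names are all hypothetical; a real segmentation would use far richer signals such as recency, channel and product category.

```python
import pandas as pd

# Hypothetical first-party order history.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_value": [1200, 800, 3000, 500, 700, 650],
})

per_customer = orders.groupby("customer_id").agg(
    order_count=("order_value", "size"),
    total_spend=("order_value", "sum"),
)

# Very simple cohorting: repeat vs one-time buyers, high vs low spenders.
per_customer["cohort"] = per_customer.apply(
    lambda r: ("repeat" if r.order_count > 1 else "one_time")
    + ("_high_value" if r.total_spend >= 2000 else "_low_value"),
    axis=1,
)
print(per_customer)
```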
Today, an estimated 3.5 quintillion bytes of data are created every day. By using data to inform their decisions, brands can not only reduce their risks but also act quickly on new opportunities for growth. However, they must carefully consider which data sources to trust, be fanatical about data quality, and identify their specific needs in order to derive meaningful outcomes.
Authored by Prem Bhatia, Co-Founder and CEO of Graas.
27 May 2023