My wife loves taking photos. She has thousands of digital photos on her phone, and recently showed them to me. There were pictures of people, memes, jokes, dinners, clothes and so on. Many of these are group shots at a dinner, likely taken by an overzealous waiter who took a burst of pictures to make sure there was one with everyone smiling. As I scrolled through the countless photos on the phone, I started thinking back to my childhood, long before mobile phones. Back then, of course, we had to pay for film, put it into a camera, take pictures and then extract the film and have it developed at a camera store or a pharmacy. I know this sounds barbaric to many of the younger generation, but that was our reality. I probably have a dozen pictures of my high school years and a few dozen from college. Documenting our lives via photos wasn’t a thing back then.
Given the work involved in picture taking back then, we were much more judicious about the pictures we took. Since each picture cost money (film and development), we tried to avoid wasting pictures on insignificant things (I never took a picture of my meal!). Instead, we focused our pictures on key moments, key people and key events. We put thought into whether the picture was worth the cost and the effort.
But when the world went digital, the cost and effort of taking pictures dropped to essentially zero. While it can cost money to have digital pictures printed, most people just share them online or through social media apps. You may have to pay for backup storage, but even that is pretty cheap these days. With no real cost involved, people began taking as many pictures as they wanted. Taking pictures of meals became the norm. Every dinner became an excuse for a group photo and, of course, the infamous “selfie” was born.
The Ease of Digital Data Collection
The early days of data were much like the days of film. It took time to build databases, get data into them and maintain them. I recall my first Microsoft Access database back in the 1990s. It was nothing like what is available today. You had to think about every data point you wanted to collect and determine how many rows of data you really needed, since too many would make your database harder to manage, slower and possibly too large to store on your PC.
But now that data collection is so easy, there has been an explosion in the amount of data being collected by organizations in the digital analytics field. As more business has moved to websites and mobile apps, every click, tap and swipe has been captured somewhere in a digital analytics product. Streaming products often send data every few seconds as we watch videos. APIs send data from one analytics product to another. I read somewhere that the amount of data collected each day is equal to the total amount of data that was collected in the entire 20th century. And the daily use of digital products by end consumers is still growing at more than 50%, so the amount of data being created is growing with it.
Too Many Photos? Too Much Data?
Here is where I see some parallels between digital photos and digital data. Just as it has become almost too easy to take digital photos, it has become almost too easy to collect and store digital data. Now, don’t get me wrong: I love data. Amplitude loves data. We believe that data is essential to digital transformation and that the use of data is what will help many businesses improve in the future. But there is also such a thing as too much data.
I have long advocated that organizations should strategize about what data they want to collect. Typically, I advocate identifying business requirements or questions first, and using them to shape your ideal data solution architecture. While no one can anticipate every data point they will eventually want to analyze, the team at Amplitude and I prefer to be prescriptive about what data is collected rather than indiscriminately collecting everything. This is why we didn’t build auto-tracking into Amplitude. Tracking too much data, with auto-tracking being the worst culprit, can lead to too much noise and is incredibly difficult to govern. It is also why we acquired Iteratively, an innovative data planning solution, and have built a robust home in Govern for proactively managing analytics data.
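To make the prescriptive approach a little more concrete, here is a minimal sketch in Python of what checking incoming events against a tracking plan might look like. This is not how Amplitude, Iteratively or Govern work under the hood. The event names, the `TRACKING_PLAN` dictionary and the `validate_event` and `collect` helpers are all hypothetical and exist only to illustrate the idea.

```python
# Hypothetical tracking plan: each planned event maps to the properties it
# is expected to carry. All names here are illustrative assumptions.
TRACKING_PLAN: dict[str, set[str]] = {
    "Video Played": {"video_id", "position_seconds"},
    "Checkout Completed": {"order_id", "total"},
}


def validate_event(name: str, properties: dict) -> list[str]:
    """Return a list of problems; an empty list means the event matches the plan."""
    if name not in TRACKING_PLAN:
        return [f"unplanned event: {name}"]
    expected = TRACKING_PLAN[name]
    actual = set(properties)
    problems = []
    for missing in sorted(expected - actual):
        problems.append(f"missing property: {missing}")
    for extra in sorted(actual - expected):
        problems.append(f"unplanned property: {extra}")
    return problems


def collect(events: list[tuple[str, dict]]) -> tuple[list, list]:
    """Split incoming events into accepted events and rejected ones with reasons."""
    accepted, rejected = [], []
    for name, props in events:
        problems = validate_event(name, props)
        if problems:
            rejected.append((name, problems))
        else:
            accepted.append((name, props))
    return accepted, rejected
```

With a plan like this, an event that nobody asked for (say, a hypothetical "Meal Photo Viewed") is flagged before it ever adds noise to your implementation, and a planned event that is missing a property is caught at the same time.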
Even though it may be incredibly easy and cheap to collect digital analytics data, there are indirect costs that people don’t always consider:
- All data collected has to be tagged in some way, which oftentimes requires developers;
- Every piece of data collected should be vetted for data quality assurance, which takes time and effort on a recurring basis;
- The more data elements you collect, the more difficult it is for your users to understand what is in your implementation, which impacts training and adoption.
So, while I don’t think organizations should go back to the early days of data, like my Microsoft Access example, I do think they should exercise restraint when deciding what digital data to collect, no matter how easy collecting it has become. Be thoughtful about your digital data collection strategy. Or, in other words, don’t capture data that would be the equivalent of pictures of your meal!