Applying the Quality vs. Quantity Paradigm to Big Data
“Houston, we have a problem” is a popular misquote of what Apollo 13 astronaut Jack Swigert reported during the 1970 spaceflight emergency. What Swigert actually said was “Houston, we’ve had a problem here,” but the paraphrase has come to mean something very important: we have a big problem that needs to be fixed right now.
With that in mind, “Houston, we have a problem” is very much applicable to the big data sciences. The field of big data has become so large and pervasive that it is actually causing more problems than it’s solving. Big data has become an insatiable monster that continues to consume unimaginable volumes of information with no end in sight. Perhaps it’s time to abandon the old ways and start applying the quality versus quantity paradigm to whatever data we do collect.
We Have Become Data Hoarders
People who fail to recognize the current big data problem probably don’t even realize how much of their own data is floating around out there. There is almost nothing we can do in 2018 that doesn’t generate some kind of data. If you own a smartphone, consider this: iOS, Android, or whatever operating system your phone uses is collecting data every moment that phone is on. It’s collecting data in your house, at work, and every place in between.
Our society has devolved into a large group of data hoarders whose hunger for information can never be satisfied. The hoarding metaphor is apt because it perfectly describes the state of big data: walk into the average hoarder’s house and you’ll find it stacked from floor to ceiling with all kinds of stuff, in no particular order.
As Business Daily Africa’s Tony Watima explained so precisely in a July 30, 2018 piece, “most data being analyzed today is unstructured, poorly formatted, poorly documented and not designed with the data scientist in mind.”
He’s right. In fact, we cannot even say that our data collection is like a junkyard. At a junkyard, at least, the owner imposes some order, separating metal from plastic and car parts from old appliances. Even that basic level of structure is virtually nonexistent in big data today. We are hoarders for the sake of hoarding itself.
Quality Should Be More Important
If you take what Watima wrote in his piece to its logical conclusion, it seems that the solution to our big data problem is to immediately cut down on quantity and refocus our energies on quality. Rather than simply collecting data for its own sake, we need to start collecting data for a purpose. That means determining the purpose before developing collection methods. Once purpose is established, ways to produce structured, formatted, and useful data can be developed.
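To make the idea concrete, here is a minimal sketch of purpose-driven collection: decide up front which fields serve the analysis, then keep only records that fit that schema instead of hoarding everything. The field names (“timestamp”, “sensor_id”, “reading”) are hypothetical, invented purely for illustration.

```python
# Hypothetical schema chosen *before* collection begins: only these
# fields serve our stated purpose, with these expected types.
REQUIRED_FIELDS = {
    "timestamp": str,
    "sensor_id": str,
    "reading": float,
}

def is_useful(record: dict) -> bool:
    """Keep a record only if it matches the purpose-driven schema."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in REQUIRED_FIELDS.items()
    )

raw = [
    {"timestamp": "2018-07-30T12:00:00", "sensor_id": "A1", "reading": 3.2},
    {"clicks": 17},                # off-purpose data: discarded
    {"timestamp": "2018-07-30T12:01:00", "sensor_id": "A1", "reading": "n/a"},
]

# Curate at collection time rather than hoarding first and untangling later.
curated = [r for r in raw if is_useful(r)]
print(len(curated))  # 1 — only the well-formed, on-purpose record survives
```

The point is not the specific schema but the ordering: purpose first, collection method second, so structure and formatting are built in rather than bolted on.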
At Rock West Solutions in Southern California, they deal extensively with big data and signal processing. They know firsthand that the effectiveness of signal processing rises and falls with the quality of the data being analyzed. In short, signal processing goes a lot more smoothly as the amount of useless data in the signal goes down.
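A toy numeric sketch of that point: estimating a quantity from a small set of quality measurements beats estimating it from a larger set padded with junk. The numbers below are invented for illustration and have no connection to Rock West’s actual work.

```python
import statistics

true_value = 10.0
clean = [9.8, 10.1, 10.0, 9.9, 10.2]   # quality measurements of the signal
junk  = [3.0, 42.0, -7.5]              # irrelevant samples hoarded alongside them

# Quantity: estimate from everything collected, junk included.
estimate_all = statistics.mean(clean + junk)

# Quality: estimate from the curated measurements only.
estimate_clean = statistics.mean(clean)

# The smaller, cleaner dataset lands closer to the true value.
print(abs(estimate_all - true_value) > abs(estimate_clean - true_value))  # True
```

More data made the estimate worse, not better, because the additions carried no signal.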
A lot of what Rock West does involves the medical field. They know better than anyone else how important quality is over quantity. But applying the quality versus quantity paradigm transcends medical research. It applies equally in every area in which the big data sciences are employed.
Houston, we have a big data problem. But just like there was a solution to the Apollo 13 crisis, there is a solution for big data as well. That solution is quality instead of quantity.