Big Data: Major Challenges and Steps

I had a fascinating evening courtesy of Graham Ruddick of Digital Doughnut, discussing Big Data issues with brilliant minds. Ctrl-Shift’s Jamie Smith was particularly interesting, and we found ourselves taking a deep dive into the thorny issues of privacy, corporate governance, marketing strictures and business process engineering.

One of the things I found most striking, while being challenged on my thinking about the transfer of data power into the hands of the consumer, was a recurring thought spurred by a talk I’d recently heard Frank Pasquale give at the Royal Society for the Arts: while the journey marches towards the ultimate goal of individuals owning their own data, there is a huge hurdle to be jumped in the form of the technology required to provide real-time control. In other words, for a person to have meaningful control over the data held on her, she must be able to correct and update that data and have the correction (or, perhaps more commonly, a change in circumstantial information) propagate universally and instantly.

Pasquale’s examples of such corrections are legalistic, for instance erroneous information (say, a misreported financial default that negatively impacts credit decisions), but in practical terms they will more often be prosaic: a change in income, relationship, need state or taste. The requirement for these changes to be instant and pervasive is hamstrung by the current data distribution model, in which old personal data is held, resold, resold again, and may exist in slightly different forms in fifty databases. Even the UK government allows at least nine different login authentication data sources. The potential, and the reality, for misinformation to be stored and redistributed is staggering.
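The staleness problem can be made concrete with a minimal sketch. Nothing here models any real broker’s systems; the names and fields are hypothetical, and the point is simply that each resale takes a snapshot that no correction can reach.

```python
# Sketch: how copies of personal data diverge once they are resold.
# All names and fields here are hypothetical illustrations.

def resell(record):
    """Each resale takes a snapshot; the copy no longer tracks the original."""
    return dict(record)

# The individual's authoritative record.
original = {"name": "A. Example", "income_band": "B", "defaulted": False}

# Two brokers buy snapshots at different times.
broker_1 = resell(original)
broker_2 = resell(broker_1)

# The individual's circumstances change...
original["income_band"] = "C"

# ...but the resold copies are now stale, and there is no channel
# through which the correction can reach them.
print(original["income_band"])  # C
print(broker_1["income_band"])  # B  (stale)
print(broker_2["income_band"])  # B  (stale)
```

Multiply the two brokers here by fifty databases and the scale of the accuracy problem becomes clear.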

The holy grail is that the individual controls a single data source and, rather than re-entering her data at every sign-in, gives permission for each piece of data to be accessed from a source she fully controls.
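What that single-source, permission-based model implies can be sketched in a few lines. This is a toy illustration under my own assumptions, not any existing system: the class and method names are invented, and a real design would need authentication, audit and much more. The essential property is that readers fetch the live value rather than storing a copy, so one correction is instantly what every permitted party sees.

```python
# Toy sketch of a single, individual-controlled data source with
# per-field permissions. All names here are hypothetical illustrations.

class PersonalDataStore:
    def __init__(self):
        self._data = {}          # field -> current value (the only copy)
        self._grants = set()     # (requester, field) pairs the owner approved

    def update(self, field, value):
        """The owner corrects or updates a field; there is only one copy."""
        self._data[field] = value

    def grant(self, requester, field):
        """The owner permits a named party to read one field."""
        self._grants.add((requester, field))

    def revoke(self, requester, field):
        """The owner withdraws that permission at any time."""
        self._grants.discard((requester, field))

    def read(self, requester, field):
        """A relying party reads the live value; nothing is copied or stored."""
        if (requester, field) not in self._grants:
            raise PermissionError(f"{requester} may not read {field}")
        return self._data[field]

store = PersonalDataStore()
store.update("postcode", "SW1A 1AA")
store.grant("acme-bank", "postcode")          # hypothetical relying party

print(store.read("acme-bank", "postcode"))    # SW1A 1AA
store.update("postcode", "EC1A 1BB")          # one correction at the source...
print(store.read("acme-bank", "postcode"))    # ...is what every reader now sees
```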

But this requires real, single-point control by the user, not the brand or the intermediate data holder. And that in turn requires both a radical shift in consumer understanding and a universal personal data protocol adopted by every single person. That’s going to be tough until there is something akin to universal biometric identity technology.

In the meantime we have to deal with the multi-database reality. Perhaps the ultimate solution is a distributed means of storing data, as illustrated by the blockchain model. But we already have an example of real-time, or at least 48-hour, propagation of information to millions of databases: DNS. Perhaps this points to a methodology for handling individuals’ data accuracy while we await single-person, single-source personal data that people can lend to brands, rather than having it inaccurately and incompletely collected, stored and distributed by them. Big Data may then be able to take the steps towards its potential as a liberator, instead of undermining that potential through misuse, mishap and impracticality.
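The DNS analogy is worth spelling out, since it shows how bounded-delay propagation works without a central push. A DNS record carries a time-to-live (TTL); resolvers cache it for at most that long, then re-fetch from the authoritative source, so any update reaches every cache within one TTL. Below is a minimal sketch of that mechanism under my own assumptions; the class names are invented and this is not a real DNS client.

```python
# Sketch of DNS-style propagation: consumers cache a record for a TTL,
# then re-fetch from the authoritative source, so an update propagates
# everywhere within one TTL. Illustrative only, not a real DNS client.

class AuthoritativeSource:
    """The single place where the record is actually maintained."""
    def __init__(self, value):
        self.value = value

class CachingConsumer:
    """One of the millions of databases holding a time-limited copy."""
    def __init__(self, source, ttl_seconds):
        self.source = source
        self.ttl = ttl_seconds
        self._cached = None
        self._fetched_at = float("-inf")   # forces a fetch on first lookup

    def lookup(self, now):
        # Re-fetch from the authority once the cached copy has expired.
        if now - self._fetched_at >= self.ttl:
            self._cached = self.source.value
            self._fetched_at = now
        return self._cached

source = AuthoritativeSource("old address")
consumer = CachingConsumer(source, ttl_seconds=172800)   # 48 hours

print(consumer.lookup(now=0))        # old address (fetched and cached)
source.value = "new address"         # the individual corrects the record
print(consumer.lookup(now=1000))     # still "old address": cache not yet expired
print(consumer.lookup(now=172800))   # "new address": propagated within one TTL
```

The design choice is the interesting part: staleness is bounded and predictable, which is far weaker than the instant propagation the single-source model promises, but vastly better than the unbounded staleness of resold snapshots.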