Big Data: The Necessity of Mixed Methods

Posted: May 30th, 2010

In the concluding chapter of my PhD thesis, I stepped back from the contributions and contemplated their implications. I entitled one section “From data-driven urbanism to human/data-based urbanism”, in which I set out some of the limitations of my work:

However, there is a big assumption in seeing the world as consisting of bits of data that can be processed into information that will then naturally yield some value to people. Inspired by Julian’s *-computing. [...] the understanding of a city goes beyond logging machine states and events. Consequently, let us not confuse the development of novel maps from previously uncollectable and inaccessible data with the ability to produce “intelligent maps”.

Taking this caution into account, I argue for the necessity of mixing quantitative and qualitative data to build knowledge of a city/building/shared space, our relations with it, and its infrastructures. Both types of data can feed inductive and deductive methods:

The qualitative analysis to inform the quantitative queries: This approach first focuses on people and their practices, without the assumption that something computational or a data process is meant to fall out of that. This qualitative angle can then inform a quantitative analysis to generate more empirical evidence of a specific human behavior or pattern. [...]

The quantitative data mining to inform the qualitative enquiries: In this approach, the quantitative data help reveal emerging and abnormal behaviors, mainly raising questions. The qualitative angle can then help explain these phenomena in situ. Qualitative approaches actually require asking the right questions to learn anything meaningful about a situation. [...] (see the sketch below)
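To make that question-raising step concrete, here is a minimal sketch of how a quantitative pass might flag abnormal behaviors for qualitative follow-up. It is purely illustrative: the hourly visitor counts and the two-standard-deviation threshold are my assumptions, not figures from the thesis.

```python
# Minimal, illustrative sketch: flag abnormal hourly visitor counts
# so they can be handed over to qualitative enquiry. The data and the
# two-standard-deviation threshold are assumptions, not real figures.
from statistics import mean, stdev

# Hypothetical hourly visitor counts logged by sensors over one day.
hourly_counts = [120, 135, 128, 140, 133, 125, 410, 130, 127, 124]

mu = mean(hourly_counts)
sigma = stdev(hourly_counts)

# Hours deviating by more than two standard deviations are the
# "questions" the quantitative pass raises; fieldwork and interviews
# must then explain them in situ.
anomalies = [(hour, count)
             for hour, count in enumerate(hourly_counts)
             if abs(count - mu) > 2 * sigma]
print(anomalies)  # -> [(6, 410)]
```

The threshold is deliberately crude; the point is only that the quantitative pass ends with questions, not answers.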

So I was particularly pleased with the recent Big Data: Opportunities for Computational and Social Sciences, in which Danah Boyd acknowledges the tremendous opportunities “Big Data” creates in the social sciences but, analogously to my conclusion, points out the limitations of computational scientists in the “web science” domain:

Just because you see traces of data doesn’t mean you always know the intention or cultural logic behind them. And just because you have a big N doesn’t mean that it’s representative or generalizable. Scott knows this, but too many people obsessed with Big Data don’t.

One major problem when it comes to “large archives of naturalistically-created behavioral data” is their “subjectivity” or the “hidden intentions” behind them (see “Embracing the Subjectivity of Georeferenced Photos”), or, as Danah argues, “Just because you see traces of data doesn’t mean you always know the intention or cultural logic behind them”. Grasping the value in subjective data is still a concept computational scientists must get their minds around, and I doubt they will be able to do so alone within their research communities.

Besides data analysis methods, creativity comes into play in collecting both precise and relevant data for mixed methods. For our current investigation of hyper-congestion at the Louvre Museum, we thought at a very early stage of integrating the surveillance team’s observations to help explain the phenomena that quantitative data analysis reveals.

Le Louvre Milo
A surveillance staff member as a source of information to explain the phenomena that quantitative data analysis reveals.
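As a companion to the sketch above, here is how flagged hours might be paired with the surveillance team’s notes. Everything here is hypothetical: the timestamps, the observation texts, and the pairing by exact hour are assumptions meant only to show the two data sources being mixed.

```python
# Illustrative sketch: attach surveillance-team observations to the
# hours flagged by the quantitative pass. All records are hypothetical.
from datetime import datetime

# Hours flagged as hyper-congested by the quantitative analysis (assumed).
flagged_hours = [datetime(2010, 5, 22, 11), datetime(2010, 5, 22, 15)]

# Free-text observations logged by the surveillance team (assumed).
observations = {
    datetime(2010, 5, 22, 11): "Tour groups converging on the Venus de Milo",
    datetime(2010, 5, 22, 14): "Queue forming at the Denon wing entrance",
}

# Pair each flagged hour with the observation for that hour, if any,
# so the qualitative notes can explain what the counts alone cannot.
for hour in flagged_hours:
    note = observations.get(hour, "no observation recorded")
    print(f"{hour:%Y-%m-%d %H:%M} -> {note}")
```

Note that the second flagged hour finds no matching observation, which is exactly where the quantitative pass should send someone back to the field.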
Why do I blog this: My PhD thesis taught me that patterns invite more questions than they answer. In the field, answering those questions requires knowing how to mix quantitative with qualitative methods. See also The “Quants”, their Normalizations and their Abstractions.