Highlighted Selections from:

Making big data, in theory


DOI: 10.5210/fm.v18i10.4869

BOELLSTORFF, Tom. Making big data, in theory. First Monday, [S.l.], Sep. 2013. ISSN 1396-0466. Available at: http://journals.uic.edu/ojs/index.php/fm/article/view/4869.

p.1: In this paper, I explore four conceptual interventions that can contribute to the “big theory” sorely needed in regard to big data. This includes temporality and the possibilities of “dated theory,” the implicit histories of the metaprefix shaping notions of metadata, “the dialectic of surveillance and recognition,” and questions of interpretation understood in terms of “rotted data” and “thick data.” In developing these concepts, I seek to expand frameworks for addressing issues of time, context, and power. It is vital that a vibrant theoretical discussion shape emerging regimes of “big data,” as these regimes are poised to play an important role regarding the mutual constitution of technology and society. -- Highlighted mar 21, 2014

p.2: One analysis of over 27,000 social science articles published between 2000 and 2009 found that “only about 30% of Internet studies cite one or more theoretical references, suggesting that Internet studies in the past decade were modestly theorized.” -- Highlighted mar 21, 2014

p.2: Data always has theoretical enframings that are its condition of making: those who actually work with big data know that although it “can be illuminating, it is not unproblematic. Any dataset offers a limited representation of the world” (Loudon et al., 2013). -- Highlighted mar 21, 2014

p.3: A discussion of dated theory is a discussion of dated theorists. Consider the well-documented temporal politics of anthropology. Originating largely in the colonial encounter, anthropology was dominated by a “denial of coevalness... a persistent and systematic tendency to place the referent(s) of anthropology in a Time other than the present of the producer of anthropological discourse.” Within anthropology, this tendency has been deeply critiqued: the notion of “salvage anthropology” dates to 1970 (Gruber, 1970), and there have been many calls to transcend the “savage slot” to which anthropology traditionally consigned its object of study, to “find a better anchor for an anthropology of the present.” -- Highlighted mar 21, 2014

p.3: there is a widespread understanding that addressing researcher subjectivity makes research more scientific, robust, and ethical. -- Highlighted mar 21, 2014

p.3: In contrast, there has been little discussion of the temporal imaginary of big data researchers. How does time shape their subjectivities and the making of big data? It may be that the temporal imaginary is not one of digging into the past but looking into a future more than proximate—a distal future that can be predicted and even proleptically anticipated. -- Highlighted mar 21, 2014

p.3: And the paradigmatic figure of this researcher? One candidate might be Hari Seldon, the protagonist of Isaac Asimov’s classic 1951 science fiction novel Foundation. Seldon, the greatest of all “psychohistorians,” has been put on trial by the Galactic Empire, twenty thousand years from now. His crime is one of anticipation: to threaten panic by using what we can term big data to predict the Empire’s fall “on the basis of the mathematics of psychohistory.” -- Highlighted mar 21, 2014

p.3: Asimov’s Wikipedia-before-its-time, his vision of what we can anachronistically but accurately term “big data as social engineering,” resonates with a contemporary context where “the deployment of algorithmic calculations... signals an important move—from the effort to predict future trends on the basis of fixed statistical data to a means of pre-empting the future.” -- Highlighted mar 21, 2014

p.4: The notion of metadata precedes that of big data, having been coined in 1968 by the computer scientist Philip R. Bagley (1927–2011):

To any data element... can be associated... certain data elements which represent data “about” the related element. We refer to such data as “metadata”...

-- Highlighted mar 21, 2014
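Read purely as a data-structure claim, Bagley's definition is simple to render in code. The sketch below is only an illustration of that reading, not anything from Bagley's report or from Boellstorff's paper; the names (DataElement, about) are hypothetical. It also shows how the "meta" relation recurses, which bears on the regress discussed in the next highlight.

```python
# Illustrative sketch, assuming a Bagley-style reading of "metadata":
# any data element can have further data elements "about" it associated
# with it. Names here (DataElement, about) are hypothetical.
from dataclasses import dataclass, field


@dataclass
class DataElement:
    value: object
    # Metadata is itself just more data elements, keyed by what they describe.
    about: dict = field(default_factory=dict)


# A "zero-degree" datum ...
message = DataElement("meet at noon")

# ... and data "about" it: sender and timestamp.
message.about["sender"] = DataElement("alice@example.org")
message.about["sent_at"] = DataElement("2013-09-02T12:00:00Z")

# Metadata can carry metadata of its own, so the structure itself offers
# no natural point at which the "meta" assignation stops.
message.about["sent_at"].about["timezone"] = DataElement("UTC")
```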

p.5: I seek to challenge assumptions of a neat division between data and metadata not just because metadata can be more intrusive than data, but because the very division of the informational world into two domains—the zero-degree and the meta—establishes systems of implicit control. Indeed, once a zero-degree/meta distinction is accepted, it becomes impossible to know when to stop. -- Highlighted mar 21, 2014

p.6: However, I want to question this and all divisions between the zero-degree and the meta. What if the division was framed not in terms of letters in envelopes with their interiors and exteriors, but the two sides of a postcard? Postcards were controversial during their emergence in the late nineteenth century because their “contents” could be read by anyone (Cure, 2013); they trouble the distinction between form and content (Boellstorff, 2013). How might analogizing the postcard provide one way to rethink this binarism? If I could take a postcard and bend it into a Möbius strip I would be even happier: a vision of form and content as intertwined at the most fundamental level, such that acts of “meta” assignation are clearly the cultural and political acts they are, rather than pregiven characteristics. -- Highlighted mar 21, 2014

p.6: These examples underscore the practical and political consequences of theory. It is not just that terming things “data” is an act of classification; terming things “metadata” is no less an act of classification and no less caught up in processes of power and control. It is founded in a long and convoluted history of tensions between hierarchical and lateral thinking that shape everything from file systems to societies. This history undermines any attempt to treat the distinction between zero-degree data and metadata as self-evident. -- Highlighted mar 21, 2014

p.7: Perhaps this is why Snowden invoked not George Orwell but Michel Foucault: “if a surveillance program produces information of value, it legitimizes it... In one step, we’ve managed to justify the operation of the Panopticon.” -- Highlighted mar 21, 2014

p.7: The Panopticon provided a visual metaphor that seems prescient when an NSA surveillance program can be code-named “PRISM”: “in order to be exercised, this power had to be given the instrument of permanent, exhaustive, omnipresent surveillance... thousands of eyes posted everywhere, mobile attentions ever on the alert, a long, hierarchized network.” -- Highlighted mar 21, 2014

p.7: The confession is a modern mode of making data, an incitement to discourse we might now term an incitement to disclose. It is profoundly dialogical: one confesses to a powerful Other. This can be technologically mediated: as Foucault noted, it can take place in the “virtual presence” of authority. -- Highlighted mar 21, 2014

p.8: It is not yet clear what kind of reverse discourses will emerge with regard to big data and its dialectic of surveillance and recognition. However, one clue can be seen in the fact that many responses to the making of big data are implicitly calls not for its abolition, but its extension. In a critique of big data, Kate Crawford noted how “data are assumed to accurately reflect the social world, but there are significant gaps, with little or no signal coming from particular communities,” so that “with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big datasets?” (Crawford, 2013). Many other scholars echo this concern that we “be aware of... doubts over data representativeness when generalizing from search engine users to an entire population.” I share this concern that more people be included. The point is that in an almost homeopathic fashion, the remedy lies within the conceptual horizon of the illness it is to mitigate—within the dialectic of surveillance and recognition. -- Highlighted mar 21, 2014