Highlighted Selections from:

Relational and Non-Relational Models in the Entextualization of Bureaucracy

Castelle, Michael. 2013. “Relational and Non-Relational Models in the Entextualization of Bureaucracy.” Computational Culture (3). http://computationalculture.net/article/relational-and-non-relational-models-in-the-entextualization-of-bureaucracy

p.1: To what can we ascribe the commercial dominance and organizational ubiquity of relational databases in the 1990s and early 2000s? By integrating perspectives from classical sociological theory, the history of administrative computing, and linguistic anthropology with papers and proceedings from the nascent database research community, this paper argues that the success of relational databases (and their corresponding tabular data models) can be ascribed to three distinctive components: 1) a semiotic distinctiveness from previously-existing (and now largely forgotten) hierarchical and network models; 2) a bureaucratic correspondence with practices of file organization and index-based retrieval; and 3) support for transaction processing, which facilitated adoption by a wide variety of organizations in commerce, banking, and finance. These three aspects are in turn used to suggest ways for understanding the potential significance (or lack thereof) of an emerging 21st-century market for non-relational and/or “Big Data” database systems. -- Highlighted mar 22, 2014

p.2: But even at the time, Weber saw the ‘office’ (Bureau)—“the combination of written documents and a continuous operation by officials”—as “the central focus of all types of modern organized action”. -- Highlighted mar 22, 2014

p.3: we are dealing here with the first legitimate theorist of “data society”; for Weber’s list of aspects of the highly rationalized organization can also be plausibly read as an promotional list of features of a present-day commercial database. [The term file, in the present-day computing sense, has its obvious origin in the bureau; before the advent of magnetic tape and disk drives, the computer ‘file’ was a stack of punch cards, each of which might have represented one entity (a supplier, say); additionally, the stack may have been pre-sorted in some way amenable to the computation at hand.] -- Highlighted mar 22, 2014

p.4: Following the lead of James Cortada, we should consider the history of computing not as the history of one monolithic tool considered independently from its various manifestations and the uses to which they were put, but as a history of a heterogeneity of applications (by which we mean the social use of various tools and techniques—or the interactive embodiment of such practices as ‘software’—for specific goals). -- Highlighted mar 22, 2014

p.4: Lars Heide comments that for punched-card applications in the 1930s, a large scientific calculation might have used “several hundred thousand punched cards.. in contrast to the several tens of millions of cards consumed four times a year by the Social Security Administration” (Heide, Punched-Card Systems and the Early Information Explosion, 1880-1945 (Johns Hopkins University Press, 2009), 250-251). For a dual timeline of developments in scientific and commercial computing from the 1950s onward, see R.L. Nolan, “Information Technology Management Since 1960,” in A Nation Transformed by Information: How Information Has Shaped the United States from Colonial Times to the Present, edited by A. Chandler and J. Cortada (Oxford University Press, 2000).] -- Highlighted mar 22, 2014

p.8: I will argue that the success of the relational model can be understood on three more-or-less independent axes, which I will label semiotic, bureaucratic, and transactional: 1. The relational model differs primarily in its wholly symbolic and tabular representation to the user, as opposed to the explicitly encoded referential relations of the hierarchical and network models. This fundamental semiotic difference produces a highly valued effect recognized more typically as ‘data independence’. 2. Somewhat surprisingly, the relational model mapped onto the cognitive practices of traditional administrative batch processing better than the IMS-style hierarchical model or the DBTG-style network model (which were both, in part, explicitly trying to preserve the batch processing “record-at-a-time” logic). 3. The relational model, because of its symbolic representation of entity relations, was also highly amenable to the formalization of concurrent transactions, a technology which emerged and improved contemporaneously in the late 70s and early 80s. The ability to process atomic transactions at high speed while maintaining a consistent state was an enormous selling point for database management systems regardless of whether or not they used a relational model. -- Highlighted mar 22, 2014

p.15: “In looking at the progress towards integration of files, we have noticed that the entities, previously processed separately, now have to be more and more heavily interrelated. This has resulted in the very elaborate data structure diagrams that we have seen displayed here.. Now it so happens that a flat file and a collection of flat files has all of the power you want to deal with the n-dimensional world. You do not need to introduce any separate concept of relationships: the pointer-style relationships that we see with arrows on [Bachman’s] diagrams. It so happens that you can consider the entities like parts, suppliers, and so forth and relationships like ‘so-and-so supplies such-and-such a part’ in one way, a single way, namely the flatfile way. What are the advantages of doing this? By adopting a uniform flat-file approach you can simplify tremendously the set of operators needed to extract all the information that can possibly be extracted from a data base.” -- Highlighted mar 22, 2014

p.19: The sociotechnical phenomena presently going under the terminological umbrella of “Big Data” are, on initial inspection, reminiscent of the heterogenous complexities of the early days of commercialized database software, with a wide variety of contenders—some developed for and within established computing businesses with extreme data management needs, some from small startups—filling a complex, overdetermined online discourse with a mixture of genuine success stories and unverifiable claims. -- Highlighted mar 22, 2014

p.20: But in general, these new non-relational databases tend to either relax the requirements of transactional stability (by, e.g., providing potentially less consistent results from a more decentralized database) or the relational requirement of purely tabular modeling (by, e.g. storing data as simple flat files of values, or as collections of keyvalue-oriented ‘documents’) -- Highlighted mar 22, 2014

p.20: By considering data stores as mediating text-artifacts, we can say that whatever “Big Data” is, it clearly involves an order-of-magnitude increase in entextualization. -- Highlighted mar 22, 2014

p.20: On the other hand, it is not clear that what traditional bureaucracy—as a paragon of organizational control—needs, or wants, such infinite quantities of entextualizations. If we return to our proposed aspects of bureaucracy relating to data processing, we can see that with some of the “Big Data” non-relational databases we have (1) potentially reduced the degree of unification and centralization of the data (because consistency is not the same thing as a distributed “eventual consistency”); (2) potentially increased the efficiency of processing very large quantities of data (for there is typically still a facility for some kind of indexing); and certainly (3) diminished the support for reliable and serialized representation of economic transactions. These observations should give us some skepticism as to any recent announcements of the impending demise of relational technology (unless these drawbacks begin to be resolved legitimately by non-relational database systems). -- Highlighted mar 22, 2014

p.21: As we extend the explicit encoding of referential relations from single data types (’individuals’ and their ‘ties’; or ‘papers’ and their ‘citations’) to more elaborate sociotechnical scaffoldings, are we gaining analytical purchase from the added complexity of our datasets? Perhaps such extreme network-centric perspectives, in the future, will be considered a oncepromising but ultimately misguided empirical fad—as in the not-dissimilar Codasyl DBTG proposal, of which no implementations currently survive. -- Highlighted mar 22, 2014