Highlighted Selections from:

What is in PageRank? A Historical and Conceptual Investigation of a Recursive Status Index


Rieder, Bernhard. 2012. “What Is in PageRank? A Historical and Conceptual Investigation of a Recursive Status Index.” Computational Culture (2). http://computationalculture.net/article/what_is_in_pagerank

p.1: The elegant concept of computation then quickly begins to bloat up with many different things: real computers, not just abstract Turing machines; real software, lodged in tight networks of other software, all written for a purpose; knowledge, ideas, skills, tools, methodology, habits, and values that permeate practices embedded in layers of social organization, cultural configurations, economic rationales, and political struggles. -- Highlighted mar 22, 2014

p.1: My main goal is to show how a multilayered yet contained reading of a very specific computational artifact can produce a nuanced account that is attentive to ‘cultural logics’, but does not dissolve concrete technical concepts and decisions in a homogeneous and homogenizing logic of ‘computationalism’ -- Highlighted mar 22, 2014

p.2: In the introduction to his Theorie der endlichen und unendlichen Graphen from 1936, the first textbook on graph theory, Dénes König wrote that ‘[p]erhaps even more than to the contact between mankind and nature, graph theory owes to the contact of human beings between each other’ and thus points to the often overlooked fact that modern graph theory developed, perhaps even more so than statistics, in close contact with the social sciences. -- Highlighted mar 22, 2014

p.3: Developed out of group psychotherapy, sociometry was essentially organized around the sociometric test, a questionnaire distributed to small- and mid-sized groups – schools, factories, etc. – that asked people to choose the individuals they liked best, admired most, had the most contact with, or a similar question. The resulting (network) data was thought to reveal the ‘psychological structure of society’ and was displayed and analyzed as a sociogram, a manually arranged ‘point and line’ diagram that would be called a ‘network visualization’ nowadays. But while Moreno indeed spearheaded the visual display of the network data produced by the sociometric test, his ‘mathematical study’ was severely deficient in terms of actual mathematical method, which prompted criticism ‘for lack of the methodological stringency appropriate to science’ -- Highlighted mar 22, 2014

p.3: First, pictorial representations of networks and visual forms of analysis faded into the background. Network diagrams were still used, but rather as teaching aids in textbooks than as research tools in empirical work. Only the spread of graphical user interfaces in the 1990s and the development of layout algorithms based on physics simulations led to a true renaissance of the now so familiar point and line diagrams. Second, network metrics and methods for mathematical analysis proliferated and became both conceptually and computationally more demanding, sometimes to a point where their empirical applicability became technically forbidding and methodologically questionable, even to scholars sympathetic to the general approach. Third, while exchange between the mathematical methods of graph theory and empirical work in the social sciences has stayed strong over the last decades, the movement towards abstraction and ‘purification’ implied by mathematization has led to a certain demarcation between the two. -- Highlighted mar 22, 2014

p.4: What takes shape in these lines is a separation between the empirical and the analytical, characteristic of quantitative empirical research, that shifts the full epistemological weight onto the shoulders of formalization, that is, onto the ‘appropriate coordination’ between the two levels. I do not want to contest this separation in philosophical terms here, but rather put forward the much simpler critique that in order to produce a formalization of social phenomena that can be ‘mapped’ onto the axioms of graph theory, a commitment has to be made, not to the ‘network’ as an ontological category, but rather to a theory of the social that supports and substantiates the formalization process. In Moreno’s case, for example, it is a theory of psychological ‘attractions and repulsions’ between ‘social atoms’ that bears the epistemological weight of formalization in the sense that it justifies the mapping of a relationship between two people onto two points and a line by conceiving ‘the social’ as a primarily dyadic affair. -- Highlighted mar 22, 2014

p.4: The idea that social structure is hierarchic – even in the absence of explicit, institutional modes of hierarchical organization – and that a network approach can identify these stratifications is recognizable here, eighty years before a ‘new’ science of networks began to find power-law distributions in connectivity in (most of) the places it looked. But Moreno’s ‘socionomic hierarchy’ is only the beginning of a long lasting and constant relationship between applied network mathematics and hierarchies of various kinds. -- Highlighted mar 22, 2014

p.4: The question of social power becomes a question of calculation. Here, a specific paper, cited in both PageRank patents, stands out: in 1953, Leo Katz publishes A New Status Index Derived From Sociometric Analysis in the journal Psychometrika and introduces the notion of a ‘recursive’ or ‘cumulative’ status index. Building on his earlier work on matrix representations of data collected by Moreno, Katz proposes a ‘new method of computation’ for calculating social status from sociometric data. Status measures were already common in studies using the sociometric test, but they were essentially based on simply counting ‘votes’, as seen in the quote by Moreno above. Katz explicitly rejects this method and argues that ‘most serious investigators [...] have been dissatisfied with the ordinary indices of “status,” of the popularity contest type’ because this ‘balloting’ ultimately would not allow one to ‘pick out the real leaders’. His goal was not to measure popularity but social power, even if the latter term was not explicitly used. Therefore, Katz proposed a new index that takes into account ‘who chooses as well as how many choose’, which means that a vote from ‘small fry’ would simply count less. This shift rests on the idea that status is ‘cumulative’ in the sense that the topology of a social network expresses a latent ‘socionomic hierarchy’ in which the status of an individual largely depends on the status of her network neighborhood. Who you are is who you know. -- Highlighted mar 22, 2014
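
Katz’s index lends itself to a compact numerical illustration. The sketch below is only a minimal reconstruction of the ‘cumulative’ idea, not Katz’s original notation: the attenuation factor and the toy choice matrix are assumptions chosen to show how a vote from a high-status chooser ends up counting for more than one from ‘small fry’.

```python
# Minimal sketch of a Katz-style 'cumulative' status index (illustrative only).
import numpy as np

# A[i, j] = 1 if person i 'chooses' person j in the sociometric test (toy data)
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=float)

alpha = 0.3  # attenuation factor; must stay below 1 / largest eigenvalue of A
n = A.shape[0]

# Status = direct choices plus attenuated indirect ones:
# t = (alpha*A^T + alpha^2*(A^T)^2 + ...) @ 1 = (I - alpha*A^T)^(-1) @ (alpha*A^T) @ 1
t = np.linalg.inv(np.eye(n) - alpha * A.T) @ (alpha * A.T) @ np.ones(n)
print(t)  # being chosen by well-connected choosers raises a person's score
```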

p.5: A second arena where concrete techniques for the mathematical exploration and measurement of networks are pioneered – and actually applied to decision-making – is citation analysis. In 1963 the Institute for Scientific Information, founded by Eugene Garfield, publishes the first edition of the Science Citation Index (SCI), an index of citations, manually extracted but sorted by computer, from 613 journal volumes published in 1961. With the first edition already storing 1.4 million citations on magnetic tape, this index is perhaps the first ‘big data’ file available in the social sciences, and over the following years a significant number of researchers participate in analyzing it with various computational methods. Despite Eugene Garfield’s intention to promote the SCI first and foremost as an ‘association-of-ideas index’, a tool for finding scientific literature, it quickly becomes obvious that a series of evaluative metrics could be derived from it without much effort. Thus, in 1972, Garfield presents a fleshed-out version of a concept he had initially introduced as a tool for the historiographical study of science, the (in)famous ‘impact factor’, now adding that ‘[p]erhaps the most important application of citation analysis is in studies of science policy and research evaluation’ -- Highlighted mar 22, 2014

p.5: A paper by Pinski and Narin, published in 1976, pushes things significantly further by pointing out two problems with Garfield’s measure. First, citations have equal value in the impact factor scheme, although ‘it seems more reasonable to give higher weight to a citation from a prestigious journal than to a citation from a peripheral one’. Pinski and Narin therefore propose a recursive index for importance, based on the same eigenvector calculations that we found in Bonacich’s work, although the authors were apparently not aware of the work done in sociometry. Second, Pinski and Narin argue that the impact factor attributes disproportional importance to review journals, and ‘can therefore not be used to establish a “pecking order” for journal prestige’ -- Highlighted mar 22, 2014
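
The recursive weighting Pinski and Narin describe can be sketched with a short power iteration. The citation matrix and the normalization below are illustrative assumptions, not the authors’ exact ‘influence weight’ formulation; the point is simply that weight flows from citing journals to cited journals, so a citation from a heavily weighted journal counts for more.

```python
# Illustrative power-iteration sketch of a recursive journal-prestige weight
# (assumed setup, not Pinski & Narin's exact model).
import numpy as np

# C[i, j] = number of citations from journal i to journal j (toy data)
C = np.array([
    [0, 5, 1],
    [2, 0, 4],
    [3, 1, 0],
], dtype=float)

# Normalize each row by the journal's total outgoing citations (size correction)
P = C / C.sum(axis=1, keepdims=True)

# Iterate w_j = sum_i w_i * P[i, j] until it stabilizes (leading left eigenvector)
w = np.ones(C.shape[0]) / C.shape[0]
for _ in range(100):
    w = w @ P
    w = w / w.sum()
print(w)  # journals cited by heavily weighted journals end up weighted heavily
```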

p.5: While Pinski and Narin do not quote the work of economist Wassily Leontief, unlike Hubbell who acknowledges this source of inspiration in his sociometric clique identification scheme, there is reason to believe that their version of an input-output model was equally inspired by economic thought, in particular if we take into account that Pinski and Narin’s metric was intended for a ‘funding agency with its need to allocate scarce resources’ -- Highlighted mar 22, 2014

p.6: While the sociometric test was generally lauded as a tool for producing interesting data, Moreno’s socio-psychological theory of ‘attraction and repulsion’ between ‘social atoms’ was based on highly contested assumptions, and his goal to develop a ‘technique of freedom’ to reorganize society – by training individuals to transcend their social prejudices in order to liberate the forces of ‘spontaneous attraction’ – did not sit well with the sober and pragmatic mindset of most empirical sociologists. The sociometric papers working with Moreno’s data therefore generally subscribed to a vague version of the same atomistic and dyadic view of society, which enabled and justified the ‘point and line’ formalization needed to apply graph theoretical methods, but shunned the deeper aspects of the theoretical horizon behind it. -- Highlighted mar 22, 2014

p.6: What all of the strands I have presented here furnish – and this is fundamental for any form of normative and operational application of evaluative metrics, whether in citation ranking or in Web search – is a narrative that sustains what I would like to call the ‘innocence of the link’: whether it is spontaneous attraction, rational choice or simply an ‘inspirational’ account of scientific citation, the application of the metrics to actual ranking, with concrete and tangible consequences, can only be justified if the link is kept reasonably pure. In this vision, the main ‘enemy’ is therefore the deceitful linker, whether they come in the form of scientific citation cartels or their contemporary cousins, link farms. It is not surprising that a central argument against citation analysis as a means for research evaluation builds on a critique of actual citation practices. -- Highlighted mar 22, 2014

p.6: in the sense that a critical reading of formulas or source code not only requires a capacity to understand technical languages but also ‘interpretive ammunition’ to refill computational artifacts that have most often been cleansed from easily accessible markers of context and origin. -- Highlighted mar 22, 2014

p.7: Building on a paper by Harary, named Status and Contrastatus, they formalized a hypertext as a directed graph and began to calculate metrics for the purpose of ‘recovering lost hierarchies and finding new ones’. The goal was not to find the most important document nodes for document retrieval, but rather to assist hypertext authors in designing the structure of their text networks more explicitly – ‘remember that structure does reflect semantic information’. Structural hierarchies were seen as helpful navigational devices and status metrics should make it easier to build them. -- Highlighted mar 22, 2014
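
The Harary-style measures behind this approach can be made concrete: take a node’s ‘status’ as the sum of its distances to the nodes it can reach, and its ‘contrastatus’ as the sum of distances from the nodes that can reach it. The toy hypertext below is hypothetical, and this is only a sketch of the underlying sums, not of the authors’ full set of metrics.

```python
# Sketch of status/contrastatus on a hypothetical hypertext graph.
from collections import deque

links = {            # page -> pages it links to (assumed example)
    "index": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": [],
}

def distances_from(source, graph):
    """Breadth-first shortest-path lengths from `source`."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Reverse the link direction to compute contrastatus
reverse = {n: [] for n in links}
for src, targets in links.items():
    for t in targets:
        reverse[t].append(src)

for node in links:
    status = sum(d for n, d in distances_from(node, links).items() if n != node)
    contrastatus = sum(d for n, d in distances_from(node, reverse).items() if n != node)
    print(node, status, contrastatus)  # root-like nodes: high status, low contrastatus
```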

p.7: Interestingly though, Botafogo, Rivlin, and Shneiderman underscored the idea that software would make it easy to implement different network metrics and thereby provide not just one ‘view’ on the hypertext, but many different ones, granting the ‘ability to view knowledge from different perspectives’ -- Highlighted mar 22, 2014

p.7: If sociometric measurement means that ‘authority no longer rests in the [social] relationships and instead migrates towards the measuring instrument’, the question remains how we can better understand the way authority is configured, concretely, once it has migrated. I believe that engaging PageRank as a computational model can bring us closer to an answer -- Highlighted mar 22, 2014

p.9: If we consider the Google search engine as a central site of power negotiation and arbitrage for the Web, an analysis focusing on PageRank – which in practice is certainly not enough – would have to conclude that its authority ranking mechanism applies, in a universal fashion, a largely conservative vision of society to the document graph in order to ‘pick out the real leaders’ and distribute visibility to them. Rather than showing us the popular it shows us the authoritative, or, to connect back to citation analysis, the canonical. If we consider the link indeed as ‘innocent’, as a valid indicator of disinterested human judgment, PageRank shows us a meritocracy; if we take the link to be fully caught up in economic forces however, we receive the map of a plutocracy. The search engine as a visibility engine subjects both to the self-reinforcing dynamic of cumulative advantage. -- Highlighted mar 22, 2014
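
For readers who want the mechanism itself in view, a minimal power-iteration version of PageRank can be written in a few lines. The damping factor of 0.85 follows the value given in the original Brin and Page papers; the four-page link graph is a made-up example.

```python
# Minimal PageRank power iteration on a toy link graph (illustrative sketch).
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # page -> pages it links to
n = 4
d = 0.85  # damping factor

# M[j, i] = 1/outdegree(i) if page i links to page j (column-stochastic)
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

r = np.ones(n) / n
for _ in range(100):
    r = (1 - d) / n + d * M @ r  # each page passes its rank along its out-links
print(r)  # pages linked to by highly ranked pages are ranked highly themselves
```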

p.11: This means that a local theory of power is computationally and, because computation costs money, economically cheaper than a theory where status is an effect of more global structural properties. In this case, the menace from spam apparently looked sufficiently substantial to convince Google that a larger radius of influence – and consequently a reduced capacity for local change to affect global structure – was well worth the additional cost. -- Highlighted mar 22, 2014
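
A toy comparison makes the cost argument tangible: a purely local measure such as raw in-degree needs only a single pass over the edge list, and a changed link affects only one score, whereas a recursive, global measure has to re-propagate every change through the whole graph. The edge list below is a hypothetical example.

```python
# Local status (in-degree): one pass, changes stay local (illustrative sketch).
from collections import Counter

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]  # (source, target) links

in_degree = Counter(target for _, target in edges)
print(in_degree)  # adding or removing one link changes exactly one score

# A global, recursive index (PageRank, Katz) would instead have to be recomputed
# over the whole graph, so a single local edit can shift scores far from its origin.
```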

p.11: So what is in the algorithms producing the evaluative metrics discussed in this text? I would argue that they encode ways of putting things into relation that (can) fundamentally reconfigure how power is constituted, how it operates, and how it may be negotiated. By blending description and prescription in a particularly pervasive fashion, software introduces a set of new techniques into the arsenal of control, techniques that make commitments to certain conceptions of the social and, by becoming operative machinery, produce specific social consequences themselves. But this transposition into the world of algorithms not only modifies the materiality and performativity of these techniques and often renders them near invisible; paradoxically it also opens them up to new ways of pondering, criticizing, and – maybe – changing them. We risk missing a genuinely political moment if we lose sight of how software can sometimes make it astonishingly easy to do things differently. -- Highlighted mar 22, 2014