Highlighted Selections from:

Data Visualization in Sociology

Healy, Kieran, and James Moody. “Data Visualization in Sociology.” Annual Review of Sociology (2013): 1–29. Pre-Print.

p.1: Visualizing data is central to social scientific work. Despite a promising early beginning, sociology has lagged in the use of visual tools. We review the history and current state of visualization in sociology. Using examples throughout, we discuss recent developments in ways of seeing raw data and presenting the results of statistical modeling. We make a general distinction between those methods and tools designed to help explore datasets, and those designed to help present results to others. We argue that recent advances should be seen as part of a broader shift towards easier sharing of the code and data both between researchers and with wider publics, and encourage practitioners and publishers to work toward a higher and more consistent standard for the graphical display of sociological insights. -- Highlighted apr 28, 2014

p.2: Given the power of statistical visualization, then, it is puzzling that quantitative sociology is so often practiced without visual referents. One need only compare a recent issue of the American Sociological Review or the American Journal of Sociology to Science, Nature or PNAS to see the radical difference in visual acuity. It is common for the premier journals in sociology to publish articles with many tables, but no figures. The opposite is true in the premier natural science journals. -- Highlighted apr 28, 2014

p.4: Exemplars of bar-charts (Hart 1896), line graphs (Marrow 1899), parametric density plots and dot-plots with standard errors (Chapin 1924), scatterplots (Sletto 1936), and social network diagrams (Lundberg and Steele 1938) are easy to find in early sociological journal articles. -- Highlighted apr 28, 2014

p.4: Du Bois’s The Philadelphia Negro (1898) is filled with innovative visualizations, including choropleth maps, table and histogram combinations, time-series graphs, and others. -- Highlighted apr 28, 2014

p.4: But, somewhere along the line, sociology became a field where sophisticated statistical models were almost invariably represented by dense tables of variables along rows and model numbers along columns. Though they may signal scientific rigor, such tables can easily be substantively indecipherable to most readers, and perhaps even at times to authors. The reasons for this are beyond the scope of this review, although several possibly complementary hypotheses suggest themselves. First, to the extent that graphical imagery was thought of as “descriptive,” statistical images may have been collateral damage in the war between causal-inferential modeling and descriptive reportage. Second, figures may have seemed unsophisticated. The very clarity of a (good) figure made the work seem too simple. Third, and more charitably, visualization in sociology might have been a victim of the field’s relatively rapid embrace of quantitative methods. -- Highlighted apr 28, 2014

p.4: In a review of a history of graphical methods in statistics written in 1938, John Maynard Keynes remarked that he wished the author

... could have added a warning, supported by horrid examples, of the evils of the graphical method unsupported by tables of figures. Both for accurate understanding, and particularly to facilitate the use of the same material by other people, it is essential that graphs should not be published by themselves, but only when supported by the tables which lead up to them. It would be an exceedingly good rule to forbid in any scientific periodical the publication of graphs unsupported by tables (Keynes 1938, 282, emphasis added). -- Highlighted apr 28, 2014

p.6: Sometimes the graphical capabilities of particular software applications are loosely related to the more theoretical work, taking from them a concern with aesthetic principles, and possibly specific sorts of plots. In other cases, the linkage is closer. Sarkar (2008) describes a data visualization package for R that closely follows Cleveland’s ideas (and some earlier associated software), while Wickham (2009, 2010) describes a software package for R that implements and extends principles worked out in Wilkinson’s Grammar of Graphics. -- Highlighted apr 28, 2014

p.7: Tufte acknowledges that a tour de force such as Minard’s “can be described and admired, but there are no compositional principles on how to create that one wonderful graphic in a million” (Tufte 1983, 177). The best one can do for “more routine, workaday designs” is to suggest some guidelines such as “have a properly chosen format and design,” “use words, numbers, and drawing together,” “display an accessible complexity of detail” and “avoid content-free decoration, including chartjunk” (Tufte 1983, 177). -- Highlighted apr 28, 2014

p.8: In the Foreword to the new edition of Semiology of Graphics, Howard Wainer reflects on the hope he and others once felt that easy-to-use graphical tools and software would lead to better general practice by way of smarter defaults (Bertin 2010, p.xi). But, he argues, this has not happened. In the end, high quality graphical presentation requires crafting a deliberately designed message rather than pushing the pre-established setting. -- Highlighted apr 28, 2014

p.8: Recent theoretical work explicitly recognizes the limits of relying on defaults. Following Wilkinson in implementing ggplot’s “grammar of graphics” for R, Wickham (2010, p.3) notes that the analogy to grammar is useful because while “A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics ... there will still be many grammatically correct but nonsensical graphics ... good grammar is just the first step in creating a good sentence.” -- Highlighted apr 28, 2014

p.10: Advocacy of exploratory data analysis (EDA), of looking carefully and creatively before modeling, is most closely associated with John Tukey (1972, 1977). Historically, EDA has been closely tied to the rise of graphical capabilities in statistical computing, particularly tools that allow rapid interactive visualization. A mild sense of unease with EDA is a feature of the statistical literature. It is explicitly inductive, and concerned with exploring data in a relatively freewheeling fashion as an aid to discovery, which at times can seem uncomfortably opportunistic or unstructured. To working social scientists these are often virtues, but statistics is also the discipline where the avoidance of spurious associations is a major focus of technical work. -- Highlighted apr 28, 2014

p.13: Harrell (2001) remains an exemplary book-length demonstration of the virtues of integrating graphical methods with the process of data exploration—including exploring patterns of missingness in the data—model-building, diagnostics, and presentation. -- Highlighted apr 28, 2014

p.15: From the EDA side, Wickham et al. (2010) and Buja et al. (2009) provide some principled ways for assessing, in a broadly graphical manner, whether or not the patterns one is seeing are likely to be spurious. For example, a “permutation lineup” presents observed data in a small multiple context surrounded by “null plots” of generated data. “Which plot shows the real data?” Buja et al. (2009, p.4372) ask. If observers cannot reliably pick it out, then we should doubt both the utility of the plot and the soundness of any inferences (or arguments) based on it. -- Highlighted apr 28, 2014
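The lineup procedure can be sketched in a few lines of code. The Python below is a minimal, hypothetical illustration, not code from Buja et al. (2009) or Wickham et al. (2010). It assumes the question being tested is whether y is associated with x, and it generates each null panel by randomly permuting y, which is one common way to produce data consistent with "no association" while keeping both variables' distributions intact. The function name `permutation_lineup`, the default of 20 panels, and the `seed` parameter are illustrative assumptions; drawing the panels as small multiples is left to whatever plotting tool one prefers.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One small-multiple panel: (x values, y values).
Panel = Tuple[List[float], List[float]]


def permutation_lineup(
    x: Sequence[float],
    y: Sequence[float],
    n_panels: int = 20,
    seed: Optional[int] = None,
) -> Tuple[List[Panel], int]:
    """Hypothetical helper: hide the observed (x, y) panel among null panels.

    Each null panel keeps x fixed and shuffles y, breaking any x-y
    association while preserving both marginal distributions.
    Returns all panels plus the (secret) index of the real data.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_panels < 2:
        raise ValueError("need at least one null panel")
    rng = random.Random(seed)
    panels: List[Panel] = []
    for _ in range(n_panels - 1):
        y_null = list(y)
        rng.shuffle(y_null)  # assumed null: no association between x and y
        panels.append((list(x), y_null))
    # Insert the observed data at a random position among the nulls.
    real_index = rng.randrange(n_panels)
    panels.insert(real_index, (list(x), list(y)))
    return panels, real_index
```

If an observer shown 20 such panels were simply guessing, the chance of picking the real one would be 1/20 = 0.05; a correct pick is therefore some evidence that the visible pattern is not a product of chance, while a miss suggests the plot (or the pattern) should be doubted.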

p.16: Dimensional reduction of this sort typically characterizes the problem of interest in terms of space or distance, leading to natural affinities to mapping social systems. Sociologists have been among the earliest users of these visualization tools, particularly with network analysis. The earliest “interactive” network tools were literally peg-boards and rubber-bands (see Figure 8, and Freeman 2004) or pins-and-strings. -- Highlighted apr 28, 2014

p.21: To many working statisticians, infographics are the descendants of Tufte’s “Ducks”—those “self-promoting graphics” where “the overall design purveys Graphical Style rather than quantitative information” (Tufte 1983, p.116). The contemporary infographic in its pure form is a supercharged megaduck incorporating not only the bells and whistles derided by Tufte but far more besides, usually in the form of a quasi-narrative structure, pictographic sequencing, or excessive dynamic elements. Gelman & Unwin (2013) discuss “Infovis” style work from a statistical point of view. They argue that most infographics do not meet the standards normally demanded of statistical graphics, but concede that sometimes the goals of the latter are not those of the former. -- Highlighted apr 28, 2014

p.23: Visualizations of categorical data remain more difficult to convey effectively, partly because the general public is not always familiar with conventional ways to present it. Mosaic plots, for instance, can be effective representations of contingency tables but people are not taught to “read” them in the same way they can read bar charts or scatterplots. The effective visualization of network data presents similar issues. -- Highlighted apr 28, 2014

p.25: Good graphics are not, of course, the only thing—see Godfrey (2013) for a discussion of the situation of blind and visually impaired users of current statistical software. But the dominant trend is toward a world where the visualization of data and results is a routine part of what it means to do social science. -- Highlighted apr 28, 2014

p.26: Just as training in elementary visualization methods should be a standard component of graduate education, our flagship journals should encourage their authors to think about the most effective ways to encourage visual clarity. This should not take the form of overly strict style guides, but instead aim for an ideal of consistent, considered good judgment in the presentation of data and results in the service of sociological argument. -- Highlighted apr 28, 2014