Highlighted Selections from:

Infovis and Statistical Graphics: Different Goals, Different Looks


Gelman, Andrew, and Antony Unwin. "Infovis and Statistical Graphics: Different Goals, Different Looks." Journal of Computational and Graphical Statistics. (2012). Print.

p.1: The importance of graphical displays in statistical practice has been recognized sporadically in the statistical literature over the past century, with wider awareness following Tukey’s Exploratory Data Analysis (1977) and Tufte’s books in the succeeding decades. But statistical graphics still occupies an awkward in-between position: Within statistics, exploratory and graphical methods represent a minor subfield and are not well-integrated with larger themes of modeling and inference. Outside of statistics, infographics (also called information visualization or Infovis) is huge, but their purveyors and enthusiasts appear largely to be uninterested in statistical principles. We present here a set of goals for graphical displays discussed primarily from the statistical point of view and discuss some inherent contradictions in these goals that may be impeding communication between the fields of statistics and Infovis. One of our constructive suggestions, to Infovis practitioners and statisticians alike, is to try not to cram into a single graph what can be better displayed in two or more. We recognize that we offer only one perspective and intend this article to be a starting point for a wide-ranging discussion among graphics designers, statisticians, and users of statistical methods. The purpose of this article is not to criticize but to explore the different goals that lead researchers in different fields to value different aspects of data visualization. -- Highlighted apr 29, 2014

p.2: We are also disturbed that many talented information-visualization experts do not seem interested in the messages of statistics, most notably the admonitions from William Cleveland and others to consider the effectiveness of graphical displays in highlighting comparisons of interest. We worry that designers of non-statistical data graphics are not so focused on conveying information and that the very beauty of many professionally-produced images may, paradoxically, stand in the way of better understanding of data in many situations. -- Highlighted apr 29, 2014

p.3: On the statistical side, data analysts and statisticians are interested in finding effective and precise ways of representing data, whether raw data, statistics or model analyses. Providing the right comparisons is important; numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. -- Highlighted apr 29, 2014

p.3-4: On the Infovis side, computer scientists and designers are interested in grabbing the readers’ attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Information Visualization), they provide more contextual information and make more effort to awaken the readers’ interest. -- Highlighted apr 29, 2014

p.4: One issue that arises is the familiar distinction between exploratory and presentation graphics. With presentation graphics you prepare some small number of graphs, which may be viewed by thousands, and with exploratory graphics you prepare thousands of graphs, which are viewed by one person, yourself. Exploratory graphics is all about speed and flexibility and alternative views. Presentation graphics is all about care and specifics and a single view. Presentation graphics can really benefit from a graphic designer's contribution; for exploratory graphics it's not so relevant. That said, the first consumer of any graph is the person who makes it, and it can often be useful to use “presentation” skills to communicate to ourselves as well as to others. In either context, much can be gained by thinking carefully about goals. -- Highlighted apr 29, 2014

p.5: In the present article, we lay out some of these conflicting goals and discuss how awareness of some underlying principles of statistical communication could improve the work of statisticians and graphics designers alike. -- Highlighted apr 29, 2014

p.6: There are several fine books on presentation graphics for statisticians, including the theoretical works of Bertin and Wilkinson, the style advice books of Cleveland and others, and the attractively polemical books of Tufte. In fact some of Tufte’s publications are a bridge to the Infovis world, which has a newer and more scattered literature. Articles by Heer, Kosara, Munzner, and Shneiderman are a good starting point and Kosara’s blog, eagereyes.org, is a useful place to look for enlightened discussion of the issues. Amongst other contributions, Shneiderman (1996) has proposed and promoted his mantra: Overview first, zoom and filter, then details-on-demand. In effect this is a drill down for details and there is no mention of any comparisons. For statisticians there always have to be comparisons; numbers on their own are not enough. -- Highlighted apr 29, 2014

p.6: There is a series of Infovis workshops, BELIV (BEyond time and errors: novel evaLuation methods for Information Visualization), concerned with the evaluation of visualizations. Substantial progress has not been made, but the aim of trying to determine what insights may be obtained from a graphic and how well they are presented is well worth pursuing and such research should encourage statisticians too to think more formally of what they are trying to achieve with their graphics. -- Highlighted apr 29, 2014

p.7: Tukey (1993) was more specific about what he called the true purpose of graphic display, which he set down in four parts:

  1. Graphics are for the qualitative/descriptive—conceivably the semi-quantitative—never for the carefully quantitative (tables do that better).
  2. Graphics are for comparison—comparison of one kind or another—not for access to individual amounts.
  3. Graphics are for impact—interocular impact if possible, swinging-finger impact if that is the best one can do, or impact for the unexpected as a minimum—but almost never for something that has to be worked at hard to be perceived.
  4. Finally, graphics should report the results of careful data analysis—rather than be an attempt to replace it. (Exploration—to guide data analysis—can make essential interim use of graphics, but unless we are describing the exploration process rather than its results, the final graphic should build on the data analysis rather than the reverse.)

-- Highlighted apr 29, 2014

p.8: Discovery goals: Giving an overview—a qualitative sense of what is in a dataset, checking assumptions, confirming known results, looking for distinct patterns. Conveying the sense of the scale and complexity of a dataset. For example, graphs of networks notoriously reveal very little about underlying structure but, if constructed well, can give an impression of interconnectedness and of central and peripheral nodes. And maybe that is the point. The picture tells the story as well as, and in less space than, the equivalent thousand words. Exploration: flexible displays to discover unexpected aspects of the data; small multiples or, even better, interactive graphics to support making comparisons. -- Highlighted apr 29, 2014

p.11: Beyond this, statisticians should remember that, contrary to the impressions they may have received from a hasty reading of Tukey, graphics are not just for visualizing data. For example, the parallel coordinate plot (Inselberg, 2009) is a modern standby, an excellent tool for the clear display of multivariate data, but it was originally developed as a way of visualizing high-dimensional structures in pure math, with no data in sight. Visualization has its own principles which are relevant to statistics without being part of it. -- Highlighted apr 29, 2014

p.14: How does this all relate to statistical theory and practice? The statistical literature on visualization tends to focus on the display of raw data (for example, the book Graphics of Large Datasets: Visualizing a Million, by Unwin et al., 2006) but graphical visualization can also be important in understanding and checking the fit of complex models (Gelman, 2003, 2004, Buja et al., 2009, Wickham et al., 2010) and for exploration across models (Unwin, Volinsky, and Winkler, 2003, Urbanek, 2006, Wickham, 2006). -- Highlighted apr 29, 2014

p.23: Consider the famous image drawn by Florence Nightingale (1858), which is often considered as an exemplar of data display (see Figure 12). In a recent discussion of the coxcomb plot, Rehmeyer (2008) writes: The conventional way of presenting this information would have been a bar graph, which William Playfair had created a few decades earlier. Nightingale may have preferred the coxcomb graphic to the bar graph because it places the same month in different years in the same position on the circle, allowing for easy comparison across seasons. It also makes for an arresting image. She said her coxcomb graph was designed “to affect thro’ the Eyes what we fail to convey to the public through their word-proof ears.” -- Highlighted apr 29, 2014

p.26: In the language of the present paper, the Nightingale graph is an excellent example of “infographics”—it is attractive, grabs one's attention, and gets you thinking—but it is not so great as “statistical graphics” in that it does not directly facilitate a deeper understanding of the data. In Nightingale’s political context, the goal of attracting attention was arguably much more important than the goal of understanding and communicating subtle patterns in the data. -- Highlighted apr 29, 2014

p.26: We do not claim that our graphs are better than Nightingale’s classic; rather, the two displays serve different purposes, and in the modern high-bandwidth era, there is room for both. The first step is to understand the different goals involved in a graphical display. -- Highlighted apr 29, 2014

p.36: One key difference between the two approaches is that Infovis prizes unique, distinctive displays, while statisticians are always trying to develop generic methods that have a similar look and feel across a wide range of applications. Few statisticians are trying to develop anything new; they are using the standard well-tried tools. Infovis places a high value on creativity and difference, whereas statistics is centered on objectivity and replication. -- Highlighted apr 29, 2014

p.36: Another important difference is in the expected audience. Statisticians assume that their viewers are already interested and want to provide structured information, often a carefully prepared argument. For statisticians, graphics are part of an explanation. Even exploratory analysis typically has a clear structure. In contrast, Infovis designers want to draw attention to their graphics and thus to the subject matter. For them, graphics are more of a door opener. This is reflected in how both groups use interactivity. -- Highlighted apr 29, 2014

p.38: Back in the nineteenth and early twentieth century, there were some very attractive time series graphs, scatter plots, maps and more complicated statistical graphics—but these were artisan work, the infographics of their day. They were not really used routinely enough for statistical researchers to get a sense of what worked and what did not (and they often misfired, as with Florence Nightingale’s coxcomb plot above, which, although beautiful, is ultimately less informative than a simple time series plot). But progress on these led to our current state in which graphs, not tables, are the standard in data communication. Perhaps the infographics of today will evolve into the statistical data visualization tools of future decades, and we hope our discussion of goals and examples will help move this process along. -- Highlighted apr 29, 2014

p.38: As is illustrated in the historical reviews such as Wainer (1997) and Friendly (2006), there is a centuries-long tradition of data graphics that are both informative and beautiful. We should seek to continue this collective endeavor, and we hope the present article sparks a discussion among statisticians, computer scientists, graphic designers, psychologists, and others who are interested in the graphical presentation of data and inferences. -- Highlighted apr 29, 2014