Highlighted Selections from:

Crowd Science: The Organization of Scientific Research in Open Collaborative Projects

DOI: 10.1016/j.respol.2013.07.005

Franzoni, C., Sauermann, H., Crowd science: The organization of scientific research in open collaborative projects. Res. Policy (2013), http://dx.doi.org/10.1016/j.respol.2013.07.005

p.2: Because the computation was extremely intensive, in the fall of 2005 the team launched Rosetta@home, a grid system that allowed volunteers to make available the spare computational capacity of their personal computers. A critical feature of Rosetta@home was a visual interface that showed proteins as they folded. Although the volunteers were meant to contribute only computational power, looking at the Rosetta screensavers, some of them posted comments suggesting better ways to fold the proteins than what they saw the computer doing. These otherwise naïve comments inspired graduate students at the Computer Science and Engineering department and post-docs at Baker's lab. They began to wonder if human visual ability could complement computer power in areas where the computer intelligence was falling short. -- Highlighted mar 6, 2014

p.8: Thus, much of the process-related knowledge that has traditionally remained tacit in scientific research (Stephan, 1996) is codified and openly shared within and outside a particular project. -- Highlighted mar 6, 2014

p.10: While openness with respect to project participation and with respect to intermediate inputs are by no means the only interesting aspects of crowd science projects, and while not all projects reflect these features to the same degree, these two dimensions highlight prominent qualitative differences between crowd science and other regimes of knowledge production. Moreover, we focus on these two dimensions because they have important implications for our understanding of potential benefits and challenges of crowd science. As highlighted by prior work on the organization of problem solving, the first dimension - open participation - is important because it speaks to the labor inputs and knowledge a project can draw on, and thus to its ability to solve problems and generate new knowledge. Openness with respect to intermediate inputs - our second dimension - is an important requirement for distributed work by a large number of crowd participants. At the same time, our discussion of the role of secrecy in traditional science suggests that this dimension may have fundamental implications for the kinds of rewards project contributors are able to appropriate and thus for the kinds of motives and incentives projects can rely on in attracting participants. -- Highlighted mar 6, 2014

p.10: In complex tasks, however, the best solution to a subtask depends on other subtasks and, as such, a contributor needs to consider these other contributions when working on his/her own subtask. The term task structure captures the degree to which the overall task that is outsourced to the crowd is "well-structured" versus "ill-structured". In well-structured tasks, the subtasks are clearly defined, the criteria for evaluating contributions are well understood, and the "problem space" is essentially mapped out in advance (Simon, 1973). In ill-structured tasks, the specific subtasks that need to be performed are not clear ex ante, contributions are not easily evaluated, and the problem space becomes clear as the work progresses and contributions build on each other in a cumulative fashion. While task complexity and structure are distinct concepts, they are often related in that complex tasks tend to also be more ill-structured, partly due to our limited understanding of the interactions among different subtasks (Felin and Zenger, 2012). Considering differences across projects with respect to the nature of the task is particularly important because complex and ill-structured tasks provide fewer opportunities for the division of labor and may face distinct organizational challenges. -- Highlighted mar 6, 2014

p.11: The scholarly literature has developed several different conceptualizations of the science and innovation process, emphasizing different problem solving strategies such as the recombination of prior knowledge, the search for extreme value outcomes, or the systematic testing of competing hypotheses (see Kuhn, 1962; Fleming, 2001; Weisberg, 2006; Boudreau et al., 2011; Afuah and Tucci, 2012). Given the wide spectrum of problems crowd science projects seek to solve, it is useful to draw on multiple conceptualizations of problem solving in thinking about the benefits projects may derive from openness in participation and from the open disclosure of intermediate inputs. -- Highlighted mar 6, 2014

p.13: One key feature of crowd science projects is their openness to the contributions of a large number of individuals. However, there is a large and increasing number of projects, and the population of potential contributors is vast. Thus, organizational mechanisms are needed to allow for the efficient matching of projects and potential contributors with respect to both skills and interests. One potential approach makes it easier for individuals to find projects by aggregating and disseminating information on ongoing or planned projects. Websites such as scistarter.com, for example, offer searchable databases of a wide range of projects, allowing individuals to find projects that fit their particular interests and levels of expertise. -- Highlighted mar 6, 2014
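[Note: the following is my own illustrative sketch, not from the paper. The matching mechanism described above - a searchable database that filters projects by a contributor's interests and expertise, as on scistarter.com - could look roughly like this; the project names, field names, and expertise levels are hypothetical.]

```python
# Minimal sketch of project-contributor matching via a searchable
# project database. All records and field names are made up.
projects = [
    {"name": "Galaxy Zoo", "topic": "astronomy", "expertise": "novice"},
    {"name": "Foldit", "topic": "biochemistry", "expertise": "intermediate"},
    {"name": "Polymath", "topic": "mathematics", "expertise": "expert"},
]

def match(projects, topic=None, expertise=None):
    """Return projects matching a contributor's interests (topic)
    and skill level (expertise); None means 'any'."""
    return [p for p in projects
            if (topic is None or p["topic"] == topic)
            and (expertise is None or p["expertise"] == expertise)]
```

A contributor interested in astronomy with no prior experience would call `match(projects, topic="astronomy", expertise="novice")` and be pointed to Galaxy Zoo.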

p.14: The basic idea of modularization is that a large problem is divided into many smaller problems, plus a strategy (the architecture) that specifies how the modules will fit together. The goal is to design modules that have minimum interdependencies with one another, allowing for a greater division of labor and parallel work (Von Krogh et al., 2003; Baldwin and Clark, 2006). Modularization has already been used in many crowd science projects. For example, Galaxy Zoo and Old Weather keep the design of the overall project centralized and provide individual contributors with a small piece of the task so that they can work independently and at their own pace. Of course, not all problems can be easily modularized; we suspect, for example, that mathematical problems are less amenable to modularization. However, while the nature of the problem may set limits to modularization in the short term, advances in information technology and project management knowledge are likely to increase project leaders' ability to modularize a given problem over time (see Simon, 1973). -- Highlighted mar 6, 2014
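[Note: my own sketch, not from the paper. The Galaxy Zoo-style architecture described above - centralized design, with small independent work units handed to contributors who proceed at their own pace - can be illustrated as a task pool; the class names and the redundancy rule (each item classified by several volunteers) are assumptions.]

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class WorkUnit:
    """One independent module: e.g., a single image to classify."""
    item_id: int
    classifications: list = field(default_factory=list)

class ModularTaskPool:
    """Centralized architecture: splits a large task into independent
    units that volunteers can process in any order, in parallel."""
    def __init__(self, item_ids, votes_needed=3):
        self.units = {i: WorkUnit(i) for i in item_ids}
        self.votes_needed = votes_needed

    def next_unit(self) -> Optional[WorkUnit]:
        # Hand out any unit that still needs more classifications.
        for unit in self.units.values():
            if len(unit.classifications) < self.votes_needed:
                return unit
        return None  # all work is done

    def submit(self, item_id, label):
        self.units[item_id].classifications.append(label)

    def consensus(self, item_id):
        # Majority vote over the redundant classifications.
        labels = self.units[item_id].classifications
        return max(set(labels), key=labels.count) if labels else None
```

Because the units share no state, any number of contributors can work simultaneously without coordinating with one another - the property that minimal interdependency is meant to buy.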

p.15: While most existing crowd science projects are led by professional scientists, it is conceivable that leadership positions may also be taken by other kinds of individuals. For example, leadership roles might be taken on by designers of collaboration tools, who may have less of an interest in a particular domain per se, but who have - often through experience - built expertise in crowd science project management (e.g., some of the Zooniverse staff members). And of course, leaders may emerge from the larger crowd as a project develops. Indeed, OSS experience suggests that leadership should be thought of as a dynamic concept and can change depending on the particular leadership skills a project requires at a particular point in time. -- Highlighted mar 6, 2014

p.17: Second, while much of the current discussion focuses on how crowd science projects form and operate, very little is known regarding the quantity and quality of a research output. One particularly salient concern is that projects that are initiated by non-professional scientists may not follow the scientific method, calling into question the quality of research output. Some citizen science projects led by patients, for example, do not use the experimental designs typical of traditional studies in the medical sciences, making it difficult to interpret the results (Marcus, 2011). To ensure that crowd science meets the rigorous standards of science, it seems important that trained scientists are involved in the design of experiments. To some extent, however, rigor and standardized scientific processes may also be embedded in the software and platform tools that support a crowd science project. Similarly, it may be possible for crowd science platforms to provide "scientific consultants" who advise (and potentially certify) citizen science projects. Finally, to the extent that crowd science results are published in traditional journals, the traditional layer of quality control in the form of peer review still applies. However, results are increasingly disclosed through non-traditional channels such as blogs and project websites. The question whether and how such disclosures should be verified and certified is an important area for future scholarly work and policy discussions. -- Highlighted mar 6, 2014

p.17: A related question concerns the efficiency of the crowd science approach. While it is impressive that the Zooniverse platform has generated dozens of peer reviewed publications, this output does not reflect the work of a typical academic research lab. Rather, it reflects hundreds of thousands of hours of labor supplied by project leaders as well as citizen scientists (see a related discussion in Bikard and Murray, 2011). Empirical research is needed to measure crowd science labor inputs, possibly giving different weights to different types of skills (see Fig. 4). It is likely that most crowd science projects are less efficient than traditional projects in terms of output relative to input; however, that issue may be less of a concern given that most of the labor inputs are provided voluntarily and for free by contributors who appear to derive significant non-pecuniary benefits from doing so. Moreover, some large-scale projects would simply not be possible in a traditional lab. Nevertheless, understanding potential avenues to increase efficiency will be important for crowd science's long-term success. By way of example, the efficiency of distributed data coding projects such as Galaxy Zoo may be increased by tracking individuals' performance over time and limiting the replication of work done by contributors who have shown reliable performance in the past (see Simpson et al., 2012). -- Highlighted mar 6, 2014
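[Note: my own sketch, not from the paper or from Simpson et al. (2012). The efficiency mechanism suggested above - track each contributor's past reliability and reduce redundant replication for those with a proven record - might be implemented along these lines; the threshold, history length, and replication counts are arbitrary assumptions.]

```python
class ContributorTracker:
    """Tracks each volunteer's agreement with the eventual consensus;
    classifications from proven contributors need less replication."""
    def __init__(self, reliability_threshold=0.9, min_history=10):
        self.history = {}  # contributor -> list of bools (agreed with consensus?)
        self.threshold = reliability_threshold
        self.min_history = min_history

    def record(self, contributor, agreed_with_consensus: bool):
        self.history.setdefault(contributor, []).append(agreed_with_consensus)

    def reliability(self, contributor) -> float:
        h = self.history.get(contributor, [])
        return sum(h) / len(h) if h else 0.0

    def votes_required(self, contributor) -> int:
        """A reliable contributor's answer stands on its own;
        everyone else's work is replicated across three volunteers."""
        h = self.history.get(contributor, [])
        if len(h) >= self.min_history and self.reliability(contributor) >= self.threshold:
            return 1
        return 3
```

The saving comes from the `votes_required` rule: once a contributor has a long enough, accurate enough track record, the platform stops re-assigning their items to additional volunteers.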

p.18: While our discussion of future research has focused on crowd science as the object of study, crowd science may also serve as an ideal setting to study the range of issues central to our understanding of science and knowledge production more generally. For example, the team size in traditional science has been increasing in most fields (Wuchty et al., 2007), raising challenges associated with the effective division of labor and the coordination of project participants (see Cummings and Kiesler, 2007). As such, research on the effective organization of crowd science projects may also inform efforts to improve the efficiency of (traditional) team science. Similarly, crowd science projects may provide unique insights into the process of knowledge creation. For example, detailed discussion logs may allow scholars to study cognitive aspects of problem solving and the interactions among individuals in scientific teams (Singh and Fleming, 2010), or to compare the characteristics of both successful and unsuccessful problem solving attempts. Such micro-level insights are extremely difficult to gain in the context of traditional science, where disclosure is limited primarily to the publication of (successful) research results, and where the path to success remains largely hidden from the eyes of social scientists. -- Highlighted mar 6, 2014