Highlighted Selections from:

The Power of Data

Ohlhausen, Maureen K. "The Power of Data." Remarks at Privacy Principles in the Era of Massive Data, Georgetown University McCourt School of Public Policy and Georgetown Law Center. Washington, DC. 22 Apr. 2014: 1–13. Print. Written remarks of Commissioner Maureen K. Ohlhausen, Federal Trade Commission. https://twitter.com/MOhlhausenFTC/status/458625142110035969

p.1: As society has integrated and adopted increasingly powerful computers and pervasive communications networks, we have created massive amounts of data. This trend will continue as we move into the era of the Internet of Things, a universe of far-flung devices that will massively increase the amount of passive data collection. The tools that will enable us to collect and analyze this “big data” promise significant benefits for consumers, businesses, and government. -- Highlighted apr 26, 2014

p.2: Although some potential uses of big data raise concerns about privacy and other values, we can address these concerns together, through a coalition of academics, regulators, businesses, and consumers. Big data is a tool; like all tools it has strengths and weaknesses. Keeping those strengths and weaknesses in perspective is important as we work together to adapt our laws, guidelines, best practices, customs, and society to integrate this new technology. As we adapt to big data, the FTC will serve an important role in protecting consumers and promoting innovation. -- Highlighted apr 26, 2014

p.4: As Professor Sinan Aral of New York University has explained, “Revolutions in science have often been preceded by revolutions in measurement.” -- Highlighted apr 26, 2014

p.4: And many new uses are emerging, particularly because consumers are no longer simply data points to be researched. Today’s consumers are themselves producers and consumers of big data, whether posting billions of cat photos on Facebook, using Bing’s flight price predictors to make travel plans, or joining the self-quantification movement by wearing a FitBit Flex and using a Withings bathroom scale. -- Highlighted apr 26, 2014

p.5: David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge University, is one of the skeptics. He has said, “There are a lot of small data problems that occur in big data... They don’t disappear because you’ve got lots of the stuff. They get worse.” -- Highlighted apr 26, 2014

p.5: In particular, there are two technical concerns regarding big data that I’d like to discuss today. -- Highlighted apr 26, 2014

p.5: First, there are so-called “signal problems,” where the data set, huge as it may be, is not representative of the real world. Kate Crawford describes the City of Boston’s StreetBump mobile app as an example of this kind of problem. The StreetBump app monitors GPS and accelerometer data on users’ phones to passively detect potholes and report them to the city. However, the data is noticeably tilted toward finding potholes in areas where a higher percentage of the driving population owns a smartphone. Thus, because the underlying data did not accurately reflect the real world, neither did the result of the analysis. -- Highlighted apr 26, 2014
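The StreetBump "signal problem" above can be sketched in a few lines of simulation. This is a hypothetical illustration, not based on any actual StreetBump data: the neighborhood names, rates, and penetration figures are invented. Two areas have the same true pothole rate, but the app's reports skew toward the area with higher smartphone penetration.

```python
import random

random.seed(0)

# Hypothetical illustration of the StreetBump "signal problem".
# Both neighborhoods have the SAME true pothole rate; only smartphone
# (app) penetration differs. All numbers are invented for illustration.
TRUE_POTHOLE_RATE = 0.3                      # same everywhere
penetration = {"downtown": 0.8, "outskirts": 0.2}

reports = {"downtown": 0, "outskirts": 0}
for neighborhood, p_app in penetration.items():
    for _ in range(10_000):                  # 10k road segments each
        has_pothole = random.random() < TRUE_POTHOLE_RATE
        # A pothole is only reported if an app user happens to drive over it
        detected = has_pothole and random.random() < p_app
        if detected:
            reports[neighborhood] += 1

print(reports)  # downtown logs far more reports despite identical true rates
```

The analysis of the reports alone would conclude downtown's roads are several times worse, which is exactly the point: the collection mechanism, not the world, produced the pattern.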

p.5-6: Second, big data is susceptible to the “multiple comparisons problem.” Big data tools are particularly good at discovering correlations in complex data sets. However, as a recent New York Times op-ed pointed out, big data can’t tell us which correlations are important and which are spurious. If a scientist examines a single data set for 100 different correlations, probability says he will find five patterns that appear statistically significant but which are a result of random chance. This problem actually gets worse in larger data sets because there are more possible correlations to test. This may be an even more significant problem when the investigator is simply exploring a big data set without a particular question in mind. In such cases, it is easy to find “statistically significant” correlations that are actually the result of pure chance. -- Highlighted apr 26, 2014

p.6: Both of these problems are reminders that data, even big data, isn’t knowledge or wisdom. It can be misleading. Even worse, data-driven decisions can seem right while being wrong. -- Highlighted apr 26, 2014

p.6: These problems do not negate the significant potential benefits of big data techniques. But they do mean that big data analysis is not an all-powerful technique. It is a tool that has certain limitations, and like all tools, it can be used or misused. Both big data boosters and big data skeptics should pay attention to these limitations. By pulling some of the hype out of the debate, we can better ensure an appropriate and proportional response. -- Highlighted apr 26, 2014

p.8: The FTC’s data security enforcement framework is not perfect; I would like to develop more concrete guidance to industry, for example. But I haven’t seen anything that suggests that big data technology raises fundamentally new data security issues. -- Highlighted apr 26, 2014

p.8: Similarly, some groups also argue that certain types of particularly sensitive data, such as data about children, health, or finances, deserve heightened protection when stored in big data sets. Of course, the FTC already recognizes the need to more thoroughly protect such types of data, whether the data is in big data or small data environments. -- Highlighted apr 26, 2014

p.9: The FIPPs principle of data minimization is also in tension with the incentives of big data. Part of the promise of big data is to pull knowledge from data points whose value was previously unknown. Thus, retention of as much data as possible for lengthy amounts of time is a common practice. Strictly limiting the collection of data to the particular task currently at hand and disposing of it afterwards would handicap the data scientist’s ability to find new information to address future tasks. Certain de-identification techniques such as anonymization, although not perfect, can help mitigate some of the risks of comprehensive data retention while permitting innovative big data analysis to proceed. -- Highlighted apr 26, 2014
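One common de-identification technique of the kind the remarks allude to is pseudonymization: replacing direct identifiers with keyed hashes so records remain linkable for analysis without storing raw identities. The sketch below is a minimal, hypothetical example (field names and the key-handling are invented); as the quote itself cautions, such techniques are not perfect, since quasi-identifiers can still enable re-identification.

```python
import hashlib
import hmac

# Minimal pseudonymization sketch (field names invented). A keyed hash
# replaces the direct identifier so records can be linked across analyses
# without keeping the raw ID. Not perfect: quasi-identifiers such as ZIP
# code can still enable re-identification, so they are coarsened too.
SECRET_KEY = b"rotate-and-store-separately"  # placeholder; keep out of the data set

def pseudonymize(record: dict) -> dict:
    token = hmac.new(SECRET_KEY, record["email"].encode(), hashlib.sha256)
    return {
        "user_token": token.hexdigest()[:16],  # stable pseudonym for linkage
        "zip3": record["zip"][:3],             # coarsened quasi-identifier
        "purchases": record["purchases"],      # analytic payload retained
    }

rec = {"email": "alice@example.com", "zip": "20001", "purchases": 7}
print(pseudonymize(rec))
```

The keyed (HMAC) hash, rather than a bare hash, matters: without a secret key, anyone could re-compute hashes of known email addresses and reverse the pseudonyms.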

p.9: I believe FIPPs remains a solid framework and is flexible enough to accommodate a robust big data industry, but we have some work to do to resolve these tensions. I welcome your ideas on how we can do this. -- Highlighted apr 26, 2014

p.9: Finally, some advocates worry that companies will use big data techniques to prejudge or discriminate against individuals unfairly or erroneously without recourse. The concern is that a researcher could collect non-sensitive information about a consumer and then use big data analysis to infer certain sensitive characteristics about that consumer. -- Highlighted apr 26, 2014

p.9-10: This is a complicated issue that we need to know more about. First, companies have long engaged in this type of consumer targeting with more traditional tools. It is not clear how much additional value big data analysis will bring, because, as noted earlier, big data analysis is not a foolproof tool for all questions. Second, it is not yet clear how likely companies are to use such an approach. Third, if companies do engage in this sort of analysis, we need to determine how they might use such information. -- Highlighted apr 26, 2014

p.10: This third point, the type of use, matters, as our legal framework restricts certain uses of data regardless of how it was collected. Specifically, the Fair Credit Reporting Act establishes constraints for companies that make certain uses of data: creditworthiness, insurance eligibility, evaluation for employment, and renter background checks. Passed in 1970 in response to the creation of credit reporting bureaus, the FCRA could be considered the first “big data” bill. In fact, the FTC has applied the FCRA in a “big data” context. -- Highlighted apr 26, 2014

p.10: I believe the FCRA may provide a useful model for the types of big data uses that raise significant consumer concern. Any new exploration of FCRA-like use restrictions, however, should not undermine the continued application of many of the FIPPs principles, which have worked well for decades. But I hope we can explore whether specifically prohibiting certain clearly impermissible uses could help protect consumers while enabling continued innovation in big data. Any exploration of this FCRA-like approach should involve a detailed cost-benefit analysis, of course. -- Highlighted apr 26, 2014

p.10-11: None of this is to denigrate the establishment of principles to guide the collection of data. Such principles can and do serve as important best practices or industry standards. That is why I have repeatedly supported as best practices many (although not all) of the recommendations of the FTC’s 2012 report on “Protecting Consumer Privacy in an Era of Rapid Change.” The most relevant recommendations of that report for big data include:

Privacy by Design – Companies should build in consumer privacy protections at every stage in developing their products. These protections include reasonable security for consumer data and reasonable procedures to promote data accuracy. In the big data context, built-in de-identification measures could play an important role in protecting consumer privacy.

Simplified Choice for Businesses and Consumers – Recognizing that there is no single best way to offer notice and choice in all circumstances, companies should adopt notice and choice options that appropriately reflect the context of the transaction or the relationship the company has with the consumer. In the big data context, this may be challenging, but I believe it is a principle worth continuing to pursue.

Greater Transparency – Companies should disclose details about their collection and use of consumers’ information and provide consumers access to the data collected about them. -- Highlighted apr 26, 2014

p.11: The FTC can help ensure that the promise of big data is realized by using our unique set of enforcement and policy tools. First, the FTC is an enforcement agency and it can and should use its traditional deception and unfairness authority to stop consumer harms that may arise from the misuse of big data. Strong enforcement will help not only consumers but also other companies using big data analysis by policing actors that may tarnish the technology itself. -- Highlighted apr 26, 2014

p.11-12: Second, we can use our convening power and our policy and R&D functions to better understand big data technology; the new business models it may enable; the applicability of existing regulatory structures, including self-regulation; market dynamics; and the nature and extent of likely consumer and competitive benefits and risks. -- Highlighted apr 26, 2014

p.12-13: As the FTC uses these various institutional tools to engage with big data issues, two principles should guide our work. First, as with all dynamic markets, we must approach big data technologies with what I call regulatory humility. Our most successful technological advances, such as the Internet itself, have generated massive amounts of consumer welfare and have thrived largely because market participants have enjoyed wide latitude to experiment with new technology-driven business models, allowing the market to determine which of those models succeeds or fails. This is the right approach. Second, we must identify substantial consumer harm before taking action. Thus, the FTC should remain vigilant for deceptive and unfair uses of big data, but should avoid preemptive action that could preclude entire future industries. Ultimately, our work as an agency should help strengthen competition and the market to better provide beneficial outcomes in response to consumer demand, rather than to try to dictate desired outcomes to the market. -- Highlighted apr 26, 2014