Surden, Harry, Machine Learning and Law (March 26, 2014). Washington Law Review, Vol. 89, No. 1, 2014. Available at SSRN: http://ssrn.com/abstract=2417415
p.1: What impact might artificial intelligence (AI) have upon the practice of law? According to one view, AI should have little bearing upon legal practice barring significant technical advances. The reason is that legal practice is thought to require advanced cognitive abilities, but such higher-order cognition remains outside the capability of current AI technology. Attorneys, for example, routinely combine abstract reasoning and problem solving skills in environments of legal and factual uncertainty. Modern AI algorithms, by contrast, have been unable to replicate most human intellectual abilities, falling far short in advanced cognitive processes—such as analogical reasoning—that are basic to legal practice. Given these and other limitations in current AI technology, one might conclude that until computers can replicate the higher-order cognition routinely displayed by trained attorneys, AI would have little impact in a domain as full of abstraction and uncertainty as law. -- Highlighted apr 17, 2014
p.2: Consider that outside of law, non-cognitive AI techniques have been successfully applied to tasks that were once thought to necessitate human intelligence—for example language translation. While the results of these automated efforts are sometimes imperfect, the interesting point is that such computer generated results have often proven useful for particular tasks where strong approximations are acceptable. In a similar vein, this Article will suggest that there may be a limited, but not insignificant, subset of legal tasks that are capable of being partially automated using current AI techniques despite their limitations relative to human cognition. In particular, this Article focuses upon a class of AI methods known as “machine learning” techniques and their potential impact upon legal practice. -- Highlighted apr 17, 2014
p.3: Part I of this Article explains the basic concepts underlying machine learning. Part II will convey a more general principle: non-intelligent computer algorithms can sometimes produce intelligent results in complex tasks through the use of suitable proxies detected in data. Part III will explore how certain legal tasks might be amenable to partial automation under this principle by employing machine learning techniques. This Part will also emphasize the significant limitations of these automated methods as compared to the capabilities of similarly situated attorneys. -- Highlighted apr 17, 2014
p.3: As will be discussed, the idea that the computers are “learning” is largely a metaphor and does not imply that computer systems are artificially replicating the advanced cognitive systems thought to be involved in human learning. Rather, we can consider these algorithms to be learning in a functional sense: they are capable of changing their behavior to enhance their performance on some task through experience. -- Highlighted apr 17, 2014
p.7: This capability to improve in performance over time by continually analyzing data to detect additional useful patterns is the key attribute that characterizes machine learning algorithms. Upon the basis of such an incrementally produced model, a well-performing machine learning algorithm may be able to automatically perform a task—such as classifying incoming emails as either spam or wanted emails—with a high degree of accuracy that approximates the classifications that a similarly situated human reviewer would have made. -- Highlighted apr 17, 2014
p.8: The problem with a manual, bottom-up approach to modeling complex and changing phenomena (such as spam) is that it is very difficult to specify a rule set ex-ante that would be robust and accurate enough to direct a computer to make useful, automated decisions. -- Highlighted apr 17, 2014
p.8: Machine learning algorithms, by contrast, are able to incrementally build complex models by automatically detecting patterns as data arrives. Such algorithms are powerful because, in a sense, these algorithms program themselves over time with the rules to accomplish a task, rather than being programmed manually with a series of predetermined rules. The rules are inferred from analyzed data and the model builds itself as additional data is analyzed. -- Highlighted apr 17, 2014
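Note (not from the article): a minimal sketch of the "rules are inferred from analyzed data" idea, assuming a word-count naive Bayes model as the learning method. The article does not name a specific algorithm, so the class `TinySpamFilter`, its methods, and the four example emails are hypothetical illustrations only.

```python
import math
from collections import Counter


class TinySpamFilter:
    """Word-count naive Bayes classifier, updated one labelled email at a time."""

    LABELS = ("spam", "wanted")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.email_counts = Counter()

    def learn(self, text, label):
        # Each labelled example refines the model; no rule is written by hand.
        self.email_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["wanted"])
        total_emails = sum(self.email_counts.values())
        scores = {}
        for label in self.LABELS:
            # Log prior and log likelihoods, with add-one smoothing.
            score = math.log((self.email_counts[label] + 1) / (total_emails + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training examples, labelled by a human reviewer.
spam_filter = TinySpamFilter()
spam_filter.learn("earn extra cash now", "spam")
spam_filter.learn("cheap pills earn cash", "spam")
spam_filter.learn("meeting agenda for monday", "wanted")
spam_filter.learn("draft brief attached for review", "wanted")

print(spam_filter.classify("earn cash today"))
print(spam_filter.classify("agenda for review"))
```

Calling `learn` again with further labelled emails changes later classifications, which is the functional sense of "learning" used in the highlights above.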
p.8: Such an incremental, adaptive, and iterative process often allows for the creation of nuanced models of complex phenomena that may otherwise be too difficult for programmers to specify manually, up front. -- Highlighted apr 17, 2014
p.9: Machine learning algorithms are often (although not exclusively) statistical in nature. Thus, in one sense, machine learning is not very different from the numerous statistical techniques already widely used within empirical studies in law. One salient distinction is that while many existing statistical approaches involve fixed or slow-to-change statistical models, the focus in machine learning is upon computer algorithms that are expressly designed to be dynamic and capable of changing and adapting to new and different circumstances as the data environment shifts. -- Highlighted apr 17, 2014
p.9: There are certain tasks that appear to require intelligence because when humans perform them, they implicate higher-order cognitive skills such as reasoning, comprehension, meta-cognition, or contextual perception of abstract concepts. However, research has shown that certain of these tasks can be automated—to some degree—through the use of non-cognitive computational techniques that employ heuristics or proxies (e.g., statistical correlations) to produce useful, “intelligent” results. By a proxy or heuristic, I refer to something that is an effective stand-in for some underlying concept, feature, or phenomenon. -- Highlighted apr 17, 2014
p.11: More generally, the example is illustrative of a broader strategy that has proven to be successful in automating a number of complex tasks: detecting proxies, patterns, or heuristics that reliably produce useful outcomes in complex tasks that, in humans, normally require intelligence. For a certain subset of tasks, it may be possible to detect proxies or heuristics that closely track the underlying phenomenon without actually engaging in the full range of abstraction underlying that phenomenon, as in the way the machine learning algorithm was able to identify spam emails without having to fully understand the substance and context of the email text. As will be discussed in Part III, this is the principle that may allow the automation of certain abstract tasks within law that, when conducted by attorneys, require higher-order cognition. It is important to emphasize that such a proxy-based approach can have significant limitations. First, this strategy may only be appropriate for certain tasks for which approximations are suitable. By contrast, many complicated problems—particularly those that routinely confront attorneys—may not be amenable to such a heuristic-based technique. -- Highlighted apr 17, 2014
p.12: Second, a proxy-based strategy can often have significant accuracy limitations. Because proxies are stand-ins for some other underlying phenomenon, they necessarily are under- and over-inclusive relative to the phenomenon they are representing, and inevitably produce false positives and negatives. By employing proxies to analyze or classify text with substantive meaning for an abstract task, for example, such algorithms may produce more false positives or negatives than a similarly situated person employing cognitive processes, domain knowledge, and expertise. -- Highlighted apr 17, 2014
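Note (not from the article): a small sketch of why a proxy is both over- and under-inclusive. It assumes a deliberately crude proxy, "the email contains the word free", as a stand-in for "the email is spam". The function name and the four labelled emails are hypothetical.

```python
def proxy_error_counts(predicted, actual):
    """Count how often a proxy's verdict agrees or disagrees with the true label."""
    pairs = list(zip(predicted, actual))
    return {
        "true_positives": sum(p and a for p, a in pairs),
        "false_positives": sum(p and not a for p, a in pairs),  # over-inclusive
        "false_negatives": sum(a and not p for p, a in pairs),  # under-inclusive
        "true_negatives": sum(not p and not a for p, a in pairs),
    }


# Hypothetical emails with a human reviewer's judgment (True means spam).
emails = [
    ("free pills today", True),
    ("free lunch for the team on friday", False),
    ("claim your prize", True),
    ("hearing moved to tuesday", False),
]

predicted = ["free" in text.split() for text, _ in emails]
actual = [is_spam for _, is_spam in emails]
counts = proxy_error_counts(predicted, actual)
print(counts)
```

On this toy data the proxy flags one wanted email (a false positive) and misses one spam email (a false negative), the two kinds of error the highlight describes.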
p.12: The strategy just described parallels changes among computer science artificial intelligence research over the last several decades. In the earliest era of AI research—from the 1950s through the 1980s—many researchers focused upon attempting to replicate computer-based versions of human cognitive processes. Behind this focus was a belief that because humans employ many of the advanced brain processes to tackle complex and abstract problems, the way to have computers display artificial intelligence was to create artificial versions of brain functionality.
However, more recently, researchers have achieved success in automating complex tasks by focusing not upon the intelligence of the automated processes themselves, but upon the results that automated processes produce. -- Highlighted apr 17, 2014
p.13: These systems have used machine learning and other techniques to develop combinations of statistical models, heuristics, and sensors that would not be considered cognitive in nature (in that they do not replicate human-level cognition) but that produce results that are useful and accurate enough for the task required. As described, these proxy-based approaches sometimes lack accuracy or have other limitations as compared to humans for certain complex or abstract tasks. But the key insight is that for many tasks, algorithmic approaches like machine learning may sometimes produce useful, automated approaches that are “good enough” for particular tasks. -- Highlighted apr 17, 2014
p.13: More recent research projects have taken a different approach, using statistical machine learning and access to large amounts of data to produce surprisingly good translation results without attempting to replicate human-linguistic processes. “Google Translate,” for example, works in part by leveraging huge corpuses of documents that experts previously translated from one language to another. The United Nations (UN) has, for instance, over the years employed professional translators to carefully translate millions of UN documents into multiple languages, and this body of translated documents has become available in electronic form. While these documents were originally created for other purposes, researchers have been able to harness this existing corpus of data to improve automated translation. Using statistical correlations and a huge body of carefully translated data, automated algorithms are able to create sophisticated statistical models about the likely meaning of phrases, and are able to produce automated translations that are quite good. -- Highlighted apr 17, 2014
p.15: Because machine learning has been successfully employed in a number of complex areas previously thought to be exclusively in the domain of human intelligence, this question is posed: to what extent might these techniques be applied within the practice of law? -- Highlighted apr 17, 2014
p.15: I emphasize that these tasks may be partially automatable, because often the goal of such automation is not to replace an attorney, but rather, to act as a complement, for example in filtering likely irrelevant data to help make an attorney more efficient. -- Highlighted apr 17, 2014
p.15: Rather, in many cases, the algorithms may be able to reliably filter out large swathes of documents that are likely to be irrelevant so that the attorney does not have to waste limited cognitive resources analyzing them. Additionally, these algorithms can highlight certain potentially relevant documents for increased attorney attention. In this sense, the algorithm does not replace the attorney but rather automates certain typical “easy-cases” so that the attorney’s cognitive efforts and time can be conserved for those tasks likely to actually require higher-order legal skills. -- Highlighted apr 17, 2014
p.16: By generalizing about the type of tasks that machine learning algorithms perform particularly well, we can extrapolate about where such algorithms may be able to impact legal practice. -- Highlighted apr 17, 2014
p.16: The ability to make informed and useful predictions about potential legal outcomes and liability is one of the primary skills of lawyering. Lawyers are routinely called upon to make predictions in a variety of legal settings. In a typical scenario, a client may provide the lawyer with a legal problem involving a complex set of facts and goals. A lawyer might employ a combination of judgment, experience, and knowledge of the law to make reasoned predictions about the likelihood of outcomes on particular legal issues or on the overall issue of liability, often in contexts of considerable legal and factual uncertainty. On the basis of these predictions and other factors, the lawyer might counsel the client about recommended courses of action. -- Highlighted apr 17, 2014
p.16: However, as Daniel Katz has written, such prediction of likely legal outcomes may be increasingly subject to automated, computer-based analysis. -- Highlighted apr 17, 2014
p.17: As Katz notes, there is existing data that can be harnessed to better predict outcomes in legal contexts. Katz suggests that the combination of human intelligence and computer-based analytics will likely prove superior to that of human analysis alone, for a variety of legal prediction tasks. -- Highlighted apr 17, 2014
p.17: One relevant technique to apply to such a process is the “supervised learning” method discussed previously. As mentioned, supervised learning involves inferring associations from data that has been previously categorized by humans. Where might such a data set come from? Law firms often encounter cases of the same general type and might create such an analyzable data set concerning past cases from which associations could potentially be inferred. On the basis of information from past clients and combining other relevant information such as published case decisions, firms could use machine learning algorithms to build predictive models of topics such as the likelihood of overall liability. If such automated predictive models outperform standard lawyer predictions by even a few percentage points, they could be a valuable addition to the standard legal counseling approach. Thus, by analyzing multiple examples of past client data, a machine learning algorithm might be able to identify associations between different types of case information and the likelihood of particular outcomes. -- Highlighted apr 17, 2014
p.18: For example, (to oversimplify) we could envision an algorithm learning that in workplace discrimination cases in which there is a racial epithet expressed in writing in an email, there is an early defendant settlement probability of 98 percent versus a 60 percent baseline. An attorney, upon encountering these same facts, might have a similar professional intuition that early settlement is likely given these powerful facts. However, to see the information supported by data may prove a helpful guide in providing professional advice.
More usefully, such an algorithm may identify a complex mix of factors in the data associated with particular outcomes that may be hard or impossible for an attorney to detect using typical legal analysis methods. For instance, imagine that the algorithm reveals that in cases in which there are multiple hostile emails sent to an employee, if the emails are sent within a three-week time period, such cases tend to be 15 percent more likely to result in liability as compared to cases in which similar hostile emails are spread out over a longer one-year period. Such a nuance in timeframe may be hard for an attorney to casually detect across cases, but can be easily revealed through data pattern analysis. As such an algorithm received more and more exemplars from the training set, it could potentially refine its internal model, finding more such useful patterns that could improve the attorney’s ability to make reasoned predictions. -- Highlighted apr 17, 2014
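Note (not from the article): a minimal sketch of the simplest version of this idea, comparing a baseline early-settlement rate with the rate among past cases that share one feature. The ten case records, the field names `written_epithet` and `settled_early`, and the resulting rates are made up for illustration and are not the article's figures.

```python
def early_settlement_rate(cases, feature=None):
    """Share of cases that settled early, optionally restricted to cases
    in which the named boolean feature is true."""
    selected = [c for c in cases if feature is None or c[feature]]
    if not selected:
        return None  # no comparable past cases, so no estimate
    return sum(c["settled_early"] for c in selected) / len(selected)


# Hypothetical past-case records from a single firm.
past_cases = (
    [{"written_epithet": True, "settled_early": True}] * 4
    + [{"written_epithet": False, "settled_early": True}] * 3
    + [{"written_epithet": False, "settled_early": False}] * 3
)

baseline = early_settlement_rate(past_cases)
with_epithet = early_settlement_rate(past_cases, "written_epithet")
print(f"baseline: {baseline:.0%}, with written epithet: {with_epithet:.0%}")
```

With only four matching cases the conditional rate is a weak estimate. A real predictive model would need far more examples and would weigh many features jointly rather than one at a time.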
p.19: Attorneys combine their judgment, training, reasoning, analysis, intuition, and cognition under the facts to make approximate legal predictions for their clients. To some extent, machine learning algorithms could perform a similar but complementary role, only more formally based upon analyzed data. -- Highlighted apr 17, 2014
p.19: Such a learned model is thus only useful to the extent that the heuristics inferred from past cases can be extrapolated to predict novel cases.
There are some well-known problems with this type of generalization. First, a model will only be useful to the extent that the class of future cases have pertinent features in common with the prior analyzed cases in the training set. In the event that future cases present unique or unusual facts compared to the past, such future distinct cases may be less predictable. In such a context, machine learning techniques may not be well suited to the job of prediction. For example, not every law firm will have a stream of cases that are sufficiently similar to one another such that past case data that has been catalogued contain elements that will be useful to predicting future outcomes. The degree of relatedness between future and past cases within a data-set is one important dimension to consider regarding the extent that machine learning predictive models will be helpful. Additionally, machine learning algorithms often require a relatively large sample of past examples before robust generalizations can be inferred. To the extent that the number of examples (e.g., past case data) are too few, such an algorithm may not be able to detect patterns that are reliable predictors.
Another common problem involves overgeneralization. This is essentially the same problem known elsewhere in statistics as overfitting. The general idea is that it is undesirable for a machine learning algorithm to detect patterns in the training data that are so finely tuned to the idiosyncrasies or biases in the training set such that they are not predictive of future, novel scenarios. -- Highlighted apr 17, 2014
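Note (not from the article): a contrived sketch of overfitting, assuming two hypothetical models of liability. One keys on an idiosyncratic detail (the weekday a case was filed) that happens to separate the training cases perfectly. The other learns a single threshold on the number of hostile emails. All case data and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Each hypothetical case: (hostile_email_count, filing_weekday, found_liable)
train = [
    (0, "mon", False), (1, "mon", False), (4, "tue", True), (5, "tue", True),
    (3, "wed", True), (1, "thu", False), (1, "wed", True),
]
held_out = [
    (0, "tue", False), (4, "mon", True), (5, "thu", True),
    (1, "wed", False), (3, "fri", True),
]


def accuracy(predict, cases):
    return sum(predict(c) == c[2] for c in cases) / len(cases)


def fit_weekday_lookup(cases):
    """Learn the majority outcome per weekday, a pattern tuned to quirks of the sample."""
    votes = defaultdict(Counter)
    for _, weekday, liable in cases:
        votes[weekday][liable] += 1
    fallback = Counter(liable for _, _, liable in cases).most_common(1)[0][0]
    table = {day: tally.most_common(1)[0][0] for day, tally in votes.items()}
    return lambda case: table.get(case[1], fallback)


def fit_email_threshold(cases):
    """Learn one threshold t and predict liability when hostile emails >= t."""
    candidates = range(0, max(c[0] for c in cases) + 2)
    best = max(candidates, key=lambda t: accuracy(lambda c: c[0] >= t, cases))
    return lambda case: case[0] >= best


weekday_model = fit_weekday_lookup(train)
threshold_model = fit_email_threshold(train)

for name, model in [("weekday lookup", weekday_model), ("email threshold", threshold_model)]:
    print(name, "train:", round(accuracy(model, train), 2),
          "held out:", round(accuracy(model, held_out), 2))
```

The weekday model scores higher on the training cases but much lower on the held-out cases. That gap between past and novel performance is the symptom of a model too finely tuned to its training set.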
p.20: Similarly, in the legal prediction context, the past case data upon which a machine learning algorithm is trained may be systematically biased in a way that leads to inaccurate results in future legal cases. The concern, in other words, would be relying upon an algorithm that is too attuned to the idiosyncrasies of the past case data that is being used to train a legal prediction algorithm. The algorithm may be able to detect patterns and infer rules from this training set data (e.g., examining an individual law firm’s past cases), but the rules inferred may not be useful for predictive purposes, if the data from which the patterns were detected were biased in some way and not actually reflective enough of the diversity of future cases likely to appear in the real world.
A final issue worth mentioning involves capturing information in data. In general, machine learning algorithms are only as good as the data that they are given to analyze. These algorithms build internal statistical models based upon the data provided. However, in many instances in legal prediction there may be subtle factors that are highly relevant to legal prediction and that attorneys routinely employ in their professional assessments, but which may be difficult to capture in formal, analyzable data. -- Highlighted apr 17, 2014
p.21: Similarly, there are certain legal issues whose outcomes may turn on analyzing abstractions—such as understanding the overall public policy of a law and how it applies to a set of facts—for which there may not be any suitable data proxy. Thus, in general, if there are certain types of salient information that are both difficult to quantify in data, and whose assessment requires nuanced analysis, such important considerations may be beyond the reach of current machine learning predictive techniques. -- Highlighted apr 17, 2014
p.21: Machine learning techniques are also useful for discovering hidden relationships in existing data that may otherwise be difficult to detect. Using the earlier example, attorneys could potentially use machine learning to highlight useful unknown information that exists within their current data but which is obscured due to complexity. For example, consider a law firm that tracks client and outcome data in tort cases over the span of several years. A machine learning algorithm might detect subtle but important correlations that might go unnoticed through typical attorney analysis of case information. -- Highlighted apr 17, 2014
p.22: Machine learning as a technique—since it excels at ferreting out correlations—may help to supplement the attorney intuitions and highlight salient factors that might otherwise escape notice. The discovery of such embedded information, combined with traditional attorney analysis, could potentially impact and improve the actual advice given to clients. -- Highlighted apr 17, 2014
p.22: There are some other potentially profound applications of machine learning models that can reveal non-obvious relationships, particularly in the analysis of legal opinions. A basis of the United States common law system is that judges are generally required to explain their decisions. -- Highlighted apr 17, 2014
p.23: Since machine learning algorithms can be very good at detecting hard to observe relationships between data, it may be possible to detect obscured associations between certain variables in legal cases and particular legal outcomes. It would be a profound result if machine learning brought forth evidence suggesting that judges were commonly basing their decisions upon considerations other than their stated rationales. Dynamically analyzed data could call into question whether certain legal outcomes were driven by factors different from those that were expressed in the language of an opinion.
An earlier research project illustrated a related point. Theodore Ruger, Andrew Martin, and collaborators built a statistical model of Supreme Court outcomes based upon various factors including the political orientation of the lower opinion (i.e. liberal or conservative) and the circuit of origin of the appeal. Not only did the statistical model outperform several experts in terms of predicting Supreme Court outcomes, it also highlighted relationships in the underlying data that may not have been fully understood previously. -- Highlighted apr 17, 2014
p.23: That project illustrates a basic point: that statistically analyzing decisions might bring to light correlations that could undermine basic assumptions within the legal system. If, for example, data analysis highlights that the opinions are highly correlated with a factor unrelated to the reasons articulated in the written opinions, it might lessen the legitimacy of stated opinions. -- Highlighted apr 17, 2014
p.25: Since about 2002, documents associated with lawsuits have been typically contained in online, electronically accessible websites such as the Federal “PACER” court records system. Such core documents associated with a lawsuit might include the complaint, multiple party motions and briefs, and the orders and judgments issued by the court. In a complicated court case, there may be several hundred documents associated with the case. However, obscured within such collections of hundreds of litigation docket documents, there may be a few especially important documents—such as the active, amended complaint—that might be crucial to access, but difficult to locate manually. Electronic court dockets can become very lengthy, up to several hundred entries long. A particular important document—such as the active, amended complaint—may be located, for example, at entry 146 out of 300. Finding such an important document within a larger collection of less important docket entries often can be difficult. -- Highlighted apr 17, 2014
p.25: Such an algorithm could be trained to automate classifications of the documents based upon features such as the document text and other meta information such as the descriptive comments from the clerk of the court. Thus, key electronic court documents could be automatically identified as “complaints,” “motions,” or “orders,” by machine learning algorithms, and parties could more easily locate important docket documents. -- Highlighted apr 17, 2014
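Note (not from the article): a minimal sketch of classifying docket entries from the clerk's descriptive comment. It assumes a nearest-example approach with word-overlap (Jaccard) similarity, which the article does not specify. The labelled descriptions and function names are hypothetical.

```python
def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Shared words divided by all words appearing in either description."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_entry(description, labelled_entries):
    """Return the label of the most similar previously labelled description."""
    words = tokens(description)
    best_label, _ = max(
        ((label, jaccard(words, tokens(text))) for text, label in labelled_entries),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical clerk descriptions that a person has already labelled.
labelled = [
    ("amended complaint filed by plaintiff", "complaint"),
    ("complaint with jury demand", "complaint"),
    ("motion to dismiss filed by defendant", "motion"),
    ("motion for summary judgment", "motion"),
    ("order granting motion to dismiss", "order"),
    ("order denying summary judgment", "order"),
]

print(classify_entry("second amended complaint against all defendants", labelled))
print(classify_entry("order granting extension of time", labelled))
```

Word overlap is itself only a proxy. An entry such as "order on motion for summary judgment" shares more words with a labelled motion than with any labelled order, so this sketch would misfile it.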
p.28: The way that one determines that an applied-for invention is not new is by finding “prior art” documents, which are documents that describe the invention but predate the patent application. Such prior art typically consists of earlier published scientific journal articles, patents, or patent applications that indicate the invention had been created previously. Given the huge volumes of published patents and scientific journals, it is a difficult task to find those particular prior art documents in the wider world that would prove that an invention was invented earlier. The task of finding such a document is essentially a problem involving automatically determining a relationship between the patent application and the earlier prior art document. Machine learning document clustering may potentially be used to help make the search for related prior art documents more automated and efficient by grouping documents that are related to the patent application at hand. More generally, automated document clustering might be useful in other areas of the law in which finding relevant documents among large collections is crucial. -- Highlighted apr 17, 2014
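Note (not from the article): a minimal sketch of document clustering, assuming bag-of-words vectors, cosine similarity, and a simple rule that links any two documents whose similarity meets a threshold. The article does not describe a particular clustering method. The document texts, the identifiers, and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.5):
    """Group document ids so that each group is connected by similar-enough pairs."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    clusters = []
    for doc_id in docs:
        # Merge every existing cluster that holds a document similar to this one.
        linked = [
            c for c in clusters
            if any(cosine(vectors[doc_id], vectors[other]) >= threshold for other in c)
        ]
        merged = {doc_id}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


# Hypothetical one-line summaries of a patent application and candidate documents.
docs = {
    "application": "foldable bicycle frame with hinge lock",
    "patent_a": "bicycle frame with folding hinge",
    "article_b": "hinge lock for foldable frame",
    "patent_c": "solar panel mounting bracket",
    "article_d": "bracket for mounting solar panel arrays",
}

clusters = cluster(docs)
related = next(c for c in clusters if "application" in c)
print(sorted(related - {"application"}))
```

The group containing the application gives a shortlist of candidate prior art for an examiner or attorney to review. Whether a document actually anticipates the invention remains a legal judgment.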