Linguistic Computers

  • February 01, 2015
  • David J. Bilinsky

According to Tom Mitchell, chairman of the machine learning department at Carnegie Mellon University in Pittsburgh, as reported by The New York Times (NYT, nytimes.com/2011/03/05/science/05legal.html):

“We’re at the beginning of a 10-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.”

Nowhere are these advances clearer than in the legal world.

These language advances are currently being employed in e-discovery software. E-discovery technologies generally fall into two broad categories that can be described as “linguistic” and “sociological.”

The linguistic approach uses specific search words to find and sort relevant documents. As the technology becomes more advanced, documents are filtered through a large web of word and phrase definitions. For example, a lawyer who types “dog” will also find documents that mention “man’s best friend” and even the notion of a “walk.”
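To make the “web of word and phrase definitions” concrete, here is a minimal sketch in Python of how a search term might be expanded before documents are matched. The concept map, the function names and the sample documents are all hypothetical; commercial e-discovery tools rely on far larger vocabularies and more careful matching than the simple substring test used here.

```python
# Hypothetical concept map: each search term points to related words and phrases.
# The entries are illustrative assumptions, not taken from any real product.
CONCEPTS = {
    "dog": ["dog", "man's best friend", "walk"],
}


def expand(term):
    """Return the related phrases for a term, or just the term if none are known."""
    key = term.lower()
    return CONCEPTS.get(key, [key])


def search(term, documents):
    """Return the documents that mention the term or any related phrase.

    Matching is a plain case-insensitive substring test, so "dog" would
    also match "dogma"; a real system would match on whole words.
    """
    phrases = expand(term)
    return [doc for doc in documents
            if any(phrase in doc.lower() for phrase in phrases)]


documents = [
    "Took the dog to the vet on Friday.",
    "He is man's best friend, after all.",
    "Out for a walk until 3 pm.",
    "Quarterly invoices attached.",
]

hits = search("dog", documents)
for doc in hits:
    print(doc)
```

A search for “dog” returns the first three sample documents, even though only one of them contains the word itself, while the invoice message is left out.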

The sociological approach adds a new layer: inferential analysis – mimicking the deductive powers of a human Sherlock Holmes.

Of course, the moneyball question is: Do these software developments work better than lawyers at discovery?

The NYT cites the example of Bill Herr, a former lawyer for a chemical company, who used e-discovery software to go back and reanalyze discovery work his company’s lawyers did in the 1980s and ’90s: “His human colleagues had been only 60% accurate, he found. Think about how much money had been spent to be slightly better than a coin toss….”

Which e-discovery technologies are being used now in the legal field? Here is a selection:

Chenope

(chenope.com)

This company acquired Cataphora, whose software, according to Bloomberg Businessweek, “… can track employee behavior that might indicate some form of wrongdoing.

In court cases involving corporations, lawyers for plaintiffs often struggle to determine which employees knew that fraud or some other illegal activity was happening. Reconstructing the context surrounding the event can be painstaking as investigators wade through thousands or even millions of email messages. The task has become even more challenging in recent years as new forms of communication – instant messaging, text messages, or social media postings – have become more pervasive.

Cataphora’s software overcomes this challenge by correlating and analyzing different types of communications to try to create context. Using software to track the larger context around employee relationships can be used to incriminate – or acquit – defendants.”

Symantec e-discovery platform powered by Clearwell

(symantec.com/ediscovery-platform)

This software uses what Symantec calls Transparent Predictive Coding: “Open up the black box of technology-assisted review with Transparent Predictive Coding. This feature leverages machine learning technology to improve the efficiency and effectiveness of traditional linear review with increased accuracy, workflow defensibility and tagging transparency.” Symantec states that using this software can reduce the time lawyers spend on document review, cut costs by up to 98% and reduce the information for review by 90%.
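Symantec does not publish the details of its method, so the following Python sketch is only an illustration of the general idea behind predictive coding: a reviewer tags a few documents, the software learns word weights from those tags, and untagged documents are then ranked by score. The naive Bayes style log-odds weighting, the function names and the sample documents are all assumptions chosen for illustration, not Symantec’s algorithm.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(tagged_docs):
    """Learn a weight per word from (text, is_relevant) pairs tagged by a reviewer.

    Each weight is the log-odds that a word appears in relevant rather than
    non-relevant documents, with add-one smoothing. Class priors are ignored.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in tagged_docs:
        counts[is_relevant].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    weights = {}
    for word in vocab:
        p_rel = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_not = (counts[False][word] + 1) / (totals[False] + len(vocab))
        weights[word] = math.log(p_rel / p_not)
    return weights


def score(text, weights):
    """Return the total score and the per-word contributions behind it."""
    contributions = [(w, weights[w]) for w in tokenize(text) if w in weights]
    return sum(value for _, value in contributions), contributions


# Hypothetical reviewer-tagged examples (True means relevant to the matter).
tagged = [
    ("merger pricing discussed with competitor", True),
    ("pricing agreement with competitor confirmed", True),
    ("lunch menu for friday", False),
    ("friday parking notice", False),
]
weights = train(tagged)

unreviewed = ["friday lunch order", "competitor pricing call"]
ranked = sorted(unreviewed, key=lambda d: score(d, weights)[0], reverse=True)
for doc in ranked:
    total, why = score(doc, weights)
    print(f"{total:+.2f}  {doc}  {[w for w, _ in why]}")
```

Returning the per-word contributions alongside the score is one simple way to keep the ranking from being a “black box”: a reviewer can see which words pushed a document up or down the list.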

Symantec cites Transatlantic Reinsurance’s use of Clearwell: “The company’s IT and legal teams reduced the time needed to analyze tens of thousands of email messages from days to mere minutes.”

© 2015 David J. Bilinsky