Westlaw rises to legal publishing fame by selling free information

St. Paul company outprofits Gannett, McGraw Hill, New York Times

By Erin Carlyle

published: April 29, 2009

The road that leads to Opperman Drive is quiet, in a modest Minnesota manner. A few concrete-walled industrial structures line the highway. Patches of prairie grass fill in the open spaces. The scenery is so dull that a driver, lulled by its calm, might not notice a massive building lying just west of Highway 149—home to the most powerful company in the history of legal publishing: West.

Inside the monstrosity (2.8 million square feet of brick and glass and concrete data centers), 7,500 employees labor over the law. They are attorneys, computer scientists, and MBAs.

West makes its money by selling free, public information (specifically, court documents) to lawyers. On this simple model, the company raked in $3.5 billion in revenue last year, placing it on a par, sales-wise, with retail giant Abercrombie and Fitch. But its operating profit margin really impresses: At a whopping 32.1 percent, it outpaces those of tech giants like Google (19.4 percent), Amazon (3.4 percent), and eBay (20.8 percent). Westlaw excels at one simple task: saving lawyers time by making legal information more readily accessible. The company charges a firm of six to ten lawyers as much as $30,000 a year to access its state and federal databases. But since attorneys' time is worth a lot of money, the service pays for itself. After all, the more work lawyers can do, the more money they can make.

It all started back in 1872, when John West, a book peddler for a St. Paul bookstore, noticed that the judges and lawyers he called on were frustrated by the long wait times for legal documents. State court clerks often waited as much as a year before issuing court decisions in a bound volume—and by then new precedents would have been set. What's more, local attorneys had trouble getting legal books from East Coast publishers. This made it extremely difficult for lawyers to do the research they needed to build the arguments for their cases.

West had a solution. Together with his brother Horatio, John West founded West Publishing in downtown St. Paul. The brothers began issuing a serial publication they called a "reporter," which included all the latest court decisions. They started with Minnesota, then added Wisconsin, and eventually expanded into a network of regional publications that included all the states. Within a few years, West Publishing's National Reporter System was the standard for up-to-date court information.

The company rolled merrily along until the mid-1970s, when advances in technology dramatically changed its business model. A nonprofit startup associated with the Ohio State Bar Association commissioned West Publishing's first significant competitor: Lexis, which offered a computerized law library. In 1975, two years after Lexis's debut, West Publishing presented its own computerized system, Westlaw.

Competition between the two companies was fierce. In 1995, West's management decided to put the company up for sale. Thomson Corporation, a Canadian information mega-firm, purchased West for $3.45 billion. Though the Department of Justice antitrust division got involved, Thomson West came out controlling about 40 percent of the legal publishing market.

At the time, people said Thomson paid too much. They doubted that Thomson would be able to squeeze more profit out of West, which was already posting 25 percent returns. But since its takeover, Thomson has consistently managed to attain 30 percent or higher profit margins. Legal information seems to be the sponge that won't dry.

Last year, Thomson acquired Reuters, the financial information and news firm. In its first year as a single entity, the combined company earned $11.7 billion in total revenue—more than any American-held printing and publishing company, including Gannett, McGraw Hill, and the New York Times.

Westlaw is one of the great successes of the information age. At a time when major newspapers are falling into bankruptcy, it's worth paying attention to what worked.

Rule 1:

Find a niche with growth potential

When John B. West founded his publishing company, the population of the United States was relatively small. Most people hadn't been to college, which meant that few became attorneys. Disputes were often resolved informally out of court, frequently before a sheriff or marshal.

These factors meant that neither the volume of court cases nor the number of attorneys who needed access to court information was particularly high. In 1872, for instance, the Minnesota Supreme Court had only three judges, and there was no court of appeals. There was just one federal judge in the state, who roamed from St. Paul to Duluth to Fergus Falls to administer justice. A trio of circuit court judges traveled throughout seven Midwestern states to hear federal appeals.

As the country expanded westward and the population grew, so did the need for courts and attorneys. In 1880, there were 64,000 lawyers in the country. By 1920, that number had nearly doubled, to 122,000. As lawyers multiplied, so did the number of cases in court. West's National Reporter System had plenty of new case law to fill its reports.

West may not have realized it at the time, but the trajectory of the nation—its movement westward and its ever-growing population—meant that the demand for readily accessible legal information would expand exponentially.

What's more, the pool of information that lawyers required would never diminish. In most markets, there is a saturation point reached when all the potential buyers for a product have purchased it. Then a company tries to sell new and improved versions of its product, and the old model becomes obsolete.

But the market for legal information works differently. Case law is based on historical precedent: New rulings are built upon prior decisions. This means that for attorneys and legal professionals, old law does not, like an old product, become disposable. Instead, the demand for old cases remains intact.

West had hit a sweet spot in the market. He'd specialized in providing a type of information that was being produced at a faster and faster rate. And none of it would ever become obsolete.

Rule 2:

Organize information to make it useful

At about the same time that the brothers West founded their publishing company in St. Paul, a young librarian in Massachusetts began visiting libraries across the country, studying their financial constraints and the way they organized their books. He reasoned that libraries could become more useful, without added cost, simply by classifying and cataloguing books systematically, based on the decimal system. Knowledge was grouped into 10 categories, each assigned a number. Each of the 10 categories was divided into 10 subcategories, and the subcategories were divided yet again.

The young librarian's name was Melvil Dewey. He called his innovation the Dewey Decimal System, and he began to apply it to the Amherst College library, where he worked. Today it is used by most American public libraries.

What Melvil Dewey did for libraries, John West did for the law. West's National Reporter System made the law readily available, but as the pile of information grew, finding the relevant information became increasingly difficult.

To solve this problem, a brilliant West Publishing employee named John Mallory came up with his own version of the Dewey Decimal System in 1908. He divided the law into 400 topics, based upon an introductory legal course at Harvard Law School. He assigned each topic a key number, and created subcategories within each of those key numbers.

To make case law easier to search by subject, West Publishing began issuing a digest that identified all the key numbers and all the decisions that had come out related to them. Contracts, for instance, were one key number, bankruptcy another. Today, the system includes 100,000 subcategories.
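The digest described above works like an index: each decision is filed under a topic and a key number, and a lawyer looks up the number to find every decision on that point. As a rough sketch only, the idea can be expressed in a few lines of Python. The case names, topic labels, and numbering here are invented for illustration and are not West's actual key numbers, and the function names `build_digest` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_digest(headnotes):
    """Group decision citations under (topic, key number) pairs."""
    digest = defaultdict(list)
    for citation, topic, key_number in headnotes:
        digest[(topic, key_number)].append(citation)
    return digest


def lookup(digest, topic, key_number=None):
    """Return citations for a whole topic, or for one key number within it."""
    return sorted(
        citation
        for (t, k), citations in digest.items()
        if t == topic and (key_number is None or k == key_number)
        for citation in citations
    )


# Invented sample data: (citation, topic, key number within the topic).
headnotes = [
    ("Case A", "Contracts", 1),
    ("Case B", "Contracts", 2),
    ("Case C", "Bankruptcy", 1),
    ("Case D", "Contracts", 1),
]

digest = build_digest(headnotes)
print(lookup(digest, "Contracts", 1))  # decisions filed under one key number
print(lookup(digest, "Contracts"))     # every decision filed under the topic
```

The point of the structure is that the filing work is done once, by an editor, so that every later search is a simple lookup rather than a fresh read through the reporters.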

West had earned a reputation for knowing how to organize. So, in the 1920s, when the federal government was ready to streamline its statutes, West was called in to help. Before 1875, United States federal statutes had been collected but not codified, meaning grouped by subject. The first attempt that year was riddled with errors. West's 1926 codification was the most thorough U.S. Code ever (though it still contained some 537 errors, 88 of them of substance).

West published the U.S. Code for years—often for free. But its real innovation was an unofficial version of the code, including notes about the changes, which lawyers found far more useful. West's version, U.S. Code Annotated, was easier to use because it was more organized, and more thorough, than the U.S. Code itself.

Over the years, West added enhancements to its system for organizing case law. Headnotes, introduced in the 1900s, were case summaries that identified and spelled out the points of law in a case. An electronic citation service, added in the 1990s, notified attorneys when a particular law changed. These tools made West's products easier and faster to use. During the 20th century, the company's publications became so entrenched as the industry standard that judges required attorneys to cite the page number of the West volume, not the official court record or government code, in their written arguments.

"Their classification system covers almost all of the case law in the U.S.," says Suzanne Thorpe, associate director of the University of Minnesota's law library.

Rule 3:

The internet is a distribution channel, not a product

In 1958, President Dwight Eisenhower founded the Advanced Research Projects Agency—or ARPA—for top-secret scientific and military research. The ARPA scientists needed access to expensive computers, so in 1968 the agency created the ARPANet, a way to connect computers over a telephone line. That technology formed the basis of the modern internet.

About the same time Eisenhower created ARPA, a law professor in Pennsylvania named John Horty began experimenting with ways to use computers to search legal documents. He coded the text of public health statutes onto punch cards and fed them into the University of Pittsburgh's massive computer, where they were loaded onto the computer's tape. Horty used key terms to search the tape for the information he wanted.

The Ohio State Bar Association was so impressed with Horty's work that it contracted with Data Corp. in Beavercreek, Ohio, to push it forward. In 1973, Mead Data Central (parent to Data Corp.) debuted a computerized database of the full text of a limited group of federal and state statutes and case law. Subscribers connected to it by telephone, on a system modeled after the ARPANet. The company called the product Lexis.

Lexis charged lawyers for the time they spent connected to the mainframe. The president of West Publishing, Dwight Opperman, followed Lexis's entrance into the market closely. Opperman reasoned that West could provide a more efficient service at a lower cost with just its case summaries. The summaries were shorter and so would be faster—and therefore cheaper—to search. In 1975, West Publishing launched Opperman's vision: Westlaw.

Over the decades, the two companies strove to outdo each other in providing the best online legal research tools in the market. Each came up with more and more sophisticated ways to access legal information through the use of online technology.

Though it focused aggressively on improving delivery, West never confused the vehicle with the content. The internet itself was never a product. Rather, West's product was its value-added legal information. When Opperman invested in Westlaw, he had no way of knowing that the internet would overtake print publishing. "I saw it as another way of selling our material," he says.

Rule 4:

Turn words into math

At the end of World War I, a German engineer named Arthur Scherbius created a machine that could encrypt and decode messages. It was adopted by the Nazis and called the Enigma.

One of the devices was sent by mistake to the Biuro Szyfrów (Poland's codes bureau), where a Polish mathematician named Marian Rejewski applied mathematics to crack the cypher. For the next seven years, cryptologists at the Biuro Szyfrów regularly deciphered Enigma-encrypted messages.

Five weeks before the outbreak of World War II, the Poles shared the code-breaking method with their French and British allies. Cryptologists at Bletchley Park in England used the information to decode thousands of Nazi messages. The intelligence they collected became known by the code name ULTRA, for ultra-secret, and it has been credited with hastening the end of World War II by two years.

The story of Enigma and ULTRA was kept secret from the public until 1974, but Warren Weaver, who headed up the Applied Mathematics Panel at the U.S. Office of Scientific Research and Development during the war, would have surely known of it. In July 1949, Weaver drafted a bold memo. He proposed that languages, like codes, could be cracked with math, using computers.

In the midst of the Cold War, Weaver's idea had vast appeal. His memo set off a renaissance in computer science, as researchers busily scratched out calculations in an attempt to translate Russian through the use of mathematical formulas.

But by about 1970, the flurry of research abruptly stopped, when scientists realized Weaver was fundamentally wrong. It turns out that languages can't be cracked through math because they aren't math-based. (In retrospect, Weaver should have known better. After all, the Allies used the Navajo language as an unbreakable code during the war.)

But there was a glimmer of the future in Weaver's idea: Powerful technologies can be created when words are treated like math. As the scientists worked with linguistic data, they discovered sophisticated mathematical formulas that could describe patterns in the data. These algorithms could be used to teach computers to recognize patterns, and once a computer understood a pattern, it could sort and categorize new data, even if it didn't technically understand what the language meant.

Westlaw's vast trove of legal documents turned out to be the perfect diet for the new technologies. The company's computer scientists designed a system of algorithms that they dubbed CARE, Categorization and Recommendation Engine. The computer uses a system of statistics, including Bayesian probability, to predict where documents should be categorized. CARE suggests key numbers for new cases, identifies cases affected by a new decision, and performs a host of other tasks. Before CARE, West had hired freelance attorneys to do this work. Now, a computer can do it more quickly and more accurately.
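The workings of CARE itself are not public in this account, so the following is only a toy illustration of what "Bayesian probability" means when a computer sorts documents. It is a minimal naive Bayes classifier in Python, trained on four invented sentences and two invented topic labels. The function names `train` and `classify` are hypothetical, and nothing here should be read as West's actual method.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per topic from (text, topic) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the topic with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Prior: how common the topic is among the training documents.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Likelihood with add-one smoothing so unseen words do not zero out a topic.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training sentences and topic labels, for illustration only.
training = [
    ("debtor filed chapter 7 bankruptcy petition", "Bankruptcy"),
    ("creditor objected to bankruptcy discharge", "Bankruptcy"),
    ("buyer breached the sales contract", "Contracts"),
    ("offer and acceptance formed a contract", "Contracts"),
]

model = train(training)
print(classify("trustee sought discharge of the debtor in bankruptcy", model))
print(classify("the seller breached the contract", model))
```

The program never learns what bankruptcy is. It only learns which words tend to appear under which label, which is the sense in which a computer can sort new documents without understanding the language.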

Rule 5:

Separate the signal from the noise

Type the word "jaguar" into Google's search engine and you'll get 64 million results. Some of the returns have to do with the animal; others refer to the luxury car. The jaguar problem is precisely the kind of search confusion that Westlaw tries to avoid.

"It's all about trying to find a needle in a haystack," says West CEO Peter Warwick.

The basic information-retrieval technology West uses is the same as the technology that underlies a search engine like Google. It's called TF/IDF, for term frequency/inverse document frequency, and it essentially measures the frequency of a term in a document and compares it to how rare that term is in the vast pool of data that composes the entire system. Those parameters tell the computer which information is most relevant for the search. But West's system has some important differences.

"Google knows how pages are linked, but it doesn't really know why," says Peter Jackson, chief scientist and head of research and development at Thomson Reuters.

West can return more targeted search results than Google for three reasons:

• The information in West's database is already connected through the key number system, the organizational structure that West Publishing set up 100 years ago. West uses the connections between documents (citations as well as key numbers) to recommend search results the user might otherwise not have found.

• The pool of data is more limited because it is only legal information. West's database contains less irrelevant information than Google's massive database, which tries to index everything.

• The vocabulary in the pool of information is also more specific. Legal terms are by necessity uncreative. Rather than find a new word for the term "bankruptcy," an attorney will specifically use that term 20 times in a document, because it has a specific legal meaning that he is trying to convey. That repetition of terms makes legal information easier to search—and West's search technology includes a thesaurus that recognizes synonyms.

"No lawyer in his or her right mind would go to Google and start looking for case law. They'd be insane," says Ted Pederson, professor of computer science at the University of Minnesota, Duluth. "They are using Westlaw to make arguments to decide the fate of people and companies. These are very high stakes. This is not like searching for Britney Spears on Google."
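The TF/IDF measure described earlier in this section can be sketched in a few lines of Python. This is one common textbook variant (term frequency times the logarithm of inverse document frequency), not a description of West's or Google's production systems. The three sample documents and the function names `tf_idf` and `rank` are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Score a term: frequent in this document, rare across the corpus."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf


def rank(query, corpus):
    """Return document indexes ordered from most to least relevant."""
    terms = query.lower().split()
    scores = [sum(tf_idf(t, doc, corpus) for t in terms) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


# Invented sample documents, already split into lowercase words.
corpus = [
    "the debtor filed for bankruptcy and the bankruptcy court approved the plan".split(),
    "the court enforced the contract between the parties".split(),
    "the jaguar is a large cat".split(),
]

print(tf_idf("the", corpus[0], corpus))         # appears everywhere, so it carries no weight
print(tf_idf("bankruptcy", corpus[0], corpus))  # repeated here, rare elsewhere
print(rank("bankruptcy court", corpus))
```

A word like "the" scores zero because it appears in every document, while a term of art that a lawyer repeats 20 times in one filing scores high. That is why the uncreative vocabulary of the law suits this kind of search.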

Rule 6:

Computers can't do everything

The underlying search technology that West uses isn't complicated. Frankly, it's ubiquitous. Yet lower-cost online legal research services with access to the same search technology continue to lag in the market. Why?

The difference between West and the lower-cost services is its people. CARE may make recommendations and automate processes, but an army of 800 attorney-editors analyzes the cases, writes the summaries, and approves many of the recommendations that CARE provides. No free or low-cost service has anything near West's legion of human editors.

At West, every case goes through a 22-step editorial process. Multiple people work on each case, cross-checking each other's work to ensure that it is 100 percent accurate. Attorney-editors add searchable terms tuned to West's search engine. The editorial process is so specific that it identifies about 100,000 errors in court documents each year, and notifies the courts of the needed corrections.

Rule 7:

Treat content like patented material

In the 1980s, a group of blue suits from IBM walked into the headquarters of Sun Microsystems in Silicon Valley, California. IBM had accused Sun of committing seven patent violations, and the lawyers were there to talk about it.

The attorneys at Sun had looked at IBM's claim and thought much of it was frivolous, according to Sun's attorney at the time, Gary Reback, who tells the story in a 2002 Forbes article. Most glaring of all was IBM's claim to "fat line" technology, in which customers clicked on two points above a line and two below in order to thicken the line into a rectangle.

With animation, the Sun attorneys put black marker to whiteboard to illustrate the absurdity of the claims. IBM didn't have the right to that technology, they argued. Every kindergartener in the country had figured it out.

The IBM men watched, impassive. "Okay, maybe you don't infringe these seven patents," one of the company's men finally acknowledged. "But we have 10,000 patents. Do you really want us to go back [to headquarters] and find seven patents you do infringe? Or do you want to make this easy and just pay us $20 million?"

After some negotiation, Sun cut IBM a check.

What IBM did with its patents, West did with its copyrights. Throughout the 1980s, West and LexisNexis sued each other over a series of copyright claims. In a move roughly equivalent to IBM's "fat line" argument, West claimed it had rights to the way its information was arranged on a page.

Because judges had for decades required attorneys to cite the page numbers of West volumes in their arguments, LexisNexis began referring to the West page numbers, too. West objected to the references to internal page numbers, claiming its page numbers were copyrighted material.

"It was really an absurdity. There's no intellectual component" to page numbering, says Kendall Svengalis, a retired Rhode Island State Law librarian and author of The Legal Information Buyer's Guide & Reference Manual.

Nevertheless, LexisNexis found that fighting a court battle was more costly than settling. In the mid-1980s, Lexis agreed to pay West $50,000 a year to reference West's page numbers. The money was a pittance to both publishers, but the message was clear: West would fight to the death to protect its content.

Rule 8:

Print's not dead, it just needs online help

When Thomson purchased West, it gained control of the leading products in both print and online legal publishing. Some doubted that Thomson could improve on West's 25 percent profit margins as print publishing gave way to online. But Thomson found a way to keep print profitable.

From 1996 to 2005, the price for initial editions of Thomson's legal books went up about 4.5 percent each year—just slightly above the increase in inflation, and comparable to LexisNexis's 4.2 percent annual increase for similar materials.

But during the same period, Thomson's price for supplementation—updates to the initial books after changes in the law occurred—rose 11.5 percent each year, far higher than both the rate of inflation and Lexis's increase in prices for the same service.

That explains in part how in 2005, even after electronic media dominated the market and comprised 57 percent of West's revenues, the company still got 43 percent of its revenues from print.
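The gap between 4.5 percent and 11.5 percent a year looks modest until it compounds. The short Python calculation below shows the cumulative effect, assuming nine annual increases between 1996 and 2005 and a constant rate each year. Both are simplifying assumptions rather than figures from Thomson, and the function name `cumulative_growth` is hypothetical.

```python
def cumulative_growth(annual_rate, years):
    """Return the price multiple after compounding a yearly increase."""
    return (1 + annual_rate) ** years


YEARS = 9  # assumption: nine annual increases from 1996 to 2005

initial_editions = cumulative_growth(0.045, YEARS)
supplementation = cumulative_growth(0.115, YEARS)

print(f"Initial editions: {initial_editions:.2f}x the 1996 price")  # about 1.49x
print(f"Supplementation:  {supplementation:.2f}x the 1996 price")   # about 2.66x
```

Under those assumptions, a book's sticker price rose by roughly half over the period, while the cost of keeping it current rose more than two and a half times.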

"Thomson figured out in the early '80s where the money could be made," Svengalis says. "It's in professional publishing. It's certainly not in newspapers. When you're dealing with serial titles, supplementation, a lot of the customers are kind of captive."

The market for legal books is not likely to disappear, since some information is simply easier to absorb in book format. Electronic subscribers get deals on books and do not pay the full price for supplementation, company management says.

Westlaw's core business, though, has become its electronic products; last year, online made up 69 percent of the revenue stream. "Print will always be important, whether it will be one-third or one-fifth" of the business, says Peter Warwick, West's CEO. "But the primary thinking is online."

To keep its electronic market robust, Westlaw constantly develops new products. Says Rick King, head of operations: "If you're not the leader and the innovator, then you'll be overtaken."