How to Decode Law Histories

A rich source of information about laws is found in the history data that accompanies each law in most states, but you’ve probably never noticed it.

For example, Virginia’s Freedom of Information Act spells out a series of exemptions in § 2.2-3705.1, and below the text of that law, in the section titled “History,” is a cryptic series of numbers:

1999, cc. 485, 518, 703, 726, 793, 849, 852, 867, 868, 881, § 2.1-342.01; 2000, cc. 66, 237, 382, 400, 430, 583, 589, 592, 594, 618, 632, 657, 720, 932, 933, 947, 1006, 1064; 2001, cc. 288, 518, 844, § 2.2-3705; 2002, cc. 87, 155, 242, 393, 478, 481, 499, 522, 571, 572, 633, 655, 715, 798, 830; 2003, cc. 274, 307, 327, 332, 358, 704, 801, 884, 891, 893, 897, 968; 2004, c. 690; 2010, c. 553.

Most people’s eyes skim right over that. (Really, did you read any of that, or did you just glance at it and think, “Yup, that’s a bunch of stuff that means nothing to me…I’ll just skip that and see what he’s got to say about it”?) What looks like nonsense to most people turns out to be really rich data, stored in a way that renders it basically meaningless. Let’s peer inside and see what it means, starting with Virginia.

With Virginia’s history, the first pattern to emerge is that what looks like a long string of numbers is actually broken up into stanzas by semicolons. Here’s the first stanza:

1999, cc. 485, 518, 703, 726, 793, 849, 852, 867, 868, 881, § 2.1-342.01

The first four digits (1999) are the year in which this section of the code passed into law, at least in its present form. (That was accomplished with then-delegate Chip Woodrum’s HB1985, which overhauled Virginia’s FOIA laws.) The last string of numbers, § 2.1-342.01, is the section number that this section had at the time. (Title 2.1 was later recodified as Title 2.2, which is how this section came to carry a Title 2.2 number.)

In the middle, the series of three-digit numbers (485, 518, 703, etc.) identifies the chapters (“cc.” is short for “chapters,” and a lone “c.” for “chapter”) of that year’s Acts of the General Assembly that created or amended this section of the code. The Acts of the General Assembly are sort of like a changelog for the code (but not exactly like a changelog!), in which all of the legislation that passed the General Assembly that year is ordered by the section of the state code that it affects; when multiple bills affect the same portion of the code, they are combined. It’s the intermediate step between a bill and the final, amended code. So here we can see that ten chapters of the 1999 Acts of the General Assembly affected this section of the code.
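The stanza structure described above is regular enough to parse mechanically. Here is a minimal Python sketch, assuming every stanza is a year, then chapter numbers (`c.` and `cc.` abbreviate “chapter” and “chapters”), then optionally a former section number. `Stanza` and `parse_va_stanza` are hypothetical names for illustration, not The State Decoded’s actual parser.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stanza:
    year: int                                          # year of the Acts
    chapters: List[int] = field(default_factory=list)  # "c."/"cc." chapter numbers
    former_section: Optional[str] = None               # trailing "§ ..." if present


def parse_va_stanza(text: str) -> Stanza:
    """Parse one stanza such as '1999, cc. 485, 518, § 2.1-342.01'."""
    parts = [p.strip() for p in text.strip().rstrip(".").split(",")]
    stanza = Stanza(year=int(parts[0]))
    for part in parts[1:]:
        if part.startswith("§"):
            stanza.former_section = part.lstrip("§ ")
        else:
            # Only the first chapter number carries the "c." or "cc." prefix.
            stanza.chapters.append(int(re.sub(r"^cc?\.\s*", "", part)))
    return stanza


if __name__ == "__main__":
    s = parse_va_stanza(
        "1999, cc. 485, 518, 703, 726, 793, 849, 852, 867, 868, 881, § 2.1-342.01"
    )
    print(s.year, len(s.chapters), s.former_section)  # 1999 10 2.1-342.01
```

Keeping the result as a small record, rather than a formatted string, is what makes the later queries (by year, by chapter) possible.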

With this as a key, one can step through each stanza in the history of this Virginia law and understand how and when it changed, if not what the substance of those changes was.
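Stepping through the stanzas can be sketched the same way. This assumes the history note uses exactly the semicolon-and-comma layout shown above. `describe_history` is a hypothetical helper, and the sample is the last three stanzas of the history for § 2.2-3705.1. Following the reading above, a trailing section number is treated as the number the section carried at the time.

```python
import re
from typing import List


def describe_history(history: str) -> List[str]:
    """Render a Virginia history note as one plain-English sentence per stanza."""
    sentences = []
    for stanza in history.strip().rstrip(".").split(";"):
        parts = [p.strip() for p in stanza.split(",")]
        year, rest = parts[0], parts[1:]
        # Chapter numbers: strip the "c."/"cc." prefix from the first one.
        chapters = [re.sub(r"^cc?\.\s*", "", p) for p in rest if not p.startswith("§")]
        # A trailing "§ ..." records the number the section carried at the time.
        former = [p.lstrip("§ ") for p in rest if p.startswith("§")]
        noun = "chapter" if len(chapters) == 1 else "chapters"
        sentence = (f"In {year}, created or amended by {noun} "
                    f"{', '.join(chapters)} of the Acts of the General Assembly")
        if former:
            sentence += f" (then numbered § {former[0]})"
        sentences.append(sentence + ".")
    return sentences


if __name__ == "__main__":
    excerpt = "2001, cc. 288, 518, 844, § 2.2-3705; 2004, c. 690; 2010, c. 553."
    for line in describe_history(excerpt):
        print(line)
```

This tells you when and by which chapters a section changed, but, as noted, not what those changes were.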

Presumably it’s written this way to save space in printed volumes, but it no longer makes sense to codify our laws in a format optimized for print. We can do better.

I’m developing a parser for these history sections for the State Decoded, so that instead of displaying this cryptic content, the site will present the material in plain English. By storing this data atomically, it’ll be possible to generate a listing of all laws that were amended in a given year or by a given portion of the Acts of the General Assembly, or to find laws similar to a given law based on a shared history of being amended within the same portion of the Acts. I’m optimistic that it’ll be possible to connect many state codes’ history records back to individual pieces of legislation, rather than just the legislature’s changelog, which opens up a potential wealth of information. (This can already be seen on Virginia Decoded for all changes from 2006 onward, such as in the “Amendment Attempts” listing on § 2.2-3705.1.)
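Once each stanza is broken into (section, year, chapter) records, the queries described above become simple lookups. This is a minimal sketch using in-memory indexes, assuming the records have already been parsed. The function names are hypothetical, and a real site would more likely keep these records in a database table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, int, int]  # (section, year, chapter of the Acts)


def build_indexes(records: Iterable[Record]):
    """Index sections by year and by (year, chapter) of the Acts."""
    by_year: Dict[int, Set[str]] = defaultdict(set)
    by_act: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for section, year, chapter in records:
        by_year[year].add(section)
        by_act[(year, chapter)].add(section)
    return by_year, by_act


def related_sections(section: str, records: Iterable[Record]) -> List[str]:
    """Rank other sections by how many chapters of the Acts they share with `section`."""
    acts_of: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
    for sec, year, chapter in records:
        acts_of[sec].add((year, chapter))
    mine = acts_of.get(section, set())
    scores = {other: len(mine & acts) for other, acts in acts_of.items()
              if other != section and mine & acts}
    # Most shared chapters first; ties broken alphabetically.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

For example, loading the history of § 2.2-3705.1 as records such as `("2.2-3705.1", 2004, 690)` lets `by_act[(2004, 690)]` list every loaded section touched by that chapter, and `related_sections` ranks sections by how many chapters of the Acts they have in common.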

Incidentally, Florida has the same sort of exemptions to its open records law, in s. 119.071, and its history section looks like this:

s. 4, ch. 75-225; ss. 2, 3, 4, 6, ch. 79-187; s. 1, ch. 82-95; s. 1, ch. 83-286; s. 5, ch. 84-298; s. 1, ch. 85-18; s. 1, ch. 85-45; s. 1, ch. 85-86; s. 4, ch. 85-301; s. 2, ch. 86-11; s. 1, ch. 86-21; s. 1, ch. 86-109; s. 2, ch. 88-188; s. 1, ch. 88-384; s. 1, ch. 89-80; s. 63, ch. 90-136; s. 4, ch. 90-211; s. 78, ch. 91-45; s. 1, ch. 91-96; s. 1, ch. 91-149; s. 90, ch. 92-152; s. 1, ch. 93-87; s. 2, ch. 93-232; s. 3, ch. 93-404; s. 4, ch. 93-405; s. 1, ch. 94-128; s. 3, ch. 94-130; s. 1, ch. 94-176; s. 1419, ch. 95-147; ss. 1, 3, ch. 95-170; s. 4, ch. 95-207; s. 1, ch. 95-320; ss. 3, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 18, 20, 25, 29, 31, 32, 33, 34, ch. 95-398; s. 3, ch. 96-178; s. 41, ch. 96-406; s. 18, ch. 96-410; s. 1, ch. 98-9; s. 7, ch. 98-137; s. 1, ch. 98-259; s. 2, ch. 99-201; s. 27, ch. 2000-164; s. 1, ch. 2001-249; s. 29, ch. 2001-261; s. 1, ch. 2001-361; s. 1, ch. 2001-364; s. 1, ch. 2002-67; s. 1, ch. 2002-256; s. 1, ch. 2002-257; ss. 2, 3, ch. 2002-391; s. 11, ch. 2003-1; s. 1, ch. 2003-16; s. 1, ch. 2003-100; s. 1, ch. 2003-137; ss. 1, 2, ch. 2003-157; ss. 1, 2, ch. 2004-9; ss. 1, 2, ch. 2004-32; ss. 1, 3, ch. 2004-95; s. 7, ch. 2004-335; s. 4, ch. 2005-213; s. 41, ch. 2005-236; ss. 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, ch. 2005-251; s. 14, ch. 2006-1; s. 1, ch. 2006-158; s. 1, ch. 2006-180; s. 1, ch. 2006-181; s. 1, ch. 2006-211; s. 1, ch. 2006-212; s. 13, ch. 2006-224; s. 1, ch. 2006-284; s. 1, ch. 2006-285; s. 1, ch. 2007-93; s. 1, ch. 2007-95; s. 1, ch. 2007-250; s. 1, ch. 2007-251; s. 1, ch. 2008-41; s. 2, ch. 2008-57; s. 1, ch. 2008-145; ss. 1, 3, ch. 2008-234; s. 1, ch. 2009-104; ss. 1, 2, ch. 2009-150; s. 1, ch. 2009-169; ss. 1, 2, ch. 2009-235; s. 1, ch. 2009-237; s. 1, ch. 2010-71; s. 1, ch. 2010-171; s. 1, ch. 2011-83; s. 1, ch. 2011-85; s. 1, ch. 2011-140; s. 48, ch. 2011-142; s. 1, ch. 2011-201; s. 1, ch. 2011-202.

Wow. I’m not yet entirely clear on what all of that means, but I’m getting there.
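One plausible reading of the Florida format, which this sketch assumes rather than confirms, is that each semicolon-delimited entry names one or more sections (`s.`/`ss.`) of a session law identified as `ch. YY-NNN`, where `YY` is a two-digit year before 2000 and a four-digit year after. `parse_fl_history` is a hypothetical helper that raises on any entry that doesn’t fit that pattern, so surprises surface instead of being silently skipped.

```python
import re
from typing import Dict, List

# Assumed shape of one entry: "s. 4, ch. 75-225" or "ss. 2, 3, ch. 2002-391".
ENTRY = re.compile(
    r"ss?\.\s*(?P<sections>[\d,\s]+?),\s*ch\.\s*(?P<year>\d{4}|\d{2})-(?P<chapter>\d+)"
)


def parse_fl_history(history: str) -> List[Dict]:
    """Split a Florida history note into year/chapter/sections records."""
    records = []
    for entry in history.strip().rstrip(".").split(";"):
        match = ENTRY.fullmatch(entry.strip())
        if match is None:
            raise ValueError(f"unrecognized history entry: {entry!r}")
        year = int(match["year"])
        if year < 100:
            year += 1900  # assumption: two-digit years all fall in the 1900s
        records.append({
            "year": year,
            "chapter": int(match["chapter"]),
            "sections": [int(s) for s in match["sections"].split(",")],
        })
    return records
```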

Bills are Not a Changelog—Why You Can’t Turn Legislation into Laws

There is a common belief that, since laws are the result of legislation, one can surely assemble an amended version of the code automatically from the bills that have passed the legislature. This is both a really cool idea and wrong.

Your standard narrative of how a bill becomes law doesn’t really cover what it purports to cover. Usually what’s really being explained is how a bill passes, but not how it becomes law. There’s a whole process between the passage of a bill and the encoding of that bill in a state’s codified laws.

Legislatures pass hundreds or thousands of bills every year, some of which are budgetary, some of which are to define their own rules, some of which pertain to state administration, and some of which are resolutions. The remainder are patches to be applied to law, either proposing a new section or amending an existing one (by either adding or removing material). These patches look familiar to anybody who has seen a prettied-up diff, and it’s wholly logical to figure that amending the state’s laws should simply mean collecting all of the bills that pass and applying those patches to the laws.
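The bill-as-patch model is easy to sketch, which is part of its appeal. This deliberately simplified Python version assumes a bill’s changes can be expressed as (struck text, inserted text) pairs. `apply_amendment` is a hypothetical helper, and real bills are not published in this form.

```python
from typing import List, Tuple


def apply_amendment(text: str, edits: List[Tuple[str, str]]) -> str:
    """Apply a bill's amendments, modeled as (struck, inserted) pairs, in order.

    Each struck passage must occur exactly once; a pure insertion is modeled
    by striking an anchor phrase and re-inserting it along with the new words.
    """
    for struck, inserted in edits:
        count = text.count(struck) if struck else 0
        if count != 1:
            raise ValueError(f"expected one match for {struck!r}, found {count}")
        text = text.replace(struck, inserted)
    return text
```

Applying `[("13 voting members", "12 voting members")]` to a sentence containing that phrase swaps it; an ambiguous or missing match raises an error rather than guessing.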

The trouble is that these patches are not always the last word on the changes that are going to be made to existing law.

Virginia is the state whose process I know best, so I’ll say a few words about it. The Virginia Code Commission is a tiny state agency, with just seven employees, that is charged with overseeing the code. Its duties are spelled out in Title 30 (General Assembly), Chapter 15 (Virginia Code Commission) of the state code, but the interesting bit is § 30-149 (Authority for minor changes to the Code of Virginia):

The Commission may correct unmistakable printer’s errors, misspellings and other unmistakable errors in the statutes as incorporated into the Code of Virginia, and may make consequential changes in the titles of officers and agencies, and other purely consequential changes made necessary by the use in the statutes of titles, terminology and references, or other language no longer appropriate.

The Commission may renumber, rename, and rearrange any Code of Virginia titles, chapters, articles, and sections in the statutes adopted, and make corresponding changes in lists of chapter, article, and section headings, catchlines, and tables, when, in the judgment of the Commission, it is necessary because of any disturbance or interruption of orderly or consecutive arrangement.

The Commission may correct unmistakable errors in cross-references to Code of Virginia sections and may change cross-references to Code of Virginia sections which have become outdated or incorrect due to subsequent amendment to, revision, or repeal of the sections to which reference is made.

The Commission may omit from the statutes incorporated into the Code of Virginia provisions which, in the judgment of the Commission, are inappropriate in a code, such as emergency clauses, clauses providing for specific nonrecurring appropriations and general repealing clauses.

(TL;DR: The Code Commission can make a lot of changes during that in-between period when a bill has passed, but it’s not quite yet a law.)

It is not unusual for the legislature to amend a bill at the very last minute, without proper review. When words are struck from and inserted into the marked-up text of a bill, the result can be ungrammatical or even logically inconsistent. It’s surely a judgment call whether such problems can be fixed by the Virginia Code Commission or will require the General Assembly to fix them, which may well mean a delay of nearly a year.

Here, for instance, is a selection of the 35 changes made to the Code of Virginia by Code Commission staff since the 2011 edition was published:

Section | Correction | Date
2.2-311 | Catchline, after “authority of” change “investigation” to “investigators” | 12/1/2011
2.2-515.1 | Second sentence, after “responsibility to” add “(i) establish an address confidentiality program in accordance with § 2.2-515.2, (ii)” and change “programs and shall report” to read “programs, and (iii) report” | 10/25/2011
2.2-2338 | In first paragraph, (i) change “13 voting members” to “12 voting members”; (ii) insert “and” after “Commerce and Trade,”; and (iii) delete “and the Assistant to the Governor for Commonwealth Preparedness” | 8/5/2011
2.2-2699.5 | Subsection B, replace “Assistant to the Governor for Commonwealth Preparedness” with “Secretary of Veterans Affairs and Homeland Security” | 8/5/2011
2.2-4509 | Change “AA by Moody’s” to “Aa by Moody’s” | 10/25/2011
6.2-314 | End of catchline, change “institution” to “institutions” | 10/25/2011
6.2-412 | End of catchline, change “improvement” to “improvements” | 10/25/2011

In Virginia—as in other states, although I don’t know how many—an attempt to use legislation as a changelog for the state code would yield results that would be very convincing-looking, but that would deviate substantially from the official code. Bills must frequently pass through a human filter before they become laws. That’s not something that you can simulate with a Ruby gem or a PEAR package. Some things just require some thought, in ways that can’t yet be automated.
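One practical consequence is that any text assembled from bills needs to be checked against the official code rather than trusted. This is a minimal sketch using Python’s standard `difflib`, assuming both versions are available as dictionaries mapping section numbers to text. `divergences` is a hypothetical helper, and the sample sentence is invented, echoing the “13 voting members” correction listed for § 2.2-2338.

```python
import difflib
from typing import Dict, List


def divergences(assembled: Dict[str, str], official: Dict[str, str]) -> Dict[str, List[str]]:
    """Return a unified diff for each section whose assembled text differs from the official text."""
    report = {}
    for section in sorted(assembled.keys() | official.keys()):
        ours = assembled.get(section, "")
        theirs = official.get(section, "")
        if ours != theirs:
            report[section] = list(difflib.unified_diff(
                ours.splitlines(), theirs.splitlines(),
                fromfile=f"{section} (from bills)", tofile=f"{section} (official)",
                lineterm="",
            ))
    return report


if __name__ == "__main__":
    # Invented text, for illustration only.
    from_bills = {"2.2-2338": "The Council shall have 13 voting members."}
    official = {"2.2-2338": "The Council shall have 12 voting members."}
    for line in divergences(from_bills, official)["2.2-2338"]:
        print(line)
```

A diff like this can flag where the human filter intervened, but it cannot reproduce that filter.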

Outsourcing Online Code Display Hinders Innovation

Although most states provide a copy of their laws online, some outsource this to LexisNexis. Arkansas, Colorado, Georgia, Mississippi, Tennessee, and Washington D.C. all do so. While this might seem like a decent solution at first blush, it’s actually incredibly problematic, and serves as a major obstacle to innovation within those states.

It is self-evident that state laws ought to be disseminated as widely as possible and be as accessible as possible. To follow the law, people must first be able to know what it says. Projects like the State Decoded (or Legix.info’s California Codes, or OregonLaws.org, or Justia’s US Law directory) rely on access to the text of the law. These services take the raw material of the laws and make substantial improvements to it, making this important information more accessible and understandable than it is on the state-sanctioned websites.

When states punt to LexisNexis, they make their state codes a dead end.

Washington D.C. used to provide its code on dccouncil.washington.dc.us. No longer. Now it’s found only on LexisNexis’s website. Any D.C. resident who wants to read the code (their own laws) must first agree to LexisNexis’s terms of use, which allow visitors merely “the right to download using the commands of the Online Services and store in machine-readable form, primarily for that Authorized User’s exclusive use, a single copy of insubstantial portions of those Authorized Legal Materials.” LexisNexis goes on to explain that, “for the avoidance of doubt, downloading and storing Materials in an archival database is prohibited.” LexisNexis’s terms of use make it impossible to do anything of interest with D.C.’s code. No value can be added to it. The strangely specific prohibition on storing data in a database (e.g., Zotero) ensures that.

The problem is not LexisNexis per se, but rather its strikingly restrictive licensing terms for materials that, were it not for those terms, could be reproduced freely.

Unless Arkansas, Colorado, Georgia, Mississippi, Tennessee, and Washington D.C. provide bulk downloads—which is rare—or can be persuaded to provide an electronic copy of their laws, LexisNexis’s licensing terms are an immovable object that prevents the advance of any private-sector effort to enhance the display of those laws.

The Road Ahead

With yesterday’s launch of Virginia Decoded, there are suddenly a lot of people who would love to set up The State Decoded for their own state, something that isn’t possible just yet. This calls for an explanation of what the plan is.

Building Virginia Decoded was an evenings-and-weekends hobby for me starting in the summer of 2010. I learned the structure of the Code of Virginia as I went, and built the site explicitly to mirror that structure. Some friends who were alpha testing the site in late 2010 insisted that it could be used in other states. I applied to the John S. and James L. Knight Foundation for the funding to overhaul the Virginia Decoded code base, abstract it enough that it could support the widely varying structure of legal codes throughout the United States, and turn it into a proper open source project. In June the Knight Foundation named the State Decoded project one of the winners of the 2011 News Challenge. The funding came through a few days before the end of 2011, and I was able to get started on the project two weeks ago.

Launching Virginia Decoded was easy, because I’d created the site long before getting started on The State Decoded.

The next task is to scrub the Virginia Decoded source of all material that shouldn’t be released publicly (passwords, API keys, etc.), at which point it can all go up on GitHub. (You can find Virginia Decoded’s parser on GitHub already.)

Then comes the real work, which is eliminating all of the functionality that is fundamentally Virginian and replacing it with more flexible functionality. For instance, the Code of Virginia is broken into titles, which are broken into chapters, which are broken into sections (each section is a single law). But California’s laws are broken down into codes, which are broken into divisions, which are broken into chapters, which are broken into articles, which are broken into sections. As a result, California’s laws just won’t work in the software’s existing framework, which is premised on the assumption that codes are divided into three levels, and that those levels are called “titles,” “chapters,” and “sections.” This and other, similar problems are wholly solvable; they’ll just require some reflection and some time. Luckily, solving those problems is my full-time job for the foreseeable future, courtesy of the Knight Foundation.
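One way to remove the three-level assumption is to treat the name of each level as data rather than as part of the schema. This is a minimal Python sketch of that idea, assuming a simple in-memory tree. `Structure` and `breadcrumbs` are hypothetical names, and the California-style identifiers in the usage are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Structure:
    """One level of a legal code. The level's name is data, not schema."""
    label: str                      # e.g. "title", "division", "chapter", "article"
    identifier: str                 # e.g. "2.2"
    children: List["Structure"] = field(default_factory=list)
    sections: List[str] = field(default_factory=list)  # section numbers at this level


def breadcrumbs(node: Structure, target: str,
                trail: Tuple[Tuple[str, str], ...] = ()) -> Optional[List[Tuple[str, str]]]:
    """Return the (label, identifier) path from `node` down to section `target`."""
    trail = trail + ((node.label, node.identifier),)
    if target in node.sections:
        return list(trail)
    for child in node.children:
        found = breadcrumbs(child, target, trail)
        if found:
            return found
    return None


if __name__ == "__main__":
    virginia = Structure("title", "2.2", children=[
        Structure("chapter", "37", sections=["2.2-3705.1"]),
    ])
    # Placeholder identifiers for a five-level, California-style code.
    california = Structure("code", "Example", children=[
        Structure("division", "1", children=[
            Structure("chapter", "2", children=[
                Structure("article", "3", sections=["100"])])])])
    print(breadcrumbs(virginia, "2.2-3705.1"))
    print(breadcrumbs(california, "100"))
```

Because depth and labels come from the data, a three-level code and a five-level code go through the same code path.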

When the State Decoded code base is sufficiently abstracted to work across states, that’s when things will get fun.

Eager to get started for your own state? Watch the project on GitHub, check out the parser, and contact me to let me know what state you’re interested in and in what capacity you want to help make that happen.

Virginia Decoded is Live

The very first State Decoded site went into public beta this morning: Virginia Decoded. This is the site that snowballed into the State Decoded project, and it proved to be a good testing ground for the software and, indeed, the concept. Virginia Decoded isn’t so much done as it’s done enough. There’s so very much more to be done, but the site has reached a point where it will benefit strongly from having actual people use it, and where actual people will, hopefully, benefit strongly from using it.

Virginia provides its code as SGML (which it, in turn, receives from LexisNexis), making it relatively easy to extract the laws and store them in the State Decoded. Many states do not provide bulk downloads at all, so extracting their laws requires the laborious work of screen-scraping. Virginia is ahead in that regard.

More helpful than anything else was the Virginia Code Commission. That’s the official body that oversees the laws of the commonwealth, and they proved to be hugely helpful in testing the site. They provided invaluable information about how bills really become laws (it’s not as simple as you might think), and obsess about the details of the code the same way a great programmer obsesses about the details of…er…code. Without their input, Virginia Decoded would be a very convincing-looking but ultimately inaccurate website.

The process by which this website was put together is one that will be replicated in other states. We’ll find partners in states throughout the nation and, whenever possible, work with the state agency that oversees the state’s laws to craft a site that is the best fit for that state and its code. It will be a laborious process, but that’s what it takes to create a good, long-lasting network of state-level open government websites.

The Surprisingly Interesting History of the Virginia Code

In its February 2000 issue, Virginia Lawyer published an article by Kent C. Olson of the UVA Law School that provides a fascinating history of the Code of Virginia. “The Path of Virginia Codification” explains how the Code of 1819 gave way to the Code of 1849, which was in turn replaced by the Code of 1887, then the Code of 1919, and finally the Code of 1950. Each new iteration didn’t just reflect changes in the law by intervening General Assembly sessions, but also a gradual rethinking of what a code should look like, how it should be organized, what purpose it should serve, and how it should be assembled.

Looking back, it seems ludicrous that the official collection of state laws would only be updated once a generation. All of the changes made by the biennial meetings of the legislature in the interim needed to be tracked, and a series of slim volumes had to be consulted to determine how the current law varied, if at all, from the last time that they were collected and printed.

The conclusion that I draw after reading this is that it’s time for the concept of printed legal volumes to disappear, at least as the primary venue for the dissemination of the text of the law. In most (all?) states, the printed edition is the canonical edition of the legal code, and everything else is basically for entertainment purposes only. Many states put a disclaimer on their code’s website to that effect. Illinois, for example:

This site contains provisions of the Illinois Compiled Statutes from databases that were created for the use of the members and staff of the Illinois General Assembly. The provisions have NOT been edited for publication, and are NOT in any sense the “official” text of the Illinois Compiled Statutes as enacted into law. The accuracy of any specific provision originating from this site cannot be assured, and you are urged to consult the official documents or contact legal counsel of your choice. This site should not be cited as an official or authoritative source.

One is tempted to conclude that states cannot long occupy this fantasy world, but the crudeness of most states’ websites for their codes suggests that they may be able to spend many happy years there yet.

The Messiness of Real-World Data

Those of us who work with big data have a tendency to describe working with it in cavalier terms. “Oh, I just grabbed the XML file, wrote a quick parser to turn it into CSV, bulk loaded it into MySQL, laid an API on top of it, and I was done.” The truth is that things very rarely go so well.

Real-world data is messy. Data doesn’t convert correctly the first time (or, often, the tenth time). File formats are invalid. The provided data turns out to be incomplete. Parser code that was so straightforward when written for the abstract concept of this data quickly turns into a series of conditionals to deal with all of the oddities of the real data.

While I’ve been developing the parser for The State Decoded, it’s become obvious that state laws themselves are too messy to standardize entirely, and the data formats in which states provide those laws are, in turn, too messy to import easily.

As a case study of the messiness of real-world data, here are some of the challenges I’ve encountered in parsing state legal codes.
 

Encoding Errors

A precious few states provide their state codes as bulk data. While it might seem like a real gift to get an XML file of every state law, if the XML is invalid, then that’s really more of a white elephant. One state provided me with SGML that it had, in turn, received from LexisNexis. It was riddled with hundreds of errors and could not be parsed automatically. After hours of attempting to fix problems by hand, I finally threw in the towel. Weeks later, LexisNexis provided a corrected version, and my work could continue.

Often, working with big data means doing pathbreaking work. The result is that sometimes nobody has ever before attempted to do anything with the data sets in question. Assembled by well-meaning but inexperienced people, those data sets may consequently be encoded incorrectly.
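A first step with a suspect bulk file is simply finding where it breaks. This is a minimal sketch using Python’s standard `xml.etree.ElementTree`, whose `ParseError` carries a `(line, column)` `position`. `first_xml_error` is a hypothetical helper. Note that it checks only well-formedness, stops at the first error, and is not an SGML parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_xml_error(xml_text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first well-formedness error, or None.

    This does not validate against a DTD or schema, and it cannot read SGML
    that relies on SGML-only features such as omitted end tags.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```

Because parsing stops at the first problem, a file with hundreds of errors has to be fixed and rechecked hundreds of times, which goes some way toward explaining why fixing one by hand is so slow.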

Changing Realities

State laws are occasionally restructured, renumbering huge portions of the code. Virginia’s entire criminal code, Title 18.1, became Title 18.2 in 1975. No redirects exist in 18.1, no pointers to the new location, no sign of what was. One must simply know that it changed. Court cases, articles from legal journals, or attorney general opinions that cited sections of code within 18.1 are thus either useless or must be passed through a hand-crafted filter to point each citation to the new section number.

It would be nice if reality would consent to remain static to ease the process of cataloging it. But the world changes, and data reflects those changes. That can make it awfully frustrating to parse and apply that data, but that’s just the price of admission.
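The hand-crafted filter can be as simple as a lookup table applied to citations. This is a minimal sketch, assuming a concordance from old to new section numbers has been compiled by hand. `update_citations` is hypothetical, and the mapping in the test is a placeholder, not a real 18.1-to-18.2 correspondence. Citations missing from the concordance are flagged rather than guessed.

```python
import re
from typing import Dict

# A "§ 18.1-..." citation; the number may have dotted or colon-separated parts.
OLD_CITATION = re.compile(r"§\s*(18\.1-\d+(?:[.:]\d+)*)")


def update_citations(text: str, concordance: Dict[str, str]) -> str:
    """Point 18.1 citations at their new numbers; flag any the table doesn't cover."""
    def replace(match: re.Match) -> str:
        old = match.group(1)
        new = concordance.get(old)
        return f"§ {new}" if new else f"§ {old} [unmapped]"
    return OLD_CITATION.sub(replace, text)
```

The regular expression is the easy part; the concordance itself is the work that has to be done by a person.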

Inconsistencies in the Data

There are at least a few states that deviate from their own standard code structure. They might structure their code by dividing it into titles, each title into chapters, and each chapter into sections. Except, sometimes, when chapters are called “articles.” Why do they do this? I have no idea. If lawmakers consulted with database developers prior to recodifying their state’s laws, no doubt our legal codes would be normalized properly.

These inconsistencies might be illogical, but they’re how it is, and must be reflected in the final application of the data. This can be particularly frustrating if the provided bulk data doesn’t record these inconsistencies internally, requiring the gathering of external information to be applied to the data, as is often the case.
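Recording such inconsistencies usually means keeping a small table of exceptions alongside the bulk data. This is a minimal sketch, assuming the bulk data labels every unit below a title “chapter” and that the exceptions have been gathered separately. `apply_label_overrides` and the keys used in the test are hypothetical.

```python
from typing import Dict, List, Tuple

Unit = Dict[str, str]  # e.g. {"title": "1", "number": "3", "label": "chapter"}


def apply_label_overrides(units: List[Unit],
                          overrides: Dict[Tuple[str, str], str]) -> List[Unit]:
    """Return copies of `units` with labels corrected from an external exceptions table."""
    fixed = []
    for unit in units:
        key = (unit["title"], unit["number"])
        # Keep the bulk data's label unless the exceptions table says otherwise.
        fixed.append({**unit, "label": overrides.get(key, unit["label"])})
    return fixed
```

Keeping the exceptions in their own table, rather than hand-editing the bulk file, means they survive the next time the state publishes a fresh copy.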

Missing Data

One state’s code contains periodic parenthetical asides like “See Editor’s note.” What does the editor’s note say? There’s no way to tell: it’s neither part of the bulk data nor on the state’s official website for its legal code. Those editor’s notes will have to be obtained from the state’s code commission, and they will probably arrive as a bunch of Word files attached to an e-mail.

Not all data exists electronically, and not all data that exists electronically exists in a single location. Often, piecing together a meaningful data set requires gathering information from disparate sources, sometimes in awkward ways. And sometimes the last few bits of data just aren’t available, and the data set is going to have to be incomplete.
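When a data set must be assembled from several sources, it helps to record explicitly which pieces are still missing. This is a minimal sketch, assuming section text and editor’s notes arrive as separate dictionaries keyed by section number. `attach_editor_notes` is hypothetical, and it only detects the literal phrase “See Editor’s note” (with either a straight or curly apostrophe).

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches "See Editor's note" with a straight or curly apostrophe.
NOTE_REFERENCE = re.compile(r"See Editor['\u2019]s note", re.IGNORECASE)


def attach_editor_notes(
    sections: Dict[str, str], notes: Dict[str, str]
) -> Tuple[Dict[str, Dict[str, Optional[str]]], List[str]]:
    """Merge notes from a second source; list sections that cite a note we lack."""
    merged: Dict[str, Dict[str, Optional[str]]] = {}
    missing: List[str] = []
    for number, text in sections.items():
        note = notes.get(number)
        if note is None and NOTE_REFERENCE.search(text):
            missing.append(number)
        merged[number] = {"text": text, "editors_note": note}
    return merged, missing
```

The `missing` list doubles as a to-do list for the request to the code commission, and as an honest record of where the data set is incomplete.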
 

All of these problems are solvable, in one way or another, but those solutions can be time-consuming. The Pareto principle applies here: one is liable to get 80% of the data set whipped into shape in the first 20% of the time. The remaining 20% of the data will require the remaining 80% of the time. That first 80% feels magical, with everything just falling into place, but that last 20% is just plain hard work.

Real-world data is messy. Working with big data means cleaning it up.