## Macroanalysis Methodology

To mine the Decisions, I scraped them with a download utility called wget, ran optical character recognition (OCR) on them with Google's Tesseract, and then ran the resulting text through MALLET, a program that implements Latent Dirichlet Allocation (LDA). I did most of this work in the statistical programming language R, using the RStudio integrated development environment to perform the analysis and visualize the results. LDA, a form of topic modeling, requires much trial and error in creating stoplists, which are the lists of words the program ignores. In my case I used a traditional English stoplist of words like “a,” “the,” and “it,” plus tribal and geographical names. Had I not removed these names, my topics would mostly have been tribal groups like “Pueblos” or “Sioux.” Literary practitioners of “distant reading” call this the “character problem.”^{5} I removed common geographical terms, especially states and cities, because they weren't at the heart of my interest. What remained were the particular topics that the cases were about. Topic modeling also requires the user to choose a number of topics, which I settled on through trial and error. All of my code is on my GitHub page.
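The stoplist step described above can be illustrated with a minimal sketch. This is not the code used for the project (that was written in R and run through MALLET); it is a small Python illustration, and the word lists, the place names, and the helper names `tokenize` and `remove_stopwords` are all hypothetical stand-ins.

```python
import re

# Illustrative stoplists only; the project's real lists are far longer
# and are not reproduced here.
ENGLISH_STOPWORDS = {"a", "the", "it", "of", "and", "to", "in"}
TRIBAL_NAMES = {"pueblos", "sioux"}
PLACE_NAMES = {"oklahoma", "nebraska"}  # hypothetical examples

# The combined stoplist is simply the union of the three lists.
STOPLIST = ENGLISH_STOPWORDS | TRIBAL_NAMES | PLACE_NAMES


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(text, stoplist=STOPLIST):
    """Return the tokens of `text` that are not in the stoplist."""
    return [tok for tok in tokenize(text) if tok not in stoplist]


sample = "The Sioux ceded the lands in Nebraska under a treaty."
filtered = remove_stopwords(sample)
print(filtered)
```

With the tribal and place names filtered out, the words that remain are the ones that describe what a case was about, which is the effect the stoplist is meant to have on the topics.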

## Typical ICC Land Case

The images below show a word cluster for each topic; larger words are more heavily weighted in the topic. The graph shows the relative impact of the topic in each decision over time. The topics below track what Rosenthal describes as a prototypical ICC case.^{6}
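One way to read a topic's impact over time is to average its share of each decision by year. The sketch below assumes that approach for illustration only; the years, the shares, and the function name `mean_share_by_year` are hypothetical, and the actual graphs may have been built differently.

```python
from collections import defaultdict

# Hypothetical rows: (year of the decision, share of that decision
# assigned to a single topic by the topic model).
doc_topic_shares = [
    (1950, 0.10), (1950, 0.30),
    (1960, 0.40),
    (1970, 0.05), (1970, 0.15),
]


def mean_share_by_year(rows):
    """Average the topic share of all decisions issued in the same year."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, share in rows:
        totals[year] += share
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


yearly = mean_share_by_year(doc_topic_shares)
for year, share in yearly.items():
    print(year, round(share, 3))
```

Plotting these yearly means against the year gives a curve whose peaks show when a topic mattered most in the decisions.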

### Expected Behaviors

The three phases match up with their topics and occur approximately when we would expect them to over the ICC's lifespan. The model therefore passes a basic logic test and behaves as expected. This baseline provides some confidence that the other findings are meaningful and not randomly generated.

A topic centered on expert witnesses behaved much as the prevailing historiography describes. Unsurprisingly, the frequency with which lawyers got paid also increased over the course of the proceedings.

## Looking at the Decisions from a Distance

Many of the topics line up with what the historiography would lead us to expect. But, as Ben Schmidt has noted, topics that are analyzed over time on their own should be given additional scrutiny. A cluster dendrogram is an excellent way to review how the topics relate to one another: the algorithm determines which topics are most like each other. To employ the oft-used puzzle analogy, the topics represent the decisions the way puzzle pieces represent a whole picture. If I’d sliced the decisions into twice as many topics, the topics would look very different than they do now, but they would still line up to form the same whole. If I reduced the number of topics, aspects of each would fold into the others, and they would likely fold in where the dendrogram indicates.
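The idea that topics "fold in" where the dendrogram indicates can be shown with a small agglomerative clustering sketch. The distance measure and linkage method behind the project's dendrogram are not stated here, so the Euclidean distance and average linkage below are assumptions, and the topic labels and numbers are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical topics: each label maps to that topic's share in four decisions.
topics = {
    "lands purchase": [0.30, 0.25, 0.05, 0.02],
    "treaty ceded":   [0.28, 0.27, 0.06, 0.03],
    "expert witness": [0.02, 0.05, 0.30, 0.20],
    "settlement":     [0.03, 0.04, 0.25, 0.28],
}


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2):
    """Average linkage: mean distance over all cross-cluster topic pairs."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(euclidean(topics[a], topics[b]) for a, b in pairs) / len(pairs)


def agglomerate(labels):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


merges = agglomerate(list(topics))
for left, right in merges:
    print(left, "+", right)
```

Each merge corresponds to one join in a dendrogram. The earliest merges pair the most similar topics, which are the ones that would be expected to collapse into each other first if the model were rerun with fewer topics.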

### Cluster Dendrogram

So “lands approximately early found purchase” should fold into “treaty lands land ceded consideration” (lower-right of the dendrogram). This certainly makes intuitive sense, but do all the topics fit together logically?

They more or less do, though there are a few odd connections. The nearest neighbor to the previously mentioned “expert witness” topic is “proposed settlement” (left third of the dendrogram). Both are legally driven topics, but they would usually come at different phases of a case. If these topics were correlated, that might imply some interesting legal strategies: use an expert witness, then offer to settle. Or it could be that cases which relied on expert witnesses were more likely to settle because the underlying facts were clearly established.

The dendrogram proved useful for exploring possible connections that my reading and the prevailing historiography haven’t picked up on. There is still work to be done to analyze how the decisions work in conversation with the proposed topics. You can review the dendrogram at the right to see all of the topics and how they relate to one another.

But one topic caught my interest because of when it peaked chronologically in the Commission’s term.

### Notes

5. Matthew Jockers, “Secret Recipe for Topic Modeling Themes,” http://www.matthewjockers.net/2013/04/12/secret-recipe-for-topic-modeling-themes/, 4/12/2013.

6. Rosenthal, *Their Day in Court*, 161-164.