Much of the work on the Networking Archives project has used the metadata (people, dates, places) of correspondence rather than the content itself. Here we investigate applying text mining techniques to the printed summaries of that correspondence.
Most of the quantitative research on the Networking Archives project into seventeenth-century intelligencing has used the metadata from the digitised correspondence in State Papers Online. Metadata in this sense means everything except the content of the letters: author names, recipient names, dates, places of sending and so on. Gale State Papers Online brings together a number of historical primary sources: not only the manuscript images from the State Papers, but also full-text versions of the ‘Calendars of State Papers’, a set of printed finding aids mostly produced in the nineteenth century. These printed summaries represent another huge store of data available to us, which we can also draw on in our analysis.
As anyone who has worked with the calendars will tell you, they were produced to very different standards, and as a result they interpret the documents they represent in very different ways. They tend to suffer from an identity crisis, never quite sure whether they should be purely a manuscript finding aid or a more useful description. In addition, because there’s no inherent logic behind the inconsistencies other than changing editorial policies, it’s hard to get a sense of exactly how they are inconsistent. Data analysis can help here: by analysing the entire dataset at scale, we can understand how the shape of the printed calendars changes by time, topic, and office.
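To give a rough idea of what analysing the calendars "at scale" might look like, here is a minimal sketch in Python. It assumes each calendar entry has already been reduced to a record with a year, an office and the text of the printed summary. The field names, the `median_summary_length` helper and the toy records are all invented for illustration and are not the project's actual data or code. The sketch measures the median word count of the summaries per decade and office, one simple proxy for how full a calendar's descriptions are.

```python
from collections import defaultdict
from statistics import median

# Toy records, invented purely for illustration (NOT real calendar entries).
# The keys "year", "office" and "summary" are assumed field names.
entries = [
    {"year": 1603, "office": "Domestic", "summary": "Warrant for payment."},
    {"year": 1607, "office": "Domestic", "summary": "Grant of an office."},
    {"year": 1655, "office": "Domestic",
     "summary": "Petition of the merchants, describing at length their "
                "losses at sea and asking for relief."},
    {"year": 1658, "office": "Foreign", "summary": "News from the court."},
]


def decade(year):
    """Round a year down to the start of its decade, e.g. 1607 -> 1600."""
    return year - year % 10


def median_summary_length(records):
    """Median word count of the summaries, grouped by (decade, office)."""
    lengths = defaultdict(list)
    for record in records:
        key = (decade(record["year"]), record["office"])
        # Whitespace tokenisation is crude, but enough for a length proxy.
        lengths[key].append(len(record["summary"].split()))
    return {key: median(counts) for key, counts in sorted(lengths.items())}


result = median_summary_length(entries)
for (start, office), words in result.items():
    print(f"{start}s, {office}: median {words} words per summary")
```

Summary length is only one possible measure. The same grouping could be reused for other features of the text, such as vocabulary or the presence of direct quotation, to compare editorial practice across time and office.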
Read the full post here.