
Introductory Thoughts
The content of this post relates to the digital collection “Born in Slavery: Slave Narratives from the Federal Writers’ Project, 1936 to 1938.” Information about the collection can be found here, on the Library of Congress website. In essence, this was an oral history project produced by the Federal Writers’ Project (FWP) of the Works Progress Administration during the New Deal era. The project set out to collect narratives from former slaves detailing their time and experiences working, living, and, in some cases, escaping from plantations and bondage.
This post uses this digital collection of manuscripts and interview notes to explore text mining and text visualization with the help of Voyant Tools. The post contains two major sections: the first discusses what this digital text tool allows one to discover, and the second provides a general guide for a new user looking to understand and interact with the basic text-mining functions and data sets.
I: Discovery
Overview:
- This first section contains two subsections: text analysis and text visualization. As stated earlier, the texts that have been analyzed and computationally run through Voyant are the slave narratives from the Library of Congress digital archive collection.
Subsection 1: Text Analysis
- One of the two major functions of Voyant is text analysis, or text mining. By definition, this is the process by which computer software “reads” a typed or legibly handwritten document in order to create statistical data on word usage, verb usage, context, and more. The software that first makes a scanned document machine-readable is called Optical Character Recognition, or OCR; it is, essentially, a computer algorithm designed to recognize letters and words. In many ways, one is pulling apart (or micro-analyzing) the document down to the word usage and verbiage itself. Once a document is machine-readable, statistical data can be created. This data can take the form of how many times a word is used in a document, how many times a phrase is used, or how word usage changes between documents when looking at a larger digital collection, or corpus.
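- Voyant performs these counts internally, but the underlying idea is simple. As a rough, minimal analogue, the Python sketch below tallies word frequencies in a single plain-text file (the file name “narrative.txt” is a placeholder, not part of the collection):

```python
# A minimal sketch of the kind of word-frequency statistics Voyant produces.
# The file name "narrative.txt" is a placeholder for any plain-text document.
import re
from collections import Counter

with open("narrative.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Tokenize on letters and apostrophes; real tools handle punctuation,
# stop words, and OCR errors far more carefully.
tokens = re.findall(r"[a-z']+", text)
frequencies = Counter(tokens)

# Print the ten most frequent words and their counts.
for word, count in frequencies.most_common(10):
    print(f"{word}\t{count}")
```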
Subsection 2: Text Visualization
- The second major aspect of text mining and, more importantly, of the Voyant tool is text visualization. This is just what it sounds like: creating visual representations of the data set. Voyant and other text mining tools will often provide more than just a list of word frequencies; they will provide various charts (line, bar, pie, etc.) as visual representations of the data. For example, if three essays are run through Voyant, the software will not only create a list of frequent words for each document, it will also provide charts showing how the documents compare to one another.
- Simply put, this text tool allows quantitative evidence to be produced from qualitative sources. It is one of many tools used by scholars in the digital humanities. What these scholars are able to do, therefore, is come to statistical, computationally driven conclusions about language usage, cultural slang terms, and even regional differences in language. In other words, quantitative methods applied to qualitative sources allow for new conclusions (a sketch of this kind of comparison follows below).
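- As an illustration of the three-essay example above, here is a hedged sketch that charts one term’s frequency across several documents. The file names and the term “freedom” are placeholders, and matplotlib is assumed to be installed:

```python
# A sketch of text visualization: comparing one term's frequency
# across three documents with a simple bar chart.
import re
from collections import Counter
import matplotlib.pyplot as plt

files = ["essay1.txt", "essay2.txt", "essay3.txt"]  # placeholder names
term = "freedom"  # any word of interest

counts = []
for path in files:
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    counts.append(Counter(tokens)[term])

plt.bar(files, counts)
plt.ylabel(f'Occurrences of "{term}"')
plt.title("Term frequency across documents")
plt.show()
```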
- In the end, one of the best ways to understand is through practical examples. Therefore, I direct my readers to section two, “A User’s Guide.”
II: A User’s Guide

Overview:
- This section will give a brief overview of the Voyant tool and how data is presented for a given corpus of work; in this case, it is the slave narrative collection from the Library of Congress. The section has five subsections: Cirrus, Reader, Trends, Summary, and Context. First, however, one must import a set of documents (or a single document) by uploading or copying/pasting into the Voyant homepage (a picture of the homepage can be seen above). Once the document(s) are uploaded, press “Reveal” and Voyant will automatically create statistical data for the five subsections (Cirrus, Reader, Trends, Summary, and Context).
- Cirrus:
- Essentially, this is a word cloud of all the high-frequency words across the whole corpus, that is, across all of the documents. As with other word clouds, the larger and bolder the word, the higher its frequency. There are two pairs of controls in this section. The first pair, “scale” and “terms,” sits in the lower left-hand corner. The “scale” function allows a user to see a word cloud for only the documents chosen; simply put, one may view a word cloud for one, two, three, or all of the documents. The “terms” function controls how many terms are included: by default, only the top 55 terms of the corpus are displayed and calculated into the word cloud. Slide the “terms” bar to the right to include more terms.
- The second pair of controls, “terms” and “links,” sits at the top left. The “terms” function creates a standardized list of top-frequency terms, while the “links” function creates a word web showing the top-frequency terms and their surrounding context words.
- I have attached an HTML-encoded link to the word cloud below for one to play with and explore.
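- For readers who want to reproduce something similar outside Voyant, here is a minimal sketch using the third-party Python “wordcloud” package (an assumption; install it with pip install wordcloud). The file name is a placeholder for the corpus text:

```python
# A rough stand-in for Voyant's Cirrus panel, built with the third-party
# "wordcloud" package (assumed installed: pip install wordcloud).
import matplotlib.pyplot as plt
from wordcloud import WordCloud

with open("narratives_corpus.txt", encoding="utf-8") as f:  # placeholder file
    text = f.read()

# max_words mirrors Cirrus's "terms" slider; raising it adds more terms.
cloud = WordCloud(max_words=55, background_color="white").generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```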
- Reader:
- The “Reader” section provides term frequency and visual data. Along the x-axis of the section, there is a color-coded chart displaying the comparative size of each document. Furthermore, one is able to pick any word, and Voyant displays how frequently that word is used in each document. In other words, this section allows one to read the scanned (OCR’ed) documents while keeping an active interface for text mining and text visualization. In this corpus, the documents are headed according to the state in which the former slave originated.
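- Because the documents differ in size, a useful per-document statistic is relative frequency rather than a raw count. This hedged sketch (the state names and file names are placeholders) normalizes a term’s count by document length:

```python
# A rough analogue of a per-document statistic: relative frequency of a
# chosen term, normalized by document length.
import re
from collections import Counter

state_files = {"Alabama": "alabama.txt", "Georgia": "georgia.txt"}  # placeholders
term = "white"

for state, path in state_files.items():
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    rate = Counter(tokens)[term] / max(1, len(tokens)) * 1000
    print(f"{state}: {rate:.2f} occurrences per 1,000 words")
```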
- Trends:
- This section shows line graphs of word frequency across the corpus or within a single document. For example, choose a word from the word cloud; in this instance the word “white” was chosen. The “trends” chart adjusts accordingly. Furthermore, to see individual documents, double-click on a data point or dot. This gives the user an option of either “term” or “document”: the former presents a line graph showing the distribution of that term across the entire corpus, while the latter presents a line graph showing the distribution of that term within a specific state/document.
- When finished, press “Reset” to clear the chart. If one wishes to choose a new word, one may search for it using the search bar in the lower left-hand corner.
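- To make the idea concrete, this hedged sketch plots one term’s frequency across equal segments of a single document, roughly what the “document” view of Trends displays (the file name and term are placeholders; matplotlib is assumed):

```python
# A sketch of a "Trends"-style line graph: the frequency of one term
# across equal segments of a single document.
import re
import matplotlib.pyplot as plt

term, segments = "white", 10

with open("alabama.txt", encoding="utf-8") as f:  # placeholder file
    tokens = re.findall(r"[a-z']+", f.read().lower())

# Split the token stream into equal segments and count the term in each.
size = max(1, len(tokens) // segments)
counts = [tokens[i * size:(i + 1) * size].count(term) for i in range(segments)]

plt.plot(range(1, segments + 1), counts, marker="o")
plt.xlabel("Document segment")
plt.ylabel(f'Occurrences of "{term}"')
plt.show()
```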
- Summary:
- Simply put, this section contains a great deal of textual data regarding the corpus. It displays data relating to document length, vocabulary density, and distinctive word comparisons. The section allows a user to choose a word that is highlighted as distinctive and focus on just that word. For this example, one may choose a distinctive word from any state, and that word will then be calculated into the other sections. In the “reader” section, all documents containing that word are presented, and one is able to read through the documents for context. However, the “context” section provides the most detailed contextual data; see the next section.
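- Two of the Summary statistics are easy to compute by hand. This hedged sketch (placeholder file names) reports document length in tokens and vocabulary density, that is, unique words divided by total words:

```python
# A sketch of two "Summary"-style statistics: document length (tokens)
# and vocabulary density (unique words divided by total words).
import re

def summary_stats(path):
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    density = len(set(tokens)) / len(tokens) if tokens else 0.0
    return len(tokens), density

for path in ["alabama.txt", "georgia.txt"]:  # placeholder file names
    length, density = summary_stats(path)
    print(f"{path}: {length} words, vocabulary density {density:.3f}")
```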
- Context:
- This final section displays the context of each word. One is able to choose a distinctive word and then read the surrounding context of that word. Press the small + symbol to read the surrounding material. Use the lower-left toggle labeled “context” to add words to either the left or right side of the keyword, and use the “expand” button to see more of the paragraph.
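- This panel is a classic keyword-in-context (KWIC) display. As a rough analogue, the sketch below prints each occurrence of a term with a fixed window of words on either side (the file name, term, and window size are placeholders):

```python
# A minimal keyword-in-context (KWIC) sketch: print each hit of a term
# with a window of words on either side, echoing the "Context" panel.
import re

def kwic(path, term, window=5):
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    for i, token in enumerate(tokens):
        if token == term:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            print(f"{left} [{term}] {right}")

kwic("alabama.txt", "freedom")  # placeholder file and term
```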