Text Analysis Tools & Resources

Much of the following comes from Northeastern’s Lab resources and Miriam Posner’s crowdsourced “Corpora of Interest” document.

Getting Started

Python

R (Programming Language)

Topic Modeling

Word Embedding Models

Datasets

Plain Text

TEI-Encoded

    • U.S. Presidents’ Inaugural Speeches
    • Abraham Lincoln Speeches and Letters
    • Documenting the American South
      • The Church in the Black Community
      • First-Person Narratives of the American South (African Americans, women, enlisted men, Native Americans, ex-slaves, etc.)
      • North American Slave Narratives
    • Sunday School Books in 19th Century America
    • The Grange Visitor (Michigan newspaper)
    • Historic American Cookbooks
    • Adult British Fiction – 1880s (by gender)
    • Children’s Fiction – 1880s (by gender) (I have formatted some of these data; ask me)
    • William Wordsworth writings
    • Book summaries and film summaries from Wikipedia
    • U.S. patents related to the humanities
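Most text-analysis tools expect plain text, so TEI-encoded collections like these usually need their markup stripped first. Below is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. It assumes TEI P5 files that declare the standard TEI namespace, and it falls back to un-namespaced (older TEI P4-style) markup. The function name tei_to_plain_text is hypothetical and does not come from any of the collections above.

```python
import xml.etree.ElementTree as ET

# Assumption: TEI P5 files declare this default namespace.
# Older TEI (P4-style) exports may use no namespace at all.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def tei_to_plain_text(xml_string: str) -> str:
    """Hypothetical helper: return the whitespace-normalized text of <body>.

    The <teiHeader> (bibliographic metadata) is skipped because only
    the <body> element is read.
    """
    root = ET.fromstring(xml_string)
    # Look for a namespaced <body> first, then an un-namespaced one.
    body = root.find(f".//{{{TEI_NS}}}body")
    if body is None:
        body = root.find(".//body")
    if body is None:
        return ""
    # itertext() yields every text node and tail beneath <body>.
    # Joining with spaces and re-splitting collapses extra whitespace.
    words = " ".join(body.itertext()).split()
    return " ".join(words)


if __name__ == "__main__":
    sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>Four score and <hi>seven</hi> years ago</p></body></text>
</TEI>"""
    print(tei_to_plain_text(sample))  # Four score and seven years ago
```

Joining text nodes with a space keeps words in neighboring elements from running together. The trade-off is that a single word whose letters sit inside separate inline tags would be split, so check a few documents from each collection before running a full analysis.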

Text in a Variety of Formats
