Achim Stein (University of Stuttgart) & Carola Trips (University of Mannheim): Working with the Penn Parsed Corpora of Historical English and French

https://www.ling.uni-stuttgart.de/institut/team/Stein-00011/

https://www.phil.uni-mannheim.de/anglistik/abteilungen/anglistik-iv/team/prof-dr-carola-trips/

 

 

ECLASS: 

> Participants will receive a link to the e-class and Dropbox.

 

 

 

 

This course provides an introduction to working with linguistically annotated corpora, and more specifically with the Penn Parsed Corpora of Historical English and French. The methods are applicable to any other corpus annotated in the Penn format. At the end of the course, participants will know the main types of corpora and how to use them for research. They will be able to work with the syntactic annotation of the Penn corpora and to query them with CorpusSearch. They will know how to add further information to the corpora, e.g. lemmas using the BASICS Toolkit, or syntactic features using the coding function of CorpusSearch. They will be able to use the results for quantitative studies (with Excel filters or with statistical software such as R). The course will be highly interactive, with many hands-on exercises following the input sessions that introduce the topics mentioned above. Although we will start from scratch, it would be helpful if participants had a look at the resources listed below beforehand. They should also bring a computer with spreadsheet software (Excel, OpenCalc etc.) and the downloaded French historical corpus (available free of charge at https://github.com/beatrice57/mcvf-plus-ppchf).
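For participants who want a first impression of what "annotated in the Penn format" means, the following is a minimal Python sketch that reads one bracketed token and checks an immediate-dominance relation, roughly the kind of structural condition that a CorpusSearch query expresses. It is only an illustration under stated assumptions: the sample token is invented (not taken from any corpus), the function names `parse`, `leaves` and `immediately_dominates` are hypothetical, and the code does not reproduce how CorpusSearch itself works.

```python
import re
from fnmatch import fnmatchcase

# Invented token in Penn-style bracketing (an assumption for illustration,
# not a sentence from any of the corpora).
SAMPLE = "( (IP-MAT (NP-SBJ (PRO he)) (VBD came) (. .)) (ID EXAMPLE,1.1))"


def tokenize(text):
    """Split bracketed text into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(text):
    """Parse one bracketed token into nested (label, children) tuples.

    Words are kept as plain strings inside the children list.
    """
    tokens = tokenize(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = ""  # the outermost wrapper has no label
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return node()


def leaves(tree):
    """Yield (tag, word) pairs in surface order."""
    label, children = tree
    for child in children:
        if isinstance(child, str):
            yield (label, child)
        else:
            yield from leaves(child)


def subtrees(tree):
    """Yield the tree itself and every node below it."""
    yield tree
    for child in tree[1]:
        if not isinstance(child, str):
            yield from subtrees(child)


def immediately_dominates(tree, parent_pattern, child_pattern):
    """True if some node matching parent_pattern has a daughter matching child_pattern."""
    return any(
        fnmatchcase(label, parent_pattern)
        and any(
            not isinstance(child, str) and fnmatchcase(child[0], child_pattern)
            for child in children
        )
        for label, children in subtrees(tree)
    )


tree = parse(SAMPLE)
print(list(leaves(tree)))
print(immediately_dominates(tree, "IP*", "NP-SBJ"))
```

The wildcard pattern `IP*` is matched here with Python's `fnmatchcase`; this only imitates label wildcards and is not the query syntax taught in the course.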

 

 

Monday, July 24

General intro: What are digital corpora?

 

Which types exist and what can they be used for?

 

How can corpora be queried?

 

Anglistik Toolbox http://anglistik-toolbox.uni-mannheim.de/

 

 

Tuesday, July 25

Introduction to the Penn corpora

 

Structure, syntactic annotation, lemmatization, query language

 

CorpusSearch

 

https://www.ling.upenn.edu/hist-corpora/

 

https://corpussearch.sourceforge.net/

 

http://basics-toolkit.spdns.org/

 

 

Wednesday, July 26

Working with the Penn Corpora of Middle English

 

Investigating one phenomenon, using your knowledge from the previous session

Thursday, July 27

Coding Penn corpora

 

Using CorpusSearch queries from previous sessions and corpus coding, analysing coded output in spreadsheets
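As a rough idea of what analysing coded output can look like outside a spreadsheet, here is a small Python sketch that counts combinations of coding values and writes them as CSV, which Excel or R can then open. It rests on assumptions: the coding strings are taken to be available one per line with colon-separated fields, and the column names (`order`, `period`) and values (`ov`, `vo`, `m1`, `m3`) are invented for illustration rather than taken from real course data.

```python
import csv
import io
from collections import Counter

# Hypothetical coding strings, one per token, fields separated by ':'.
coding_lines = """\
vo:m1
ov:m1
vo:m3
vo:m3
ov:m1
""".splitlines()

# Hypothetical column names, one per coding field.
COLUMNS = ["order", "period"]


def tabulate(lines, columns):
    """Count how often each combination of coding values occurs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        values = tuple(line.split(":"))
        if len(values) != len(columns):
            raise ValueError(f"unexpected number of fields in {line!r}")
        counts[values] += 1
    return counts


def to_csv(counts, columns):
    """Render the counts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns + ["count"])
    for values, n in sorted(counts.items()):
        writer.writerow(list(values) + [n])
    return buffer.getvalue()


counts = tabulate(coding_lines, COLUMNS)
print(to_csv(counts, COLUMNS))
```

The resulting table has one row per value combination, so it can be filtered or pivoted in a spreadsheet in the same way as manually prepared data.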

 

 

Friday, July 28

Doing quantitative studies with your knowledge

 

Investigating one phenomenon, using your knowledge from the previous session

 

 

Resources:

 

http://anglistik-toolbox.uni-mannheim.de/

 

http://basics-toolkit.spdns.org/

 

https://www.english-corpora.org/

 

https://www.ling.upenn.edu/hist-corpora/

 

https://catalog.ldc.upenn.edu/LDC2020T16

 

https://quod.lib.umich.edu/m/middle-english-dictionary/dictionary

 

https://github.com/beatrice57/mcvf-plus-ppchf