TLCMap Recogito Get Started Guide

Pelagios Recogito has a 10 Minute Tutorial, but here are a few quick basics, along with information about the new TLCMap features for handling corpora (in folders), adding places not already in the gazetteer, time attributes, and more export options for use in other systems.

TLCMap adds functionality and enhancements to Recogito that we see as essential. This is done on our own Recogito setup, separate from the main one. We anticipate that the two will be merged back together in time, without losing your data. In the meantime, use the TLCMap instance if you want to:

Key terms

Basics

This step-by-step guide describes a basic, common task from beginning to end: identifying places in a text using Recogito.

Recogito is open source software produced by Pelagios. The TLCMap development team is working on a separate version to add new features. These will all be incorporated back into the main version when the project concludes. Your work will be retained.

Create Plain Text File

Add File To Recogito

Auto Detect Place Names

Correct Placenames in Text

View Map

Backup And Archive Work

New TLCMap Features - Corpora, Additional Places, Export Options

To Create A Corpus

Files in a folder can be processed as a corpus. This is useful where you have a large number of small texts, such as 100 newspaper articles, that you want to process all at once and regard as a whole.

  1. Log in
  2. Click the blue '+New' button to create a folder. (Eg: I have each chapter of Watkin Tench as a separate text file so that I can attach a date to each chapter and see patterns of change in the places mentioned, chapter by chapter, over time. I want to treat them all as a corpus, so I call my new folder 'WatkinTench'.)
  3. Make sure you are in the folder by checking the breadcrumbs at the top of the page (eg: 'My Documents > WatkinTench'), then drag and drop your text files onto the page.
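If your source is a single long text, splitting it into one file per chapter before upload can be scripted. The following is a minimal sketch only, and it is not part of Recogito or TLCMap. It assumes each chapter begins with a line starting with 'CHAPTER' followed by a number or numeral. The function names, the heading pattern and the output file names are all hypothetical choices for illustration.

```python
import re
from pathlib import Path


def split_into_chapters(text, heading_pattern=r"^CHAPTER[ \t]+\S+.*$"):
    """Split text into chunks, each starting at a line matching heading_pattern.

    Assumption: chapters are introduced by a heading line such as 'CHAPTER I'.
    Anything before the first heading (title page, preface) is dropped.
    """
    heading = re.compile(heading_pattern, flags=re.MULTILINE)
    starts = [match.start() for match in heading.finditer(text)]
    if not starts:
        # No headings found: treat the whole text as a single chunk.
        return [text]
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() + "\n" for start, end in zip(starts, ends)]


def write_chapters(chapters, out_dir, prefix="chapter"):
    """Write each chunk to its own UTF-8 text file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chapter in enumerate(chapters, start=1):
        path = out / f"{prefix}_{number:02d}.txt"
        path.write_text(chapter, encoding="utf-8")
        paths.append(path)
    return paths
```

The resulting text files can then be dragged into the folder as described in step 3. Check the heading pattern against your own text first, since headings vary between editions.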

Add Metadata To Files

Document metadata will be exported in your results. This can be useful for further analysis or display in other systems.

  1. To edit the metadata for a file, double click the file if it is not already open, so that you can see the text.
  2. Click the spanner and screwdriver icon at the top to see the 'Document Settings', edit and save.
  3. If you have a large corpus, such as 100 newspaper articles with dates and publication information, you can upload a CSV spreadsheet of metadata instead. You might have this information in Excel already, or find it faster to do the data entry in Excel (in Excel, simply 'Save As' and choose CSV as the file type).
  4. Go to the folder that holds your corpus and click 'Download CSV metadata'. Now you have a spreadsheet you can open in Excel in the right format to fill out.
  5. Fill in the data, perhaps copying and pasting from some other source.
  6. If you are not sure of valid values for some columns such as 'Licence', set it manually in the Document Settings and export, then copy that value.
  7. For StartDate and EndDate two formats are allowed: yyyy-mm-dd (eg: 1867-03-27) or dd/mm/yyyy (eg: 27/03/1867). This is because Excel sometimes auto 'corrects' the format. TIP: always double check that Excel is not converting your dates to US format, which might treat dates such as January the 4th as April the 1st.
  8. Click the button to upload the CSV.
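Before uploading, it can help to check the date columns with a short script. The following is a minimal sketch, not a TLCMap tool. It accepts the two patterns shown in the examples above (1867-03-27 and 27/03/1867) and reports any other value. The function names and the 'Title' column in the usage example are hypothetical, and the real CSV may contain other columns.

```python
import csv
import io
from datetime import datetime

# The two patterns shown in the guide's examples: 1867-03-27 and 27/03/1867.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Return a date if value matches an accepted format, otherwise None."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def check_dates(csv_text, columns=("StartDate", "EndDate")):
    """Return (row_number, column, value) for each non-empty date that fails to parse."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so the first data row is row 2 (as Excel shows it).
    for row_number, row in enumerate(reader, start=2):
        for column in columns:
            value = (row.get(column) or "").strip()
            if value and parse_date(value) is None:
                problems.append((row_number, column, value))
    return problems
```

For example, `check_dates("Title,StartDate,EndDate\nA,1867-03-27,27/03/1867\nB,03/27/1867,\n")` flags the US-style value in row 3. A check like this cannot catch ambiguous dates such as 01/04/1867, which is valid in both conventions, so a visual check in Excel is still worthwhile.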

Identify Places

Note: Named Entity Recognition (NER) will be performed on every file you have selected. Keep in mind that this may overwrite any manual changes you have made. The first time, you probably want to process the whole corpus, but if you add another file later, you probably want to select only that file.

  1. To select all, select the first file, hold shift and select the last file.
  2. Under the blue options menu, choose 'Named Entity Recognition'.
  3. You can choose the language the text is in, and the gazetteers that will be used to resolve placenames. You can simply use them all, but you may want to be specific. Eg: if you are not working on ancient Europe, you might improve the accuracy of place name matching by excluding those gazetteers (ie: 'Roma' will have a better chance of matching the town in Queensland if you choose the Australian gazetteer and exclude gazetteers of ancient Europe). Note that you can include the 'gazetteer' of 'User Contributions'. These are places added to Recogito by users because they were not already in a gazetteer. They are not from an official or vetted gazetteer - use of them is your choice.
  4. Click "Start NER". For a small text file of a few paragraphs this will be immediate. For large files, such as a whole book, this could take a few minutes.
  5. When complete, the titles of files that have had NER done are highlighted in orange.

Check and Correct

  1. Double click the file to open it.
  2. Places automatically identified by NER are highlighted in grey.
  3. You can click on them to confirm or change the identified place.
  4. Click 'Change' to choose from possible alternatives found in the gazetteers. You will be prompted to apply the change to every occurrence of that place in the text, or to this instance only.
  5. If the place is not in any gazetteer, click 'Create place' at the top of the window. This will ask you for some details about the place. The minimum is the name and coordinates.
  6. When confirmed, a place is highlighted in green.
  7. If a place has not been identified, select the word or phrase and click 'Place'. The options to confirm, change, or create a place are as above.

View

  1. You can view the map and annotations in two ways.
  2. The second icon at the top of the page opens the map. There you can see the annotations with some surrounding text, and click through to the text.
  3. The third icon at the top is the TLCMap addition allowing you to see the map and text side by side, for the whole corpus.
  4. Click a place in the text to go to that place on the map.
  5. Click the annotation in the map pop-up to go to that location in the text.

Export Results

Data can be exported in standard formats for further analysis, visualisation or archiving in other systems.

  1. Open a file. (If you want to export a corpus, just open a file within that folder.)
  2. Click the 'Download Options' icon at the top of the page.
  3. TLCMap has added options so data can be exported with some of the following variations:
    • in the format KML, CSV or GeoJSON
    • at the level of file or corpus
    • listing by 'place' or by 'annotation' (in some cases each annotation is listed under its place)
  4. If you need to convert the file from one format to another, there are GIS file converters on the web.
Import the file into another system, such as:
  1. Another GIS for further work.
  2. Excel for further analysis or working with other data.
  3. Ordinal Time to visualise how places change by order of occurrence in the text.
  4. Quick Coordinates to transform the data into a journey.
  5. Temporal Earth to view the data over time, or journeys (journey format exported from Quick Coordinates).
  6. STMetrics to compare basic statistical information, and identify clusters in space and time.
  7. GHAP to add any place names identified, or at least these attestations, so that others can find them.
  8. Heurist to build a database around it.
  9. ROCrate and Describo to archive it with your research.
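If you want to inspect a GeoJSON export yourself, or flatten it into a simple table for Excel, a short script can do it. The following is a minimal sketch that relies only on the standard GeoJSON structure: a FeatureCollection whose Point features store coordinates as longitude, then latitude. The property name 'title' and the function names are hypothetical. Open your own export to see which properties it actually contains.

```python
import csv
import io
import json


def geojson_points_to_rows(geojson_text):
    """Return (longitude, latitude, properties) for each Point feature."""
    data = json.loads(geojson_text)
    rows = []
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            # Skip features without a point location (eg: unresolved places).
            continue
        # GeoJSON stores coordinates as [longitude, latitude].
        lon, lat = geometry["coordinates"][:2]
        rows.append((lon, lat, feature.get("properties") or {}))
    return rows


def rows_to_csv(rows, name_key="title"):
    """Write rows as CSV text; name_key is an assumed property name."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "latitude", "longitude"])
    for lon, lat, props in rows:
        writer.writerow([props.get(name_key, ""), lat, lon])
    return out.getvalue()
```

To use it, read the exported file into a string, pass it to `geojson_points_to_rows`, and save the output of `rows_to_csv` with a '.csv' extension. Note that the latitude and longitude columns are swapped relative to the GeoJSON order, since many spreadsheet users expect latitude first.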