
Text Pipeline

The text processing pipeline consists of the following steps:

  1. Annotators
  2. Geocoder

Annotators

Annotators label batches of pre-processed multilingual texts with an array of probabilities, using binary models trained on texts related to natural disasters.
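As a rough illustration of batch annotation, the sketch below assumes that each text receives one probability per binary model. The `keyword_model` helper and the annotator names `floods` and `fires` are hypothetical stand-ins for trained classifiers, not part of the pipeline itself.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a trained binary model: text -> probability in [0, 1].
BinaryModel = Callable[[str], float]


def keyword_model(keywords: Sequence[str]) -> BinaryModel:
    """Toy model: the score is the fraction of keywords found in the text."""
    def predict(text: str) -> float:
        lowered = text.casefold()
        hits = sum(1 for kw in keywords if kw in lowered)
        return hits / len(keywords)
    return predict


def annotate_batch(texts: Sequence[str],
                   models: Dict[str, BinaryModel]) -> List[List[float]]:
    """Return one probability array per text, ordered like the models dict."""
    names = list(models)
    return [[models[name](text) for name in names] for text in texts]


# Assumed annotator names, for illustration only.
models = {
    "floods": keyword_model(["flood", "river"]),
    "fires": keyword_model(["fire", "smoke"]),
}
batch = ["River flood warning", "Nice weather"]
probabilities = annotate_batch(batch, models)
```

Keeping the models in a dictionary means the position of each probability in the array can always be traced back to an annotator name.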

Feature Extraction operations:

  • extract required
  • normalize URLs
  • normalize hashtags
  • remove punctuation
  • normalize white spaces
  • remove new lines
  • remove dates
  • merge neighbouring word duplicates
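The operations above can be sketched as a single normalization function. This is a minimal sketch under stated assumptions: URLs are replaced by a placeholder token `URL`, hashtags are reduced to their word, and only numeric dates are removed. The source does not specify these details, and the "extract required" step is left out because it is not described further.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Assumption: only numeric dates such as 2021-03-15 or 15/03/2021 are handled.
DATE_RE = re.compile(r"\b\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}\b")
# Anything that is neither a word character nor whitespace (Unicode-aware).
PUNCT_RE = re.compile(r"[^\w\s]")
WHITESPACE_RE = re.compile(r"\s+")
# A word followed by one or more repetitions of itself.
DUPLICATE_RE = re.compile(r"\b(\w+)(\s+\1\b)+", flags=re.IGNORECASE)


def normalize(text: str) -> str:
    text = URL_RE.sub(" URL ", text)         # normalize URLs (assumed placeholder)
    text = HASHTAG_RE.sub(r"\1", text)       # normalize hashtags: drop the '#'
    text = DATE_RE.sub(" ", text)            # remove dates
    text = PUNCT_RE.sub(" ", text)           # remove punctuation
    text = WHITESPACE_RE.sub(" ", text)      # remove new lines, normalize spaces
    text = DUPLICATE_RE.sub(r"\1", text)     # merge neighbouring word duplicates
    return text.strip()


sample = "Flood flood in Paris!!\nSee https://example.com #FloodAlert 2021-03-15"
cleaned = normalize(sample)
```

URLs are handled first so that punctuation and date removal do not break them apart, and duplicates are merged last, once whitespace is uniform.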

Available annotators:

Geocoder

The geocoder matches geographic locations, identified in the texts by the DeepPavlov Named Entity Recognizer (NER), against a given gazetteer.
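The matching step can be sketched as a lookup of NER location mentions in a gazetteer index. The sketch assumes the NER step has already produced a list of location strings and that the gazetteer maps place names to coordinates. The function names, the gazetteer layout, and the sample entries are illustrative only, and no DeepPavlov call is shown.

```python
from typing import Dict, List, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def build_index(gazetteer: Dict[str, Coordinates]) -> Dict[str, Tuple[str, Coordinates]]:
    """Index gazetteer entries by a case-insensitive, trimmed key."""
    return {name.strip().casefold(): (name, coords)
            for name, coords in gazetteer.items()}


def geocode(mentions: List[str],
            index: Dict[str, Tuple[str, Coordinates]]) -> List[Tuple[str, Coordinates]]:
    """Return gazetteer entries for the mentions that have an exact match."""
    matches = []
    for mention in mentions:
        hit = index.get(mention.strip().casefold())
        if hit is not None:
            matches.append(hit)
    return matches


# Illustrative gazetteer with approximate coordinates.
gazetteer = {"Paris": (48.8566, 2.3522), "Genova": (44.4056, 8.9463)}
index = build_index(gazetteer)

# Hypothetical NER output for one text.
located = geocode(["paris", "Atlantis"], index)
```

Mentions without a gazetteer entry are dropped in this sketch. A real matcher would probably also need to handle alternative names and ambiguous places.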