Text Pipeline
The text processing pipeline consists of the following steps:
Annotators
Annotates batches of pre-processed multilingual texts with an array of probabilities, using binary models trained on texts related to natural disasters.
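To make the idea of an "array of probabilities" concrete, here is a minimal sketch of batch annotation. It assumes each binary model can be treated as a callable that takes a batch of texts and returns one probability per text. The names `annotate_batch`, `keyword_model`, and the labels `flood` and `fire` are hypothetical and only serve the illustration; the real annotators are trained models, not keyword checks.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a binary model maps a batch of texts to one probability per text.
BinaryModel = Callable[[Sequence[str]], List[float]]


def annotate_batch(texts: Sequence[str], models: Dict[str, BinaryModel]) -> List[List[float]]:
    """Return one row per text, with one probability per model (labels sorted by name)."""
    labels = sorted(models)
    # Each model scores the whole batch at once, giving one column per label.
    columns = [models[label](texts) for label in labels]
    # Transpose columns into per-text rows.
    return [list(row) for row in zip(*columns)]


def keyword_model(keyword: str) -> BinaryModel:
    """Toy stand-in for a trained model: probability 1.0 if the keyword occurs, else 0.0."""
    def predict(batch: Sequence[str]) -> List[float]:
        return [1.0 if keyword in text.lower() else 0.0 for text in batch]
    return predict


models = {"flood": keyword_model("flood"), "fire": keyword_model("fire")}
batch = ["Flood warning issued for the river", "Forest fire near the village"]
annotations = annotate_batch(batch, models)  # columns follow sorted labels: fire, flood
```

Each row of `annotations` lines up with one input text, and each column with one binary model, so a text can score high for several disaster types at once.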
Feature Extraction operations:
- extract required
- normalize URLs
- normalize hashtags
- remove punctuation
- normalize white spaces
- remove new lines
- remove dates
- merge neighbouring word duplicates
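The normalization operations listed above could be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual implementation: "normalize URLs" is taken to mean replacing each URL with a placeholder token `URL`, "normalize hashtags" to mean dropping the `#` and keeping the word, and dates are assumed to appear in numeric forms such as `2021-03-15` or `15/03/2021`. The "extract required" step is left out because its meaning is not specified here. The function name `normalize_text` is hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Assumed numeric date formats only: 2021-03-15, 15/03/2021, 15.03.21
DATE_RE = re.compile(r"\b\d{4}-\d{1,2}-\d{1,2}\b|\b\d{1,2}[/.]\d{1,2}[/.]\d{2,4}\b")
# Unicode-aware: anything that is neither a word character nor whitespace.
PUNCT_RE = re.compile(r"[^\w\s]")
WHITESPACE_RE = re.compile(r"\s+")
# A word followed by one or more repetitions of itself (case-insensitive).
DUPLICATE_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+", flags=re.IGNORECASE)


def normalize_text(text: str) -> str:
    text = URL_RE.sub(" URL ", text)          # normalize URLs (assumed placeholder)
    text = HASHTAG_RE.sub(r"\1", text)        # normalize hashtags: "#Flood" -> "Flood"
    text = DATE_RE.sub(" ", text)             # remove dates before punctuation is stripped
    text = PUNCT_RE.sub(" ", text)            # remove punctuation
    text = text.replace("\r", " ").replace("\n", " ")  # remove new lines
    text = WHITESPACE_RE.sub(" ", text).strip()        # normalize white spaces
    text = DUPLICATE_RE.sub(r"\1", text)      # merge neighbouring word duplicates
    return text


sample = "Flood flood in Paris!!\nSee https://example.com/a?b=1 #FloodAlert 2021-03-15"
cleaned = normalize_text(sample)
```

The order matters in this sketch: URLs and dates are handled before punctuation removal, because stripping `/`, `.` and `-` first would leave fragments that the URL and date patterns no longer recognize.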
Available annotators:
Geocoder
Matches geographic locations, identified in the texts with the DeepPavlov Named Entity Recognizer (NER), against a given gazetteer.
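The gazetteer matching step can be illustrated with a small sketch. It assumes the location mentions have already been extracted by the NER model (that step is not shown) and that the gazetteer is a simple mapping from place name to coordinates. The names `build_index` and `match_locations`, the exact-match strategy, and the sample coordinates are all assumptions made for the illustration.

```python
import unicodedata
from typing import Dict, List, Sequence, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def _key(name: str) -> str:
    """Comparison key: accents stripped, case folded, whitespace collapsed."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def build_index(gazetteer: Dict[str, Coordinates]) -> Dict[str, Tuple[str, Coordinates]]:
    """Index gazetteer entries by their comparison key."""
    return {_key(name): (name, coords) for name, coords in gazetteer.items()}


def match_locations(
    mentions: Sequence[str], index: Dict[str, Tuple[str, Coordinates]]
) -> List[Tuple[str, Coordinates]]:
    """Return the gazetteer entry for each mention that has an exact key match."""
    matches = []
    for mention in mentions:
        hit = index.get(_key(mention))
        if hit is not None:
            matches.append(hit)
    return matches


# Hypothetical gazetteer with approximate coordinates.
gazetteer = {"Genova": (44.41, 8.93), "São Paulo": (-23.55, -46.63)}
index = build_index(gazetteer)
found = match_locations(["GENOVA", "sao  paulo", "Atlantis"], index)
```

Mentions without a gazetteer entry (here `Atlantis`) are simply dropped. A production geocoder would likely also need to handle alternate names and ambiguous places, which this sketch does not attempt.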