This module anonymizes sensitive information in input text using pre-trained named entity recognition (NER) models. It replaces detected entities (such as names or organizations) with a specified masking character.
Input
- Text – Accepts raw input text for anonymization.
- Classifications – Optional classifications from previous steps, if available, used to guide or filter anonymization.
Output
- Text – The anonymized version of the input text.
Configuration Fields
- Model – Select the NER model to use for anonymization.
- Options
- GLiNER Merged Large – Combines multiple datasets for broader detection.
- GLiNER Arabic – For Arabic-language text.
- GLiNER Biomed Large / Small – Optimized for biomedical text.
- GLiNER Large / Medium / Small – General-purpose models with different performance profiles.
- GLiNER Italian / Korean / Multi – Language-specific or multilingual models.
- Gretel GLiNER Bi-Large / Bi-Small – Business-focused anonymization models.
- Options
- Character to use for anonymization – Defines the character that will replace detected entities in the output.
- Example – █ or *.
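The two configuration fields can be pictured as a small validated settings object. This is a minimal sketch only: `AnonymizerConfig`, `MODEL_OPTIONS`, and the field names `model` and `mask_char` are hypothetical and are not the module's actual configuration schema. The option names are copied from the Model list above, and the single-character check is an assumption based on the field being described as one character.

```python
from dataclasses import dataclass

# Option names copied from the Model list; the tuple itself is hypothetical.
MODEL_OPTIONS = (
    "GLiNER Merged Large",
    "GLiNER Arabic",
    "GLiNER Biomed Large",
    "GLiNER Biomed Small",
    "GLiNER Large",
    "GLiNER Medium",
    "GLiNER Small",
    "GLiNER Italian",
    "GLiNER Korean",
    "GLiNER Multi",
    "Gretel GLiNER Bi-Large",
    "Gretel GLiNER Bi-Small",
)


@dataclass(frozen=True)
class AnonymizerConfig:
    """Hypothetical container for the two configuration fields."""

    model: str      # one of MODEL_OPTIONS
    mask_char: str  # character that replaces detected entities

    def __post_init__(self) -> None:
        # Reject model names that are not in the documented list.
        if self.model not in MODEL_OPTIONS:
            raise ValueError(f"unknown model option: {self.model!r}")
        # Assumption: the masking value is exactly one character.
        if len(self.mask_char) != 1:
            raise ValueError("mask_char must be exactly one character")


config = AnonymizerConfig(model="GLiNER Biomed Small", mask_char="*")
print(config)
```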
Example
- Input text – John Smith is a patient at St. Mary’s Hospital.
- Using GLiNER Merged Large
- With anonymization character █
- The output may be – ████ █████ is a patient at ██████████████████.
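The masking step in this example can be sketched in plain Python. The sketch assumes the NER model has already returned character offsets for each entity; here the spans are hard-coded stand-ins rather than real model output, and `mask_spans` is a hypothetical helper, not part of the module. It also assumes every character inside a span is replaced, including spaces, which may differ from how the module treats whitespace.

```python
from typing import List, Tuple


def mask_spans(text: str, spans: List[Tuple[int, int]], mask_char: str = "█") -> str:
    """Replace every character inside each (start, end) span with mask_char."""
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    chars = list(text)
    for start, end in spans:
        # Clamp the span to the text bounds so bad offsets cannot raise IndexError.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


text = "John Smith is a patient at St. Mary's Hospital."

# Stand-in for NER output: locate each entity string to get its offsets.
entities = ["John Smith", "St. Mary's Hospital"]
spans = [(text.index(e), text.index(e) + len(e)) for e in entities]

masked = mask_spans(text, spans, mask_char="*")
print(masked)
```

Because each character is replaced one for one, the masked text keeps the same length as the input, so offsets computed on the original text remain valid downstream.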
Usage Notes
- Choose the model based on the language and domain of the expected content.
- The anonymization character is applied to every detected entity span.
- Pair with preprocessor or OCR modules upstream; the anonymized output can optionally feed QA or search pipelines.
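The first usage note, choosing a model by language and domain, could be expressed as a simple lookup. This is an illustrative sketch: `suggest_model`, the language codes, and the domain keys are hypothetical, and the precedence (domain before language) and the fallback to a general-purpose model for English are assumptions rather than documented behavior. Only the model names come from the option list.

```python
from typing import Optional

# Hypothetical mappings built from the documented model options.
LANGUAGE_MODELS = {
    "ar": "GLiNER Arabic",
    "it": "GLiNER Italian",
    "ko": "GLiNER Korean",
}
DOMAIN_MODELS = {
    "biomedical": "GLiNER Biomed Large",
    "business": "Gretel GLiNER Bi-Large",
}


def suggest_model(language: str = "en", domain: Optional[str] = None) -> str:
    """Return a model option name for the given language code and domain."""
    # Assumption: a domain-specific model takes precedence when one exists.
    if domain in DOMAIN_MODELS:
        return DOMAIN_MODELS[domain]
    if language in LANGUAGE_MODELS:
        return LANGUAGE_MODELS[language]
    # Assumption: other non-English content goes to the multilingual model.
    if language != "en":
        return "GLiNER Multi"
    return "GLiNER Large"


print(suggest_model("ko"))
print(suggest_model("en", domain="biomedical"))
```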