Text Anonymization

This module anonymizes sensitive information in input text using pre-trained named entity recognition (NER) models. It replaces detected entities (such as names or organizations) with a specified masking character.

Input

  • Text – Accepts raw input text for anonymization.
  • Classifications – Optional classifications from previous steps, used to guide or filter anonymization when available.
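
How the optional classifications filter anonymization is not specified here. As a minimal sketch of one plausible reading, assume each detected entity carries a label and only entities with an allowed label are masked. The `filter_entities` helper and the entity dictionary layout (`start`, `end`, `label`) are hypothetical and not part of this module's interface.

```python
from typing import Dict, List, Optional, Set


def filter_entities(
    entities: List[Dict],
    allowed_labels: Optional[Set[str]] = None,
) -> List[Dict]:
    """Keep only entities whose label is allowed.

    Assumption: when no classifications are supplied (None),
    every detected entity is kept.
    """
    if allowed_labels is None:
        return list(entities)
    wanted = {label.lower() for label in allowed_labels}
    return [e for e in entities if e["label"].lower() in wanted]


# Hypothetical detections with character offsets and labels.
detected = [
    {"start": 0, "end": 10, "label": "person"},
    {"start": 27, "end": 46, "label": "organization"},
]

# Only person entities would be passed on for masking.
people_only = filter_entities(detected, {"Person"})
print(people_only)
```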

Output

  • Text – Outputs the anonymized version of the input text.

Configuration Fields

  • Model – Select the NER model to use for anonymization.
    • Options
      • GLiNER Merged Large – Combines multiple datasets for broader detection.
      • GLiNER Arabic – For Arabic-language text.
      • GLiNER Biomed Large / Small – Optimized for biomedical text.
      • GLiNER Large / Medium / Small – General-purpose models with different performance profiles.
      • GLiNER Italian, Korean, Multi – Language-specific (Italian, Korean) or multilingual (Multi) models.
      • Gretel GLiNER Bi-Large, Bi-Small – Business-focused anonymization models.
  • Character to use for anonymization – The character that replaces each character of a detected entity in the output.
    • Example – █ or *.

Example

  • Input text – John Smith is a patient at St. Mary’s Hospital.
  • Using GLiNER Merged Large
  • With anonymization character █
  • The output may be – ████ █████ is a patient at ██████████████████.
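
The masking step in this example can be sketched as follows. This is an illustration only, not the module's actual implementation: the `mask_spans` helper is hypothetical, and the character offsets are hard-coded stand-ins for what an NER model might report. It assumes each name token is detected as its own span, which is why the space between the two masked words survives.

```python
from typing import Iterable, Tuple


def mask_spans(
    text: str,
    spans: Iterable[Tuple[int, int]],
    mask_char: str = "█",
) -> str:
    """Replace every character inside each [start, end) span with mask_char."""
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    chars = list(text)
    for start, end in spans:
        # Clamp to the text bounds so a bad offset cannot raise IndexError.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


text = "John Smith is a patient at St. Mary’s Hospital."

# Hypothetical spans (character offsets), standing in for model output:
# "John", "Smith", and "St. Mary’s Hospital".
spans = [(0, 4), (5, 10), (27, 46)]

print(mask_spans(text, spans))
print(mask_spans(text, spans, mask_char="*"))
```

Because masking is done per character, the output keeps the length of the input, and overlapping spans are simply masked once.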

Usage Notes

  • Choose the model based on the expected language and domain of the content.
  • The anonymization character is applied across all detected entity spans.
  • Pair with preprocessor or OCR modules upstream; the anonymized output can optionally feed QA or search pipelines downstream.