The Dictionary node enables lookups, tagging, or transformations using a predefined dictionary structure. It can operate over documents, questions, or general text, and emit structured outputs to be used in retrieval-augmented generation (RAG), classification, or enrichment pipelines.
Inputs
- Documents – Optional supporting documents for context or reference
- Text- Raw or structured text to process through the dictionary
- Questions/Term – Optional list of user queries for contextual matching or enrichment
Outputs
- Definitions – Generated term definitions
- Documents – Documents with annotations or tags resulting from dictionary matches
- Answers – Structured answers or matches from the dictionary
- Questions – Processed or enriched questions that may include mapped terms or synonyms
Configuration
Term Extraction
- Extraction Method – Method to identify terms
- Default – “auto”
- Options – auto, regex, nlp, predefined
- Max Terms – Maximum terms to extract per document
- Default – 20
- Notes – Limits processing time
- Min Term Length – Minimum character length for terms
- Default – 3
- Notes – Filters out short terms
- Term Filters – Filters to apply to extracted terms
- Default – []
- Examples – POS tags, frequency thresholds
Definition Generation
- Definition Source – Source for definitions
- Default – “context”
- Options – context, llm, dictionary, custom
- LLM Provider – LLM provider for definitions
- Default – null
- Notes – Required if definition source is “llm”
- Context Window – Number of characters around term for context
- Default – 500
- Notes – For context-based definitions
- Definition Length – Target character length for definitions
- Default – 200
- Notes – Approximate length guidance
Advanced Settings
- Deduplication – Remove duplicate terms
- Default – true
- Notes – Across all processed documents
- Custom Dictionary – Path to custom dictionary file
- Default – null
- Notes – For custom definition sources
- Term Grouping – Group related terms
- Default – false
- Notes – Creates term hierarchies
Example Usage
Basic Term Extraction and Definition
This example shows how to configure the Definitions component for basic term extraction and definition:
{
"extractionMethod": "nlp",
"maxTerms": 15,
"minTermLength": 4,
"definitionSource": "context",
"contextWindow": 600,
"definitionLength": 150,
"deduplication": true
}
LLM-Powered Technical Glossary
For creating a technical glossary using an LLM:
{
"extractionMethod": "nlp",
"maxTerms": 30,
"minTermLength": 3,
"termFilters": [ {"type": "pos", "tags": ["NOUN", "PROPN"]}, {"type": "frequency", "minOccurrences": 2} ],
"definitionSource": "llm",
"llmProvider": { "provider": "openai", "model": "gpt-4", "temperature": 0.3, "prompt": "Define the following technical term in 1-2 sentences: {term}" },
"definitionLength": 200,
"deduplication": true,
"termGrouping": true
}
Example
- Input – “Our infrastructure uses Kubernetes and S3.”
- Dictionary – {“Kubernetes”: “cloud-native orchestrator”, “S3”: “Amazon object storage”}
- Output
- Answers – Definitions for matched terms
- Documents – Original text with terms tagged or explained
- Questions – May be rephrased with definitions added for clarity
Best Practices
Term Extraction
- Use NLP-based extraction for technical documents
- Use regex for documents with predictable term patterns
- Apply appropriate filters to reduce noise in extracted terms
- Consider domain-specific term extraction rules
Definition Generation
- Use context-based definitions when document contains explanations
- Use LLM-based definitions for technical or specialized terminology
- Provide custom dictionaries for domain-specific terminology
- Adjust context window size based on document structure
- Ensure Dictionary is Configured (Not shown in this screenshot)
- If a UI or configuration file for dictionary terms exists, populate or validate the terms being matched.
- Dictionaries typically consist of key-value pairs, synonyms, or tagged patterns.
- Connect Answers to a downstream QA or retrieval system.
- Connect Documents to a storage or annotation step.
- Connect Questions to LLM or search nodes to enrich prompt quality
Troubleshooting
Extraction Problems
- Too many irrelevant terms – Adjust term filters or increase min term length
- Missing important terms – Try different extraction methods or reduce filters
- Duplicate terms with variations – Enable term grouping
Definition Problems
- Definitions too generic – Use domain-specific LLM prompts or custom dictionaries
- Definitions missing context – Increase context window size
- Inconsistent definition quality – Try different definition sources
Technical Reference
For detailed technical information, refer to:
- Definitions API Reference
- Supported NLP Models
- Definitions Source Code ../../../aparavi-connectors/connectors/definitions/definitions.py