The Embedding – OpenAI node converts text into vector embeddings using OpenAI's pre-trained embedding models. These embeddings can be used in downstream tasks such as similarity search or clustering.
Inputs
- Text – Text content to convert to embeddings
- Document – Document objects containing text to embed
Outputs
- Vectors – Generated vector embeddings
- Documents – Original documents with embeddings attached
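The vectors on the Vectors output are plain lists of floats, so downstream similarity search reduces to comparing vectors. A minimal sketch of cosine similarity, the usual comparison for embeddings (the short 3-dimensional vectors here are illustrative; real embeddings have e.g. 1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative toy vectors, not real model output.
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
v3 = [-0.3, 0.1, -0.2]

print(cosine_similarity(v1, v2))  # identical vectors -> 1.0
```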
Configuration
API Settings
- API Key – OpenAI API key
- Note – Required for authentication
- Organization ID – OpenAI organization ID
- Note – Optional, for enterprise accounts
Model Settings
- Model – OpenAI embedding model
- Default – text-embedding-3-small
- Options – text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
- Use text-embedding-3-small (Text Small) for most general-purpose applications
- Use text-embedding-3-large (Text Large) for tasks requiring higher semantic resolution
- Use text-embedding-ada-002 (Text Ada) for backward compatibility with existing systems
- Dimensions – Vector dimensions
- Default – 1536
- Note – Model-dependent, can be customized for some models
- Batch Size – Number of texts to embed at once
- Default – 32
- Note – Affects API usage and speed
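Batch Size determines how many texts go into each embedding request. A hedged sketch of the batching step (the helper name `batched` is illustrative, not part of the node's internal API):

```python
def batched(texts, batch_size=32):
    """Split a list of texts into consecutive batches of at most batch_size."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

# 70 texts with the default batch size of 32 yields three requests.
chunks = batched([f"doc-{n}" for n in range(70)], batch_size=32)
print([len(c) for c in chunks])  # [32, 32, 6]
```

Larger batches mean fewer API calls but bigger requests; smaller batches reduce the cost of a single failed or timed-out request.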
Advanced Settings
- Cache – Cache embeddings for reuse
- Default – true
- Note – Improves performance for repeated texts
- Timeout – API request timeout
- Default – 60
- Note – In seconds
- Retry Count – Number of retries on failure
- Default – 3
- Note – For handling transient errors
- User – User identifier for API requests
- Note – Optional, for tracking API usage
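The Retry Count setting handles transient errors by repeating failed requests. A minimal sketch of retry-with-backoff around a generic callable (the function name, delays, and the broad exception handling are assumptions for illustration, not the node's internals):

```python
import time

def with_retries(fn, retries=3, base_delay=0.5):
    """Call fn(), retrying up to `retries` times on failure with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated transient failure: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(with_retries(flaky, retries=3, base_delay=0.01))  # "ok" after two retries
```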
Samples
Basic Text Embedding
This example shows how to configure the Embedding – OpenAI node for basic text embedding:
{
"apiKey": "your-api-key",
"model": "text-embedding-3-small",
"dimensions": 1536,
"batchSize": 32,
"cache": true
}
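In code, the sample above corresponds roughly to a call like the following. This is a sketch, not the node's implementation: `client` is assumed to be an `openai.OpenAI`-style client exposing `embeddings.create`, and the function name `embed_texts` is illustrative:

```python
def embed_texts(client, texts, model="text-embedding-3-small",
                dimensions=1536, batch_size=32):
    """Embed texts in batches; returns one vector per input text."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        resp = client.embeddings.create(model=model, input=batch,
                                        dimensions=dimensions)
        vectors.extend(item.embedding for item in resp.data)
    return vectors
```

With the official Python SDK this would be invoked as roughly `embed_texts(OpenAI(api_key="your-api-key"), ["some text"])`.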
High-Dimensional Embeddings for Complex Tasks
For high-dimensional embeddings suitable for complex semantic tasks:
{
"apiKey": "your-api-key",
"organizationId": "your-org-id",
"model": "text-embedding-3-large",
"dimensions": 3072,
"batchSize": 16,
"cache": true,
"timeout": 120,
"retryCount": 5,
"user": "project-rag-pipeline"
}
Best Practices
Text Preparation
- Preprocess text to remove noise and irrelevant content
- Consider chunking long texts for more granular embeddings
- Ensure consistent text formatting for comparable embeddings
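Chunking long texts, as recommended above, can be as simple as a sliding word window with overlap so that context straddling chunk boundaries is not lost. A hedged sketch (the chunk size and overlap values are illustrative):

```python
def chunk_text(text, chunk_words=200, overlap_words=20):
    """Split text into overlapping word-window chunks for embedding."""
    words = text.split()
    step = chunk_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_words])
        if chunk:
            chunks.append(chunk)
        if start + chunk_words >= len(words):
            break
    return chunks

# 450 words with 200-word chunks and 20-word overlap yields 3 chunks.
chunks = chunk_text("word " * 450, chunk_words=200, overlap_words=20)
print(len(chunks))  # 3
```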
API Usage Optimization
- Use appropriate batch sizes to minimize API calls
- Enable caching to avoid redundant embedding generation
- Implement rate limiting to avoid API usage limits
- Monitor API usage for cost management
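The Cache setting avoids regenerating embeddings for identical texts. A minimal sketch of such a cache keyed on model and text hash (the key scheme and the `embed_one` stand-in are assumptions for illustration, not the node's internals):

```python
import hashlib

_cache = {}

def cached_embedding(text, model, embed_one):
    """Return a cached vector if this (model, text) pair was embedded before."""
    key = (model, hashlib.sha256(text.encode("utf-8")).hexdigest())
    if key not in _cache:
        _cache[key] = embed_one(text)
    return _cache[key]

# Stand-in embedder that counts how often it is actually invoked.
calls = {"n": 0}
def fake_embed(text):
    calls["n"] += 1
    return [float(len(text))]

cached_embedding("hello", "text-embedding-3-small", fake_embed)
cached_embedding("hello", "text-embedding-3-small", fake_embed)
print(calls["n"])  # 1 -- second call served from cache
```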
Troubleshooting
API Problems
- Authentication errors – Verify API key validity
- Rate limit exceeded – Implement request throttling or upgrade API tier
- Timeout errors – Increase timeout setting or reduce batch size
Embedding Quality Issues
- Poor semantic matching – Try a higher-dimensional model
- Inconsistent results – Standardize text preprocessing
- High latency – Optimize batch size or implement caching
Technical Reference
For detailed technical information, refer to:
- OpenAI Embeddings API Documentation
- OpenAI Embedding Models
- Aparavi Source Code ../../../aparavi-connectors/connectors/embedding_openai/openai.py