OpenAI

The Embedding – OpenAI node converts text into vector embeddings using OpenAI’s pre-trained embedding models. These embeddings can be used in downstream tasks such as similarity search or clustering.
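
Conceptually, the node’s output corresponds to calls against OpenAI’s embeddings endpoint. The sketch below shows an equivalent call with the official openai Python SDK plus a cosine-similarity comparison of the resulting vectors; the sample texts and the cosine helper are illustrative only and not part of the node.

import math
from openai import OpenAI  # official OpenAI Python SDK (v1+)

client = OpenAI(api_key="your-api-key")

# Embed two short texts in a single request.
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["How do I reset my password?", "Steps to recover a lost password"],
)
a, b = (item.embedding for item in resp.data)

def cosine(u, v):
    # Higher cosine similarity means the texts are semantically closer.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

print(f"similarity = {cosine(a, b):.3f}")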

Inputs

  • Text – Text content to convert to embeddings
  • Document – Document objects containing text to embed

Outputs

  • Vectors – Generated vector embeddings
  • Documents – Original documents with embeddings attached

Configuration

API Settings

  • API Key – OpenAI API key
    • NOTE – Required for authentication
  • Organization ID – OpenAI organization ID
    • NOTE – Optional, for enterprise accounts

Model Settings

  • Model – OpenAI embedding model
    • Default – text-embedding-3-small
    • Options – text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
      • Use text-embedding-3-small (Text Small) for most general-purpose applications
      • Use text-embedding-3-large (Text Large) for tasks requiring higher semantic resolution
      • Use text-embedding-ada-002 (Text Ada) for backward compatibility with existing systems
  • Dimensions – Vector dimensions
    • Default – 1536
    • Note – Model-dependent; the text-embedding-3 models accept a reduced dimensions value, while text-embedding-ada-002 is fixed at 1536 (see the sketch after this list)
  • Batch Size – Number of texts to embed at once
    • Default – 32
    • Note – Affects API usage and speed
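
The Dimensions setting corresponds to the dimensions parameter of OpenAI’s embeddings API, which the text-embedding-3 models accept to return shortened vectors. A minimal sketch, assuming the official openai Python SDK; the 512-dimension value is just an example:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Request shortened 512-dimensional vectors instead of the model's native 1536.
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="Vector databases trade recall for speed.",
    dimensions=512,  # supported by text-embedding-3-* models, not ada-002
)
print(len(resp.data[0].embedding))  # -> 512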

Advanced Settings

  • Cache – Cache embeddings for reuse
    • Default – true
    • Note – Improves performance for repeated texts (see the sketch after this list)
  • Timeout – API request timeout
    • Default – 60
    • Note – In seconds
  • Retry Count – Number of retries on failure
    • Default – 3
    • Note – For handling transient errors
  • User – User identifier for API requests
    • Note – Optional, for tracking API usage
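
The openai Python SDK exposes per-request behavior that mirrors the Timeout and Retry Count settings, and the effect of Cache can be approximated with a simple lookup keyed by model and text. A minimal sketch under those assumptions; the in-memory dictionary is illustrative, not the node’s actual cache:

import hashlib
from openai import OpenAI

# timeout (seconds) and max_retries mirror the Timeout and Retry Count settings.
client = OpenAI(api_key="your-api-key", timeout=60, max_retries=3)

_cache: dict[str, list[float]] = {}

def embed(text: str, model: str = "text-embedding-3-small") -> list[float]:
    # Return a cached embedding if the same text was embedded before.
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    if key not in _cache:
        resp = client.embeddings.create(model=model, input=text)
        _cache[key] = resp.data[0].embedding
    return _cache[key]

vec = embed("Repeated texts are only sent to the API once.")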

Samples

Basic Text Embedding

This example shows how to configure the OpenAI Embedding component for basic text embedding:
{
  "apiKey": "your-api-key",
  "model": "text-embedding-3-small",
  "dimensions": 1536,
  "batchSize": 32,
  "cache": true
}

High-Dimensional Embeddings for Complex Tasks

For high-dimensional embeddings suitable for complex semantic tasks:
{
  "apiKey": "your-api-key",
  "organizationId": "your-org-id",
  "model": "text-embedding-3-large",
  "dimensions": 3072,
  "batchSize": 16,
  "cache": true,
  "timeout": 120,
  "retryCount": 5,
  "user": "project-rag-pipeline"
}

Best Practices

Text Preparation

  • Preprocess text to remove noise and irrelevant content
  • Consider chunking long texts for more granular embeddings (see the chunking sketch after this list)
  • Ensure consistent text formatting for comparable embeddings
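
Chunking long documents before embedding keeps each vector focused on a single passage. A minimal sketch of fixed-size chunking with overlap; the chunk and overlap sizes are illustrative, and character-based splitting is only a rough stand-in for token-aware chunking:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Split text into overlapping character windows for per-chunk embedding.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
    return chunks

# Each chunk can then be embedded individually, e.g. via the node's Text input.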

API Usage Optimization

  • Use appropriate batch sizes to minimize API calls (see the batching sketch after this list)
  • Enable caching to avoid redundant embedding generation
  • Throttle requests to stay within API rate limits
  • Monitor API usage for cost management
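
Because the embeddings endpoint accepts a list of inputs, one request can cover a whole batch. A minimal batching sketch, assuming the official openai Python SDK; the batch size of 32 mirrors the node’s default Batch Size:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

def embed_all(texts: list[str], batch_size: int = 32) -> list[list[float]]:
    # Embed texts in batches so each API call carries up to batch_size inputs.
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        resp = client.embeddings.create(model="text-embedding-3-small", input=batch)
        # Order results by their input index before collecting them.
        vectors.extend(item.embedding for item in sorted(resp.data, key=lambda d: d.index))
    return vectors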

Troubleshooting

API Problems

  • Authentication errors – Verify API key validity
  • Rate limit exceeded – Throttle requests or upgrade your API tier (see the backoff sketch after this list)
  • Timeout errors – Increase timeout setting or reduce batch size
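
When the API reports a rate limit, backing off and retrying usually resolves the error without manual intervention. A minimal backoff sketch, assuming the official openai Python SDK (which raises openai.RateLimitError); the delays are illustrative:

import time

import openai
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

def embed_with_backoff(text: str, max_attempts: int = 5) -> list[float]:
    # Retry on rate-limit errors with exponentially growing delays.
    for attempt in range(max_attempts):
        try:
            resp = client.embeddings.create(model="text-embedding-3-small", input=text)
            return resp.data[0].embedding
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...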

Embedding Quality Issues

  • Poor semantic matching – Try a higher-dimensional model
  • Inconsistent results – Standardize text preprocessing
  • High latency – Optimize batch size or implement caching

Technical Reference

For detailed technical information, refer to: