The Data – Fingerprinter node generates a fingerprint, an identifier derived from the content, for any incoming data object. Because identical content yields the same fingerprint, it can be used to detect duplicates, verify data integrity, or track data lineage across a pipeline.
Input
- Data – Accepts any structured or unstructured data, such as text, files, records, or serialized objects. The fingerprint is generated from the content.
Output
- Data – Forwards the original data downstream, enriched with a fingerprint tag or identifier. The content itself is not altered; the fingerprint is appended as metadata.
How to Use
- Connect any data-producing node to the Data input (for example, a parser, file loader, or database reader)
- The node computes a consistent fingerprint for the input data
- Connect the Data output to downstream nodes that require identity checking, deduplication, or integrity validation
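The steps above can be pictured with a minimal Python sketch. It is not the node's actual implementation: the function names `fingerprint` and `attach_fingerprint`, the `payload`/`metadata` envelope, and the canonical-JSON serialization of records are all assumptions made for illustration, and SHA-256 is used only because it is a common choice for content hashing.

```python
import hashlib
import json


def fingerprint(payload) -> str:
    """Return a SHA-256 hex digest computed from the payload's content."""
    if isinstance(payload, bytes):
        raw = payload
    elif isinstance(payload, str):
        raw = payload.encode("utf-8")
    else:
        # Assumption: records are serialized as canonical JSON (sorted keys,
        # fixed separators) so that equal records hash identically.
        raw = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def attach_fingerprint(payload) -> dict:
    """Wrap the payload in a hypothetical envelope; the payload is not modified."""
    return {"payload": payload, "metadata": {"fingerprint": fingerprint(payload)}}


# Two records with the same content but different key order.
first = attach_fingerprint({"id": 1, "name": "x"})
second = attach_fingerprint({"name": "x", "id": 1})
same_fingerprint = first["metadata"]["fingerprint"] == second["metadata"]["fingerprint"]
```

In this sketch `same_fingerprint` is true because key order is normalized before hashing. Whether the real node normalizes input in the same way is not stated here, so treat that as a design choice of the example only.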
Notes
- Fingerprints are typically generated using cryptographic hashes (for example, SHA-256)
- This node does not modify the data payload; it appends a non-intrusive metadata tag
- Useful for caching, change tracking, and deduplication strategies
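As a rough illustration of the deduplication use case, a downstream step could keep a set of fingerprints it has already seen and drop repeats. This is a sketch under stated assumptions: the `deduplicate` helper is hypothetical, the records are plain strings, and the digest is computed inline with SHA-256 rather than read from the node's metadata tag.

```python
import hashlib


def deduplicate(records):
    """Yield each distinct record once, keyed by its SHA-256 fingerprint."""
    seen = set()
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield record


# The first occurrence of each record is kept, in input order.
unique = list(deduplicate(["a", "b", "a", "c", "b"]))
```

Storing fixed-size digests rather than the records themselves keeps the memory used by the seen-set independent of record size, which is one reason fingerprints suit caching and change tracking as well.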