Aparavi Data Toolchain
- Let’s build our first project.
- We’ll choose the Hold Unstructured Data Challenge bucket and Team 1. Now we see the data in the bucket.
- We’ll start by connecting a document parser node. By clicking the red dot, we can see which nodes can accept each type of data the parser outputs.
- Now we make a connection. We’ll choose text in this case and preprocess it by chunking it into smaller pieces that can be vectorized.
- We’ll keep the default text splitter and then choose an embedding model. Here we could use an open-source model such as Sentence Transformers or an OpenAI embedding model.
- We’ll enter the provided API key and save.
- Choose a vector database. In this case, we’re going with Weaviate, as explained in the videos.
- We’ll enter the credentials for the Weaviate vector database cluster.
- We’ll use the cluster’s URL endpoint from Weaviate Cloud, leaving off the https:// prefix. Then go back, copy the API key, name the collection so we can identify it later, and click Save.
- Once every node is filled in, go back to the very first node and start the chain.
- It will begin vectorizing the text: first chunking it into smaller pieces, then turning those pieces into vector embeddings.
- When it completes, the vector database can be queried to build your application.
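The chunking step, where text is split into smaller pieces before embedding, can be sketched as a simple character-based splitter. This is a minimal Python sketch assuming fixed-size chunks with a small overlap; the function name `split_text` and its `chunk_size`/`chunk_overlap` defaults are hypothetical and do not describe how Aparavi’s default text splitter actually works.

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so no redundant tail chunk is emitted
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap means a phrase that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap, which helps preserve context for the embedding model.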
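The step of entering the cluster’s URL endpoint while skipping the HTTPS header presumably means supplying the endpoint without its `https://` scheme. A small helper makes that normalization explicit. It is a sketch assuming the field expects a bare hostname; `cluster_host` is a hypothetical name and `my-cluster.example.com` is a placeholder, not a real endpoint.

```python
from urllib.parse import urlparse


def cluster_host(endpoint: str) -> str:
    """Return only the host (and port, if any) of a cluster URL, dropping the scheme and path."""
    endpoint = endpoint.strip()
    # urlparse only fills netloc when the string starts with "scheme://" or "//"
    parsed = urlparse(endpoint if "://" in endpoint else "//" + endpoint)
    return parsed.netloc


print(cluster_host("https://my-cluster.example.com/"))  # my-cluster.example.com
```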
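What the embedding model and vector database do together, turning chunks into vectors and later answering a query, can be illustrated with a toy stand-in. This sketch assumes a hashed bag-of-words "embedding" in place of a real model such as Sentence Transformers or an OpenAI embedding model, and an in-memory list in place of the vector database; `embed`, `InMemoryVectorStore`, and `DIM` are hypothetical names. A query is embedded the same way as the chunks, and the chunks whose vectors have the highest cosine similarity are returned.

```python
import math
import re
import zlib

DIM = 256  # hypothetical vector size; real embedding models output hundreds of dimensions


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: a hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        # Store each chunk alongside its embedding, as a vector database would
        self.items.append((embed(chunk), chunk))

    def query(self, question: str, k: int = 3) -> list[str]:
        q = embed(question)
        # Both vectors are unit length, so the dot product is the cosine similarity
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]


store = InMemoryVectorStore()
for chunk in ["Invoices are stored on the finance share.",
              "The team meets every Monday.",
              "Quarterly invoices need manager approval."]:
    store.add(chunk)
print(store.query("Where are invoices stored?", k=1))
```

This toy version only matches shared words; a real embedding model places chunks with similar meaning close together even when they use different wording, which is what makes the vector database useful for building an application.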