Hello everyone,
I hope I'm posting this to the right place.

I've been working extensively with the TextToVector features from the Solr-LLM module. I use the update processor to embed chunks from crawled documents into a search engine. For performance reasons, I had to rework my process to use atomic updates instead of embedding all documents at indexing time.

Here is the processor chain I use (in solrconfig.xml):

    <updateRequestProcessorChain name="datafari-embed">
      <processor class="solr.llm.texttovector.update.processor.TextToVectorUpdateProcessorFactory">
        <str name="inputField">embedded_content</str>
        <str name="outputField">${texttovector.outputfield:vector_1536}</str>
        <str name="model">${texttovector.model:default_model}</str>
      </processor>
      <processor class="com.francelabs.datafari.updateprocessor.TextToVectorUpdateProcessorFactory">
        <str name="enabled">true</str>
        <str name="outputField">${texttovector.outputfield:vector_1536}</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.DistributedUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Here is the workflow:

* A background job crawls the collection and sends an atomic update request for each document.
* These requests target the /update/embed endpoint, which uses the processor chain above.
* The request is processed: the content of embedded_content is embedded and stored in the outputField (a KNN dense vector field).

Here is an example of an atomic update request, similar to those generated by the job:

    curl -X POST "http://localhost:8983/solr/VectorMain/update/embed" \
      -H "Content-Type: application/json" \
      -d '[
        {
          "id": "file://///localhost/mini/helloworld.txt_0",
          "embedded_content": { "set": "Hello world" }
        }
      ]'

I use the LangChain4j OpenAI integration to call my own LLM API to compute the embeddings. However, the embedding model receives {set=Hello world} instead of just "Hello world", which breaks the semantic vector generation.

For now, I am using Solr 9.8.
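
To make the symptom concrete, here is a minimal, standalone sketch of the suspected cause. It assumes the atomic update value reaches the processor as a java.util.Map and is turned into a string as-is, which would produce exactly {set=Hello world}. The class and method names (AtomicValueUnwrap, extractText) are hypothetical and are not part of Solr or the Solr-LLM module; the sketch only shows the kind of unwrapping a workaround would need before the text is sent to the embedding model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AtomicValueUnwrap {

    /**
     * Hypothetical helper (not a Solr API): returns the plain text of a field
     * value, unwrapping a single-entry atomic "set" operation if present.
     */
    public static String extractText(Object fieldValue) {
        if (fieldValue instanceof Map) {
            Map<?, ?> op = (Map<?, ?>) fieldValue;
            Object setValue = op.get("set");
            if (op.size() == 1 && setValue != null) {
                return setValue.toString();
            }
            // Other atomic operations (add, remove, inc, ...) are out of scope here.
            return null;
        }
        return fieldValue == null ? null : fieldValue.toString();
    }

    public static void main(String[] args) {
        // Assumed in-memory shape of { "set": "Hello world" } after JSON parsing.
        Map<String, Object> atomic = new LinkedHashMap<>();
        atomic.put("set", "Hello world");

        System.out.println(String.valueOf(atomic));      // prints {set=Hello world}
        System.out.println(extractText(atomic));         // prints Hello world
        System.out.println(extractText("Hello world"));  // prints Hello world
    }
}
```

If that assumption holds, a fix or workaround would apply this kind of unwrapping to the inputField value before the embedding call, while leaving the atomic update itself untouched for the rest of the chain.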
I saw that the Solr 9.9 documentation mentions partial updates for vector embeddings (https://solr.apache.org/guide/solr/latest/query-guide/text-to-vector.html). Has this issue been fixed in 9.9? Is there a recommended workaround or patch to ensure that only the string value is passed to the embedding model, and not the atomic update syntax itself?

Thank you!

Kind regards,
Emeric Bernet-Rollande
France Labs
Your knowledge, now
Datafari Enterprise Search
Meet us at the Big Data & IA trade show on October 1 and 2 in Paris, booth C31 <https://www.bigdataparis.com/>