[ https://issues.apache.org/jira/browse/SOLR-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Khaled Alkhouli updated SOLR-17679:
-----------------------------------
Attachment: Screenshot from 2025-02-20 16-31-48.png
Description:
Hello Apache Solr team,

I am building a hybrid search engine that combines lexical search (traditional keyword-based search) and vector search (semantic search using embeddings) in a single request. I’m aiming to achieve the following in one request:
# *Lexical Search:* Using edismax with specified fields and weights.
# *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
# *Hybrid Score Combination:* The final score is the sum of the normalized lexical score and the vector search score. If a document appears in only one search, the other score should be treated as zero.

I have implemented the following logic using Python:
{code:python}
import numpy as np
import requests

# embed(), SOLR_URL and HEADERS are defined elsewhere (not shown).

def hybrid_search(query, top_k=10, cutoff=0.0):
    # Encode the query; tolist() yields plain Python floats, so the vector
    # is serialised as [0.1, 0.2, ...] inside the knn query string.
    embedding = np.array(embed([query]), dtype=np.float32)
    embedding = embedding[0].tolist()

    # Lexical (edismax) query over the 'text' field.
    lxq = rf"""{{!type=edismax
                qf='text'
                q.op=OR
                tie=0.1
                bq=''
                bf=''
                boost=''
                }}({query})"""

    solr_query = {"params": {
        # Match the union of both queries; score by the ranking stage.
        "q": "{!bool filter=$retrievalStage must=$rankingStage}",
        "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
        "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
        "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
        "lexicalQuery": lxq,
        "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
        "fl": "text",
        "rows": top_k,
        "fq": [""],
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
        "rqq": "{!frange l=$cutoff}query($rankingStage)",
        "cutoff": str(cutoff),  # lower bound read by {!frange l=$cutoff} in rqq
        "sort": "score desc",
    }}
    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    response = response.json()
    return response
{code}
The response returns documents with a combined score, which I assume is the sum of:
* *Lexical Search Score:* Normalized between 0 and 1.
* *Vector Search Score:* Already bounded between 0 and 1.
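For illustration, the score combination assumed above can be reproduced offline in plain Python. This is a minimal sketch, not Solr code: the helper name {{combine_scores}} and the sample scores are hypothetical, lexical scores are min-max scaled over the given candidates only (Solr's {{scale()}} function may take its minimum and maximum from a different set of documents, so real scores can differ), and a document missing from one result set contributes 0 for that part.
{code:python}
def combine_scores(lexical, vector):
    """Return (doc_id, combined_score) pairs, highest score first.

    lexical, vector: hypothetical dicts mapping doc id -> raw score.
    """
    # Min-max scale lexical scores into [0, 1]; if they are all equal,
    # this sketch maps them to 0.0 (an arbitrary choice).
    scaled = {}
    if lexical:
        lo, hi = min(lexical.values()), max(lexical.values())
        span = hi - lo
        scaled = {d: (s - lo) / span if span else 0.0 for d, s in lexical.items()}

    # Union of both result sets; a missing score counts as 0.
    doc_ids = set(scaled) | set(vector)
    combined = {d: scaled.get(d, 0.0) + vector.get(d, 0.0) for d in doc_ids}

    # Highest combined score first, mirroring sort=score desc.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample: "d" is only a vector hit, "b" and "c" only lexical hits.
lexical_scores = {"a": 12.0, "b": 4.0, "c": 8.0}
vector_scores = {"a": 0.5, "d": 0.9}
for doc_id, score in combine_scores(lexical_scores, vector_scores):
    print(doc_id, round(score, 3))
{code}
With these sample values the loop prints a 1.5, d 0.9, c 0.5 and b 0.0. Note that "b" matched the lexical query but has the lowest lexical score, so min-max scaling maps it to 0.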
If a document is present in one search but not the other, the score from the missing part is added as zero. Attached is an image of the current output.
h3. *Request:*
I would like documentation or guidance on the following:
# *View and Return Individual Scores:* How can I retrieve the following scores in the same request?
** Lexical search score
** Vector search score
** Final combined score (already retrieved)
I would like to display all three scores in the response together for each document.
# *Cutoff Logic:* I am using a Python function to calculate a cutoff threshold based on the scores. Is it possible to implement this cutoff directly in Solr so that only documents that pass a certain threshold are returned? If so, how can I achieve this within Solr’s query syntax, without relying on external Python logic?

I appreciate any help or documentation that can assist with:
* Returning separate scores for lexical and vector queries.
* Implementing cutoff logic natively in Solr.

Thank you!

was:
Hello Apache Solr team,

I am building a hybrid search engine that combines lexical search (traditional keyword-based search) and vector search (semantic search using embeddings) in a single request. I’m aiming to achieve the following in one request:
# *Lexical Search:* Using edismax with specified fields and weights.
# *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
# *Hybrid Score Combination:* The final score is the sum of the normalized lexical score and the vector search score. If a document appears in only one search, the other score should be treated as zero.
I have implemented the following logic using Python:
{code:java}
def hybrid_search(query, top_k=10):
    embedding = np.array(embed([query]), dtype=np.float32)
    embedding = list(embedding[0])
    lxq = rf"""{{!type=edismax qf='all_txt' q.op=OR tie=0.1 }}({query_terms})"""
    solr_query = {
        "params": {
            "q": "{!bool filter=$retrievalStage must=$rankingStage}",
            "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
            "retrievalStage":"{!bool should=$lexicalQuery should=$vectorQuery}", # Union
            "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
            "lexicalQuery": lxq,
            "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
            "fl": "post_id,all_txt,score",
            "rows": top_k,
            "fq": [""],
            "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
            "rqq": "{!frange l=$cutoff}query($rankingStage)",
            "sort": "score desc",
            "cutoff": f"{cutoff_ratio}"
        }
    }
    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    response = response.json()
    return response
{code}
The response returns documents with a combined score, which I assume is the addition of:
* *Lexical Search Score:* Normalized between 0 and 1.
* *Vector Search Score:* Already bounded between 0 and 1.

If a document is present in one search but not the other, the score from the missing part is added as zero.
h3. *Request:*
I would like documentation or guidance on the following:
# *View and Return Individual Scores:* How can I retrieve the following scores in the same request?
** Lexical search score
** Vector search score
** Final combined score (already retrieved)
I would like to display all three scores in the response together for each document.
# *Cutoff Logic:* I am using a Python function to calculate a cutoff threshold based on the scores. Is it possible to implement this cutoff directly in Solr so that only documents that pass a certain threshold are returned? If so, how can I achieve this within Solr’s query syntax, without relying on external Python logic?
I appreciate any help or documentation that can assist with:
* Returning separate scores for lexical and vector queries.
* Implementing cutoff logic natively in Solr.

Thank you!

Labels: hybrid-search search solr vector-based-search (was: )

> Request for Documentation on Hybrid Lexical and Vector Search with Score Breakdown and Cutoff Logic
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-17679
>                 URL: https://issues.apache.org/jira/browse/SOLR-17679
>             Project: Solr
>          Issue Type: Task
>          Components: search
>    Affects Versions: 9.6.1
>            Reporter: Khaled Alkhouli
>            Priority: Major
>              Labels: hybrid-search, search, solr, vector-based-search
>         Attachments: Screenshot from 2025-02-20 16-31-48.png

--
This message was sent by Atlassian Jira
(v8.20.10#820010)