Khaled Alkhouli created SOLR-17679:
--------------------------------------

Summary: Request for Documentation on Hybrid Lexical and Vector Search with Score Breakdown and Cutoff Logic
Key: SOLR-17679
URL: https://issues.apache.org/jira/browse/SOLR-17679
Project: Solr
Issue Type: Task
Components: search
Affects Versions: 9.6.1
Reporter: Khaled Alkhouli

Hello Apache Solr team,

I am building a hybrid search engine that combines lexical search (traditional keyword-based search) and vector search (semantic search using embeddings) in a single request. I am aiming to achieve the following in one request:
# *Lexical Search:* Using edismax with specified fields and weights.
# *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
# *Hybrid Score Combination:* The final score is the sum of the normalized lexical score and the vector search score. If a document appears in only one search, the other score should be treated as zero.

I have implemented the following logic using Python:
{code:python}
import numpy as np
import requests

# embed(), query_terms, cutoff_ratio, SOLR_URL and HEADERS are defined
# elsewhere in the application and are not shown here.

def hybrid_search(query, top_k=10):
    # Embed the query and convert it to a plain list of Python floats so it
    # renders as "[0.1, 0.2, ...]" inside the knn query string.
    embedding = np.array(embed([query]), dtype=np.float32)
    embedding = embedding[0].tolist()

    lxq = rf"""{{!type=edismax qf='all_txt' q.op=OR tie=0.1 }}({query_terms})"""

    solr_query = {
        "params": {
            "q": "{!bool filter=$retrievalStage must=$rankingStage}",
            "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
            "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
            "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
            "lexicalQuery": lxq,
            "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
            "fl": "post_id,all_txt,score",
            "rows": top_k,
            "fq": [""],
            "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
            "rqq": "{!frange l=$cutoff}query($rankingStage)",
            "sort": "score desc",
            "cutoff": f"{cutoff_ratio}",
        }
    }

    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    return response.json()
{code}
The response returns documents with a combined score, which I assume is the sum of:
* *Lexical Search Score:* Normalized between 0 and 1.
* *Vector Search Score:* Already bounded between 0 and 1.

If a document is present in one search but not the other, the score from the missing part is added as zero.

h3. *Request:*

I would like documentation or guidance on the following:
# *View and Return Individual Scores:* How can I retrieve the following scores in the same request?
** Lexical search score
** Vector search score
** Final combined score (already retrieved)
I would like to display all three scores together in the response for each document.
# *Cutoff Logic:* I am using a Python function to calculate a cutoff threshold based on the scores. Is it possible to implement this cutoff directly in Solr, so that only documents that pass a certain threshold are returned? If so, how can I achieve this within Solr's query syntax, without relying on external Python logic?

I appreciate any help or documentation that can assist with:
* Returning separate scores for lexical and vector queries.
* Implementing cutoff logic natively in Solr.

Thank you!
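
For reference, the scoring behaviour described in this issue (normalized lexical score plus vector score, a missing score counted as zero, then a cutoff) can be sketched in plain Python. This is only an illustrative sketch, not the reporter's actual cutoff function, which is not shown in the issue. The function names ({{min_max_scale}}, {{combine_scores}}, {{apply_cutoff}}) are hypothetical, the normalization is assumed to be a simple min-max over the documents passed in (which may differ from the set of documents Solr's {{scale()}} operates on), and the cutoff is assumed to be an absolute lower bound on the combined score, mirroring the {{l=$cutoff}} bound in the request above.

```python
from typing import Dict, List, Tuple


def min_max_scale(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale raw scores into [0, 1].

    Assumption: if all scores are equal there is no range to scale over,
    so every document is mapped to 0.0. An empty input yields an empty dict.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_scores(
    lexical: Dict[str, float], vector: Dict[str, float]
) -> Dict[str, float]:
    """Sum the normalized lexical score and the vector score per document.

    A document that appears in only one result set gets 0.0 for the other.
    """
    normalised = min_max_scale(lexical)
    doc_ids = set(normalised) | set(vector)  # union of both result sets
    return {d: normalised.get(d, 0.0) + vector.get(d, 0.0) for d in doc_ids}


def apply_cutoff(
    combined: Dict[str, float], cutoff: float
) -> List[Tuple[str, float]]:
    """Keep documents whose combined score is >= cutoff, best first."""
    kept = [(d, s) for d, s in combined.items() if s >= cutoff]
    # Sort by descending score; break ties by document id for determinism.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical scores keyed by post_id.
    lexical_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
    vector_scores = {"b": 0.75, "d": 0.5}
    combined = combine_scores(lexical_scores, vector_scores)
    for doc_id, score in apply_cutoff(combined, cutoff=0.75):
        print(doc_id, score)
```

A client-side version like this can serve as a baseline for checking whether a Solr-native approach returns the same per-document breakdown and applies the same threshold.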