Khaled Alkhouli created SOLR-17679:
--------------------------------------

             Summary: Request for Documentation on Hybrid Lexical and Vector 
Search with Score Breakdown and Cutoff Logic
                 Key: SOLR-17679
                 URL: https://issues.apache.org/jira/browse/SOLR-17679
             Project: Solr
          Issue Type: Task
          Components: search
    Affects Versions: 9.6.1
            Reporter: Khaled Alkhouli


Hello Apache Solr team,

I am building a hybrid search engine that combines lexical search (traditional 
keyword-based search) and vector search (semantic search using embeddings). I 
want a single request to do the following:
 # *Lexical Search:* Using edismax with specified fields and weights.
 # *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
 # *Hybrid Score Combination:* The final score is the sum of the normalized 
lexical score and the vector search score. If a document appears in only one 
search, the other score should be treated as zero.
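To make the intended arithmetic concrete, here is a minimal client-side sketch of the combination rule described above. It assumes plain min-max normalisation over the matched lexical documents, which may not be exactly how Solr's scale() derives its range; the function name combine_scores and the sample scores are hypothetical.

```python
def combine_scores(lexical, vector):
    """Sum min-max-normalised lexical scores with vector scores per document.

    Both arguments map document id -> raw score. A document missing from
    one mapping contributes 0.0 for that component.
    """
    if lexical:
        lo, hi = min(lexical.values()), max(lexical.values())
    else:
        lo = hi = 0.0
    span = hi - lo

    def normalise(score):
        # Assumption: if every lexical score is identical, normalise to 0.0.
        return (score - lo) / span if span > 0 else 0.0

    combined = {}
    for doc_id in set(lexical) | set(vector):
        lex = normalise(lexical[doc_id]) if doc_id in lexical else 0.0
        vec = vector.get(doc_id, 0.0)
        combined[doc_id] = lex + vec
    return combined


# Hypothetical scores: "a" is lexical-only, "c" is vector-only.
scores = combine_scores({"a": 2.0, "b": 4.0}, {"b": 0.5, "c": 0.25})
```

With these inputs, "b" gets the top lexical value (1.0) plus its vector score, while "a" and "c" each receive zero for the component they are missing from.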

I have implemented the following logic using Python:
{code:python}
import numpy as np
import requests

# embed, query_terms, cutoff_ratio, SOLR_URL and HEADERS are not defined in this snippet.
def hybrid_search(query, top_k=10):
    embedding = np.array(embed([query]), dtype=np.float32)
    embedding = list(embedding[0])
    lxq = rf"""{{!type=edismax
                 qf='all_txt'
                 q.op=OR
                 tie=0.1
             }}({query_terms})"""
    solr_query = {
        "params": {
            "q": "{!bool filter=$retrievalStage must=$rankingStage}",
            "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
            "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
            "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
            "lexicalQuery": lxq,
            "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
            "fl": "post_id,all_txt,score",
            "rows": top_k,
            "fq": [""],
            "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
            "rqq": "{!frange l=$cutoff}query($rankingStage)",
            "sort": "score desc",
            "cutoff": f"{cutoff_ratio}"
        }
    }
    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    response = response.json()
    return response
{code}
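One detail of the snippet above that may be worth checking is how the embedding is rendered into the knn query string. Interpolating a list of NumPy scalars into an f-string can, depending on the NumPy version, produce elements such as np.float32(0.1) rather than bare numbers. The sketch below shows one way to build the string from plain floats; the helper name knn_query is hypothetical, and the field name is taken from the snippet.

```python
def knn_query(field, vector, top_k):
    """Build a {!knn} local-params query string from a sequence of numbers."""
    # float() turns NumPy scalars (or ints) into plain Python floats so that
    # each component renders as a bare number such as 0.25.
    components = ", ".join(repr(float(x)) for x in vector)
    # Doubled braces in an f-string emit literal { and } characters.
    return f"{{!knn f={field} topK={top_k}}}[{components}]"


print(knn_query("all_v512", [0.1, 0.25, -0.5], 10))
# prints: {!knn f=all_v512 topK=10}[0.1, 0.25, -0.5]
```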
The response returns documents with a combined score, which I assume is the 
sum of:
 * *Lexical Search Score:* Normalized between 0 and 1.
 * *Vector Search Score:* Already bounded between 0 and 1.

If a document is present in one search but not the other, the score from the 
missing part is added as zero.
h3. *Request:*

I would like documentation or guidance on the following:
 # *View and Return Individual Scores:*
How can I retrieve the following scores in the same request?

 ** Lexical search score
 ** Vector search score
 ** Final combined score (already retrieved)
I would like to display all three scores in the response together for each 
document.

 # *Cutoff Logic:*
I am using a Python function to calculate a cutoff threshold based on the 
scores. Is it possible to implement this cutoff directly in Solr so that only 
documents that pass a certain threshold are returned? If so, how can I achieve 
this within Solr’s query syntax, without relying on external Python logic?
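For the first item (individual scores), one possibility is to request the component scores as aliased function-query pseudo-fields in fl. The sketch below only builds the request parameters. It is untested against Solr 9.6.1, the aliases lexical_score and vector_score are hypothetical names, and it assumes the parameters normalisedLexicalQuery and vectorQuery are defined as in the snippet above.

```python
def with_score_breakdown(params):
    """Return a copy of params whose fl also requests the component scores."""
    extra = [
        # query(subquery, default): the subquery's score, or 0 when the
        # document does not match that subquery.
        "lexical_score:query($normalisedLexicalQuery,0)",
        "vector_score:query($vectorQuery,0)",
    ]
    updated = dict(params)
    fields = [f for f in updated.get("fl", "score").split(",") if f]
    updated["fl"] = ",".join(fields + extra)
    return updated


params = with_score_breakdown({"fl": "post_id,all_txt,score", "rows": 10})
```

If this works as hoped, each returned document would carry lexical_score and vector_score next to the combined score.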

I appreciate any help or documentation that can assist with:
 * Returning separate scores for lexical and vector queries.
 * Implementing cutoff logic natively in Solr.
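For the cutoff, one option that might work is a function range filter (frange) on the combined score, passed as fq, so that documents below the threshold are excluded from the result set rather than only reranked. This is a sketch of the request parameters only, not a verified answer. It assumes rankingStage is defined as in the snippet above, and the helper name with_cutoff is hypothetical.

```python
def with_cutoff(params, cutoff):
    """Return a copy of params with an frange filter on the combined score."""
    updated = dict(params)
    existing = updated.get("fq", [])
    if isinstance(existing, str):
        existing = [existing]
    # Drop empty filter strings such as the [""] placeholder.
    filters = [f for f in existing if f]
    # l= is the lower bound; frange includes the bound itself by default.
    filters.append("{!frange l=$cutoff}query($rankingStage)")
    updated["fq"] = filters
    updated["cutoff"] = str(cutoff)
    return updated


params = with_cutoff({"fq": [""], "rows": 10}, 0.6)
```

Note that a fixed threshold like this is simpler than a cutoff computed from the score distribution; whether the latter can be expressed purely in Solr query syntax is part of what this request asks.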

Thank you!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
