[ 
https://issues.apache.org/jira/browse/SOLR-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khaled Alkhouli updated SOLR-17679:
-----------------------------------
    Status: Open  (was: Patch Available)

> Request for Documentation/Feature Improvement on Hybrid Lexical and Vector 
> Search with Score Breakdown and Cutoff Logic
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-17679
>                 URL: https://issues.apache.org/jira/browse/SOLR-17679
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 9.6.1
>            Reporter: Khaled Alkhouli
>            Priority: Minor
>              Labels: hybrid-search, search, solr, vector-based-search
>         Attachments: Screenshot from 2025-02-20 16-31-48.png
>
>
> Hello Apache Solr team,
> I was able to implement a hybrid search engine that combines *lexical search 
> (edismax)* and *vector search (KNN-based embeddings)* within a single 
> request. The idea is simple:
>  * *Lexical Search* retrieves results based on text relevance.
>  * *Vector Search* retrieves results based on semantic similarity.
>  * *Hybrid Scoring* sums both scores, where a missing score (if a document 
> appears in only one search) should be treated as zero.
> This approach is working, but *there is a critical lack of documentation* on 
> how to properly return individual score components of lexical search (score1) 
> vs. vector search (score2 from cosine similarity). Right now, Solr only 
> returns the final combined score, but there is no clear way to see {*}how 
> much of that score comes from lexical search vs. vector search{*}. This is 
> essential for debugging and for fine-tuning ranking strategies.
>  
> I have implemented the following logic using Python:
> {code:java}
> def hybrid_search(query, top_k=10):
>     embedding = np.array(embed([query]), dtype=np.float32
>     embedding = list(embedding[0])
>     lxq= rf"""{{!type=edismax 
>                 qf='text'
>                 q.op=OR
>                 tie=0.1
>                 bq=''
>                 bf=''
>                 boost=''
>             }}({query})"""
>     solr_query = {"params": {
>         "q": "{!bool filter=$retrievalStage must=$rankingStage}",
>         "rankingStage": 
> "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
>         "retrievalStage":"{!bool should=$lexicalQuery should=$vectorQuery}", 
> # Union
>         "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
>         "lexicalQuery": lxq,
>         "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
>         "fl": "text",
>         "rows": top_k,
>         "fq": [""],
>         "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
>         "rqq": "{!frange l=$cutoff}query($rankingStage)",
>         "sort": "score desc",
>     }}
>     response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
>     response = response.json()
>     return response {code}
> h3. *Issues & Missing Documentation*
>  # *No Way to Retrieve Individual Scores in a Hybrid Search*
> There is no clear documentation on how to return:
>  * 
>  ** The *lexical search score* separately.
>  ** The *vector search score* separately.
>  ** The *final combined score* (which Solr already provides).
> Right now, we’re left guessing whether the sum of these scores works as 
> expected, making debugging and tuning unnecessarily difficult.
>  # *No Clear Way to Implement Cutoff Logic in Solr*
> In a hybrid search, I need to filter out results that don’t meet a {*}minimum 
> score threshold{*}. Right now, I have to implement this in Python, {*}which 
> defeats the purpose of using Solr for ranking in the first place{*}.
>  * 
>  ** How can we enforce a {*}score-based cutoff directly in Solr{*}, without 
> external filtering?
>  ** The \{!frange} function is mentioned in the documentation but lacks 
> {*}clear examples on how to apply it to hybrid search{*}.
> h3. *Feature Request / Documentation Improvement*
>  * *Provide a way to return individual scores for lexical and vector search 
> in the response.* This should be as simple as adding fields like 
> {{{}fl=score,lexical_score,vector_score{}}}.
>  * *Clarify how to apply cutoff logic in a hybrid search.* This is an 
> essential ranking mechanism, and yet, there’s little guidance on how to do 
> this efficiently within Solr itself.
> Looking forward to a response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to