[jira] [Updated] (SOLR-17679) Request for Documentation on Hybrid Lexical and Vector Search with Score Breakdown and Cutoff Logic

Khaled Alkhouli (Jira) Thu, 20 Feb 2025 05:33:06 -0800


     [ https://issues.apache.org/jira/browse/SOLR-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Khaled Alkhouli updated SOLR-17679:
-----------------------------------
     Attachment: Screenshot from 2025-02-20 16-31-48.png
    Description: 
Hello Apache Solr team,

I am building a hybrid search engine that combines lexical search (traditional 
keyword-based search) and vector search (semantic search using embeddings) in a 
single request. I’m aiming to achieve the following in one request:
 # *Lexical Search:* Using edismax with specified fields and weights.
 # *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
 # *Hybrid Score Combination:* The final score is the sum of the normalized 
lexical score and the vector search score. If a document appears in only one 
search, the other score should be treated as zero.
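The combination in item 3 can be sketched in plain Python to make the arithmetic concrete. This is an illustration, not Solr's implementation, and the function names are hypothetical. The `floor=0.0` default reflects how Solr's `scale()` is documented to behave: it traverses every document's function value, and `query()` returns 0 for documents that do not match, so the minimum it maps from is usually 0 rather than the lowest matching lexical score. Pass `floor=None` to normalise over the matching documents only.

{code:python}
from typing import Dict, Optional


def min_max_scale(scores: Dict[str, float], floor: Optional[float] = 0.0) -> Dict[str, float]:
    """Min-max normalise scores into [0, 1].

    floor=0.0 approximates scale(query($lexicalQuery),0,1), whose minimum also
    includes the 0.0 that query() yields for non-matching documents (assumes such
    documents exist in the index). floor=None uses the matching documents only.
    """
    if not scores:
        return {}
    lo = min(scores.values())
    if floor is not None:
        lo = min(lo, floor)
    hi = max(scores.values())
    if hi == lo:
        # Degenerate range: every document maps to the lower target bound.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(lexical: Dict[str, float], vector: Dict[str, float],
                  floor: Optional[float] = 0.0) -> Dict[str, float]:
    """Sum normalised lexical and vector scores over the union of both result sets.

    A document missing from one result set contributes 0.0 for that part.
    """
    scaled = min_max_scale(lexical, floor)
    docs = set(scaled) | set(vector)  # union, like retrievalStage
    return {doc: scaled.get(doc, 0.0) + vector.get(doc, 0.0) for doc in docs}


if __name__ == "__main__":
    lexical = {"a": 4.0, "b": 2.0}   # made-up raw edismax scores
    vector = {"b": 0.25, "c": 0.75}  # made-up knn scores, already in [0, 1]
    print(sorted(hybrid_scores(lexical, vector).items()))
    # [('a', 1.0), ('b', 0.75), ('c', 0.75)]
{code}

With `floor=None`, document "b" would instead get 0.0 + 0.25, because it holds the lowest lexical score among the matches. This is why the scaled lexical part can look different from a per-result-set normalisation.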

I have implemented the following logic using Python:
{code:python}
import numpy as np
import requests

# embed(), SOLR_URL and HEADERS are defined elsewhere in my application.
def hybrid_search(query, top_k=10, cutoff=0.0):
    embedding = np.array(embed([query]), dtype=np.float32)
    # tolist() gives plain Python floats, so the vector is serialised as [0.1, 0.2, ...]
    embedding = embedding[0].tolist()
    lxq = rf"""{{!type=edismax
                qf='text'
                q.op=OR
                tie=0.1
                bq=''
                bf=''
                boost=''
            }}({query})"""
    solr_query = {"params": {
        "q": "{!bool filter=$retrievalStage must=$rankingStage}",
        "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
        "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
        "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
        "lexicalQuery": lxq,
        "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
        "fl": "text,score",
        "rows": top_k,
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
        "rqq": "{!frange l=$cutoff}query($rankingStage)",
        "cutoff": str(cutoff),  # referenced as $cutoff in rqq
        "sort": "score desc",
    }}
    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    response.raise_for_status()
    return response.json()
{code}
The response returns documents with a combined score, which I assume is the 
sum of:
 * *Lexical Search Score:* Normalized between 0 and 1.
 * *Vector Search Score:* Already bounded between 0 and 1.

If a document is present in one search but not the other, the score from the 
missing part is added as zero. Attached is an image of the current output.
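Whether the vector score really stays within 0 and 1 depends on the `similarityFunction` configured for the `all_v512` field, which is not shown here, so the function used is an assumption. As far as I understand Lucene's vector similarity functions, the raw similarity is mapped to a score roughly as sketched below. This is a plain-Python reimplementation for illustration only. For `dot_product`, Lucene expects unit-length vectors, which is what keeps the score within [0, 1].

{code:python}
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine_score(u: Sequence[float], v: Sequence[float]) -> float:
    """Map cosine similarity from [-1, 1] to [0, 1] via (1 + cos) / 2."""
    cos = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return max((1.0 + cos) / 2.0, 0.0)


def dot_product_score(u: Sequence[float], v: Sequence[float]) -> float:
    """(1 + dot) / 2; bounded by [0, 1] only when both vectors have unit length."""
    return max((1.0 + dot(u, v)) / 2.0, 0.0)


def euclidean_score(u: Sequence[float], v: Sequence[float]) -> float:
    """1 / (1 + squared distance): 1.0 for identical vectors, tending to 0 as they diverge."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1.0 / (1.0 + squared_distance)
{code}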
h3. *Request:*

I would like documentation or guidance on the following:
 # *View and Return Individual Scores:*
How can I retrieve the following scores in the same request?
 ** Lexical search score
 ** Vector search score
 ** Final combined score (already retrieved)
I would like to display all three scores together in the response for each 
document.
 # *Cutoff Logic:*
I am using a Python function to calculate a cutoff threshold based on the 
scores. Is it possible to implement this cutoff directly in Solr so that only 
documents that pass a certain threshold are returned? If so, how can I achieve 
this within Solr’s query syntax, without relying on external Python logic?
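One possible approach to the score breakdown, not verified here against 9.6.1, relies on Solr's support for function queries as aliased pseudo-fields in `fl` (for example `alias:query($param)`). It reuses the parameters already defined in the request. The alias names and the helper below are hypothetical. Since `query()` returns 0 for documents that do not match the subquery, a document found by only one side would show 0 for the other component. The `[explain]` document transformer is a heavier alternative that returns the full scoring breakdown.

{code:python}
from typing import Dict, List

# Hypothetical aliases, each mapped to a parameter already defined in the request.
SCORE_FIELDS: Dict[str, str] = {
    "lexical_score": "query($lexicalQuery)",
    "lexical_norm": "query($normalisedLexicalQuery)",
    "vector_score": "query($vectorQuery)",
    "hybrid_score": "query($rankingStage)",
}


def score_breakdown_fl(base_fields: List[str]) -> str:
    """Build an fl value returning the stored fields, the final score and each component score."""
    pseudo_fields = [f"{alias}:{func}" for alias, func in SCORE_FIELDS.items()]
    return ",".join(base_fields + ["score"] + pseudo_fields)


if __name__ == "__main__":
    print(score_breakdown_fl(["text"]))
{code}

The result would be assigned to the `"fl"` entry of the request parameters. While the `rq` rerank is active, `score` can differ from `hybrid_score`, because reranked documents get `reRankWeight` times the rerank query's score added to their original score.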

I appreciate any help or documentation that can assist with:
 * Returning separate scores for lexical and vector queries.
 * Implementing cutoff logic natively in Solr.
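For the cutoff, note that an `rq` rerank query never removes documents. Documents that do not match `reRankQuery` simply keep their original score, so the current `rqq` only boosts documents above the threshold. Removing documents requires a filter. A minimal sketch follows, assuming the `{!frange}` parser accepts `$cutoff` as a dereferenced local parameter inside `fq`. It moves the same `{!frange l=$cutoff}query($rankingStage)` expression into `fq` and drops the `rq`/`rqq` pair. The helper name is hypothetical.

{code:python}
from typing import Any, Dict


def with_score_cutoff(params: Dict[str, Any], cutoff: float) -> Dict[str, Any]:
    """Return a copy of the Solr params that filters out documents whose
    hybrid score, query($rankingStage), is below `cutoff` (inclusive lower bound)."""
    new_params = dict(params)
    # Reranking only rescores; it cannot drop documents, so remove it.
    new_params.pop("rq", None)
    new_params.pop("rqq", None)
    existing = new_params.get("fq", [])
    if isinstance(existing, str):
        existing = [existing]
    fqs = [fq for fq in existing if fq]  # skip empty fq entries such as ""
    fqs.append("{!frange l=$cutoff}query($rankingStage)")
    new_params["fq"] = fqs
    new_params["cutoff"] = str(cutoff)
    return new_params
{code}

In the request above, this would wrap the inner dict, for example `solr_query = {"params": with_score_cutoff(params, cutoff)}`. The original params dict is left unchanged.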

Thank you!

  was:
Hello Apache Solr team,

I am building a hybrid search engine that combines lexical search (traditional 
keyword-based search) and vector search (semantic search using embeddings) in a 
single request. I’m aiming to achieve the following in one request:
 # *Lexical Search:* Using edismax with specified fields and weights.
 # *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
 # *Hybrid Score Combination:* The final score is the sum of the normalized 
lexical score and the vector search score. If a document appears in only one 
search, the other score should be treated as zero.

I have implemented the following logic using Python:
{code:python}
def hybrid_search(query, top_k=10):
    embedding = np.array(embed([query]), dtype=np.float32)
    embedding = list(embedding[0])
    lxq = rf"""{{!type=edismax
                qf='all_txt'
                q.op=OR
                tie=0.1
            }}({query_terms})"""
    solr_query = {
        "params": {
            "q": "{!bool filter=$retrievalStage must=$rankingStage}",
            "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
            "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
            "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
            "lexicalQuery": lxq,
            "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
            "fl": "post_id,all_txt,score",
            "rows": top_k,
            "fq": [""],
            "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
            "rqq": "{!frange l=$cutoff}query($rankingStage)",
            "sort": "score desc",
            "cutoff": f"{cutoff_ratio}"
        }
    }
    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    response = response.json()
    return response
{code}
The response returns documents with a combined score, which I assume is the 
addition of:
 * *Lexical Search Score:* Normalized between 0 and 1.
 * *Vector Search Score:* Already bounded between 0 and 1.

If a document is present in one search but not the other, the score from the 
missing part is added as zero.
h3. *Request:*

I would like documentation or guidance on the following:
 # *View and Return Individual Scores:*
How can I retrieve the following scores in the same request?

 ** Lexical search score
 ** Vector search score
 ** Final combined score (already retrieved)
I would like to display all three scores in the response together for each 
document.

 # *Cutoff Logic:*
I am using a Python function to calculate a cutoff threshold based on the 
scores. Is it possible to implement this cutoff directly in Solr so that only 
documents that pass a certain threshold are returned? If so, how can I achieve 
this within Solr’s query syntax, without relying on external Python logic?

I appreciate any help or documentation that can assist with:
 * Returning separate scores for lexical and vector queries.
 * Implementing cutoff logic natively in Solr.

Thank you!

         Labels: hybrid-search search solr vector-based-search  (was: )

> Request for Documentation on Hybrid Lexical and Vector Search with Score 
> Breakdown and Cutoff Logic
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-17679
>                 URL: https://issues.apache.org/jira/browse/SOLR-17679
>             Project: Solr
>          Issue Type: Task
>          Components: search
>    Affects Versions: 9.6.1
>            Reporter: Khaled Alkhouli
>            Priority: Major
>              Labels: hybrid-search, search, solr, vector-based-search
>         Attachments: Screenshot from 2025-02-20 16-31-48.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
