Keeping certain stored fields uncompressed

2024-01-26 Thread Srijan
Hi All, I'm currently facing a significant performance challenge with Apache Solr 9.x and would greatly appreciate any insights or suggestions you might have. Context: In my Solr setup, I have a custom post filter that is critical to our search process. This filter needs to read a specific stored

Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread kumar gaurav
Hi Everyone, I am using vector search in Solr 9.4. I am using cosine similarity with the knn parser, same as the documentation: https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html Schema Query q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0] The problem is it always returns
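
For readers following along, the query shape quoted in the message above can be assembled programmatically. This is only a minimal sketch: the helper name `knn_query` is hypothetical, and the field name `vector` and `topK=10` are taken from the message, not from any particular schema.

```python
def knn_query(field: str, vector: list[float], top_k: int = 10) -> str:
    """Build a Solr {!knn} query string (hypothetical helper, not a Solr API)."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if not vector:
        raise ValueError("vector must not be empty")
    # Render every component as a float literal, e.g. 1 -> "1.0".
    values = ", ".join(repr(float(v)) for v in vector)
    # Doubled braces emit literal { and } around the local-params section.
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


# Reproduces the query string quoted in the message.
q = knn_query("vector", [1.0, 2.0, 3.0, 4.0])
print(q)
```

The resulting string is what would be passed as the `q` parameter; as the replies in this thread point out, kNN returns the `topK` nearest vectors however far away they are.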

Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread Srijan
I have been testing dense vector search on Solr and it's been working great for me so far. Mine is an image search use case using OpenAI's CLIP model but the configurations are pretty much the same as yours. What embedding model are you using? And can you share a portion of the actual query? On Fr

[dev help wanted] /admin/segments handler: expose the term count

2024-01-26 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Everyone, Have you used or are you curious about the segments info handler and/or screen? https://solr.apache.org/guide/solr/latest/configuration-guide/index-segments-merging.html#segments-info-screen If so then would you be interested in contributing to the https://issues.apache.org/jira/br

Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread kumar gaurav
Hi Srijan, thanks for replying. I am using the BERT open source model to generate vectors. Are you aware of any minSimilarity threshold parameter in the knn parser? I am working with an ecommerce dataset. So I am getting the same non-relevant results and the same score if I am using any invalid search

Re: Expanding child document matches with parent fields

2024-01-26 Thread Frederic Font Corbera
Hi, Thanks for your suggestion. I already tried that, but unfortunately it is not what I need because it will not sort results according to the child score (which I need), and also it would return only one parent even if several of its children would match. My current solution using the domain p

Setting up Basic Authentication on Solr Cloud

2024-01-26 Thread Flowerday, Matthew J
Hi There, I have been tasked with setting up Basic Authentication on our SolrCloud database running ZooKeeper 3.8 and Solr 9.1.1. I have got it working, I think, but there are a few things I would like to check. I set up a security.json file and placed it in the server/solr folder in a single ser

Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread Charlie Hull
Hi Kumar, kNN will return the k closest vectors, which as you've found out may not be very close at all. Most of the approaches we're seeing as we work with e-commerce clients involve combining kNN with a standard, lexical search in some way - combining the results from both, or using one to
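
One way to picture "combining the results from both" is reciprocal rank fusion (RRF), sketched below. This is an illustrative technique chosen here, not necessarily the approach the message goes on to describe; the function name, the constant `k=60`, and the document IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in; the
    scores are summed and documents are returned best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a lexical query and a kNN query.
lexical_hits = ["a", "b", "c"]
knn_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([lexical_hits, knn_hits]))
```

Documents that appear in both lists rise to the top, which dampens the effect of kNN hits that are merely the least distant vectors.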

Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread Alessandro Benedetti
Hi Kumar, Knn search in Apache Solr doesn't support any min-threshold parameter. To be honest, even if it did, you wouldn't be in a much better position: your perceived relevance won't necessarily match the 0-1 cosine similarity between your query and your vectors, and what you consider highly rele

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Shawn Heisey
On 1/26/24 03:38, Srijan wrote: Since upgrading to Solr 9.x, I've observed a drastic decrease in performance – approximately 10 to 20 times slower than before. And this stems from the fact that stored fields in Solr 9.x are now compressed. Decompressing these fields during each search query has i

Re: [dev help wanted] /admin/segments handler: expose the term count

2024-01-26 Thread Rahul Goswami
I would love to take this up. On Fri, Jan 26, 2024 at 6:46 AM Christine Poerschke (BLOOMBERG/ LONDON) < cpoersc...@bloomberg.net> wrote: > Hi Everyone, > > Have you used or are you curious about the segments info handler and/or > screen? > https://solr.apache.org/guide/solr/latest/configuration-g

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Mikhail Khludnev
Hello. Agreed. By default it's BEST_SPEED which is LZ4. So, it can't be faster or less compressive. Binary DocValues Field should be an answer. On Fri, Jan 26, 2024 at 9:41 PM Shawn Heisey wrote: > On 1/26/24 03:38, Srijan wrote: > > Since upgrading to Solr 9.x, I've observed a drastic decrease

Re: Expanding child document matches with parent fields

2024-01-26 Thread Mikhail Khludnev
Hi, I don't fully follow, but I remember that there's a function for sorting parents by matching children: https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#childfieldfield-function Unfortunately, its vice versa is stuck in implementation https://issues.apache.org/jira/browse/

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Srijan
I stand corrected. Looks like my stored fields were compressed in Solr 8.11 too. But something seems to have changed in 9.x. Decompression is awfully slow. New algorithm? Regarding binary field, Solr doesn't allow docvalues for binary field (btw Lucene does). So I tried using stored binary field b

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread ufuk yılmaz
Hi, just curious, may I ask how did you come to the conclusion that the compression of fields is the cause of slowness in 9.4? — > On 26 Jan 2024, at 23:13, Srijan wrote: > > I stand corrected. Looks like my stored fields were compressed in Solr 8.11 > too. But something seems to have changed

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Walter Underwood
You seem to be jumping to conclusions about causes. Might want to step back and do some measurements. Try eliminating parts of the query one at a time, including returning fields. You might need to do this with a query set of a few thousand queries to avoid cache effects. wunder Walter Underwo
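
The measurement approach suggested above (drop one part of the request at a time and time a large query set) could be sketched roughly as follows. Everything here is an assumption for illustration: `time_variants` and `run_query` are hypothetical names, and the stub does not contact Solr; a real `run_query` would send the request to your own endpoint.

```python
import statistics
import time


def time_variants(run_query, queries, variants):
    """Return the median latency in seconds for each request variant.

    run_query(query, params) is a caller-supplied function that issues
    one request; variants maps a label to the extra parameters to use.
    """
    if not queries:
        raise ValueError("queries must not be empty")
    results = {}
    for name, params in variants.items():
        timings = []
        for q in queries:
            start = time.perf_counter()
            run_query(q, params)
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


def stub_run_query(query, params):
    """Stand-in for a real search call; does no I/O."""
    return {"q": query, **params}


# Hypothetical variants: the full request versus one returning only IDs.
variants = {"full": {"fl": "id,blob"}, "ids_only": {"fl": "id"}}
queries = [f"term{i}" for i in range(1000)]
for name, median in time_variants(stub_run_query, queries, variants).items():
    print(f"{name}: {median:.6f}s median")
```

Using many distinct queries, as the message recommends, keeps cache hits from hiding the cost of whichever part is being removed.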

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Mikhail Khludnev
On Fri, Jan 26, 2024 at 11:14 PM Srijan wrote: > Regarding binary field, Solr doesn't allow docvalues for binary field (btw > Lucene does). https://solr.apache.org/guide/solr/latest/indexing-guide/field-types-included-with-solr.html mentions BinaryField FWIW BinaryDocValues has no compression an

Re: Setting up Basic Authentication on Solr Cloud

2024-01-26 Thread Jan Høydahl
Hi, You probably want to enable SSL for Solr if you use BasicAuth. For ZK, ACL protection could be the first step, as described in https://solr.apache.org/guide/solr/latest/deployment-guide/zookeeper-access-control.html Protecting ZK connection with SSL is probably also smart, but it is unfortun