There *is* a Solr blog site that just launched: https://solr.apache.org/blog.html
On Thu, Mar 28, 2024 at 3:49 PM rajani m <rajinima...@gmail.com> wrote:
>
> @Alessandro,
> Is there a Solr blog site where we can submit work/articles, or are you
> suggesting that I post on my own site and share a link here? I prefer the
> former if there is one: when I had my own blog, it hardly got any views,
> and on top of that Google made me migrate from its blogging platform to
> Sites, and then Sites was deprecated. Is there, or could we have, a
> Solr-specific wiki/blog site where Solr users can submit common feature
> configs, module configs, examples, performance metrics and so on, perhaps
> with voting/likes to confirm that something works? We would then have one
> common place to submit to and look things up.
>
> On Thu, Mar 28, 2024 at 3:33 PM rajani m <rajinima...@gmail.com> wrote:
>
> > Run the same knn queries at a slow throughput for 30-60 minutes; this
> > should warm up the disk caches with the HNSW index files, and then you
> > should see a significant drop in query time. Also make use of "fq" and
> > reduce the document space as much as you can.
> >
> > On Thu, Mar 28, 2024 at 12:50 PM Iram Tariq
> > <iram.ta...@northbaysolutions.net.invalid> wrote:
> >
> >> Hi Alessandro,
> >>
> >> Thank you for the feedback. Kindly see my comments below.
> >>
> >> *Ale*:
> >> https://www.elastic.co/blog/accelerating-vector-search-simd-instructions,
> >> I suggest experimenting with SIMD vector improvements (unless you are
> >> already doing it)
> >>
> >> *We will try this soon.*
> >>
> >> *Ale*: What about the machine memory?
> >>
> >> Following is the system specification: Linux (CPU: 64, RAM: 488 GB,
> >> OS: Ubuntu 20.04.6)
> >>
> >> *Ale*: you can fine-tune the hyper-parameters to compromise a bit on
> >> recall in favour of performance (hnswBeamWidth, hnswMaxConnections)
> >>
> >> I am trying this as a first step, but I am sure it will impact recall.
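Rajani's advice to narrow the document space with "fq" can be sketched as request parameters for Solr's `knn` query parser. This is a minimal sketch, not a tested client: the function name `build_knn_params`, the field name `vector`, the filter fields `category` and `lang`, and the collection URL are all hypothetical. The premise that Solr applies `fq` filters as pre-filters when `knn` is the main query reflects our reading of the Solr 9.x reference guide, so please verify it for 9.3.

```python
from urllib.parse import urlencode


def build_knn_params(field, vector, top_k, filters, rows):
    """Return Solr request parameters for a knn query narrowed by filter queries.

    The field name and filter strings are caller-supplied (hypothetical here);
    nothing in this function is tied to the schema described in the thread.
    """
    vec = "[" + ", ".join(repr(float(x)) for x in vector) + "]"
    params = [
        # Syntax of the knn query parser: {!knn f=<field> topK=<k>}[v1, v2, ...]
        ("q", f"{{!knn f={field} topK={top_k}}}{vec}"),
        ("rows", str(rows)),
        ("fl", "id,score"),
    ]
    # One fq per filter, so each can be cached independently in the filterCache.
    params.extend(("fq", f) for f in filters)
    return params


if __name__ == "__main__":
    # Hypothetical collection URL and filter fields, for illustration only.
    base = "http://localhost:8983/solr/mycollection/select"
    params = build_knn_params(
        field="vector",
        vector=[0.12, -0.03, 0.44],  # a real query would carry all 384 dimensions
        top_k=1000,
        filters=["category:books", "lang:en"],
        rows=1000,
    )
    print(base + "?" + urlencode(params))
```

How much this helps depends on how selective the filters are for the actual data; the sketch only shows where they go in the request.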
> >>
> >> Regards,
> >>
> >> Iram Tariq | Software Architect
> >> NorthBay
> >> Direct: +1 (902) 329-7329
> >> iram.ta...@northbaysolutions.net
> >> www.northbaysolutions.com
> >>
> >> On Thu, Mar 28, 2024 at 5:42 AM Alessandro Benedetti <a.benede...@sease.io>
> >> wrote:
> >>
> >> > That's interesting.
> >> > I think it's vital to get some performance tests back from the community.
> >> > Since my contribution to support vector search in Apache Solr was merged,
> >> > we have had little or no feedback on its performance in real-world use
> >> > cases.
> >> > Blogs, open benchmarks, or even just this sort of mail message are welcome.
> >> > Let me reply inline:
> >> > --------------------------
> >> > *Alessandro Benedetti*
> >> > Director @ Sease Ltd.
> >> > *Apache Lucene/Solr Committer*
> >> > *Apache Solr PMC Member*
> >> >
> >> > e-mail: a.benede...@sease.io
> >> >
> >> > *Sease* - Information Retrieval Applied
> >> > Consulting | Training | Open Source
> >> >
> >> > Website: Sease.io <http://sease.io/>
> >> > LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> >> > <https://twitter.com/seaseltd> | Youtube
> >> > <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> >> > <https://github.com/seaseltd>
> >> >
> >> > On Wed, 27 Mar 2024 at 21:06, Kent Fitch <kent.fi...@gmail.com> wrote:
> >> >
> >> > > Hi Iram,
> >> > >
> >> > > Is the machine doing lots of IO? If the HNSW graphs are not entirely
> >> > > in memory, performance will be poor. What JVM are you using? You may
> >> > > get some benefit from SIMD support in Java 21. Can you use the latest
> >> > > quantisation changes in Lucene to reduce the memory footprint of the
> >> > > HNSW graphs? That's a large topK, but I guess you need it?
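Kent's remark that a topK of 1000 is large can be illustrated with a simplified, single-layer best-first search in the style of HNSW's bottom layer. This is a toy under stated assumptions, not Lucene's implementation: the function name `beam_search`, the 1-D points, and the line-shaped graph are hypothetical, and the premise that the requested k sets the size of the result queue reflects our understanding of Lucene's HNSW searcher. The sketch shows that a bigger queue keeps more candidates alive, so more distance computations happen before the search can stop.

```python
import heapq
import math


def beam_search(graph, vectors, query, entry, ef):
    """Best-first search over one proximity-graph layer (simplified, hypothetical).

    Keeps the `ef` closest nodes seen so far and stops once the result queue is
    full and the nearest unexpanded candidate is farther than the worst kept result.
    Returns (sorted list of (distance, node), number of distance computations).
    """
    def dist(node):
        return math.dist(vectors[node], query)

    d0 = dist(entry)
    evaluations = 1
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nearest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top

    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # no remaining candidate can improve the kept results
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            evaluations += 1
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result

    return sorted((-nd, n) for nd, n in results), evaluations


if __name__ == "__main__":
    # Toy data: 20 points on a line, each linked to two neighbours on each side.
    n = 20
    vectors = [(float(i),) for i in range(n)]
    graph = {i: [j for j in (i - 2, i - 1, i + 1, i + 2) if 0 <= j < n] for i in range(n)}
    for ef in (3, 10):
        hits, cost = beam_search(graph, vectors, (10.2,), entry=0, ef=ef)
        print(f"ef={ef}: nearest={[node for _, node in hits[:3]]}, distance computations={cost}")
```

Both runs find the same three nearest points, but the larger queue pays for more distance computations; on a real index with millions of vectors per shard, each extra computation may also mean touching vector data that is not in memory.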
> >> > >
> >> > > Best regards
> >> > >
> >> > > Kent Fitch
> >> > >
> >> > > On Thu, 28 Mar 2024, 5:12 am Iram Tariq,
> >> > > <iram.ta...@northbaysolutions.net.invalid> wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > I am using dense vectors in Solr and searches are slow: each one
> >> > > > takes 10-25 seconds. I want to reduce this to 5 seconds (or
> >> > > > ideally less).
> >> > > >
> >> > > > The following configuration is being used:
> >> > > >
> >> > > > 1. *Solr version:* 9.3.0
> >> > > > 2. *Lucene version:* 9.7.0
> >> >
> >> > *Ale*:
> >> > https://www.elastic.co/blog/accelerating-vector-search-simd-instructions,
> >> > I suggest experimenting with SIMD vector improvements (unless you are
> >> > already doing it)
> >> >
> >> > > > 3. *Vector dimensions:* 384
> >> > > > 4. *Total shards:* 5
> >> > > > 5. *Number of vectors (per shard):* 43,209,158
> >> > > > 6. *JVM for each instance:* 35 GB
> >> >
> >> > *Ale*: What about the machine memory?
> >> >
> >> > > > 7. *TopK:* 1000 (getting 1000 from each shard)
> >> > > > 8. *Rows:* 1000
> >> > > > 9. *Vector field schema:* <fieldType name="knn_vector_384"
> >> > > >    class="solr.DenseVectorField" hnswMaxConnections="20"
> >> > > >    knnAlgorithm="hnsw" vectorDimension="384"
> >> > > >    similarityFunction="cosine" hnswBeamWidth="40"/>
> >> >
> >> > *Ale*: you can fine-tune the hyper-parameters to compromise a bit on
> >> > recall in favour of performance (hnswBeamWidth, hnswMaxConnections)
> >> >
> >> > > > 10. *Stored:* False
> >> > > > 11. *Web server:* Apache Tomcat
> >> > > > 12. *System specs:* Linux (CPU: 64, RAM: 488 GB, OS: Ubuntu 20.04.6)
> >> > > >
> >> > > > Any sort of help/clue will be appreciated.
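Kent's and Alessandro's memory questions can be checked with back-of-envelope arithmetic from the numbers in the original post. This is a sketch under loudly stated assumptions, none confirmed in the thread: all five shards (one 35 GB JVM each) share the single 488 GB machine, GB means 10^9 bytes, vectors are stored as 4-byte floats, and the quantised case stores one byte per dimension (int8 scalar quantisation, which as far as we know arrived in Lucene releases newer than the 9.7.0 used here). HNSW graph links are not counted, so real requirements are higher.

```python
# Back-of-envelope memory check for the setup described in this thread.
# ASSUMPTIONS (not confirmed in the thread): all 5 shards run on the single
# 488 GB machine, one JVM with a 35 GB heap per shard, GB = 10**9 bytes,
# and only raw vector data is counted (HNSW graph links are extra).

VECTORS_PER_SHARD = 43_209_158
DIMENSIONS = 384
SHARDS = 5
RAM_BYTES = 488 * 10**9
HEAP_BYTES_PER_JVM = 35 * 10**9


def vector_bytes(bytes_per_dim: int) -> int:
    """Raw vector storage across all shards for a given per-dimension width."""
    return VECTORS_PER_SHARD * DIMENSIONS * bytes_per_dim * SHARDS


def page_cache_bytes() -> int:
    """RAM left for the OS page cache once every JVM heap is reserved."""
    return RAM_BYTES - SHARDS * HEAP_BYTES_PER_JVM


if __name__ == "__main__":
    cache = page_cache_bytes()
    for label, width in (("float32", 4), ("int8 (1 byte/dim)", 1)):
        needed = vector_bytes(width)
        print(f"{label:>18}: {needed / 1e9:6.1f} GB of vectors, "
              f"{cache / 1e9:.0f} GB page cache, fits={needed <= cache}")
```

Under those assumptions the float32 vectors alone come to roughly 332 GB against about 313 GB of page cache, which would be consistent with Kent's question about heavy IO; one byte per dimension would bring the vectors down to roughly 83 GB.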
> >> > > >
> >> > > > Regards,
> >> > > > Iram Tariq