Community Over Code NA 2024 Travel Assistance Applications now open!
Hello to all users, contributors and committers!

[ You are receiving this email as a subscriber to one or more ASF project dev or user mailing lists; it is not being sent to you directly. It is important that we reach all of our users and contributors/committers so that they may get a chance to benefit from this. We apologise in advance if this doesn't interest you, but it is on topic for the mailing lists of the Apache Software Foundation, so please do not mark this as spam in your email client. Thank you! ]

The Travel Assistance Committee (TAC) is pleased to announce that travel assistance applications for Community over Code NA 2024 are now open!

We will be supporting Community over Code NA in Denver, Colorado, from October 7th to the 10th, 2024.

TAC exists to help those who would like to attend Community over Code events but are unable to do so for financial reasons. For more info on this year's applications and qualifying criteria, please visit the TAC website at <https://tac.apache.org/>. Applications are already open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting applications from people who are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their applications, which should contain as much supporting material as required to process their request efficiently and accurately. This will enable TAC to announce successful applications shortly afterwards.

As usual, TAC expects to deal with applications from a diverse range of backgrounds; therefore, we encourage (as always) anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to apply now so that you have enough time in case of interview delays. Do not wait until you know whether you have been accepted.

We look forward to greeting many of you in Denver, Colorado, in October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)
Re: Slow performance for phrases with terms with high ttf
This is also the sort of thing CommonGramsFilter was designed for...

https://solr.apache.org/guide/solr/latest/indexing-guide/filters.html#common-grams-filter

: Date: Mon, 25 Mar 2024 10:17:48 -0400
: From: Doug Turnbull
: Reply-To: users@solr.apache.org
: To: users@solr.apache.org
: Subject: Re: Slow performance for phrases with terms with high ttf
:
: As someone currently implementing a lot of positional search from scratch
: (in a different side-project), I can say it's totally expected behavior
: that high TTF / DF terms would be harder. To match the phrase there are
: simply more candidate documents and positions to intersect, so it's
: naturally a tougher problem.
:
: If you think about how phrase search works, you might roughly think you
: 1. Find all documents with every term
: 2. Iterate positions of these documents so that "Bill" is exactly one
: before "Of" exactly one before "sale"... etc
:
: I'd say the best you could do is:
:
: 1. Make sure your index can fit in memory.
: 2. Ensure you add any filters (fq) if you have any mandatory requirements.
: Add a filter cache. Don't cache anything that's query-dependent.
: 3. If it's a really common phrase, think about tokenizing it into a single
: term "bill of sale" -> "bill_of_sale", which you could do outside the search
: engine or with text analysis. The downside is that you lose the ability to
: match the individual terms. You could of course create a different field
: for these significant phrases if it's important.
:
: Best
: -Doug
:
: On Mon, Mar 25, 2024 at 6:40 AM Sjoerd Smeets wrote:
:
: > There is a typo in my email. The term list should be like this:
: >
: >
: >- "bill" -> df = 1.879.324, ttf = 14.145.950
: >- "note" -> df = 8.479.826, ttf = 151.249.542
: >- "sale" -> df = 7.557.685, ttf = 120.948.163
: >- "of" -> df = 21.244.060, ttf = 6.879.196.700
: >
: >
: > On Mon, Mar 25, 2024 at 8:56 AM Sjoerd Smeets wrote:
: >
: > > Hi,
: > >
: > > We are experiencing quite a performance decrease when searching for
: > > phrases that have terms with a high ttf value.
: > >
: > > E.g. searching for "note of sale" is around 3 times slower (~10 sec) than
: > > the "bill of sale" (~3 sec). This behaviour is consistent and can be
: > > reproduced also when we use other terms that have a high ttf. We are
: > > querying the unstemmed index.
: > >
: > > Terms (numDocs: 26220184):
: > >
: > >- "bill" -> df = 1.879.324, ttf = 14.145.950
: > >- "note" -> df = 8.479.826, ttf = 151.249.542
: > >- "sale" -> df = 7.557.685, ttf = 120.948.163
: > >- "bill" -> df = 21.244.060, ttf = 6.879.196.700
: > >
: > >
: > > Is this the expected behaviour or is there something that can be
: > > tuned, like a cache setting?
: > >
: > > Thanks,
: > > Sjoerd
: > >
: >
:

-Hoss
http://www.lucidworks.com/
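To illustrate the two steps Doug describes (find documents containing every term, then check that the positions line up), here is a minimal sketch in Python. It is a toy in-memory positional index, not how Lucene or Solr implement phrase queries. The function names, the whitespace tokenizer and the sample documents are all hypothetical. The second return value counts how many positions had to be looked at, which is the part that grows with ttf.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} using lowercased whitespace tokens."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_match(index, phrase):
    """Return (sorted matching doc ids, number of positions examined)."""
    terms = phrase.lower().split()
    postings = [index.get(term, {}) for term in terms]

    # Step 1: keep only documents that contain every term of the phrase.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)

    # Step 2: for each candidate, look for a start position of the first
    # term such that term i+1 sits exactly i+1 positions later.
    matches = []
    positions_examined = 0
    for doc_id in sorted(candidates):
        positions_examined += sum(len(p[doc_id]) for p in postings)
        later = [set(p[doc_id]) for p in postings[1:]]
        for start in postings[0][doc_id]:
            if all(start + i + 1 in pos_set for i, pos_set in enumerate(later)):
                matches.append(doc_id)
                break
    return matches, positions_examined


# Hypothetical sample documents, for illustration only.
docs = {
    1: "bill of sale for the boat",
    2: "a note of thanks and a sale of goods",
    3: "sale of the bill",
}
index = build_positional_index(docs)
matched_docs, examined = phrase_match(index, "bill of sale")
print(matched_docs, examined)
```

Every candidate document contributes all of its positions for every phrase term to the work done in step 2, so a term such as "of" with a very large ttf dominates the cost even when few documents actually match.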
Performance Suggestion for Dense Vectors
Hi All,

I am using dense vectors in Solr and searches are slow. Each search is taking 10-25 seconds. I want to reduce the time to 5 seconds (or less, ideally).

The following configuration is being used.

1. *SOLR Version:* 9.3.0
2. *Lucene Version:* 9.7.0
3. *Vector Dimensions:* 384
4. *Total Shards:* 5
5. *Number of Vectors (Per shard):* 43209158
6. *JVM for each Instance:* 35GB
7. *TopK:* 1000 (Getting 1000 from each shard)
8. *Rows:* 1000
9. *Vector Field Schema:* class="solr.DenseVectorField" hnswMaxConnections="20" knnAlgorithm="hnsw" vectorDimension="384" similarityFunction="cosine" hnswBeamWidth="40"/>
10. *Stored:* False
11. *WebServer:* Apache Tomcat
12. *System Specs:* Linux ( CPU:64, RAM:488 GB, OS:Ubuntu 20.04.6 )

Any sort of help/clue will be appreciated.

Regards,

Iram Tariq | Software Architect
NorthBay
Direct: +1 (902) 329-7329
iram.ta...@northbaysolutions.net
www.northbaysolutions.com
Fwd: Solr 9 punctuation issue
I am migrating Solr 3 to Solr 9. However, the issues I have now are:

1. Solr 9 returns no results where punctuation is included within quotes (a phrase search), such as the query: "Electric Vehicles: Charging Points"

2. Solr 9 treats smart and not-smart apostrophes as different. Could Solr 9 treat them as the same for search? For example: different results for "dog’s breakfast" and "dog's breakfast"

Would these be a configuration issue that can be fixed?

Thanks.

Best regards,
John
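As an illustration of what "treating them as the same" means, here is a minimal Python sketch that folds curly apostrophes into the straight ASCII one. The function name and the choice of characters are assumptions made for this example. It is not Solr configuration: in Solr this kind of folding would normally belong in the field's analysis chain (for example a character-mapping char filter), applied identically at index and query time.

```python
# Assumption for this sketch: only the left and right single curly quotes
# (U+2018, U+2019) need folding into the ASCII apostrophe.
APOSTROPHE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'"})


def normalize_apostrophes(text):
    """Replace curly single quotes with the straight ASCII apostrophe."""
    return text.translate(APOSTROPHE_MAP)


smart = "dog\u2019s breakfast"
plain = "dog's breakfast"
print(normalize_apostrophes(smart) == normalize_apostrophes(plain))
```

Normalizing only the query side would not help if the indexed text still contains the curly form, which is why the same mapping has to be applied on both sides.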
Re: Performance Suggestion for Dense Vectors
Hi Iram,

Is the machine doing lots of IO? If the HNSW graphs are not entirely in memory, performance will be poor.

What JVM? You may get some benefit from SIMD support in Java 21.

Can you use the latest quantisation changes in Lucene to reduce the memory footprint of the HNSW graphs?

That's a large topK, but I guess you need it?

Best regards

Kent Fitch

On Thu, 28 Mar 2024, 5:12 am Iram Tariq, wrote:

> Hi All,
>
> I am using Dense vectors in SOLR and facing slowness in it. Each search is
> taking 10-25 seconds. I want to reduce the time to 5 seconds (or less
> ideally).
>
> Following configurations are being used.
>
>
>1. *SOLR Version:* 9.3.0
>2. *Lucene Version:* 9.7.0
>3. *Vector Dimensions:* 384
>4. *Total Shards:* 5
>5. *Number of Vectors (Per shard):* 43209158
>6. *JVM for each Instance:* 35GB
>7. *TopK:* 1000 (Getting 1000 from each shard)
>8. *Rows:* 1000
>9. *Vector Field Schema:* class="solr.DenseVectorField" hnswMaxConnections="20"
> knnAlgorithm="hnsw"
>vectorDimension="384" similarityFunction="cosine" hnswBeamWidth="40"/>
>10. *Stored:* False
>11. *WebServer:* Apache Tomcat
>12. *System Specs:* Linux ( CPU:64, RAM:488 GB, OS:Ubuntu 20.04.6 )
>
> Any sort of help/clue will be appreciated.
>
>
>
> Regards,
>
>
> Iram Tariq | Software Architect
>
> NorthBay
>
> Direct: +1 (902) 329-7329
>
> iram.ta...@northbaysolutions.net
>
> www.northbaysolutions.com
>
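To put rough numbers on the "does it fit in memory" question, here is a back-of-the-envelope sketch in Python using the figures from Iram's message. It makes several assumptions that are not stated in the thread: vectors are stored as 4-byte floats, all five shards share the one 488 GB machine, and the HNSW graph links (which need additional space) are ignored. The function name is hypothetical.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the raw vector values alone (no graph links)."""
    return num_vectors * dimensions * bytes_per_value


VECTORS_PER_SHARD = 43_209_158  # from the original message
DIMENSIONS = 384                # from the original message
SHARDS = 5                      # from the original message

# Assumption: 4 bytes per dimension (32-bit floats).
per_shard_float32 = raw_vector_bytes(VECTORS_PER_SHARD, DIMENSIONS)
# Assumption: 1 byte per dimension if the vectors were quantised to bytes.
per_shard_int8 = raw_vector_bytes(VECTORS_PER_SHARD, DIMENSIONS, bytes_per_value=1)

GIB = 2**30
print(f"float32, per shard: {per_shard_float32 / GIB:.1f} GiB")
print(f"float32, {SHARDS} shards: {SHARDS * per_shard_float32 / GIB:.1f} GiB")
print(f"int8, per shard: {per_shard_int8 / GIB:.1f} GiB")
```

Under these assumptions the raw float vectors alone come to about 66 GB per shard, far more than a 35 GB heap, so they would have to be served from the operating system's page cache. Whether the RAM left over after the JVM heaps is enough for that depends on how the shards are actually laid out across machines, which the thread does not say.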
Re: edismax boost query(bq) with local params syntax
Hi Rajani,

I used to use the '_val_' hook extensively and tend to prefer its syntax in general... I think that bq and _val_ are generally equivalent, but I could be wrong.

Thx
Robi

On Mon, Mar 25, 2024 at 11:40 AM rajani m wrote:

> ok, I figured, the syntax - bq= _query_:"" AND _val_:"" seems to be
> working.
>
>
> On Mon, Mar 25, 2024 at 1:44 PM rajani m wrote:
>
> > Hi Solr Users,
> >
> > Could you help me with the bq syntax that supports boosting a term with
> > caret
> > <https://solr.apache.org/guide/7_2/the-standard-query-parser.html#boosting-a-term-with>?
> > Given the following boost query, I need to multiply the payload value with
> > 10.
> >
> > bq={!payload_score f=field_name v='solr' func=sum}
> >
> > I tried the following but no luck -
> > bq=({!payload_score f=field_name v='solr' func=sum})^10
> > bq={!payload_score f=field_name v='solr'^10 func=sum} (gives syntax error)
> >
> > Thank you,
> > Rajani
> >