date:20240327

Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald

Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project
  dev or user mailing lists; it is not being sent to you directly. It is
  important that we reach all of our users and contributors/committers so
  that they may get a chance to benefit from this. We apologise in advance
  if this doesn't interest you, but it is on topic for the mailing lists of
  the Apache Software Foundation, so please do not mark this as spam in
  your email client. Thank you! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA in Denver, Colorado, from
October 7th to 10th, 2024.

TAC exists to help those who would like to attend Community over Code
events but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at https://tac.apache.org/. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications, which should contain as much supporting material as
required to process their request efficiently and accurately. This
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to receive applications from a diverse range of
backgrounds; therefore, we encourage (as always) anyone thinking about
sending in an application to do so ASAP.

For those who will need a visa to enter the country: we advise you to apply
now so that you have enough time in case of interview delays. Do not
wait until you know whether you have been accepted.

We look forward to greeting many of you in Denver, Colorado, in October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)

Re: Slow performance for phrases with terms with high ttf

2024-03-27 Thread Chris Hostetter


This is also the sort of thing CommonGramsFilter was designed for...

https://solr.apache.org/guide/solr/latest/indexing-guide/filters.html#common-grams-filter
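A simplified Python sketch of the idea behind CommonGramsFilter follows. It models the concept only and is not Lucene's implementation. The common-word set and the "_" separator are assumptions chosen for the example.

At index time, every token is kept, and a bigram is added wherever the current or the next token is a common word. At query time, the phrase is matched on those bigrams, which are far rarer than a term like "of".

```python
def common_grams(tokens, common_words, sep="_"):
    """Index-side sketch: keep every token and add a bigram wherever
    the current or the next token is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(tok + sep + nxt)
    return out


def common_grams_query(tokens, common_words, sep="_"):
    """Query-side sketch: emit the bigrams, and keep a single token only
    when no bigram covers it."""
    covered = set()
    grams = []  # (start position, bigram)
    for i in range(len(tokens) - 1):
        if tokens[i] in common_words or tokens[i + 1] in common_words:
            grams.append((i, tokens[i] + sep + tokens[i + 1]))
            covered.update((i, i + 1))
    singles = [(i, t) for i, t in enumerate(tokens) if i not in covered]
    return [t for _, t in sorted(grams + singles)]


common = {"of"}  # hypothetical common-word list
print(common_grams(["bill", "of", "sale"], common))        # ['bill', 'bill_of', 'of', 'of_sale', 'sale']
print(common_grams_query(["bill", "of", "sale"], common))  # ['bill_of', 'of_sale']
```

With "of" treated as common, the phrase "bill of sale" is matched on "bill_of" and "of_sale". The very long position list for "of" itself no longer has to be intersected.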


: Date: Mon, 25 Mar 2024 10:17:48 -0400
: From: Doug Turnbull 
: Reply-To: users@solr.apache.org
: To: users@solr.apache.org
: Subject: Re: Slow performance for phrases with terms with high ttf
: 
: As someone currently implementing a lot of positional search from scratch
: (in a different side-project), I can say it's totally expected behavior
: that high TTF / DF terms would be harder. To match the phrase there are
: simply more candidate documents and positions to intersect, so it's
: naturally a tougher problem.
: 
: If you think about how phrase search works, you might roughly think you
: 1. Find all documents with every term
: 2. Iterate positions of these documents so that "Bill" is exactly one
: before "Of" exactly one before "sale"... etc
: 
: I'd say the best you could do is:
: 
: 1. Make sure your index can fit in memory.
: 2. Ensure you add any filters (fq) if you have any mandatory requirements.
: Add a filter cache. Don't cache anything that's query-dependent.
: 3. If it's a really common phrase, think about tokenizing it into a single
: term "bill of sale" -> "bill_of_sale", which you could do outside the search
: engine or with text analysis. The downside is that you lose the ability to
: match the individual terms. You could of course create a different field
: for these significant phrases if it's important.
: 
: Best
: -Doug
: 
: On Mon, Mar 25, 2024 at 6:40 AM Sjoerd Smeets  wrote:
: 
: > There is a typo in my email. The term list should be like this:
: >
: >
: >- "bill" -> df = 1.879.324, ttf = 14.145.950
: >- "note" -> df = 8.479.826, ttf = 151.249.542
: >- "sale" -> df = 7.557.685, ttf = 12.0948.163
: >- "of" -> df = 21.244.060, ttf = 6.879.196.700
: >
: >
: > On Mon, Mar 25, 2024 at 8:56 AM Sjoerd Smeets  wrote:
: >
: > > Hi,
: > >
: > > We are experiencing quite a performance decrease when searching for
: > > phrases that have terms with a high ttf value.
: > >
: > > E.g. searching for "note of sale" is around 3 times slower (~10 sec) than
: > > "bill of sale" (~3 sec). This behaviour is consistent and can also be
: > > reproduced when we use other terms that have a high ttf. We are
: > > querying the unstemmed index.
: > >
: > > Terms (numDocs: 26220184):
: > >
: > >- "bill" -> df = 1.879.324, ttf = 14.145.950
: > >- "note" -> df = 8.479.826, ttf = 151.249.542
: > >- "sale" -> df = 7.557.685, ttf = 12.0948.163
: > >- "bill" -> df = 21.244.060, ttf = 6.879.196.700
: > >
: > >
: > > Is this the expected behaviour or is there something that can be
: > > tuned, like a cache setting?
: > >
: > > Thanks,
: > > Sjoerd
: > >
: >
: 

-Hoss
http://www.lucidworks.com/
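Doug's two steps can be sketched with a toy positional index. This is a simplified Python model and not how Lucene's phrase matching is implemented. Lucene walks sorted postings lists rather than building sets; the sketch favours readability. The postings layout (term -> {doc_id: sorted positions}) is an assumption.

- **Step 1** intersects the documents containing every term.
- **Step 2** checks whether each term sits at the expected offset from the first.

The work in step 2 grows with the number of positions per candidate document. By Sjoerd's corrected figures, "of" averages roughly 324 occurrences in each document that contains it (6,879,196,700 / 21,244,060). The corresponding averages are about 17.8 for "note" and 7.5 for "bill".

```python
def phrase_match(postings, phrase):
    """Return the ids of documents in which the terms of `phrase` occur at
    consecutive positions.

    postings maps term -> {doc_id: sorted list of positions in that doc}.
    """
    if not phrase:
        return []
    term_postings = [postings.get(term, {}) for term in phrase]

    # Step 1: candidate documents must contain every term of the phrase.
    candidates = set(term_postings[0])
    for docs in term_postings[1:]:
        candidates &= set(docs)

    matches = []
    for doc in sorted(candidates):
        # Step 2: term k of the phrase must appear at position start + k.
        position_sets = [set(docs[doc]) for docs in term_postings]
        for start in term_postings[0][doc]:
            if all(start + k in position_sets[k] for k in range(1, len(phrase))):
                matches.append(doc)
                break
    return matches


# Toy index: "of" appears in more documents and at more positions than the rest.
postings = {
    "bill": {1: [0], 2: [5]},
    "of": {1: [1, 7], 2: [2, 9], 3: [0]},
    "sale": {1: [2], 2: [3]},
}
print(phrase_match(postings, ["bill", "of", "sale"]))  # [1]
```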

Performance Suggestion for Dense Vectors

2024-03-27 Thread Iram Tariq

Hi All,

I am using dense vectors in Solr and searches are slow: each search is
taking 10-25 seconds. I want to reduce the time to 5 seconds (or less,
ideally).

The following configuration is being used:


   1. *Solr Version:* 9.3.0
   2. *Lucene Version:* 9.7.0
   3. *Vector Dimensions:* 384
   4. *Total Shards:* 5
   5. *Number of Vectors (per shard):* 43209158
   6. *JVM for each Instance:* 35GB
   7. *TopK:* 1000 (getting 1000 from each shard)
   8. *Rows:* 1000
   9. *Vector Field Schema:* class="solr.DenseVectorField" hnswMaxConnections="20"
      knnAlgorithm="hnsw" vectorDimension="384" similarityFunction="cosine"
      hnswBeamWidth="40"/>
   10. *Stored:* False
   11. *WebServer:* Apache Tomcat
   12. *System Specs:* Linux (CPU: 64, RAM: 488 GB, OS: Ubuntu 20.04.6)

Any sort of help/clue will be appreciated.



Regards,


Iram Tariq | Software Architect

NorthBay

Direct:  +1 (902) 329-7329

iram.ta...@northbaysolutions.net

www.northbaysolutions.com
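Here is a back-of-the-envelope estimate of the raw vector data implied by the figures above. It assumes each of the 384 dimensions is stored as a 4-byte float32. It ignores the HNSW graph and all other index structures, so the real footprint is larger.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_dim=4):
    """Bytes needed for the vector values alone: no HNSW graph, no other index data."""
    return num_vectors * dimensions * bytes_per_dim


GIB = 2**30
per_shard = raw_vector_bytes(43_209_158, 384)  # figures from the post above
total = 5 * per_shard                          # five shards
print(f"per shard:  {per_shard / GIB:.1f} GiB")  # per shard:  61.8 GiB
print(f"all shards: {total / GIB:.1f} GiB")      # all shards: 309.1 GiB
```

The post does not say how the five shards are spread across machines. The sketch therefore cannot tell whether this data fits in the OS page cache left over after the 35GB heaps. It only gives an order of magnitude to compare against the available RAM.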

Fwd: Solr 9 punctuation issue

2024-03-27 Thread J Zhu

I am migrating from Solr 3 to Solr 9 and have run into two issues:

1. Solr 9 returns no results when punctuation is included within quotes (a
phrase search), such as the query: "Electric Vehicles: Charging Points"
2. Solr 9 treats smart and non-smart apostrophes as different. Could Solr 9
treat them as the same for search? For example, "dog’s breakfast" and
"dog's breakfast" return different results.

Are these configuration issues that can be fixed?

Thanks.

Best regards,
John
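One common approach to both points is to normalize text identically at index time and query time. Typographic apostrophes are mapped to the ASCII one, and punctuation is treated purely as a token separator. The quoted phrase then reduces to the same token sequence as the indexed text.

In Solr this belongs in the field's analyzer chain, for example a character filter such as MappingCharFilterFactory. The Python below only models the idea; it is not Solr's tokenizer. The mapping table and the regular expression are assumptions.

```python
import re

# Hypothetical mapping table: typographic quotes folded to their ASCII forms.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (the "smart" apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def analyze(text):
    """Fold smart quotes, lowercase, then keep runs of word characters and
    apostrophes, so punctuation such as ':' only separates tokens."""
    return re.findall(r"[\w']+", text.translate(QUOTE_MAP).lower())


print(analyze("Electric Vehicles: Charging Points"))
print(analyze("dog\u2019s breakfast") == analyze("dog's breakfast"))  # True
```

Whatever the actual analyzer chain looks like, the same normalization has to run on both the indexed text and the query. Documents indexed before the change would need reindexing.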

Re: Performance Suggestion for Dense Vectors

2024-03-27 Thread Kent Fitch

Hi Iram,

Is the machine doing lots of IO? If the hnsw graphs are not entirely in
memory, performance will be poor. What JVM? You may get some benefit from
simd support in java 21. Can you use the latest quantisation changes in
Lucene to reduce memory footprint of the hnsw graphs? That's a large topk,
but I guess you need it?

Best regards

Kent Fitch

On Thu, 28 Mar 2024, 5:12 am Iram Tariq wrote:

> Hi All,
>
> I am using Dense vectors in SOLR and facing slowness in it. Each search is
> taking 10-25 seconds. I want to reduce the time to 5 seconds (or less
> ideally).
>
> Following configurations are being used.
>
>
>1. *SOLR Version:* 9.3.0
>2. *Lucene Version:* 9.7.0
>3. *Vector Dimensions*: 384
>4. *Total Shards:* 5
>5. *Number of Vectors (Per shard*): 43209158
>6. *JVM for each Instance:* 35GB
>7. *TopK: *1000  (Getting 1000 from each shard)
>8. *Rows: *1000
>9. *Vector Field Schema:* class="solr.DenseVectorField" hnswMaxConnections="20"
> knnAlgorithm="hnsw" vectorDimension="384" similarityFunction="cosine"
> hnswBeamWidth="40"/>
>10. *Stored*: False
>11. *WebServer:* Apache Tomcat
>12. *System Specs*:  Linux ( CPU:64, RAM:488 GB, OS:Ubuntu 20.04.6 )
>
> Any sort of help/clue will be appreciated.
>
>
>
> Regards,
>
>
> Iram Tariq | Software Architect
>
> NorthBay
>
> Direct:  +1 (902) 329-7329
>
> iram.ta...@northbaysolutions.net
>
> www.northbaysolutions.com
>
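Kent's quantisation suggestion is about storing fewer bytes per dimension. The minimal sketch below uses linear (scalar) int8 quantization. Each float component in an assumed range [lo, hi] is mapped onto one signed byte, which shows the 4x reduction compared with float32. Lucene's own quantization differs in detail, for example in how the range is chosen, so this models the idea only.

```python
def quantize_int8(vector, lo, hi):
    """Linearly map floats in [lo, hi] onto the signed byte range -128..127.
    Values outside the range are clamped first."""
    scale = 255.0 / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) - 128 for x in vector]


def dequantize_int8(codes, lo, hi):
    """Approximate inverse: map each byte code back to a float in [lo, hi]."""
    step = (hi - lo) / 255.0
    return [(c + 128) * step + lo for c in codes]


vec = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes = quantize_int8(vec, -1.0, 1.0)
restored = dequantize_int8(codes, -1.0, 1.0)
print(codes)  # [-128, -32, 0, 63, 127]

dims = 384  # dimension count from the thread
print(f"float32: {dims * 4} bytes per vector, int8: {dims} bytes per vector")
```

The cost is precision. Each restored value is only within half a quantization step of the original, which can shift similarity scores slightly.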

Re: edismax boost query(bq) with local params syntax

2024-03-27 Thread Robi Petersen

Hi Rajani,

I used to use the '_val_' hook extensively and tend to prefer its syntax
in general... I think that bq and _val_ are generally equivalent, but I
could be wrong.

Thx
Robi

On Mon, Mar 25, 2024 at 11:40 AM rajani m  wrote:

> ok, I figured, the syntax -  bq= _query_:"" AND _val_:"" seems to be
> working.
>
>
> On Mon, Mar 25, 2024 at 1:44 PM rajani m  wrote:
>
> > Hi Solr Users,
> >
> >   Could you help me with the bq syntax that supports boosting a term with
> > caret
> > (https://solr.apache.org/guide/7_2/the-standard-query-parser.html#boosting-a-term-with)?
> > Given the following boost query, I need to multiply the payload value
> > with 10.
> >
> > bq={!payload_score f=field_name v='solr' func=sum}
> >
> > I tried the following but no luck -
> > bq=({!payload_score f=field_name v='solr' func=sum})^10
> > bq={!payload_score f=field_name v='solr'^10 func=sum} (gives syntax error)
> >
> > Thank you,
> > Rajani
> >
> >
> >
> >
>
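One untested alternative to the syntax in this thread is sketched below as the request parameters a client might send. Solr's boost query parser ({!boost b=... v=...}) multiplies the score of the wrapped query by the function given in b. Parameter dereferencing (v=$pq) avoids nesting one local-params string inside another.

Some details are hypothetical: the parameter name pq and the main query q=solr are placeholders. Whether this gives the scoring Rajani wants has not been verified here, so treat it as a starting point to test against your Solr version. Python's urlencode is used only to show the encoded query string.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "defType": "edismax",
    "q": "solr",  # hypothetical main query
    # Untested: the boost parser multiplies the wrapped query's score by b (here the constant 10).
    "bq": "{!boost b=10 v=$pq}",
    # The payload_score query from the thread, referenced above as $pq (hypothetical param name).
    "pq": "{!payload_score f=field_name v='solr' func=sum}",
}

query_string = urlencode(params)
print(query_string)

# Decoding again shows the local-params strings survive URL encoding unchanged.
decoded = parse_qs(query_string)
print(decoded["bq"])  # ['{!boost b=10 v=$pq}']
```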
