Re: Relative cpu cost of fetching term frequency during scoring

2023-06-26 Thread Adrien Grand
This is a bit surprising. Can you share the profiler output (e.g. a screenshot) so we can see what is slow within the `PostingsEnum#freq` call? `PostingsEnum#freq` may need to decode a block of freqs, but I would generally not expect it to be 5x slower than decoding doc IDs for the same block. On Thu, Ju

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-21 Thread Vimal Jain
I profiled the new code and found that the following API call is the most time-consuming: `org.apache.lucene.index.PostingsEnum#freq`. If I comment out this call and use a random integer instead for testing purposes, then performance is at least 5x that of the old code. Are there any thoughts on why term frequenc

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-21 Thread Adrien Grand
As far as your performance problem is concerned, I don't know. Can you compare the number of documents that need to be evaluated in both cases, e.g. by running `IndexSearcher#count` on your two queries? If they're similar, can you run your new query under a profiler to figure out what its bottlenec

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
Thanks Adrien, I had a look at your blog post. It looks like `Scorer#getMaxScore` was added in Lucene 8.0; I am using 7.7.3. A side question: is there any resource to help migrate to a newer major version? I see a lot of the API changed from v7 to v8. *Thanks and Regards,* *Vimal Jain* On Wed, Jun 21,

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
Lucene has logic to only evaluate a subset of the matching documents when retrieving top-k hits. This leverages the Scorer#getMaxScore API. If you never implemented it on your custom query, then you never took advantage of dynamic pruning anyway. I wrote a bit more about it
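To make the idea of dynamic pruning concrete, here is a minimal, self-contained sketch. It is a toy model and not Lucene's implementation: the `Block` class, its precomputed `maxScore`, and `countScoredDocs` are hypothetical names invented for illustration. The only point it shows is that once the top-k heap is full, any block whose score upper bound cannot beat the current k-th best score can be skipped without scoring its documents. A query that cannot supply such an upper bound gives the collector nothing to skip on.

```java
import java.util.PriorityQueue;

public class DynamicPruningSketch {

    /** Hypothetical block of postings: per-document scores plus an upper bound. */
    public static final class Block {
        final float[] scores;
        final float maxScore; // upper bound on any score in this block

        public Block(float... scores) {
            this.scores = scores;
            float max = Float.NEGATIVE_INFINITY;
            for (float s : scores) {
                max = Math.max(max, s);
            }
            this.maxScore = max;
        }
    }

    /** Returns how many documents had to be scored to collect the top-k hits. */
    public static int countScoredDocs(Block[] blocks, int k) {
        PriorityQueue<Float> topK = new PriorityQueue<>(); // min-heap: peek() is the k-th best score
        int scored = 0;
        for (Block block : blocks) {
            // Once the heap is full, skip blocks that cannot produce a competitive hit.
            if (topK.size() == k && block.maxScore <= topK.peek()) {
                continue;
            }
            for (float score : block.scores) {
                scored++;
                if (topK.size() < k) {
                    topK.add(score);
                } else if (score > topK.peek()) {
                    topK.poll();
                    topK.add(score);
                }
            }
        }
        return scored;
    }

    public static void main(String[] args) {
        Block[] blocks = {
            new Block(5f, 4f), new Block(1f, 2f), new Block(6f, 0.5f), new Block(3f, 3f)
        };
        // Two of the four blocks are skipped for k = 2.
        System.out.println(countScoredDocs(blocks, 2) + " of 8 docs scored"); // prints "4 of 8 docs scored"
    }
}
```

With `k` at least as large as the number of documents, the heap never fills before the last document, so nothing is skipped and every document is scored.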

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
Thanks, Adrien, for the quick response. Yes, I am replacing disjuncts across multiple fields with a single custom term query over a merged field. Can you please provide more details on what you mean by dynamic pruning in the context of a custom term query? On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, wrote:

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
Intuitively replacing a disjunction across multiple fields with a single term query should always be faster. You're saying that you're storing the type of token as part of the term frequency. This doesn't sound like something that would play well with dynamic pruning, so I wonder if this is the re

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
OK, sorry, I realized that I need to provide more context. We used to create a Lucene query which consisted of custom term queries for different fields, and based on the type of field, we used to assign a boost that would be used in scoring. Now we want to get rid of different fields and inst

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
You say you observed a performance drop; what are you comparing against?

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-19 Thread Vimal Jain
Note: I am using Lucene 7.7.3. *Thanks and Regards,* *Vimal Jain* On Tue, Jun 20, 2023 at 12:26 PM Vimal Jain wrote: > Hi, I want to understand whether fetching the term frequency of a term during scoring is a relatively CPU-bound operation. Context - I am storing custom term frequency during