Re: An interesting case

2021-06-08 Thread baris . kazar
OK, I think I fully understand now, thanks. https://stackoverflow.com/questions/15589186/lucene-4-pagination This post was really good, and I would like similar text to appear in the Javadocs, as it would help everyone. "I agree with the solution explained by Jaimie. But I wan

Re: An interesting case

2021-06-08 Thread baris . kazar
May I suggest again? The Lucene Javadocs need to be enhanced. They need more information, explanations of the parameters and, more importantly, an explanation of why these two classes (TopScoreDocCollector vs IndexSearcher) differ in performance. Thanks On 6/8/21 2:07 PM, baris.ka

Re: An interesting case

2021-06-08 Thread baris . kazar
Yes, I see sometimes 4000+ and sometimes 3000+ hits in totalHits. So TopScoreDocCollector is working underneath the IndexSearcher.search API, right? In other words, TopScoreDocCollector will be saving time, right? Thanks On 6/8/21 1:27 PM, Adrien Grand wrote: Yes, for instance if you care about

Re: An interesting case

2021-06-08 Thread Adrien Grand
Yes, for instance if you care about the top 10 hits only, you could call TopScoreDocCollector.create(10, null, 10). By default, IndexSearcher is configured to count at least 1,000 hits, and creates its top docs collector with TopScoreDocCollector.create(10, null, 1000). On Tue, Jun 8, 2021 at 7:
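To make the effect of the threshold concrete, here is a toy Java model. It is stdlib-only and is NOT Lucene's code: the class `ThresholdModel`, its `search` method and the sample data are all hypothetical. It keeps the best `numHits` scores in a min-heap, counts every match it scores, and, once at least `totalHitsThreshold` hits have been counted, skips whole blocks whose best score cannot enter the top hits, loosely in the spirit of Lucene's block-max skipping. The data is contrived so that scores fall block by block, which makes skipping unusually effective. In Lucene 8.x the corresponding real calls would be `TopScoreDocCollector.create(10, null, 10)`, `searcher.search(query, collector)` and `collector.topDocs()`, where `topDocs.totalHits.relation` says whether `totalHits.value` is exact (`EQUAL_TO`) or a lower bound (`GREATER_THAN_OR_EQUAL_TO`).

```java
import java.util.PriorityQueue;

// Toy model, NOT Lucene's implementation: a top-k collector that counts hits
// exactly until a threshold, then skips blocks that cannot enter the top k.
// All class, method and field names here are hypothetical.
public class ThresholdModel {

    static final class Result {
        final float[] topScores;  // best scores, descending (like scoreDocs)
        final long totalHits;     // matches actually counted (like totalHits.value)
        final boolean lowerBound; // true ~ GREATER_THAN_OR_EQUAL_TO, false ~ EQUAL_TO

        Result(float[] topScores, long totalHits, boolean lowerBound) {
            this.topScores = topScores;
            this.totalHits = totalHits;
            this.lowerBound = lowerBound;
        }
    }

    static Result search(float[][] blocks, int numHits, int totalHitsThreshold) {
        PriorityQueue<Float> top = new PriorityQueue<>(); // min-heap: weakest kept hit on top
        long counted = 0;
        boolean skippedAny = false;
        for (float[] block : blocks) {
            // A real index stores a per-block max score; recomputed here for brevity.
            float blockMax = Float.NEGATIVE_INFINITY;
            for (float s : block) {
                blockMax = Math.max(blockMax, s);
            }
            boolean mayTerminate = counted >= totalHitsThreshold && top.size() == numHits;
            if (mayTerminate && blockMax <= top.peek()) {
                skippedAny = true; // these matches are never scored or counted
                continue;
            }
            for (float s : block) {
                counted++; // every scored match is counted
                if (top.size() < numHits) {
                    top.add(s);
                } else if (s > top.peek()) {
                    top.poll();
                    top.add(s);
                }
            }
        }
        float[] best = new float[top.size()];
        for (int i = best.length - 1; i >= 0; i--) {
            best[i] = top.poll(); // smallest comes out first, so fill from the end
        }
        return new Result(best, counted, skippedAny);
    }

    // Contrived data: 100 blocks x 50 matching docs, later blocks score lower.
    static float[][] sampleBlocks() {
        float[][] blocks = new float[100][50];
        for (int b = 0; b < blocks.length; b++) {
            for (int i = 0; i < blocks[b].length; i++) {
                blocks[b][i] = ((b * 31 + i * 17) % 97) / (1f + b);
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        float[][] blocks = sampleBlocks();
        for (int threshold : new int[] {10, 1000, 10_000}) {
            Result r = search(blocks, 10, threshold);
            System.out.printf("threshold=%d: %d scoreDocs, totalHits %s %d%n",
                threshold, r.topScores.length, r.lowerBound ? ">=" : "==", r.totalHits);
        }
        // threshold=10: 10 scoreDocs, totalHits >= 50
        // threshold=1000: 10 scoreDocs, totalHits >= 1000
        // threshold=10000: 10 scoreDocs, totalHits == 5000
    }
}
```

All three runs return the same 10 top scores; only the amount of counting changes. In the toy, the reported count is simply however many matches were scored before skipping took over, which suggests why a real totalHits can be larger than scoreDocs.length and land at values like 3000+ or 4000+.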

Re: An interesting case

2021-06-08 Thread baris . kazar
OK, I think you meant something else here. You are not referring to the calculation of the total number of hits or to the mismatch, right? So, to make Lucene do the minimum work needed to reach the matched docs, TopScoreDocCollector should be used, right? Let me check this class. Thanks On 6/8/21 1:16 PM, baris.ka..

Re: An interesting case

2021-06-08 Thread baris . kazar
Adrien, as I mentioned, my concern is not actually the number mismatch; it is the performance. Seeing those numbers mismatch, it seems that Lucene is still doing the same amount of work no matter how many results you request from the IndexSearcher search API. I thought I was clear on tha

Re: An interesting case

2021-06-08 Thread Adrien Grand
If you don't need any information about the total hit count, you could create a TopScoreDocCollector that has the same value for numHits and totalHitsThreshold. This way Lucene will spend as little energy as possible computing the number of matches of the query. On Tue, Jun 8, 2021 at 6:28 PM wro

Re: On which field document is searched

2021-06-08 Thread baris . kazar
I guess you can set up an experiment: search your text against each field and then look at the scores. But you need to normalize the scores in order to compare them, and the normalization will probably include the length of the field, etc. Maybe there is an API in Lucene for this, but I don't know. Hope this
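The per-field experiment described above can be sketched in a few lines. This is a toy, stdlib-only illustration: `BestFieldSketch`, `fieldScore` and `bestField` are hypothetical names, and the score (matching term occurrences divided by the square root of the field's token count) is an assumed stand-in, loosely echoing the 1/sqrt(length) norm of Lucene's classic TF-IDF similarity, not Lucene's actual BM25 scoring. Inside Lucene itself, `IndexSearcher.explain(Query, int)` returns a score breakdown per query clause, which can serve a similar purpose when the query has one clause per field.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy sketch: score the query against each field separately, normalize by
// field length, and report the best-matching field. The formula is hypothetical.
public class BestFieldSketch {

    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // Matching term occurrences / sqrt(field length): one hit in a short
    // title outweighs one hit in a long body.
    static double fieldScore(String query, String fieldText) {
        String[] fieldTokens = tokenize(fieldText);
        if (fieldTokens.length == 0) {
            return 0.0;
        }
        int matches = 0;
        for (String q : tokenize(query)) {
            for (String t : fieldTokens) {
                if (t.equals(q)) {
                    matches++;
                }
            }
        }
        return matches / Math.sqrt(fieldTokens.length);
    }

    // Returns the name of the highest-scoring field, or null if nothing matches.
    static String bestField(String query, Map<String, String> doc) {
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            double s = fieldScore(query, e.getValue());
            if (s > bestScore) {
                best = e.getKey();
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene pagination");
        doc.put("body", "How to page through Lucene results with searchAfter and a collector");
        System.out.println(bestField("lucene", doc)); // prints "title"
    }
}
```

Both fields contain "lucene" once, but the two-token title scores 1/sqrt(2) while the eleven-token body scores only 1/sqrt(11), which is the kind of length normalization the reply alludes to.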

Re: An interesting case

2021-06-08 Thread baris . kazar
I am currently happy with Lucene's performance, but I want to understand it and speed it up further by concretely limiting the results. So I still do not know why totalHits and scoreDocs report different numbers of hits. Best regards On 6/8/21 2:52 AM, Baris Kazar wrote: my worry is actually about t

On which field document is searched

2021-06-08 Thread Vivek Gobhil
Hi, I am creating a full-text search API, and one of my requirements is to find out which exact field the input text matched when the document has, say, more than 10 fields. Is there any way I can find out which field in the document is most relevant to the input search text? Thanks i