performance of contextquery with many (~1000) contexts?

2025-01-10 Thread Rob Audenaerde
Hi all, I'm trying to build an (Elastic) suggester that uses contexts in completion queries to implement authorization for these suggestions. Basically, I only want suggestions from the contexts where the user has rights. (Not sure if this is the best way; suggestions (no pun intended) welcome.) What
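A toy model of the authorization idea: each suggestion carries one context, and a lookup only returns suggestions whose context is in the set the user is allowed to see. This is a plain-Java sketch with hypothetical names (ContextFilteredSuggester, Suggestion). It post-filters candidates and says nothing about how Lucene's ContextQuery or the Elasticsearch completion suggester evaluate contexts internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of context-restricted suggestions.
// Not how Lucene's ContextQuery or Elasticsearch's context suggester work internally.
final class ContextFilteredSuggester {
    record Suggestion(String text, String context) {}

    // Sorted by text, so all keys sharing a prefix form one contiguous range.
    private final TreeMap<String, List<Suggestion>> byText = new TreeMap<>();

    void add(String text, String context) {
        byText.computeIfAbsent(text, k -> new ArrayList<>()).add(new Suggestion(text, context));
    }

    // Returns up to max suggestions starting with prefix whose context the user may see.
    List<String> suggest(String prefix, Set<String> allowedContexts, int max) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Suggestion>> e : byText.tailMap(prefix, true).entrySet()) {
            if (out.size() >= max || !e.getKey().startsWith(prefix)) break;
            for (Suggestion s : e.getValue()) {
                if (allowedContexts.contains(s.context())) {
                    out.add(s.text());
                    break; // one hit per distinct suggestion text is enough
                }
            }
        }
        return out;
    }
}
```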

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-16 Thread Navneet Verma
can try this to see if that helps. But I doubt that in this case. On opening the issue, I am working through some reproducible benchmarks

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-15 Thread Uwe Schindler
W graph searching is crazy slow... Mike McCandless http://blog.mikemccandless.com On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <vermanavneet...@gmail.com> wrote: Hi Lucene Experts, I wanted to understand the performance difference between opening and reading the whole file using an IndexIn

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-15 Thread Navneet Verma
ges to ensure that checksumming is always done with IOContext.READ_ONCE (which uses READ behind the scenes). Uwe On 29.09.2024 at 17:09, Michael McCandless wrote:

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Navneet Verma
use MADV_RANDOM (which is stupid), that is indeed expected to perform worse since there is no readahead pre-caching. 50% worse (what you are seeing) is indeed quite an impact ... May

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
hing is crazy slow... Mike McCandless http://blog.mikemccandless.com On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <vermanavneet...@gmail.com> wrote: Hi Lucene Experts, I wanted to understand the performance difference between opening and reading the whole file using an IndexInput with IoContext

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
m On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <vermanavneet...@gmail.com> wrote: Hi Lucene Experts, I wanted to understand the performance difference between opening and reading the whole file using an IndexInput with IoContext as RANDOM vs READ. I can see .vec files (storing the flat vectors) ar

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Navneet Verma
is to use RANDOM for normal reading of the index and use the other IOContexts only for merging. If this requires files to be opened multiple times, it's a better compromise. Yeah, I was thinking of doing something similar. But I am not 100% sure what would be the performance degradation of opening files

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
azon (product search) for our production searching processes. Otherwise paging in all .vec/.veq pages via random access provoked through HNSW graph searching is crazy slow... Mike McCandless http://blog.mikemccandless.com On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma wrote: Hi Lucene Experts, I wante

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-30 Thread Navneet Verma
all bytes/pages in .vec/.veq files -- this asks the OS to cache all of those bytes into page cache (if there is enough free RAM). We do this at Amazon (product search) for our production searching processes. Otherwise paging in all .vec/.veq pages via random access pr
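The warming trick Mike describes can be approximated with plain java.nio: map the file read-only and call MappedByteBuffer.load(), which asks the OS to bring every page of the mapping into memory. This is a sketch under assumptions: the class and method names are hypothetical, it is not the code used at Amazon, and load() is best effort, so the OS may still evict the pages later under memory pressure.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: pre-touches every page of a file (e.g. a .vec/.veq file)
// so that later random reads are more likely to be served from the page cache.
final class PageCacheWarmer {
    // FileChannel.map accepts at most Integer.MAX_VALUE bytes per mapping.
    private static final long MAX_CHUNK = Integer.MAX_VALUE;

    // Returns the number of bytes covered. Best effort only.
    static long warm(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += MAX_CHUNK) {
                long length = Math.min(MAX_CHUNK, size - pos);
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, pos, length);
                buffer.load(); // touches each page of this mapping
            }
            return size;
        }
    }
}
```

Calling warm(path) on each .vec/.veq file after opening the index is one way to apply it; as Mike notes, it only pays off if there is enough free RAM to hold those files.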

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-30 Thread Uwe Schindler
candless.com On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma wrote: Hi Lucene Experts, I wanted to understand the performance difference between opening and reading the whole file using an IndexInput with IoContext as RANDOM vs READ. I can see .vec files (storing the flat vectors) are opened with RANDOM

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-29 Thread Michael McCandless
via random access provoked through HNSW graph searching is crazy slow... Mike McCandless http://blog.mikemccandless.com On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma wrote: Hi Lucene Experts, I wanted to understand the performance difference between opening and reading the whol

Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-29 Thread Navneet Verma
Hi Lucene Experts, I wanted to understand the performance difference between opening and reading the whole file using an IndexInput with IoContext as RANDOM vs READ. I can see .vec files (storing the flat vectors) are opened with RANDOM, whereas .dvd files are opened with READ. As per my testing

Re: Question about the performance of Lucene99PostingsFormat

2024-09-16 Thread Rui Wu
(Lucene GITHUB#12696 <https://github.com/apache/lucene/pull/12696>: Change Postings back to using FOR in Lucene99PostingsFormat. Freqs, positions and offset keep using PFOR) However, in our (MongoDB Atlas Search) in
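For readers unfamiliar with the two encodings: FOR packs every delta in a block with the bit width of the largest one, while PFOR picks a narrower width and stores the few values that do not fit as exceptions, trading a smaller payload for an extra patching step when decoding. The sketch only computes the widths and payload size for one block. The block size of 128 and the cap of 7 exceptions are assumptions for illustration, the class name is hypothetical, and Lucene's actual ForUtil/PForUtil encoding differs in detail.

```java
// Illustrative comparison of frame-of-reference (FOR) and patched FOR (PFOR)
// bit widths for one block of doc-ID deltas. Not Lucene's ForUtil/PForUtil code.
final class ForVsPfor {
    // FOR: every value is packed with the width of the largest value.
    static int forBits(long[] block) {
        long or = 0;
        for (long v : block) or |= v; // highest set bit of the OR = highest bit of the max
        return 64 - Long.numberOfLeadingZeros(or);
    }

    // PFOR: shrink the width while at most maxExceptions values overflow it;
    // the overflowing values would be stored separately as exceptions.
    static int pforBits(long[] block, int maxExceptions) {
        int bits = forBits(block);
        while (bits > 0) {
            long limit = (1L << (bits - 1)) - 1; // largest value representable in bits - 1
            int exceptions = 0;
            for (long v : block) {
                if (v > limit) exceptions++;
            }
            if (exceptions > maxExceptions) break;
            bits--;
        }
        return bits;
    }

    // Size of the packed payload in bytes (exception storage not counted).
    static int packedBytes(int numValues, int bits) {
        return (numValues * bits + 7) / 8;
    }
}
```

For a 128-value block of small deltas (1 to 3) with a single outlier of 1000, FOR needs 10 bits per value (160 bytes), while PFOR gets away with 2 bits (32 bytes) plus one exception.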

Re: Question about the performance of Lucene99PostingsFormat

2024-09-11 Thread Rui Wu
erts, I have a question about the following change: Lucene 9.11 changed the postings list format (Lucene GITHUB#12696 <https://github.com/apache/lucene/pull/12696>: Change Postings back to using FOR in Lucene99PostingsFormat. Freqs,

Re: Question about the performance of Lucene99PostingsFormat

2024-09-10 Thread Adrien Grand
thub.com/apache/lucene/pull/12696>: Change Postings back to using FOR in Lucene99PostingsFormat. Freqs, positions and offset keep using PFOR) However, in our (MongoDB Atlas Search) internal performance testing, we saw an increase in query latency of up to 32% on match-all

Question about the performance of Lucene99PostingsFormat

2024-09-10 Thread Rui Wu
in our (MongoDB Atlas Search) internal performance testing, we saw an increase in query latency of up to 32% on match-all and match-many inverted-index-based queries, e.g. query.phrase-slop-0 and query.date-facet-match-all. I wonder if the community sees similar performance regressions on some queri

Re: Performance changes within the Lucene 8 branch

2023-12-14 Thread Michael McCandless
s. Mike McCandless http://blog.mikemccandless.com On Tue, Dec 12, 2023 at 4:36 PM Marc Davenport wrote: Hello, We have a search application built around Lucene 8. Motivated by the list of performance enhancements and optimizations in the change notes we upgraded from 8.1 to 8.11.2

Performance changes within the Lucene 8 branch

2023-12-12 Thread Marc Davenport
Hello, We have a search application built around Lucene 8. Motivated by the list of performance enhancements and optimizations in the change notes we upgraded from 8.1 to 8.11.2. We track the performance of different activities within our application and can clearly see an improvement in our

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-09 Thread Michael McCandless
k access for every single byte during readByte(). Does this warrant a JIRA for regression? As mentioned, I am noticing a 10x slowdown in SegmentTermsEnum.seekExact() affecting atomic update performance. For setups like mine that can't use

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-07 Thread Adrien Grand
y causing disk access for every single byte during readByte(). Does this warrant a JIRA for regression? As mentioned, I am noticing a 10x slowdown in SegmentTermsEnum.seekExact() affecting atomic update performance. For setups like mine that can't use mmap due

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Rahul Goswami
b.com/apache/lucene/issues/10297) I understand that this is essentially causing disk access for every single byte during readByte(). Does this warrant a JIRA for regression? As mentioned, I am noticing a 10x slowdown in SegmentTermsEnum.seekExact() affecting atomic update performance. For setups

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
Yes, this changed in 8.x:
- 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. https://github.com/apache/lucene/issues/9681
- Then in 8.6 the FST was moved off-heap all the time. https://github.com/apache/lucene/issues/10297
More generally, there are a few files that are no l

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Rahul Goswami
Thanks Adrien. Is this behavior of FST something that has changed in Lucene 8.x (from 7.x)? Also, is the terms index not loaded into memory anymore in 8.x? To your point on MMapDirectoryFactory, it is much faster as you anticipated, but the indexes commonly being >1 TB makes the Windows machine fr

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
+Alan Woodward helped me better understand what is going on here. BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory) doesn't play well with the fact that the FST reads bytes backwards: every call to readByte() triggers a refill of 1kB because it wants to read the byte that is just be
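Adrien's explanation can be reproduced with a toy model: a reader that, on a miss, refills a fixed-size window starting at the requested position. Reading forward costs one refill per window, while reading the same bytes backward, as the FST does, misses on every single byte. The class is a hypothetical simplification, not BufferedIndexInput's actual code; the 1 kB buffer matches the figure in the message.

```java
// Toy model of a buffered reader that, on a miss, refills a fixed-size window
// starting at the requested position. Simplified illustration only; this is
// not the actual BufferedIndexInput implementation.
final class BackwardReadDemo {
    static final class ToyBufferedInput {
        private final int fileLength;
        private final int bufferSize;
        private int bufferStart = -1; // file offset of the first buffered byte
        private int bufferLength = 0; // number of valid bytes in the buffer
        int refills = 0;

        ToyBufferedInput(int fileLength, int bufferSize) {
            this.fileLength = fileLength;
            this.bufferSize = bufferSize;
        }

        void readByte(int pos) {
            boolean hit = pos >= bufferStart && pos < bufferStart + bufferLength;
            if (!hit) {
                // Refill [pos, pos + bufferSize), truncated at end of file.
                bufferStart = pos;
                bufferLength = Math.min(bufferSize, fileLength - pos);
                refills++;
            }
        }
    }

    static int refillsReadingForward(int fileLength, int bufferSize) {
        ToyBufferedInput in = new ToyBufferedInput(fileLength, bufferSize);
        for (int pos = 0; pos < fileLength; pos++) in.readByte(pos);
        return in.refills;
    }

    static int refillsReadingBackward(int fileLength, int bufferSize) {
        ToyBufferedInput in = new ToyBufferedInput(fileLength, bufferSize);
        for (int pos = fileLength - 1; pos >= 0; pos--) in.readByte(pos);
        return in.refills;
    }
}
```

With a 4096-byte file and a 1024-byte buffer, reading forward costs 4 refills and reading backward costs 4096, one per byte.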

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
My best guess based on your description of the issue is that SimpleFSDirectory doesn't like the fact that the terms index now reads data directly from the directory instead of loading the terms index in heap. Would you be able to run the same benchmark with MMapDirectory to check if it addresses th

Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-05 Thread Rahul Goswami
Hello, We started experiencing slowness with atomic updates in Solr after upgrading from 7.7.2 to 8.11.1. Running several tests revealed the slowness to be in RealTimeGet's SolrIndexSearcher.getFirstMatch() call, which eventually calls Lucene's SegmentTermsEnum.seekExact(). In the benchmarks I ran

Re: Performance Comparison of Benchmarks by using Lucene 9.1.0 vs 8.5.1

2022-07-26 Thread Baris Kazar
: Performance Comparison of Benchmarks by using Lucene 9.1.0 vs 8.5.1 https://home.apache.org/~mikemccand/lucenebench/ shows how various benchmarks have evolved over time *on

Re: Performance Comparison of Benchmarks by using Lucene 9.1.0 vs 8.5.1

2022-07-26 Thread Michael Sokolov
https://home.apache.org/~mikemccand/lucenebench/ shows how various benchmarks have evolved over time *on the main branch*. There is no direct comparison of every version against every other version that I have seen though. On Tue, Jul 26, 2022 at 2:12 PM Baris Kazar wrote: Dear Folks,- Sim

Performance Comparison of Benchmarks by using Lucene 9.1.0 vs 8.5.1

2022-07-26 Thread Baris Kazar
Dear Folks,- Similar question to my previous post: this time I wonder if there is a Lucene web site where benchmarks are run against these two versions of Lucene. I see many (44+16) API changes, (48+9) improvements and (16+15) bug fixes, which sounds great. Best regards

RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
May 2021 13:55 To: Michael McCandless; Lucene Users Subject: RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0) Hi, thanks for getting back to me that fast! Your hint that there were changes to NRTCachingDirectory was the right pointer: I copied the 8.3 NRTCachingDirectory impl

Re: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Adrien Grand
w-down. I’ll report here. Bye, Markus From: Michael McCandless Sent: Wednesday, 19 May 2021 13:39 To: Lucene Users; Gietzen, Markus <markus.giet...@softwareag.com> Subject: Re: Performance decrease with NRT use-case in 8.8.x (coming from

RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0) The update showed no issues (e.g. compiled without changes) but I noticed that our test-suites take a lot longer to finish. Hmm, that sounds bad. We need our tests to stay fast but also do a good job testing things ;) Doe

Re: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Michael McCandless
without changes) but I noticed that our test-suites take a lot longer to finish. So I took a closer look at one test-case which showed a severe slowdown (it’s doing small update, flush, search cycles in order to stress NRT; the purpose is to see performance changes at an

Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
slowdown (it’s doing small update, flush, search cycles in order to stress NRT; the purpose is to see performance changes at an early stage 😉): Lucene 8.3: ~2.3 s, Lucene 8.8.x: 25 s. This is a huge difference. Therefore I used YourKit to profile 8.3 and 8.8 and do a comparison. The gap is

Force-merge performance degrading after upgrade to Lucene 8.0

2021-03-02 Thread Xie, Eileen
Hi! After upgrading our ES cluster from version 6.2 to 7.9, we found that the force-merge operation takes a long time, about double the previous latency. Based on our investigation, we found the following to be the main cause of the force-merge performance decrease: * From Lucene 8.0, NormsProducer is added as

Re: best way (performance wise) to search for field without value?

2020-11-13 Thread Matt Davis
sts query like this, which is fully in line with your investigation: if a field has doc values it uses DocValuesFieldExistsQuery, if it is a tokenized field it uses the NormsFieldExistsQuery. The negative one is a must-not clause, which is perfectly fine performance-wise.

Re: best way (performance wise) to search for field without value?

2020-11-13 Thread Uwe Schindler
fine performance-wise. An alternative way to search is to index the names of all fields that have a value into a separate StringField. But this needs preprocessing. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html https://issues.apache.org/jira/browse/SOLR

Re: best way (performance wise) to search for field without value?

2020-11-13 Thread Michael McCandless
That's great Rob! Thanks for bringing closure. Mike McCandless http://blog.mikemccandless.com On Fri, Nov 13, 2020 at 9:13 AM Rob Audenaerde wrote: To follow up, based on a quick JMH test with 2M docs with some random data I see a speedup of 70% :) That is a nice Friday-afternoon gift,

Fwd: best way (performance wise) to search for field without value?

2020-11-13 Thread Rob Audenaerde
To follow up, based on a quick JMH test with 2M docs with some random data I see a speedup of 70% :) That is a nice Friday-afternoon gift, thanks! For people who are interested: I added a BinaryDocValues field like this: doc.add(BinaryDocValuesField("GROUPS_ALLOWED_EMPTY", new BytesRef(0x01;

Re: best way (performance wise) to search for field without value?

2020-11-13 Thread Michael McCandless
Maybe NormsFieldExistsQuery as a MUST_NOT clause? Though, you must enable norms on your field to use that. TermRangeQuery is indeed a horribly costly way to execute this, but if you cache the result on each refresh, perhaps it is OK? You could also index a dedicated doc values field indicating t
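Mike's last suggestion, a dedicated field marking documents whose groups_allowed is empty, turns "has no groups" into a positive match computed once at index time instead of a range or negation query evaluated on every search. The plain-Java model below (hypothetical names, no Lucene) only shows the resulting visibility rule; in Rob's follow-up in this thread the flag was indexed as a BinaryDocValuesField named GROUPS_ALLOWED_EMPTY.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the security filter; not Lucene code.
final class GroupFilterModel {
    // groupsAllowedEmpty is computed once at index time, like the dedicated marker field.
    record Doc(int id, Set<String> groupsAllowed, boolean groupsAllowedEmpty) {
        static Doc of(int id, Set<String> groups) {
            return new Doc(id, groups, groups.isEmpty());
        }
    }

    // Visible if the doc is open to everyone (marker set) or shares a group with the user.
    static List<Integer> visible(List<Doc> docs, Set<String> userGroups) {
        return docs.stream()
                .filter(d -> d.groupsAllowedEmpty()
                        || d.groupsAllowed().stream().anyMatch(userGroups::contains))
                .map(Doc::id)
                .toList();
    }
}
```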

best way (performance wise) to search for field without value?

2020-11-13 Thread Rob Audenaerde
Hi all, We have implemented some security on our index by adding a field 'groups_allowed' to documents and wrapping the original query in a BooleanQuery with a MUST clause that checks whether one of the given user groups matches at least one of the groups_allowed values. We chose to leave the groups_allowed field empty when t

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
On Tue, Oct 13, 2020 at 11:48 AM Adrien Grand wrote: Can you give us a few more details: - What version of Lucene are you testing? - Are you benchmarking "restrictionQuery" on its o

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Adrien Grand
Adrien Grand wrote: Can you give us a few more details: - What version of Lucene are you testing? - Are you benchmarking "restrictionQuery" on its own, or its conjunction with another query?

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
"restrictionQuery" since it should not contribute to scoring. TermInSetQuery automatically executes like a BooleanQuery when the number of clauses is less than 16, so I would not expect major performance differences between a TermInSetQuery ove

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
You mentioned that you combine your "restrictionQuery" and the user query with Occur.MUST; Occur.FILTER feels more appropriate for "restrictionQuery" since it should not contribute to scoring. TermInSetQuery automatically executes like a BooleanQuery when

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Adrien Grand
ur.FILTER feels more appropriate for "restrictionQuery" since it should not contribute to scoring. TermInSetQuery automatically executes like a BooleanQuery when the number of clauses is less than 16, so I would not expect major performance differences between a TermInSetQuery over fewer than 16 te
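Adrien's point about TermInSetQuery can be pictured with a small model: with few terms it behaves like a disjunction of term queries, and with many terms the matching documents are gathered into a set by reading each term's postings once. The 16-term threshold follows Adrien's wording ("less than 16") and may vary between versions; the BitSet union is a simplification, not Lucene's actual implementation, and all names are hypothetical.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two execution strategies; not Lucene code.
final class TermSetUnion {
    // Threshold as quoted in the thread; treat it as version-dependent.
    static final int BOOLEAN_REWRITE_THRESHOLD = 16;

    static String strategy(int numTerms) {
        return numTerms < BOOLEAN_REWRITE_THRESHOLD ? "boolean-disjunction" : "bitset-union";
    }

    // Union of posting lists into a bitset: each posting list is read once, sequentially.
    static BitSet union(Map<String, int[]> postings, List<String> terms, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) continue; // term absent from the index
            for (int doc : docs) result.set(doc);
        }
        return result;
    }
}
```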

unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
I'm having some performance issues when counting the index (>60M docs), so I thought about tweaking this restriction implementation. I set up a benchmark like this: I generate 2M documents. Each document has a multi-valued "roles" field. The "roles" field in each docu

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-08-06 Thread Adrien Grand
not seeing the same slowdown on the other field. How hard would it be for you to test what the performance is if you lowercase the name of the digest algorithms, i.e. "md5;[md5 value in hex]", etc. The reason I'm asking is because the compression logic i

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-30 Thread Trejkaz
On Mon, 27 Jul 2020 at 19:24, Adrien Grand wrote: It's interesting you're not seeing the same slowdown on the other field. How hard would it be for you to test what the performance is if you lowercase the name of the digest algorithms, i.e. "md5;[md5 value in h

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-27 Thread Adrien Grand
It's interesting you're not seeing the same slowdown on the other field. How hard would it be for you to test what the performance is if you lowercase the name of the digest algorithms, i.e. "md5;[md5 value in hex]", etc. The reason I'm asking is because the compressi

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-27 Thread Trejkaz
iple runs? On Mon, Jul 27, 2020 at 5:57 AM Alex K wrote: Hi, Also have a look here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9378 Seems it might be related. - Alex On Sun, Jul 26, 202

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-27 Thread Adrien Grand
s it might be related. - Alex On Sun, Jul 26, 2020, 23:31 Trejkaz wrote: Hi all. I've been tracking down slow seeking performance in TermsEnum after updating to Lucene 8.5.1. On 8.5.1

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-26 Thread Alex K
Hi, Also have a look here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9378 Seems it might be related. - Alex On Sun, Jul 26, 2020, 23:31 Trejkaz wrote: Hi all. I've been tracking down slow seeking performance in TermsEnum after updating to Luc

Fwd: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-26 Thread Trejkaz
Hi all. I've been tracking down slow seeking performance in TermsEnum after updating to Lucene 8.5.1. On 8.5.1:
SegmentTermsEnum.seekExact: 33,829 ms (70.2%) (remaining time in our code)
SegmentTermsEnumFrame.loadBlock: 29,104 ms (60.4%)
CompressionAlgorithm$2

Re: Semantics and performance regarding min number of the optional BooleanClauses

2020-03-30 Thread Stamatis Zampetakis
(A B C)~2 is equivalent to ((+A +B) (+A +C) (+B +C)). In other words, a single BooleanQuery with a minimum-should-match parameter could be rewritten as a pure disjunctive BooleanQuery composed of 3 sub-queries. In terms of performance it seems

Re: Semantics and performance regarding min number of the optional BooleanClauses

2020-03-30 Thread Adrien Grand
that at least 2 should match. In terms of semantics, what I understand so far is that (A B C)~2 is equivalent to ((+A +B) (+A +C) (+B +C)). In other words, a single BooleanQuery with a minimum-should-match parameter could be rewritten as a pure disjunctive BooleanQuery composed

Semantics and performance regarding min number of the optional BooleanClauses

2020-03-30 Thread Stamatis Zampetakis
is equivalent to ((+A +B) (+A +C) (+B +C)). In other words, a single BooleanQuery with a minimum-should-match parameter could be rewritten as a pure disjunctive BooleanQuery composed of 3 sub-queries. In terms of performance, it seems that the two queries behave differently, so the minMatch
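The claimed equivalence is easy to check on small sets of matching doc IDs. The sketch computes (A B C)~2 by counting matching clauses per document and ((+A +B) (+A +C) (+B +C)) as a union of pairwise intersections; both return the same documents. Names are hypothetical and scoring is ignored. In general the rewritten form needs one conjunction per k-subset of the n clauses, C(n, k) in total, which grows quickly as n increases.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check of (A B C)~k semantics on sets of matching doc IDs.
final class MinShouldMatch {
    // (A B C)~k: documents matching at least k of the clauses, by counting.
    static Set<Integer> atLeast(List<Set<Integer>> clauses, int k, int maxDoc) {
        Set<Integer> out = new HashSet<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            int matched = 0;
            for (Set<Integer> clause : clauses) {
                if (clause.contains(doc)) matched++;
            }
            if (matched >= k) out.add(doc);
        }
        return out;
    }

    // ((+A +B) (+A +C) (+B +C)): union of all pairwise conjunctions (the k = 2 case).
    static Set<Integer> pairwise(List<Set<Integer>> clauses) {
        Set<Integer> out = new HashSet<>();
        for (int i = 0; i < clauses.size(); i++) {
            for (int j = i + 1; j < clauses.size(); j++) {
                Set<Integer> both = new HashSet<>(clauses.get(i));
                both.retainAll(clauses.get(j));
                out.addAll(both);
            }
        }
        return out;
    }
}
```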

Re: ComplexPhraseQueryParser performance question

2020-02-13 Thread baris . kazar
g great. I saw this issue with this class: if you search for "term1*" it is good (i.e., 4 millisecs when term1 has >= 5 chars and ~250 millisecs when it has 2 chars), but when you search for "term1 term2*" where term2 is a single char, the performance degra

Re: ComplexPhraseQueryParser performance question

2020-02-13 Thread Mikhail Khludnev
l 1 char. Best regards On 2/3/20 4:13 PM, baris.ka...@oracle.com wrote: Hi,- I hope everyone is doing great.

Re: ComplexPhraseQueryParser performance question

2020-02-12 Thread baris . kazar
ue. Thanks On Feb 4, 2020, at 4:14 AM, Mikhail Khludnev wrote: It's slow per se, since it loads term positions. The usual advice is shingling or edge n-grams. Note, i
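Mikhail's edge n-gram suggestion moves the prefix work to index time: each token is also indexed under its leading prefixes, so a one-character term2* can be matched as an ordinary term rather than expanded into every term that starts with that character. The helper below only shows which prefixes such an analysis step would emit (in Lucene, EdgeNGramTokenFilter does this during analysis); the class name and gram sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper listing the edge n-grams (leading prefixes) of one token.
final class EdgeNGrams {
    // Prefixes of token with lengths minGram..maxGram, capped at the token length.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```

The trade-off is a larger index: every token contributes up to maxGram - minGram + 1 extra terms.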

Re: ComplexPhraseQueryParser performance question

2020-02-12 Thread baris . kazar
n be smarter and faster in certain cases, although they are backed by the same slow positions. On Tue, Feb 4, 2020 at 7:25 AM wrote: How can this slowdown be resolved? Is t

Re: ComplexPhraseQueryParser performance question

2020-02-12 Thread David Smiley
Is this another limitation of this class? Thanks On Feb 3, 2020, at 4:14 PM, baris.ka...@oracle.com wrote: Please ignore the first comparison there. I was comparing there {term1

Re: ComplexPhraseQueryParser performance question

2020-02-12 Thread baris . kazar
har. Best regards On 2/3/20 4:13 PM, baris.ka...@oracle.com wrote: Hi,- I hope everyone is doing great. I saw this issue with this class: if you search for "term1*" it is good (i.e., 4 millisecs when term1 has >= 5 chars and ~250 millisecs when it has 2 chars), but wh

Re: ComplexPhraseQueryParser performance question

2020-02-04 Thread baris . kazar
Best regards On 2/3/20 4:13 PM, baris.ka...@oracle.com wrote: Hi,- I hope everyone is doing great. I saw this issue with this class: if you sea

Re: ComplexPhraseQueryParser performance question

2020-02-04 Thread Mikhail Khludnev
that if you search for "term1*" it is good (i.e., 4 millisecs when term1 has >= 5 chars and ~250 millisecs when it has 2 chars), but when you search for "term1 term2*" where term2 is a single char, the performance degrades too much.

Re: ComplexPhraseQueryParser performance question

2020-02-03 Thread baris . kazar
m wrote: Hi,- I hope everyone is doing great. I saw this issue with this class: if you search for "term1*" it is good (i.e., 4 millisecs when term1 has >= 5 chars and ~250 millisecs when it has 2 chars)

Re: ComplexPhraseQueryParser performance question

2020-02-03 Thread baris . kazar
t is 2 chars), but when you search for "term1 term2*" where term2 is a single char, the performance degrades too much. The query "term1 term2*" slows down 50 times (~200 millisecs) compared to the "term1*" case when term1 has >5 chars and term2 is still

ComplexPhraseQueryParser performance question

2020-02-03 Thread baris . kazar
is a single char, the performance degrades too much. The query "term1 term2*" slows down 50 times (~200 millisecs) compared to the "term1*" case when term1 has >5 chars and term2 is still 1 char. The query "term1 term2*" slows down 400 times (~1500 millisecs) compared

Re: Noticed performance degrade from lucene-7.5.0 to lucene-8.0.0

2019-04-14 Thread Uwe Schindler
https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand Uwe On April 14, 2019 2:22:59 PM UTC, Khurram Shehzad wrote: Hi All, I have recently updated from lucene-7.5.0 to lucene-8.0.0. But I noticed considerable performance degradation. Queries that use

Noticed performance degrade from lucene-7.5.0 to lucene-8.0.0

2019-04-14 Thread Khurram Shehzad
Hi All, I have recently updated from lucene-7.5.0 to lucene-8.0.0. But I noticed considerable performance degradation. Queries that used to execute in 18 to 24 milliseconds now take 74 to 110 milliseconds. Any suggestions, please? Regards, Khurram

Re: Any way to improve document fetching performance?

2018-08-28 Thread Erick Erickson
circumstances where fetching from docValues actually has poorer overall performance than using stored=true. That said, the ability to use docValues fields in place of stored (subject to certain restrictions that you should take the time to understand) does indeed blur the distinction. It's rea

Re: Any way to improve document fetching performance?

2018-08-28 Thread alex stark
scenario (I think it is more common nowadays), the search phase should return as many results as possible so that the ranking phase can re-sort the results with a machine-learning algorithm (on other clusters). Fetching performance is also important. On Tue, 28 Aug 2018 00:11:40 +0800 Erick Erickson wrote

Re: Any way to improve document fetching performance?

2018-08-27 Thread Erick Erickson
Aug 2018 22:12:07 +0800 wrote Alex,- how big are those docs? Best regards On 8/27/18 10:09 AM, alex stark wrote: Hello experts, I am wondering whether there is any way to improve document fetching performance; it appears to me that reading from the stored fi

Re: Any way to improve document fetching performance?

2018-08-27 Thread baris . kazar
:07 +0800 wrote Alex,- how big are those docs? Best regards On 8/27/18 10:09 AM, alex stark wrote: Hello experts, I am wondering whether there is any way to improve document fetching performance; it appears to me that reading from the stored field is quite slow. I simply tested using IndexSearcher.doc() to get 2000 documents, which takes 50ms. Are there any ideas to improv

Re: Any way to improve document fetching performance?

2018-08-27 Thread alex stark
:09 AM, alex stark wrote: Hello experts, I am wondering whether there is any way to improve document fetching performance; it appears to me that reading from the stored field is quite slow. I simply tested using IndexSearcher.doc() to get 2000 documents, which takes 50ms. Are there any ideas to improv

Re: Any way to improve document fetching performance?

2018-08-27 Thread baris . kazar
Mon, 27 Aug 2018 22:12:07 +0800 wrote Alex,- how big are those docs? Best regards On 8/27/18 10:09 AM, alex stark wrote: Hello experts, I am wondering whether there is any way to improve document fetching performance; it appears to me that reading from the stored field is quite slow. I simply tes

Re: Any way to improve document fetching performance?

2018-08-27 Thread alex stark
to improve document fetching performance; it appears to me that reading from the stored field is quite slow. I simply tested using IndexSearcher.doc() to get 2000 documents, which takes 50ms. Are there any ideas to improve that?

Re: Any way to improve document fetching performance?

2018-08-27 Thread baris . kazar
Alex,- how big are those docs? Best regards On 8/27/18 10:09 AM, alex stark wrote: Hello experts, I am wondering whether there is any way to improve document fetching performance; it appears to me that reading from the stored field is quite slow. I simply tested using IndexSearcher.doc() to get 2000

Any way to improve document fetching performance?

2018-08-27 Thread alex stark
Hello experts, I am wondering whether there is any way to improve document fetching performance; it appears to me that reading from the stored field is quite slow. I simply tested using IndexSearcher.doc() to get 2000 documents, which takes 50ms. Are there any ideas to improve that?

Lucene Performance Tuning

2018-07-18 Thread Hicks, Matt
I am seeing serious performance differences with three slightly varied queries: https://gist.github.com/darkfrog26/de19959db854aaf30957d64d1730d07f Can anyone explain why this might be happening, and offer any tips to optimize it? Most queries are lightning fast, but ones like "Smith Mark D

LUCENE-8396 performance result?

2018-07-17 Thread alex stark
LUCENE-8396 looks pretty good for LBS use cases; do we have performance results for this approach? It appears to me it would greatly reduce the number of terms needed to index a polygon, but how about search performance? Does it also perform well for complex polygons with hundreds of coordinates or more?

Re: Storage of indexed and stored fields (Space and Performance)

2018-03-15 Thread Erick Erickson
are indexed and stored fields treated by Lucene w.r.t. space and performance? Is there any performance hit with stored fields which are indexed? Lucene Version: 5.3.1 Assumption: Stored fields are just simple strings (not huge documents

Storage of indexed and stored fields (Space and Performance)

2018-03-15 Thread Rajnish kamboj
Hi, How are indexed and stored fields treated by Lucene w.r.t. space and performance? Is there any performance hit with stored fields which are indexed? Lucene Version: 5.3.1 Assumption: Stored fields are just simple strings (not huge documents) Example: Data: [101, Gold]; [102

RE: Increase search performance

2018-02-02 Thread Atul Bisaria
Thanks for the feedback! -Original Message- From: Adrien Grand [mailto:jpou...@gmail.com] Sent: Friday, February 02, 2018 1:42 PM To: java-user@lucene.apache.org Subject: Re: Increase search performance If needsScores returns false on the collector, then scores won't be computed.

Re: Increase search performance

2018-02-02 Thread Adrien Grand
.docBase = context.docBase; } public ScoreDoc[] getHits() { return matches; } } Best Regards, Atul Bisaria -Original Message- From: Adrien Grand [mailto:jpou...@gmail.com] Se

RE: Increase search performance

2018-02-01 Thread Atul Bisaria
iginal Message- From: Adrien Grand [mailto:jpou...@gmail.com] Sent: Thursday, February 01, 2018 6:11 PM To: java-user@lucene.apache.org Subject: Re: Increase search performance Yes, this collector won't perform well if you have many matches since memory usage is linear with the number of
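Adrien's point is that a collector which stores every match, shuffles the whole list and then keeps `matches.subList(0, maxHitsRequired)` needs memory proportional to the number of hits. One swapped-in technique that keeps a uniformly random subset in bounded memory is reservoir sampling (Algorithm R). The sketch below is hypothetical: `ReservoirSampler` is not a Lucene class, and it assumes you would call `offer` once per global doc ID from your own collector's `collect(int doc)` after adding the segment's `docBase`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Lucene: keeps a uniform random sample of
// at most k doc IDs with reservoir sampling (Algorithm R), so memory is O(k)
// rather than O(number of matches).
final class ReservoirSampler {
    private final int k;
    private final Random random;
    private final List<Integer> reservoir;
    private int seen = 0; // number of doc IDs offered so far

    ReservoirSampler(int k, long seed) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        this.k = k;
        this.random = new Random(seed);
        this.reservoir = new ArrayList<>(k);
    }

    // Assumed to be called once per matching doc (docBase + doc).
    void offer(int docId) {
        seen++;
        if (reservoir.size() < k) {
            // Fill phase: keep the first k matches unconditionally.
            reservoir.add(docId);
        } else {
            // Keep the new doc with probability k / seen by overwriting a
            // random slot; every doc seen so far stays equally likely.
            int j = random.nextInt(seen);
            if (j < k) {
                reservoir.set(j, docId);
            }
        }
    }

    List<Integer> sample() {
        return new ArrayList<>(reservoir);
    }

    int seen() {
        return seen;
    }

    public static void main(String[] args) {
        ReservoirSampler sampler = new ReservoirSampler(5, 42L);
        for (int doc = 0; doc < 1000; doc++) {
            sampler.offer(doc);
        }
        System.out.println(sampler.seen() + " matches, sample " + sampler.sample());
    }
}
```

The sample's membership is uniform, but its order is not fully random; if order matters, shuffling the k retained IDs afterwards is cheap compared with shuffling all matches.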

Re: Increase search performance

2018-02-01 Thread Adrien Grand
ffle(matches); maxHitsRequired = Math.min(matches.size(), maxHitsRequired); return matches.subList(0, maxHitsRequired); } } Best Regards, Atul Bisaria -Original Message- From: Adrien Grand [ma

RE: Increase search performance

2018-02-01 Thread Atul Bisaria
); } } Best Regards, Atul Bisaria -Original Message- From: Adrien Grand [mailto:jpou...@gmail.com] Sent: Wednesday, January 31, 2018 6:33 PM To: java-user@lucene.apache.org Subject: Re: Increase search performance Hi Atul, On Tue, Jan 30, 2018 at 16:24, Atul Bisaria wrote:

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
e a transaction log in parallel to indexing, so they commit very seldom. If the system crashes, the changes are replayed from the translog since the last commit. Uwe

Re: Increase search performance

2018-01-31 Thread Adrien Grand
on't sort by score, then wrapping with a ConstantScoreQuery won't help, as Lucene will figure out that scores are not needed anyway. 2. Using query cache: My understanding is that the query cache would cache query results and hence lead to a significant increase in pe

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Adrien Grand
Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Rob Audenaerde [mailto:rob.audenae...@gmail.c

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
-Original Message- From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] Sent: Monday, January 29, 2018 11:29 AM To: java-user@lucene.apache.org Subject: Re: indexing performance 6.6 vs 7.1

Increase search performance

2018-01-30 Thread Atul Bisaria
In the search use case in my application, I don't need to score query results since all results are equal. Query patterns are also more or less fixed. Given these conditions, I am trying to increase search performance by 1. Using ConstantScoreQuery so that scoring overhe
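Because the query patterns here are described as more or less fixed, one hedged option, separate from Lucene's own query cache, is a small application-level cache of final results keyed by the query. The sketch below is hypothetical (`LruResultCache` is not a Lucene class) and relies only on `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook evicts the least recently used entry once `maxEntries` is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical application-level LRU cache, not part of Lucene.
// Not thread-safe: wrap access in synchronization if searches run concurrently.
final class LruResultCache<K, V> {
    private final Map<K, V> map;

    LruResultCache(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        // accessOrder = true: iteration runs from least to most recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each put; drop the LRU entry when over capacity.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, computing and storing it on a miss.
    // Null values are never cached.
    V get(K key, Function<K, V> compute) {
        V value = map.get(key); // a hit also marks the entry as most recently used
        if (value == null) {
            value = compute.apply(key);
            map.put(key, value);
        }
        return value;
    }

    // containsKey does not change the access order.
    boolean contains(K key) {
        return map.containsKey(key);
    }

    int size() {
        return map.size();
    }

    // Should be called whenever the underlying index/searcher is reopened,
    // since cached results would otherwise be stale.
    void clear() {
        map.clear();
    }
}
```

The main design cost is invalidation: such a cache is only correct while the index it was filled from is unchanged, so it pairs naturally with clearing on every reader reopen.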

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
we Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] Sent: Monday, January 29, 2018 11:29 AM To

RE: indexing performance 6.6 vs 7.1

2018-01-29 Thread Uwe Schindler
28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] Sent: Monday, January 29, 2018 11:29 AM To: java-user@lucene.apache.org Subject: Re: indexing performance 6.6 vs 7.1 H

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
create pivot tables on search results really fast. These tables have some overlapping columns, but also disjoint ones. We anticipated a decrease in index size because of the sparse docvalues. We see this happening, w

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Erick Erickson
search results really fast. These tables have some overlapping columns, but also disjoint ones. We anticipated a decrease in index size because of the sparse docvalues. We see this happening, with decreases to ~50%-80% of the original index size.
