Hi,

> > I mean my benchmarks show up
> > to 300% improvement with 4.x versus older versions so something is
> > weird i.e. non-realistic here or there is a bug so let's figure this
> > out. Can you profile your app and see if you find something suspicious?
> > I'll try now and report back.
>
> It seems to be largely my mistake: Maven enables assertions automatically
> when running tests.
> Executing it as a normal public main class results in faster indexing times
> for 4.0 compared to 3.5.
>
> Conclusion:
> 1. execution with assertions for 4.0 is slower than 3.5 (that's what I mainly
> measured :/)

Die, Maven, die :-)

> 2. Lucene 4.0 execution times vary more than 3.5 when using the reopen thread
> (and one single indexing thread, others not tested).
> 3. Lucene 4.0 then is still slower, but for 5 million of my items it's less
> than 5%.
> The hot spots are:
> * 30% ThreadAffinityDocumentsWriterThreadPool ->
>   java.util.concurrent.ConcurrentHashMap.get(Object) -> threadBindings.get
> * 26% BufferedDeletesStream.applyTermDeletes(Iterable, SegmentReader)
> * 16% FreqProxTermsWriterPerField.flush(String, FieldsConsumer,
>   SegmentWriteState)
> * 10% DocFieldProcessor.processDocument
>
> Now when reusing BytesRef in 4.0 (and reusing the char array in 3.5) then
> Lucene 4 is >20% faster than 3.5 for 5 million docs!

You can only reuse the BytesRef (I assume the one to encode the numeric key
to delete the document) from within the same thread! I see no other BytesRef
use in your code. If you reuse the BytesRef, you can also reuse all Fields
and Documents - but only within the same thread.

> But at some point I had problems as a thread concurrently modified the docs -
> can this happen e.g. from the reopen thread? Or is it safe to reuse BytesRef?

In one thread: yes!

Uwe

> Regards,
> Peter.
>
>
> > Hi Simon,
> >
> > answers below.
> >
> >>> It does not seem to be an 'IO related issue' because using RAMDirectory
> >>> results in the same times.
> >>> And indexing via Luc4 with only one thread shouldn't be slower than 3.5
> >>> (?)
> >> it could be since we use a different term dictionary impl which is
> >> more expensive in building than the previous versions; that's just a
> >> guess.
> >> What I am really wondering is why you are using the NRT manager and
> >> reopen during indexing - are you measuring the NRT reopen times too?
> > My project requires reopening as it will then clear some caches.
> >
> > Reopening isn't that frequent (every 5 seconds). When disabling it the
> > difference even increases slightly, but the big variation for luc4 goes
> > away!
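
A footnote to the BytesRef exchange above: the "reuse only within the same thread" rule can be sketched with a ThreadLocal. This is a minimal illustration, not code from the thread. `KeyBuffer` is a hypothetical stand-in for a mutable, reusable holder such as BytesRef and is not a Lucene class. `PerThreadReuse` and its methods are likewise made up for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a mutable, reusable key holder; NOT a Lucene class. */
final class KeyBuffer {
    final byte[] bytes = new byte[Long.BYTES];

    /** Overwrites the buffer with the big-endian encoding of id and returns this. */
    KeyBuffer set(long id) {
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            bytes[i] = (byte) id; // lowest byte goes to the last slot
            id >>>= 8;
        }
        return this;
    }
}

final class PerThreadReuse {
    // One buffer per thread: each thread only ever mutates its own instance.
    private static final ThreadLocal<KeyBuffer> BUFFER =
            ThreadLocal.withInitial(KeyBuffer::new);

    /** Returns the calling thread's buffer, filled with id. Do not pass it to another thread. */
    static KeyBuffer keyFor(long id) {
        return BUFFER.get().set(id);
    }

    /** Returns the buffer instance that a freshly started thread obtains. */
    static KeyBuffer bufferSeenByNewThread() {
        AtomicReference<KeyBuffer> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(keyFor(42L)));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        KeyBuffer a = keyFor(1L);
        KeyBuffer b = keyFor(2L); // same instance as a, contents overwritten
        System.out.println("same thread reuses one instance: " + (a == b));
        System.out.println("other thread gets its own: " + (a != bufferSeenByNewThread()));
    }
}
```

Note that the second `keyFor` call overwrites what the first one returned, so this pattern is only safe if whoever receives the buffer is done with it before the same thread asks for the next key.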
> >
> >> What merge policies are you using for 3x and 4x?
> > The default ones. I'm now using LogByteSizeMergePolicy for both but it
> > is nearly the same difference.
> >
> >>>> You should add some more randomness or reality to your test.
> >>> Hmmh, ok. The uid and type is the reality in my other (experimental)
> >>> project as it uses a generated and incremented id from AtomicLong and
> >>> two types.
> >>> Or do you have an explanation why luc4 can be slower on such 'simple'
> >>> fields?
> >> you reported that indexing only the ID is faster in 4.x but the other
> >> fields AFAIK are likely always the same for all docs, no?
> > no, the _uid field is different: it's the id field converted to string.
> >
> >> you are indexing with one thread right?
> > yes.
> >
> >> I mean my benchmarks show up
> >> to 300% improvement with 4.x versus older versions so something is
> >> weird i.e. non-realistic here or there is a bug so let's figure this
> >> out. Can you profile your app and see if you find something suspicious?
> > I'll try now and report back.
> >
> >> I'd also try to index way more documents to make your benchmarks run
> >> a little longer just to be sure.
> > For ~5 times more docs (5 million) it is nearly the same difference.
> >
> > Regards,
> > Peter.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
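
On the assertion pitfall from the top of this thread: a benchmark can guard against being run with assertions switched on (as happens when it is started through a test runner) by using the standard `assert`-with-side-effect idiom. A minimal sketch follows. The class name `AssertionCheck` is made up for the example, and the warning text is only a suggestion.

```java
/** Hypothetical helper; detects whether this class runs with assertions enabled (-ea). */
final class AssertionCheck {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment is only evaluated when assertions are enabled.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        if (assertionsEnabled()) {
            System.err.println("WARNING: assertions are enabled, timings will not be representative");
        } else {
            System.out.println("assertions are disabled, timings are usable");
        }
    }
}
```

Calling `assertionsEnabled()` at the start of the benchmark's main method and refusing to run (or at least logging a warning) would have flagged the measurement problem described above right away.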