I'll run the test removing the ExitableDirectoryReader wrap to check if this is the issue. I'll post my results here.
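In case it's useful, here is a minimal sketch of the short-circuit Anshum suggests below: skip the timeout check entirely when the deadline is Long.MAX_VALUE, so next() never has to read the clock. This is not the actual ExitableDirectoryReader code; the class name is made up and a plain java.util.Iterator stands in for the wrapped TermsEnum.

import java.util.Iterator;

// Hypothetical sketch, not the real Lucene ExitableTermsEnum: a wrapper whose
// next() only pays for System.nanoTime() when a real deadline was requested.
final class TimeLimitedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;   // stands in for the wrapped TermsEnum
    private final long deadlineNanos;     // Long.MAX_VALUE means "no timeout"
    private final boolean checkTimeout;   // precomputed short-circuit flag

    TimeLimitedIterator(Iterator<T> delegate, long deadlineNanos) {
        this.delegate = delegate;
        this.deadlineNanos = deadlineNanos;
        this.checkTimeout = deadlineNanos != Long.MAX_VALUE;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        // Skip the clock read entirely when no timeAllowed was set.
        if (checkTimeout && System.nanoTime() - deadlineNanos > 0) {
            throw new RuntimeException("Query exceeded its time allowance");
        }
        return delegate.next();
    }
}

If the profile stops pointing at the wrapper once the clock read is skipped, that would confirm it's the System.nanoTime calls rather than the delegation itself that costs time.
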
On Mon, Aug 3, 2015 at 6:07 PM, Anshum Gupta <[email protected]> wrote:
> Guess bypassing that check and always returning true instead would help
> confirm that. If that's the case, we should just initialize the timeout to
> Long.MAX_VALUE and check for that to short-circuit?
>
> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>> Yes, I saw that, but thought it could be the underlying implementation,
>> not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to
>> System.nanoTime then...
>>
>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <[email protected]> wrote:
>>> Thanks for sharing the traces, it looks like my intuition was wrong. :)
>>> They seem to point to ExitableDirectoryReader$ExitableTermsEnum.next,
>>> which checks whether the time is out before delegating.
>>>
>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>>> > Thanks Adrien,
>>> > I'll run the tests with 5.3 snapshot and post the results here. In case
>>> > this helps, this is the hprof samples output
>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
>>> > for 4.10.4 and 5.2.1 in my test:
>>> >
>>> > Solr 4.10.4:
>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>>> > rank   self  accum   count  trace method
>>> >    1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
>>> >    2  4.27% 79.34%   10408 301576 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>>> >    3  4.15% 83.49%   10108 301585 org.apache.lucene.index.FilteredTermsEnum.docs
>>> >    4  3.46% 86.95%    8419 301582 org.apache.lucene.index.FilteredTermsEnum.next
>>> >    5  2.49% 89.44%    6070 301573 java.net.SocketInputStream.socketRead0
>>> >    6  1.99% 91.43%    4848 301599 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >    7  1.98% 93.42%    4824 301583 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >    8  1.57% 94.99%    3824 301589 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> >    9  1.44% 96.42%    3504 301594 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>>> >   11  0.98% 98.50%    2388 301598 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>>> >   12  0.62% 99.12%    1522 301600 org.apache.lucene.store.DataInput.readVInt
>>> >   13  0.21% 99.33%     500 301612 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>>> >   14  0.07% 99.39%     167 301601 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
>>> >   16  0.05% 99.50%     114 301632 org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>>> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
>>> >   18  0.03% 99.57%      76 301624 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
>>> >   20  0.03% 99.62%      68 301615 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   21  0.03% 99.65%      62 301635 org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>>> >   22  0.02% 99.66%      41 301664 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   23  0.01% 99.68%      31 301629 org.apache.lucene.util.FixedBitSet.<init>
>>> > CPU SAMPLES END
>>> >
>>> > Solr 5.2.1:
>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>>> > rank   self  accum   count  trace method
>>> >    1 51.38% 51.38%  120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
>>> >    2 25.69% 77.07%   60477 301292 sun.nio.ch.ServerSocketChannelImpl.accept0
>>> >    3 10.59% 87.66%   24923 301369 org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>>> >    4  2.20% 89.86%    5182 301414 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>>> >    5  2.01% 91.87%    4742 301384 org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>>> >    6  1.25% 93.12%    2944 301434 java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>>> >    7  1.11% 94.23%    2612 301367 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>>> >   10  0.78% 96.87%    1825 301449 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>>> >   11  0.73% 97.60%    1717 301378 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
>>> >   13  0.33% 98.66%     787 301387 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   14  0.16% 98.82%     374 301426 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
>>> >   16  0.09% 99.02%     219 301381 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
>>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
>>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
>>> >   20  0.06% 99.30%     151 301441 org.apache.lucene.store.DataInput.readVInt
>>> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
>>> >   22  0.06% 99.42%     140 301375 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
>>> >   24  0.03% 99.50%      76 301423 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   25  0.03% 99.53%      74 301454 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> >   26  0.03% 99.56%      65 301432 org.apache.lucene.util.BitDocIdSet$Builder.or
>>> >   27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
>>> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
>>> >   29  0.02% 99.63%      50 301464 org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>>> >   30  0.02% 99.64%      39 301438 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   31  0.02% 99.66%      37 301465 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> >   32  0.02% 99.67%      36 301419 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> > CPU SAMPLES END
>>> >
>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <[email protected]> wrote:
>>> >>
>>> >> Hi Tomás,
>>> >>
>>> >> I suspect this might be related to LUCENE-5938. We changed the default
>>> >> rewrite method for multi-term queries to load documents into a sparse
>>> >> bit set first, and only upgrade to a dense bit set when we know many
>>> >> documents match. When there are lots of terms to intersect, we end up
>>> >> spending significant CPU time building the sparse bit set only to
>>> >> eventually upgrade to a dense bit set like before. This might be what
>>> >> you are seeing.
>>> >>
>>> >> You might see the issue less with the population field because it has
>>> >> fewer unique values, so postings lists are longer and the DocIdSet
>>> >> building logic can upgrade to a dense bit set more quickly.
>>> >>
>>> >> Mike noticed this slowness when working on BKD trees, and we changed
>>> >> this first phase to use a plain int[] array that we sort and
>>> >> deduplicate instead of a fancier sparse bit set (LUCENE-6645), which
>>> >> seemed to make things faster. Would it be possible for you to also
>>> >> check a 5.3 snapshot?
>>> >>
>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>>> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and
>>> >> > 5.2.1. It doesn't happen with all the queries, but for queries like
>>> >> > range queries on fields with many different values, the average time
>>> >> > in 5.2.1 is worse than in 4.10.4. Is anyone seeing something similar?
>>> >> >
>>> >> > Test details:
>>> >> > * Single thread running queries continuously. I ran the test twice
>>> >> >   for each Solr version.
>>> >> > * JMeter running on my laptop, Solr running on EC2 on an m3.xlarge
>>> >> >   instance with all the defaults except for a 5G heap. Index on
>>> >> >   local disk (SSD).
>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
>>> >> >   SolrCloud mode, no distributed search.
>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates
>>> >> >   during the test. Index size is around 1.1GB for Solr 5.2.1 and
>>> >> >   1.3GB for Solr 4.10.4 (fits entirely in RAM).
>>> >> > * jdk1.8.0_45
>>> >> >
>>> >> > Queries: 3k boolean queries (generated with terms from the dataset)
>>> >> > with range queries as filters on the "tlongitude" and "tlatitude"
>>> >> > fields with randomly generated bounds, e.g.
>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>>> >> >
>>> >> > Fields are:
>>> >> > <field name="tlatitude" type="tdouble"/>
>>> >> > <field name="tlongitude" type="tdouble"/>
>>> >> > Field type:
>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
>>> >> >
>>> >> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1
>>> >> > on average.
>>> >> >
>>> >> > http://snag.gy/2yPPM.jpg
>>> >> >
>>> >> > Doing only the boolean queries shows no performance difference
>>> >> > between 4.10 and 5.2; same thing if I filter on a string field
>>> >> > instead of using the range queries.
>>> >> >
>>> >> > When using the "double" field type (precisionStep="0"), the
>>> >> > difference was bigger:
>>> >> >
>>> >> > longitude/latitude fields:
>>> >> > <field name="longitude" type="double" docValues="true"/>
>>> >> > <field name="latitude" type="double" docValues="true"/>
>>> >> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/Vi5uk.jpg
>>> >> > I understand this is not the best field type definition for range
>>> >> > queries; I'm just trying to understand the difference between the
>>> >> > two versions and why.
>>> >> >
>>> >> > Performance was OK when doing range queries on the "population"
>>> >> > field (long), but that field doesn't have many different values:
>>> >> > only 300k out of the 8.3M docs have a population value different
>>> >> > from 0. On the other hand, doing range queries on the _version_
>>> >> > field did show a graph similar to the previous one:
>>> >> >
>>> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
>>> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/4tc7e.jpg
>>> >> >
>>> >> > Any idea what could be causing this? Is this expected after some
>>> >> > known change?
>>> >> >
>>> >> > With Solr 4.10, a single CPU core stays highly utilized during the
>>> >> > test (close to 100%), but with Solr 5.2, different cores go up and
>>> >> > down in utilization continuously. I suppose that's because of the
>>> >> > different Jetty version.
>>> >> > The GC pattern looks similar in both. For both Solr versions I'm
>>> >> > using the settings that ship with Solr (in solr.in.sh) except for
>>> >> > Xmx and Xms.
>>> >>
>>> >> --
>>> >> Adrien
>>>
>>> --
>>> Adrien
>
> --
> Anshum Gupta
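
For reference, this is a toy model of the doc-ID collection strategy Adrien describes above: buffer doc IDs while the result still looks sparse, then upgrade to a dense bit set once enough IDs have been collected, with a single sort/deduplication at the end (the LUCENE-6645 variant). This is not Lucene's actual DocIdSetBuilder; the class name, the threshold, and the use of java.util.BitSet are simplifications for illustration.

import java.util.Arrays;
import java.util.BitSet;

// Toy model of the idea behind LUCENE-5938/LUCENE-6645, not Lucene's
// DocIdSetBuilder: buffer doc IDs in a plain int[] while the result looks
// sparse, and upgrade to a dense bit set once enough IDs were collected.
final class SimpleDocIdCollector {
    private static final int UPGRADE_THRESHOLD = 1 << 12; // made-up value

    private final int maxDoc;
    private int[] buffer = new int[64];
    private int count = 0;
    private BitSet dense; // non-null once upgraded to the dense representation

    SimpleDocIdCollector(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void add(int doc) {
        if (dense != null) {
            dense.set(doc);
            return;
        }
        if (count == buffer.length) {
            if (count >= UPGRADE_THRESHOLD) {
                upgradeToDense();
                dense.set(doc);
                return;
            }
            buffer = Arrays.copyOf(buffer, count * 2);
        }
        buffer[count++] = doc;
    }

    private void upgradeToDense() {
        dense = new BitSet(maxDoc);
        for (int i = 0; i < count; i++) {
            dense.set(buffer[i]);
        }
        buffer = null; // the sparse buffer is no longer needed
    }

    // Sorted, deduplicated doc IDs; postings of overlapping terms can repeat a doc.
    int[] docIds() {
        if (dense != null) {
            return dense.stream().toArray();
        }
        int[] sorted = Arrays.copyOf(buffer, count);
        Arrays.sort(sorted);
        int unique = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || sorted[i] != sorted[i - 1]) {
                sorted[unique++] = sorted[i];
            }
        }
        return Arrays.copyOf(sorted, unique);
    }
}

The gist of the LUCENE-6645 change is that appending to a flat int[] and paying for one sort and deduplication at the end is cheaper than maintaining a fancier sparse bit set that gets thrown away anyway once the result turns out to be dense, which is what range queries over fields with many unique values tend to produce.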
