Yes, I saw that, but thought it could be the underlying implementation, not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to System.nanoTime then...
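To make that concrete, here is a minimal, Lucene-free sketch of the wrapper pattern in question (the class and method names are made up for illustration; this is not the actual ExitableDirectoryReader code): a delegating enumerator that checks a System.nanoTime-based deadline on every next() call before handing off to the wrapped implementation.

import java.util.Iterator;

/** Sketch of a time-limited delegating enumerator (illustrative only). */
public class TimeLimitingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final long deadlineNanos;

    public TimeLimitingIterator(Iterator<T> delegate, long timeoutMillis) {
        this.delegate = delegate;
        this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
    }

    private void checkTimeout() {
        // One System.nanoTime() call per next(): cheap in isolation, but it sits
        // on the hot path when a multi-term query walks millions of terms.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new RuntimeException("query exceeded the time limit");
        }
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        checkTimeout();          // check the deadline before delegating
        return delegate.next();  // then hand off to the wrapped enumerator
    }
}

The check itself is trivial, so if this is really where the time goes in 5.2.1, it would be the sheer number of per-term calls (and the System.nanoTime overhead behind them) that adds up.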
On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <[email protected]> wrote:
> Thanks for sharing the traces, it looks like my intuition was wrong. :)
> They seem to point to ExitableDirectoryReader$ExitableTermsEnum.next,
> which checks whether the time is out before delegating.
>
> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
> <[email protected]> wrote:
> > Thanks Adrien, I'll run the tests with a 5.3 snapshot and post the results
> > here. In case this helps, this is the hprof samples output
> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
> > for 4.10.4 and 5.2.1 in my test:
> >
> > Solr 4.10.4:
> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
> > rank   self  accum   count  trace method
> >    1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
> >    2  4.27% 79.34%   10408 301576 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
> >    3  4.15% 83.49%   10108 301585 org.apache.lucene.index.FilteredTermsEnum.docs
> >    4  3.46% 86.95%    8419 301582 org.apache.lucene.index.FilteredTermsEnum.next
> >    5  2.49% 89.44%    6070 301573 java.net.SocketInputStream.socketRead0
> >    6  1.99% 91.43%    4848 301599 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> >    7  1.98% 93.42%    4824 301583 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> >    8  1.57% 94.99%    3824 301589 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> >    9  1.44% 96.42%    3504 301594 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
> >   11  0.98% 98.50%    2388 301598 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
> >   12  0.62% 99.12%    1522 301600 org.apache.lucene.store.DataInput.readVInt
> >   13  0.21% 99.33%     500 301612 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
> >   14  0.07% 99.39%     167 301601 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
> >   16  0.05% 99.50%     114 301632 org.apache.lucene.codecs.lucene41.ForUtil.readBlock
> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
> >   18  0.03% 99.57%      76 301624 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
> >   20  0.03% 99.62%      68 301615 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   21  0.03% 99.65%      62 301635 org.apache.solr.search.SolrIndexSearcher.getDocSetNC
> >   22  0.02% 99.66%      41 301664 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   23  0.01% 99.68%      31 301629 org.apache.lucene.util.FixedBitSet.<init>
> > CPU SAMPLES END
> >
> > Solr 5.2.1:
> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
> > rank   self  accum   count  trace method
> >    1 51.38% 51.38%  120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
> >    2 25.69% 77.07%   60477 301292 sun.nio.ch.ServerSocketChannelImpl.accept0
> >    3 10.59% 87.66%   24923 301369 org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
> >    4  2.20% 89.86%    5182 301414 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
> >    5  2.01% 91.87%    4742 301384 org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
> >    6  1.25% 93.12%    2944 301434 java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
> >    7  1.11% 94.23%    2612 301367 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
> >   10  0.78% 96.87%    1825 301449 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
> >   11  0.73% 97.60%    1717 301378 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
> >   13  0.33% 98.66%     787 301387 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> >   14  0.16% 98.82%     374 301426 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
> >   16  0.09% 99.02%     219 301381 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
> >   20  0.06% 99.30%     151 301441 org.apache.lucene.store.DataInput.readVInt
> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
> >   22  0.06% 99.42%     140 301375 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
> >   24  0.03% 99.50%      76 301423 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> >   25  0.03% 99.53%      74 301454 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> >   26  0.03% 99.56%      65 301432 org.apache.lucene.util.BitDocIdSet$Builder.or
> >   27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
> >   29  0.02% 99.63%      50 301464 org.apache.lucene.codecs.lucene50.ForUtil.readBlock
> >   30  0.02% 99.64%      39 301438 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   31  0.02% 99.66%      37 301465 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
> >   32  0.02% 99.67%      36 301419 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> > CPU SAMPLES END
> >
> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <[email protected]> wrote:
> >> Hi Tomás,
> >>
> >> I suspect this might be related to LUCENE-5938. We changed the default
> >> rewrite method for multi-term queries to load documents into a sparse
> >> bit set first, and only upgrade to a dense bit set when we know many
> >> documents match. When there are lots of terms to intersect, we end up
> >> spending significant CPU time building the sparse bit set only to
> >> eventually upgrade to a dense bit set like before. This might be what
> >> you are seeing.
> >>
> >> You might see the issue less with the population field because it has
> >> fewer unique values, so postings lists are longer and the DocIdSet
> >> building logic can upgrade to a dense bit set more quickly.
> >>
> >> Mike noticed this slowness when working on BKD trees and we changed
> >> this first phase to use a plain int[] array that we sort and
> >> deduplicate instead of a fancier sparse bit set (LUCENE-6645), which
> >> seemed to make things faster. Would it be possible for you to also
> >> check a 5.3 snapshot?
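For anyone following along, here is a rough, Lucene-independent sketch of the buffering idea from LUCENE-6645 described above: collect matching doc IDs into a plain int[], sort and deduplicate once at the end, and only switch to a dense bit set if the buffer grows large. The class name and the upgrade threshold are made up for illustration; this is not the actual Lucene DocIdSetBuilder code.

import java.util.Arrays;
import java.util.BitSet;

/** Sketch of a doc-ID collector that buffers into an int[] first (illustrative only). */
public final class BufferedDocIdCollector {
    private final int maxDoc;
    private int[] buffer = new int[32];
    private int size = 0;
    private BitSet dense; // non-null once we decide the result is dense anyway

    public BufferedDocIdCollector(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Cheap append; duplicates across terms are tolerated until the end. */
    public void add(int doc) {
        if (dense != null) {
            dense.set(doc);
            return;
        }
        if (size == buffer.length) {
            if (buffer.length >= (maxDoc >>> 5)) { // arbitrary "clearly dense" threshold
                upgradeToDense();
                dense.set(doc);
                return;
            }
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[size++] = doc;
    }

    private void upgradeToDense() {
        dense = new BitSet(maxDoc);
        for (int i = 0; i < size; i++) {
            dense.set(buffer[i]);
        }
        buffer = null;
    }

    /** Returns the matching doc IDs, sorted and deduplicated. */
    public int[] result() {
        if (dense != null) {
            return dense.stream().toArray(); // already sorted and unique
        }
        Arrays.sort(buffer, 0, size);        // sort once at the end...
        int unique = 0;
        for (int i = 0; i < size; i++) {     // ...then deduplicate in place
            if (unique == 0 || buffer[i] != buffer[unique - 1]) {
                buffer[unique++] = buffer[i];
            }
        }
        return Arrays.copyOf(buffer, unique);
    }
}

The idea, compared to the sparse-first approach described for 5.2, is that the per-document work while collecting is just an array append, so little is wasted when the result turns out to be dense anyway.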
> >
> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe <[email protected]> wrote:
> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and
> >> > 5.2.1. It doesn't happen with all the queries, but for queries like
> >> > range queries on fields with many different values, the average time
> >> > in 5.2.1 is worse than in 4.10.4. Is anyone seeing something similar?
> >> >
> >> > Test details:
> >> > * Single thread running queries continuously. I run the test twice
> >> >   for each Solr version.
> >> > * JMeter running on my laptop, Solr running on EC2 on an m3.xlarge
> >> >   instance with all the defaults but with a 5G heap. Index on local
> >> >   disk (SSD).
> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
> >> >   SolrCloud mode, no distributed search.
> >> > * "allCountries" geonames dataset (~8M small docs). No updates during
> >> >   the test. Index size is around 1.1GB for Solr 5.2.1 and 1.3GB for
> >> >   Solr 4.10.4 (fits entirely in RAM).
> >> > * jdk1.8.0_45
> >> >
> >> > Queries: 3k boolean queries (generated with terms from the dataset)
> >> > with range queries as filters on the "tlongitude" and "tlatitude"
> >> > fields with randomly generated bounds, e.g.
> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
> >> >
> >> > Fields are:
> >> > <field name="tlatitude" type="tdouble"/>
> >> > <field name="tlongitude" type="tdouble"/>
> >> > Field type:
> >> > <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
> >> >
> >> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1
> >> > on average.
> >> >
> >> > http://snag.gy/2yPPM.jpg
> >> >
> >> > Doing only the boolean queries shows no performance difference
> >> > between 4.10 and 5.2; same thing if I do filters on a string field
> >> > instead of the range queries.
> >> >
> >> > When using the "double" field type (precisionStep="0"), the
> >> > difference was bigger:
> >> >
> >> > longitude/latitude fields:
> >> > <field name="longitude" type="double" docValues="true"/>
> >> > <field name="latitude" type="double" docValues="true"/>
> >> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
> >> >
> >> > http://snag.gy/Vi5uk.jpg
> >> >
> >> > I understand this is not the best field type definition for range
> >> > queries; I'm just trying to understand the difference between the two
> >> > versions and why.
> >> >
> >> > Performance was OK when doing range queries on the "population" field
> >> > (long), but that field doesn't have many different values: only 300k
> >> > out of the 8.3M docs have a population value different from 0. On the
> >> > other hand, doing range queries on the _version_ field did show a
> >> > graph similar to the previous one:
> >> >
> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
> >> >
> >> > http://snag.gy/4tc7e.jpg
> >> >
> >> > Any idea what could be causing this? Is this expected after some
> >> > known change?
> >> >
> >> > With Solr 4.10, a single CPU core remains high during the test (close
> >> > to 100%), but with Solr 5.2, different cores go up and down in
> >> > utilization continuously. That's probably because of the different
> >> > Jetty version, I suppose. GC pattern looks similar in both. For both
> >> > Solr versions I'm using the settings that ship with Solr (in
> >> > solr.in.sh) except for Xmx and Xms.
> >>
> >> --
> >> Adrien
>
> --
> Adrien
