Hi,

Just related: do you run the tests on Mac OS X? System.nanoTime is damn slow there, and it's also not monotonic! There was a discussion about this on hotspot-dev about a year ago, but there is still no solution.
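If you want to verify this on a given machine, a quick standalone check of System.nanoTime call cost and monotonicity could look roughly like the following (just a sketch: the class name and iteration count are arbitrary, and this is a crude measurement, not a proper JMH benchmark):

public class NanoTimeCheck {
    public static void main(String[] args) {
        final int iterations = 10_000_000;
        long start = System.nanoTime();
        long last = start;
        long backwards = 0;                 // how often the clock went backwards
        for (int i = 0; i < iterations; i++) {
            long now = System.nanoTime();
            if (now < last) {
                backwards++;
            }
            last = now;
        }
        double avgNs = (last - start) / (double) iterations;
        System.out.println("average cost per System.nanoTime() call: " + avgNs + " ns");
        System.out.println("backward jumps observed: " + backwards);
    }
}

With a monotonic clock source the backward-jump count should stay at zero; a wall-clock-based implementation can report jumps when NTP steps the time.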
Mac OS X has no real monotonic nanosecond clock implementation in the kernel, so Java falls back to the wall clock, which is slow and becomes non-monotonic when NTP or VMware adjusts the time (see the test failures on the Policeman Jenkins machines in the past when NTP adjusted the clock). If you currently run on Linux, testing on OS X as well could be a good test: there the difference to the previous version should be even larger.

Uwe

On 4 August 2015 at 03:07:45 CEST, Anshum Gupta <[email protected]> wrote:
> Guess bypassing that check and always returning true instead would help
> confirm that. If that's the case, we should just initialize the timeout to
> Long.MAX_VALUE and check for that to short-circuit?
>
> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>> Yes, I saw that, but I thought it could be the underlying implementation,
>> not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to
>> System.nanoTime then...
>>
>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <[email protected]> wrote:
>>> Thanks for sharing the traces, it looks like my intuition was wrong. :)
>>> They seem to point to ExitableDirectoryReader$ExitableTermsEnum.next,
>>> which checks whether the time is out before delegating.
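The short-circuit Anshum suggests could be sketched roughly like this. The names below (TermsCursor, TimeLimitedTermsCursor) are made up for illustration and this is not the actual ExitableDirectoryReader code; the point is only that the clock is never read when no timeout was configured:

import java.io.IOException;

// Stand-in for TermsEnum.next(): returns the next term, or null when exhausted.
interface TermsCursor {
    String next() throws IOException;
}

// Wrapper that enforces a time limit, but skips the clock read entirely
// when no timeout was configured (timeoutNanos == Long.MAX_VALUE).
final class TimeLimitedTermsCursor implements TermsCursor {
    private final TermsCursor in;
    private final long deadlineNanos;   // Long.MAX_VALUE means "no deadline"

    TimeLimitedTermsCursor(TermsCursor in, long timeoutNanos) {
        this.in = in;
        this.deadlineNanos = (timeoutNanos == Long.MAX_VALUE)
                ? Long.MAX_VALUE
                : System.nanoTime() + timeoutNanos;
    }

    @Override
    public String next() throws IOException {
        // Only touch System.nanoTime() when a deadline actually exists.
        if (deadlineNanos != Long.MAX_VALUE && System.nanoTime() - deadlineNanos > 0) {
            throw new RuntimeException("Time limit exceeded while iterating terms");
        }
        return in.next();
    }
}

Whether such a change helps in practice depends on how often next() is called per request and on what a System.nanoTime() call costs on the platform.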
>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>>> > Thanks Adrien,
>>> > I'll run the tests with a 5.3 snapshot and post the results here. In case this
>>> > helps, this is the hprof samples output
>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
>>> > for 4.10.4 and 5.2.1 in my test:
>>> >
>>> > Solr 4.10.4:
>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>>> > rank self accum count trace method
>>> >  1 75.07% 75.07% 182812 300523 java.net.PlainSocketImpl.socketAccept
>>> >  2 4.27% 79.34% 10408 301576 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>>> >  3 4.15% 83.49% 10108 301585 org.apache.lucene.index.FilteredTermsEnum.docs
>>> >  4 3.46% 86.95% 8419 301582 org.apache.lucene.index.FilteredTermsEnum.next
>>> >  5 2.49% 89.44% 6070 301573 java.net.SocketInputStream.socketRead0
>>> >  6 1.99% 91.43% 4848 301599 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >  7 1.98% 93.42% 4824 301583 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >  8 1.57% 94.99% 3824 301589 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> >  9 1.44% 96.42% 3504 301594 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>>> > 10 1.09% 97.51% 2655 301581 java.nio.Bits.copyToArray
>>> > 11 0.98% 98.50% 2388 301598 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>>> > 12 0.62% 99.12% 1522 301600 org.apache.lucene.store.DataInput.readVInt
>>> > 13 0.21% 99.33% 500 301612 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>>> > 14 0.07% 99.39% 167 301601 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> > 15 0.06% 99.45% 139 301619 java.lang.System.identityHashCode
>>> > 16 0.05% 99.50% 114 301632 org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>>> > 17 0.04% 99.54% 92 300708 java.util.zip.Inflater.inflateBytes
>>> > 18 0.03% 99.57% 76 301624 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> > 19 0.03% 99.59% 68 300613 java.lang.ClassLoader.defineClass1
>>> > 20 0.03% 99.62% 68 301615 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> > 21 0.03% 99.65% 62 301635 org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>>> > 22 0.02% 99.66% 41 301664 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> > 23 0.01% 99.68% 31 301629 org.apache.lucene.util.FixedBitSet.<init>
>>> > CPU SAMPLES END
>>> >
>>> > Solr 5.2.1:
>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>>> > rank self accum count trace method
>>> >  1 51.38% 51.38% 120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
>>> >  2 25.69% 77.07% 60477 301292 sun.nio.ch.ServerSocketChannelImpl.accept0
>>> >  3 10.59% 87.66% 24923 301369 org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>>> >  4 2.20% 89.86% 5182 301414 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>>> >  5 2.01% 91.87% 4742 301384 org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>>> >  6 1.25% 93.12% 2944 301434 java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>>> >  7 1.11% 94.23% 2612 301367 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> >  8 0.94% 95.17% 2204 301390 org.apache.lucene.util.BitSet.or
>>> >  9 0.93% 96.10% 2190 301383 java.nio.Bits.copyToArray
>>> > 10 0.78% 96.87% 1825 301449 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>>> > 11 0.73% 97.60% 1717 301378 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> > 12 0.73% 98.33% 1715 301374 org.apache.lucene.util.BitSet.or
>>> > 13 0.33% 98.66% 787 301387 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> > 14 0.16% 98.82% 374 301426 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> > 15 0.10% 98.93% 245 301382 org.apache.lucene.util.BitSet.or
>>> > 16 0.09% 99.02% 219 301381 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> > 17 0.09% 99.11% 207 301370 org.apache.lucene.util.BitSet.or
>>> > 18 0.06% 99.17% 153 301416 org.apache.lucene.util.BitSet.or
>>> > 19 0.06% 99.24% 151 301427 org.apache.lucene.util.BitSet.or
>>> > 20 0.06% 99.30% 151 301441 org.apache.lucene.store.DataInput.readVInt
>>> > 21 0.06% 99.36% 147 301389 java.lang.System.identityHashCode
>>> > 22 0.06% 99.42% 140 301375 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> > 23 0.04% 99.47% 104 301425 org.apache.lucene.util.BitSet.or
>>> > 24 0.03% 99.50% 76 301423 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> > 25 0.03% 99.53% 74 301454 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> > 26 0.03% 99.56% 65 301432 org.apache.lucene.util.BitDocIdSet$Builder.or
>>> > 27 0.02% 99.58% 53 301456 org.apache.lucene.util.FixedBitSet.or
>>> > 28 0.02% 99.60% 52 300077 java.lang.ClassLoader.defineClass1
>>> > 29 0.02% 99.63% 50 301464 org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>>> > 30 0.02% 99.64% 39 301438 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> > 31 0.02% 99.66% 37 301465 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> > 32 0.02% 99.67% 36 301419 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> > CPU SAMPLES END
>>> >
>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <[email protected]> wrote:
>>> >> Hi Tomás,
>>> >>
>>> >> I suspect this might be related to LUCENE-5938. We changed the default
>>> >> rewrite method for multi-term queries to load documents into a sparse
>>> >> bit set first, and only upgrade to a dense bit set when we know that
>>> >> many documents match. When there are lots of terms to intersect, we end
>>> >> up spending significant CPU time building the sparse bit set only to
>>> >> eventually upgrade to a dense bit set like before. This might be what
>>> >> you are seeing.
>>> >>
>>> >> You might see the issue less with the population field because it has
>>> >> fewer unique values, so postings lists are longer and the DocIdSet
>>> >> building logic can upgrade to a dense bit set more quickly.
>>> >>
>>> >> Mike noticed this slowness when working on BKD trees, and we changed
>>> >> this first phase to use a plain int[] array that we sort and deduplicate
>>> >> instead of a fancier sparse bit set (LUCENE-6645), which seemed to make
>>> >> things faster. Would it be possible for you to also check a 5.3 snapshot?
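To make the "start sparse, upgrade to dense" idea above concrete, here is a deliberately simplified, self-contained sketch. The class name and the upgrade threshold are invented for illustration; the real DocIdSet building code in Lucene is more involved than this:

import java.util.Arrays;

// Collects matching doc ids, starting with a plain int[] buffer (sparse phase)
// and switching to a bit-per-document array once many documents match.
final class SimpleDocIdCollector {
    private final int maxDoc;
    private int[] buffer = new int[64];   // sparse phase
    private int count = 0;
    private long[] bits = null;           // dense phase, one bit per doc id

    SimpleDocIdCollector(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void add(int docId) {                 // docId is assumed to be < maxDoc
        if (bits != null) {               // already dense
            bits[docId >>> 6] |= 1L << docId;
            return;
        }
        if (count == buffer.length) {
            // Arbitrary threshold: once the buffer holds about maxDoc/128 entries,
            // switch to the dense representation instead of growing further.
            if (buffer.length >= (maxDoc >>> 7)) {
                upgradeToDense();
                bits[docId >>> 6] |= 1L << docId;
                return;
            }
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[count++] = docId;
    }

    private void upgradeToDense() {
        bits = new long[(maxDoc + 63) >>> 6];
        for (int i = 0; i < count; i++) {
            bits[buffer[i] >>> 6] |= 1L << buffer[i];
        }
        buffer = null;
    }

    // Sparse result as a sorted, deduplicated int[]; this is roughly the form the
    // LUCENE-6645 change keeps instead of a fancier sparse bit set.
    int[] sortedUniqueDocs() {
        if (bits != null) {
            throw new IllegalStateException("already upgraded to dense");
        }
        int[] copy = Arrays.copyOf(buffer, count);
        Arrays.sort(copy);
        int unique = 0;
        for (int i = 0; i < copy.length; i++) {
            if (i == 0 || copy[i] != copy[i - 1]) {
                copy[unique++] = copy[i];
            }
        }
        return Arrays.copyOf(copy, unique);
    }
}

The cost Adrien describes is the work done in the sparse phase for queries that end up dense anyway; whether it shows up depends on how many terms are intersected and how many documents each term matches.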
>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>>> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and 5.2.1.
>>> >> > It doesn't happen with all queries, but for queries like range queries on
>>> >> > fields with many different values, the average time in 5.2.1 is worse than
>>> >> > in 4.10.4. Is anyone seeing something similar?
>>> >> >
>>> >> > Test details:
>>> >> > * Single thread running queries continuously. I ran the test twice for each
>>> >> >   Solr version.
>>> >> > * JMeter running on my laptop, Solr running on EC2 on an m3.xlarge instance
>>> >> >   with all the defaults but a 5G heap. Index on local disk (SSD).
>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in SolrCloud
>>> >> >   mode, no distributed search.
>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates during the
>>> >> >   test. Index size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr 4.10.4
>>> >> >   (fits entirely in RAM).
>>> >> > * jdk1.8.0_45
>>> >> >
>>> >> > Queries: 3k boolean queries (generated with terms from the dataset) with
>>> >> > range queries as filters on the "tlongitude" and "tlatitude" fields with
>>> >> > randomly generated bounds, e.g.
>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>>> >> >
>>> >> > Fields are:
>>> >> > <field name="tlatitude" type="tdouble"/>
>>> >> > <field name="tlongitude" type="tdouble"/>
>>> >> > Field type:
>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
>>> >> >
>>> >> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1 on
>>> >> > average.
>>> >> >
>>> >> > http://snag.gy/2yPPM.jpg
>>> >> >
>>> >> > Doing only the boolean queries shows no performance difference between 4.10
>>> >> > and 5.2; same thing if I use filters on a string field instead of the range
>>> >> > queries.
>>> >> > When using the "double" field type (precisionStep="0"), the difference was
>>> >> > bigger:
>>> >> >
>>> >> > longitude/latitude fields:
>>> >> > <field name="longitude" type="double" docValues="true"/>
>>> >> > <field name="latitude" type="double" docValues="true"/>
>>> >> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/Vi5uk.jpg
>>> >> > I understand this is not the best field type definition for range queries;
>>> >> > I'm just trying to understand the difference between the two versions and
>>> >> > why.
>>> >> >
>>> >> > Performance was OK when doing range queries on the "population" field
>>> >> > (long), but that field doesn't have many different values: only 300k out of
>>> >> > the 8.3M docs have a population value different from 0. On the other hand,
>>> >> > doing range queries on the _version_ field did show a graph similar to the
>>> >> > previous one:
>>> >> >
>>> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
>>> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/4tc7e.jpg
>>> >> >
>>> >> > Any idea what could be causing this? Is this expected after some known
>>> >> > change?
>>> >> >
>>> >> > With Solr 4.10, a single CPU core remains close to 100% during the test,
>>> >> > but with Solr 5.2, different cores go up and down in utilization
>>> >> > continuously. That's probably because of the different Jetty version, I
>>> >> > suppose. The GC pattern looks similar in both. For both Solr versions I'm
>>> >> > using the settings that ship with Solr (in solr.in.sh) except for Xmx and
>>> >> > Xms.
>>> >>
>>> >> --
>>> >> Adrien
>>>
>>> --
>>> Adrien
>
> --
> Anshum Gupta

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
