original-brownbear commented on PR #13472:
URL: https://github.com/apache/lucene/pull/13472#issuecomment-2170618011
luceneutil benchmark results for this change, run with one fewer thread for
this branch than for main (credit to @jpountz and @javanna for the idea) to
get an idea of the impact:
```
Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev Pct diff                  p-value
Fuzzy1                            105.06   (3.1%)                   103.22   (3.6%)    -1.7%  ( -8% - 5%)     0.103
BrowseDayOfYearTaxoFacets          14.80   (1.0%)                    14.55   (4.5%)    -1.7%  ( -7% - 3%)     0.096
OrHighMedDayTaxoFacets              6.60   (3.3%)                     6.49   (2.1%)    -1.6%  ( -6% - 3%)     0.062
Respell                            52.96   (2.2%)                    52.56   (1.9%)    -0.8%  ( -4% - 3%)     0.243
BrowseDateTaxoFacets               14.91   (1.2%)                    14.86   (3.9%)    -0.4%  ( -5% - 4%)     0.695
BrowseRandomLabelSSDVFacets         3.73   (0.5%)                     3.73   (0.5%)     0.1%  ( 0% - 1%)      0.714
BrowseMonthSSDVFacets               5.58   (2.0%)                     5.59   (2.0%)     0.2%  ( -3% - 4%)     0.763
BrowseDayOfYearSSDVFacets           7.61   (0.6%)                     7.62   (0.6%)     0.2%  ( 0% - 1%)      0.276
MedTermDayTaxoFacets               25.46   (0.7%)                    25.52   (0.9%)     0.3%  ( -1% - 1%)     0.328
AndHighHighDayTaxoFacets           15.24   (0.7%)                    15.28   (0.5%)     0.3%  ( -1% - 1%)     0.183
AndHighMedDayTaxoFacets            17.92   (0.7%)                    17.99   (0.5%)     0.4%  ( 0% - 1%)      0.023
BrowseRandomLabelTaxoFacets        11.95   (1.7%)                    12.00   (1.2%)     0.4%  ( -2% - 3%)     0.331
BrowseMonthTaxoFacets              12.37   (3.0%)                    12.46   (1.7%)     0.7%  ( -3% - 5%)     0.358
HighTermMonthSort                 306.96  (16.4%)                   309.25  (14.6%)     0.7%  ( -26% - 38%)   0.879
BrowseDateSSDVFacets                1.45   (1.0%)                     1.48   (2.4%)     1.7%  ( -1% - 5%)     0.004
Prefix3                           223.49  (31.2%)                   228.83  (13.7%)     2.4%  ( -32% - 68%)   0.754
Fuzzy2                             55.36  (20.9%)                    58.92  (14.4%)     6.4%  ( -23% - 52%)   0.256
PKLookup                          176.48  (18.1%)                   194.13  (13.2%)    10.0%  ( -17% - 50%)   0.045
OrNotHighLow                      472.02   (2.4%)                   567.48  (26.2%)    20.2%  ( -8% - 50%)    0.001
HighSloppyPhrase                    3.06   (3.6%)                     3.69   (7.1%)    20.4%  ( 9% - 32%)     0.000
AndHighLow                        784.51  (24.4%)                   959.85  (12.6%)    22.4%  ( -11% - 78%)   0.000
Wildcard                          124.97   (1.4%)                   154.50   (2.5%)    23.6%  ( 19% - 27%)    0.000
IntNRQ                             70.70   (1.2%)                    87.67   (4.0%)    24.0%  ( 18% - 29%)    0.000
HighPhrase                         94.06   (2.9%)                   118.04   (5.3%)    25.5%  ( 16% - 34%)    0.000
AndHighHigh                        53.83   (1.5%)                    67.85   (2.0%)    26.1%  ( 22% - 30%)    0.000
LowSloppyPhrase                    60.97   (2.4%)                    77.49   (5.6%)    27.1%  ( 18% - 35%)    0.000
LowPhrase                          20.56   (1.2%)                    26.27   (2.9%)    27.7%  ( 23% - 32%)    0.000
MedPhrase                          29.76   (1.7%)                    39.75   (5.1%)    33.6%  ( 26% - 40%)    0.000
LowIntervalsOrdered                15.55   (2.5%)                    20.83   (4.1%)    33.9%  ( 26% - 41%)    0.000
AndHighMed                         99.55   (2.7%)                   135.12   (2.1%)    35.7%  ( 30% - 41%)    0.000
LowSpanNear                         3.16   (1.8%)                     4.30   (1.6%)    36.3%  ( 32% - 40%)    0.000
OrHighMed                         117.00   (3.8%)                   164.78   (4.2%)    40.8%  ( 31% - 50%)    0.000
OrHighNotHigh                      89.87   (6.3%)                   128.16  (36.4%)    42.6%  ( 0% - 91%)     0.000
OrHighHigh                         38.70   (1.8%)                    55.41   (8.0%)    43.2%  ( 32% - 53%)    0.000
MedSloppyPhrase                     7.29   (3.5%)                    10.68   (4.6%)    46.5%  ( 37% - 56%)    0.000
HighSpanNear                        2.54   (2.1%)                     3.77   (3.2%)    48.6%  ( 42% - 55%)    0.000
MedTerm                           216.76  (15.6%)                   324.89  (29.6%)    49.9%  ( 4% - 112%)    0.000
HighTermTitleSort                  13.92   (9.3%)                    23.43   (8.9%)    68.3%  ( 45% - 95%)    0.000
TermDTSort                         68.68   (3.3%)                   117.77  (12.2%)    71.5%  ( 54% - 90%)    0.000
HighTerm                          220.46   (5.7%)                   396.67  (14.8%)    79.9%  ( 56% - 106%)   0.000
OrHighLow                         218.43  (26.1%)                   400.99  (82.8%)    83.6%  ( -20% - 260%)  0.000
HighTermTitleBDVSort                4.45   (2.1%)                     8.32   (2.1%)    86.8%  ( 80% - 92%)    0.000
MedSpanNear                        22.62   (2.7%)                    42.88   (5.8%)    89.6%  ( 78% - 100%)   0.000
OrHighNotLow                      329.64  (22.4%)                   672.19  (30.0%)   103.9%  ( 42% - 201%)   0.000
HighTermDayOfYearSort              57.50   (3.8%)                   125.18   (9.8%)   117.7%  ( 100% - 136%)  0.000
MedIntervalsOrdered                10.22   (4.1%)                    22.48   (9.4%)   119.9%  ( 102% - 139%)  0.000
HighIntervalsOrdered                2.41   (6.1%)                     5.39  (10.2%)   123.3%  ( 100% - 148%)  0.000
LowTerm                           251.06  (10.8%)                   634.45   (7.9%)   152.7%  ( 120% - 192%)  0.000
OrNotHighMed                       74.81   (5.4%)                   221.54  (14.8%)   196.1%  ( 166% - 228%)  0.000
OrNotHighHigh                      95.65   (7.1%)                   314.65  (21.1%)   228.9%  ( 187% - 276%)  0.000
OrHighNotMed                       59.11   (6.5%)                   206.56  (15.0%)   249.4%  ( 214% - 289%)  0.000
```
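This is wikimediumall, with 3 threads for main and 2 threads for this branch.
Effectively no regressions (the largest slowdown is -1.7%, and none of the
slowdowns has a p-value below 0.05), but some considerable speedups, up to
+249.4% for OrHighNotMed.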
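For anyone wanting to sanity-check the "Pct diff" column, here is a minimal sketch that recomputes it as a plain relative change between the two QPS columns. The class and method names are made up for illustration and are not part of luceneutil. Because the table values are already rounded, the result can differ from the table in the last digit (Fuzzy1, for example, comes out near -1.75% rather than -1.7%).

```java
import java.util.Locale;

// Hypothetical helper, not part of luceneutil: recomputes "Pct diff"
// from the rounded QPS values shown in the table.
final class QpsDiffSketch {

    /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // {baseline QPS, my_modified_version QPS}, copied from the table.
        String[] tasks = {"Wildcard", "HighTerm", "LowTerm"};
        double[][] qps = {{124.97, 154.50}, {220.46, 396.67}, {251.06, 634.45}};
        for (int i = 0; i < tasks.length; i++) {
            double diff = pctDiff(qps[i][0], qps[i][1]);
            System.out.println(String.format(Locale.ROOT, "%-10s %+.1f%%", tasks[i], diff));
        }
    }
}
```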
The obvious reason for the speedups is the reduction in context switching.
The perf output goes from this for `main`:
```
Performance counter stats for process id '157418':
574,008,686,445 cycles
1,130,739,465,717 instructions # 1.97 insn per cycle
2,599,704,747 cache-misses
429,542 context-switches
49.053969801 seconds time elapsed
```
to this for this branch:
```
Performance counter stats for process id '157292':
526,556,069,563 cycles
1,122,410,787,297 instructions # 2.13 insn per cycle
2,420,210,310 cache-misses
385,991 context-switches
41.044785986 seconds time elapsed
```
-> roughly the same number of instructions is executed (about 0.7% fewer),
but they run in about 8% fewer cycles and encounter about 7% fewer cache
misses.
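The relative deltas can be recomputed from the two perf outputs with a few lines of arithmetic. This is only a sketch: the counter values are copied from the outputs above, while the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for comparing the two perf stat outputs quoted above.
final class PerfCounterDeltaSketch {

    /** Instructions per cycle, the "insn per cycle" figure perf prints. */
    static double instructionsPerCycle(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    /** Percentage by which the branch value is lower than the main value. */
    static double reductionPct(long mainValue, long branchValue) {
        return (mainValue - branchValue) * 100.0 / mainValue;
    }

    public static void main(String[] args) {
        // Counter values copied from the perf output: {main, this branch}.
        String[] names = {"cycles", "instructions", "cache-misses", "context-switches"};
        long[][] counters = {
            {574_008_686_445L, 526_556_069_563L},
            {1_130_739_465_717L, 1_122_410_787_297L},
            {2_599_704_747L, 2_420_210_310L},
            {429_542L, 385_991L},
        };
        for (int i = 0; i < names.length; i++) {
            double reduction = reductionPct(counters[i][0], counters[i][1]);
            System.out.println(String.format(Locale.ROOT, "%-17s %.2f%% lower on this branch", names[i], reduction));
        }
        System.out.println(String.format(Locale.ROOT, "IPC main:   %.2f",
            instructionsPerCycle(counters[1][0], counters[0][0])));
        System.out.println(String.format(Locale.ROOT, "IPC branch: %.2f",
            instructionsPerCycle(counters[1][1], counters[0][1])));
    }
}
```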
This also shows in the profile of where the CPU time goes.
`main` looks like this:
```
17.21%  328981  org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
 5.75%  109925  org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
 5.24%  100195  org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
 5.17%   98733  org.apache.lucene.util.packed.DirectMonotonicReader#get()
 4.11%   78637  org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
 3.98%   76164  org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
 2.57%   49115  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#nextPosition()
 1.82%   34823  org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
 1.73%   33136  jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw()
 1.63%   31172  java.util.concurrent.atomic.AtomicLong#incrementAndGet()
```
while this branch looks as follows:
```
10.79%  183254  org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
 5.89%  100099  org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
 5.62%   95387  org.apache.lucene.util.packed.DirectMonotonicReader#get()
 4.59%   77917  org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
 4.48%   76145  org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
 3.20%   54407  org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
 2.77%   47088  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#nextPosition()
 2.06%   34965  org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
 1.91%   32484  jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw()
 1.81%   30763  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$BlockImpactsPostingsEnum#advance()
 1.71%   28966  org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
 1.66%   28206  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#advance()
```
-> a lot less time goes into `collect` (17.21% of samples on main vs. 10.79%
on this branch), which goes through contended counter increments.
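To illustrate what "contended counter increments" means in general, here is a small, self-contained sketch. It is not Lucene's collector code and not what this PR changes; all names are hypothetical. It contrasts threads that bump one shared `AtomicLong` on every hit with threads that count locally and publish once. Both produce the same total, but the first variant makes every thread write the same memory location on each hit, which gets more expensive as more threads run concurrently. It is not a benchmark either, since the JIT is free to simplify the local loop.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: hypothetical names, unrelated to Lucene's actual collectors.
final class ContendedCounterSketch {

    /** Every hit performs an atomic increment on one counter shared by all threads. */
    static long countWithSharedCounter(int threads, int hitsPerThread) {
        AtomicLong total = new AtomicLong();
        runOnThreads(threads, () -> {
            for (int i = 0; i < hitsPerThread; i++) {
                total.incrementAndGet(); // contended: all threads write the same location
            }
        });
        return total.get();
    }

    /** Each thread counts in a local variable and touches the shared counter once. */
    static long countWithLocalCounters(int threads, int hitsPerThread) {
        AtomicLong total = new AtomicLong();
        runOnThreads(threads, () -> {
            long local = 0;
            for (int i = 0; i < hitsPerThread; i++) {
                local++; // uncontended: thread-private state
            }
            total.addAndGet(local); // single shared write per thread
        });
        return total.get();
    }

    /** Starts the given number of threads running the same work and waits for all of them. */
    private static void runOnThreads(int threads, Runnable work) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(work);
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        int threads = 3;
        int hitsPerThread = 1_000_000;
        System.out.println("shared counter total: " + countWithSharedCounter(threads, hitsPerThread));
        System.out.println("local counters total: " + countWithLocalCounters(threads, hitsPerThread));
    }
}
```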