original-brownbear commented on PR #13472:
URL: https://github.com/apache/lucene/pull/13472#issuecomment-2173609575
@msokolov they are astounding, but in the opposite direction: it's mostly the concurrency itself that is the problem.
This is `main` vs `main`, no concurrency vs 4 threads:
```
                       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff          p-value
  BrowseDayOfYearTaxoFacets      14.81   (0.4%)       5.97   (0.4%)    -59.7% ( -60% -  -59%)  0.000
       BrowseDateTaxoFacets      14.20   (9.0%)       5.85   (0.2%)    -58.8% ( -62% -  -54%)  0.000
                     IntNRQ      70.46   (1.3%)      30.29   (3.2%)    -57.0% ( -60% -  -53%)  0.000
BrowseRandomLabelTaxoFacets      11.61   (2.7%)       5.08   (0.3%)    -56.3% ( -57% -  -54%)  0.000
                     Fuzzy1      72.82   (5.7%)      44.58   (1.1%)    -38.8% ( -43% -  -33%)  0.000
  BrowseDayOfYearSSDVFacets       7.66   (1.0%)       4.78   (0.6%)    -37.6% ( -38% -  -36%)  0.000
                  OrHighMed      74.56   (2.4%)      51.86   (3.2%)    -30.4% ( -35% -  -25%)  0.000
                AndHighHigh      47.99   (2.7%)      34.33   (2.6%)    -28.5% ( -32% -  -23%)  0.000
                 AndHighMed      67.95   (1.5%)      52.17   (2.4%)    -23.2% ( -26% -  -19%)  0.000
            LowSloppyPhrase      45.51   (1.7%)      37.92   (2.4%)    -16.7% ( -20% -  -12%)  0.000
                  MedPhrase      11.68   (5.0%)       9.74   (0.3%)    -16.6% ( -20% -  -11%)  0.000
      BrowseMonthTaxoFacets      12.23   (2.6%)      10.73  (27.7%)    -12.2% ( -41% -   18%)  0.378
                 OrHighHigh      45.32   (2.7%)      39.79   (4.2%)    -12.2% ( -18% -   -5%)  0.000
      BrowseMonthSSDVFacets       5.49   (4.2%)       4.85   (1.1%)    -11.7% ( -16% -   -6%)  0.000
           HighSloppyPhrase       2.01   (2.2%)       1.81   (7.4%)    -10.2% ( -19% -    0%)  0.008
                   Wildcard     123.17   (2.5%)     115.43   (0.9%)     -6.3% (  -9% -   -2%)  0.000
               OrNotHighLow     908.00   (2.2%)     865.22   (1.4%)     -4.7% (  -8% -   -1%)  0.000
        LowIntervalsOrdered      57.32   (3.2%)      54.78   (3.9%)     -4.4% ( -11% -    2%)  0.077
       MedTermDayTaxoFacets      22.22   (0.6%)      21.57   (2.9%)     -2.9% (  -6% -    0%)  0.049
       BrowseDateSSDVFacets       1.46   (2.0%)       1.45   (2.1%)     -0.5% (  -4% -    3%)  0.743
BrowseRandomLabelSSDVFacets       3.75   (0.6%)       3.74   (0.2%)     -0.2% (  -1% -    0%)  0.551
     OrHighMedDayTaxoFacets       1.20   (1.1%)       1.21   (4.4%)      0.9% (  -4% -    6%)  0.678
                    Respell      52.55   (1.4%)      53.25   (2.9%)      1.3% (  -2% -    5%)  0.407
    AndHighMedDayTaxoFacets      11.46   (0.8%)      11.76   (2.7%)      2.6% (   0% -    6%)  0.067
   AndHighHighDayTaxoFacets      12.74   (1.3%)      13.23   (2.1%)      3.8% (   0% -    7%)  0.002
                MedSpanNear       8.28   (2.4%)       9.50   (5.0%)     14.7% (   7% -   22%)  0.000
                 AndHighLow     624.28  (22.4%)     726.83   (3.6%)     16.4% (  -7% -   54%)  0.147
                     Fuzzy2      51.95  (23.2%)      60.73   (2.7%)     16.9% (  -7% -   55%)  0.147
            MedSloppyPhrase      12.94   (4.1%)      15.57  (10.9%)     20.3% (   5% -   36%)  0.001
                    Prefix3     158.65  (23.1%)     213.31   (3.8%)     34.5% (   6% -   79%)  0.003
                   PKLookup     175.73   (6.3%)     247.50   (0.6%)     40.8% (  31% -   50%)  0.000
                 HighPhrase      24.79   (6.3%)      37.67   (1.4%)     52.0% (  41% -   63%)  0.000
                  LowPhrase     153.31   (1.4%)     244.54   (1.6%)     59.5% (  55% -   63%)  0.000
                  OrHighLow     232.73  (23.8%)     371.84   (4.1%)     59.8% (  25% -  115%)  0.000
               HighSpanNear       2.93   (3.5%)       4.82  (13.3%)     64.7% (  46% -   84%)  0.000
                LowSpanNear      51.65   (6.0%)      98.03   (9.8%)     89.8% (  69% -  112%)  0.000
       HighTermTitleBDVSort       4.37   (4.4%)       8.65   (1.6%)     98.0% (  88% -  108%)  0.000
        MedIntervalsOrdered       9.46   (7.3%)      19.51  (13.2%)    106.1% (  79% -  136%)  0.000
       HighIntervalsOrdered       4.26   (6.5%)       8.81  (13.6%)    106.9% (  81% -  135%)  0.000
                    LowTerm     232.68   (3.8%)     485.59   (7.5%)    108.7% (  93% -  124%)  0.000
                    MedTerm     202.48  (26.4%)     535.61  (18.8%)    164.5% (  94% -  285%)  0.000
               OrHighNotLow     172.52   (3.4%)     516.56   (7.3%)    199.4% ( 182% -  217%)  0.000
              OrNotHighHigh      69.11   (4.1%)     224.30  (11.7%)    224.6% ( 200% -  250%)  0.000
              OrHighNotHigh      77.32   (2.7%)     271.59  (12.7%)    251.3% ( 229% -  274%)  0.000
                 TermDTSort      62.88   (4.5%)     224.63   (5.5%)    257.2% ( 236% -  279%)  0.000
                   HighTerm     106.32   (3.1%)     385.12  (25.1%)    262.2% ( 227% -  299%)  0.000
               OrNotHighMed      64.14  (10.1%)     247.41  (19.2%)    285.7% ( 232% -  350%)  0.000
               OrHighNotMed      78.41   (5.3%)     306.67  (10.6%)    291.1% ( 261% -  324%)  0.000
          HighTermMonthSort     395.36  (38.7%)    2712.69  (16.2%)    586.1% ( 382% - 1046%)  0.000
      HighTermDayOfYearSort      67.77   (4.8%)     524.03  (18.3%)    673.3% ( 620% -  731%)  0.000
          HighTermTitleSort      15.06   (3.6%)     131.50   (5.9%)    773.4% ( 737% -  811%)  0.000
```
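To make concrete what the "4 threads" column is paying for: each query's per-slice work is forked to the executor and the calling thread then blocks on the resulting futures. A pure-JDK sketch of that fork/join pattern (not Lucene code — `searchSlice` and the slice count are stand-ins I made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ForkJoinPattern {
  // Stand-in for per-slice search work; a real slice searches a group of segments.
  static int searchSlice(int slice) {
    return slice * slice;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Integer>> futures = new ArrayList<>();
    for (int slice = 0; slice < 4; slice++) {
      final int s = slice;
      // fork: enqueue the task and wake a pool thread
      futures.add(pool.submit(() -> searchSlice(s)));
    }
    int total = 0;
    for (Future<Integer> f : futures) {
      // join: the calling thread may be put to sleep until each result is
      // ready and then has to be woken again -- that sleep/wake round trip
      // is pure overhead when the task itself is cheap
      total += f.get();
    }
    System.out.println(total); // prints 14 (0 + 1 + 4 + 9)
    pool.shutdown();
  }
}
```

For very short tasks, the enqueue, thread wake-up, and `future.get` park/unpark can cost more than the task body itself, which is exactly the shape of the regressions in the cheap tasks above.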
A large number of these tasks actually show extreme regressions from forking; even this branch is roughly 50% behind the no-concurrency baseline on some of them. This is in fact what led me to open this PR.
When profiling ES benchmark runs, I saw a bunch of sections where the overhead of forking a given task to the executor was higher than the cost of just executing that same task right away on the calling thread. It's a little hard to show this quantitatively in a flame graph, but the qualitative problem is visible here:
This is the profiling with vanilla Lucene:
<img width="2497" alt="image"
src="https://github.com/apache/lucene/assets/6490959/c7415ecc-8625-4ada-ab14-e8122b8a3d6f">
And this is the same situation with my changes in Lucene:
<img width="2519" alt="image"
src="https://github.com/apache/lucene/assets/6490959/fecc99be-449f-45d4-8c2d-9ed74f4afe00">
For weight creation, the forking overhead is still overwhelming, but at least we save the `future.get` overhead of putting the calling thread to sleep and waking it up again. Only for longer-running search tasks is the forking overhead "ok", I think. As I tried to show with the `perf` output, the cache effects of context switching often outweigh any benefit from parallelizing the IO. I could even see a point where IO parallelization causes active harm: not from the IO itself, but because page-fault handling doesn't scale well in Linux, so even if you make a fast NVMe drive run harder, the contention on the page-fault path might destroy any benefit from pushing the disk.
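One way to claw back part of the `future.get` sleep/wake cost, and I believe the spirit of what the flame graphs show here, is to fork all slices but one and run the last slice on the calling thread. Again a pure-JDK sketch with made-up stand-ins, not the actual Lucene implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallerRuns {
  // Stand-in for per-slice search work.
  static int searchSlice(int slice) {
    return slice * slice;
  }

  public static void main(String[] args) throws Exception {
    final int slices = 4;
    ExecutorService pool = Executors.newFixedThreadPool(slices - 1);
    List<Future<Integer>> futures = new ArrayList<>();
    // fork every slice except the last
    for (int s = 0; s < slices - 1; s++) {
      final int slice = s;
      futures.add(pool.submit(() -> searchSlice(slice)));
    }
    // run the last slice on the calling thread: while it executes, the
    // forked tasks often complete, so the get() calls below frequently
    // return immediately instead of parking this thread
    int total = searchSlice(slices - 1);
    for (Future<Integer> f : futures) {
      total += f.get();
    }
    System.out.println(total); // prints 14 (9 + 0 + 1 + 4)
    pool.shutdown();
  }
}
```

The caller does useful work instead of sleeping, which helps most exactly in the short-task regime where the table above shows forking losing to sequential execution.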