[
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743067#comment-16743067
]
Mike Sokolov commented on LUCENE-8635:
--------------------------------------
This looked interesting to me, too, so I ran the benchmarks with the
change, but sadly the results were not great, which is surprising given the
Rally test results, which looked positive, I think? I'm not really sure how to
interpret Rally output since I'm not familiar with that tool. Does it test
query performance? Maybe there is a use case for this that is different from
what is being tested by the benchmarks; here is what I saw after a benchmark
run. This run is maybe a little unusual since I have some mods to the benchmark
(running with an 8-thread executor service, indexSort enabled, topN=500)
because of some other tests I was running. I can re-run with more "normal"
settings, but this already looks kind of suspect.
{noformat}
Task                        QPS before  StdDev  QPS after  StdDev  Pct diff
PKLookup                        163.94  (2.3%)     123.50  (2.0%)  -24.7% (-28% - -20%)
AndHighLow                     5096.79  (1.2%)    4860.87  (1.5%)   -4.6% ( -7% -  -2%)
Fuzzy1                          711.37  (2.3%)     681.03  (2.4%)   -4.3% ( -8% -   0%)
Fuzzy2                          203.67  (2.6%)     196.77  (2.6%)   -3.4% ( -8% -   1%)
AndHighMed                     3460.06  (2.7%)    3346.84  (3.2%)   -3.3% ( -8% -   2%)
LowPhrase                      3448.68  (2.8%)    3345.41  (2.7%)   -3.0% ( -8% -   2%)
LowSloppyPhrase                3278.72  (2.9%)    3184.03  (2.8%)   -2.9% ( -8% -   2%)
LowSpanNear                    3123.68  (2.9%)    3040.74  (2.6%)   -2.7% ( -7% -   2%)
Respell                         716.61  (1.7%)     699.22  (1.8%)   -2.4% ( -5% -   1%)
MedPhrase                      2970.83  (3.2%)    2899.18  (3.0%)   -2.4% ( -8% -   3%)
AndHighHigh                    2626.26  (3.7%)    2563.37  (4.0%)   -2.4% ( -9% -   5%)
MedSloppyPhrase                2642.66  (3.6%)    2582.02  (3.3%)   -2.3% ( -8% -   4%)
MedSpanNear                    2598.01  (3.5%)    2541.03  (3.2%)   -2.2% ( -8% -   4%)
BrowseDateTaxoFacets           3467.39  (2.7%)    3399.62  (3.3%)   -2.0% ( -7% -   4%)
LowTerm                        3896.13  (4.7%)    3824.62  (4.4%)   -1.8% (-10% -   7%)
HighSpanNear                   1511.97  (4.7%)    1484.42  (4.6%)   -1.8% (-10% -   7%)
OrHighMed                      1406.84  (5.7%)    1382.52  (5.8%)   -1.7% (-12% -  10%)
OrHighLow                      1484.58  (6.1%)    1460.06  (6.0%)   -1.7% (-12% -  11%)
HighPhrase                     1740.06  (4.5%)    1712.12  (4.4%)   -1.6% (-10% -   7%)
HighSloppyPhrase               1547.60  (4.7%)    1523.48  (4.6%)   -1.6% (-10% -   8%)
BrowseMonthTaxoFacets          9031.31  (2.1%)    8897.26  (2.6%)   -1.5% ( -6% -   3%)
OrHighHigh                     1111.59  (6.3%)    1095.29  (6.5%)   -1.5% (-13% -  12%)
HighTermDayOfYearSort          2197.07  (5.9%)    2166.89  (3.9%)   -1.4% (-10% -   8%)
MedTerm                        2621.21  (5.3%)    2586.41  (5.0%)   -1.3% (-11% -   9%)
BrowseDayOfYearTaxoFacets      9011.41  (1.6%)    8907.44  (1.5%)   -1.2% ( -4% -   1%)
HighTermMonthSort              2449.33  (5.5%)    2421.11  (4.4%)   -1.2% (-10% -   9%)
HighTerm                       1629.92  (6.5%)    1612.72  (6.4%)   -1.1% (-13% -  12%)
IntNRQ                          980.43  (9.1%)     973.72  (8.9%)   -0.7% (-17% -  19%)
Wildcard                       1779.82  (5.7%)    1771.12  (5.5%)   -0.5% (-11% -  11%)
Prefix3                        1790.47  (5.9%)    1781.85  (5.8%)   -0.5% (-11% -  11%)
BrowseDayOfYearSSDVFacets      2038.63  (3.0%)    2032.32  (2.1%)   -0.3% ( -5% -   4%)
BrowseMonthSSDVFacets          2295.02  (2.5%)    2303.01  (1.9%)    0.3% ( -4% -   4%)
{noformat}
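
For anyone reading the table: the central "Pct diff" figure appears to be the relative change in mean QPS from "before" to "after". The sketch below only recomputes that central value for two rows as a sanity check. PctDiff is a hypothetical helper, not part of the benchmark tooling, and it does not attempt to reproduce the range shown in parentheses.

```java
import java.util.Locale;

// Hypothetical helper for checking the table; not part of the benchmark tooling.
final class PctDiff {
    /** Relative change of 'after' versus 'before', in percent. */
    static double pctDiff(double before, double after) {
        return 100.0 * (after - before) / before;
    }

    /** Formats one row as "<task> <pct>%" with one decimal place. */
    static String format(String task, double before, double after) {
        return String.format(Locale.ROOT, "%s %.1f%%", task, pctDiff(before, after));
    }

    public static void main(String[] args) {
        // QPS values copied from the PKLookup and AndHighLow rows.
        System.out.println(format("PKLookup", 163.94, 123.50));    // PKLookup -24.7%
        System.out.println(format("AndHighLow", 5096.79, 4860.87)); // AndHighLow -4.6%
    }
}
```

Both values match the table, so the PKLookup drop is a real ~25% change in mean QPS, well outside the reported StdDev of 2.0-2.3%.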
> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This
> causes frequent JVM OOM issues if the term size gets big. A better way of
> doing this will be to lazily load FST using mmap. That ensures only the
> required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap. I'm
> planning to take following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass list of offheap fields to lucene during index open (ALL can be
> special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during lucene index open
> # FieldReader invokes default FST constructor or OffHeap constructor based
> on fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using
> es_rally and results look good.
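
The on-heap versus off-heap choice in the description can be illustrated with plain java.nio, outside Lucene. This is only a sketch under assumptions: it is not the attached patch and uses no Lucene FST or FieldReader classes. FstBytesLoader and its methods are hypothetical names, and the fstOffHeap parameter merely mirrors the FieldInfo property proposed above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not Lucene code and not the offheap.patch.
final class FstBytesLoader {
    /** On-heap: eagerly copies the whole file into a byte[] on the JVM heap. */
    static byte[] loadOnHeap(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    /**
     * Off-heap: maps the file read-only. The OS pages data in on first
     * access, so untouched regions never occupy JVM heap.
     */
    static MappedByteBuffer mapOffHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Picks a loading strategy per field, analogous to step 4 of the proposal. */
    static ByteBuffer open(Path file, boolean fstOffHeap) throws IOException {
        return fstOffHeap ? mapOffHeap(file) : ByteBuffer.wrap(loadOnHeap(file));
    }
}
```

Both paths expose the same bytes through a ByteBuffer; the difference is where they live and when they are read. With a mapping, every read can turn into a page fault if the data is not yet resident, which is one place a lookup-heavy workload could plausibly lose time.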