[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744344#comment-16744344 ]
Mike Sokolov commented on LUCENE-8635:
--------------------------------------
Following a suggestion from ~mikemccand, I tried a slightly different version of
this that uses randomAccessSlice to avoid some calls to seek(), and it performs
better in the benchmarks. I also spent some time trying to understand the FST's
backwards-seeking behavior. Based on my crude understanding, and again a comment
from Mike, it seems that with some work it would be possible to make the FST
more naturally forward-seeking, but it's not obvious that this would, in general,
give more local, cache-friendly access patterns. Still, it might; this probably
needs some experimentation to know for sure. Here are the benchmark numbers from
the random-access patch:
{noformat}
Task                      QPS before   StdDev  QPS after   StdDev   Pct diff
PKLookup                      133.62   (2.2%)     123.74   (1.5%)    -7.4% ( -10% - -3%)
AndHighLow                   3411.49   (3.2%)    3268.04   (3.1%)    -4.2% ( -10% - 2%)
BrowseDayOfYearTaxoFacets   10067.18   (4.3%)    9828.65   (3.5%)    -2.4% ( -9% - 5%)
LowTerm                      3567.48   (1.2%)    3489.27   (1.7%)    -2.2% ( -5% - 0%)
Fuzzy1                        147.67   (3.1%)     144.65   (2.4%)    -2.0% ( -7% - 3%)
BrowseMonthTaxoFacets       10102.27   (4.2%)    9901.49   (4.1%)    -2.0% ( -9% - 6%)
Fuzzy2                         62.00   (2.8%)      60.87   (2.4%)    -1.8% ( -6% - 3%)
MedTerm                      2694.87   (2.0%)    2647.08   (2.1%)    -1.8% ( -5% - 2%)
AndHighMed                   1171.52   (2.7%)    1154.25   (2.8%)    -1.5% ( -6% - 4%)
HighTerm                     2061.53   (2.3%)    2032.84   (2.5%)    -1.4% ( -6% - 3%)
MedSloppyPhrase               266.60   (3.4%)     263.01   (4.2%)    -1.3% ( -8% - 6%)
OrHighHigh                    278.90   (4.0%)     275.35   (4.7%)    -1.3% ( -9% - 7%)
HighSloppyPhrase              107.68   (5.5%)     106.34   (5.6%)    -1.2% ( -11% - 10%)
Respell                       118.26   (2.1%)     116.95   (2.2%)    -1.1% ( -5% - 3%)
AndHighHigh                   472.93   (4.4%)     467.78   (3.3%)    -1.1% ( -8% - 6%)
OrHighMed                     755.21   (2.9%)     748.34   (3.3%)    -0.9% ( -6% - 5%)
MedSpanNear                   308.31   (3.3%)     305.59   (3.8%)    -0.9% ( -7% - 6%)
Wildcard                      869.37   (3.5%)     862.74   (1.9%)    -0.8% ( -5% - 4%)
HighTermMonthSort             871.33   (7.1%)     865.80   (6.1%)    -0.6% ( -12% - 13%)
MedPhrase                     449.39   (3.0%)     446.55   (2.4%)    -0.6% ( -5% - 4%)
LowSpanNear                   391.10   (3.3%)     388.77   (3.8%)    -0.6% ( -7% - 6%)
LowSloppyPhrase               406.57   (3.8%)     404.23   (3.6%)    -0.6% ( -7% - 7%)
HighPhrase                    239.84   (3.7%)     238.78   (3.3%)    -0.4% ( -7% - 6%)
Prefix3                      1230.56   (5.0%)    1225.52   (2.9%)    -0.4% ( -7% - 7%)
HighSpanNear                  107.34   (5.2%)     107.20   (5.3%)    -0.1% ( -10% - 10%)
LowPhrase                     438.52   (3.4%)     438.14   (2.5%)    -0.1% ( -5% - 5%)
BrowseDateTaxoFacets           11.14   (4.0%)      11.16   (7.0%)     0.2% ( -10% - 11%)
HighTermDayOfYearSort         606.85   (6.7%)     608.65   (5.4%)     0.3% ( -11% - 13%)
IntNRQ                        987.08  (12.5%)     990.96  (13.5%)     0.4% ( -22% - 30%)
OrHighLow                     553.72   (3.2%)     558.09   (3.5%)     0.8% ( -5% - 7%)
BrowseDayOfYearSSDVFacets      38.23   (3.9%)      38.66   (4.1%)     1.1% ( -6% - 9%)
BrowseMonthSSDVFacets          42.05   (3.5%)      42.57   (3.7%)     1.2% ( -5% - 8%)
{noformat}
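A note on reading the table: the "Pct diff" column appears to be the relative change in mean QPS, (after - before) / before. The sketch below assumes exactly that and recomputes a few rows. It does not try to reproduce the bracketed range, which also depends on the StdDev columns. PctDiff is just an illustrative helper, not part of any benchmark tool.

```java
import java.util.Locale;

/**
 * Illustrative helper (not part of luceneutil): recomputes the "Pct diff" column,
 * assuming it is (QPS after - QPS before) / QPS before, as a percentage.
 */
final class PctDiff {

    /** Relative change in QPS, in percent; negative means the patch is slower. */
    static double pctDiff(double qpsBefore, double qpsAfter) {
        return (qpsAfter - qpsBefore) / qpsBefore * 100.0;
    }

    /** Formats like the table: one decimal place followed by '%'. */
    static String format(double pct) {
        return String.format(Locale.ROOT, "%.1f%%", pct);
    }

    public static void main(String[] args) {
        // Three rows copied from the table: task name, QPS before, QPS after.
        String[] tasks = {"PKLookup", "AndHighLow", "BrowseMonthSSDVFacets"};
        double[][] qps = {{133.62, 123.74}, {3411.49, 3268.04}, {42.05, 42.57}};
        for (int i = 0; i < tasks.length; i++) {
            System.out.println(tasks[i] + " " + format(pctDiff(qps[i][0], qps[i][1])));
        }
    }
}
```

Running it prints -7.4%, -4.2% and 1.2% for the three tasks, matching the table, which supports (but does not prove) the assumed formula.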
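To make the randomAccessSlice-vs-seek() distinction concrete, here is a toy sketch in plain java.nio rather than Lucene's own classes. In Lucene the relevant call is IndexInput.randomAccessSlice(offset, length), which returns a RandomAccessInput with absolute readByte(long pos) reads; how the patch actually uses it is not shown here. MmapReadSketch and its methods are hypothetical names. Both readers walk a memory-mapped region backwards, loosely mimicking the FST's backwards reads mentioned above: one repositions a cursor before every read, the other uses absolute get(index) and keeps no cursor state. It illustrates the access pattern only and makes no claim about relative speed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical illustration using plain java.nio, not Lucene's API.
 * Both methods read bytes [from, to) from high to low addresses.
 */
final class MmapReadSketch {

    /** "Seeking" style: move the buffer's cursor, then do a relative read. */
    static long sumBackwardsWithSeek(ByteBuffer buf, int from, int to) {
        long sum = 0;
        for (int pos = to - 1; pos >= from; pos--) {
            buf.position(pos);        // explicit seek before every read
            sum += buf.get() & 0xFF;  // relative read, advances the cursor
        }
        return sum;
    }

    /** "Random-access" style: absolute reads, the cursor is never touched. */
    static long sumBackwardsRandomAccess(ByteBuffer buf, int from, int to) {
        long sum = 0;
        for (int pos = to - 1; pos >= from; pos--) {
            sum += buf.get(pos) & 0xFF;
        }
        return sum;
    }

    /** Maps a file read-only; the OS pages data in only as it is touched. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fst-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(tmp, data);
        MappedByteBuffer buf = map(tmp);
        System.out.println(sumBackwardsWithSeek(buf, 100, 200));
        System.out.println(sumBackwardsRandomAccess(buf, 100, 200));
    }
}
```

Both lines print 14950 (the sum of 100..199); the two methods differ only in whether each read depends on mutable cursor state.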
> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used the setup below for the es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory when the index is
> opened. This causes frequent JVM OOM errors as the total size of the terms
> grows. A better approach would be to load the FST lazily using mmap, which
> ensures that only the required terms get loaded into memory.
>
> Lucene can expose an API for providing the list of fields whose terms should be
> loaded offheap. I'm planning to take the following approach:
> # Add a boolean property fstOffHeap to FieldInfo
> # Pass the list of offheap fields to Lucene during index open (ALL can be a
> special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during Lucene index open
> # FieldReader invokes the default FST constructor or the OffHeap constructor
> based on the fstOffHeap field
>
> I created a patch (that loads all fields offheap), ran some benchmarks using
> es_rally, and the results look good.
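The four-step plan in the description could be sketched roughly as follows. Everything here is hypothetical: OffHeapFieldSelector, isOffHeap and loadStrategy are illustrative names, not Lucene API, and the returned strings stand in for FieldReader choosing between the default FST constructor and an offheap one. The sketch assumes the "ALL" keyword is matched exactly (case-sensitive).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed plan (not Lucene API): the field list given
 * at index open decides, per field, whether its FST is loaded offheap.
 */
final class OffHeapFieldSelector {
    /** Assumed special keyword meaning "every field", matched case-sensitively. */
    static final String ALL = "ALL";

    private final boolean all;
    private final Set<String> fields;

    OffHeapFieldSelector(Iterable<String> offHeapFields) {
        Set<String> names = new HashSet<>();
        boolean sawAll = false;
        for (String f : offHeapFields) {
            if (ALL.equals(f)) {
                sawAll = true;
            } else {
                names.add(f);
            }
        }
        this.all = sawAll;
        this.fields = Collections.unmodifiableSet(names);
    }

    /** The value the plan would store in the per-field fstOffHeap property. */
    boolean isOffHeap(String field) {
        return all || fields.contains(field);
    }

    /** Stand-in for FieldReader picking one of the two FST constructors. */
    String loadStrategy(String field) {
        return isOffHeap(field) ? "offheap (mmap)" : "on-heap";
    }

    public static void main(String[] args) {
        // "id", "title" and "body" are made-up field names for illustration.
        OffHeapFieldSelector some = new OffHeapFieldSelector(Arrays.asList("id", "title"));
        System.out.println("id -> " + some.loadStrategy("id"));
        System.out.println("body -> " + some.loadStrategy("body"));
        OffHeapFieldSelector every = new OffHeapFieldSelector(Arrays.asList(ALL));
        System.out.println("body -> " + every.loadStrategy("body"));
    }
}
```

Computing the flag once at index open keeps FieldReader's job to a single boolean check per field.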
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)