[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753609#comment-16753609 ]
Ankit Jain commented on LUCENE-8635:
------------------------------------
Results for bigger data sets:
{code| title=wikimedium10m, java ...... -DFST.offheap=true|borderStyle=solid}
Task                       QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                         117.59   (3.0%)         107.48   (2.3%)   -8.6% ( -13% - -3%)
OrHighNotMed                    1085.05   (2.1%)        1056.43   (2.2%)   -2.6% ( -6% - 1%)
OrNotHighLow                     976.94   (2.4%)         955.32   (1.8%)   -2.2% ( -6% - 2%)
OrHighNotLow                    1152.58   (2.6%)        1128.25   (2.0%)   -2.1% ( -6% - 2%)
Fuzzy1                            83.10   (2.6%)          81.54   (2.5%)   -1.9% ( -6% - 3%)
IntNRQ                            88.53  (16.2%)          86.92  (14.7%)   -1.8% ( -28% - 34%)
OrNotHighHigh                    886.10   (1.7%)         870.26   (1.4%)   -1.8% ( -4% - 1%)
OrHighNotHigh                    838.32   (1.8%)         824.15   (1.9%)   -1.7% ( -5% - 2%)
BrowseMonthTaxoFacets           8099.58   (2.0%)        7968.65   (1.8%)   -1.6% ( -5% - 2%)
Fuzzy2                            55.95   (2.7%)          55.08   (2.5%)   -1.6% ( -6% - 3%)
OrNotHighMed                     764.40   (2.3%)         752.56   (1.7%)   -1.5% ( -5% - 2%)
BrowseDayOfYearTaxoFacets       8081.37   (2.1%)        7957.27   (2.7%)   -1.5% ( -6% - 3%)
LowTerm                         1941.88   (5.2%)        1912.71   (4.0%)   -1.5% ( -10% - 8%)
HighTermMonthSort                 78.12  (10.8%)          76.99  (14.3%)   -1.4% ( -23% - 26%)
Respell                           61.23   (2.7%)          60.57   (2.7%)   -1.1% ( -6% - 4%)
HighTerm                        1526.16   (3.1%)        1510.23   (1.8%)   -1.0% ( -5% - 4%)
MedTerm                         1814.44   (3.7%)        1797.69   (2.1%)   -0.9% ( -6% - 5%)
OrHighLow                        443.93   (2.4%)         439.92   (2.5%)   -0.9% ( -5% - 4%)
AndHighLow                       577.60   (2.0%)         573.43   (1.4%)   -0.7% ( -4% - 2%)
Wildcard                          62.79   (5.8%)          62.54   (6.1%)   -0.4% ( -11% - 12%)
BrowseDayOfYearSSDVFacets         11.56   (8.0%)          11.55   (8.2%)   -0.0% ( -15% - 17%)
Prefix3                          165.76   (8.7%)         165.70   (9.2%)   -0.0% ( -16% - 19%)
MedSpanNear                       51.40   (2.3%)          51.48   (2.5%)    0.2% ( -4% - 5%)
BrowseMonthSSDVFacets             14.45  (13.6%)          14.47  (13.2%)    0.2% ( -23% - 31%)
HighTermDayOfYearSort             44.98   (6.8%)          45.05   (5.3%)    0.2% ( -11% - 13%)
OrHighMed                        111.81   (3.0%)         112.01   (2.8%)    0.2% ( -5% - 6%)
LowSpanNear                       47.14   (2.4%)          47.24   (2.5%)    0.2% ( -4% - 5%)
MedSloppyPhrase                   48.25   (1.9%)          48.37   (2.3%)    0.2% ( -3% - 4%)
LowSloppyPhrase                   35.36   (2.2%)          35.46   (2.5%)    0.3% ( -4% - 5%)
AndHighMed                       144.05   (3.6%)         144.53   (2.7%)    0.3% ( -5% - 6%)
HighSpanNear                       6.92   (3.5%)           6.95   (3.5%)    0.5% ( -6% - 7%)
MedPhrase                         25.88   (2.4%)          26.00   (1.4%)    0.5% ( -3% - 4%)
AndHighHigh                       38.77   (4.0%)          38.98   (3.9%)    0.5% ( -7% - 8%)
OrHighHigh                        27.47   (3.2%)          27.63   (3.1%)    0.6% ( -5% - 7%)
LowPhrase                         91.71   (4.3%)          92.56   (3.5%)    0.9% ( -6% - 9%)
HighSloppyPhrase                  18.28   (3.2%)          18.45   (3.6%)    0.9% ( -5% - 8%)
HighPhrase                        20.07   (3.9%)          20.35   (1.3%)    1.4% ( -3% - 6%)
BrowseDateTaxoFacets               2.37   (0.4%)           2.41   (0.2%)    1.4% ( 0% - 2%)
{code}
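
For reading the table above: the Pct diff column appears to be the relative change in mean QPS from baseline to candidate. The sketch below recomputes it for two rows under that assumption. PctDiffSketch and its methods are hypothetical names used only for illustration, not part of the benchmark tooling, and the bracketed range after each percentage is not reproduced here.

```java
import java.util.Locale;

// Hypothetical helper (not part of any benchmark tool): recomputes "Pct diff"
// assuming it is (candidate - baseline) / baseline, expressed in percent.
final class PctDiffSketch {

    /** Relative change of candidate QPS over baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    /** Formats a percentage with one decimal place, as in the table. */
    static String format(double pct) {
        return String.format(Locale.ROOT, "%.1f%%", pct);
    }

    public static void main(String[] args) {
        // QPS values copied from the PKLookup and HighPhrase rows.
        System.out.println("PKLookup   " + format(pctDiff(117.59, 107.48)));
        System.out.println("HighPhrase " + format(pctDiff(20.07, 20.35)));
    }
}
```

Under this assumption the two rows come out at -8.6% and 1.4%, matching the table.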
> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used the setup below for the es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: fst-offheap-ra-rev.patch, offheap.patch,
> optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory when the index is
> opened. This causes frequent JVM OOM errors when the term data gets big. A
> better approach is to load the FST lazily using mmap, which ensures that only
> the required terms get loaded into memory.
>
> Lucene can expose an API for specifying the list of fields whose terms are
> loaded off-heap. I'm planning to take the following approach:
> # Add a boolean property fstOffHeap to FieldInfo
> # Pass the list of off-heap fields to Lucene during index open (ALL can be a
> special keyword for loading all fields off-heap)
> # Initialize the fstOffHeap property during Lucene index open
> # FieldReader invokes the default FST constructor or the OffHeap constructor
> based on the fstOffHeap property
>
> I created a patch (that loads all fields off-heap) and ran some benchmarks
> using es_rally; the results look good.
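
The approach in the description can be sketched outside Lucene roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch: OffHeapChoiceSketch, fstOffHeap, load, and the ALL constant are hypothetical names, and the code uses plain java.nio (FileChannel.map) instead of any Lucene class. It shows a per-field flag derived from a list of off-heap fields, then either a full copy onto the heap or a memory-mapped view whose pages the OS loads on first access.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;

// Hypothetical sketch; none of these names are Lucene APIs.
final class OffHeapChoiceSketch {

    /** Assumed special keyword meaning "every field is off-heap". */
    static final String ALL = "ALL";

    /** Derives the per-field flag from the field list passed at open time. */
    static boolean fstOffHeap(String field, Set<String> offHeapFields) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(field);
    }

    /** Returns the file contents either copied to the heap or memory-mapped. */
    static ByteBuffer load(Path file, boolean offHeap) throws IOException {
        if (!offHeap) {
            // On-heap: the whole file is read into a byte[] up front.
            return ByteBuffer.wrap(Files.readAllBytes(file));
        }
        // Off-heap: map the file; the mapping stays valid after the channel closes.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fst-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 20, 30, 40});

        Set<String> offHeapFields = Set.of("body"); // Set.of(ALL) would select every field
        for (String field : new String[] {"title", "body"}) {
            ByteBuffer data = load(file, fstOffHeap(field, offHeapFields));
            System.out.println(field + ": direct=" + data.isDirect() + ", byte[2]=" + data.get(2));
        }
    }
}
```

Both branches return a ByteBuffer, so the reading code does not need to know which one was chosen. That is the property the fstOffHeap switch relies on in this sketch.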