[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760118#comment-16760118 ]
Ankit Jain commented on LUCENE-8635:
------------------------------------
I have created a [pull request|https://github.com/apache/lucene-solr/pull/563]
with the proposed changes. Surprisingly, though, I still see some impact on
PKLookup performance (about -7.4% QPS in the run below).
{code:title=wikimedium10m|borderStyle=solid}
Task                       QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                         117.45   (2.2%)         108.72   (2.3%)   -7.4% ( -11% - -3%)
OrHighNotMed                    1094.23   (2.5%)        1057.88   (2.7%)   -3.3% ( -8% - 1%)
OrHighNotLow                    1047.30   (1.7%)        1012.91   (2.5%)   -3.3% ( -7% - 1%)
Fuzzy2                            44.10   (2.3%)          42.71   (2.7%)   -3.2% ( -7% - 1%)
OrNotHighLow                    1022.67   (2.5%)         992.28   (2.4%)   -3.0% ( -7% - 1%)
BrowseDayOfYearTaxoFacets       7907.19   (2.0%)        7677.99   (2.7%)   -2.9% ( -7% - 1%)
OrNotHighMed                     866.37   (1.9%)         843.10   (2.3%)   -2.7% ( -6% - 1%)
LowTerm                         2103.58   (3.5%)        2048.98   (3.6%)   -2.6% ( -9% - 4%)
BrowseMonthTaxoFacets           7883.86   (2.0%)        7692.48   (2.1%)   -2.4% ( -6% - 1%)
Fuzzy1                            64.44   (1.9%)          62.88   (2.3%)   -2.4% ( -6% - 1%)
OrNotHighHigh                    779.27   (2.0%)         761.04   (2.1%)   -2.3% ( -6% - 1%)
Respell                           55.60   (2.6%)          54.34   (2.3%)   -2.3% ( -7% - 2%)
OrHighNotHigh                    877.28   (2.2%)         858.10   (2.5%)   -2.2% ( -6% - 2%)
BrowseMonthSSDVFacets             14.85   (7.9%)          14.57  (10.7%)   -1.9% ( -18% - 18%)
MedTerm                         1984.26   (3.6%)        1947.76   (2.3%)   -1.8% ( -7% - 4%)
AndHighLow                       718.71   (1.5%)         706.06   (1.6%)   -1.8% ( -4% - 1%)
OrHighLow                        523.40   (2.5%)         515.56   (2.4%)   -1.5% ( -6% - 3%)
HighTerm                        1381.10   (2.9%)        1360.80   (2.7%)   -1.5% ( -6% - 4%)
HighTermMonthSort                120.45  (12.3%)         119.00  (16.4%)   -1.2% ( -26% - 31%)
BrowseDayOfYearSSDVFacets         11.55   (9.7%)          11.45  (10.0%)   -0.8% ( -18% - 20%)
AndHighMed                       155.15   (2.6%)         154.25   (2.4%)   -0.6% ( -5% - 4%)
OrHighMed                         88.00   (2.5%)          87.85   (2.7%)   -0.2% ( -5% - 5%)
LowPhrase                         80.53   (1.6%)          80.40   (1.4%)   -0.2% ( -3% - 2%)
AndHighHigh                       41.91   (4.2%)          41.86   (2.9%)   -0.1% ( -6% - 7%)
MedPhrase                         46.29   (1.4%)          46.33   (1.5%)    0.1% ( -2% - 3%)
IntNRQ                           127.54   (0.4%)         127.76   (0.4%)    0.2% ( 0% - 1%)
HighTermDayOfYearSort             48.59   (5.1%)          48.71   (6.0%)    0.2% ( -10% - 12%)
LowSloppyPhrase                   13.04   (4.0%)          13.08   (4.3%)    0.3% ( -7% - 8%)
MedSloppyPhrase                   19.48   (2.3%)          19.54   (2.4%)    0.3% ( -4% - 5%)
OrHighHigh                        23.60   (3.0%)          23.68   (2.9%)    0.3% ( -5% - 6%)
HighPhrase                        20.25   (2.4%)          20.32   (1.8%)    0.3% ( -3% - 4%)
HighSloppyPhrase                   9.29   (3.3%)           9.32   (3.2%)    0.4% ( -5% - 7%)
LowSpanNear                       25.70   (3.8%)          25.89   (3.9%)    0.7% ( -6% - 8%)
MedSpanNear                       30.46   (4.1%)          30.69   (4.3%)    0.7% ( -7% - 9%)
HighSpanNear                      14.41   (4.3%)          14.60   (4.7%)    1.3% ( -7% - 10%)
Wildcard                          70.08  (10.3%)          71.09   (6.1%)    1.4% ( -13% - 19%)
BrowseDateTaxoFacets               2.37   (0.2%)           2.41   (0.3%)    1.5% ( 0% - 1%)
Prefix3                           86.71  (11.4%)          89.04   (6.8%)    2.7% ( -13% - 23%)
{code}
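
For anyone reading the numbers: the Pct diff column appears to be the relative change in mean QPS from baseline to candidate, which can be checked against any row. A minimal sketch of that arithmetic follows; the class and method names are made up for illustration and are not part of the benchmark tooling, and the bracketed range in the last column is not reproduced here.

```java
public class PctDiff {
    // Relative change of the candidate QPS versus the baseline QPS, in percent.
    // Negative values mean the candidate is slower than the baseline.
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // Mean QPS values taken from the PKLookup and Prefix3 rows.
        System.out.println("PKLookup: " + pctDiff(117.45, 108.72));
        System.out.println("Prefix3:  " + pctDiff(86.71, 89.04));
    }
}
```

For the PKLookup row this comes out at roughly -7.4%, in line with the reported value.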
> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch,
> offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory when the index is
> opened. This causes frequent JVM OOM errors if the term data gets large. A
> better approach is to load the FST lazily using mmap, which ensures that only
> the required terms get loaded into memory.
>
> Lucene can expose an API for providing the list of fields whose terms should
> be loaded off-heap. I'm planning to take the following approach:
> # Add a boolean property fstOffHeap to FieldInfo
> # Pass the list of off-heap fields to Lucene during index open (ALL can be a
> special keyword for loading all fields off-heap)
> # Initialize the fstOffHeap property during Lucene index open
> # FieldReader invokes the default FST constructor or the off-heap constructor,
> depending on the fstOffHeap property
>
> I created a patch that loads all fields off-heap and ran some benchmarks
> using es_rally; the results look good.
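
To make the idea in the quoted description concrete, here is a minimal, self-contained sketch of the difference between eager on-heap loading and lazy mmap-backed access, with a boolean flag choosing between them in the spirit of the proposed fstOffHeap property. It uses only JDK classes. `BytesSource`, `loadOnHeap`, `loadOffHeap`, and `open` are hypothetical names for illustration; this is not Lucene's FST or FieldReader code and not the contents of the patch.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffHeapSketch {
    /** Hypothetical stand-in for "something that serves FST bytes". */
    interface BytesSource {
        byte byteAt(int position);
    }

    /** On-heap: the whole file is copied into a byte[] at open time. */
    static BytesSource loadOnHeap(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        return pos -> all[pos];
    }

    /** Off-heap: the file is mapped; the OS pages data in when it is touched. */
    static BytesSource loadOffHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return pos -> mapped.get(pos);
        }
    }

    /** Picks the loader from a per-field boolean, as in the last step of the list. */
    static BytesSource open(Path file, boolean fstOffHeap) throws IOException {
        return fstOffHeap ? loadOffHeap(file) : loadOnHeap(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fst-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});
        // Both sources return the same bytes; only where they live differs.
        System.out.println(open(tmp, false).byteAt(2));
        System.out.println(open(tmp, true).byteAt(2));
    }
}
```

The trade-off this illustrates is that the mapped variant keeps the data out of the Java heap at the cost of going through the buffer (and possibly a page fault) on each access, which is one plausible place to look when a lookup-heavy task such as PKLookup slows down.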