[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764370#comment-16764370 ]
Ankit Jain commented on LUCENE-8635:
------------------------------------
I added print statements while running the benchmarks, and the classification
looks correct:
```
Initializing field offheap start=55 field=Date.taxonomy
Initializing field offheap start=76 field=DayOfYear.sortedset
Initializing field offheap start=97 field=Month.sortedset
Initializing field offheap start=118 field=body
Initializing field onheap start=267 field=date
Initializing field onheap start=289 field=groupend
Initializing field onheap start=311 field=id
Initializing field onheap start=333 field=title
```
However, when I restricted the tests to PKLookup only, using
comp.addTaskPattern('PKLookup') in localrun.py, the results look as expected:
```
wikimedium10k
    Task   QPS baseline   StdDev   QPS candidate   StdDev   Pct diff
PKLookup         163.29   (1.6%)          164.80   (2.1%)   0.9% (-2% - 4%)
```
```
wikimedium10m
    Task   QPS baseline   StdDev   QPS candidate   StdDev   Pct diff
PKLookup         114.29   (1.7%)          114.73   (1.2%)   0.4% (-2% - 3%)
```
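As a quick sanity check, the Pct diff point estimate in both tables is the relative change of mean QPS, (candidate - baseline) / baseline. The small Python check below recomputes it from the numbers in the two PKLookup rows; the range in parentheses appears to be luceneutil's own StdDev-based estimate and is not recomputed here.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of candidate QPS over baseline QPS, in percent."""
    return (candidate_qps - baseline_qps) / baseline_qps * 100.0


# Mean QPS values (baseline, candidate) copied from the PKLookup rows above.
runs = {
    "wikimedium10k": (163.29, 164.80),
    "wikimedium10m": (114.29, 114.73),
}

for name, (base, cand) in runs.items():
    print(f"{name}: {pct_diff(base, cand):.1f}%")
```

Both values round to the 0.9% and 0.4% reported above, well inside the quoted ranges, which is consistent with "no measurable change" for PKLookup.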
I guess we are good then.
> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used the following setup for the es_rally tests:
> a single i3.xlarge node running ES 6.5
> es_rally running on a separate i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch,
> offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory during index open.
> This causes frequent JVM OOM issues when the total size of the terms gets
> large. A better approach would be to load the FST lazily using mmap, which
> ensures that only the required terms get loaded into memory.
>
> Lucene can expose an API for providing the list of fields whose terms should
> be loaded offheap. I'm planning to take the following approach:
> # Add a boolean property fstOffHeap to FieldInfo
> # Pass the list of offheap fields to Lucene during index open (ALL can be a
> special keyword for loading all fields offheap)
> # Initialize the fstOffHeap property during Lucene index open
> # FieldReader invokes the default FST constructor or the OffHeap constructor
> based on the fstOffHeap property
>
> I created a patch that loads all fields offheap, ran some benchmarks using
> es_rally, and the results look good.
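The proposal quoted above can be sketched in Python to show the difference in access pattern. This is a minimal illustration only, assuming a simplified file layout where each field's FST is one contiguous byte range: `FieldInfo` here is a stand-in for Lucene's Java class, and `init_off_heap`, `open_fst`, `OnHeapFST` and `OffHeapFST` are hypothetical names, not Lucene API. The onheap loader copies a field's bytes into memory at open time, while the offheap loader maps the file with `mmap` and only touches the bytes a lookup asks for.

```python
import mmap
import os
import tempfile
from dataclasses import dataclass
from typing import Iterable

ALL_FIELDS = "ALL"  # proposed special keyword: load every field offheap


@dataclass
class FieldInfo:
    """Hypothetical stand-in for Lucene's FieldInfo, reduced to what the steps need."""
    name: str
    start: int   # byte offset of this field's FST in the (simplified) terms file
    length: int  # size of the serialized FST in bytes
    fst_off_heap: bool = False  # step 1: boolean property on the field


def init_off_heap(fields: Iterable[FieldInfo], off_heap_fields: Iterable[str]) -> None:
    """Steps 2-3: set fst_off_heap from the field list passed at index open."""
    requested = set(off_heap_fields)
    load_all = ALL_FIELDS in requested
    for info in fields:
        info.fst_off_heap = load_all or info.name in requested


class OnHeapFST:
    """Eager loading: copies the field's whole FST byte range into process memory."""

    def __init__(self, path: str, start: int, length: int):
        with open(path, "rb") as f:
            f.seek(start)
            self._data = f.read(length)

    def read(self, offset: int, n: int) -> bytes:
        return self._data[offset:offset + n]


class OffHeapFST:
    """Lazy loading: maps the file read-only; the OS pages bytes in when touched."""

    def __init__(self, path: str, start: int, length: int):
        self._file = open(path, "rb")
        # Map the whole file (offset 0): mmap offsets must be a multiple of
        # mmap.ALLOCATIONGRANULARITY, so we slice the field's range on access.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._start, self._length = start, length

    def read(self, offset: int, n: int) -> bytes:
        begin = self._start + offset
        end = min(begin + n, self._start + self._length)  # stay inside this field
        return self._map[begin:end]

    def close(self) -> None:
        self._map.close()
        self._file.close()


def open_fst(path: str, info: FieldInfo):
    """Step 4: choose the loader, as FieldReader would choose an FST constructor."""
    cls = OffHeapFST if info.fst_off_heap else OnHeapFST
    return cls(path, info.start, info.length)


if __name__ == "__main__":
    header, body, ident = b"HDR", bytes(range(200)), b"pk-terms"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(header + body + ident)
        path = tmp.name
    fields = [
        FieldInfo("body", len(header), len(body)),
        FieldInfo("id", len(header) + len(body), len(ident)),
    ]
    init_off_heap(fields, ["body"])  # or [ALL_FIELDS] to map every field
    for info in fields:
        fst = open_fst(path, info)
        print(info.name, type(fst).__name__, fst.read(0, 4))
        if isinstance(fst, OffHeapFST):
            fst.close()  # release the mapping before deleting the file
    os.remove(path)
```

Mapping the whole file and slicing per field keeps the sketch simple; a real implementation would also have to decide which fields (for example, primary-key fields) are worth keeping onheap.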
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)