[
https://issues.apache.org/jira/browse/LUCENE-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278455#comment-14278455
]
Robert Muir commented on LUCENE-6183:
-------------------------------------
I ran a benchmark indexing log data (stored fields only, no actual
"indexing").
Stored fields merging in this case is about 5x faster with BEST_SPEED and more
than 10x faster with BEST_COMPRESSION. Any differences in index size are trivial.
I will also run it with the deflate-6 in the patch, but I think it will be fine.
Benchmark settings and results:
{noformat}
iwc.setMergeScheduler(new SerialMergeScheduler());
iwc.setMaxBufferedDocs(10001);
iwc.setMergePolicy(new LogDocMergePolicy());

BEST_SPEED (lz4)

Trunk:
timeIndexing=578014
timeForceMerging=183421
SM 0 [2015-01-15 04:05:30.380; main]: 114732 msec to merge stored fields [6881288 docs]
-rw-rw-r-- 1 rmuir rmuir 4690955837 Jan 15 04:05 _7j0.fdt
-rw-rw-r-- 1 rmuir rmuir 2559414 Jan 15 04:05 _7j0.fdx

Patch:
timeIndexing=389148
timeForceMerging=37476
SM 0 [2015-01-15 03:49:20.538; main]: 21690 msec to merge stored fields [6881288 docs]
-rw-rw-r-- 1 rmuir rmuir 4691200952 Jan 15 03:49 _6xq.fdt
-rw-rw-r-- 1 rmuir rmuir 2557794 Jan 15 03:49 _6xq.fdx

BEST_COMPRESSION (deflate-3)

Trunk:
timeIndexing=586511
timeForceMerging=204363
SM 0 [2015-01-15 03:33:11.906; main]: 130097 msec to merge stored fields [6881288 docs]
-rw-rw-r-- 1 rmuir rmuir 2673871545 Jan 15 03:33 _5r6.fdt
-rw-rw-r-- 1 rmuir rmuir 731953 Jan 15 03:33 _5r6.fdx

Patch:
timeIndexing=364453
timeForceMerging=19519
SM 0 [2015-01-15 03:41:05.477; main]: 11641 msec to merge stored fields [6881288 docs]
-rw-rw-r-- 1 rmuir rmuir 2674305752 Jan 15 03:41 _6cg.fdt
-rw-rw-r-- 1 rmuir rmuir 735374 Jan 15 03:41 _6cg.fdx
{noformat}
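To make the ratios explicit, here is a small Java sketch that recomputes them
from the numbers above. The class and method names are made up for
illustration; only the merge timings and .fdt sizes come from the benchmark
output.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: recomputes ratios from the log above.
final class MergeSpeedup {
    /** Trunk time divided by patch time; values above 1 mean the patch is faster. */
    static double speedup(long trunkMillis, long patchMillis) {
        return (double) trunkMillis / patchMillis;
    }

    /** Size change of the patch output relative to trunk, in percent. */
    static double sizeDeltaPercent(long trunkBytes, long patchBytes) {
        return 100.0 * (patchBytes - trunkBytes) / trunkBytes;
    }

    public static void main(String[] args) {
        // "msec to merge stored fields" values from the SM log lines.
        System.out.printf(Locale.ROOT, "BEST_SPEED stored fields merge: %.2fx%n",
                speedup(114732, 21690));
        System.out.printf(Locale.ROOT, "BEST_COMPRESSION stored fields merge: %.2fx%n",
                speedup(130097, 11641));
        // .fdt file sizes in bytes, trunk versus patch.
        System.out.printf(Locale.ROOT, "BEST_SPEED .fdt size change: %.4f%%%n",
                sizeDeltaPercent(4690955837L, 4691200952L));
        System.out.printf(Locale.ROOT, "BEST_COMPRESSION .fdt size change: %.4f%%%n",
                sizeDeltaPercent(2673871545L, 2674305752L));
    }
}
```

Both size changes come out well under a tenth of a percent, which is what
"trivial" refers to here.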
> Avoid re-compression on stored fields merge
> -------------------------------------------
>
> Key: LUCENE-6183
> URL: https://issues.apache.org/jira/browse/LUCENE-6183
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Fix For: Trunk, 5.1
>
> Attachments: LUCENE-6183.patch
>
>
> We removed this optimization before; it didn't really work right because it
> required things to be "aligned".
> But I think we can do it more simply and safely. This recompression is a big
> CPU hog in merge, and limits our options compression-wise (especially ones
> like LZ4-HC that are only slower at write-time).
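
As a rough illustration of why recompression is avoidable work, the toy Java
sketch below merges "segments" made of independently compressed chunks in two
ways: decompressing and recompressing every chunk, or copying the compressed
bytes unchanged. It uses java.util.zip rather than any Lucene code, and every
class and method name in it is hypothetical. It is not the patch, and it
ignores everything a real merge has to handle (deleted documents, chunk
boundaries, the .fdx index).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only: a "segment" is a list of independently deflated chunks.
final class ChunkMergeSketch {
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid chunk");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid chunk", e);
        } finally {
            inflater.end();
        }
    }

    /** Merge by decompressing and recompressing every chunk (CPU-heavy). */
    static List<byte[]> mergeRecompress(List<List<byte[]>> segments) {
        List<byte[]> merged = new ArrayList<>();
        for (List<byte[]> segment : segments) {
            for (byte[] chunk : segment) {
                merged.add(compress(decompress(chunk)));
            }
        }
        return merged;
    }

    /** Merge by copying the already-compressed bytes of every chunk. */
    static List<byte[]> mergeBulkCopy(List<List<byte[]>> segments) {
        List<byte[]> merged = new ArrayList<>();
        for (List<byte[]> segment : segments) {
            for (byte[] chunk : segment) {
                merged.add(chunk.clone());
            }
        }
        return merged;
    }

    /** Decompresses all chunks and concatenates them, to compare merge results. */
    static byte[] readAll(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            byte[] raw = decompress(chunk);
            out.write(raw, 0, raw.length);
        }
        return out.toByteArray();
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<byte[]> segA = Arrays.asList(compress(bytes("GET /index.html 200\n")),
                                          compress(bytes("GET /a.css 200\n")));
        List<byte[]> segB = Arrays.asList(compress(bytes("POST /login 302\n")));
        List<List<byte[]>> segments = Arrays.asList(segA, segB);
        boolean same = Arrays.equals(readAll(mergeRecompress(segments)),
                                     readAll(mergeBulkCopy(segments)));
        System.out.println("merged contents identical: " + same);
    }
}
```

Both merges yield the same documents once decompressed; the copying variant
never invokes the compressor, which is also why a codec that is slow only at
write time would stop being a cost during merges.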