[
https://issues.apache.org/jira/browse/LUCENE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613789#comment-14613789
]
Markus Heiden commented on LUCENE-6365:
---------------------------------------
@Michael: The removal of @lucene.experimental was a mistake of mine during
merging. Thanks for your rework and your patience.
@Uwe: I measured the CPU runtime in sampling mode, so (almost) no additional
overhead should occur. I did the reuse because there is not just one allocation
of the array, but many. During runtime the array is resized over and over
again, because the initial size was rather small (4 entries). I changed that to
16 so that resizing occurs less frequently. My test case was building a
dictionary of 100000s of words, so even small things accumulate.
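A minimal sketch of the allocation effect described above. GrowableInts and
simulate() are hypothetical names, not Lucene classes, and the doubling growth
policy and the word length of 20 are assumptions made only for illustration.

```java
import java.util.Arrays;

/** Hypothetical growable int buffer for illustration; not a Lucene class. */
final class GrowableInts {
    private int[] values;
    private int length;
    private long allocations; // array allocations, including the initial one

    /** Assumes initialCapacity >= 1, otherwise doubling could never grow the array. */
    GrowableInts(int initialCapacity) {
        values = new int[initialCapacity];
        allocations = 1;
    }

    void append(int value) {
        if (length == values.length) {
            // Assumed growth policy: double the capacity when the array is full.
            values = Arrays.copyOf(values, values.length * 2);
            allocations++;
        }
        values[length++] = value;
    }

    /** Resets the logical length but keeps the backing array for reuse. */
    void clear() {
        length = 0;
    }

    long allocations() {
        return allocations;
    }

    /** Fills a buffer once per word and returns the total number of array allocations. */
    static long simulate(int words, int wordLength, int initialCapacity, boolean reuse) {
        GrowableInts shared = reuse ? new GrowableInts(initialCapacity) : null;
        long total = 0;
        for (int w = 0; w < words; w++) {
            GrowableInts buffer = reuse ? shared : new GrowableInts(initialCapacity);
            buffer.clear();
            for (int i = 0; i < wordLength; i++) {
                buffer.append(i);
            }
            if (!reuse) {
                total += buffer.allocations();
            }
        }
        return reuse ? shared.allocations() : total;
    }
}
```

Under these assumptions, a fresh buffer per word costs 4 allocations per
20-entry word when starting at 4 entries and 2 when starting at 16. A reused
buffer pays that cost only once in total.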
A better solution to that problem would be for automatons to know the length of
their longest word. In that case the above-mentioned array could be sized
correctly from the start. But I don't know whether that length is always known
during construction of automatons.
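For a finite (acyclic) automaton, the length of the longest word equals the
longest path from the initial state to an accept state. The sketch below
computes it on a toy adjacency-list representation. LongestWordSketch and its
array layout are hypothetical and do not use Lucene's Automaton API. It also
does not settle whether the length is available during construction.

```java
import java.util.Arrays;

/** Toy acyclic automaton: transitions[s] lists the target states of state s (labels omitted). */
final class LongestWordSketch {

    /** Returns the length of the longest accepted string, or -1 if nothing is accepted. */
    static int longestAccepted(int[][] transitions, boolean[] accept, int start) {
        int[] memo = new int[transitions.length];
        Arrays.fill(memo, Integer.MIN_VALUE); // MIN_VALUE marks "not computed yet"
        return longest(transitions, accept, start, memo);
    }

    // Memoized depth-first search; terminates only because the automaton is assumed acyclic.
    private static int longest(int[][] transitions, boolean[] accept, int state, int[] memo) {
        if (memo[state] != Integer.MIN_VALUE) {
            return memo[state];
        }
        int best = accept[state] ? 0 : -1; // -1 means no accept state is reachable
        for (int target : transitions[state]) {
            int sub = longest(transitions, accept, target, memo);
            if (sub >= 0) {
                best = Math.max(best, sub + 1);
            }
        }
        memo[state] = best;
        return best;
    }
}
```

With such a value at hand, the array could be allocated once at that length
instead of starting at a fixed small size.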
> Optimized iteration of finite strings
> -------------------------------------
>
> Key: LUCENE-6365
> URL: https://issues.apache.org/jira/browse/LUCENE-6365
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Affects Versions: 5.0
> Reporter: Markus Heiden
> Priority: Minor
> Labels: patch, performance
> Attachments: FiniteStrings_noreuse.patch, FiniteStrings_reuse.patch,
> LUCENE-6365.patch
>
>
> Replaced Operations.getFiniteStrings() by an optimized FiniteStringIterator.
> Benefits:
> Avoid huge hash set of finite strings.
> Avoid massive object/array creation during processing.
> "Downside":
> Iteration order changed, so when iterating with a limit, the result may
> differ slightly. Old: emit current node, if accept / recurse. New: recurse /
> emit current node, if accept.
> The old method Operations.getFiniteStrings() still exists, because it eases
> the tests. It is now implemented by use of the new FiniteStringIterator.
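The change in iteration order from the description can be illustrated on a toy
trie. This is a recursive sketch under stated assumptions, not the
FiniteStringIterator implementation; EmitOrderSketch, Node, build() and
enumerate() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy trie used to compare "emit, then recurse" with "recurse, then emit". */
final class EmitOrderSketch {

    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean accept;
    }

    static Node build(String... words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.accept = true;
        }
        return root;
    }

    /** Collects at most limit strings; emitFirst selects the old order, otherwise the new one. */
    static List<String> enumerate(Node root, boolean emitFirst, int limit) {
        List<String> out = new ArrayList<>();
        walk(root, new StringBuilder(), emitFirst, limit, out);
        return out;
    }

    private static void walk(Node node, StringBuilder prefix, boolean emitFirst,
                             int limit, List<String> out) {
        // Old order: emit the current node if it accepts, then recurse.
        if (emitFirst && node.accept && out.size() < limit) {
            out.add(prefix.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (out.size() >= limit) {
                return; // limit reached, stop early
            }
            prefix.append(e.getKey());
            walk(e.getValue(), prefix, emitFirst, limit, out);
            prefix.setLength(prefix.length() - 1);
        }
        // New order: recurse first, then emit the current node if it accepts.
        if (!emitFirst && node.accept && out.size() < limit) {
            out.add(prefix.toString());
        }
    }
}
```

For the words "a", "ab" and "abc" with a limit of 2, the old order yields
[a, ab] while the new order yields [abc, ab]. Without a limit both orders
return the same set of strings.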