[
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886076#comment-16886076
]
ASF subversion and git services commented on LUCENE-8920:
---------------------------------------------------------
Commit d8b510bead86d4c6ec59063519894d207ee99d5e in lucene-solr's branch
refs/heads/branch_8_2 from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d8b510b ]
LUCENE-8920: disable FST direct-addressing pending size reduction
revert to FST version 6
removed CHANGES entry
> Reduce size of FSTs due to use of direct-addressing encoding
> -------------------------------------------------------------
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Mike Sokolov
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization.
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance,
> the size increase we're seeing while building (or perhaps do a preliminary
> pass before building) in order to decide whether to apply the encoding.
> bq. We could also make the encoding a bit more efficient. For instance, I
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte
> range), which makes gaps very costly. Associating each label with a dense id
> and having an intermediate lookup, i.e. lookup label -> id and then id -> arc
> offset instead of doing label -> arc directly, could save a lot of space in some cases?
> Also it seems that we are repeating the label in the arc metadata when
> array-with-gaps is used, even though it shouldn't be necessary since the
> label is implicit from the address?
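
To make the trade-offs in the two quoted suggestions concrete, here is a
rough back-of-the-envelope sketch in Java. It is not Lucene's FST code: the
class name, method names, and the size threshold are all hypothetical, and it
assumes every arc occupies a fixed number of bytes. It compares a packed arc
array, a direct-addressed array with gaps, and the proposed label -> id ->
arc two-level layout, and shows one possible way to decide per node whether
direct addressing is worth its cost.

```java
// Illustrative size model only; names and thresholds are assumptions,
// not part of the Lucene API.
final class ArcLayoutSketch {

    // Packed layout: one fixed-size slot per arc that actually exists.
    static long packedBytes(int numArcs, int bytesPerArc) {
        return (long) numArcs * bytesPerArc;
    }

    // Number of slots needed to cover every label from the smallest to the
    // largest, inclusive. The labels must be sorted in ascending order.
    static long labelRange(int[] sortedLabels) {
        return (long) sortedLabels[sortedLabels.length - 1] - sortedLabels[0] + 1;
    }

    // Direct addressing: one full arc slot per label in the range, so every
    // missing label (a gap) still costs bytesPerArc.
    static long directBytes(int[] sortedLabels, int bytesPerArc) {
        return labelRange(sortedLabels) * bytesPerArc;
    }

    // Two-level layout: a small id entry per label in the range, followed by
    // a packed arc array. A gap now costs only bytesPerId.
    static long twoLevelBytes(int[] sortedLabels, int bytesPerArc, int bytesPerId) {
        return labelRange(sortedLabels) * bytesPerId
            + packedBytes(sortedLabels.length, bytesPerArc);
    }

    // Hypothetical per-node decision: accept direct addressing only when it
    // is at most maxExpansion times larger than the packed layout.
    static boolean useDirectAddressing(int[] sortedLabels, int bytesPerArc,
                                       double maxExpansion) {
        long packed = packedBytes(sortedLabels.length, bytesPerArc);
        return directBytes(sortedLabels, bytesPerArc) <= maxExpansion * packed;
    }
}
```

For example, with the five labels 'a', 'e', 'i', 'o', 'u' (97, 101, 105, 111,
117) and an assumed 16 bytes per arc, the packed layout takes 80 bytes, direct
addressing takes 21 * 16 = 336 bytes, and the two-level layout with 1-byte ids
takes 21 + 80 = 101 bytes. A builder that tracked these totals per FST could
fall back to the packed layout whenever the expansion exceeds its budget.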