[
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264309#comment-15264309
]
Jeff Wartes commented on LUCENE-7258:
-------------------------------------
I'm not sure I understand how the dangers of large FBS sizes would be any
different with a pooling mechanism than they are right now. If a query needs
several of them, then it needs several of them, whether they're freshly
allocated or not. The only real difference I see is that the memory would live
in the tenured space instead of thrashing the eden space every time.
I don't think it'd need to be per-thread. I don't mind points of
synchronization if they're tight and well understood. Allocation rate by count
is generally lower here. One thought:
https://gist.github.com/randomstatistic/87caefdea8435d6af4ad13a3f92d2698
To anticipate some objections, there are likely lockless data structures you
could use, and yes, you might prefer to control size in terms of memory instead
of count. I can think of a dozen improvements per minute I spend looking at
this. But you get the idea. Anyone anywhere who knows for *sure* they're done
with an FBS can offer it up for reuse, and anyone can potentially get some
reuse by just changing their "new" to "request".
If everybody does this, you end up with a fairly steady pool of FBS instances
large enough for most uses. If only some places use it, there's no chance of an
unbounded leak, you might get some gain, and worst-case you haven't lost much.
If nobody uses it, you've lost nothing.
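The gist above isn't reproduced here, but a minimal, self-contained sketch of
the "new" vs. "request" idea might look like the following. All names here
(BitSetPool, request, release) are hypothetical, and it pools raw long[]
backing arrays rather than actual Lucene FixedBitSet objects; the bound is by
count for simplicity, though as noted you might prefer bounding by memory.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical sketch of the pooling idea: call sites that know they are
// done with a bit set "release" it, and allocation sites swap "new" for
// "request". If nothing is ever released, request() degenerates to a
// plain allocation, matching the "you've lost nothing" property.
public class BitSetPool {
    // One tight, well-understood synchronization point: a lock-free deque.
    private static final ConcurrentLinkedDeque<long[]> POOL = new ConcurrentLinkedDeque<>();
    private static final int MAX_POOLED = 8; // bound by count; bounding by bytes would also work

    /** Return a zeroed long[] with capacity for at least numBits bits. */
    public static long[] request(int numBits) {
        int numWords = (numBits + 63) >>> 6; // 64 bits per word, rounded up
        for (long[] candidate : POOL) {
            if (candidate.length >= numWords && POOL.remove(candidate)) {
                java.util.Arrays.fill(candidate, 0L); // reused sets must start cleared
                return candidate;
            }
        }
        return new long[numWords]; // no suitable pooled instance; allocate fresh
    }

    /** Offer a backing array up for reuse. Best-effort; skipping is always safe. */
    public static void release(long[] bits) {
        if (POOL.size() < MAX_POOLED) {
            POOL.addFirst(bits);
        }
    }
}
```

The clearing on reuse is the main correctness hazard of any such pool: a
released set still carries its old bits, so a reused instance has to be zeroed
before it's handed back out.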
Last I checked, something like a full 50% of (my) allocations by size were
FixedBitSets despite a low allocation rate by count, or I wouldn't be harping
on the subject. As a matter of principle, I'd gladly pay heap to reduce GC. The
fastest search algorithm in the world doesn't help me if I'm stuck waiting for
the collector to finish all the time.
> Tune DocIdSetBuilder allocation rate
> ------------------------------------
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Reporter: Jeff Wartes
> Attachments:
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies
> to see if I could tune things more.
> See here: http://i.imgur.com/7sXLAYv.jpg
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is
> terrible from an allocation standpoint if you're doing a lot of expansions,
> and it's especially terrible when used to build a short-lived data structure
> like this one.
> By the time it switches to the FBS, it has allocated around twice as much
> memory for the buffer as it would have needed for just the FBS.
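To put a rough number on the growth-factor complaint in the quoted description:
each expansion of a geometrically growing buffer allocates a whole new array,
so the cumulative bytes churned through dwarf the final buffer size. The model
below is illustrative only, with made-up starting size and target; it is not
ArrayUtil.oversize's actual code, and the real DocIdSetBuilder buffers doc IDs
and cuts over to a FixedBitSet at a threshold rather than growing forever.

```java
// Rough model of repeated ~1/8th growth: sum every intermediate buffer
// size until capacity reaches the target, then compare the cumulative
// allocation against the final size alone. For a growth ratio r, the
// geometric sum works out to roughly r/(r-1) = 9x the final buffer.
public class GrowthCost {
    public static long cumulativeAllocated(long start, long target) {
        long size = start;
        long total = 0;
        while (size < target) {
            total += size;                    // each expansion allocates a whole new buffer
            size += Math.max(1, size >>> 3);  // grow by ~1/8th per step
        }
        return total + size;                  // plus the final buffer itself
    }

    public static void main(String[] args) {
        long target = 100_000_000L / 64;      // words for a 100M-bit set (illustrative)
        long total = cumulativeAllocated(1 << 10, target);
        System.out.println("allocated ~" + total + " words cumulatively vs "
            + target + " needed");
    }
}
```

With a doubling strategy (r = 2) the same sum is only about 2x the final size,
which is why a small growth factor is so much worse for short-lived buffers.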
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]