[
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264309#comment-15264309
]
Jeff Wartes commented on LUCENE-7258:
-------------------------------------
I'm not sure I understand how the dangers of large FBS sizes would be any
different with a pooling mechanism than they are right now. If a query needs
several of them, then it needs several of them, whether they're freshly
allocated or not. The only real difference I see is that the memory would live
in the tenured space instead of thrashing the eden space every time.
I don't think it'd need to be per-thread. I don't mind points of
synchronization if they're tight and well understood. Allocation rate by count
is generally lower here. One thought:
https://gist.github.com/randomstatistic/87caefdea8435d6af4ad13a3f92d2698
To anticipate some objections, there are likely lockless data structures you
could use, and yes, you might prefer to control size in terms of memory instead
of count. I can think of a dozen improvements per minute I spend looking at
this. But you get the idea. Anyone anywhere who knows for *sure* they're done
with an FBS can offer it up for reuse, and anyone can potentially get some
reuse by just changing their "new" to "request".
If everybody does this, you end up with a fairly steady pool of FBS instances
large enough for most uses. If only some places use it, there's no chance of an
unbounded leak, you might get some gain, and worst-case you haven't lost much.
If nobody uses it, you've lost nothing.
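The gist above isn't reproduced here, but a minimal, self-contained sketch of
the "new" vs. "request" idea might look like the following. All names here
(BitSetPool, request, release) are hypothetical, and it pools raw long[]
backing arrays rather than actual Lucene FixedBitSet objects; the bound is by
count for simplicity, though as noted you might prefer bounding by memory.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical sketch of the pooling idea: call sites that know they are
// done with a bit set "release" it, and allocation sites swap "new" for
// "request". If nothing is ever released, request() degenerates to a
// plain allocation, matching the "you've lost nothing" property.
public class BitSetPool {
    // One tight, well-understood synchronization point: a lock-free deque.
    private static final ConcurrentLinkedDeque<long[]> POOL = new ConcurrentLinkedDeque<>();
    private static final int MAX_POOLED = 8; // bound by count; bounding by bytes would also work

    /** Return a zeroed long[] with capacity for at least numBits bits. */
    public static long[] request(int numBits) {
        int numWords = (numBits + 63) >>> 6; // 64 bits per word, rounded up
        for (long[] candidate : POOL) {
            if (candidate.length >= numWords && POOL.remove(candidate)) {
                java.util.Arrays.fill(candidate, 0L); // reused sets must start cleared
                return candidate;
            }
        }
        return new long[numWords]; // no suitable pooled instance; allocate fresh
    }

    /** Offer a backing array up for reuse. Best-effort; skipping is always safe. */
    public static void release(long[] bits) {
        if (POOL.size() < MAX_POOLED) {
            POOL.addFirst(bits);
        }
    }
}
```

The clearing on reuse is the main correctness hazard of any such pool: a
released set still carries its old bits, so a reused instance has to be zeroed
before it's handed back out.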
Last I checked, something like a full 50% of (my) allocations by size were
FixedBitSets despite a low allocation rate by count, or I wouldn't be harping
on the subject. As a matter of principle, I'd gladly pay heap to reduce GC. The
fastest search algorithm in the world doesn't help me if I'm stuck waiting for
the collector to finish all the time.
> Tune DocIdSetBuilder allocation rate
> ------------------------------------
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Reporter: Jeff Wartes
> Attachments:
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies
> to see if I could tune things more.
> See here: http://i.imgur.com/7sXLAYv.jpg
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is
> terrible from an allocation standpoint if you're doing a lot of expansions,
> and it's especially terrible when used to build a short-lived data structure
> like this one.
> By the time it switches to the FBS, it has allocated around twice as much
> memory for the buffer as it would have needed for just the FBS.
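To put a rough number on the growth-factor complaint in the quoted description:
each expansion of a geometrically growing buffer allocates a whole new array,
so the cumulative bytes churned through dwarf the final buffer size. The model
below is illustrative only, with made-up starting size and target; it is not
ArrayUtil.oversize's actual code, and the real DocIdSetBuilder buffers doc IDs
and cuts over to a FixedBitSet at a threshold rather than growing forever.

```java
// Rough model of repeated ~1/8th growth: sum every intermediate buffer
// size until capacity reaches the target, then compare the cumulative
// allocation against the final size alone. For a growth ratio r, the
// geometric sum works out to roughly r/(r-1) = 9x the final buffer.
public class GrowthCost {
    public static long cumulativeAllocated(long start, long target) {
        long size = start;
        long total = 0;
        while (size < target) {
            total += size;                    // each expansion allocates a whole new buffer
            size += Math.max(1, size >>> 3);  // grow by ~1/8th per step
        }
        return total + size;                  // plus the final buffer itself
    }

    public static void main(String[] args) {
        long target = 100_000_000L / 64;      // words for a 100M-bit set (illustrative)
        long total = cumulativeAllocated(1 << 10, target);
        System.out.println("allocated ~" + total + " words cumulatively vs "
            + target + " needed");
    }
}
```

With a doubling strategy (r = 2) the same sum is only about 2x the final size,
which is why a small growth factor is so much worse for short-lived buffers.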
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]