[
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265829#comment-15265829
]
Jeff Wartes commented on LUCENE-7258:
-------------------------------------
There are actually three threads going on in this ticket right now: there’s the
“what threshold and expansion to use for geospatial” question I’d originally
intended and provided a patch for, there’s the “what expansion for
DocIdSetBuilder is generically optimal” question, and there’s the “FBS is 50% of
my allocation rate, can we pool” conversation.
I think the last is a worthy conversation, and I don’t have a better place for
it, so I’m going to continue responding to the comments along those lines (with
apologies for the book I’m writing here), but I wanted to point out the
divergence.
So, I certainly understand a knee-jerk reaction against using object pools of
any kind. Yes, this IS what the JVM is for. It’s easier and simpler and lower
maintenance to just use what’s provided. But I could also argue that
Arrays.sort has all those same positive attributes, and that hasn’t stopped
several hand-written sort algorithms from getting into this codebase. The
question is actually whether the easy and simple thing is good enough, or
whether the harder thing has a sufficient offsetting benefit. Everyone on this
thread is a highly experienced programmer; we all know this.
In this case, that means the question is actually whether the allocation rate
is “good enough” or whether there’s a sufficiently offsetting opportunity for
improvement, and arguments should ideally come from that analysis.
I can empirically state that for my large Solr index, GC pause is the single
biggest detriment to my 90+th percentile query latency. Put another way,
Lucene is fantastically fast, at least when the JVM isn’t otherwise occupied.
Because of shard fan-out, a per-shard p90 latency very quickly becomes a p50
latency for queries overall, since the overall query has to wait for its
slowest shard. (Even with mitigations like SOLR-4449.)
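To put a rough number on that (back-of-the-envelope only; it assumes shard
latencies are independent, which is optimistic):

    // If each of N shards independently has a 10% chance of a slow (>p90)
    // response, and the query waits for its slowest shard, the chance the
    // overall query is slow is 1 - 0.9^N.
    public class FanOut {
      public static void main(String[] args) {
        for (int shards : new int[] {1, 4, 8, 16, 32}) {
          double pSlow = 1.0 - Math.pow(0.9, shards);
          System.out.printf("%2d shards -> %3.0f%% of queries hit a >p90 shard%n",
              shards, pSlow * 100);
        }
      }
    }

At eight shards, over half of all queries are already waiting on at least one
shard in its slow tail.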
I don’t think there’s anything particularly unique to my use-case in anything I
just said, except possibly the word “large”.
As such, I consider this an opportunity for improvement, so I’ve suggested a
mitigation strategy. It clearly has some costs. I’d be delighted to entertain
any alternative strategies.
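Since “pool” is doing a lot of work in that sentence, here’s a minimal sketch
of the shape of the idea. This is NOT the attached patch; it assumes a
thread-confined search path, and it glosses over the length mismatch when a
cached set is bigger than requested:

    import org.apache.lucene.util.FixedBitSet;

    // Minimal sketch: reuse one FixedBitSet per thread, clearing it between
    // uses, so the large long[] backing array stops churning the young gen.
    public final class BitSetPool {
      private static final ThreadLocal<FixedBitSet> CACHE = new ThreadLocal<>();

      public static FixedBitSet acquire(int numBits) {
        FixedBitSet cached = CACHE.get();
        if (cached == null || cached.length() < numBits) {
          cached = new FixedBitSet(numBits); // grow (or create) the cached set
          CACHE.set(cached);
        } else {
          cached.clear(0, numBits);          // erase state from the previous use
        }
        return cached;
      }
    }

A real version needs answers for the costs I mentioned: bounding the cached
size, the length() mismatch, and anything that holds a reference past the
query.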
Actually, [~dsmiley] did bring up one alternative suggestion for improvement,
so let’s talk about -Xmn:
First, let’s assume that Lucene’s policy on G1 hasn’t changed, and we’re still
talking about ParNew/CMS. Second, with the exception of a few things like
cache, most of the allocations in a Solr/Lucene index are very short-lived. So
it follows that given a young generation of sufficient size, the tenured
generation would actually see very little activity.
The major disadvantage to just using a huge young generation, then, is that
there aren’t any concurrent young-generation collectors. The bigger it is, the
less frequently you need to collect, but the longer the stop-the-world GC
pause when you do.
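In flag terms (sizes purely illustrative, and assuming the ParNew/CMS setup
above), that direction looks something like:

    # Big, explicitly sized young generation: fewer young collections,
    # but each stop-the-world ParNew pause is longer.
    java -Xms31g -Xmx31g -Xmn16g \
         -XX:+UseParNewGC -XX:+UseConcMarkSweepGC ...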
On the other end of the scale, a very small young space means shorter but far
more frequent pauses. Since almost all garbage is short-lived, maybe now
you’re doing young collections so often that short-lived objects survive
enough of them to get promoted, and you’ve got the tenured collector doing a
bunch of the work of cleaning up short-lived objects too. (This can actually
be a good thing, since the CMS collector is mostly concurrent.)
There’s some theoretical size that optimizes frequency vs. pause for average
latency. Perhaps it even involves deliberately allowing some premature
overflow into tenured, simply because tenured can be collected concurrently.
This kind of thing is extremely delicate to tune for, though, especially since
query rate (and query type distribution) can fluctuate. It’s easy to get it
wrong, such that a sudden burst of large allocations slams past the rate CMS
was expecting and triggers a full-heap stop-the-world pause.
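For concreteness, that opposite direction might look something like this
(again, illustrative numbers; the point is the shape of the tradeoff):

    # Small young gen, fast promotion, and CMS starting early so the
    # concurrent collector absorbs the overflow of short-lived objects.
    java -Xms31g -Xmx31g -Xmn1g \
         -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
         -XX:MaxTenuringThreshold=1 \
         -XX:CMSInitiatingOccupancyFraction=60 \
         -XX:+UseCMSInitiatingOccupancyOnly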
I’m focusing on FBS here because: 1. _Fifty Percent_. 2. These are generally
larger objects, so mitigating those allocations seemed like a good way to
mitigate unexpected changes in allocation rate and allow more stable tuning.
There’s probably also at least one Jira issue’s worth of work in looking at
allocation rate by object count (vs. size), since I suspect the single biggest
factor in collector pause is the object count. Certainly I can point to
objects that get allocated (by count) with orders of magnitude greater
frequency than the next highest count. But since I don’t have a good enough
understanding of the use cases, let alone any suggestions yet, I’ve left that
for another time.
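Finally, for anyone who wants the rough shape of the attached plot without
re-running a profiler, the arithmetic is just a geometric series (illustrative
sizes; DocIdSetBuilder’s real thresholds and long-packing differ):

    // Total bytes allocated while repeatedly growing an int[] buffer by a
    // given factor: ArrayUtil.oversize-style (~1/8th growth) vs. larger steps.
    public class GrowthCost {
      public static void main(String[] args) {
        for (double factor : new double[] {1.125, 1.5, 2.0}) {
          long totalBytes = 0;
          long size = 1 << 10;                // start at 1K ints
          while (size < (1L << 27)) {         // grow toward ~128M entries
            totalBytes += size * 4;           // bytes for this incarnation
            size = (long) (size * factor) + 1;
          }
          System.out.printf("factor %.3f -> %,d MB allocated along the way%n",
              factor, totalBytes >> 20);
        }
      }
    }

The 1/8th factor ends up allocating roughly 9x the final buffer size in total,
versus about 2x for plain doubling.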
> Tune DocIdSetBuilder allocation rate
> ------------------------------------
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Reporter: Jeff Wartes
> Attachments:
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies
> to see if I could tune things more.
> See here: http://i.imgur.com/7sXLAYv.jpg
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is
> terrible from an allocation standpoint if you're doing a lot of expansions,
> and is especially terrible when used to build a short-lived data structure
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory
> for the buffer as it would have needed for just the FBS.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]