Heya, sorry for the delay (and for missing the sync, I'll try to get better about showing up).

I've fixed a handful of coordinator bugs since 0.12.0 that were not backported to 0.12.1. Some of these issues go far back, some date to when segment assignment priority for different tiers of historicals was introduced, and some are just oddities in the behavior of the balancer that I'm unsure when were introduced. This is the complete list of fixes currently in 0.12.2, afaik, each with a short description (see the PRs and associated issues for more details):

https://github.com/apache/incubator-druid/pull/5528
fixed an issue where movement did not drop the segment from the server it was being moved from (this one goes waaaay back, to batch segment announcements)

https://github.com/apache/incubator-druid/pull/5529
changed the behavior of drop to use the balancer to choose where to drop segments from, based on behavior observed as a result of the issue in 5528

https://github.com/apache/incubator-druid/pull/5532
fixes an issue where primary assignment during load rule processing would assign an unavailable segment to every server with capacity until at least 1 historical had the segment (and drop it from all the others if they all loaded it at the same time), choking load queues and keeping them from doing useful things

https://github.com/apache/incubator-druid/pull/5555
fixed a way for the http based coordinator to get stuck loading or dropping segments; a companion PR fixed a lambda that wasn't friendly to older jvm versions:
https://github.com/apache/incubator-druid/pull/5591

https://github.com/apache/incubator-druid/pull/5888
makes balancing honor a load rule max load queue depth setting to help prevent movement from starving loading

https://github.com/apache/incubator-druid/pull/5928
doesn't really fix anything, just does an early return to avoid doing pointless work

Additionally, there are a couple of pairs of PRs that are not currently in 0.12.2: https://github.com/druid-io/druid/pull/5927 and https://github.com/apache/incubator-druid/pull/5929, and their respective fixes, https://github.com/apache/incubator-druid/pull/5987 and https://github.com/apache/incubator-druid/pull/5988, which have yet to be merged but have been performing well on our test cluster.
One of them makes balancing behave more consistently with expectations by always trying to move maxSegmentsToMove segments and by tracking more accurately what the balancer is doing; the other just adds better logging (without much extra log volume), which came out of the frustrations I had chasing down all these other issues. Both were slated for 0.12.2 but were pulled out because of the issues (which the open PRs fix, afaict). I would be in favor of sliding them in, pending review of the fixes, but understand if they don't make the cut, since they maybe fall a bit more on the cosmetic side of things.

I'm pretty happy with the state of things on our test cluster right now. Even without these 4 patches, things should still be operating more correctly than they were before; the differences are that balancing will move somewhere between 0 and the max, and that less useful logging will make future issues (which I have no doubt still lurk) harder to diagnose.

Cheers,
Clint

On Tue, Jul 10, 2018 at 10:30 AM, Charles Allen <cral...@apache.org> wrote:

> Brought this up in the dev sync:
>
> I saw a lot of PRs and fixes for Coordinator segment balancing related to
> some regressions that happened in 0.12.x. Is anyone able to give a rundown
> of the state of coordinator segment management for the 0.12.2 RC?
>
> On Tue, Jul 10, 2018 at 10:26 AM Nishant Bangarwa
> <nbanga...@hortonworks.com> wrote:
>
> > +1
> >
> > --
> > Nishant Bangarwa
> >
> > Hortonworks
> >
> > On 7/10/18, 3:57 AM, "Jihoon Son" <jihoon...@apache.org> wrote:
> >
> > Related thread:
> >
> > https://lists.apache.org/thread.html/76755aecfddb1210fcc3f08b1d4631784a8a5eede64d22718c271841@%3Cdev.druid.apache.org%3E
> >
> > Jihoon
> >
> > On Mon, Jul 9, 2018 at 3:25 PM Jihoon Son <jihoon...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > We have no open issues and PRs for 0.12.2
> > > (https://github.com/apache/incubator-druid/milestone/27). The 0.12.2
> > > branch is already available and all PRs for 0.12.2 have merged into
> > > that branch.
> > >
> > > Let's vote on releasing RC1. Here is my +1.
> > >
> > > This is a non-ASF release.
> > >
> > > Best,
> > > Jihoon