I have no idea about the implementation, but the concept is certainly one
we have wanted for quite a while across the several clusters I
manage. I'm excited to see this capability added to the system.

Will

On Mon, Jan 20, 2020, 1:55 PM Lucas Capistrant <capistrant.lu...@gmail.com>
wrote:

> Hi all,
>
> Looking for some feedback on the idea of creating a new dynamic config for
> the coordinator that allows cluster admins to pause coordination by setting
> the new config to true (default is false). By "pause coordination" I mean
> skipping all coordinator helpers every time the coordinator runs. Some
> more details are included below as well as a link to a PR with the initial
> implementation that I came up with. Any feedback helps; we want to make
> sure we are not overlooking any negative side effects!
>
> My organization is preparing to undergo some heavy maintenance on our HDFS
> cluster that backs our production Druid clusters. This involves HDFS
> downtime. Our plan was to stop the coordinators and overlords and rolling
> restart the Historical nodes during the outage to lay down the new site
> files and retain a static picture of the world for client queries to run
> against. During our tests in stage we realized that the Historicals check in
> with the coordinator when starting up. Therefore, we wanted to find a way
> to leave the coordinator up, but not actually coordinate segments on the
> cluster, try to run kill tasks, etc. (because HDFS is offline and we don't
> want to be talking with it until we know it is back up and healthy). Thus,
> Pull 9224 <https://github.com/apache/druid/pull/9224/files> was born. This
> seemed like an easy and effective way to halt coordination and keep the API
> up.
>
> We've done some small scale testing in a dev environment and I am currently
> looking into writing some type of integration test that flexes this code
> path. Despite the change's perceived simplicity, it would be nice to have
> something there.
>
> Thanks!
> Lucas Capistrant
>
