Much appreciated.
On Wed, Oct 1, 2014 at 2:11 PM, Bill Farner wrote:
Ok, when you have bandwidth to upgrade again feel free to let us know if
you would like somebody standing by in IRC to assist.
-=Bill
On Wed, Oct 1, 2014 at 11:04 AM, Isaac Councill wrote:
Thanks! Comment dropped on AURORA-634.
As for the error I encountered, I saw "Storage is not READY" exceptions on
all scheduler instances, and no leader was elected. Nothing other than that
jumped out as unusual in the logs - no ZK_* warnings/errors etc.
Aurora came up before zookeeper, but auror
Firstly, please chime in on AURORA-634 to nudge us to formally document
this.
There's a wealth of instrumentation exposed at /vars on the scheduler. To
rattle off a few that are a good fit for monitoring:
task_store_LOST
If this value is increasing at a high rate, it's a sign of trouble. Note:
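
For anyone wanting to alert on the rate of task_store_LOST, here is a minimal
sketch of polling /vars and computing a per-minute rate. It assumes /vars
returns plain text with one "name value" pair per line; the scheduler URL, the
helper names and the threshold are all hypothetical and would need tuning for a
real cluster.

```python
import urllib.request


def parse_vars(text):
    """Parse 'name value' lines into a dict, keeping only numeric values.

    Assumption: /vars emits one whitespace-separated pair per line.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            stats[name] = float(value)
        except ValueError:
            # Non-numeric stats (e.g. build strings) are ignored.
            continue
    return stats


def fetch_vars(base_url, timeout=5):
    """Fetch and parse /vars from a scheduler (base_url is hypothetical)."""
    url = base_url.rstrip("/") + "/vars"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_vars(resp.read().decode("utf-8"))


def rate_per_minute(prev, curr, seconds):
    """Change in a counter between two samples, scaled to one minute."""
    return (curr - prev) * 60.0 / seconds


def lost_alert(prev_stats, curr_stats, seconds, threshold_per_min=10.0):
    """True if task_store_LOST grew faster than the (hypothetical) threshold."""
    rate = rate_per_minute(
        prev_stats.get("task_store_LOST", 0.0),
        curr_stats.get("task_store_LOST", 0.0),
        seconds,
    )
    return rate > threshold_per_min
```

In practice you would call fetch_vars() on a fixed interval, keep the previous
sample around, and page when lost_alert() returns True for a few consecutive
samples rather than on a single spike.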
I've been having a bad time with the great AWS Xen reboot, and thought it
would be a good time to revamp monitoring among other things.
Do you have any recommendations for monitoring scheduler health? I've got
my own ideas, but am more interested in learning about Twitter prod
monitoring.
For co