Hi Ismael,

Yes, a follow-up KIP after the controller code settles down sounds good.
Thanks,

Jiangjie (Becket) Qin

On Wed, May 10, 2017 at 6:11 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks Jun. Discussed this offline with Onur and Jun and I believe there's
> agreement, so I updated the KIP:
>
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69407758&selectedPageVersions=8&selectedPageVersions=7
>
> Ismael
>
> On Wed, May 10, 2017 at 4:46 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Onur,
> >
> > We probably don't want to do the 1-to-1 mapping from the event type to the
> > controller state since some of the event types are implementation details.
> > How about the following mapping?
> >
> > 0 - idle
> > 1 - controller change (Startup, ControllerChange, Reelect)
> > 2 - broker change (BrokerChange)
> > 3 - topic creation/change (TopicChange, PartitionModifications)
> > 4 - topic deletion (TopicDeletion, TopicDeletionStopReplicaResult)
> > 5 - partition reassignment (PartitionReassignment, PartitionReassignmentIsrChange)
> > 6 - auto leader balancing (AutoPreferredReplicaLeaderElection)
> > 7 - manual leader balancing (PreferredReplicaLeaderElection)
> > 8 - controlled shutdown (ControlledShutdown)
> > 9 - isr change (IsrChangeNotification)
> >
> > For each state, we will add a corresponding timer to track the rate and the
> > latency, if it's not there already (e.g., broker change and controlled
> > shutdown). If there are future changes to the controller, we can make a
> > call whether the new event should be mapped to one of the existing states
> > or a new state.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, May 9, 2017 at 6:17 PM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> >
> > > @Ismael, Jun
> > > After bringing up an earlier point twice now, it still doesn't feel like
> > > it's been commented on/addressed, so I'm going to give it one more shot:
> > > Assuming that ControllerState should reflect the current event being
> > > processed, the KIP is missing states.
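[Editor's note: Jun's many-to-one mapping above can be sketched as a simple lookup table. This is a minimal illustration with hypothetical class and enum names, not the actual KIP implementation; unmapped event types fall back to the idle state, which is exactly the gap Onur raises below.]

```java
import java.util.Map;
import static java.util.Map.entry;

// Hypothetical sketch of the many-to-one mapping from controller event
// types to the coarser controller states proposed in the KIP discussion.
public class ControllerStateMapping {
    enum ControllerState {
        IDLE, CONTROLLER_CHANGE, BROKER_CHANGE, TOPIC_CHANGE, TOPIC_DELETION,
        PARTITION_REASSIGNMENT, AUTO_LEADER_BALANCE, MANUAL_LEADER_BALANCE,
        CONTROLLED_SHUTDOWN, ISR_CHANGE
    }

    // All 14 event types from the thread, mapped onto the 9 non-idle states.
    static final Map<String, ControllerState> EVENT_TO_STATE = Map.ofEntries(
        entry("Startup", ControllerState.CONTROLLER_CHANGE),
        entry("ControllerChange", ControllerState.CONTROLLER_CHANGE),
        entry("Reelect", ControllerState.CONTROLLER_CHANGE),
        entry("BrokerChange", ControllerState.BROKER_CHANGE),
        entry("TopicChange", ControllerState.TOPIC_CHANGE),
        entry("PartitionModifications", ControllerState.TOPIC_CHANGE),
        entry("TopicDeletion", ControllerState.TOPIC_DELETION),
        entry("TopicDeletionStopReplicaResult", ControllerState.TOPIC_DELETION),
        entry("PartitionReassignment", ControllerState.PARTITION_REASSIGNMENT),
        entry("PartitionReassignmentIsrChange", ControllerState.PARTITION_REASSIGNMENT),
        entry("AutoPreferredReplicaLeaderElection", ControllerState.AUTO_LEADER_BALANCE),
        entry("PreferredReplicaLeaderElection", ControllerState.MANUAL_LEADER_BALANCE),
        entry("ControlledShutdown", ControllerState.CONTROLLED_SHUTDOWN),
        entry("IsrChangeNotification", ControllerState.ISR_CHANGE));

    // An event type with no mapping would be reported as IDLE.
    static ControllerState stateFor(String eventType) {
        return EVENT_TO_STATE.getOrDefault(eventType, ControllerState.IDLE);
    }
}
```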
> > > The controller currently has 14 event types:
> > > BrokerChange
> > > TopicChange
> > > PartitionModifications
> > > TopicDeletion
> > > PartitionReassignment
> > > PartitionReassignmentIsrChange
> > > IsrChangeNotification
> > > PreferredReplicaLeaderElection
> > > AutoPreferredReplicaLeaderElection
> > > ControlledShutdown
> > > TopicDeletionStopReplicaResult
> > > Startup
> > > ControllerChange
> > > Reelect
> > >
> > > The KIP only shows 10 event types (and it's not a proper subset of the
> > > above set).
> > >
> > > I think this mismatch would cause the ControllerState to incorrectly be in
> > > the Idle state when in fact the controller could be doing a lot of work.
> > >
> > > 1. Should ControllerState exactly consist of the 14 controller event types
> > > + the 1 Idle state?
> > > 2. If so, what's the process for adding/removing/merging event types
> > > w.r.t. this metric?
> > >
> > > On Tue, May 9, 2017 at 4:45 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > >> Becket,
> > >>
> > >> Are you OK with extending the metrics via a subsequent KIP (assuming that
> > >> what we're doing here doesn't prevent that)? The KIP freeze is tomorrow
> > >> (although I will give it an extra day or two, as many in the community have
> > >> been attending the Kafka Summit this week), so we should avoid increasing
> > >> the scope unless it's important for future improvements.
> > >>
> > >> Thanks,
> > >> Ismael
> > >>
> > >> On Wed, May 10, 2017 at 12:09 AM, Jun Rao <j...@confluent.io> wrote:
> > >>
> > >> > Hi, Becket,
> > >> >
> > >> > q10. The reason why there is no timer metric for the broker change event is
> > >> > that the controller currently always has a LeaderElectionRateAndTimeMs
> > >> > timer metric (in ControllerStats).
> > >> >
> > >> > q11. I agree that it's useful to know the queue time in the
> > >> > controller event queue and suggested that earlier.
> > >> > Onur thinks that it's a
> > >> > bit too early to add that since we are about to change how we queue events
> > >> > from ZK. Similarly, we will probably also make changes to batch requests
> > >> > from the controller to the broker. So, perhaps we can add more metrics once
> > >> > those changes in the controller have been made. For now, knowing the
> > >> > controller state and the controller channel queue size is probably good
> > >> > enough.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jun
> > >> >
> > >> > On Mon, May 8, 2017 at 10:05 PM, Becket Qin <becket....@gmail.com> wrote:
> > >> >
> > >> >> @Ismael,
> > >> >>
> > >> >> About the stage and event type: yes, I think each event handling should
> > >> >> have those stages covered. It is similar to what we are doing for the
> > >> >> requests on the broker side. We have benefited a lot from such a systematic
> > >> >> metric structure, so I think it would be worth following the same approach
> > >> >> in the controller.
> > >> >>
> > >> >> As an example, for BrokerChangeEvent (or any event), I am thinking we
> > >> >> would have the following metrics:
> > >> >> 1. EventRate
> > >> >> 2. EventQueueTime: The event queue time
> > >> >> 3. EventHandlingTime: The event handling time (including zk path updates)
> > >> >> 4. EventControllerChannelQueueTime: The queue time of the corresponding
> > >> >> LeaderAndIsrRequest and UpdateMetadataRequest in the controller channel
> > >> >> queue.
> > >> >> 5. EventControllerChannelSendTime: The time to send the corresponding
> > >> >> requests; could be the total time for the corresponding LeaderAndIsrRequest
> > >> >> and UpdateMetadataRequest. It is typically small, but in some cases could
> > >> >> be slower than we expect.
> > >> >> 6. EventAckTime: The time waiting for all the corresponding responses.
> > >> >> 7. EventHandlingTotalTime: Sum of 2-6.
> > >> >>
> > >> >> Note that all the metrics are from the event and cluster-wide state
> > >> >> transition point of view, not from a single request type's point of view.
> > >> >> Among the above metrics, 4, 5, and 6 could potentially be per broker.
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> Jiangjie (Becket) Qin
> > >> >>
> > >> >> On Mon, May 8, 2017 at 7:24 PM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> > >> >>
> > >> >> > I had a similar comment to Becket but accidentally posted it on the vote
> > >> >> > thread last Friday. From that thread:
> > >> >> > "I noticed that both the ControllerState metric and the *RateAndTimeMs
> > >> >> > metrics only cover a subset of the controller event types. Was this
> > >> >> > intentional?"
> > >> >> >
> > >> >> > I think it makes the most sense to just have the ControllerState metric
> > >> >> > and *RateAndTimeMs metrics exactly match up with the event types.
> > >> >> >
> > >> >> > On Mon, May 8, 2017 at 4:35 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >> >> >
> > >> >> > > Hi Becket,
> > >> >> > >
> > >> >> > > Thanks for the feedback. Comments inline.
> > >> >> > >
> > >> >> > > On Tue, May 9, 2017 at 12:19 AM, Becket Qin <becket....@gmail.com> wrote:
> > >> >> > >>
> > >> >> > >> q10. With the event loop based controller design, it seems natural to
> > >> >> > >> have the processing time for each controller event type. In that case,
> > >> >> > >> the current metrics don't seem to cover all of the event processing
> > >> >> > >> times, e.g. broker joining and broker failure event handling time.
> > >> >> > >>
> > >> >> > >
> > >> >> > > I'll leave it to Jun to explain why some metrics were not included in
> > >> >> > > the proposal.
> > >> >> > >
> > >> >> > >> q11. There are actually a couple of stages for controller state
> > >> >> > >> transition and propagation. More specifically:
> > >> >> > >> Stage 1: Event queue time
> > >> >> > >> Stage 2: Event process time (including all the zk path updates)
> > >> >> > >> Stage 3: State propagation time (including the state propagation
> > >> >> > >> queuing time on the controller and the time from request sent to
> > >> >> > >> response received)
> > >> >> > >>
> > >> >> > >> I think it's worth having metrics for each of the stages. Stage 3 might
> > >> >> > >> be a little tricky because we may need to manage the per-broker
> > >> >> > >> propagation time. Arguably the state propagation time overlaps a little
> > >> >> > >> with the broker-side request handling time, but for some state changes
> > >> >> > >> that involve multiple types of requests, it could still be useful to
> > >> >> > >> know the time for a state transition to be propagated to the entire
> > >> >> > >> cluster.
> > >> >> > >>
> > >> >> > >
> > >> >> > > Can you please elaborate on how you would like to see this measured? Do
> > >> >> > > you mean that each event would have a separate metric for each of these
> > >> >> > > stages? Maybe a worked-out example would make it clear.
> > >> >> > >
> > >> >> > >> q11. Regarding (13), the controller actually does not update the ISR;
> > >> >> > >> it just reads it. The error message seems to come from the brokers. It
> > >> >> > >> usually happens in the following case:
> > >> >> > >> 1. The current leader broker has the cached ZNode version V.
> > >> >> > >> 2. The controller elects a new leader and updates the ZNode, so the
> > >> >> > >> ZNode version becomes V+1.
> > >> >> > >> 3. The controller sends the LeaderAndIsrRequest to the replica brokers
> > >> >> > >> to propagate the new leader information as well as the new zkVersion.
> > >> >> > >> 4. Before the old leader processes the LeaderAndIsrRequest from the
> > >> >> > >> controller in step 3, the old leader tries to update the ISR and finds
> > >> >> > >> that the cached zkVersion V is different from the actual zkVersion V+1.
> > >> >> > >>
> > >> >> > >> It looks like this is not a controller metric.
> > >> >> > >
> > >> >> > > Yes, it's not a controller metric; that's why it's under the "Partition
> > >> >> > > Metrics" section. In the PR, I actually implemented it in ReplicaManager
> > >> >> > > alongside IsrExpandsPerSec and IsrShrinksPerSec.
> > >> >> > >
> > >> >> > > Thanks,
> > >> >> > > Ismael
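[Editor's note: the zkVersion conflict Becket walks through in steps 1-4 above boils down to a conditional (compare-and-set) update on the ZNode version. The sketch below is a hypothetical in-memory illustration of that check, not ZooKeeper's or Kafka's actual API; all names are made up.]

```java
// Hypothetical sketch of the conditional ISR update from steps 1-4:
// a writer's cached ZNode version must match the actual version, otherwise
// the update is rejected (the broker would then refresh its cache and retry).
public class IsrUpdateSketch {
    static final class ZNode {
        int version;   // actual version, bumped on every successful write
        String data;
        ZNode(int version, String data) { this.version = version; this.data = data; }
    }

    // Returns true only if expectedVersion matches the actual version.
    static boolean conditionalUpdate(ZNode node, String newData, int expectedVersion) {
        if (node.version != expectedVersion) {
            // e.g. old leader's cached zkVersion V vs. actual zkVersion V+1
            return false;
        }
        node.data = newData;
        node.version++;
        return true;
    }
}
```

In step 4, the old leader plays the role of a writer holding a stale expected version: the controller's leader election already bumped the version to V+1, so the old leader's attempted ISR update with expected version V fails.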