@Ismael, Jun

After bringing up an earlier point twice now, it still doesn't feel like it's been commented on or addressed, so I'm going to give it one more shot: assuming that ControllerState should reflect the current event being processed, the KIP is missing states.

The controller currently has 14 event types:
BrokerChange
TopicChange
PartitionModifications
TopicDeletion
PartitionReassignment
PartitionReassignmentIsrChange
IsrChangeNotification
PreferredReplicaLeaderElection
AutoPreferredReplicaLeaderElection
ControlledShutdown
TopicDeletionStopReplicaResult
Startup
ControllerChange
Reelect

The KIP only shows 10 event types (and it's not a proper subset of the above set). I think this mismatch would cause ControllerState to incorrectly report the Idle state when in fact the controller could be doing a lot of work.

1. Should ControllerState consist of exactly the 14 controller event types plus the 1 Idle state?
2. If so, what's the process for adding/removing/merging event types w.r.t. this metric?
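To make question 1 concrete, here is a rough sketch of what an exhaustive ControllerState could look like, assuming one state per controller event type plus Idle. The names and packaging are illustrative only, not what the KIP currently specifies:

// Illustrative sketch: one ControllerState per controller event type, plus Idle.
sealed trait ControllerState { def value: Byte }

object ControllerState {
  case object Idle extends ControllerState { val value: Byte = 0 }
  case object BrokerChange extends ControllerState { val value: Byte = 1 }
  case object TopicChange extends ControllerState { val value: Byte = 2 }
  case object PartitionModifications extends ControllerState { val value: Byte = 3 }
  case object TopicDeletion extends ControllerState { val value: Byte = 4 }
  case object PartitionReassignment extends ControllerState { val value: Byte = 5 }
  case object PartitionReassignmentIsrChange extends ControllerState { val value: Byte = 6 }
  case object IsrChangeNotification extends ControllerState { val value: Byte = 7 }
  case object PreferredReplicaLeaderElection extends ControllerState { val value: Byte = 8 }
  case object AutoPreferredReplicaLeaderElection extends ControllerState { val value: Byte = 9 }
  case object ControlledShutdown extends ControllerState { val value: Byte = 10 }
  case object TopicDeletionStopReplicaResult extends ControllerState { val value: Byte = 11 }
  case object Startup extends ControllerState { val value: Byte = 12 }
  case object ControllerChange extends ControllerState { val value: Byte = 13 }
  case object Reelect extends ControllerState { val value: Byte = 14 }
}

Any event type that is later added, removed, or merged in the controller would have to touch this enumeration as well, which is what question 2 is getting at.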
On Tue, May 9, 2017 at 4:45 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Becket,
>
> Are you OK with extending the metrics via a subsequent KIP (assuming that
> what we're doing here doesn't prevent that)? The KIP freeze is tomorrow
> (although I will give it an extra day or two as many in the community have
> been attending the Kafka Summit this week), so we should avoid increasing
> the scope unless it's important for future improvements.
>
> Thanks,
> Ismael
>
> On Wed, May 10, 2017 at 12:09 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Becket,
> >
> > q10. The reason why there is not a timer metric for the broker change
> > event is that the controller already has a LeaderElectionRateAndTimeMs
> > timer metric (in ControllerStats).
> >
> > q11. I agree that it's useful to know the queue time in the controller
> > event queue and suggested that earlier. Onur thinks that it's a bit too
> > early to add that since we are about to change how we queue events from
> > ZK. Similarly, we will probably also make changes to batch requests from
> > the controller to the broker. So, perhaps we can add more metrics once
> > those changes in the controller have been made. For now, knowing the
> > controller state and the controller channel queue size is probably good
> > enough.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, May 8, 2017 at 10:05 PM, Becket Qin <becket....@gmail.com> wrote:
> >
> > > @Ismael,
> > >
> > > About the stage and event type: yes, I think each event handling should
> > > have those stages covered. It is similar to what we are doing for the
> > > requests on the broker side. We have benefited a lot from such a
> > > systematic metric structure, so I think it would be worth following the
> > > same approach in the controller.
> > >
> > > As an example, for BrokerChangeEvent (or any event), I am thinking we
> > > would have the following metrics:
> > > 1. EventRate
> > > 2. EventQueueTime: the event queue time.
> > > 3. EventHandlingTime: the event handling time (including zk path
> > >    updates).
> > > 4. EventControllerChannelQueueTime: the queue time of the corresponding
> > >    LeaderAndIsrRequest and UpdateMetadataRequest in the controller
> > >    channel queue.
> > > 5. EventControllerChannelSendTime: the time to send the corresponding
> > >    requests; could be the total time for the corresponding
> > >    LeaderAndIsrRequest and UpdateMetadataRequest. It is typically small,
> > >    but in some cases could be slower than we expect.
> > > 6. EventAckTime: the time waiting for all the corresponding responses.
> > > 7. EventHandlingTotalTime: sum of 2-6.
> > >
> > > Note that all the metrics are from the event and cluster-wide state
> > > transition point of view, not from a single request type point of view.
> > > Among the above metrics, 4, 5 and 6 could potentially be per broker.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
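For what it's worth, the per-event, per-stage structure Becket sketches above could be registered with the Yammer metrics library roughly as below. This is only a sketch; the class name and metric names are made up for illustration and are not part of the KIP:

import java.util.concurrent.TimeUnit
import com.yammer.metrics.Metrics
import com.yammer.metrics.core.{Meter, MetricName, Timer}

// Sketch: one group of sensors per controller event type, mirroring the
// numbered stages in Becket's list. Group/type/name strings are illustrative.
class ControllerEventMetrics(eventName: String) {
  private def metricName(n: String) =
    new MetricName("kafka.controller", "ControllerEvents", s"$eventName-$n")
  private def timer(n: String): Timer =
    Metrics.defaultRegistry().newTimer(metricName(n), TimeUnit.MILLISECONDS, TimeUnit.SECONDS)

  val rate: Meter =
    Metrics.defaultRegistry().newMeter(metricName("EventRate"), "events", TimeUnit.SECONDS) // 1
  val queueTime: Timer = timer("EventQueueTime")                          // 2
  val handlingTime: Timer = timer("EventHandlingTime")                    // 3
  val channelQueueTime: Timer = timer("EventControllerChannelQueueTime")  // 4
  val channelSendTime: Timer = timer("EventControllerChannelSendTime")    // 5
  val ackTime: Timer = timer("EventAckTime")                              // 6
  val totalTime: Timer = timer("EventHandlingTotalTime")                  // 7: sum of 2-6
}

Per-broker versions of 4, 5 and 6 could then presumably just add the broker id to the metric name.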
> > > On Mon, May 8, 2017 at 7:24 PM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> > >
> > > > I had a similar comment to Becket but accidentally posted it on the
> > > > vote thread last Friday. From that thread: "I noticed that both the
> > > > ControllerState metric and the *RateAndTimeMs metrics only cover a
> > > > subset of the controller event types. Was this intentional?"
> > > >
> > > > I think it makes the most sense to have the ControllerState metric
> > > > and the *RateAndTimeMs metrics exactly match up with the event types.
> > > >
> > > > On Mon, May 8, 2017 at 4:35 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > > >
> > > > > Hi Becket,
> > > > >
> > > > > Thanks for the feedback. Comments inline.
> > > > >
> > > > > On Tue, May 9, 2017 at 12:19 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > >
> > > > > > q10. With the event-loop-based controller design, it seems natural
> > > > > > to have the processing time for each controller event type. In
> > > > > > that case, the current metrics don't seem to cover all of the
> > > > > > event processing time, e.g. the broker joining and broker failure
> > > > > > event handling time.
> > > > >
> > > > > I'll leave it to Jun to explain why some metrics were not included
> > > > > in the proposal.
> > > > >
> > > > > > q11. There are actually a couple of stages for controller state
> > > > > > transition and propagation. More specifically:
> > > > > > Stage 1: Event queue time
> > > > > > Stage 2: Event process time (including all the zk path updates)
> > > > > > Stage 3: State propagation time (including the state propagation
> > > > > > queuing time on the controller and the time from the actual
> > > > > > request being sent to the response being received)
> > > > > >
> > > > > > I think it is worth having metrics for each of these stages.
> > > > > > Stage 3 might be a little tricky because we may need to manage the
> > > > > > per-broker propagation time. Arguably the state propagation time
> > > > > > overlaps a little with the broker-side request handling time, but
> > > > > > for some state changes that involve multiple types of requests, it
> > > > > > could still be useful to know the time for a state transition to
> > > > > > be propagated to the entire cluster.
> > > > >
> > > > > Can you please elaborate on how you would like to see this measured?
> > > > > Do you mean that each event would have a separate metric for each of
> > > > > these stages? Maybe a worked-out example would make it clear.
> > > > >
> > > > > > q11. Regarding (13), the controller actually does not update the
> > > > > > ISR but just reads it. The error message seems to come from the
> > > > > > brokers. It usually happens in the following case:
> > > > > > 1. The current leader broker has the cached ZNode version V.
> > > > > > 2. The controller elects a new leader and updates the ZNode, so
> > > > > > the ZNode version becomes V+1.
> > > > > > 3. The controller sends the LeaderAndIsrRequest to the replica
> > > > > > brokers to propagate the new leader information as well as the new
> > > > > > zkVersion.
> > > > > > 4. Before the old leader processes the LeaderAndIsrRequest from
> > > > > > the controller in step 3, the old leader tries to update the ISR
> > > > > > and finds that the cached zkVersion V is different from the actual
> > > > > > zkVersion V+1.
> > > > > > It looks like this is not a controller metric.
> > > > >
> > > > > Yes, it's not a controller metric; that's why it's under the
> > > > > "Partition Metrics" section. In the PR, I actually implemented it in
> > > > > ReplicaManager alongside IsrExpandsPerSec and IsrShrinksPerSec.
> > > > >
> > > > > Thanks,
> > > > > Ismael
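As a side note on the zkVersion scenario Becket lays out in steps 1-4 above: it boils down to ZooKeeper's conditional update, where the old leader passes its cached znode version with the write and ZooKeeper rejects it once the controller has already bumped the version to V+1. A minimal sketch of that check using the plain ZooKeeper client API (the method name and surrounding wiring are made up for illustration, not Kafka's actual code):

import org.apache.zookeeper.{KeeperException, ZooKeeper}

// Attempt a conditional update of the partition state znode using the
// leader's cached version. Returns false when the version no longer matches,
// which is the broker-side mismatch Becket describes in step 4.
def tryUpdateIsr(zk: ZooKeeper, path: String, newState: Array[Byte], cachedZkVersion: Int): Boolean =
  try {
    zk.setData(path, newState, cachedZkVersion) // only succeeds if the znode is still at cachedZkVersion
    true
  } catch {
    case _: KeeperException.BadVersionException =>
      false // the controller already moved the znode to V+1 (steps 2-3 above)
  }

Since the check happens on the broker rather than on the controller, this also lines up with Ismael's point that the metric belongs with the ISR expand/shrink metrics in ReplicaManager.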