Hi Ismael,

Yes, a follow-up KIP after the controller code settles down sounds good.
Thanks,

Jiangjie (Becket) Qin

On Wed, May 10, 2017 at 6:11 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks Jun. Discussed this offline with Onur and Jun and I believe there's
> agreement, so I updated the KIP:
>
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69407758&selectedPageVersions=8&selectedPageVersions=7
>
> Ismael
>
> On Wed, May 10, 2017 at 4:46 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Onur,
> >
> > We probably don't want to do the 1-to-1 mapping from the event type to the
> > controller state since some of the event types are implementation details.
> > How about the following mapping?
> >
> > 0 - idle
> > 1 - controller change (Startup, ControllerChange, Reelect)
> > 2 - broker change (BrokerChange)
> > 3 - topic creation/change (TopicChange, PartitionModifications)
> > 4 - topic deletion (TopicDeletion, TopicDeletionStopReplicaResult)
> > 5 - partition reassignment (PartitionReassignment, PartitionReassignmentIsrChange)
> > 6 - auto leader balancing (AutoPreferredReplicaLeaderElection)
> > 7 - manual leader balancing (PreferredReplicaLeaderElection)
> > 8 - controlled shutdown (ControlledShutdown)
> > 9 - isr change (IsrChangeNotification)
> >
> > For each state, we will add a corresponding timer to track the rate and the
> > latency, if it's not there already (e.g., broker change and controlled
> > shutdown). If there are future changes to the controller, we can make a
> > call whether the new event should be mapped to one of the existing states
> > or a new state.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, May 9, 2017 at 6:17 PM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> >
> > > @Ismael, Jun
> > > After bringing up an earlier point twice now, it still doesn't feel like
> > > it's been commented on/addressed, so I'm going to give it one more shot:
> > > Assuming that ControllerState should reflect the current event being
> > > processed, the KIP is missing states.
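[Editor's note: Jun's many-to-one mapping above can be sketched as a simple lookup table. This is a minimal illustration with hypothetical class and enum names, not the actual KIP implementation; unmapped event types fall back to the idle state, which is exactly the gap Onur raises below.]

```java
import java.util.Map;
import static java.util.Map.entry;

// Hypothetical sketch of the many-to-one mapping from controller event
// types to the coarser controller states proposed in the KIP discussion.
public class ControllerStateMapping {
    enum ControllerState {
        IDLE, CONTROLLER_CHANGE, BROKER_CHANGE, TOPIC_CHANGE, TOPIC_DELETION,
        PARTITION_REASSIGNMENT, AUTO_LEADER_BALANCE, MANUAL_LEADER_BALANCE,
        CONTROLLED_SHUTDOWN, ISR_CHANGE
    }

    // All 14 event types from the thread, mapped onto the 9 non-idle states.
    static final Map<String, ControllerState> EVENT_TO_STATE = Map.ofEntries(
        entry("Startup", ControllerState.CONTROLLER_CHANGE),
        entry("ControllerChange", ControllerState.CONTROLLER_CHANGE),
        entry("Reelect", ControllerState.CONTROLLER_CHANGE),
        entry("BrokerChange", ControllerState.BROKER_CHANGE),
        entry("TopicChange", ControllerState.TOPIC_CHANGE),
        entry("PartitionModifications", ControllerState.TOPIC_CHANGE),
        entry("TopicDeletion", ControllerState.TOPIC_DELETION),
        entry("TopicDeletionStopReplicaResult", ControllerState.TOPIC_DELETION),
        entry("PartitionReassignment", ControllerState.PARTITION_REASSIGNMENT),
        entry("PartitionReassignmentIsrChange", ControllerState.PARTITION_REASSIGNMENT),
        entry("AutoPreferredReplicaLeaderElection", ControllerState.AUTO_LEADER_BALANCE),
        entry("PreferredReplicaLeaderElection", ControllerState.MANUAL_LEADER_BALANCE),
        entry("ControlledShutdown", ControllerState.CONTROLLED_SHUTDOWN),
        entry("IsrChangeNotification", ControllerState.ISR_CHANGE));

    // An event type with no mapping would be reported as IDLE.
    static ControllerState stateFor(String eventType) {
        return EVENT_TO_STATE.getOrDefault(eventType, ControllerState.IDLE);
    }
}
```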
> > > The controller currently has 14 event types:
> > > BrokerChange
> > > TopicChange
> > > PartitionModifications
> > > TopicDeletion
> > > PartitionReassignment
> > > PartitionReassignmentIsrChange
> > > IsrChangeNotification
> > > PreferredReplicaLeaderElection
> > > AutoPreferredReplicaLeaderElection
> > > ControlledShutdown
> > > TopicDeletionStopReplicaResult
> > > Startup
> > > ControllerChange
> > > Reelect
> > >
> > > The KIP only shows 10 event types (and it's not a proper subset of the
> > > above set).
> > >
> > > I think this mismatch would cause the ControllerState to incorrectly be in
> > > the Idle state when in fact the controller could be doing a lot of work.
> > >
> > > 1. Should ControllerState exactly consist of the 14 controller event types
> > > + the 1 Idle state?
> > > 2. If so, what's the process for adding/removing/merging event types
> > > w.r.t. this metric?
> > >
> > > On Tue, May 9, 2017 at 4:45 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > >> Becket,
> > >>
> > >> Are you OK with extending the metrics via a subsequent KIP (assuming that
> > >> what we're doing here doesn't prevent that)? The KIP freeze is tomorrow
> > >> (although I will give it an extra day or two, as many in the community have
> > >> been attending the Kafka Summit this week), so we should avoid increasing
> > >> the scope unless it's important for future improvements.
> > >>
> > >> Thanks,
> > >> Ismael
> > >>
> > >> On Wed, May 10, 2017 at 12:09 AM, Jun Rao <j...@confluent.io> wrote:
> > >>
> > >> > Hi, Becket,
> > >> >
> > >> > q10. The reason why there is no timer metric for the broker change event is
> > >> > that the controller currently always has a LeaderElectionRateAndTimeMs
> > >> > timer metric (in ControllerStats).
> > >> >
> > >> > q11. I agree that it's useful to know the queue time in the
> > >> > controller event queue and suggested that earlier.
> > >> > Onur thinks that it's a
> > >> > bit too early to add that since we are about to change how we queue events
> > >> > from ZK. Similarly, we will probably also make changes to batch requests
> > >> > from the controller to the broker. So, perhaps we can add more metrics once
> > >> > those changes in the controller have been made. For now, knowing the
> > >> > controller state and the controller channel queue size is probably good
> > >> > enough.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jun
> > >> >
> > >> > On Mon, May 8, 2017 at 10:05 PM, Becket Qin <becket....@gmail.com> wrote:
> > >> >
> > >> >> @Ismael,
> > >> >>
> > >> >> About the stage and event type: yes, I think each event handling should
> > >> >> have those stages covered. It is similar to what we are doing for the
> > >> >> requests on the broker side. We have benefited a lot from such a systematic
> > >> >> metric structure, so I think it would be worth following the same approach
> > >> >> in the controller.
> > >> >>
> > >> >> As an example, for BrokerChangeEvent (or any event), I am thinking we
> > >> >> would have the following metrics:
> > >> >> 1. EventRate
> > >> >> 2. EventQueueTime: The event queue time
> > >> >> 3. EventHandlingTime: The event handling time (including zk path updates)
> > >> >> 4. EventControllerChannelQueueTime: The queue time of the corresponding
> > >> >> LeaderAndIsrRequest and UpdateMetadataRequest in the controller channel
> > >> >> queue.
> > >> >> 5. EventControllerChannelSendTime: The time to send the corresponding
> > >> >> requests; could be the total time for the corresponding LeaderAndIsrRequest
> > >> >> and UpdateMetadataRequest. It is typically small, but in some cases could
> > >> >> be slower than we expect.
> > >> >> 6. EventAckTime: The time waiting for all the corresponding responses.
> > >> >> 7. EventHandlingTotalTime: Sum of 2-6.
> > >> >>
> > >> >> Note that all the metrics are from the event and cluster-wide state
> > >> >> transition point of view, not from a single request type's point of view.
> > >> >> Among the above metrics, 4, 5, and 6 could potentially be per broker.
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> Jiangjie (Becket) Qin
> > >> >>
> > >> >> On Mon, May 8, 2017 at 7:24 PM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> > >> >>
> > >> >> > I had a similar comment to Becket but accidentally posted it on the vote
> > >> >> > thread last Friday. From that thread:
> > >> >> > "I noticed that both the ControllerState metric and the *RateAndTimeMs
> > >> >> > metrics only cover a subset of the controller event types. Was this
> > >> >> > intentional?"
> > >> >> >
> > >> >> > I think it makes the most sense to just have the ControllerState metric
> > >> >> > and *RateAndTimeMs metrics exactly match up with the event types.
> > >> >> >
> > >> >> > On Mon, May 8, 2017 at 4:35 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >> >> >
> > >> >> > > Hi Becket,
> > >> >> > >
> > >> >> > > Thanks for the feedback. Comments inline.
> > >> >> > >
> > >> >> > > On Tue, May 9, 2017 at 12:19 AM, Becket Qin <becket....@gmail.com> wrote:
> > >> >> > >>
> > >> >> > >> q10. With the event loop based controller design, it seems natural to
> > >> >> > >> have the processing time for each controller event type. In that case,
> > >> >> > >> the current metrics don't seem to cover all of the event processing
> > >> >> > >> times, e.g. broker joining and broker failure event handling time.
> > >> >> > >>
> > >> >> > >
> > >> >> > > I'll leave it to Jun to explain why some metrics were not included in
> > >> >> > > the proposal.
> > >> >> > >
> > >> >> > >> q11. There are actually a couple of stages for controller state
> > >> >> > >> transition and propagation. More specifically:
> > >> >> > >> Stage 1: Event queue time
> > >> >> > >> Stage 2: Event process time (including all the zk path updates)
> > >> >> > >> Stage 3: State propagation time (including the state propagation
> > >> >> > >> queuing time on the controller and the time from request sent to
> > >> >> > >> response received)
> > >> >> > >>
> > >> >> > >> I think it's worth having metrics for each of the stages. Stage 3 might
> > >> >> > >> be a little tricky because we may need to manage the per-broker
> > >> >> > >> propagation time. Arguably the state propagation time overlaps a little
> > >> >> > >> with the broker-side request handling time, but for some state changes
> > >> >> > >> that involve multiple types of requests, it could still be useful to
> > >> >> > >> know the time for a state transition to be propagated to the entire
> > >> >> > >> cluster.
> > >> >> > >>
> > >> >> > >
> > >> >> > > Can you please elaborate on how you would like to see this measured? Do
> > >> >> > > you mean that each event would have a separate metric for each of these
> > >> >> > > stages? Maybe a worked-out example would make it clear.
> > >> >> > >
> > >> >> > >> q11. Regarding (13), the controller actually does not update the ISR;
> > >> >> > >> it just reads it. The error message seems to come from the brokers. It
> > >> >> > >> usually happens in the following case:
> > >> >> > >> 1. The current leader broker has the cached ZNode version V.
> > >> >> > >> 2. The controller elects a new leader and updates the ZNode, so the
> > >> >> > >> ZNode version becomes V+1.
> > >> >> > >> 3. The controller sends the LeaderAndIsrRequest to the replica brokers
> > >> >> > >> to propagate the new leader information as well as the new zkVersion.
> > >> >> > >> 4. Before the old leader processes the LeaderAndIsrRequest from the
> > >> >> > >> controller in step 3, the old leader tries to update the ISR and finds
> > >> >> > >> that the cached zkVersion V is different from the actual zkVersion V+1.
> > >> >> > >>
> > >> >> > >> It looks like this is not a controller metric.
> > >> >> > >
> > >> >> > > Yes, it's not a controller metric; that's why it's under the "Partition
> > >> >> > > Metrics" section. In the PR, I actually implemented it in ReplicaManager
> > >> >> > > alongside IsrExpandsPerSec and IsrShrinksPerSec.
> > >> >> > >
> > >> >> > > Thanks,
> > >> >> > > Ismael
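[Editor's note: the zkVersion conflict Becket walks through in steps 1-4 above boils down to a conditional (compare-and-set) update on the ZNode version. The sketch below is a hypothetical in-memory illustration of that check, not ZooKeeper's or Kafka's actual API; all names are made up.]

```java
// Hypothetical sketch of the conditional ISR update from steps 1-4:
// a writer's cached ZNode version must match the actual version, otherwise
// the update is rejected (the broker would then refresh its cache and retry).
public class IsrUpdateSketch {
    static final class ZNode {
        int version;   // actual version, bumped on every successful write
        String data;
        ZNode(int version, String data) { this.version = version; this.data = data; }
    }

    // Returns true only if expectedVersion matches the actual version.
    static boolean conditionalUpdate(ZNode node, String newData, int expectedVersion) {
        if (node.version != expectedVersion) {
            // e.g. old leader's cached zkVersion V vs. actual zkVersion V+1
            return false;
        }
        node.data = newData;
        node.version++;
        return true;
    }
}
```

In step 4, the old leader plays the role of a writer holding a stale expected version: the controller's leader election already bumped the version to V+1, so the old leader's attempted ISR update with expected version V fails.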