Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-04 Thread Matthias J. Sax
Thanks for updating the KIP. LGTM. -Matthias On 4/4/18 12:46 PM, John Roesler wrote: > Cool, if you're satisfied with the KIP now, maybe I can lobby for your vote > ;) > > The vote thread is still at only one binding +1, I think. > > Thanks, > -John > > On Tue, Apr 3, 2018 at 9:05 PM, Matthia

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-04 Thread John Roesler
Cool, if you're satisfied with the KIP now, maybe I can lobby for your vote ;) The vote thread is still at only one binding +1, I think. Thanks, -John On Tue, Apr 3, 2018 at 9:05 PM, Matthias J. Sax wrote: > Sounds great! > > The cryptic topic names can be an issue -- however, people can > `de

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Matthias J. Sax
Sounds great! The cryptic topic names can be an issue -- however, people can `describe()` their topology to map the name to the corresponding sub-topology/tasks to narrow the error down to the corresponding operators. I think, this should be "sufficient for now" for debugging. Renaming those top

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Guozhang Wang
Thanks John, your proposal looks fine to me. I'll go ahead and look into the PR for more details myself. Guozhang On Tue, Apr 3, 2018 at 1:35 PM, Bill Bejeck wrote: > Hi John, > > Thanks for making the updates. > > I agree with the information you've included in the logs as described > above,

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Bill Bejeck
Hi John, Thanks for making the updates. I agree with the information you've included in the logs as described above, as log statements without enough context/information can be frustrating. -Bill On Tue, Apr 3, 2018 at 3:29 PM, John Roesler wrote: > Allrighty, how about this, then... > > I'll

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread John Roesler
Allrighty, how about this, then... I'll move the metric back to the StreamThread and maintain the existing tag (client-id=...(per-thread client-id)). It won't be present in the TopologyTestDriver's metrics. As a side note, I'm not sure that the location of the log messages has visibility into the

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Guozhang Wang
1. If we can indeed gather all the context information from the log4j entries I'd suggest we change to thread-level (I'm not sure if that is doable, so if John have already some WIP PR that can help us decide). 2. We can consider adding the API in TopologyTestDriver for general testing purposes; t

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Matthias J. Sax
I am fine adding `TopologyTestDriver#metrics()`. It should be helpful to test custom metrics users can implement. -Matthias On 4/3/18 11:35 AM, John Roesler wrote: > Oh, sorry, I missed the point. > > Yeah, we can totally do that. The reason to move it to the task level was > mainly to make it

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Matthias J. Sax
Thanks Guozhang, that was my intent. @John: yes, we should not nail down the exact log message. It's just to point out the trade-off. If we can get the required information in the logs, we might not need task level metrics. -Matthias On 4/3/18 11:26 AM, Guozhang Wang wrote: > I think Matthias'

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread John Roesler
Oh, sorry, I missed the point. Yeah, we can totally do that. The reason to move it to the task level was mainly to make it available for the metrics in TopologyTestDriver as well. But if we decide that's a non-goal, then there's no motivation to change it. And actually that reminds me that we do

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread Guozhang Wang
I think Matthias' comment is that, we can still record the metrics on the thread-level, while having the WARN log entry to include sufficient context information so that users can still easily narrow down the investigation scope. Guozhang On Tue, Apr 3, 2018 at 11:22 AM, John Roesler wrote: >

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-03 Thread John Roesler
I agree we should add as much information as is reasonable to the log. For example, see this WIP PR I started for this KIP: https://github.com/apache/kafka/pull/4812/files#diff-88d129f048bc842c7db5b2566a45fce8R80 and https://github.com/apache/kafka/pull/4812/files#diff-69e6789eb675ec978a1abd24fe

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-02 Thread Matthias J. Sax
Thanks for sharing your thoughts. As I mentioned originally, I am not sure about the right log level either. Your arguments are convincing -- thus, I am fine with keeping WARN level. The task vs thread level argument is an interesting one. However, I am wondering if we should add this information

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-02 Thread Guozhang Wang
Regarding logging: I'm inclined to keep logging at WARN level since skipped records are not expected in normal execution (for all reasons that we are aware of), and hence when error happens users should be alerted from metrics and looked into the log files, so to me if it is really spamming the log

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-02 Thread John Roesler
Hi Matthias, No worries! Thanks for the reply. 1) There isn't a connection. I tried using the TopologyTestDriver to write a quick test exercising the current behavior and discovered that the metrics weren't available. It seemed like they should be, so I tacked it on to this KIP. If you feel it's

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-04-02 Thread Matthias J. Sax
John, sorry for my late reply and thanks for updating the KIP. I like your approach about "metrics are for monitoring, logs are for debugging" -- however: 1) I don't see a connection between this and the task-level metrics that you propose to get the metrics in `TopologyTestDriver`. I don't thin

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-30 Thread John Roesler
Allrighty! The KIP is updated. Thanks again, all, for the feedback. -John On Fri, Mar 30, 2018 at 3:35 PM, John Roesler wrote: > Hey Guozhang and Bill, > > Ok, I'll update the KIP. At the risk of disturbing consensus, I'd like to > put it in the task instead of the thread so that it'll show up

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-30 Thread John Roesler
Hey Guozhang and Bill, Ok, I'll update the KIP. At the risk of disturbing consensus, I'd like to put it in the task instead of the thread so that it'll show up in the TopologyTestDriver metrics as well. I'm leaning toward keeping the scope where it is right now, but if others want to advocate for

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-30 Thread Bill Bejeck
Thanks for the KIP John, and sorry for the late comments. I'm on the fence with providing a single level metrics, but I think we'll have that discussion outside of this KIP. > * maintain one skipped-record metric (could be per-thread, per-task, or > per-processor-node) with no "reason" > * introd

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-30 Thread Guozhang Wang
The proposal sounds good to me. About "maintain only one level of metrics" maybe we can discuss about that separately from this KIP since that would be a larger scope of discussion. I agree that if we are going to maintain only one-level metrics it should be lowest level and we would let users to d

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-30 Thread John Roesler
Hey Guozhang, Thanks for the reply. Regarding JMX, I can dig it. I'll provide a list in the KIP. I was also thinking we'd better start a documentation page with the metrics listed. I'd have no problem logging a warning when we skip records. On the metric front, really I'm just pushing for us to m

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread Guozhang Wang
> One thing you mention is the notion of setting alerts on coarser metrics being easier than finer ones. All the metric alerting systems I have used make it equally easy to alert on metrics by-tag or over tags. So my experience doesn't say that this is a use case. Were you thinking of an alerting s

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread John Roesler
Hey Guozhang, Thanks for the review. 1. Matthias raised the same question about the "reason" tag values. I can list all possible values of the "reason" tag, but I'm thinking this level of detail may not be KIP-worthy, maybe the code and documentation review would be sufficient. If you all disagre

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread John Roesler
Hi Matthias, Thanks for the review. Regarding the "reason" tags, I listed "deserialization-error" and "negative-timestamp" to capture the two skips we already account for. More generally, I would like to establish a pattern in which we could add new values for the "reason" tags without needing a

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread Guozhang Wang
Hello John, Thanks for the KIP. Some comments: 1. Could you list all the possible values of the "reason" tag? In the JIRA ticket I left some potential reasons but I'm not clear if you're going to categorize each of them as a separate reason, or is there any additional ones you have in mind. Also

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread Matthias J. Sax
Thanks for the KIP John. Reading the material on the related Jiras, I am wondering what `reason` tags you want to introduce? Can you elaborate? The KIP should list those IMHO. About the fine grained metrics vs the roll-up: you say that > the coarse metric aggregates across two dimensions simulta

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-28 Thread Ted Yu
Looks good to me. On Wed, Mar 28, 2018 at 3:11 PM, John Roesler wrote: > Hello all, > > I am proposing KIP-274 to improve the metrics around skipped records in > Streams. > > Please find the details here: > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 274%3A+Kafka+Streams+Skipped+Rec

[DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-28 Thread John Roesler
Hello all, I am proposing KIP-274 to improve the metrics around skipped records in Streams. Please find the details here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-274%3A+Kafka+Streams+Skipped+Records+Metrics Please let me know what you think! Thanks, -John