Thanks for updating the KIP. LGTM.
-Matthias
On 4/4/18 12:46 PM, John Roesler wrote:
> Cool, if you're satisfied with the KIP now, maybe I can lobby for your vote
> ;)
>
> The vote thread is still at only one binding +1, I think.
>
> Thanks,
> -John
>
> On Tue, Apr 3, 2018 at 9:05 PM, Matthia
Cool, if you're satisfied with the KIP now, maybe I can lobby for your vote
;)
The vote thread is still at only one binding +1, I think.
Thanks,
-John
On Tue, Apr 3, 2018 at 9:05 PM, Matthias J. Sax
wrote:
> Sounds great!
>
> The cryptic topic names can be an issue -- however, people can
> `de
Sounds great!
The cryptic topic names can be an issue -- however, people can
`describe()` their topology to map the name to the corresponding
sub-topology/tasks to narrow the error down to the corresponding
operators. I think, this should be "sufficient for now" for debugging.
Renaming those top
Thanks John, your proposal looks fine to me.
I'll go ahead and look into the PR for more details myself.
Guozhang
On Tue, Apr 3, 2018 at 1:35 PM, Bill Bejeck wrote:
> Hi John,
>
> Thanks for making the updates.
>
> I agree with the information you've included in the logs as described
> above,
Hi John,
Thanks for making the updates.
I agree with the information you've included in the logs as described
above, as log statements without enough context/information can be
frustrating.
-Bill
On Tue, Apr 3, 2018 at 3:29 PM, John Roesler wrote:
> Allrighty, how about this, then...
>
> I'll
Allrighty, how about this, then...
I'll move the metric back to the StreamThread and maintain the existing tag
(client-id=...(per-thread client-id)). It won't be present in the
TopologyTestDriver's metrics.
As a side note, I'm not sure that the location of the log messages has
visibility into the
1. If we can indeed gather all the context information from the log4j
entries I'd suggest we change to thread-level (I'm not sure if that is
doable, so if John have already some WIP PR that can help us decide).
2. We can consider adding the API in TopologyTestDriver for general testing
purposes; t
I am fine adding `TopologyTestDriver#metrics()`. It should be helpful to
test custom metrics users can implement.
-Matthias
On 4/3/18 11:35 AM, John Roesler wrote:
> Oh, sorry, I missed the point.
>
> Yeah, we can totally do that. The reason to move it to the task level was
> mainly to make it
Thanks Guozhang, that was my intent.
@John: yes, we should not nail down the exact log message. It's just to
point out the trade-off. If we can get the required information in the
logs, we might not need task level metrics.
-Matthias
On 4/3/18 11:26 AM, Guozhang Wang wrote:
> I think Matthias'
Oh, sorry, I missed the point.
Yeah, we can totally do that. The reason to move it to the task level was
mainly to make it available for the metrics in TopologyTestDriver as well.
But if we decide that's a non-goal, then there's no motivation to change it.
And actually that reminds me that we do
I think Matthias' comment is that, we can still record the metrics on the
thread-level, while having the WARN log entry to include sufficient context
information so that users can still easily narrow down the investigation
scope.
Guozhang
On Tue, Apr 3, 2018 at 11:22 AM, John Roesler wrote:
>
I agree we should add as much information as is reasonable to the log. For
example, see this WIP PR I started for this KIP:
https://github.com/apache/kafka/pull/4812/files#diff-88d129f048bc842c7db5b2566a45fce8R80
and
https://github.com/apache/kafka/pull/4812/files#diff-69e6789eb675ec978a1abd24fe
Thanks for sharing your thoughts. As I mentioned originally, I am not
sure about the right log level either. Your arguments are convincing --
thus, I am fine with keeping WARN level.
The task vs thread level argument is an interesting one. However, I am
wondering if we should add this information
Regarding logging: I'm inclined to keep logging at WARN level since skipped
records are not expected in normal execution (for all reasons that we are
aware of), and hence when error happens users should be alerted from
metrics and looked into the log files, so to me if it is really spamming
the log
Hi Matthias,
No worries! Thanks for the reply.
1) There isn't a connection. I tried using the TopologyTestDriver to write
a quick test exercising the current behavior and discovered that the
metrics weren't available. It seemed like they should be, so I tacked it on
to this KIP. If you feel it's
John,
sorry for my late reply and thanks for updating the KIP.
I like your approach about "metrics are for monitoring, logs are for
debugging" -- however:
1) I don't see a connection between this and the task-level metrics that
you propose to get the metrics in `TopologyTestDriver`. I don't thin
Allrighty! The KIP is updated.
Thanks again, all, for the feedback.
-John
On Fri, Mar 30, 2018 at 3:35 PM, John Roesler wrote:
> Hey Guozhang and Bill,
>
> Ok, I'll update the KIP. At the risk of disturbing consensus, I'd like to
> put it in the task instead of the thread so that it'll show up
Hey Guozhang and Bill,
Ok, I'll update the KIP. At the risk of disturbing consensus, I'd like to
put it in the task instead of the thread so that it'll show up in the
TopologyTestDriver metrics as well.
I'm leaning toward keeping the scope where it is right now, but if others
want to advocate for
Thanks for the KIP John, and sorry for the late comments.
I'm on the fence with providing a single level metrics, but I think we'll
have that discussion outside of this KIP.
> * maintain one skipped-record metric (could be per-thread, per-task, or
> per-processor-node) with no "reason"
> * introd
The proposal sounds good to me. About "maintain only one level of metrics"
maybe we can discuss about that separately from this KIP since that would
be a larger scope of discussion. I agree that if we are going to maintain
only one-level metrics it should be lowest level and we would let users to
d
Hey Guozhang,
Thanks for the reply. Regarding JMX, I can dig it. I'll provide a list in
the KIP. I was also thinking we'd better start a documentation page with
the metrics listed.
I'd have no problem logging a warning when we skip records. On the metric
front, really I'm just pushing for us to m
> One thing you mention is the notion of setting alerts on coarser metrics
being easier than finer ones. All the metric alerting systems I have used
make it equally easy to alert on metrics by-tag or over tags. So my
experience doesn't say that this is a use case. Were you thinking of an
alerting s
Hey Guozhang,
Thanks for the review.
1.
Matthias raised the same question about the "reason" tag values. I can list
all possible values of the "reason" tag, but I'm thinking this level of
detail may not be KIP-worthy, maybe the code and documentation review would
be sufficient. If you all disagre
Hi Matthias,
Thanks for the review.
Regarding the "reason" tags, I listed "deserialization-error" and
"negative-timestamp" to capture the two skips we already account for. More
generally, I would like to establish a pattern in which we could add new
values for the "reason" tags without needing a
Hello John,
Thanks for the KIP. Some comments:
1. Could you list all the possible values of the "reason" tag? In the JIRA
ticket I left some potential reasons but I'm not clear if you're going to
categorize each of them as a separate reason, or is there any additional
ones you have in mind.
Also
Thanks for the KIP John.
Reading the material on the related Jiras, I am wondering what `reason`
tags you want to introduce? Can you elaborate? The KIP should list those
IMHO.
About the fine grained metrics vs the roll-up: you say that
> the coarse metric aggregates across two dimensions simulta
Looks good to me.
On Wed, Mar 28, 2018 at 3:11 PM, John Roesler wrote:
> Hello all,
>
> I am proposing KIP-274 to improve the metrics around skipped records in
> Streams.
>
> Please find the details here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 274%3A+Kafka+Streams+Skipped+Rec
Hello all,
I am proposing KIP-274 to improve the metrics around skipped records in
Streams.
Please find the details here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-274%3A+Kafka+Streams+Skipped+Records+Metrics
Please let me know what you think!
Thanks,
-John
28 matches
Mail list logo