Hi Chris,

Good KIP. I think it will be very helpful for alerting on, and automating the 
resolution of, common Connect problems.

I have a couple of questions / suggestions:

1. What are you planning on documenting as guidance for using this new 
endpoint? My guess is that if Connect doesn’t return a status of 200 
after some period, I would either page someone or restart the process. But I’m 
missing the nuance of distinguishing between readiness and liveness. Is this 
for maintaining overall availability when rolling out updates to several 
Connect processes?

2. Would you consider providing a way to discover details about exactly which 
condition (or conditions) is failing? Perhaps by providing a richer JSON 
response? Something like this would help us adopt the health check, as we could 
start by paging someone for all failures, then over time (as we gain 
confidence) transition more of the conditions to being handled by 
automation.
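
To make that adoption path concrete, here is a minimal sketch of the kind of 
policy described in points 1 and 2. It is an illustration under stated 
assumptions, not something the KIP defines: the function name, the grace 
period, the action names, and the idea of an "automated" set are all 
hypothetical, and the status strings in the usage note are placeholders. The 
only detail taken from the thread is that the JSON response carries a 
"status" field.

```python
import json


def choose_action(http_status, body, unhealthy_seconds,
                  grace_seconds=60.0, automated=frozenset()):
    """Decide what to do with one health-check result.

    Returns "ok", "wait", "restart" or "page". All names and thresholds
    here are hypothetical and are not part of KIP-1017.
    """
    if http_status == 200:
        return "ok"
    # Tolerate short-lived failures before taking any action.
    if unhealthy_seconds < grace_seconds:
        return "wait"
    try:
        status = json.loads(body).get("status")
    except (TypeError, ValueError, AttributeError):
        # Missing body, invalid JSON, or JSON that is not an object.
        status = None
    # Only conditions we have explicitly handed to automation trigger a
    # restart; anything unknown or unparseable pages a human.
    if isinstance(status, str) and status in automated:
        return "restart"
    return "page"
```

With an empty `automated` set every persistent failure pages someone; adding 
a status value to the set (for example `automated=frozenset({"unhealthy"})`, 
where the value is a placeholder) moves that condition over to automation.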

Regards,
- Adrian


From: Chris Egerton <chr...@aiven.io.INVALID>
Date: Monday, 10 June 2024 at 15:26
To: dev@kafka.apache.org <dev@kafka.apache.org>
Subject: [EXTERNAL] Re: [DISCUSS] KIP-1017: A health check endpoint for Kafka 
Connect
Hi all,

Thanks for the positive feedback!

I've made one small addition to the KIP since there's been a change to our
REST timeout error messages that's worth incorporating here. Quoting the
added section directly:

> Note that the HTTP status codes and "status" fields in the JSON response
> will match the exact examples above. However, the "message" field may be
> augmented to include, among other things, more information about the
> operation(s) the worker could be blocked on (such as was added in REST
> timeout error messages in KAFKA-15563 [1]).

Assuming this still looks okay to everyone, I'll kick off a vote thread
sometime this week or the next.

[1] - https://issues.apache.org/jira/browse/KAFKA-15563

Cheers,

Chris

On Fri, Jun 7, 2024 at 12:01 PM Andrew Schofield <andrew_schofi...@live.com>
wrote:

> Hi Chris,
> This KIP looks good to me. I particularly like the explanation of how the
> result will specifically check the worker health in ways that have
> previously caused trouble.
>
> Thanks,
> Andrew
>
> > On 7 Jun 2024, at 16:18, Mickael Maison <mickael.mai...@gmail.com>
> > wrote:
> >
> > Hi Chris,
> >
> > Happy Friday! The KIP looks good to me. +1
> >
> > Thanks,
> > Mickael
> >
> > On Fri, Jan 26, 2024 at 8:41 PM Chris Egerton <chr...@aiven.io.invalid>
> > wrote:
> >>
> >> Hi all,
> >>
> >> Happy Friday! I'd like to kick off discussion for KIP-1017, which (as the
> >> title suggests) proposes adding a health check endpoint for Kafka Connect:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1017%3A+Health+check+endpoint+for+Kafka+Connect
> >>
> >> This is one of the longest-standing issues with Kafka Connect and I'm
> >> hoping we can finally put it in the ground soon. Looking forward to hearing
> >> people's thoughts!
> >>
> >> Cheers,
> >>
> >> Chris
>
>

