Re: [DISCUSS] KIP-1097: error record reporter

Anton Liauchuk Thu, 13 Feb 2025 08:33:40 -0800

Bumping the thread

On Wed, Jan 29, 2025 at 5:50 PM Anton Liauchuk <anton93...@gmail.com> wrote:


> Bumping the thread
>
>
> On Mon, Jan 13, 2025 at 2:18 PM Anton Liauchuk <anton93...@gmail.com>
> wrote:
>
>> Trying to bump once more
>>
>> On Wed, Dec 11, 2024 at 4:02 PM Anton Liauchuk <anton93...@gmail.com>
>> wrote:
>>
>>> Bumping the thread
>>>
>>> On Sun, Dec 1, 2024 at 8:33 AM Anton Liauchuk <anton93...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>> Thank you for your feedback.
>>>> I have numbered the questions to simplify communication.
>>>>
>>>> 1. What sort of implementation do you have in mind for this interface?
>>>>> What use-case does this interface enable that is not possible with log
>>>>> scraping, or implementing a source-connector DLQ to Kafka?
>>>>
>>>> I have a use case where source connectors need to send metrics and logs
>>>> to a custom Kafka topic. Although it's possible to use a log reporter to
>>>> extract the required information from logs, there are several limitations
>>>> to consider:
>>>> - It depends on the log format used in `*kafka-runtime*`.
>>>> - A pluggable interface provides greater flexibility for defining
>>>> custom behavior.
>>>> - The API will have better support in future releases of `
>>>> *kafka-connect*`.
>>>>
>>>> 2. Could you add the ErrorContext class to your public API description?
>>>>> I don't think that is an existing interface. Also please specify the
>>>>> package/fully qualified names for these classes.
>>>>
>>>> added, thank you!
>>>>
>>>> 3. How do you expect this will interact with the existing log and DLQ
>>>>> reporters? Will users specifying a custom error reporter be able to turn
>>>>> off the other reporters?
>>>>
>>>> In the current implementation, custom reporters are an independent
>>>> addition to the runtime reporters.
>>>>
>>>> 4. Are error reporters expected to be source/sink agnostic (like the
>>>>> Log reporter) or are they permitted to function for just one type (like 
>>>>> the
>>>>> DLQ reporter?)
>>>>
>>>> Error reporters are expected to be source/sink agnostic.
>>>>
>>>> 5. Should reporters be asynchronous/fire-and-forget, or should they
>>>>> have a mechanism for propagating errors that kill the task?
>>>>
>>>> I believe that adding a mechanism for propagating errors to the error
>>>> handler interface is preferable.
>>>>
>>>> 6. Would it make sense for error reporting to also involve error
>>>>> handling: i.e. let the plugin decide how to handle errors (drop record,
>>>>> trigger retries, fail the task, etc)?
>>>>
>>>> I believe this approach makes sense. I have added new changes to a
>>>> separate branch and created a PR
>>>> https://github.com/anton-liauchuk/kafka/pull/1/files. I haven’t
>>>> extended the KIP at this stage, as I would like to discuss some items
>>>> first. In this PR, I haven’t prepared all the necessary changes to support
>>>> a new mode yet; it's just POC.
>>>>
>>>> It seems we don’t need to add this functionality to the reporter, as it
>>>> would be better for the reporter interface to focus solely on reporting. I
>>>> have created a new interface called `ErrorHandler`, which provides a way to
>>>> handle error responses. I designed this interface to be similar to
>>>> `org.apache.kafka.streams.errors.ProcessingExceptionHandler` from the
>>>> `kafka-streams` project.
>>>>
>>>> I'm considering extending the tolerance configuration to enable this
>>>> handler with the `*errors.tolerance=custom*` setting. When custom
>>>> tolerance is selected, the client can specify the class name for the error
>>>> handler. Handling the error might result in one of three options:
>>>> - *DROP*: Skips the record.
>>>> - *FAIL*: Fails the task.
>>>> - *ACK*: Skips the message and acknowledges it, applicable for source
>>>> connectors.
>>>> The following stages are where error handling might be used (these
>>>> stages are part of the `*TOLERABLE_EXCEPTIONS*` in `
>>>> *org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator#TOLERABLE_EXCEPTIONS*
>>>> `):
>>>> - *TRANSFORMATION*
>>>> - *KEY_CONVERTER*
>>>> - *VALUE_CONVERTER*
>>>> - *HEADER_CONVERTER*
>>>>
>>>> I would like some advice on the following items:
>>>> 6.1. Do we still need to define an error reporter interface if we have
>>>> the option to create an error handler? I believe that all necessary
>>>> reporting can be managed within the error handler, making the reporter
>>>> interface seem unnecessary.
>>>> 6.2. Does it make sense to expand the list of stages where the error
>>>> handler can be used? The current list is based on the existing error
>>>> handling logic. For instance, it could be beneficial to handle errors from
>>>> the `*TASK_POLL*` stage. The current implementation does not support
>>>> error handling for errors that are unassigned to any records, but we could
>>>> consider how to extend it if needed. Additionally, we might review the `
>>>> *KAFKA_PRODUCE*` and `*TASK_PUT*` stages.
>>>> 6.3. If we begin improvements to error handling, should we also explore
>>>> the possibility of supporting error handling for connector or task 
>>>> failures?
>>>>
>>>>
>>>> On Fri, Oct 25, 2024 at 2:30 AM Greg Harris
>>>> <greg.har...@aiven.io.invalid> wrote:
>>>>
>>>>> Hi Anton,
>>>>>
>>>>> Thanks for the KIP! I think that looking at internal APIs as
>>>>> inspiration
>>>>> for new external APIs is a good idea, and I'm glad that you found an
>>>>> interface close to the problem you're trying to solve.
>>>>>
>>>>> What sort of implementation do you have in mind for this interface?
>>>>> What
>>>>> use-case does this interface enable that is not possible with log
>>>>> scraping,
>>>>> or implementing a source-connector DLQ to Kafka?
>>>>> Before we make something pluggable, we should consider if the existing
>>>>> framework implementations could be improved directly.
>>>>>
>>>>> Could you add the ErrorContext class to your public API description? I
>>>>> don't think that is an existing interface. Also please specify the
>>>>> package/fully qualified names for these classes.
>>>>>
>>>>> How do you expect this will interact with the existing log and DLQ
>>>>> reporters?
>>>>> Will users specifying a custom error reporter be able to turn off the
>>>>> other
>>>>> reporters?
>>>>>
>>>>> Are error reporters expected to be source/sink agnostic (like the Log
>>>>> reporter) or are they permitted to function for just one type (like
>>>>> the DLQ
>>>>> reporter?)
>>>>>
>>>>> The runtime interface returns a Future<RecordMetadata>, which is an
>>>>> abstraction specific for the DLQ reporter and ignored for the Log
>>>>> reporter,
>>>>> and I see that you've omitted it from the new API.
>>>>> Should reporters be asynchronous/fire-and-forget, or should they have a
>>>>> mechanism for propagating errors that kill the task?
>>>>>
>>>>> Would it make sense for error reporting to also involve error handling:
>>>>> i.e. let the plugin decide how to handle errors (drop record, trigger
>>>>> retries, fail the task, etc)?
>>>>> In Connect there's been a longstanding pattern where every connector
>>>>> reimplements error handling individually, often hardcoding response
>>>>> behaviors to various errors, because the existing errors.tolerance
>>>>> configuration is too limiting.
>>>>> Maybe making this pluggable leads us towards a solution where there
>>>>> could
>>>>> be a pluggable "error handler" that can implement reporting for many
>>>>> different errors, but also allow for simple reconfiguration of error
>>>>> handling behavior.
>>>>>
>>>>> Thanks,
>>>>> Greg
>>>>>
>>>>> On Thu, Oct 24, 2024 at 3:57 PM Anton Liauchuk <anton93...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Bumping the thread. Please review this KIP. Thanks!
>>>>> >
>>>>> > On Sun, Oct 13, 2024 at 11:44 PM Anton Liauchuk <
>>>>> anton93...@gmail.com>
>>>>> > wrote:
>>>>> > >
>>>>> > > Hi all,
>>>>> > >
>>>>> > > I have opened
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1097+error+record+reporter
>>>>> > >
>>>>> > > POC: https://github.com/apache/kafka/pull/17493
>>>>> > >
>>>>> > > Please review KIP and PR, feedbacks and suggestions are welcome.
>>>>> >
>>>>>
>>>>

Re: [DISCUSS] KIP-1097: error record reporter

Reply via email to