Hi Greg,

This was my understanding as well--if we can't turn a record into a byte
array on the source side, it's difficult to know exactly what to write to a
DLQ topic.

One idea I've toyed with recently is writing the source partition and
offset of the failed record to the DLQ (assuming, hopefully safely, that
those can at least be serialized). This may not cover all bases, is
highly dependent on how user-friendly the offsets published by the
connector are, and does come with a risk of data loss (if the upstream
system is wiped before skipped records can be recovered), but it could
be useful in some scenarios.
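
A rough sketch of what that could look like, assuming a dedicated
producer and DLQ topic (the topic name, JSON layout, and use of Jackson
are all illustrative here, not a proposal for an actual API):

    import java.util.Map;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.connect.source.SourceRecord;

    public class SourceDlqSketch {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        // On a source-side failure, serialize only the record's source
        // partition and offset maps. These are plain Map<String, ?>
        // values, so they should usually survive JSON serialization even
        // when the record's key/value cannot be converted at all.
        static void reportFailure(Producer<byte[], byte[]> producer,
                                  SourceRecord failed) throws Exception {
            byte[] value = MAPPER.writeValueAsBytes(Map.of(
                    "sourcePartition", failed.sourcePartition(),
                    "sourceOffset", failed.sourceOffset()));
            producer.send(new ProducerRecord<>("source-dlq", value)).get();
        }
    }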

Thoughts?

Chris

On Tue, Mar 5, 2024 at 5:49 PM Greg Harris <greg.har...@aiven.io.invalid>
wrote:

> Hi Yeikel,
>
> Thanks for your question. It certainly isn't clear from the original
> KIP-298, the attached discussion, or the follow-up KIP-610 why the
> situation is asymmetric.
>
> The reason, as I understand it, is this: source connectors are
> responsible for importing data into Kafka. If an error occurs during
> this process, then writing useful information about the failure to a
> dead letter queue is at least as difficult as importing the record
> correctly.
>
> For some examples (a sketch of where these surface follows the list):
> * If an error occurs during poll(), the external data has not yet been
> transformed into a SourceRecord that the framework can transform or
> serialize.
> * If an error occurs during conversion/serialization, the external
> data cannot be reasonably serialized to be forwarded to the DLQ.
> * If a record cannot be written to Kafka, for example because it is
> too large, the same failure is likely to occur when writing to the
> DLQ as well.
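>
> A simplified sketch (not the actual WorkerSourceTask code; the
> converter and producer wiring are illustrative) of where those three
> points fall in the framework's source flow:
>
>     import java.util.List;
>     import org.apache.kafka.clients.producer.Producer;
>     import org.apache.kafka.clients.producer.ProducerRecord;
>     import org.apache.kafka.connect.source.SourceRecord;
>     import org.apache.kafka.connect.source.SourceTask;
>     import org.apache.kafka.connect.storage.Converter;
>
>     class SourceFlowSketch {
>         static void iteration(SourceTask task, Converter valueConverter,
>                               Producer<byte[], byte[]> producer)
>                 throws InterruptedException {
>             // (1) A failure inside poll() means no SourceRecord exists
>             //     yet for the framework to handle.
>             List<SourceRecord> records = task.poll();
>             for (SourceRecord record : records) {
>                 // (2) A failure here is exactly the serialization we
>                 //     would also need in order to write the data to a DLQ.
>                 byte[] value = valueConverter.fromConnectData(
>                         record.topic(), record.valueSchema(), record.value());
>                 producer.send(new ProducerRecord<>(record.topic(), value),
>                         (metadata, error) -> {
>                             // (3) e.g. RecordTooLargeException: the same
>                             //     bytes would likely be rejected by the
>                             //     DLQ topic as well.
>                         });
>             }
>         }
>     }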
>
> For the Sink side, we already know that the data was properly
> serializable and appeared as a ConsumerRecord<byte[],byte[]>. That can
> be forwarded to the DLQ as-is with a reasonable expectation of
> success, with the same data formatting as the source topic.
>
> If you have a vision for how this can be improved and are interested,
> please consider opening a KIP! The situation can certainly be made
> better than it is today.
>
> Thanks!
> Greg
>
> On Tue, Mar 5, 2024 at 5:35 AM Yeikel Santana <em...@yeikel.com> wrote:
> >
> > Hi all,
> >
> > Sink connectors support Dead Letter Queues[1], but Source connectors
> > don't seem to.
> >
> > What is the reason for that decision?
> >
> > In my data pipeline, I'd like to apply some transformations to the
> > messages before they are written to Kafka, but that leaves me
> > vulnerable to failures, as I need to either fail the connector or
> > rely on logging to track source-side failures.
> >
> > It seems that, for now, I'll need to apply the transformations in a
> > sink connector and possibly write the results back to Kafka for
> > downstream consumption, but that sounds unnecessary.
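> >
> > If I do go that route, the sink side at least has the existing DLQ
> > settings; something like the following (the topic name is just an
> > example):
> >
> >     errors.tolerance=all
> >     errors.deadletterqueue.topic.name=my-sink-dlq
> >     errors.deadletterqueue.topic.replication.factor=3
> >     errors.deadletterqueue.context.headers.enable=true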
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=80453065#content/view/80453065
>
