Hi Zihan,

Thanks for the changes and the clarifications! I agree that the complexity of maintaining a second topic and a second connector is a fair amount of work; to Andrew's question, it seems less about the cost of just running another connector, and more about managing that second connector (and topic) when a lot of the logic is identical, such as topic ACLs, credentials for the connector to access the external system, and other fine-tuning.
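To make that concrete, here is a rough sketch of what the two connector configs in that workaround might look like, written here as Java maps just to keep everything in one snippet (in practice these would be JSON payloads to the Connect REST API or a properties file). The S3 connector class, bucket, topic, and connector names are made up for illustration; the errors.* properties and ByteArrayConverter are the standard ones. Nearly everything in the second config is a copy of the first:

```java
import java.util.Map;

public class DlqWorkaroundConfigs {

    // Primary sink: reads "orders", converts records, and routes anything that
    // fails conversion to a dead letter queue topic.
    // (Connector class, topic, and bucket names are illustrative only.)
    static final Map<String, String> PRIMARY_SINK = Map.of(
            "name", "orders-s3-sink",
            "connector.class", "io.confluent.connect.s3.S3SinkConnector",
            "topics", "orders",
            "value.converter", "org.apache.kafka.connect.json.JsonConverter",
            "errors.tolerance", "all",
            "errors.deadletterqueue.topic.name", "orders-dlq",
            "s3.bucket.name", "my-bucket"); // plus credentials, flush settings, etc.

    // Second sink: reads the DLQ and dumps the raw bytes to the same external
    // system. Only the name, topic, and converters differ; the ACLs, credentials,
    // and fine-tuning all have to be managed twice.
    static final Map<String, String> DLQ_SINK = Map.of(
            "name", "orders-dlq-s3-sink",
            "connector.class", "io.confluent.connect.s3.S3SinkConnector",
            "topics", "orders-dlq",
            "key.converter", "org.apache.kafka.connect.converters.ByteArrayConverter",
            "value.converter", "org.apache.kafka.connect.converters.ByteArrayConverter",
            "s3.bucket.name", "my-bucket"); // same credentials and tuning again
}
```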
However, I'm still curious about the general use case here. For example, if a converter fails to deserialize a record, it seems like the right thing to do would be to examine the record, try to understand why it's failing, and then find a converter that can handle it. If the raw byte array for the Kafka message gets written to the external system instead, what's the benefit to the user? Yes, they won't have to configure another connector and manage another topic, but they're still going to want to examine that data at some point; why would it be easier to deal with malformed records from an external system than it would from where they originally broke, in Kafka? If we're going to add a new feature like this to the framework, I just want to make sure that there's a general use case for this that isn't tied to one specific type of connector, external system, usage pattern, etc.

Oh, and one other question that came to mind--what would the expected behavior be if a converter was unable to deserialize a record's key, but was able to deserialize its value?

Cheers,

Chris

On Sat, Apr 25, 2020 at 12:27 PM Andrew Schofield <andrew_schofi...@live.com> wrote:

> Hi Zihan,
> Thanks for the KIP. I have a question about the proposal.
>
> Why do you think putting a broken record somewhere other than a dead-letter topic is better? For a reliable system, you really want zero broken records, or perhaps close to zero. Broken records represent exceptions that need to be alerted and dealt with, either by fixing an incorrect application, or by improving the quality of the incoming data. I wouldn't imagine a second set of connectors reading dead-letter topics storing the broken events elsewhere.
>
> If you did want to store them in S3, HDFS or wherever, why couldn't you run another connector off the dead-letter topic, with the ByteArrayConverter, that just bundles up the broken records as raw bytes? This seems to me very close to what this KIP is trying to achieve, only without needing any interface or behaviour changes in the connectors. Yes, you need to run more connectors, but in a distributed Connect cluster, that's easy to achieve.
>
> Thanks,
> Andrew Schofield
> IBM
>
> On 24/04/2020, 22:00, "Zihan Li" <lizi...@umich.edu> wrote:
>
> Hi Chris,
>
> Thanks a lot for your comments.
>
> 1. The complexity comes from maintaining an additional topic and a connector, rather than configuring them. Users need to spend extra time and money to maintain the additional connectors. I can imagine a case where a user has 3 topics consumed by S3, HDFS, and JDBC respectively. The user has to maintain 3 more connectors to consume three DLQs, in order to put broken records in the place they should go. This new option will give users a choice to maintain only half of their connectors, while still having broken records stored in each destination system.
>
> 2. This is a great question. I updated my KIP to reflect the most recent plan. We can add a new method to SinkTask called "putBrokenRecord", so that sink connectors are able to differentiate between well-formed records and broken records. The default implementation of this method should throw an error to indicate that the connector does not support broken record handling yet.
>
> 3. I think the Schema should be Optional Byte Array, in order to handle all possibilities. But I'm open to suggestions on that.
>
> 4. Yes, this rejected alternative plan makes sense to me. I'll put that into the KIP. Compared with this alternative, the point of this proposal is to save the effort of maintaining twice as many connectors as necessary.
>
> Thanks again. Looking forward to the discussion!
>
> Sorry if you see this email twice, the previous one didn't show up on this discussion thread.
>
> Best,
> Zihan
>
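To make sure I'm following the putBrokenRecord idea from point 2 above, here's a rough sketch of the shape I have in mind. The name comes from your message, but the signature and default behavior are my guess at what the KIP intends rather than a final API, and it's written as a subclass only so that it compiles against today's SinkTask:

```java
import java.util.Collection;

import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Sketch of the proposed addition; per the KIP discussion, this method would
// presumably live on SinkTask itself rather than on a separate subclass.
public abstract class BrokenRecordAwareSinkTask extends SinkTask {

    /**
     * Would be called by the framework with records whose key and/or value could
     * not be deserialized; their contents would be the raw bytes of the Kafka
     * message they came from.
     */
    public void putBrokenRecord(Collection<SinkRecord> brokenRecords) {
        // Default behavior, per point 2 above: signal that this connector does
        // not support broken record handling.
        throw new ConnectException("This connector does not support broken record handling");
    }
}
```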
> On 2020/04/13 22:35:56, Christopher Egerton <chr...@confluent.io> wrote:
> > Hi Zihan,
> >
> > Thanks for the KIP! I have some questions that I'm hoping we can address to help better understand the motivation for this proposal.
> >
> > 1. In the "Motivation" section it's written that "If users want to store their broken records, they have to config a broken record queue, which is too much work for them in some cases." Could you elaborate on what makes this a lot of work? Ideally, users should be able to configure the dead letter queue by specifying a value for the "errors.deadletterqueue.topic.name" property in their sink connector config; this doesn't seem like a lot of work on the surface.
> >
> > 2. If the "errors.tolerance" property is set to "continue", would sink connectors be able to differentiate between well-formed records whose successfully-deserialized contents are byte arrays and malformed records whose contents are the still-serialized byte arrays of the Kafka message from which they came?
> >
> > 3. I think it's somewhat implied by the KIP, but it'd be nice to see what the schema for a malformed record would be. Null? Byte array? Optional byte array?
> >
> > 4. This is somewhat covered by the first question, but it seems worth pointing out that this exact functionality can already be achieved by using features already provided by the framework. Configure your connector to send malformed records to a dead letter queue topic, and configure a separate connector to consume from that dead letter queue topic, use the ByteArrayConverter to deserialize records, and send those records to the destination sink. It'd be nice if this were called out in the "Rejected Alternatives" section with a reason why the changes proposed in the KIP are preferable, especially since it may still work as a viable workaround for users who are working on older versions of the Connect framework.
> >
> > Looking forward to the discussion!
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Mar 24, 2020 at 11:50 AM Zihan Li <lizi...@umich.edu> wrote:
> >
> > > Hi,
> > >
> > > I just want to re-up this discussion thread about KIP-582 Add a "continue" option for Kafka Connect error handling.
> > >
> > > Wiki page: https://cwiki.apache.org/confluence/x/XRvcC
> > >
> > > JIRA: https://issues.apache.org/jira/browse/KAFKA-9740
> > >
> > > Please share your thoughts about adding this new error handling option to Kafka Connect.
> > >
> > > Best,
> > > Zihan
> > >
> > > > On Mar 18, 2020, at 12:55 PM, Zihan Li <lizi...@umich.edu> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I'd like to use this thread to discuss KIP-582 Add a "continue" option for Kafka Connect error handling, please see detail at:
> > > > https://cwiki.apache.org/confluence/x/XRvcC
> > > >
> > > > Best,
> > > > Zihan Li
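One more note to make my questions 2 and 3 above a bit more concrete: if a malformed record were handed to the sink as an optional byte array, it might look something like the sketch below. This is purely illustrative, not something the KIP specifies, and it points at the ambiguity I was asking about: with only the schema to go on, a task couldn't tell this apart from a legitimately deserialized byte-array record, which I assume is why a separate putBrokenRecord path is being proposed.

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.sink.SinkRecord;

public class BrokenRecordSketch {

    // How the framework might wrap a message it failed to deserialize, using the
    // "Optional Byte Array" schema suggested earlier in the thread (illustrative only).
    static SinkRecord asBrokenRecord(String topic, int partition, long offset,
                                     byte[] rawKey, byte[] rawValue) {
        return new SinkRecord(
                topic, partition,
                Schema.OPTIONAL_BYTES_SCHEMA, rawKey,    // key left as raw bytes
                Schema.OPTIONAL_BYTES_SCHEMA, rawValue,  // value left as raw bytes
                offset);
    }
}
```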