Sorry for the super late reply, but since the vote thread showed up, I've read the KIP and have a concern. The concern was raised by Matthias Sax earlier, but I didn't see it addressed.
Basically the current iteration of the KIP unifies deserialization errors with corruption. As was pointed out, these are not the same thing. Corruption means that the message itself (including envelope, not just the payload) is broken. De-serialization errors mean that either key or value serializers have a problem. It can even be a temporary problem of connecting to schema registry, I believe. Corrupt messages can only be skipped. De-serialization errors can (and arguably should) be retried with a different serializer. Something like Connect will need to skip corrupt messages, but messages with SerDe issues should probably go into a dead-letter queue. Anyway, IMO we need exceptions that will let us tell the difference. Gwen On Fri, Oct 11, 2019 at 10:05 AM Stanislav Kozlovski <stanis...@confluent.io> wrote: > > Thanks Jason. I've edited the KIP with the latest proposal. > > On Thu, Oct 10, 2019 at 2:00 AM Jason Gustafson <ja...@confluent.io> wrote: > > > Hi Stanislav, > > > > Sorry for the late comment. I'm happy with the current proposal. Just one > > small request is to include an example which shows how a user could use > > this to skip over a record. > > > > I'd suggest pushing this to a vote to see if anyone else has feedback. > > > > Thanks, > > Jason > > > > On Sat, Jul 13, 2019 at 2:27 PM Stanislav Kozlovski < > > stanis...@confluent.io> > > wrote: > > > > > Hello, > > > > > > Could we restart the discussion here again? > > > > > > My last message was sent on the 3rd of June but I haven't received > > replies > > > since then. > > > > > > I'd like to get this KIP to a finished state, regardless of whether that > > is > > > merged-in or discarded. It has been almost one year since the publication > > > of the KIP. > > > > > > Thanks, > > > Stanislav > > > > > > On Mon, Jun 3, 2019 at 11:19 AM Stanislav Kozlovski < > > > stanis...@confluent.io> > > > wrote: > > > > > > > Do people agree with the approach I outlined in my last reply? > > > > > > > > On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski < > > > stanis...@confluent.io> > > > > wrote: > > > > > > > >> Hey there Kamal, > > > >> > > > >> I'm sincerely sorry for missing your earlier message. As I open this > > > >> thread up, I see I have an unsent draft message about resuming > > > discussion > > > >> from some time ago. > > > >> > > > >> In retrospect, I think I may have been too pedantic with the exception > > > >> naming and hierarchy. > > > >> I now believe a single exception type of > > > `RecordDeserializationException` > > > >> is enough. Let's go with that. > > > >> > > > >> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash < > > > >> kamal.chandraprak...@gmail.com> wrote: > > > >> > > > >>> Matthias, > > > >>> > > > >>> We already have CorruptRecordException which doesn't extend the > > > >>> SerializationException. So, we need an alternate > > > >>> name suggestion for the corrupted record error if we decide to extend > > > the > > > >>> FaultyRecordException class. > > > >>> > > > >>> Stanislav, > > > >>> > > > >>> Our users are also facing this error. Could we bump up this > > discussion? > > > >>> > > > >>> I think we can have a single exception type > > > >>> FaultyRecordException/RecordDeserialization exception to capture both > > > >>> the errors. We can add an additional enum field to differentiate the > > > >>> errors > > > >>> if required. > > > >>> > > > >>> Thanks, > > > >>> Kamal Chandraprakash > > > >>> > > > >>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash < > > > >>> kamal.chandraprak...@gmail.com> wrote: > > > >>> > > > >>> > Stanislav, > > > >>> > > > > >>> > Any updates on this KIP? We have internal users who want to skip > > the > > > >>> > corrupted message while consuming the records. > > > >>> > > > > >>> > > > > >>> > On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax < > > > >>> matth...@confluent.io> > > > >>> > wrote: > > > >>> > > > > >>> >> I am not 100% familiar with the details of the consumer code, > > > however > > > >>> I > > > >>> >> tend to disagree with: > > > >>> >> > > > >>> >> > There's no difference between the two cases -- if (and only if) > > > the > > > >>> >> message is corrupt, it can't be deserialized. If (and only if) it > > > >>> can't be > > > >>> >> deserialized, it is corrupt. > > > >>> >> > > > >>> >> Assume that a user configures a JSON deserializer but a faulty > > > >>> upstream > > > >>> >> producer writes an Avro message. For this case, the message is not > > > >>> >> corrupted, but still can't be deserialized. And I can imaging that > > > >>> users > > > >>> >> want to handle both cases differently. > > > >>> >> > > > >>> >> Thus, I think it makes sense to have two different exceptions > > > >>> >> `RecordDeserializationException` and `CorruptedRecordException` > > that > > > >>> can > > > >>> >> both extend `FaultyRecordException` (don't like this name too much > > > >>> >> honestly, but don't have a better idea for it anyway). > > > >>> >> > > > >>> >> Side remark. If we introduce class > > `RecordDeserializationException` > > > >>> and > > > >>> >> `CorruptedRecordException`, we can also add an interface that both > > > >>> >> implement to return partition/offset information and let both > > extend > > > >>> >> `SerializationException` directly without an intermediate class in > > > the > > > >>> >> exception hierarchy. > > > >>> >> > > > >>> >> > > > >>> >> -Matthias > > > >>> >> > > > >>> >> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote: > > > >>> >> >> If you are inheriting from SerializationException, your derived > > > >>> class > > > >>> >> > should also be a kind of serialization exception. Not something > > > >>> more > > > >>> >> > general. > > > >>> >> > Yeah, the reason for inheriting it would be for > > > >>> backwards-compatibility. > > > >>> >> > > > > >>> >> >> Hmm. Can you think of any new scenarios that would make Kafka > > > >>> force > > > >>> >> the > > > >>> >> > user need to skip a specific record? Perhaps one scenario is if > > > >>> records > > > >>> >> > are lost but we don't know how many. > > > >>> >> > Not on the spot, but I do wonder how likely a new scenario is to > > > >>> >> surface in > > > >>> >> > the future and how we'd handle the exceptions' class hierarchy > > > then. > > > >>> >> > > > > >>> >> >> Which offset were we planning to use in the > > > >>> >> > exception? > > > >>> >> > The offset of the record which caused the exception. In the case > > > of > > > >>> >> > batches, we use the last offset of the batch. In both cases, > > users > > > >>> >> should > > > >>> >> > have to seek +1 from the given offset. You can review the PR to > > > >>> ensure > > > >>> >> its > > > >>> >> > accurate > > > >>> >> > > > > >>> >> > > > > >>> >> > If both of you prefer `RecordDeserializationException`, we can > > go > > > >>> with > > > >>> >> > that. Please do confirm that is okay > > > >>> >> > > > > >>> >> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson < > > > ja...@confluent.io > > > >>> > > > > >>> >> wrote: > > > >>> >> > > > > >>> >> >> One difference between the two cases is that we can't generally > > > >>> trust > > > >>> >> the > > > >>> >> >> offset of a corrupt message. Which offset were we planning to > > use > > > >>> in > > > >>> >> the > > > >>> >> >> exception? Maybe it should be either the fetch offset or one > > plus > > > >>> the > > > >>> >> last > > > >>> >> >> consumed offset? I think I'm with Colin in preferring > > > >>> >> >> RecordDeserializationException for both cases if possible. For > > > one > > > >>> >> thing, > > > >>> >> >> that makes the behavior consistent whether or not `check.crs` > > is > > > >>> >> enabled. > > > >>> >> >> > > > >>> >> >> -Jason > > > >>> >> >> > > > >>> >> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe < > > > cmcc...@apache.org> > > > >>> >> wrote: > > > >>> >> >> > > > >>> >> >>> Hi Stanislav, > > > >>> >> >>> > > > >>> >> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote: > > > >>> >> >>>> Hey Colin, > > > >>> >> >>>> > > > >>> >> >>>> It may be a bit vague but keep in mind we also raise the > > > >>> exception in > > > >>> >> >> the > > > >>> >> >>>> case where the record is corrupted. > > > >>> >> >>>> We discussed with Jason offline that message corruption most > > > >>> often > > > >>> >> >>> prevents > > > >>> >> >>>> deserialization itself and that may be enough of an argument > > to > > > >>> raise > > > >>> >> >>>> `RecordDeserializationException` in the case of a corrupt > > > >>> record. I > > > >>> >> >>>> personally find that misleading. > > > >>> >> >>> > > > >>> >> >>> Hmm. I think that by definition, corrupt records are records > > > that > > > >>> >> can't > > > >>> >> >>> be deserialized. There's no difference between the two cases > > -- > > > >>> if > > > >>> >> (and > > > >>> >> >>> only if) the message is corrupt, it can't be deserialized. If > > > >>> (and > > > >>> >> only > > > >>> >> >>> if) it can't be deserialized, it is corrupt. > > > >>> >> >>> > > > >>> >> >>>> > > > >>> >> >>>> In the end, I think it might be worth it to have a bit of a > > > >>> >> >>>> wider-encompassing `FaultyRecordException` (or even > > > >>> >> >>>> `UnconsumableRecordException`) which would allow users to > > skip > > > >>> over > > > >>> >> the > > > >>> >> >>>> record. > > > >>> >> >>> > > > >>> >> >>> If you are inheriting from SerializationException, your > > derived > > > >>> class > > > >>> >> >>> should also be a kind of serialization exception. Not > > something > > > >>> more > > > >>> >> >>> general. > > > >>> >> >>> > > > >>> >> >>>> We could then potentially have more specific exceptions (e.g > > > >>> >> >>>> deserialization) inherit that but I'm not sure if we should. > > > >>> >> >>>> A case for a more general exception which provides access to > > > the > > > >>> >> >>>> partition/offset is future backwards-compatibility. If there > > is > > > >>> ever > > > >>> >> a > > > >>> >> >>> new > > > >>> >> >>>> scenario that would make the user need to skip a specific > > > record > > > >>> - > > > >>> >> >> there > > > >>> >> >>>> would already be such an exception and this will provide some > > > >>> >> >>>> backward-compatibility with older clients. > > > >>> >> >>> > > > >>> >> >>> Hmm. Can you think of any new scenarios that would make Kafka > > > >>> force > > > >>> >> the > > > >>> >> >>> user need to skip a specific record? Perhaps one scenario is > > if > > > >>> >> records > > > >>> >> >>> are lost but we don't know how many. > > > >>> >> >>> > > > >>> >> >>> If we really want to have something more general, perhaps we > > > need > > > >>> >> >>> something like LostDataException. But in that case, we can't > > > >>> inherit > > > >>> >> >> from > > > >>> >> >>> SerializationException, since lost data is more general than > > > >>> >> >> serialization > > > >>> >> >>> issues. > > > >>> >> >>> > > > >>> >> >>> best, > > > >>> >> >>> Colin > > > >>> >> >>> > > > >>> >> >>> > > > >>> >> >>>> > > > >>> >> >>>> Best, > > > >>> >> >>>> Stanislav > > > >>> >> >>>> > > > >>> >> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe < > > > cmcc...@apache.org > > > >>> > > > > >>> >> >> wrote: > > > >>> >> >>>> > > > >>> >> >>>>> Hi Stanislav, > > > >>> >> >>>>> > > > >>> >> >>>>> Thanks for the KIP. > > > >>> >> >>>>> > > > >>> >> >>>>> As as user, "FaultyRecordException" seems a little vague. > > > >>> What's > > > >>> >> >>> faulty > > > >>> >> >>>>> about it? Is it too big? Does it not match the schema of > > the > > > >>> other > > > >>> >> >>>>> records? Was it not found? > > > >>> >> >>>>> > > > >>> >> >>>>> Of course, we know that the specific problem is with > > > >>> [de]serializing > > > >>> >> >>> the > > > >>> >> >>>>> record, not with any of those those things. So we should > > > >>> choose a > > > >>> >> >> name > > > >>> >> >>>>> that reflects that serialization is the problem. How about > > > >>> >> >>>>> RecordSerializationException? > > > >>> >> >>>>> > > > >>> >> >>>>> best, > > > >>> >> >>>>> Colin > > > >>> >> >>>>> > > > >>> >> >>>>> > > > >>> >> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote: > > > >>> >> >>>>>> Hi Jason and Ted, > > > >>> >> >>>>>> > > > >>> >> >>>>>> @Jason > > > >>> >> >>>>>> I did not oppose the idea as I'm not too familiar with Java > > > >>> >> >>> conventions. > > > >>> >> >>>>> I > > > >>> >> >>>>>> agree it is a non-ideal way for the user to catch the > > > >>> exception so > > > >>> >> >> I > > > >>> >> >>>>>> changed it back. > > > >>> >> >>>>>> > > > >>> >> >>>>>> I've updated the KIP and PR with the latest changes. Now, > > > >>> there is > > > >>> >> >>> only > > > >>> >> >>>>> one > > > >>> >> >>>>>> public exception `FaultyRecordException` which is thrown to > > > the > > > >>> >> >> user > > > >>> >> >>> in > > > >>> >> >>>>>> both scenarios (corrupt record and deserialization error). > > > >>> >> >>>>>> Please take a look again. > > > >>> >> >>>>>> > > > >>> >> >>>>>> Best, > > > >>> >> >>>>>> Stanislav > > > >>> >> >>>>>> > > > >>> >> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson < > > > >>> ja...@confluent.io > > > >>> >> >>> > > > >>> >> >>>>> wrote: > > > >>> >> >>>>>> > > > >>> >> >>>>>>> Hey Stanislav, > > > >>> >> >>>>>>> > > > >>> >> >>>>>>> I should have noticed it earlier from your example, but I > > > just > > > >>> >> >>> realized > > > >>> >> >>>>>>> that interfaces don't mix well with exceptions. There is > > no > > > >>> way > > > >>> >> >> to > > > >>> >> >>>>> catch > > > >>> >> >>>>>>> the interface type, which means you have to depend on > > > >>> instanceof > > > >>> >> >>>>> checks, > > > >>> >> >>>>>>> which is not very conventional. At the moment, we raise > > > >>> >> >>>>>>> SerializationException if there is a problem parsing the > > > >>> error, > > > >>> >> >>> and we > > > >>> >> >>>>>>> raise KafkaException if the record fails its CRC check. > > > Since > > > >>> >> >>>>>>> SerializationException extends KafkaExeption, it seems > > like > > > we > > > >>> >> >> can > > > >>> >> >>>>> handle > > > >>> >> >>>>>>> both cases in a compatible way by handling both cases > > with a > > > >>> >> >> single > > > >>> >> >>>>> type > > > >>> >> >>>>>>> that extends SerializationException. Maybe something like > > > >>> >> >>>>>>> RecordDeserializationException? > > > >>> >> >>>>>>> > > > >>> >> >>>>>>> Thanks, > > > >>> >> >>>>>>> Jason > > > >>> >> >>>>>>> > > > >>> >> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu < > > yuzhih...@gmail.com > > > > > > > >>> >> >>> wrote: > > > >>> >> >>>>>>> > > > >>> >> >>>>>>>> +1 > > > >>> >> >>>>>>>> -------- Original message --------From: Stanislav > > > Kozlovski < > > > >>> >> >>>>>>>> stanis...@confluent.io> Date: 8/2/18 2:41 AM > > (GMT-08:00) > > > >>> To: > > > >>> >> >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include > > > >>> >> >> partitions > > > >>> >> >>> in > > > >>> >> >>>>>>>> exceptions raised during consumer record > > > >>> >> >>> deserialization/validation > > > >>> >> >>>>>>>> Hey everybody, > > > >>> >> >>>>>>>> > > > >>> >> >>>>>>>> I'd like to start a vote thread for KIP-334 Include > > > >>> partitions > > > >>> >> >> in > > > >>> >> >>>>>>>> exceptions raised during consumer record > > > >>> >> >>> deserialization/validation > > > >>> >> >>>>>>>> < > > > >>> >> >>>>>>> > > > >>> >> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action? > > > >>> >> >>> pageId=87297793 > > > >>> >> >>>>>>>>> > > > >>> >> >>>>>>>> > > > >>> >> >>>>>>>> -- > > > >>> >> >>>>>>>> Best, > > > >>> >> >>>>>>>> Stanislav > > > >>> >> >>>>>>>> > > > >>> >> >>>>>>> > > > >>> >> >>>>>> > > > >>> >> >>>>>> > > > >>> >> >>>>>> -- > > > >>> >> >>>>>> Best, > > > >>> >> >>>>>> Stanislav > > > >>> >> >>>>> > > > >>> >> >>>> > > > >>> >> >>>> > > > >>> >> >>>> -- > > > >>> >> >>>> Best, > > > >>> >> >>>> Stanislav > > > >>> >> >>> > > > >>> >> >> > > > >>> >> > > > > >>> >> > > > > >>> >> > > > >>> >> > > > >>> > > > >> > > > >> > > > >> -- > > > >> Best, > > > >> Stanislav > > > >> > > > > > > > > > > > > -- > > > > Best, > > > > Stanislav > > > > > > > > > > > > > -- > > > Best, > > > Stanislav > > > > > > > > -- > Best, > Stanislav