Hey there Kamal, I'm sincerely sorry for missing your earlier message. As I open this thread up, I see I have an unsent draft message about resuming discussion from some time ago.
In retrospect, I think I may have been too pedantic with the exception naming and hierarchy. I now believe a single exception type of `RecordDeserializationException` is enough. Let's go with that. On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash < kamal.chandraprak...@gmail.com> wrote: > Matthias, > > We already have CorruptRecordException which doesn't extend the > SerializationException. So, we need an alternate > name suggestion for the corrupted record error if we decide to extend the > FaultyRecordException class. > > Stanislav, > > Our users are also facing this error. Could we bump up this discussion? > > I think we can have a single exception type > FaultyRecordException/RecordDeserialization exception to capture both > the errors. We can add an additional enum field to differentiate the errors > if required. > > Thanks, > Kamal Chandraprakash > > On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash < > kamal.chandraprak...@gmail.com> wrote: > > > Stanislav, > > > > Any updates on this KIP? We have internal users who want to skip the > > corrupted message while consuming the records. > > > > > > On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <matth...@confluent.io> > > wrote: > > > >> I am not 100% familiar with the details of the consumer code, however I > >> tend to disagree with: > >> > >> > There's no difference between the two cases -- if (and only if) the > >> message is corrupt, it can't be deserialized. If (and only if) it > can't be > >> deserialized, it is corrupt. > >> > >> Assume that a user configures a JSON deserializer but a faulty upstream > >> producer writes an Avro message. For this case, the message is not > >> corrupted, but still can't be deserialized. And I can imaging that users > >> want to handle both cases differently. > >> > >> Thus, I think it makes sense to have two different exceptions > >> `RecordDeserializationException` and `CorruptedRecordException` that can > >> both extend `FaultyRecordException` (don't like this name too much > >> honestly, but don't have a better idea for it anyway). > >> > >> Side remark. If we introduce class `RecordDeserializationException` and > >> `CorruptedRecordException`, we can also add an interface that both > >> implement to return partition/offset information and let both extend > >> `SerializationException` directly without an intermediate class in the > >> exception hierarchy. > >> > >> > >> -Matthias > >> > >> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote: > >> >> If you are inheriting from SerializationException, your derived class > >> > should also be a kind of serialization exception. Not something more > >> > general. > >> > Yeah, the reason for inheriting it would be for > backwards-compatibility. > >> > > >> >> Hmm. Can you think of any new scenarios that would make Kafka force > >> the > >> > user need to skip a specific record? Perhaps one scenario is if > records > >> > are lost but we don't know how many. > >> > Not on the spot, but I do wonder how likely a new scenario is to > >> surface in > >> > the future and how we'd handle the exceptions' class hierarchy then. > >> > > >> >> Which offset were we planning to use in the > >> > exception? > >> > The offset of the record which caused the exception. In the case of > >> > batches, we use the last offset of the batch. In both cases, users > >> should > >> > have to seek +1 from the given offset. You can review the PR to ensure > >> its > >> > accurate > >> > > >> > > >> > If both of you prefer `RecordDeserializationException`, we can go with > >> > that. Please do confirm that is okay > >> > > >> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io> > >> wrote: > >> > > >> >> One difference between the two cases is that we can't generally trust > >> the > >> >> offset of a corrupt message. Which offset were we planning to use in > >> the > >> >> exception? Maybe it should be either the fetch offset or one plus the > >> last > >> >> consumed offset? I think I'm with Colin in preferring > >> >> RecordDeserializationException for both cases if possible. For one > >> thing, > >> >> that makes the behavior consistent whether or not `check.crs` is > >> enabled. > >> >> > >> >> -Jason > >> >> > >> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> > >> wrote: > >> >> > >> >>> Hi Stanislav, > >> >>> > >> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote: > >> >>>> Hey Colin, > >> >>>> > >> >>>> It may be a bit vague but keep in mind we also raise the exception > in > >> >> the > >> >>>> case where the record is corrupted. > >> >>>> We discussed with Jason offline that message corruption most often > >> >>> prevents > >> >>>> deserialization itself and that may be enough of an argument to > raise > >> >>>> `RecordDeserializationException` in the case of a corrupt record. I > >> >>>> personally find that misleading. > >> >>> > >> >>> Hmm. I think that by definition, corrupt records are records that > >> can't > >> >>> be deserialized. There's no difference between the two cases -- if > >> (and > >> >>> only if) the message is corrupt, it can't be deserialized. If (and > >> only > >> >>> if) it can't be deserialized, it is corrupt. > >> >>> > >> >>>> > >> >>>> In the end, I think it might be worth it to have a bit of a > >> >>>> wider-encompassing `FaultyRecordException` (or even > >> >>>> `UnconsumableRecordException`) which would allow users to skip over > >> the > >> >>>> record. > >> >>> > >> >>> If you are inheriting from SerializationException, your derived > class > >> >>> should also be a kind of serialization exception. Not something > more > >> >>> general. > >> >>> > >> >>>> We could then potentially have more specific exceptions (e.g > >> >>>> deserialization) inherit that but I'm not sure if we should. > >> >>>> A case for a more general exception which provides access to the > >> >>>> partition/offset is future backwards-compatibility. If there is > ever > >> a > >> >>> new > >> >>>> scenario that would make the user need to skip a specific record - > >> >> there > >> >>>> would already be such an exception and this will provide some > >> >>>> backward-compatibility with older clients. > >> >>> > >> >>> Hmm. Can you think of any new scenarios that would make Kafka force > >> the > >> >>> user need to skip a specific record? Perhaps one scenario is if > >> records > >> >>> are lost but we don't know how many. > >> >>> > >> >>> If we really want to have something more general, perhaps we need > >> >>> something like LostDataException. But in that case, we can't > inherit > >> >> from > >> >>> SerializationException, since lost data is more general than > >> >> serialization > >> >>> issues. > >> >>> > >> >>> best, > >> >>> Colin > >> >>> > >> >>> > >> >>>> > >> >>>> Best, > >> >>>> Stanislav > >> >>>> > >> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org> > >> >> wrote: > >> >>>> > >> >>>>> Hi Stanislav, > >> >>>>> > >> >>>>> Thanks for the KIP. > >> >>>>> > >> >>>>> As as user, "FaultyRecordException" seems a little vague. What's > >> >>> faulty > >> >>>>> about it? Is it too big? Does it not match the schema of the > other > >> >>>>> records? Was it not found? > >> >>>>> > >> >>>>> Of course, we know that the specific problem is with > [de]serializing > >> >>> the > >> >>>>> record, not with any of those those things. So we should choose a > >> >> name > >> >>>>> that reflects that serialization is the problem. How about > >> >>>>> RecordSerializationException? > >> >>>>> > >> >>>>> best, > >> >>>>> Colin > >> >>>>> > >> >>>>> > >> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote: > >> >>>>>> Hi Jason and Ted, > >> >>>>>> > >> >>>>>> @Jason > >> >>>>>> I did not oppose the idea as I'm not too familiar with Java > >> >>> conventions. > >> >>>>> I > >> >>>>>> agree it is a non-ideal way for the user to catch the exception > so > >> >> I > >> >>>>>> changed it back. > >> >>>>>> > >> >>>>>> I've updated the KIP and PR with the latest changes. Now, there > is > >> >>> only > >> >>>>> one > >> >>>>>> public exception `FaultyRecordException` which is thrown to the > >> >> user > >> >>> in > >> >>>>>> both scenarios (corrupt record and deserialization error). > >> >>>>>> Please take a look again. > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Stanislav > >> >>>>>> > >> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson < > ja...@confluent.io > >> >>> > >> >>>>> wrote: > >> >>>>>> > >> >>>>>>> Hey Stanislav, > >> >>>>>>> > >> >>>>>>> I should have noticed it earlier from your example, but I just > >> >>> realized > >> >>>>>>> that interfaces don't mix well with exceptions. There is no way > >> >> to > >> >>>>> catch > >> >>>>>>> the interface type, which means you have to depend on instanceof > >> >>>>> checks, > >> >>>>>>> which is not very conventional. At the moment, we raise > >> >>>>>>> SerializationException if there is a problem parsing the error, > >> >>> and we > >> >>>>>>> raise KafkaException if the record fails its CRC check. Since > >> >>>>>>> SerializationException extends KafkaExeption, it seems like we > >> >> can > >> >>>>> handle > >> >>>>>>> both cases in a compatible way by handling both cases with a > >> >> single > >> >>>>> type > >> >>>>>>> that extends SerializationException. Maybe something like > >> >>>>>>> RecordDeserializationException? > >> >>>>>>> > >> >>>>>>> Thanks, > >> >>>>>>> Jason > >> >>>>>>> > >> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com> > >> >>> wrote: > >> >>>>>>> > >> >>>>>>>> +1 > >> >>>>>>>> -------- Original message --------From: Stanislav Kozlovski < > >> >>>>>>>> stanis...@confluent.io> Date: 8/2/18 2:41 AM (GMT-08:00) To: > >> >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include > >> >> partitions > >> >>> in > >> >>>>>>>> exceptions raised during consumer record > >> >>> deserialization/validation > >> >>>>>>>> Hey everybody, > >> >>>>>>>> > >> >>>>>>>> I'd like to start a vote thread for KIP-334 Include partitions > >> >> in > >> >>>>>>>> exceptions raised during consumer record > >> >>> deserialization/validation > >> >>>>>>>> < > >> >>>>>>> > >> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action? > >> >>> pageId=87297793 > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>>> -- > >> >>>>>>>> Best, > >> >>>>>>>> Stanislav > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> -- > >> >>>>>> Best, > >> >>>>>> Stanislav > >> >>>>> > >> >>>> > >> >>>> > >> >>>> -- > >> >>>> Best, > >> >>>> Stanislav > >> >>> > >> >> > >> > > >> > > >> > >> > -- Best, Stanislav