I am not 100% familiar with the details of the consumer code, but I tend
to disagree with:
> There's no difference between the two cases -- if (and only if) the
> message is corrupt, it can't be deserialized. If (and only if) it
> can't be deserialized, it is corrupt.

Assume that a user configures a JSON deserializer but a faulty upstream
producer writes an Avro message. In this case the message is not
corrupted, but it still can't be deserialized. And I can imagine that
users want to handle both cases differently.

Thus, I think it makes sense to have two different exceptions,
`RecordDeserializationException` and `CorruptedRecordException`, that
both extend `FaultyRecordException` (I don't like this name too much
honestly, but I don't have a better idea for it anyway).

Side remark: if we introduce the classes `RecordDeserializationException`
and `CorruptedRecordException`, we can also add an interface that both
implement to return partition/offset information, and let both extend
`SerializationException` directly, without an intermediate class in the
exception hierarchy. A sketch of that shape follows below.
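To make the side remark concrete, here is a minimal sketch of that
hierarchy. The interface name `RecordPositionInfo` and all constructor
and accessor signatures are placeholders for illustration only -- the
thread does not name the interface, and nothing here is final:

```java
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.SerializationException;

// Hypothetical interface (not named in this thread) exposing where the
// problematic record sits, so callers can seek past it.
interface RecordPositionInfo {
    TopicPartition topicPartition();
    long offset();
}

// Thrown when the bytes are intact but the configured deserializer fails.
class RecordDeserializationException extends SerializationException
        implements RecordPositionInfo {
    private final TopicPartition tp;
    private final long offset;

    RecordDeserializationException(TopicPartition tp, long offset,
                                   String msg, Throwable cause) {
        super(msg, cause);
        this.tp = tp;
        this.offset = offset;
    }

    @Override public TopicPartition topicPartition() { return tp; }
    @Override public long offset() { return offset; }
}

// Thrown when the record (or batch) fails its CRC check.
class CorruptedRecordException extends SerializationException
        implements RecordPositionInfo {
    private final TopicPartition tp;
    private final long offset;

    CorruptedRecordException(TopicPartition tp, long offset, String msg) {
        super(msg);
        this.tp = tp;
        this.offset = offset;
    }

    @Override public TopicPartition topicPartition() { return tp; }
    @Override public long offset() { return offset; }
}
```

With this shape, both types are still `SerializationException`s, so
existing catch blocks keep working, while new code can use the interface
to recover the record's position.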
-Matthias

On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
>> If you are inheriting from SerializationException, your derived class
>> should also be a kind of serialization exception. Not something more
>> general.
> Yeah, the reason for inheriting it would be for backwards-compatibility.
>
>> Hmm. Can you think of any new scenarios that would make Kafka force
>> the user to skip a specific record? Perhaps one scenario is if records
>> are lost but we don't know how many.
> Not on the spot, but I do wonder how likely a new scenario is to
> surface in the future and how we'd handle the exceptions' class
> hierarchy then.
>
>> Which offset were we planning to use in the exception?
> The offset of the record which caused the exception. In the case of
> batches, we use the last offset of the batch. In both cases, users
> would have to seek +1 from the given offset. You can review the PR to
> ensure it's accurate.
>
> If both of you prefer `RecordDeserializationException`, we can go with
> that. Please do confirm that is okay.
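For illustration, the seek-past pattern described just above would look
roughly like this on the application side. This is a sketch only: the
exception type and its position accessors are the hypothetical ones
sketched earlier in this thread, not an existing API:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

class SkipBadRecordsLoop {
    // Poll loop that skips records which cannot be consumed, using the
    // hypothetical RecordDeserializationException sketched above.
    static void pollForever(Consumer<String, String> consumer) {
        while (true) {
            try {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // application-level processing would go here
                }
            } catch (RecordDeserializationException e) {
                // The offset is the failed record's (or, for a corrupt
                // batch, the batch's last offset), so +1 resumes past it.
                consumer.seek(e.topicPartition(), e.offset() + 1);
            }
        }
    }
}
```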
>
> On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io> wrote:
>
>> One difference between the two cases is that we can't generally trust
>> the offset of a corrupt message. Which offset were we planning to use
>> in the exception? Maybe it should be either the fetch offset or one
>> plus the last consumed offset? I think I'm with Colin in preferring
>> RecordDeserializationException for both cases if possible. For one
>> thing, that makes the behavior consistent whether or not `check.crcs`
>> is enabled.
>>
>> -Jason
>>
>> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> wrote:
>>
>>> Hi Stanislav,
>>>
>>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
>>>> Hey Colin,
>>>>
>>>> It may be a bit vague, but keep in mind we also raise the exception
>>>> in the case where the record is corrupted.
>>>> We discussed with Jason offline that message corruption most often
>>>> prevents deserialization itself, and that may be enough of an
>>>> argument to raise `RecordDeserializationException` in the case of a
>>>> corrupt record. I personally find that misleading.
>>>
>>> Hmm. I think that by definition, corrupt records are records that
>>> can't be deserialized. There's no difference between the two cases --
>>> if (and only if) the message is corrupt, it can't be deserialized.
>>> If (and only if) it can't be deserialized, it is corrupt.
>>>
>>>> In the end, I think it might be worth it to have a bit of a
>>>> wider-encompassing `FaultyRecordException` (or even
>>>> `UnconsumableRecordException`) which would allow users to skip over
>>>> the record.
>>>
>>> If you are inheriting from SerializationException, your derived class
>>> should also be a kind of serialization exception. Not something more
>>> general.
>>>
>>>> We could then potentially have more specific exceptions (e.g.
>>>> deserialization) inherit that, but I'm not sure if we should.
>>>> A case for a more general exception which provides access to the
>>>> partition/offset is future backwards-compatibility. If there is
>>>> ever a new scenario that would make the user need to skip a specific
>>>> record -- there would already be such an exception, and this will
>>>> provide some backward-compatibility with older clients.
>>>
>>> Hmm. Can you think of any new scenarios that would make Kafka force
>>> the user to skip a specific record? Perhaps one scenario is if
>>> records are lost but we don't know how many.
>>>
>>> If we really want to have something more general, perhaps we need
>>> something like LostDataException. But in that case, we can't inherit
>>> from SerializationException, since lost data is more general than
>>> serialization issues.
>>>
>>> best,
>>> Colin
>>>
>>>> Best,
>>>> Stanislav
>>>>
>>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org> wrote:
>>>>
>>>>> Hi Stanislav,
>>>>>
>>>>> Thanks for the KIP.
>>>>>
>>>>> As a user, "FaultyRecordException" seems a little vague. What's
>>>>> faulty about it? Is it too big? Does it not match the schema of
>>>>> the other records? Was it not found?
>>>>>
>>>>> Of course, we know that the specific problem is with
>>>>> [de]serializing the record, not with any of those things. So we
>>>>> should choose a name that reflects that serialization is the
>>>>> problem. How about RecordSerializationException?
>>>>>
>>>>> best,
>>>>> Colin
>>>>>
>>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
>>>>>> Hi Jason and Ted,
>>>>>>
>>>>>> @Jason
>>>>>> I did not oppose the idea, as I'm not too familiar with Java
>>>>>> conventions. I agree it is a non-ideal way for the user to catch
>>>>>> the exception, so I changed it back.
>>>>>>
>>>>>> I've updated the KIP and PR with the latest changes. Now, there
>>>>>> is only one public exception, `FaultyRecordException`, which is
>>>>>> thrown to the user in both scenarios (corrupt record and
>>>>>> deserialization error). Please take a look again.
>>>>>>
>>>>>> Best,
>>>>>> Stanislav
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io> wrote:
>>>>>>
>>>>>>> Hey Stanislav,
>>>>>>>
>>>>>>> I should have noticed it earlier from your example, but I just
>>>>>>> realized that interfaces don't mix well with exceptions. There is
>>>>>>> no way to catch the interface type, which means you have to
>>>>>>> depend on instanceof checks, which is not very conventional. At
>>>>>>> the moment, we raise SerializationException if there is a problem
>>>>>>> parsing the record, and we raise KafkaException if the record
>>>>>>> fails its CRC check. Since SerializationException extends
>>>>>>> KafkaException, it seems like we can handle both cases in a
>>>>>>> compatible way with a single type that extends
>>>>>>> SerializationException. Maybe something like
>>>>>>> RecordDeserializationException?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jason
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> -------- Original message -------- From: Stanislav Kozlovski <
>>>>>>>> stanis...@confluent.io> Date: 8/2/18 2:41 AM (GMT-08:00) To:
>>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include partitions
>>>>>>>> in exceptions raised during consumer record
>>>>>>>> deserialization/validation
>>>>>>>>
>>>>>>>> Hey everybody,
>>>>>>>>
>>>>>>>> I'd like to start a vote thread for KIP-334 Include partitions
>>>>>>>> in exceptions raised during consumer record
>>>>>>>> deserialization/validation
>>>>>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best,
>>>>>>>> Stanislav
>>>>>>
>>>>>> --
>>>>>> Best,
>>>>>> Stanislav
>>>>
>>>> --
>>>> Best,
>>>> Stanislav
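As a footnote to Jason's point above about interfaces and exceptions:
Java only allows `Throwable` subclasses in a catch clause, so an
interface carrying the position info can only be reached through an
instanceof test. A minimal sketch, reusing the hypothetical
`RecordPositionInfo` interface from the earlier sketch:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.errors.SerializationException;

class InterfaceCatchExample {
    static void pollOnce(Consumer<String, String> consumer) {
        try {
            consumer.poll(Duration.ofMillis(100));
        } catch (SerializationException e) {
            // `catch (RecordPositionInfo e)` would not compile -- only
            // Throwable subclasses may appear in a catch clause -- so the
            // interface must be recovered via instanceof, which is the
            // unconventional pattern Jason describes.
            if (e instanceof RecordPositionInfo) {
                RecordPositionInfo info = (RecordPositionInfo) e;
                consumer.seek(info.topicPartition(), info.offset() + 1);
            } else {
                throw e;
            }
        }
    }
}
```

This is what makes a single concrete exception type extending
`SerializationException` the more conventional choice.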