Do people agree with the approach I outlined in my last reply?

On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski <stanis...@confluent.io> wrote:
> Hey there Kamal,
>
> I'm sincerely sorry for missing your earlier message. As I open this
> thread up, I see I have an unsent draft message about resuming this
> discussion from some time ago.
>
> In retrospect, I think I may have been too pedantic about the exception
> naming and hierarchy. I now believe a single exception type,
> `RecordDeserializationException`, is enough. Let's go with that.
>
> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
>
>> Matthias,
>>
>> We already have CorruptRecordException, which doesn't extend
>> SerializationException. So we need an alternate name suggestion for the
>> corrupted-record error if we decide to extend the FaultyRecordException
>> class.
>>
>> Stanislav,
>>
>> Our users are also facing this error. Could we bump up this discussion?
>>
>> I think we can have a single exception type,
>> FaultyRecordException/RecordDeserializationException, to capture both
>> errors. We can add an additional enum field to differentiate the errors
>> if required.
>>
>> Thanks,
>> Kamal Chandraprakash
>>
>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
>>
>>> Stanislav,
>>>
>>> Any updates on this KIP? We have internal users who want to skip the
>>> corrupted message while consuming the records.
>>>
>>> On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <matth...@confluent.io> wrote:
>>>
>>>> I am not 100% familiar with the details of the consumer code; however,
>>>> I tend to disagree with:
>>>>
>>>>> There's no difference between the two cases -- if (and only if) the
>>>>> message is corrupt, it can't be deserialized. If (and only if) it
>>>>> can't be deserialized, it is corrupt.
>>>>
>>>> Assume that a user configures a JSON deserializer but a faulty
>>>> upstream producer writes an Avro message. In this case, the message is
>>>> not corrupted, but it still can't be deserialized. And I can imagine
>>>> that users want to handle both cases differently.
>>>>
>>>> Thus, I think it makes sense to have two different exceptions,
>>>> `RecordDeserializationException` and `CorruptedRecordException`, that
>>>> can both extend `FaultyRecordException` (I don't like this name too
>>>> much, honestly, but I don't have a better idea for it anyway).
>>>>
>>>> Side remark: if we introduce the classes
>>>> `RecordDeserializationException` and `CorruptedRecordException`, we
>>>> can also add an interface that both implement to return
>>>> partition/offset information, and let both extend
>>>> `SerializationException` directly, without an intermediate class in
>>>> the exception hierarchy.
>>>>
>>>> -Matthias
>>>>
>>>> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
>>>>
>>>>>> If you are inheriting from SerializationException, your derived
>>>>>> class should also be a kind of serialization exception. Not
>>>>>> something more general.
>>>>>
>>>>> Yeah, the reason for inheriting from it would be backwards
>>>>> compatibility.
>>>>>
>>>>>> Hmm. Can you think of any new scenarios that would make Kafka force
>>>>>> the user to skip a specific record? Perhaps one scenario is if
>>>>>> records are lost but we don't know how many.
>>>>>
>>>>> Not on the spot, but I do wonder how likely a new scenario is to
>>>>> surface in the future, and how we'd handle the exceptions' class
>>>>> hierarchy then.
>>>>>
>>>>>> Which offset were we planning to use in the exception?
>>>>>
>>>>> The offset of the record which caused the exception. In the case of
>>>>> batches, we use the last offset of the batch. In both cases, users
>>>>> should have to seek +1 from the given offset. You can review the PR
>>>>> to ensure it's accurate.
>>>>>
>>>>> If both of you prefer `RecordDeserializationException`, we can go
>>>>> with that.
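[Editorial sketch] Matthias's side remark above — two concrete exceptions that extend `SerializationException` directly and implement a shared interface exposing partition/offset information — could look roughly like the following. This is a standalone illustration, not the KIP's actual code: the `SerializationException` base class here is a local stand-in (not `org.apache.kafka.common.errors.SerializationException`), and the interface name `FaultyRecordInfo` and its method names are hypothetical.

```java
// Local stand-in so the sketch compiles without the Kafka client on the
// classpath; the real class lives in org.apache.kafka.common.errors.
class SerializationException extends RuntimeException {
    SerializationException(String message) { super(message); }
}

// The shared interface from Matthias's side remark: both concrete
// exceptions expose the record's position without needing an
// intermediate class in the exception hierarchy.
interface FaultyRecordInfo {
    String topic();
    int partition();
    long offset();
}

class RecordDeserializationException extends SerializationException
        implements FaultyRecordInfo {
    private final String topic;
    private final int partition;
    private final long offset;

    RecordDeserializationException(String topic, int partition, long offset,
                                   String message) {
        super(message);
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
    }

    public String topic() { return topic; }
    public int partition() { return partition; }
    public long offset() { return offset; }
}

public class HierarchySketch {
    public static void main(String[] args) {
        try {
            throw new RecordDeserializationException("orders", 3, 42L,
                    "payload is not valid JSON");
        } catch (SerializationException e) {
            // Existing code catching SerializationException keeps working;
            // new code can check for the info interface to find where to
            // resume consumption.
            if (e instanceof FaultyRecordInfo) {
                FaultyRecordInfo info = (FaultyRecordInfo) e;
                System.out.println(
                        info.topic() + "-" + info.partition() + "@" + info.offset());
            }
        }
    }
}
```

As Jason points out later in the thread, the catch block itself can only name a class, so the interface is useful for reading the position, not for catching.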
>>>>> Please do confirm that is okay.
>>>>>
>>>>> On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io> wrote:
>>>>>
>>>>>> One difference between the two cases is that we can't generally
>>>>>> trust the offset of a corrupt message. Which offset were we planning
>>>>>> to use in the exception? Maybe it should be either the fetch offset,
>>>>>> or one plus the last consumed offset? I think I'm with Colin in
>>>>>> preferring RecordDeserializationException for both cases if
>>>>>> possible. For one thing, that makes the behavior consistent whether
>>>>>> or not `check.crcs` is enabled.
>>>>>>
>>>>>> -Jason
>>>>>>
>>>>>> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Stanislav,
>>>>>>>
>>>>>>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
>>>>>>>
>>>>>>>> Hey Colin,
>>>>>>>>
>>>>>>>> It may be a bit vague, but keep in mind that we also raise the
>>>>>>>> exception in the case where the record is corrupted.
>>>>>>>> We discussed with Jason offline that message corruption most often
>>>>>>>> prevents deserialization itself, and that may be enough of an
>>>>>>>> argument to raise `RecordDeserializationException` in the case of
>>>>>>>> a corrupt record. I personally find that misleading.
>>>>>>>
>>>>>>> Hmm. I think that by definition, corrupt records are records that
>>>>>>> can't be deserialized. There's no difference between the two cases
>>>>>>> -- if (and only if) the message is corrupt, it can't be
>>>>>>> deserialized. If (and only if) it can't be deserialized, it is
>>>>>>> corrupt.
>>>>>>>
>>>>>>>> In the end, I think it might be worth it to have a somewhat
>>>>>>>> wider-encompassing `FaultyRecordException` (or even
>>>>>>>> `UnconsumableRecordException`) which would allow users to skip
>>>>>>>> over the record.
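[Editorial sketch] The skip-the-record pattern discussed in the thread — the exception carries the failed offset, and the user seeks one past it to keep consuming — can be simulated end to end with a toy consumer. Everything here (`ToyConsumer`, `PoisonPillException`, the `poll`/`seek` signatures) is a simplified stand-in for illustration, not the Kafka consumer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Stand-in for the exception proposed in the KIP: it carries the offset
// of the record that could not be consumed.
class PoisonPillException extends RuntimeException {
    final long offset;
    PoisonPillException(long offset) { this.offset = offset; }
}

// Toy consumer: poll() returns one record at a time and throws when it
// hits a corrupt record; seek() moves the position, like KafkaConsumer#seek.
class ToyConsumer {
    private final List<String> log = Arrays.asList("a", "b", "<corrupt>", "d");
    private long position = 0;

    Optional<String> poll() {
        if (position >= log.size()) return Optional.empty();
        String record = log.get((int) position);
        if (record.equals("<corrupt>")) throw new PoisonPillException(position);
        position++;
        return Optional.of(record);
    }

    void seek(long offset) { position = offset; }
}

public class SkipSketch {
    public static void main(String[] args) {
        ToyConsumer consumer = new ToyConsumer();
        List<String> consumed = new ArrayList<>();
        while (true) {
            try {
                Optional<String> record = consumer.poll();
                if (!record.isPresent()) break;
                consumed.add(record.get());
            } catch (PoisonPillException e) {
                // The exception exposes the failed offset; seeking one past
                // it skips the bad record and resumes consumption.
                consumer.seek(e.offset + 1);
            }
        }
        System.out.println(consumed); // [a, b, d]
    }
}
```

Note Jason's caveat applies to the real client: for a corrupt batch the failed offset cannot always be trusted, which is why the thread debates using the fetch offset or the last offset of the batch.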
>>>>>>> If you are inheriting from SerializationException, your derived
>>>>>>> class should also be a kind of serialization exception. Not
>>>>>>> something more general.
>>>>>>>
>>>>>>>> We could then potentially have more specific exceptions (e.g.
>>>>>>>> deserialization) inherit from it, but I'm not sure if we should.
>>>>>>>> A case for a more general exception which provides access to the
>>>>>>>> partition/offset is future backwards compatibility. If there is
>>>>>>>> ever a new scenario that would make the user need to skip a
>>>>>>>> specific record, there would already be such an exception, and
>>>>>>>> this would provide some backwards compatibility with older
>>>>>>>> clients.
>>>>>>>
>>>>>>> Hmm. Can you think of any new scenarios that would make Kafka force
>>>>>>> the user to skip a specific record? Perhaps one scenario is if
>>>>>>> records are lost but we don't know how many.
>>>>>>>
>>>>>>> If we really want to have something more general, perhaps we need
>>>>>>> something like LostDataException. But in that case, we can't
>>>>>>> inherit from SerializationException, since lost data is more
>>>>>>> general than serialization issues.
>>>>>>>
>>>>>>> best,
>>>>>>> Colin
>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stanislav
>>>>>>>>
>>>>>>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Stanislav,
>>>>>>>>>
>>>>>>>>> Thanks for the KIP.
>>>>>>>>>
>>>>>>>>> As a user, "FaultyRecordException" seems a little vague. What's
>>>>>>>>> faulty about it? Is it too big? Does it not match the schema of
>>>>>>>>> the other records? Was it not found?
>>>>>>>>>
>>>>>>>>> Of course, we know that the specific problem is with
>>>>>>>>> [de]serializing the record, not with any of those things. So we
>>>>>>>>> should choose a name that reflects that serialization is the
>>>>>>>>> problem. How about RecordSerializationException?
>>>>>>>>>
>>>>>>>>> best,
>>>>>>>>> Colin
>>>>>>>>>
>>>>>>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jason and Ted,
>>>>>>>>>>
>>>>>>>>>> @Jason
>>>>>>>>>> I did not oppose the idea, as I'm not too familiar with Java
>>>>>>>>>> conventions. I agree it is a non-ideal way for the user to catch
>>>>>>>>>> the exception, so I changed it back.
>>>>>>>>>>
>>>>>>>>>> I've updated the KIP and PR with the latest changes. Now there
>>>>>>>>>> is only one public exception, `FaultyRecordException`, which is
>>>>>>>>>> thrown to the user in both scenarios (corrupt record and
>>>>>>>>>> deserialization error). Please take a look again.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Stanislav
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Stanislav,
>>>>>>>>>>>
>>>>>>>>>>> I should have noticed it earlier from your example, but I just
>>>>>>>>>>> realized that interfaces don't mix well with exceptions. There
>>>>>>>>>>> is no way to catch the interface type, which means you have to
>>>>>>>>>>> depend on instanceof checks, which is not very conventional. At
>>>>>>>>>>> the moment, we raise SerializationException if there is a
>>>>>>>>>>> problem parsing the record, and we raise KafkaException if the
>>>>>>>>>>> record fails its CRC check. Since SerializationException
>>>>>>>>>>> extends KafkaException, it seems like we can handle both cases
>>>>>>>>>>> in a compatible way with a single type that extends
>>>>>>>>>>> SerializationException. Maybe something like
>>>>>>>>>>> RecordDeserializationException?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jason
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> -------- Original message --------
>>>>>>>>>>>> From: Stanislav Kozlovski <stanis...@confluent.io>
>>>>>>>>>>>> Date: 8/2/18 2:41 AM (GMT-08:00)
>>>>>>>>>>>> To: dev@kafka.apache.org
>>>>>>>>>>>> Subject: [VOTE] KIP-334 Include partitions in exceptions
>>>>>>>>>>>> raised during consumer record deserialization/validation
>>>>>>>>>>>>
>>>>>>>>>>>> Hey everybody,
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to start a vote thread for KIP-334: Include
>>>>>>>>>>>> partitions in exceptions raised during consumer record
>>>>>>>>>>>> deserialization/validation
>>>>>>>>>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Stanislav
>
> --
> Best,
> Stanislav

--
Best,
Stanislav
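[Editorial sketch] Jason's compatibility argument above — a single new type extending `SerializationException`, which in turn extends `KafkaException`, is caught by code written against either pre-existing catch path — can be demonstrated with local stub classes. These are stand-ins for illustration only, not the real Kafka exception classes.

```java
// Stand-ins mirroring the hierarchy Jason describes:
// RecordDeserializationException -> SerializationException -> KafkaException.
class KafkaException extends RuntimeException {}
class SerializationException extends KafkaException {}
class RecordDeserializationException extends SerializationException {}

public class CompatSketch {
    // Simulates an application's existing error handling, with separate
    // catch paths for the two old exception types.
    static String classify(RuntimeException e) {
        try {
            throw e;
        } catch (SerializationException se) {
            return "serialization";
        } catch (KafkaException ke) {
            return "kafka";
        }
    }

    public static void main(String[] args) {
        // Code that caught SerializationException (the old deserialization
        // failure path) also catches the new, more specific type...
        System.out.println(classify(new RecordDeserializationException()));
        // ...while code that only caught KafkaException (the old CRC-failure
        // path) still works for generic Kafka errors, so behavior stays
        // consistent whether or not check.crcs is enabled.
        System.out.println(classify(new KafkaException()));
    }
}
```

This is also why the interface-based design was dropped: `catch` clauses can only name classes, so only a common superclass gives existing handlers the new exception for free.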