Hi Stanislav,

Sorry for the late comment. I'm happy with the current proposal. Just one
small request is to include an example which shows how a user could use
this to skip over a record.

I'd suggest pushing this to a vote to see if anyone else has feedback.

Thanks,
Jason

On Sat, Jul 13, 2019 at 2:27 PM Stanislav Kozlovski <stanis...@confluent.io>
wrote:

> Hello,
>
> Could we restart the discussion here again?
>
> My last message was sent on the 3rd of June but I haven't received replies
> since then.
>
> I'd like to get this KIP to a finished state, regardless of whether that is
> merged-in or discarded. It has been almost one year since the publication
> of the KIP.
>
> Thanks,
> Stanislav
>
> On Mon, Jun 3, 2019 at 11:19 AM Stanislav Kozlovski <
> stanis...@confluent.io>
> wrote:
>
> > Do people agree with the approach I outlined in my last reply?
> >
> > On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski <
> stanis...@confluent.io>
> > wrote:
> >
> >> Hey there Kamal,
> >>
> >> I'm sincerely sorry for missing your earlier message. As I open this
> >> thread up, I see I have an unsent draft message about resuming
> discussion
> >> from some time ago.
> >>
> >> In retrospect, I think I may have been too pedantic with the exception
> >> naming and hierarchy.
> >> I now believe a single exception type of
> `RecordDeserializationException`
> >> is enough. Let's go with that.
> >>
> >> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash <
> >> kamal.chandraprak...@gmail.com> wrote:
> >>
> >>> Matthias,
> >>>
> >>> We already have CorruptRecordException which doesn't extend the
> >>> SerializationException. So, we need an alternate
> >>> name suggestion for the corrupted record error if we decide to extend
> the
> >>> FaultyRecordException class.
> >>>
> >>> Stanislav,
> >>>
> >>> Our users are also facing this error. Could we bump up this discussion?
> >>>
> >>> I think we can have a single exception type
> >>> FaultyRecordException/RecordDeserialization exception to capture both
> >>> the errors. We can add an additional enum field to differentiate the
> >>> errors
> >>> if required.
> >>>
> >>> Thanks,
> >>> Kamal Chandraprakash
> >>>
> >>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash <
> >>> kamal.chandraprak...@gmail.com> wrote:
> >>>
> >>> > Stanislav,
> >>> >
> >>> > Any updates on this KIP? We have internal users who want to skip the
> >>> > corrupted message while consuming the records.
> >>> >
> >>> >
> >>> > On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <
> >>> matth...@confluent.io>
> >>> > wrote:
> >>> >
> >>> >> I am not 100% familiar with the details of the consumer code,
> however
> >>> I
> >>> >> tend to disagree with:
> >>> >>
> >>> >> > There's no difference between the two cases -- if (and only if)
> the
> >>> >> message is corrupt, it can't be deserialized.  If (and only if) it
> >>> can't be
> >>> >> deserialized, it is corrupt.
> >>> >>
> >>> >> Assume that a user configures a JSON deserializer but a faulty
> >>> upstream
> >>> >> producer writes an Avro message. For this case, the message is not
> >>> >> corrupted, but still can't be deserialized. And I can imaging that
> >>> users
> >>> >> want to handle both cases differently.
> >>> >>
> >>> >> Thus, I think it makes sense to have two different exceptions
> >>> >> `RecordDeserializationException` and `CorruptedRecordException` that
> >>> can
> >>> >> both extend `FaultyRecordException` (don't like this name too much
> >>> >> honestly, but don't have a better idea for it anyway).
> >>> >>
> >>> >> Side remark. If we introduce class `RecordDeserializationException`
> >>> and
> >>> >> `CorruptedRecordException`, we can also add an interface that both
> >>> >> implement to return partition/offset information and let both extend
> >>> >> `SerializationException` directly without an intermediate class in
> the
> >>> >> exception hierarchy.
> >>> >>
> >>> >>
> >>> >> -Matthias
> >>> >>
> >>> >> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
> >>> >> >> If you are inheriting from SerializationException, your derived
> >>> class
> >>> >> > should also be a kind of serialization exception.  Not something
> >>> more
> >>> >> > general.
> >>> >> > Yeah, the reason for inheriting it would be for
> >>> backwards-compatibility.
> >>> >> >
> >>> >> >> Hmm.  Can you think of any new scenarios that would make Kafka
> >>> force
> >>> >> the
> >>> >> > user need to skip a specific record?  Perhaps one scenario is if
> >>> records
> >>> >> > are lost but we don't know how many.
> >>> >> > Not on the spot, but I do wonder how likely a new scenario is to
> >>> >> surface in
> >>> >> > the future and how we'd handle the exceptions' class hierarchy
> then.
> >>> >> >
> >>> >> >> Which offset were we planning to use in the
> >>> >> > exception?
> >>> >> > The offset of the record which caused the exception. In the case
> of
> >>> >> > batches, we use the last offset of the batch. In both cases, users
> >>> >> should
> >>> >> > have to seek +1 from the given offset. You can review the PR to
> >>> ensure
> >>> >> its
> >>> >> > accurate
> >>> >> >
> >>> >> >
> >>> >> > If both of you prefer `RecordDeserializationException`, we can go
> >>> with
> >>> >> > that. Please do confirm that is okay
> >>> >> >
> >>> >> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <
> ja...@confluent.io
> >>> >
> >>> >> wrote:
> >>> >> >
> >>> >> >> One difference between the two cases is that we can't generally
> >>> trust
> >>> >> the
> >>> >> >> offset of a corrupt message. Which offset were we planning to use
> >>> in
> >>> >> the
> >>> >> >> exception? Maybe it should be either the fetch offset or one plus
> >>> the
> >>> >> last
> >>> >> >> consumed offset? I think I'm with Colin in preferring
> >>> >> >> RecordDeserializationException for both cases if possible. For
> one
> >>> >> thing,
> >>> >> >> that makes the behavior consistent whether or not `check.crs` is
> >>> >> enabled.
> >>> >> >>
> >>> >> >> -Jason
> >>> >> >>
> >>> >> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <
> cmcc...@apache.org>
> >>> >> wrote:
> >>> >> >>
> >>> >> >>> Hi Stanislav,
> >>> >> >>>
> >>> >> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
> >>> >> >>>> Hey Colin,
> >>> >> >>>>
> >>> >> >>>> It may be a bit vague but keep in mind we also raise the
> >>> exception in
> >>> >> >> the
> >>> >> >>>> case where the record is corrupted.
> >>> >> >>>> We discussed with Jason offline that message corruption most
> >>> often
> >>> >> >>> prevents
> >>> >> >>>> deserialization itself and that may be enough of an argument to
> >>> raise
> >>> >> >>>> `RecordDeserializationException` in the case of a corrupt
> >>> record. I
> >>> >> >>>> personally find that misleading.
> >>> >> >>>
> >>> >> >>> Hmm.  I think that by definition, corrupt records are records
> that
> >>> >> can't
> >>> >> >>> be deserialized.  There's no difference between the two cases --
> >>> if
> >>> >> (and
> >>> >> >>> only if) the message is corrupt, it can't be deserialized.  If
> >>> (and
> >>> >> only
> >>> >> >>> if) it can't be deserialized, it is corrupt.
> >>> >> >>>
> >>> >> >>>>
> >>> >> >>>> In the end, I think it might be worth it to have a bit of a
> >>> >> >>>> wider-encompassing `FaultyRecordException` (or even
> >>> >> >>>> `UnconsumableRecordException`) which would allow users to skip
> >>> over
> >>> >> the
> >>> >> >>>> record.
> >>> >> >>>
> >>> >> >>> If you are inheriting from SerializationException, your derived
> >>> class
> >>> >> >>> should also be a kind of serialization exception.  Not something
> >>> more
> >>> >> >>> general.
> >>> >> >>>
> >>> >> >>>> We could then potentially have more specific exceptions (e.g
> >>> >> >>>> deserialization) inherit that but I'm not sure if we should.
> >>> >> >>>> A case for a more general exception which provides access to
> the
> >>> >> >>>> partition/offset is future backwards-compatibility. If there is
> >>> ever
> >>> >> a
> >>> >> >>> new
> >>> >> >>>> scenario that would make the user need to skip a specific
> record
> >>> -
> >>> >> >> there
> >>> >> >>>> would already be such an exception and this will provide some
> >>> >> >>>> backward-compatibility with older clients.
> >>> >> >>>
> >>> >> >>> Hmm.  Can you think of any new scenarios that would make Kafka
> >>> force
> >>> >> the
> >>> >> >>> user need to skip a specific record?  Perhaps one scenario is if
> >>> >> records
> >>> >> >>> are lost but we don't know how many.
> >>> >> >>>
> >>> >> >>> If we really want to have something more general, perhaps we
> need
> >>> >> >>> something like LostDataException.  But in that case, we can't
> >>> inherit
> >>> >> >> from
> >>> >> >>> SerializationException, since lost data is more general than
> >>> >> >> serialization
> >>> >> >>> issues.
> >>> >> >>>
> >>> >> >>> best,
> >>> >> >>> Colin
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>>
> >>> >> >>>> Best,
> >>> >> >>>> Stanislav
> >>> >> >>>>
> >>> >> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <
> cmcc...@apache.org
> >>> >
> >>> >> >> wrote:
> >>> >> >>>>
> >>> >> >>>>> Hi Stanislav,
> >>> >> >>>>>
> >>> >> >>>>> Thanks for the KIP.
> >>> >> >>>>>
> >>> >> >>>>> As as user, "FaultyRecordException" seems a little vague.
> >>> What's
> >>> >> >>> faulty
> >>> >> >>>>> about it?  Is it too big?  Does it not match the schema of the
> >>> other
> >>> >> >>>>> records?  Was it not found?
> >>> >> >>>>>
> >>> >> >>>>> Of course, we know that the specific problem is with
> >>> [de]serializing
> >>> >> >>> the
> >>> >> >>>>> record, not with any of those those things.  So we should
> >>> choose a
> >>> >> >> name
> >>> >> >>>>> that reflects that serialization is the problem.  How about
> >>> >> >>>>> RecordSerializationException?
> >>> >> >>>>>
> >>> >> >>>>> best,
> >>> >> >>>>> Colin
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
> >>> >> >>>>>> Hi Jason and Ted,
> >>> >> >>>>>>
> >>> >> >>>>>> @Jason
> >>> >> >>>>>> I did not oppose the idea as I'm not too familiar with Java
> >>> >> >>> conventions.
> >>> >> >>>>> I
> >>> >> >>>>>> agree it is a non-ideal way for the user to catch the
> >>> exception so
> >>> >> >> I
> >>> >> >>>>>> changed it back.
> >>> >> >>>>>>
> >>> >> >>>>>> I've updated the KIP and PR with the latest changes. Now,
> >>> there is
> >>> >> >>> only
> >>> >> >>>>> one
> >>> >> >>>>>> public exception `FaultyRecordException` which is thrown to
> the
> >>> >> >> user
> >>> >> >>> in
> >>> >> >>>>>> both scenarios (corrupt record and deserialization error).
> >>> >> >>>>>> Please take a look again.
> >>> >> >>>>>>
> >>> >> >>>>>> Best,
> >>> >> >>>>>> Stanislav
> >>> >> >>>>>>
> >>> >> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <
> >>> ja...@confluent.io
> >>> >> >>>
> >>> >> >>>>> wrote:
> >>> >> >>>>>>
> >>> >> >>>>>>> Hey Stanislav,
> >>> >> >>>>>>>
> >>> >> >>>>>>> I should have noticed it earlier from your example, but I
> just
> >>> >> >>> realized
> >>> >> >>>>>>> that interfaces don't mix well with exceptions. There is no
> >>> way
> >>> >> >> to
> >>> >> >>>>> catch
> >>> >> >>>>>>> the interface type, which means you have to depend on
> >>> instanceof
> >>> >> >>>>> checks,
> >>> >> >>>>>>> which is not very conventional. At the moment, we raise
> >>> >> >>>>>>> SerializationException if there is a problem parsing the
> >>> error,
> >>> >> >>> and we
> >>> >> >>>>>>> raise KafkaException if the record fails its CRC check.
> Since
> >>> >> >>>>>>> SerializationException extends KafkaExeption, it seems like
> we
> >>> >> >> can
> >>> >> >>>>> handle
> >>> >> >>>>>>> both cases in a compatible way by handling both cases with a
> >>> >> >> single
> >>> >> >>>>> type
> >>> >> >>>>>>> that extends SerializationException. Maybe something like
> >>> >> >>>>>>> RecordDeserializationException?
> >>> >> >>>>>>>
> >>> >> >>>>>>> Thanks,
> >>> >> >>>>>>> Jason
> >>> >> >>>>>>>
> >>> >> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com
> >
> >>> >> >>> wrote:
> >>> >> >>>>>>>
> >>> >> >>>>>>>> +1
> >>> >> >>>>>>>> -------- Original message --------From: Stanislav
> Kozlovski <
> >>> >> >>>>>>>> stanis...@confluent.io> Date: 8/2/18  2:41 AM  (GMT-08:00)
> >>> To:
> >>> >> >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include
> >>> >> >> partitions
> >>> >> >>> in
> >>> >> >>>>>>>> exceptions raised during consumer record
> >>> >> >>> deserialization/validation
> >>> >> >>>>>>>> Hey everybody,
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> I'd like to start a vote thread for KIP-334 Include
> >>> partitions
> >>> >> >> in
> >>> >> >>>>>>>> exceptions raised during consumer record
> >>> >> >>> deserialization/validation
> >>> >> >>>>>>>> <
> >>> >> >>>>>>>
> >>> >> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?
> >>> >> >>> pageId=87297793
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> --
> >>> >> >>>>>>>> Best,
> >>> >> >>>>>>>> Stanislav
> >>> >> >>>>>>>>
> >>> >> >>>>>>>
> >>> >> >>>>>>
> >>> >> >>>>>>
> >>> >> >>>>>> --
> >>> >> >>>>>> Best,
> >>> >> >>>>>> Stanislav
> >>> >> >>>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> --
> >>> >> >>>> Best,
> >>> >> >>>> Stanislav
> >>> >> >>>
> >>> >> >>
> >>> >> >
> >>> >> >
> >>> >>
> >>> >>
> >>>
> >>
> >>
> >> --
> >> Best,
> >> Stanislav
> >>
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav
>

Reply via email to