> If you are inheriting from SerializationException, your derived class
should also be a kind of serialization exception.  Not something more
general.
Yeah, the reason for inheriting it would be for backwards-compatibility.

> Hmm.  Can you think of any new scenarios that would force the user to
skip a specific record?  Perhaps one scenario is if records are lost but
we don't know how many.
Not on the spot, but I do wonder how likely a new scenario is to surface in
the future and how we'd handle the exceptions' class hierarchy then.

> Which offset were we planning to use in the
exception?
The offset of the record which caused the exception. In the case of
batches, we use the last offset of the batch. In both cases, users would
need to seek +1 from the given offset to skip past it. You can review the
PR to confirm it's accurate.


If both of you prefer `RecordDeserializationException`, we can go with
that. Please confirm that it is okay.
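For what it's worth, the compatibility argument for extending SerializationException can be shown with a minimal sketch. The classes below are illustrative stand-ins only (Kafka's real SerializationException lives in org.apache.kafka.common.errors and extends KafkaException):

```java
// Illustrative stand-in; not the real Kafka class.
class SerializationException extends RuntimeException {
    SerializationException(String message) {
        super(message);
    }
}

// The proposed exception extends SerializationException, so pre-existing
// catch blocks keep working while new code can read the failed offset.
class RecordDeserializationException extends SerializationException {
    private final long offset;

    RecordDeserializationException(long offset, String message) {
        super(message);
        this.offset = offset;
    }

    long offset() {
        return offset;
    }
}

public class CompatDemo {
    public static void main(String[] args) {
        long offset = -1;
        try {
            throw new RecordDeserializationException(42L, "failed to deserialize record");
        } catch (SerializationException e) { // old catch block still matches
            if (e instanceof RecordDeserializationException) {
                offset = ((RecordDeserializationException) e).offset();
            }
        }
        System.out.println("SerializationException " + offset);
    }
}
```

Existing applications that catch SerializationException keep compiling and running unchanged; only code that wants the new partition/offset information needs to catch (or downcast to) the subclass.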

On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io> wrote:

> One difference between the two cases is that we can't generally trust the
> offset of a corrupt message. Which offset were we planning to use in the
> exception? Maybe it should be either the fetch offset or one plus the last
> consumed offset? I think I'm with Colin in preferring
> RecordDeserializationException for both cases if possible. For one thing,
> that makes the behavior consistent whether or not `check.crcs` is enabled.
>
> -Jason
>
> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi Stanislav,
> >
> > On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
> > > Hey Colin,
> > >
> > > It may be a bit vague but keep in mind we also raise the exception in
> the
> > > case where the record is corrupted.
> > > We discussed with Jason offline that message corruption most often
> > prevents
> > > deserialization itself and that may be enough of an argument to raise
> > > `RecordDeserializationException` in the case of a corrupt record. I
> > > personally find that misleading.
> >
> > Hmm.  I think that by definition, corrupt records are records that can't
> > be deserialized.  There's no difference between the two cases -- if (and
> > only if) the message is corrupt, it can't be deserialized.  If (and only
> > if) it can't be deserialized, it is corrupt.
> >
> > >
> > > In the end, I think it might be worth it to have a bit of a
> > > wider-encompassing `FaultyRecordException` (or even
> > > `UnconsumableRecordException`) which would allow users to skip over the
> > > record.
> >
> > If you are inheriting from SerializationException, your derived class
> > should also be a kind of serialization exception.  Not something more
> > general.
> >
> > > We could then potentially have more specific exceptions (e.g
> > > deserialization) inherit that but I'm not sure if we should.
> > > A case for a more general exception which provides access to the
> > > partition/offset is future backwards-compatibility. If there is ever a
> > new
> > > scenario that would make the user need to skip a specific record -
> there
> > > would already be such an exception and this will provide some
> > > backward-compatibility with older clients.
> >
> > Hmm.  Can you think of any new scenarios that would force the user to
> > skip a specific record?  Perhaps one scenario is if records are lost
> > but we don't know how many.
> >
> > If we really want to have something more general, perhaps we need
> > something like LostDataException.  But in that case, we can't inherit
> from
> > SerializationException, since lost data is more general than
> serialization
> > issues.
> >
> > best,
> > Colin
> >
> >
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org>
> wrote:
> > >
> > > > Hi Stanislav,
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > As as user, "FaultyRecordException" seems a little vague.  What's
> > faulty
> > > > about it?  Is it too big?  Does it not match the schema of the other
> > > > records?  Was it not found?
> > > >
> > > > Of course, we know that the specific problem is with [de]serializing
> > the
> > > > record, not with any of those things.  So we should choose a
> name
> > > > that reflects that serialization is the problem.  How about
> > > > RecordSerializationException?
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
> > > > > Hi Jason and Ted,
> > > > >
> > > > > @Jason
> > > > > I did not oppose the idea as I'm not too familiar with Java
> > conventions.
> > > > I
> > > > > agree it is a non-ideal way for the user to catch the exception so
> I
> > > > > changed it back.
> > > > >
> > > > > I've updated the KIP and PR with the latest changes. Now, there is
> > only
> > > > one
> > > > > public exception `FaultyRecordException` which is thrown to the
> user
> > in
> > > > > both scenarios (corrupt record and deserialization error).
> > > > > Please take a look again.
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > > On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io
> >
> > > > wrote:
> > > > >
> > > > > > Hey Stanislav,
> > > > > >
> > > > > > I should have noticed it earlier from your example, but I just
> > realized
> > > > > > that interfaces don't mix well with exceptions. There is no way
> to
> > > > catch
> > > > > > the interface type, which means you have to depend on instanceof
> > > > checks,
> > > > > > which is not very conventional. At the moment, we raise
> > > > > > SerializationException if there is a problem parsing the error,
> > and we
> > > > > > raise KafkaException if the record fails its CRC check. Since
> > > > > > SerializationException extends KafkaException, it seems like we
> can
> > > > handle
> > > > > > both cases in a compatible way by handling both cases with a
> single
> > > > type
> > > > > > that extends SerializationException. Maybe something like
> > > > > > RecordDeserializationException?
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > +1
> > > > > > > -------- Original message --------
> > > > > > > From: Stanislav Kozlovski <stanis...@confluent.io>
> > > > > > > Date: 8/2/18  2:41 AM  (GMT-08:00)
> > > > > > > To: dev@kafka.apache.org
> > > > > > > Subject: [VOTE] KIP-334 Include partitions in exceptions raised
> > > > > > > during consumer record deserialization/validation
> > > > > > >
> > > > > > > Hey everybody,
> > > > > > >
> > > > > > > I'd like to start a vote thread for KIP-334 Include partitions
> in
> > > > > > > exceptions raised during consumer record
> > deserialization/validation
> > > > > > > <
> > > > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> >
>


-- 
Best,
Stanislav
