I am not 100% familiar with the details of the consumer code; however, I
tend to disagree with:

> There's no difference between the two cases -- if (and only if) the message 
> is corrupt, it can't be deserialized.  If (and only if) it can't be 
> deserialized, it is corrupt.

Assume that a user configures a JSON deserializer but a faulty upstream
producer writes an Avro message. In this case, the message is not
corrupted but still can't be deserialized, and I can imagine that users
want to handle both cases differently.
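
To make this concrete, here is a minimal sketch (the deserializer class
is made up for illustration, not part of any KIP): the bytes pass the
consumer's CRC check, so the record is not corrupt, yet the configured
deserializer cannot parse them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

// Illustrative only: a strict JSON deserializer. An Avro-encoded payload
// passes the consumer's CRC check (it is not corrupt), but fails here.
public class StrictJsonDeserializer implements Deserializer<String> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // no configuration needed for this sketch
    }

    @Override
    public String deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        String text = new String(data, StandardCharsets.UTF_8);
        // Avro binary output will typically not look like a JSON document.
        if (!text.startsWith("{") && !text.startsWith("[")) {
            throw new SerializationException("Payload on topic " + topic + " is not JSON");
        }
        return text;
    }

    @Override
    public void close() {
        // nothing to release
    }
}
```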

Thus, I think it makes sense to have two different exceptions,
`RecordDeserializationException` and `CorruptedRecordException`, that
both extend `FaultyRecordException` (I don't like this name too much,
honestly, but I don't have a better idea either).
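
Just to sketch what I have in mind (the constructors and the
partition/offset accessors are my assumption of what the KIP could
expose, not a final API):

```java
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.SerializationException;

// Sketch: a common base type carrying the failed record's position.
abstract class FaultyRecordException extends SerializationException {
    private final TopicPartition partition;
    private final long offset;

    FaultyRecordException(TopicPartition partition, long offset, String message) {
        super(message);
        this.partition = partition;
        this.offset = offset;
    }

    public TopicPartition partition() {
        return partition;
    }

    public long offset() {
        return offset;
    }
}

// Raised when the configured deserializer fails on an otherwise valid record.
class RecordDeserializationException extends FaultyRecordException {
    RecordDeserializationException(TopicPartition partition, long offset, String message) {
        super(partition, offset, message);
    }
}

// Raised when the record fails its checksum validation.
class CorruptedRecordException extends FaultyRecordException {
    CorruptedRecordException(TopicPartition partition, long offset, String message) {
        super(partition, offset, message);
    }
}
```

Users could then catch `FaultyRecordException` for a generic "skip this
record" handler, or either subclass when the cause matters.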

Side remark: if we introduce the classes `RecordDeserializationException`
and `CorruptedRecordException`, we could also add an interface that both
implement to expose partition/offset information, and let both extend
`SerializationException` directly, without an intermediate class in the
exception hierarchy.
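
A sketch of that alternative (the interface name and accessors are again
made up), together with how a consumer might skip past the offending
record, as discussed further down the thread:

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.SerializationException;

// Sketch: a shared accessor interface instead of an intermediate base class.
interface RecordPositionAware {
    TopicPartition partition();
    long offset();
}

// Each exception extends SerializationException directly and implements
// the interface; only one is shown here.
class RecordDeserializationException extends SerializationException
        implements RecordPositionAware {
    private final TopicPartition partition;
    private final long offset;

    RecordDeserializationException(TopicPartition partition, long offset, String message) {
        super(message);
        this.partition = partition;
        this.offset = offset;
    }

    @Override
    public TopicPartition partition() {
        return partition;
    }

    @Override
    public long offset() {
        return offset;
    }
}

class SkipBadRecordExample {
    // Poll once; if a record cannot be read, seek past it and move on.
    static void pollOnce(Consumer<String, String> consumer) {
        try {
            consumer.poll(Duration.ofMillis(100));
            // ... process the returned records ...
        } catch (SerializationException e) {
            // An interface type cannot appear in a catch clause, so an
            // instanceof check is needed here.
            if (!(e instanceof RecordPositionAware)) {
                throw e;
            }
            RecordPositionAware position = (RecordPositionAware) e;
            consumer.seek(position.partition(), position.offset() + 1);
        }
    }
}
```

The trade-off is visible in the catch block: since an interface type
cannot be caught directly, callers need an instanceof check, which is
exactly the concern Jason raises below.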


-Matthias

On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
>> If you are inheriting from SerializationException, your derived class
>> should also be a kind of serialization exception.  Not something more
>> general.
> Yeah, the reason for inheriting it would be for backwards-compatibility.
> 
>> Hmm.  Can you think of any new scenarios that would make Kafka force
>> the user to skip a specific record?  Perhaps one scenario is if records
>> are lost but we don't know how many.
> Not on the spot, but I do wonder how likely a new scenario is to surface in
> the future and how we'd handle the exceptions' class hierarchy then.
> 
>> Which offset were we planning to use in the exception?
> The offset of the record which caused the exception. In the case of
> batches, we use the last offset of the batch. In both cases, users would
> have to seek +1 from the given offset. You can review the PR to ensure
> it's accurate.
> 
> 
> If both of you prefer `RecordDeserializationException`, we can go with
> that. Please do confirm that is okay.
> 
> On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io> wrote:
> 
>> One difference between the two cases is that we can't generally trust the
>> offset of a corrupt message. Which offset were we planning to use in the
>> exception? Maybe it should be either the fetch offset or one plus the last
>> consumed offset? I think I'm with Colin in preferring
>> RecordDeserializationException for both cases if possible. For one thing,
>> that makes the behavior consistent whether or not `check.crcs` is enabled.
>>
>> -Jason
>>
>> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> wrote:
>>
>>> Hi Stanislav,
>>>
>>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
>>>> Hey Colin,
>>>>
>>>> It may be a bit vague but keep in mind we also raise the exception in
>>>> the case where the record is corrupted.
>>>> We discussed with Jason offline that message corruption most often
>>>> prevents deserialization itself and that may be enough of an argument
>>>> to raise `RecordDeserializationException` in the case of a corrupt
>>>> record. I personally find that misleading.
>>>
>>> Hmm.  I think that by definition, corrupt records are records that can't
>>> be deserialized.  There's no difference between the two cases -- if (and
>>> only if) the message is corrupt, it can't be deserialized.  If (and only
>>> if) it can't be deserialized, it is corrupt.
>>>
>>>>
>>>> In the end, I think it might be worth it to have a bit of a
>>>> wider-encompassing `FaultyRecordException` (or even
>>>> `UnconsumableRecordException`) which would allow users to skip over the
>>>> record.
>>>
>>> If you are inheriting from SerializationException, your derived class
>>> should also be a kind of serialization exception.  Not something more
>>> general.
>>>
>>>> We could then potentially have more specific exceptions (e.g.
>>>> deserialization) inherit that, but I'm not sure if we should.
>>>> A case for a more general exception which provides access to the
>>>> partition/offset is future backwards-compatibility. If there is ever a
>>>> new scenario that would make the user need to skip a specific record,
>>>> there would already be such an exception, and this would provide some
>>>> backward-compatibility with older clients.
>>>
>>> Hmm.  Can you think of any new scenarios that would make Kafka force
>>> the user to skip a specific record?  Perhaps one scenario is if records
>>> are lost but we don't know how many.
>>>
>>> If we really want to have something more general, perhaps we need
>>> something like LostDataException.  But in that case, we can't inherit
>>> from SerializationException, since lost data is more general than
>>> serialization issues.
>>>
>>> best,
>>> Colin
>>>
>>>
>>>>
>>>> Best,
>>>> Stanislav
>>>>
>>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org> wrote:
>>>>
>>>>> Hi Stanislav,
>>>>>
>>>>> Thanks for the KIP.
>>>>>
>>>>> As a user, "FaultyRecordException" seems a little vague.  What's
>>>>> faulty about it?  Is it too big?  Does it not match the schema of the
>>>>> other records?  Was it not found?
>>>>>
>>>>> Of course, we know that the specific problem is with [de]serializing
>>>>> the record, not with any of those things.  So we should choose a name
>>>>> that reflects that serialization is the problem.  How about
>>>>> RecordSerializationException?
>>>>>
>>>>> best,
>>>>> Colin
>>>>>
>>>>>
>>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
>>>>>> Hi Jason and Ted,
>>>>>>
>>>>>> @Jason
>>>>>> I did not oppose the idea as I'm not too familiar with Java
>>>>>> conventions. I agree it is a non-ideal way for the user to catch the
>>>>>> exception, so I changed it back.
>>>>>>
>>>>>> I've updated the KIP and PR with the latest changes. Now, there is
>>>>>> only one public exception, `FaultyRecordException`, which is thrown
>>>>>> to the user in both scenarios (corrupt record and deserialization
>>>>>> error).
>>>>>> Please take a look again.
>>>>>>
>>>>>> Best,
>>>>>> Stanislav
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io> wrote:
>>>>>>
>>>>>>> Hey Stanislav,
>>>>>>>
>>>>>>> I should have noticed it earlier from your example, but I just
>>>>>>> realized that interfaces don't mix well with exceptions. There is no
>>>>>>> way to catch the interface type, which means you have to depend on
>>>>>>> instanceof checks, which is not very conventional. At the moment, we
>>>>>>> raise SerializationException if there is a problem parsing the
>>>>>>> record, and we raise KafkaException if the record fails its CRC
>>>>>>> check. Since SerializationException extends KafkaException, it seems
>>>>>>> like we can handle both cases in a compatible way with a single type
>>>>>>> that extends SerializationException. Maybe something like
>>>>>>> RecordDeserializationException?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jason
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>> -------- Original message --------
>>>>>>>> From: Stanislav Kozlovski <stanis...@confluent.io>
>>>>>>>> Date: 8/2/18  2:41 AM  (GMT-08:00)
>>>>>>>> To: dev@kafka.apache.org
>>>>>>>> Subject: [VOTE] KIP-334 Include partitions in exceptions raised
>>>>>>>> during consumer record deserialization/validation
>>>>>>>> Hey everybody,
>>>>>>>>
>>>>>>>> I'd like to start a vote thread for KIP-334 Include partitions in
>>>>>>>> exceptions raised during consumer record deserialization/validation
>>>>>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best,
>>>>>>>> Stanislav
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best,
>>>>>> Stanislav
>>>>>
>>>>
>>>>
>>>> --
>>>> Best,
>>>> Stanislav
>>>
>>
> 
> 
