Do people agree with the approach I outlined in my last reply?

On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski <stanis...@confluent.io> wrote:
> Hey there Kamal,
>
> I'm sincerely sorry for missing your earlier message. As I open this
> thread up, I see I have an unsent draft message about resuming this
> discussion from some time ago.
>
> In retrospect, I think I may have been too pedantic about the exception
> naming and hierarchy. I now believe a single exception type,
> `RecordDeserializationException`, is enough. Let's go with that.
>
> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
>
>> Matthias,
>>
>> We already have CorruptRecordException, which doesn't extend
>> SerializationException. So we need an alternate name suggestion for the
>> corrupted-record error if we decide to extend the FaultyRecordException
>> class.
>>
>> Stanislav,
>>
>> Our users are also facing this error. Could we bump up this discussion?
>>
>> I think we can have a single exception type,
>> FaultyRecordException/RecordDeserializationException, to capture both
>> errors. We can add an additional enum field to differentiate the errors
>> if required.
>>
>> Thanks,
>> Kamal Chandraprakash
>>
>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
>>
>>> Stanislav,
>>>
>>> Any updates on this KIP? We have internal users who want to skip the
>>> corrupted message while consuming the records.
>>>
>>> On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <matth...@confluent.io> wrote:
>>>
>>>> I am not 100% familiar with the details of the consumer code; however,
>>>> I tend to disagree with:
>>>>
>>>>> There's no difference between the two cases -- if (and only if) the
>>>>> message is corrupt, it can't be deserialized. If (and only if) it
>>>>> can't be deserialized, it is corrupt.
>>>>
>>>> Assume that a user configures a JSON deserializer but a faulty
>>>> upstream producer writes an Avro message. In this case, the message is
>>>> not corrupted, but it still can't be deserialized. And I can imagine
>>>> that users want to handle both cases differently.
>>>>
>>>> Thus, I think it makes sense to have two different exceptions,
>>>> `RecordDeserializationException` and `CorruptedRecordException`, that
>>>> can both extend `FaultyRecordException` (I don't like this name too
>>>> much, honestly, but I don't have a better idea for it anyway).
>>>>
>>>> Side remark: if we introduce the classes
>>>> `RecordDeserializationException` and `CorruptedRecordException`, we
>>>> can also add an interface that both implement to return
>>>> partition/offset information, and let both extend
>>>> `SerializationException` directly, without an intermediate class in
>>>> the exception hierarchy.
>>>>
>>>> -Matthias
>>>>
>>>> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
>>>>
>>>>>> If you are inheriting from SerializationException, your derived
>>>>>> class should also be a kind of serialization exception. Not
>>>>>> something more general.
>>>>>
>>>>> Yeah, the reason for inheriting from it would be backwards
>>>>> compatibility.
>>>>>
>>>>>> Hmm. Can you think of any new scenarios that would make Kafka force
>>>>>> the user to skip a specific record? Perhaps one scenario is if
>>>>>> records are lost but we don't know how many.
>>>>>
>>>>> Not on the spot, but I do wonder how likely a new scenario is to
>>>>> surface in the future, and how we'd handle the exceptions' class
>>>>> hierarchy then.
>>>>>
>>>>>> Which offset were we planning to use in the exception?
>>>>>
>>>>> The offset of the record which caused the exception. In the case of
>>>>> batches, we use the last offset of the batch. In both cases, users
>>>>> should have to seek +1 from the given offset. You can review the PR
>>>>> to ensure it's accurate.
>>>>>
>>>>> If both of you prefer `RecordDeserializationException`, we can go
>>>>> with that.
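[Editorial sketch] Matthias's side remark above — two concrete exceptions that extend `SerializationException` directly and implement a shared interface exposing partition/offset information — could look roughly like the following. This is a standalone illustration, not the KIP's actual code: the `SerializationException` base class here is a local stand-in (not `org.apache.kafka.common.errors.SerializationException`), and the interface name `FaultyRecordInfo` and its method names are hypothetical.

```java
// Local stand-in so the sketch compiles without the Kafka client on the
// classpath; the real class lives in org.apache.kafka.common.errors.
class SerializationException extends RuntimeException {
    SerializationException(String message) { super(message); }
}

// The shared interface from Matthias's side remark: both concrete
// exceptions expose the record's position without needing an
// intermediate class in the exception hierarchy.
interface FaultyRecordInfo {
    String topic();
    int partition();
    long offset();
}

class RecordDeserializationException extends SerializationException
        implements FaultyRecordInfo {
    private final String topic;
    private final int partition;
    private final long offset;

    RecordDeserializationException(String topic, int partition, long offset,
                                   String message) {
        super(message);
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
    }

    public String topic() { return topic; }
    public int partition() { return partition; }
    public long offset() { return offset; }
}

public class HierarchySketch {
    public static void main(String[] args) {
        try {
            throw new RecordDeserializationException("orders", 3, 42L,
                    "payload is not valid JSON");
        } catch (SerializationException e) {
            // Existing code catching SerializationException keeps working;
            // new code can check for the info interface to find where to
            // resume consumption.
            if (e instanceof FaultyRecordInfo) {
                FaultyRecordInfo info = (FaultyRecordInfo) e;
                System.out.println(
                        info.topic() + "-" + info.partition() + "@" + info.offset());
            }
        }
    }
}
```

As Jason points out later in the thread, the catch block itself can only name a class, so the interface is useful for reading the position, not for catching.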
>>>>> Please do confirm that is okay.
>>>>>
>>>>> On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io> wrote:
>>>>>
>>>>>> One difference between the two cases is that we can't generally
>>>>>> trust the offset of a corrupt message. Which offset were we planning
>>>>>> to use in the exception? Maybe it should be either the fetch offset,
>>>>>> or one plus the last consumed offset? I think I'm with Colin in
>>>>>> preferring RecordDeserializationException for both cases if
>>>>>> possible. For one thing, that makes the behavior consistent whether
>>>>>> or not `check.crcs` is enabled.
>>>>>>
>>>>>> -Jason
>>>>>>
>>>>>> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Stanislav,
>>>>>>>
>>>>>>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
>>>>>>>
>>>>>>>> Hey Colin,
>>>>>>>>
>>>>>>>> It may be a bit vague, but keep in mind that we also raise the
>>>>>>>> exception in the case where the record is corrupted.
>>>>>>>> We discussed with Jason offline that message corruption most often
>>>>>>>> prevents deserialization itself, and that may be enough of an
>>>>>>>> argument to raise `RecordDeserializationException` in the case of
>>>>>>>> a corrupt record. I personally find that misleading.
>>>>>>>
>>>>>>> Hmm. I think that by definition, corrupt records are records that
>>>>>>> can't be deserialized. There's no difference between the two cases
>>>>>>> -- if (and only if) the message is corrupt, it can't be
>>>>>>> deserialized. If (and only if) it can't be deserialized, it is
>>>>>>> corrupt.
>>>>>>>
>>>>>>>> In the end, I think it might be worth it to have a somewhat
>>>>>>>> wider-encompassing `FaultyRecordException` (or even
>>>>>>>> `UnconsumableRecordException`) which would allow users to skip
>>>>>>>> over the record.
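[Editorial sketch] The skip-the-record pattern discussed in the thread — the exception carries the failed offset, and the user seeks one past it to keep consuming — can be simulated end to end with a toy consumer. Everything here (`ToyConsumer`, `PoisonPillException`, the `poll`/`seek` signatures) is a simplified stand-in for illustration, not the Kafka consumer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Stand-in for the exception proposed in the KIP: it carries the offset
// of the record that could not be consumed.
class PoisonPillException extends RuntimeException {
    final long offset;
    PoisonPillException(long offset) { this.offset = offset; }
}

// Toy consumer: poll() returns one record at a time and throws when it
// hits a corrupt record; seek() moves the position, like KafkaConsumer#seek.
class ToyConsumer {
    private final List<String> log = Arrays.asList("a", "b", "<corrupt>", "d");
    private long position = 0;

    Optional<String> poll() {
        if (position >= log.size()) return Optional.empty();
        String record = log.get((int) position);
        if (record.equals("<corrupt>")) throw new PoisonPillException(position);
        position++;
        return Optional.of(record);
    }

    void seek(long offset) { position = offset; }
}

public class SkipSketch {
    public static void main(String[] args) {
        ToyConsumer consumer = new ToyConsumer();
        List<String> consumed = new ArrayList<>();
        while (true) {
            try {
                Optional<String> record = consumer.poll();
                if (!record.isPresent()) break;
                consumed.add(record.get());
            } catch (PoisonPillException e) {
                // The exception exposes the failed offset; seeking one past
                // it skips the bad record and resumes consumption.
                consumer.seek(e.offset + 1);
            }
        }
        System.out.println(consumed); // [a, b, d]
    }
}
```

Note Jason's caveat applies to the real client: for a corrupt batch the failed offset cannot always be trusted, which is why the thread debates using the fetch offset or the last offset of the batch.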
>>>>>>> If you are inheriting from SerializationException, your derived
>>>>>>> class should also be a kind of serialization exception. Not
>>>>>>> something more general.
>>>>>>>
>>>>>>>> We could then potentially have more specific exceptions (e.g.
>>>>>>>> deserialization) inherit from it, but I'm not sure if we should.
>>>>>>>> A case for a more general exception which provides access to the
>>>>>>>> partition/offset is future backwards compatibility. If there is
>>>>>>>> ever a new scenario that would make the user need to skip a
>>>>>>>> specific record, there would already be such an exception, and
>>>>>>>> this would provide some backwards compatibility with older
>>>>>>>> clients.
>>>>>>>
>>>>>>> Hmm. Can you think of any new scenarios that would make Kafka force
>>>>>>> the user to skip a specific record? Perhaps one scenario is if
>>>>>>> records are lost but we don't know how many.
>>>>>>>
>>>>>>> If we really want to have something more general, perhaps we need
>>>>>>> something like LostDataException. But in that case, we can't
>>>>>>> inherit from SerializationException, since lost data is more
>>>>>>> general than serialization issues.
>>>>>>>
>>>>>>> best,
>>>>>>> Colin
>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stanislav
>>>>>>>>
>>>>>>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Stanislav,
>>>>>>>>>
>>>>>>>>> Thanks for the KIP.
>>>>>>>>>
>>>>>>>>> As a user, "FaultyRecordException" seems a little vague. What's
>>>>>>>>> faulty about it? Is it too big? Does it not match the schema of
>>>>>>>>> the other records? Was it not found?
>>>>>>>>>
>>>>>>>>> Of course, we know that the specific problem is with
>>>>>>>>> [de]serializing the record, not with any of those things. So we
>>>>>>>>> should choose a name that reflects that serialization is the
>>>>>>>>> problem. How about RecordSerializationException?
>>>>>>>>>
>>>>>>>>> best,
>>>>>>>>> Colin
>>>>>>>>>
>>>>>>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jason and Ted,
>>>>>>>>>>
>>>>>>>>>> @Jason
>>>>>>>>>> I did not oppose the idea, as I'm not too familiar with Java
>>>>>>>>>> conventions. I agree it is a non-ideal way for the user to catch
>>>>>>>>>> the exception, so I changed it back.
>>>>>>>>>>
>>>>>>>>>> I've updated the KIP and PR with the latest changes. Now there
>>>>>>>>>> is only one public exception, `FaultyRecordException`, which is
>>>>>>>>>> thrown to the user in both scenarios (corrupt record and
>>>>>>>>>> deserialization error). Please take a look again.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Stanislav
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Stanislav,
>>>>>>>>>>>
>>>>>>>>>>> I should have noticed it earlier from your example, but I just
>>>>>>>>>>> realized that interfaces don't mix well with exceptions. There
>>>>>>>>>>> is no way to catch the interface type, which means you have to
>>>>>>>>>>> depend on instanceof checks, which is not very conventional. At
>>>>>>>>>>> the moment, we raise SerializationException if there is a
>>>>>>>>>>> problem parsing the record, and we raise KafkaException if the
>>>>>>>>>>> record fails its CRC check. Since SerializationException
>>>>>>>>>>> extends KafkaException, it seems like we can handle both cases
>>>>>>>>>>> in a compatible way with a single type that extends
>>>>>>>>>>> SerializationException. Maybe something like
>>>>>>>>>>> RecordDeserializationException?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jason
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> -------- Original message --------
>>>>>>>>>>>> From: Stanislav Kozlovski <stanis...@confluent.io>
>>>>>>>>>>>> Date: 8/2/18 2:41 AM (GMT-08:00)
>>>>>>>>>>>> To: dev@kafka.apache.org
>>>>>>>>>>>> Subject: [VOTE] KIP-334 Include partitions in exceptions
>>>>>>>>>>>> raised during consumer record deserialization/validation
>>>>>>>>>>>>
>>>>>>>>>>>> Hey everybody,
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to start a vote thread for KIP-334: Include
>>>>>>>>>>>> partitions in exceptions raised during consumer record
>>>>>>>>>>>> deserialization/validation
>>>>>>>>>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Stanislav
>
> --
> Best,
> Stanislav

--
Best,
Stanislav
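[Editorial sketch] Jason's compatibility argument above — a single new type extending `SerializationException`, which in turn extends `KafkaException`, is caught by code written against either pre-existing catch path — can be demonstrated with local stub classes. These are stand-ins for illustration only, not the real Kafka exception classes.

```java
// Stand-ins mirroring the hierarchy Jason describes:
// RecordDeserializationException -> SerializationException -> KafkaException.
class KafkaException extends RuntimeException {}
class SerializationException extends KafkaException {}
class RecordDeserializationException extends SerializationException {}

public class CompatSketch {
    // Simulates an application's existing error handling, with separate
    // catch paths for the two old exception types.
    static String classify(RuntimeException e) {
        try {
            throw e;
        } catch (SerializationException se) {
            return "serialization";
        } catch (KafkaException ke) {
            return "kafka";
        }
    }

    public static void main(String[] args) {
        // Code that caught SerializationException (the old deserialization
        // failure path) also catches the new, more specific type...
        System.out.println(classify(new RecordDeserializationException()));
        // ...while code that only caught KafkaException (the old CRC-failure
        // path) still works for generic Kafka errors, so behavior stays
        // consistent whether or not check.crcs is enabled.
        System.out.println(classify(new KafkaException()));
    }
}
```

This is also why the interface-based design was dropped: `catch` clauses can only name classes, so only a common superclass gives existing handlers the new exception for free.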