Hi Stanislav, Sorry for the late comment. I'm happy with the current proposal. Just one small request is to include an example which shows how a user could use this to skip over a record.
I'd suggest pushing this to a vote to see if anyone else has feedback. Thanks, Jason On Sat, Jul 13, 2019 at 2:27 PM Stanislav Kozlovski <stanis...@confluent.io> wrote: > Hello, > > Could we restart the discussion here again? > > My last message was sent on the 3rd of June but I haven't received replies > since then. > > I'd like to get this KIP to a finished state, regardless of whether that is > merged-in or discarded. It has been almost one year since the publication > of the KIP. > > Thanks, > Stanislav > > On Mon, Jun 3, 2019 at 11:19 AM Stanislav Kozlovski < > stanis...@confluent.io> > wrote: > > > Do people agree with the approach I outlined in my last reply? > > > > On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski < > stanis...@confluent.io> > > wrote: > > > >> Hey there Kamal, > >> > >> I'm sincerely sorry for missing your earlier message. As I open this > >> thread up, I see I have an unsent draft message about resuming > discussion > >> from some time ago. > >> > >> In retrospect, I think I may have been too pedantic with the exception > >> naming and hierarchy. > >> I now believe a single exception type of > `RecordDeserializationException` > >> is enough. Let's go with that. > >> > >> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash < > >> kamal.chandraprak...@gmail.com> wrote: > >> > >>> Matthias, > >>> > >>> We already have CorruptRecordException which doesn't extend the > >>> SerializationException. So, we need an alternate > >>> name suggestion for the corrupted record error if we decide to extend > the > >>> FaultyRecordException class. > >>> > >>> Stanislav, > >>> > >>> Our users are also facing this error. Could we bump up this discussion? > >>> > >>> I think we can have a single exception type > >>> FaultyRecordException/RecordDeserialization exception to capture both > >>> the errors. We can add an additional enum field to differentiate the > >>> errors > >>> if required. > >>> > >>> Thanks, > >>> Kamal Chandraprakash > >>> > >>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash < > >>> kamal.chandraprak...@gmail.com> wrote: > >>> > >>> > Stanislav, > >>> > > >>> > Any updates on this KIP? We have internal users who want to skip the > >>> > corrupted message while consuming the records. > >>> > > >>> > > >>> > On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax < > >>> matth...@confluent.io> > >>> > wrote: > >>> > > >>> >> I am not 100% familiar with the details of the consumer code, > however > >>> I > >>> >> tend to disagree with: > >>> >> > >>> >> > There's no difference between the two cases -- if (and only if) > the > >>> >> message is corrupt, it can't be deserialized. If (and only if) it > >>> can't be > >>> >> deserialized, it is corrupt. > >>> >> > >>> >> Assume that a user configures a JSON deserializer but a faulty > >>> upstream > >>> >> producer writes an Avro message. For this case, the message is not > >>> >> corrupted, but still can't be deserialized. And I can imaging that > >>> users > >>> >> want to handle both cases differently. > >>> >> > >>> >> Thus, I think it makes sense to have two different exceptions > >>> >> `RecordDeserializationException` and `CorruptedRecordException` that > >>> can > >>> >> both extend `FaultyRecordException` (don't like this name too much > >>> >> honestly, but don't have a better idea for it anyway). > >>> >> > >>> >> Side remark. If we introduce class `RecordDeserializationException` > >>> and > >>> >> `CorruptedRecordException`, we can also add an interface that both > >>> >> implement to return partition/offset information and let both extend > >>> >> `SerializationException` directly without an intermediate class in > the > >>> >> exception hierarchy. > >>> >> > >>> >> > >>> >> -Matthias > >>> >> > >>> >> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote: > >>> >> >> If you are inheriting from SerializationException, your derived > >>> class > >>> >> > should also be a kind of serialization exception. Not something > >>> more > >>> >> > general. > >>> >> > Yeah, the reason for inheriting it would be for > >>> backwards-compatibility. > >>> >> > > >>> >> >> Hmm. Can you think of any new scenarios that would make Kafka > >>> force > >>> >> the > >>> >> > user need to skip a specific record? Perhaps one scenario is if > >>> records > >>> >> > are lost but we don't know how many. > >>> >> > Not on the spot, but I do wonder how likely a new scenario is to > >>> >> surface in > >>> >> > the future and how we'd handle the exceptions' class hierarchy > then. > >>> >> > > >>> >> >> Which offset were we planning to use in the > >>> >> > exception? > >>> >> > The offset of the record which caused the exception. In the case > of > >>> >> > batches, we use the last offset of the batch. In both cases, users > >>> >> should > >>> >> > have to seek +1 from the given offset. You can review the PR to > >>> ensure > >>> >> its > >>> >> > accurate > >>> >> > > >>> >> > > >>> >> > If both of you prefer `RecordDeserializationException`, we can go > >>> with > >>> >> > that. Please do confirm that is okay > >>> >> > > >>> >> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson < > ja...@confluent.io > >>> > > >>> >> wrote: > >>> >> > > >>> >> >> One difference between the two cases is that we can't generally > >>> trust > >>> >> the > >>> >> >> offset of a corrupt message. Which offset were we planning to use > >>> in > >>> >> the > >>> >> >> exception? Maybe it should be either the fetch offset or one plus > >>> the > >>> >> last > >>> >> >> consumed offset? I think I'm with Colin in preferring > >>> >> >> RecordDeserializationException for both cases if possible. For > one > >>> >> thing, > >>> >> >> that makes the behavior consistent whether or not `check.crs` is > >>> >> enabled. > >>> >> >> > >>> >> >> -Jason > >>> >> >> > >>> >> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe < > cmcc...@apache.org> > >>> >> wrote: > >>> >> >> > >>> >> >>> Hi Stanislav, > >>> >> >>> > >>> >> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote: > >>> >> >>>> Hey Colin, > >>> >> >>>> > >>> >> >>>> It may be a bit vague but keep in mind we also raise the > >>> exception in > >>> >> >> the > >>> >> >>>> case where the record is corrupted. > >>> >> >>>> We discussed with Jason offline that message corruption most > >>> often > >>> >> >>> prevents > >>> >> >>>> deserialization itself and that may be enough of an argument to > >>> raise > >>> >> >>>> `RecordDeserializationException` in the case of a corrupt > >>> record. I > >>> >> >>>> personally find that misleading. > >>> >> >>> > >>> >> >>> Hmm. I think that by definition, corrupt records are records > that > >>> >> can't > >>> >> >>> be deserialized. There's no difference between the two cases -- > >>> if > >>> >> (and > >>> >> >>> only if) the message is corrupt, it can't be deserialized. If > >>> (and > >>> >> only > >>> >> >>> if) it can't be deserialized, it is corrupt. > >>> >> >>> > >>> >> >>>> > >>> >> >>>> In the end, I think it might be worth it to have a bit of a > >>> >> >>>> wider-encompassing `FaultyRecordException` (or even > >>> >> >>>> `UnconsumableRecordException`) which would allow users to skip > >>> over > >>> >> the > >>> >> >>>> record. > >>> >> >>> > >>> >> >>> If you are inheriting from SerializationException, your derived > >>> class > >>> >> >>> should also be a kind of serialization exception. Not something > >>> more > >>> >> >>> general. > >>> >> >>> > >>> >> >>>> We could then potentially have more specific exceptions (e.g > >>> >> >>>> deserialization) inherit that but I'm not sure if we should. > >>> >> >>>> A case for a more general exception which provides access to > the > >>> >> >>>> partition/offset is future backwards-compatibility. If there is > >>> ever > >>> >> a > >>> >> >>> new > >>> >> >>>> scenario that would make the user need to skip a specific > record > >>> - > >>> >> >> there > >>> >> >>>> would already be such an exception and this will provide some > >>> >> >>>> backward-compatibility with older clients. > >>> >> >>> > >>> >> >>> Hmm. Can you think of any new scenarios that would make Kafka > >>> force > >>> >> the > >>> >> >>> user need to skip a specific record? Perhaps one scenario is if > >>> >> records > >>> >> >>> are lost but we don't know how many. > >>> >> >>> > >>> >> >>> If we really want to have something more general, perhaps we > need > >>> >> >>> something like LostDataException. But in that case, we can't > >>> inherit > >>> >> >> from > >>> >> >>> SerializationException, since lost data is more general than > >>> >> >> serialization > >>> >> >>> issues. > >>> >> >>> > >>> >> >>> best, > >>> >> >>> Colin > >>> >> >>> > >>> >> >>> > >>> >> >>>> > >>> >> >>>> Best, > >>> >> >>>> Stanislav > >>> >> >>>> > >>> >> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe < > cmcc...@apache.org > >>> > > >>> >> >> wrote: > >>> >> >>>> > >>> >> >>>>> Hi Stanislav, > >>> >> >>>>> > >>> >> >>>>> Thanks for the KIP. > >>> >> >>>>> > >>> >> >>>>> As as user, "FaultyRecordException" seems a little vague. > >>> What's > >>> >> >>> faulty > >>> >> >>>>> about it? Is it too big? Does it not match the schema of the > >>> other > >>> >> >>>>> records? Was it not found? > >>> >> >>>>> > >>> >> >>>>> Of course, we know that the specific problem is with > >>> [de]serializing > >>> >> >>> the > >>> >> >>>>> record, not with any of those those things. So we should > >>> choose a > >>> >> >> name > >>> >> >>>>> that reflects that serialization is the problem. How about > >>> >> >>>>> RecordSerializationException? > >>> >> >>>>> > >>> >> >>>>> best, > >>> >> >>>>> Colin > >>> >> >>>>> > >>> >> >>>>> > >>> >> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote: > >>> >> >>>>>> Hi Jason and Ted, > >>> >> >>>>>> > >>> >> >>>>>> @Jason > >>> >> >>>>>> I did not oppose the idea as I'm not too familiar with Java > >>> >> >>> conventions. > >>> >> >>>>> I > >>> >> >>>>>> agree it is a non-ideal way for the user to catch the > >>> exception so > >>> >> >> I > >>> >> >>>>>> changed it back. > >>> >> >>>>>> > >>> >> >>>>>> I've updated the KIP and PR with the latest changes. Now, > >>> there is > >>> >> >>> only > >>> >> >>>>> one > >>> >> >>>>>> public exception `FaultyRecordException` which is thrown to > the > >>> >> >> user > >>> >> >>> in > >>> >> >>>>>> both scenarios (corrupt record and deserialization error). > >>> >> >>>>>> Please take a look again. > >>> >> >>>>>> > >>> >> >>>>>> Best, > >>> >> >>>>>> Stanislav > >>> >> >>>>>> > >>> >> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson < > >>> ja...@confluent.io > >>> >> >>> > >>> >> >>>>> wrote: > >>> >> >>>>>> > >>> >> >>>>>>> Hey Stanislav, > >>> >> >>>>>>> > >>> >> >>>>>>> I should have noticed it earlier from your example, but I > just > >>> >> >>> realized > >>> >> >>>>>>> that interfaces don't mix well with exceptions. There is no > >>> way > >>> >> >> to > >>> >> >>>>> catch > >>> >> >>>>>>> the interface type, which means you have to depend on > >>> instanceof > >>> >> >>>>> checks, > >>> >> >>>>>>> which is not very conventional. At the moment, we raise > >>> >> >>>>>>> SerializationException if there is a problem parsing the > >>> error, > >>> >> >>> and we > >>> >> >>>>>>> raise KafkaException if the record fails its CRC check. > Since > >>> >> >>>>>>> SerializationException extends KafkaExeption, it seems like > we > >>> >> >> can > >>> >> >>>>> handle > >>> >> >>>>>>> both cases in a compatible way by handling both cases with a > >>> >> >> single > >>> >> >>>>> type > >>> >> >>>>>>> that extends SerializationException. Maybe something like > >>> >> >>>>>>> RecordDeserializationException? > >>> >> >>>>>>> > >>> >> >>>>>>> Thanks, > >>> >> >>>>>>> Jason > >>> >> >>>>>>> > >>> >> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com > > > >>> >> >>> wrote: > >>> >> >>>>>>> > >>> >> >>>>>>>> +1 > >>> >> >>>>>>>> -------- Original message --------From: Stanislav > Kozlovski < > >>> >> >>>>>>>> stanis...@confluent.io> Date: 8/2/18 2:41 AM (GMT-08:00) > >>> To: > >>> >> >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include > >>> >> >> partitions > >>> >> >>> in > >>> >> >>>>>>>> exceptions raised during consumer record > >>> >> >>> deserialization/validation > >>> >> >>>>>>>> Hey everybody, > >>> >> >>>>>>>> > >>> >> >>>>>>>> I'd like to start a vote thread for KIP-334 Include > >>> partitions > >>> >> >> in > >>> >> >>>>>>>> exceptions raised during consumer record > >>> >> >>> deserialization/validation > >>> >> >>>>>>>> < > >>> >> >>>>>>> > >>> >> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action? > >>> >> >>> pageId=87297793 > >>> >> >>>>>>>>> > >>> >> >>>>>>>> > >>> >> >>>>>>>> -- > >>> >> >>>>>>>> Best, > >>> >> >>>>>>>> Stanislav > >>> >> >>>>>>>> > >>> >> >>>>>>> > >>> >> >>>>>> > >>> >> >>>>>> > >>> >> >>>>>> -- > >>> >> >>>>>> Best, > >>> >> >>>>>> Stanislav > >>> >> >>>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> -- > >>> >> >>>> Best, > >>> >> >>>> Stanislav > >>> >> >>> > >>> >> >> > >>> >> > > >>> >> > > >>> >> > >>> >> > >>> > >> > >> > >> -- > >> Best, > >> Stanislav > >> > > > > > > -- > > Best, > > Stanislav > > > > > -- > Best, > Stanislav >