Thanks Jason. I've edited the KIP with the latest proposal. On Thu, Oct 10, 2019 at 2:00 AM Jason Gustafson <ja...@confluent.io> wrote:
> Hi Stanislav, > > Sorry for the late comment. I'm happy with the current proposal. Just one > small request is to include an example which shows how a user could use > this to skip over a record. > > I'd suggest pushing this to a vote to see if anyone else has feedback. > > Thanks, > Jason > > On Sat, Jul 13, 2019 at 2:27 PM Stanislav Kozlovski < > stanis...@confluent.io> > wrote: > > > Hello, > > > > Could we restart the discussion here again? > > > > My last message was sent on the 3rd of June but I haven't received > replies > > since then. > > > > I'd like to get this KIP to a finished state, regardless of whether that > is > > merged-in or discarded. It has been almost one year since the publication > > of the KIP. > > > > Thanks, > > Stanislav > > > > On Mon, Jun 3, 2019 at 11:19 AM Stanislav Kozlovski < > > stanis...@confluent.io> > > wrote: > > > > > Do people agree with the approach I outlined in my last reply? > > > > > > On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski < > > stanis...@confluent.io> > > > wrote: > > > > > >> Hey there Kamal, > > >> > > >> I'm sincerely sorry for missing your earlier message. As I open this > > >> thread up, I see I have an unsent draft message about resuming > > discussion > > >> from some time ago. > > >> > > >> In retrospect, I think I may have been too pedantic with the exception > > >> naming and hierarchy. > > >> I now believe a single exception type of > > `RecordDeserializationException` > > >> is enough. Let's go with that. > > >> > > >> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash < > > >> kamal.chandraprak...@gmail.com> wrote: > > >> > > >>> Matthias, > > >>> > > >>> We already have CorruptRecordException which doesn't extend the > > >>> SerializationException. So, we need an alternate > > >>> name suggestion for the corrupted record error if we decide to extend > > the > > >>> FaultyRecordException class. > > >>> > > >>> Stanislav, > > >>> > > >>> Our users are also facing this error. Could we bump up this > discussion? > > >>> > > >>> I think we can have a single exception type > > >>> FaultyRecordException/RecordDeserialization exception to capture both > > >>> the errors. We can add an additional enum field to differentiate the > > >>> errors > > >>> if required. > > >>> > > >>> Thanks, > > >>> Kamal Chandraprakash > > >>> > > >>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash < > > >>> kamal.chandraprak...@gmail.com> wrote: > > >>> > > >>> > Stanislav, > > >>> > > > >>> > Any updates on this KIP? We have internal users who want to skip > the > > >>> > corrupted message while consuming the records. > > >>> > > > >>> > > > >>> > On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax < > > >>> matth...@confluent.io> > > >>> > wrote: > > >>> > > > >>> >> I am not 100% familiar with the details of the consumer code, > > however > > >>> I > > >>> >> tend to disagree with: > > >>> >> > > >>> >> > There's no difference between the two cases -- if (and only if) > > the > > >>> >> message is corrupt, it can't be deserialized. If (and only if) it > > >>> can't be > > >>> >> deserialized, it is corrupt. > > >>> >> > > >>> >> Assume that a user configures a JSON deserializer but a faulty > > >>> upstream > > >>> >> producer writes an Avro message. For this case, the message is not > > >>> >> corrupted, but still can't be deserialized. And I can imaging that > > >>> users > > >>> >> want to handle both cases differently. > > >>> >> > > >>> >> Thus, I think it makes sense to have two different exceptions > > >>> >> `RecordDeserializationException` and `CorruptedRecordException` > that > > >>> can > > >>> >> both extend `FaultyRecordException` (don't like this name too much > > >>> >> honestly, but don't have a better idea for it anyway). > > >>> >> > > >>> >> Side remark. If we introduce class > `RecordDeserializationException` > > >>> and > > >>> >> `CorruptedRecordException`, we can also add an interface that both > > >>> >> implement to return partition/offset information and let both > extend > > >>> >> `SerializationException` directly without an intermediate class in > > the > > >>> >> exception hierarchy. > > >>> >> > > >>> >> > > >>> >> -Matthias > > >>> >> > > >>> >> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote: > > >>> >> >> If you are inheriting from SerializationException, your derived > > >>> class > > >>> >> > should also be a kind of serialization exception. Not something > > >>> more > > >>> >> > general. > > >>> >> > Yeah, the reason for inheriting it would be for > > >>> backwards-compatibility. > > >>> >> > > > >>> >> >> Hmm. Can you think of any new scenarios that would make Kafka > > >>> force > > >>> >> the > > >>> >> > user need to skip a specific record? Perhaps one scenario is if > > >>> records > > >>> >> > are lost but we don't know how many. > > >>> >> > Not on the spot, but I do wonder how likely a new scenario is to > > >>> >> surface in > > >>> >> > the future and how we'd handle the exceptions' class hierarchy > > then. > > >>> >> > > > >>> >> >> Which offset were we planning to use in the > > >>> >> > exception? > > >>> >> > The offset of the record which caused the exception. In the case > > of > > >>> >> > batches, we use the last offset of the batch. In both cases, > users > > >>> >> should > > >>> >> > have to seek +1 from the given offset. You can review the PR to > > >>> ensure > > >>> >> its > > >>> >> > accurate > > >>> >> > > > >>> >> > > > >>> >> > If both of you prefer `RecordDeserializationException`, we can > go > > >>> with > > >>> >> > that. Please do confirm that is okay > > >>> >> > > > >>> >> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson < > > ja...@confluent.io > > >>> > > > >>> >> wrote: > > >>> >> > > > >>> >> >> One difference between the two cases is that we can't generally > > >>> trust > > >>> >> the > > >>> >> >> offset of a corrupt message. Which offset were we planning to > use > > >>> in > > >>> >> the > > >>> >> >> exception? Maybe it should be either the fetch offset or one > plus > > >>> the > > >>> >> last > > >>> >> >> consumed offset? I think I'm with Colin in preferring > > >>> >> >> RecordDeserializationException for both cases if possible. For > > one > > >>> >> thing, > > >>> >> >> that makes the behavior consistent whether or not `check.crs` > is > > >>> >> enabled. > > >>> >> >> > > >>> >> >> -Jason > > >>> >> >> > > >>> >> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe < > > cmcc...@apache.org> > > >>> >> wrote: > > >>> >> >> > > >>> >> >>> Hi Stanislav, > > >>> >> >>> > > >>> >> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote: > > >>> >> >>>> Hey Colin, > > >>> >> >>>> > > >>> >> >>>> It may be a bit vague but keep in mind we also raise the > > >>> exception in > > >>> >> >> the > > >>> >> >>>> case where the record is corrupted. > > >>> >> >>>> We discussed with Jason offline that message corruption most > > >>> often > > >>> >> >>> prevents > > >>> >> >>>> deserialization itself and that may be enough of an argument > to > > >>> raise > > >>> >> >>>> `RecordDeserializationException` in the case of a corrupt > > >>> record. I > > >>> >> >>>> personally find that misleading. > > >>> >> >>> > > >>> >> >>> Hmm. I think that by definition, corrupt records are records > > that > > >>> >> can't > > >>> >> >>> be deserialized. There's no difference between the two cases > -- > > >>> if > > >>> >> (and > > >>> >> >>> only if) the message is corrupt, it can't be deserialized. If > > >>> (and > > >>> >> only > > >>> >> >>> if) it can't be deserialized, it is corrupt. > > >>> >> >>> > > >>> >> >>>> > > >>> >> >>>> In the end, I think it might be worth it to have a bit of a > > >>> >> >>>> wider-encompassing `FaultyRecordException` (or even > > >>> >> >>>> `UnconsumableRecordException`) which would allow users to > skip > > >>> over > > >>> >> the > > >>> >> >>>> record. > > >>> >> >>> > > >>> >> >>> If you are inheriting from SerializationException, your > derived > > >>> class > > >>> >> >>> should also be a kind of serialization exception. Not > something > > >>> more > > >>> >> >>> general. > > >>> >> >>> > > >>> >> >>>> We could then potentially have more specific exceptions (e.g > > >>> >> >>>> deserialization) inherit that but I'm not sure if we should. > > >>> >> >>>> A case for a more general exception which provides access to > > the > > >>> >> >>>> partition/offset is future backwards-compatibility. If there > is > > >>> ever > > >>> >> a > > >>> >> >>> new > > >>> >> >>>> scenario that would make the user need to skip a specific > > record > > >>> - > > >>> >> >> there > > >>> >> >>>> would already be such an exception and this will provide some > > >>> >> >>>> backward-compatibility with older clients. > > >>> >> >>> > > >>> >> >>> Hmm. Can you think of any new scenarios that would make Kafka > > >>> force > > >>> >> the > > >>> >> >>> user need to skip a specific record? Perhaps one scenario is > if > > >>> >> records > > >>> >> >>> are lost but we don't know how many. > > >>> >> >>> > > >>> >> >>> If we really want to have something more general, perhaps we > > need > > >>> >> >>> something like LostDataException. But in that case, we can't > > >>> inherit > > >>> >> >> from > > >>> >> >>> SerializationException, since lost data is more general than > > >>> >> >> serialization > > >>> >> >>> issues. > > >>> >> >>> > > >>> >> >>> best, > > >>> >> >>> Colin > > >>> >> >>> > > >>> >> >>> > > >>> >> >>>> > > >>> >> >>>> Best, > > >>> >> >>>> Stanislav > > >>> >> >>>> > > >>> >> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe < > > cmcc...@apache.org > > >>> > > > >>> >> >> wrote: > > >>> >> >>>> > > >>> >> >>>>> Hi Stanislav, > > >>> >> >>>>> > > >>> >> >>>>> Thanks for the KIP. > > >>> >> >>>>> > > >>> >> >>>>> As as user, "FaultyRecordException" seems a little vague. > > >>> What's > > >>> >> >>> faulty > > >>> >> >>>>> about it? Is it too big? Does it not match the schema of > the > > >>> other > > >>> >> >>>>> records? Was it not found? > > >>> >> >>>>> > > >>> >> >>>>> Of course, we know that the specific problem is with > > >>> [de]serializing > > >>> >> >>> the > > >>> >> >>>>> record, not with any of those those things. So we should > > >>> choose a > > >>> >> >> name > > >>> >> >>>>> that reflects that serialization is the problem. How about > > >>> >> >>>>> RecordSerializationException? > > >>> >> >>>>> > > >>> >> >>>>> best, > > >>> >> >>>>> Colin > > >>> >> >>>>> > > >>> >> >>>>> > > >>> >> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote: > > >>> >> >>>>>> Hi Jason and Ted, > > >>> >> >>>>>> > > >>> >> >>>>>> @Jason > > >>> >> >>>>>> I did not oppose the idea as I'm not too familiar with Java > > >>> >> >>> conventions. > > >>> >> >>>>> I > > >>> >> >>>>>> agree it is a non-ideal way for the user to catch the > > >>> exception so > > >>> >> >> I > > >>> >> >>>>>> changed it back. > > >>> >> >>>>>> > > >>> >> >>>>>> I've updated the KIP and PR with the latest changes. Now, > > >>> there is > > >>> >> >>> only > > >>> >> >>>>> one > > >>> >> >>>>>> public exception `FaultyRecordException` which is thrown to > > the > > >>> >> >> user > > >>> >> >>> in > > >>> >> >>>>>> both scenarios (corrupt record and deserialization error). > > >>> >> >>>>>> Please take a look again. > > >>> >> >>>>>> > > >>> >> >>>>>> Best, > > >>> >> >>>>>> Stanislav > > >>> >> >>>>>> > > >>> >> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson < > > >>> ja...@confluent.io > > >>> >> >>> > > >>> >> >>>>> wrote: > > >>> >> >>>>>> > > >>> >> >>>>>>> Hey Stanislav, > > >>> >> >>>>>>> > > >>> >> >>>>>>> I should have noticed it earlier from your example, but I > > just > > >>> >> >>> realized > > >>> >> >>>>>>> that interfaces don't mix well with exceptions. There is > no > > >>> way > > >>> >> >> to > > >>> >> >>>>> catch > > >>> >> >>>>>>> the interface type, which means you have to depend on > > >>> instanceof > > >>> >> >>>>> checks, > > >>> >> >>>>>>> which is not very conventional. At the moment, we raise > > >>> >> >>>>>>> SerializationException if there is a problem parsing the > > >>> error, > > >>> >> >>> and we > > >>> >> >>>>>>> raise KafkaException if the record fails its CRC check. > > Since > > >>> >> >>>>>>> SerializationException extends KafkaExeption, it seems > like > > we > > >>> >> >> can > > >>> >> >>>>> handle > > >>> >> >>>>>>> both cases in a compatible way by handling both cases > with a > > >>> >> >> single > > >>> >> >>>>> type > > >>> >> >>>>>>> that extends SerializationException. Maybe something like > > >>> >> >>>>>>> RecordDeserializationException? > > >>> >> >>>>>>> > > >>> >> >>>>>>> Thanks, > > >>> >> >>>>>>> Jason > > >>> >> >>>>>>> > > >>> >> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu < > yuzhih...@gmail.com > > > > > >>> >> >>> wrote: > > >>> >> >>>>>>> > > >>> >> >>>>>>>> +1 > > >>> >> >>>>>>>> -------- Original message --------From: Stanislav > > Kozlovski < > > >>> >> >>>>>>>> stanis...@confluent.io> Date: 8/2/18 2:41 AM > (GMT-08:00) > > >>> To: > > >>> >> >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include > > >>> >> >> partitions > > >>> >> >>> in > > >>> >> >>>>>>>> exceptions raised during consumer record > > >>> >> >>> deserialization/validation > > >>> >> >>>>>>>> Hey everybody, > > >>> >> >>>>>>>> > > >>> >> >>>>>>>> I'd like to start a vote thread for KIP-334 Include > > >>> partitions > > >>> >> >> in > > >>> >> >>>>>>>> exceptions raised during consumer record > > >>> >> >>> deserialization/validation > > >>> >> >>>>>>>> < > > >>> >> >>>>>>> > > >>> >> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action? > > >>> >> >>> pageId=87297793 > > >>> >> >>>>>>>>> > > >>> >> >>>>>>>> > > >>> >> >>>>>>>> -- > > >>> >> >>>>>>>> Best, > > >>> >> >>>>>>>> Stanislav > > >>> >> >>>>>>>> > > >>> >> >>>>>>> > > >>> >> >>>>>> > > >>> >> >>>>>> > > >>> >> >>>>>> -- > > >>> >> >>>>>> Best, > > >>> >> >>>>>> Stanislav > > >>> >> >>>>> > > >>> >> >>>> > > >>> >> >>>> > > >>> >> >>>> -- > > >>> >> >>>> Best, > > >>> >> >>>> Stanislav > > >>> >> >>> > > >>> >> >> > > >>> >> > > > >>> >> > > > >>> >> > > >>> >> > > >>> > > >> > > >> > > >> -- > > >> Best, > > >> Stanislav > > >> > > > > > > > > > -- > > > Best, > > > Stanislav > > > > > > > > > -- > > Best, > > Stanislav > > > -- Best, Stanislav