Thanks Sophie! Makes it much clearer where you are coming from.

About the Type unsafety: isn't this also an issue for the `handleSerialziationException` case, because the handler is used for all sink topics, and thus key/value types are not really know w/o taking the sink topic into account? -- So I am not sure if having two handler methods really helps much with regard to type safety?

Just want to make this small comment for completeness. Let's hear what others think. Given that we both don't have a strong opinion but just a personal preference, we should be able to come to a conclusion quickly and get this KIP approved for 3.8 :)


-Matthias

On 5/8/24 3:12 PM, Sophie Blee-Goldman wrote:
Well I definitely don't feel super strongly about it, and more importantly,
I'm not a user. So I will happily defer to the preference of anyone who
will actually be using this feature. But  I'll explain my reasoning:

There *is* a relevant distinction between these two callbacks -- because
the passed-in record will have a different type depending on whether it was
a serialization exception or something else. Even if we combined them into
a single #handle method, users will still end up implementing two distinct
branches depending on whether it was a serialization exception or not,
since that determines the type of the ProducerRecord passed in.

Not to mention they'll need to cast it to a ProducerRecord<byte[], byte[]>
when we could have just passed it in as this type via a dedicated callback.
And note that because of the generics, they can't do an instanceof check to
make sure that the record type is ProducerRecord<byte[], byte[]> and will
have to suppress the "unchecked cast" warning.

So if we combined the two callbacks, their handler will look something like
this:

@SuppressWarnings("unchecked")
public ProductionExceptionHandlerResponse handle(final ErrorHandlerContext
context,
final ProducerRecord<?, ?> record,
final Exception exception) {
if (exception instanceof SerializationException) {
if (exception.origin().equals(KEY)) {
log.error("Failed to serialize key", exception);
} else {
log.error("Failed to serialize value", exception);
}

} else {
final ProducerRecord<byte[], byte[]> serializedRecord = (ProducerRecord<byte[],
byte[]>) record;
log.error("Failed to produce record with serialized key={} and serialized
value={}",
serializedRecord.key(), serializedRecord.value());
}
return ProductionExceptionHandlerResponse.FAIL;
}

That seems like the most basic case, and it still haswith distinct logic
even if they ultimately handle exceptions the same way. And looking forward
to KIP-1034: Dead-letter queues, it seems all the more likely that the
actual handling response might be different depending on whether it's a
serialization exception or not: a serialized record can probably be
retried/sent to a DLQ, whereas a record that can't be serialized should not
(can't, really) be forwarded to a DLQ. So if they're going to have
completely different implementations depending on whether it's a
serialization exception, why not just give them two separate callbacks?

And that's all assuming the user is perfectly aware of the different
exception types and their implications for the type of the ProducerRecord.
Many people might just miss the existence of the
RecordSerializationException altogether --
there are already so many different exception types, ESPECIALLY when it
comes to the Producer. Not to mention they'll need to understand the
nuances of how the ProducerRecord type changes depending on the type of
exception that's passed in. And on top of all that, they'll need to know
that there is metadata stored in the RecordSerializationException regarding
the origin of the error. Whereas if we just passed in the
SerializationExceptionOrigin to a #handlerSerialization callback, well,
that's pretty impossible to miss.

That all just seems like a lot for most people to have to understand to
implement a ProductionExceptionHandler, which imo is not at all an advanced
feature and should be as straightforward and easy to use as possible.

Lastly -- I don't think it's quite fair to compare this to the
RecordDeserializationException. We have a dedicated handler that's just for
deserialization exceptions specifically, hence there's no worry about users
having to be aware of the different exception types they might have to deal
with in the DeserializtionExceptionHandler. Whereas serialization
exceptions are just a subset of what might get passed in to the
ProductionExceptionHandler...

Just explaining my reasoning -- in the end I leave it up to the KIP authors
and anyone who will actually be using this feature in their applications :)



On Tue, May 7, 2024 at 8:35 PM Matthias J. Sax <mj...@apache.org> wrote:

@Loic, yes, what you describe is exactly what I had in mind.



@Sophie, can you elaborate a little bit?

First of all, I agree that it makes sense to maintain the two separate
callbacks for the ProductionExceptionHandler, since one of them is
specifically for serialization exceptions while the other is used for
everything/anything else.

What makes a serialization exception special compare to other errors
that it's valuable to treat it differently? Why can we put "everything
else" into a single bucket? By your train of though, should we not split
out the "everything else" bucket into a different callback method for
every different error? If no, why not, but only for serialization errors?

  From what I believe to remember, historically, we added the
ProductionExceptionHandler, and kinda just missed the serialization
error case. And later, when we extended the handler we just could not
re-use the existing callback as it was typed with `<byte[], byte[]>` and
it would have been an incompatible change; so it was rather a workaround
to add the second method to then handler, but not really intended design?


It's of course only my personal opinion that I believe a single callback
method is simpler/cleaner compared to sticking with two, and adding the
new exception type to make it backward compatible seems worth it. It
also kinda introduces the same patter we use elsewhere (cf KIP-1036)
what I actually think is an argument for introducing
`RercordSerializationExcetpion`, to unify user experience across the board.

Would be great to hear from others about this point. It's not that I
strongly object to having two methods, and I would not block this KIP on
this question.



-Matthias


On 5/7/24 3:40 PM, Sophie Blee-Goldman wrote:
First of all, I agree that it makes sense to maintain the two separate
callbacks for the ProductionExceptionHandler, since one of them is
specifically for serialization exceptions while the other is used for
everything/anything else. I also think we can take advantage of this fact
to simplify things a bit and cut down on the amount of new stuff added to
the API by just adding a parameter to the #handleSerializationException
callback and use that to pass in the SerializationExceptionOrigin enum to
distinguish between key vs value. This way we wouldn't need to introduce
yet another exception type (the RecordSerializationException) just to
pass
in this information.

Thoughts?

On Tue, May 7, 2024 at 8:33 AM Loic Greffier <loic.greff...@michelin.com

wrote:

Hi Matthias,

To sum up with the ProductionExceptionHandler callback methods (106)
proposed changes.

A new method ProductionExceptionHandler#handle is added with the
following
signature:

ProductionExceptionHandlerResponse handle(final ErrorHandlerContext
context, final ProducerRecord<?,?> record, final Exception exception);

The ProducerRecord<?,?> parameter has changed to accept a serialized or
non-serialized record.
Thus, the new ProductionExceptionHandler#handle method can handle both
production exception and serialization exception.

Both old ProductionExceptionHandler#handle and
ProductionExceptionHandler#handleSerializationException methods are now
deprecated.
The old ProductionExceptionHandler#handle method gets a default
implementation, so users do not have to implement a deprecated method.

To handle backward compatibility, the new
ProductionExceptionHandler#handle method gets a default implementation.

default ProductionExceptionHandlerResponse handle(final
ErrorHandlerContext context, final ProducerRecord<?, ?> record, final
Exception exception) {
    if (exception instanceof RecordSerializationException) {
        this.handleSerializationException(record, exception.getCause());
    }

    return handle((ProducerRecord<byte[], byte[]>) record, exception);
}

The default implementation either invokes #handleSerializationException
or
#handle depending on the type of the exception, thus users still
relying on
deprecated ProductionExceptionHandler#handle
or ProductionExceptionHandler#handleSerializationException custom
implementations won't break.

The new ProductionExceptionHandler#handle method is now invoked in case
of
serialization exception:

public <K, V> void send(final String topic, final K key, final V value,
...) {
      try {
          keyBytes = keySerializer.serialize(topic, headers, key);
          ...
      } catch (final ClassCastException exception) {
        ...
      } catch (final Exception exception) {

          try {
              response = productionExceptionHandler.handle(context,
record, new
RecordSerializationException(SerializationExceptionOrigin.KEY,
exception));
          } catch (final Exception e) {
          ...
          }
      }
}

To wrap the origin serialization exception and determine whether it
comes
from the key or the value, a new exception class is created:

public class RecordSerializationException extends
SerializationException
{
      public enum SerializationExceptionOrigin {
          KEY,
          VALUE
      }

      public RecordSerializationException(SerializationExceptionOrigin
origin, final Throwable cause);
}

Hope it all makes sense,
Loïc




Reply via email to