Hi all,

Pulsar supports the AUTO_PRODUCE schema, but this feature was introduced early on [1], before the PIP process existed. I have read the documentation [2] and found the following example scenario.

> Suppose that:
> - You have a producer processing messages from a Kafka topic K.
> - You have a Pulsar topic P, and you do not know its schema type.
> - Your application reads the messages from K and writes the messages to P.

It seems to assume that the format of messages from the source topic (`K`) is **unknown**, yet we try to use a **known schema** from an existing topic to encode the bytes. This is a very strange thing to do.

First, how do you guarantee that the schema can be used to encode raw bytes whose format is unknown?

Second, messages that cannot be encoded by the schema can only be discarded, i.e. messages are lost.

Third, schema in Pulsar is convenient because it supports sending any object of type `T`, and the Pulsar client is responsible for serializing `T` to bytes. However, with the AUTO_PRODUCE schema, the producer still sends raw bytes.

It looks like the AUTO_PRODUCE schema is meant for the case where you assume most of the source messages can be decoded via a known schema and you can tolerate discarding the rest.

BTW, the documentation doesn't describe how to handle the exception. You need to catch the `SchemaSerializationException` for `sendAsync`. This changes the usual way of using `sendAsync`, because an asynchronous method should not throw an exception in regular cases. And the exception message might look like

> java.lang.ArrayIndexOutOfBoundsException: Index -39 out of bounds for length 2

This does not explain why a specific message cannot be encoded by the existing schema, which makes the problem hard to detect.

I cannot think of a scenario where the `AUTO_PRODUCE` schema is useful. It just forces producers, rather than consumers, to validate messages. With the AUTO_PRODUCE schema, the exception is thrown from `Producer#sendAsync`, while without it the exception is thrown from `Message#getValue`.

When we want to use schema, the producer side should know the format of messages to send.
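
To make the `sendAsync` concern concrete, here is a minimal sketch in plain Java. It deliberately does not use the Pulsar client: `AsyncSendSketch`, `validate`, `sendAsyncThrowing` and `sendAsyncSafe` are hypothetical stand-ins, the "schema" check is invented for illustration, and `IllegalArgumentException` stands in for `SchemaSerializationException`. It only shows the pattern described above, where an asynchronous method throws synchronously, and one possible way a caller might fold that into the returned future.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in, NOT the Pulsar client API. It only mimics a producer
// whose asynchronous send validates the payload before returning a future.
class AsyncSendSketch {

    // Invented check for illustration: pretend the topic's schema only
    // accepts payloads that start with '{'.
    static void validate(byte[] payload) {
        if (payload.length == 0 || payload[0] != '{') {
            throw new IllegalArgumentException("payload does not match the topic schema");
        }
    }

    // Mimics the behaviour described above: a validation failure is thrown
    // synchronously from the "async" method instead of failing the future.
    static CompletableFuture<String> sendAsyncThrowing(byte[] payload) {
        validate(payload);
        return CompletableFuture.completedFuture("sent");
    }

    // Caller-side wrapper: converts the synchronous throw into a failed
    // future so that all errors can be handled in one place.
    static CompletableFuture<String> sendAsyncSafe(byte[] payload) {
        try {
            return sendAsyncThrowing(payload);
        } catch (RuntimeException e) {
            return CompletableFuture.failedFuture(e);
        }
    }

    public static void main(String[] args) {
        byte[] bad = "not json".getBytes(StandardCharsets.UTF_8);
        // The message that fails validation is discarded here, i.e. lost.
        String result = sendAsyncSafe(bad)
                .exceptionally(e -> "discarded: " + e.getMessage())
                .join();
        System.out.println(result);
    }
}
```

Without a wrapper like `sendAsyncSafe`, every call site needs both a `try`/`catch` and a future callback, which is the awkwardness described above.
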
Schema should be used when you know the format of messages to send while the topic doesn't accept this format [3].

In conclusion, I think it's a very bad feature and we should not encourage users to use it, i.e. we should mark it as deprecated and remove it from the documentation.

Feel free to share your thoughts!

[1] https://github.com/apache/pulsar/pull/2685
[2] https://pulsar.apache.org/docs/2.10.x/schema-understand/#auto_produce
[3] https://pulsar.apache.org/docs/2.10.x/schema-get-started/#why-use-schema

Thanks,
Yunze