Hi, I have to agree with what Huang and Jingsong said. We should think more about it from the user's(developers who use the API) perspective. The first method T deserialize(byte[]) is convenient for users to deserialize a single event. Many formats are using it, i.e. Avro, csv, etc. There should also be Json cases for single event deserialization, if I am not mistaken.
void deserialize(byte[], Collector<T>) method is a default interface method. There will be a big code refactoring if we use it to replace T deserialize(byte[]). The benefit is very limited after considering all concerns. For few cases that do not need T deserialize(byte[]), the maintenance effort is IMHO acceptable. It is, after all, only one method. Best regards, Jing On Tue, Feb 28, 2023 at 9:47 AM Benchao Li <libenc...@apache.org> wrote: > I share the same concerns with Jingsong and Hang, however, I'll raise a > point why keeping both is also not a good idea. > > In FLINK-18590[1], we are introducing a feature that we'll deserialize JSON > array into multiple records. This feature can only be used in `void > deserialize(byte[] message, Collector<T> out)`. And many more cdc formats > are doing the similar thing. If we keep both methods, many formats/features > will not be available for `T deserialize(byte[] message)`. > > And for format maintenance, we usually need to keep these two methods, > which is also a burden for format maintainers. > > [1] https://issues.apache.org/jira/browse/FLINK-18590 > > Jingsong Li <jingsongl...@gmail.com> 于2023年2月28日周二 16:03写道: > > > - `T deserialize(byte[] message)` is widely used and it is a public > > api. It is very friendly for single record deserializers. > > - `void deserialize(byte[] message, Collector<T> out)` supports > > multiple records. > > > > I think we can just keep them as they are. > > > > Best, > > Jingsong > > > > > > On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan <ruanhang1...@gmail.com> > wrote: > > > > > > Hi, Shammon, > > > > > > I think the method `void deserialize(byte[] message, Collector<T> out)` > > > with a default implementation encapsulate how to deal with null for > > > developers. If we remove the `T deserialize(byte[] message)`, the > > > developers have to remember to handle null. Maybe we will get duplicate > > > code among them. > > > And I find there are only 5 implementations override the method `void > > > deserialize(byte[] message, Collector<T> out)`. Other implementations > > reuse > > > the same code to handle null. > > > I don't know the benefits of removing this method. Looking forward to > > other > > > people's opinions. > > > > > > Best, > > > Hang > > > > > > > > > > > > Shammon FY <zjur...@gmail.com> 于2023年2月28日周二 14:14写道: > > > > > > > Hi devs > > > > > > > > Currently there are two deserialization methods in > > `DeserializationSchema` > > > > 1. `T deserialize(byte[] message)`, only deserialize one record from > > > > binary, if there is no record it should return null. > > > > 2. `void deserialize(byte[] message, Collector<T> out)`, supports > > > > deserializing none, one or multiple records gracefully, it can > > completely > > > > replace method `T deserialize(byte[] message)`. > > > > > > > > The deserialization logic in the above two methods is basically > > coincident, > > > > we recommend users use the second method to deserialize data. To > > improve > > > > code maintainability, I'd like to mark the first function as > > `@Deprecated`, > > > > and remove it when it is no longer used in the future. > > > > > > > > I have created an issue[1] to track it, looking forward to your > > feedback, > > > > thanks > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-31251 > > > > > > > > > > > > Best, > > > > Shammon > > > > > > > > > -- > > Best, > Benchao Li >