Hi Theo,

> However, in most benchmarks, Avro turns out to be rather slow in terms of
> CPU cycles (e.g. [1] <https://github.com/eishay/jvm-serializers/wiki>)


Avro is slower compared to what?
You should not benchmark only the CPU cycles spent on serializing the
data. If you are sending JSON strings, you'll probably have many more
bytes to push across the network, making everything slower (usually the
network is slower than the CPU).
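To make that concrete, here is a minimal sketch (the Event POJO and all
names are made up; it assumes Jackson and a Flink 1.x dependency on the
classpath) comparing the payload size of a JSON encoding with Flink's
binary POJO format:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataOutputSerializer;

public class PayloadSizeCheck {

    // Hypothetical POJO; public fields and a no-arg constructor keep it
    // eligible for Flink's PojoSerializer instead of the Kryo fallback.
    public static class Event {
        public long id;
        public String source;
        public Event() {}
    }

    public static void main(String[] args) throws Exception {
        Event event = new Event();
        event.id = 42L;
        event.source = "sensor-1";

        // JSON: the field names travel with every single message
        byte[] json = new ObjectMapper().writeValueAsBytes(event);

        // Flink POJO format: compact binary, no field names on the wire
        TypeSerializer<Event> serializer =
            TypeInformation.of(Event.class)
                .createSerializer(new ExecutionConfig());
        DataOutputSerializer out = new DataOutputSerializer(64);
        serializer.serialize(event, out);
        byte[] binary = out.getCopyOfBuffer();

        System.out.println("JSON bytes:   " + json.length);
        System.out.println("binary bytes: " + binary.length);
    }
}

On top of the smaller payload, the binary format also skips string
parsing on the consumer side.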

One of the reasons why people use Avro is that it supports schema evolution.
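For illustration, a small sketch of what that buys you (the schemas are
made up; it uses the standard Avro GenericDatum APIs): a record written
with an old schema can still be read with a newer one, as long as the
added fields carry defaults:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class AvroEvolutionDemo {
    public static void main(String[] args) throws Exception {
        // Writer schema (v1) and reader schema (v2);
        // v2 adds a field with a default value.
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[" +
            "{\"name\":\"id\",\"type\":\"long\"}]}");
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[" +
            "{\"name\":\"id\",\"type\":\"long\"}," +
            "{\"name\":\"source\",\"type\":\"string\"," +
            "\"default\":\"unknown\"}]}");

        // Serialize a record with the old schema
        GenericRecord old = new GenericData.Record(v1);
        old.put("id", 42L);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bos, null);
        new GenericDatumWriter<GenericRecord>(v1).write(old, encoder);
        encoder.flush();

        // Deserialize with the new schema: the missing field is
        // filled in from its default.
        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(bos.toByteArray(), null);
        GenericRecord evolved =
            new GenericDatumReader<GenericRecord>(v1, v2).read(null, decoder);
        System.out.println(evolved); // {"id": 42, "source": "unknown"}
    }
}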

Regarding your questions:
1. For this use case, you can use Flink's data format as an internal
message format (between the jobs of your star architecture); see the
sketch after this list.
2. Generally speaking, no.
3. You will at least have a dependency on flink-core. And this is a
somewhat custom setup, so you might be facing breaking API changes.
4. I'm not aware of any benchmarks. The Flink serializers are mostly for
internal use (between our operators), Kryo is our fallback (so we don't
suffer too much from the not-invented-here syndrome), while Avro is meant
for cross-system serialization.
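Regarding 1., a minimal sketch of what that could look like (the Event
POJO and the buffer size are hypothetical; assuming Flink 1.x): serialize
with Flink's TypeSerializer into a byte[] that you hand to the Kafka
producer as the record value, and invert it on the consumer side:

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.core.memory.DataOutputSerializer;
import java.io.IOException;

public class FlinkPojoKafkaFormat {

    // Hypothetical POJO, shared via your common jar
    public static class Event {
        public long id;
        public String source;
        public Event() {}
    }

    private final TypeSerializer<Event> serializer =
        TypeInformation.of(Event.class)
            .createSerializer(new ExecutionConfig());

    // Producer side: Event -> byte[] for the Kafka record value
    public byte[] serialize(Event event) throws IOException {
        DataOutputSerializer out = new DataOutputSerializer(64);
        serializer.serialize(event, out);
        return out.getCopyOfBuffer();
    }

    // Consumer side: byte[] from the Kafka record value -> Event
    public Event deserialize(byte[] bytes) throws IOException {
        return serializer.deserialize(new DataInputDeserializer(bytes));
    }
}

You could then wrap these two methods in Kafka's Serializer/Deserializer
interfaces from kafka-clients to plug them into the producer and consumer
configs directly.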

I have the feeling that you can move ahead with using Flink's POJO
serializer everywhere :)

Best,
Robert

On Wed, Mar 4, 2020 at 1:04 PM Theo Diefenthal <
theo.diefent...@scoop-software.de> wrote:

> Hi,
>
> Without knowing too much about Flink serialization, I know that Flink
> states that it serializes POJO types much faster than even the fast Kryo
> for Java. I further know that it supports schema evolution in the same
> way as Avro.
>
> In our project, we have a star architecture, where one flink job produces
> results into a kafka topic and where we have multiple downstream consumers
> from that kafka topic (Mostly other flink jobs).
> For fast development cycles, we currently use JSON as output format for
> the kafka topic due to easy debugging capabilities and best migration
> possibilities. However, when scaling up, we need to switch to a more
> efficient format. Most often, Avro is mentioned in combination with a
> schema registry, as it's much more efficient than JSON, where essentially
> each message contains the schema as well. However, in most benchmarks,
> Avro turns out to be rather slow in terms of CPU cycles (e.g. [1]
> <https://github.com/eishay/jvm-serializers/wiki>)
>
> My question(s) now:
> 1. Is it reasonable to use Flink serializers as the message format in
> Kafka?
> 2. Are there any downsides to using Flink's serialization result as the
> output format to Kafka?
> 3. Can downstream consumers, written in Java but not Flink components,
> also easily deserialize Flink-serialized POJOs? Or do they have a
> dependency on at least the full flink-core?
> 4. Do you have benchmarks comparing Flink (de)serialization performance
> to e.g. Kryo and Avro?
>
> The only reason I can come up with for not using Flink serialization is
> that we wouldn't have a schema registry. But in our case, we share all
> our POJOs in a jar which is used by all components, so that is kind of a
> schema registry already. And if we only make Avro-compatible changes,
> which Flink also handles well, that shouldn't be any limitation compared
> to Avro + registry, right?
>
> Best regards
> Theo
>
> [1] https://github.com/eishay/jvm-serializers/wiki
>
