I know that an `Object` is treated as a generic data type by Flink and
hence serialized using Kryo. I wonder if there is anything one can do to
improve performance w.r.t. the Kryo-based serializer, or if that is
simply an inherent worst-case scenario and nothing can be done without
actually switching serializers.
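One common mitigation when staying on Kryo is to pre-register the hot classes with Flink's ExecutionConfig, so Kryo writes a small integer tag per record instead of the fully qualified class name. A minimal sketch, assuming a standard streaming job; `MyEvent` is a placeholder class, not from this thread:

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KryoRegistrationSketch {

    // Placeholder type standing in for whatever class is serialized via Kryo.
    public static class MyEvent {
        public long id;
        public String payload;
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        ExecutionConfig config = env.getConfig();

        // Registering the type up front lets Kryo write a compact tag per
        // record rather than the full class name every time.
        config.registerKryoType(MyEvent.class);

        // For more control, registerTypeWithKryoSerializer(...) lets you
        // supply a hand-written com.esotericsoftware.kryo.Serializer.
    }
}
```

This is a configuration sketch; whether it helps depends on how much of the per-record cost is class-name writing versus field serialization.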
Hi Mingliang,
first of all, the POJO serializer is not very performant; Tuple or Row
are better. If you want to improve the performance of a collect()
between operators, you could also enable object reuse. You can read more
about this here [1] (section "Issue 2: Object Reuse"), but make sure
you understand the caveats first.
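The object-reuse switch mentioned above is a one-liner on the ExecutionConfig. A minimal sketch; the caveat is real, since with reuse enabled Flink may hand the same record instance to successive chained operators:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ObjectReuseSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Skip the defensive copy between chained operators: records are
        // passed by reference, so no operator may cache or mutate a record
        // after emitting it downstream.
        env.getConfig().enableObjectReuse();
    }
}
```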
Hi all,
I’m currently using a keyed process function, and I see serialization
happening when I collect the object / update the object to RocksDB. For me,
the performance of serialization seems to be the bottleneck.
By default, the POJO serializer is used, and the time cost of collect / update to
... for Kryo right now.
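For reference, a class only qualifies for Flink's POJO serializer (instead of silently falling back to Kryo) when it is public, has a public no-argument constructor, and every field is either public or reachable through getters and setters. A minimal qualifying sketch; `SensorReading` is an illustrative name, not from this thread:

```java
// Qualifies for Flink's POJO serializer: public class, public no-arg
// constructor, and public (or getter/setter-accessible) fields.
public class SensorReading {
    public String sensorId;
    public long timestamp;
    public double value;

    public SensorReading() {}  // required no-arg constructor

    public SensorReading(String sensorId, long timestamp, double value) {
        this.sensorId = sensorId;
        this.timestamp = timestamp;
        this.value = value;
    }
}
```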
>
> *From:* Stephan Ewen [mailto:se...@apache.org]
> *Sent:* Tuesday, March 07, 2017 6:21 AM
> *To:* user@flink.apache.org
> *Subject:* Re: Serialization performance
>
> Hi Billy!
>
> Out of curiosity: Were you able to ...
Hi Billy,
on the Beam side, you probably have looked into writing your own Coder
(the equivalent of a TypeSerializer in Flink). If yes, did that not work
out for you? And if yes, why?
Best,
Aljoscha
On Thu, Mar 2, 2017, at 22:02, Stephan Ewen wrote:
> Hi!
>
> I can write some more details ...
> s.toString();
> builder = null;
> this.comment = comment;
> GRKryoSerializer.preregisterSchema(comment, s);
> }
>
> public synchronized GenericRecordBuilder getBuilder()
> {
From: Stephan Ewen [mailto:se...@apache.org]
Sent: Thursday, March 02, 2017 3:07 PM
To: user@flink.apache.org; Aljoscha Krettek
Subject: Re: Serialization performance
Hi!
Thanks for this writeup, very cool stuff !
For part (1) - serialization: I think that can be made a bit nicer. Avro is
a bit of an odd citizen in Flink, because Flink serialization is actually
schema aware, but does not integrate with Avro. That's why Avro types go
through Kryo.
We should try ...
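Pending a tighter Avro integration, one workaround on the ExecutionConfig is to force generic types through Flink's Avro-based serializer instead of Kryo. A hedged sketch; whether this is actually faster depends on the types involved:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ForceAvroSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Serialize generic types with Flink's Avro reflect serializer
        // rather than Kryo. The inverse switch, enableForceKryo(), also
        // exists for comparison runs.
        env.getConfig().enableForceAvro();
    }
}
```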
We've been working on performance for a while now. We're using Flink 1.2
right now. We are writing batch jobs which process Avro and Parquet input files
and produce Parquet files.
Flink serialization costs seem to be the most significant aspect of our wall-clock
time. We have written a custom ...