Best serialization performance for `Objects`

2024-04-25 Thread Salva Alcántara
I know that an `Object` is treated as a generic data type by Flink and hence serialized using Kryo. I wonder if there is anything one can do to improve performance w.r.t. the Kryo-based serializer or if that is simply an inherent worst case scenario and nothing can be done without actually switc

Re: Stream collector serialization performance

2018-08-15 Thread Timo Walther
Hi Mingliang, first of all the POJO serializer is not very performant. Tuple or Row are better. If you want to improve the performance of a collect() between operators, you could also enable object reuse. You can read more about this here [1] (section "Issue 2: Object Reuse"), but make sure y

Stream collector serialization performance

2018-08-15 Thread 祁明良
Hi all, I’m currently using the keyed process function, and I see there’s serialization happening when I collect the object / update the object to RocksDB. For me the performance of serialization seems to be the bottleneck. By default, the POJO serializer is used, and the time cost of collect / update to

Re: Serialization performance

2017-03-07 Thread Stephan Ewen
for Kryo right now. From: Stephan Ewen [mailto:se...@apache.org] Sent: Tuesday, March 07, 2017 6:21 AM To: user@flink.apache.org Subject: Re: Serialization performance Hi Billy! Out of curiosity: Were you ab

Re: Serialization performance

2017-03-03 Thread Aljoscha Krettek
Hi Billy, on the Beam side, you probably have looked into writing your own Coder (the equivalent of a TypeSerializer in Flink). If yes, did that not work out for you? And if yes, why? Best, Aljoscha On Thu, Mar 2, 2017, at 22:02, Stephan Ewen wrote: Hi! I can write some more deta

Re: Serialization performance

2017-03-02 Thread Stephan Ewen
s.toString();
    builder = null;
    this.comment = comment;

    GRKryoSerializer.preregisterSchema(comment, s);
}

public synchronized GenericRecordBuilder getBuilder()
{

RE: Serialization performance

2017-03-02 Thread Newport, Billy
e.org] Sent: Thursday, March 02, 2017 3:07 PM To: user@flink.apache.org; Aljoscha Krettek Subject: Re: Serialization performance Hi! Thanks for this writeup, very cool stuff! For part (1) - serialization: I think that can be made a bit nicer. Avro is a bit of an odd citizen in Flink, because Flink

Re: Serialization performance

2017-03-02 Thread Stephan Ewen
Hi! Thanks for this writeup, very cool stuff! For part (1) - serialization: I think that can be made a bit nicer. Avro is a bit of an odd citizen in Flink, because Flink serialization is actually schema aware, but does not integrate with Avro. That's why Avro types go through Kryo. We should tr

Serialization performance

2017-03-02 Thread Newport, Billy
We've been working on performance for the last while. We're using Flink 1.2 right now. We are writing batch jobs which process Avro and Parquet input files and produce Parquet files. Flink serialization costs seem to be the most significant aspect of our wall clock time. We have written a custo