I was expecting Parquet + Thrift to perform faster, but I wasn't expecting that
much of a difference; I just wanted to know whether my results were right or not.
Thanks for now, Fabian!
On Fri, Nov 27, 2015 at 4:22 PM, Fabian Hueske wrote:
Parquet is much cleverer than the TypeSerializer and applies columnar
storage and compression techniques.
The TypeSerializerIOFs just use Flink's element-wise serializers to write
and read binary data.
I'd go with Parquet if it is working well for you.
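For completeness, a rough sketch of reading the Parquet + Thrift files back into a Flink DataSet through the Hadoop compatibility wrapper. The Thrift-generated class Person and the input path are placeholders, and the parquet-thrift package name varies between versions (parquet.* vs org.apache.parquet.*), so treat this as an assumption-laden sketch rather than a ready recipe:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.hadoop.thrift.ParquetThriftInputFormat;

public class ReadParquetThrift {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // The Hadoop Job object only carries the configuration (input path etc.).
    Job job = Job.getInstance();
    FileInputFormat.addInputPath(job, new Path("hdfs:///data/person-parquet"));

    // Parquet's Hadoop input formats emit (Void, record) pairs;
    // Person stands in for your Thrift-generated class.
    HadoopInputFormat<Void, Person> parquetInput =
        new HadoopInputFormat<>(new ParquetThriftInputFormat<Person>(), Void.class, Person.class, job);

    DataSet<Tuple2<Void, Person>> persons = env.createInput(parquetInput);
    persons.first(10).print();
  }
}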
2015-11-27 16:15 GMT+01:00 Flavio Pompermaier :
I made a simple test using Parquet + Thrift vs TypeSerializer IF/OF:
the former outperformed the latter for a simple filter (not pushed
down) and a map+sum (something like 2 s vs 33 s, not counting disk
space usage, which is much worse). Is that normal, or is the TypeSerializer
supposed
If you are just looking for an exchange format between two Flink jobs, I
would go for the TypeSerializerInput/OutputFormat.
Note that these are binary formats.
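For example, a minimal sketch of the round trip with the TypeSerializer formats. MyPojo and the nested Address are hypothetical stand-ins for the actual POJO, and the path is a placeholder:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TypeSerializerInputFormat;
import org.apache.flink.api.java.io.TypeSerializerOutputFormat;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.core.fs.Path;

public class PojoRoundTrip {

  // Stand-ins for the complex POJO: public fields and a no-arg
  // constructor so Flink treats them as POJOs.
  public static class Address {
    public String city;
    public Address() {}
    public Address(String city) { this.city = city; }
  }

  public static class MyPojo {
    public String name;
    public Address address;   // nested object
    public MyPojo() {}
    public MyPojo(String name, Address address) { this.name = name; this.address = address; }
  }

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Job 1: write the DataSet in Flink's binary format.
    // TypeSerializerOutputFormat picks up the serializer from the DataSet's type.
    DataSet<MyPojo> data = env.fromElements(new MyPojo("alice", new Address("Rome")));
    data.write(new TypeSerializerOutputFormat<MyPojo>(), "file:///tmp/mypojo-bin");
    env.execute("write POJOs");

    // Job 2: read it back; the input format needs the type information explicitly.
    TypeInformation<MyPojo> typeInfo = TypeExtractor.getForClass(MyPojo.class);
    TypeSerializerInputFormat<MyPojo> input = new TypeSerializerInputFormat<>(typeInfo);
    input.setFilePath(new Path("file:///tmp/mypojo-bin"));
    DataSet<MyPojo> restored = env.createInput(input, typeInfo);
    restored.print();
  }
}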
Best, Fabian
2015-11-27 15:28 GMT+01:00 Flavio Pompermaier :
Hi to all,
I have a complex POJO (with nested objects) that I'd like to write and read
with Flink (batch).
What is the simplest way to do that? I can't find any example of it :(
Best,
Flavio