I think that's independent of the serializer registration.
What's important is registering the types at the execution environment.
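
For illustration, type registration at the execution environment might look roughly like the sketch below. It assumes the Flink Scala streaming API is on the classpath; LookupEntry is a hypothetical class invented for this example and does not come from the thread.

```scala
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// Hypothetical state type, used only to illustrate registration.
class LookupEntry(var key: String, var value: Long) {
  def this() = this("", 0L) // no-arg constructor so the type can be treated as a POJO
}

object RegisterTypesSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Register the type with the serialization stack up front.
    // If it ends up being serialized with Kryo, it is registered there,
    // so that a compact tag is written instead of the full class name.
    env.registerType(classOf[LookupEntry])

    // ... build the job on `env` here, then call env.execute(...)
  }
}
```

Whether this helps depends on which serializer Flink picks for the types in question, so it is worth measuring before and after.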
On Fri, Feb 24, 2017 at 7:06 PM, Dmitry Golubets
wrote:
> Hi Robert,
>
> The bottleneck operator is working with a state (many hash maps basically)
> and it's algor
Hi Robert,
The bottleneck operator works with state (basically many hash maps),
and its algorithm is not parallelizable.
We took the approach of preloading all required data from external systems,
so that operators don't have to do any network communication while
processing a data record (
Hi Dmitry,
Cool! Looks like you've taken the right approach to analyze the performance
issues!
Often the deserialization of the input is already a performance killer :)
What is this one operator that is the bottleneck doing?
Does it have a lot of state? Is it CPU intensive, or talking to an exter
Hi Robert,
In the dev environment I load data from zipped CSV files on S3.
The data is parsed into case classes.
It's quite fast, I'm able to get ~80k/sec with only source and "dev/null"
sink.
Checkpointing is enabled with a 1-hour interval.
Yes, one of the operators is a bottleneck and it backpressures
Hi Dmitry,
sorry for the late response.
Where are you reading the data from?
Did you check if one operator is causing backpressure?
Are you using checkpointing?
Serialization is often the cause of slow processing. However, it's very
hard to diagnose potential other causes without any details on
Hi Daniel,
I've implemented a macro that generates message pack serializers in our
codebase.
The resulting code is basically a series of writes/reads, as in hand-written
structured serialization.
E.g. given
case class Data1(str: String, subdata: Data2)
case class Data2(num: Int)
serialization code
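
As a rough illustration of what "a series of writes/reads" can look like for these two case classes, here is a hand-written sketch. It is not the macro output from this message: it uses java.io data streams instead of MessagePack to stay dependency-free, and the object and method names are made up for the example.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

case class Data2(num: Int)
case class Data1(str: String, subdata: Data2)

object StructuredSerializationSketch {
  // Each writer emits the fields in declaration order, with no type tags.
  def writeData2(out: DataOutputStream, d: Data2): Unit =
    out.writeInt(d.num)

  def writeData1(out: DataOutputStream, d: Data1): Unit = {
    out.writeUTF(d.str) // note: writeUTF is limited to 65535 encoded bytes
    writeData2(out, d.subdata)
  }

  // Each reader consumes the fields in exactly the same order.
  def readData2(in: DataInputStream): Data2 =
    Data2(in.readInt())

  def readData1(in: DataInputStream): Data1 = {
    val str = in.readUTF()
    val subdata = readData2(in)
    Data1(str, subdata)
  }

  // Serialize to a byte array and read the value back.
  def roundTrip(d: Data1): Data1 = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    writeData1(out, d)
    out.flush()
    readData1(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
  }
}
```

The point of this style is that the field layout is fixed at compile time, so nothing is looked up reflectively per record.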
Hello Dmitry,
Could you please elaborate on your tuning with
environment.addDefaultKryoSerializer(..)?
I'm interested in knowing what you did there to get a boost of about
50%.
Some small or simple example would be very nice.
Thank you very much in advance.
Kind Regards,
Daniel Sa
One network setting is mentioned here:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#controlling-latency
From: Dmitry Golubets <dgolub...@gmail.com>
Date: Friday, February 17, 2017 at 6:43 AM
To: <user@flink.apache.org>
Subject: Performance tun