Hi Ivan,
Just to add to the point about chaining: when splitting the map into two parts, objects
need to be copied from one operator to the chained operator. Since your
objects are very heavy, that copying can take quite long, especially if you don't
have a specific serializer configured and instead rely on Kryo.
You can avoid much of that cost by enabling object reuse, or by making sure the
type is handled by a dedicated serializer rather than the Kryo fallback.
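A minimal sketch of both knobs, assuming the heavy type is an Avro-generated record (the configuration goes on the job's execution environment):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Hand records between chained operators by reference instead of deep-copying them.
    // Only safe if the chained functions neither mutate nor buffer incoming objects.
    env.getConfig().enableObjectReuse();

    // Avoid the generic Kryo fallback for the record type: prefer Flink's Avro
    // serializer for classes generated from an Avro schema.
    env.getConfig().enableForceAvro();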
Generally there should be no difference.
Can you check whether the maps are running as a chain (as a single task)?
If they are running in a chain, then I would suspect that /something/
else is skewing your results.
If not, then the added network/serialization pressure would explain it.
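For reference, two consecutive map() calls with the same parallelism are chained into one task by default; the split can also be forced explicitly. The operator classes below are just placeholders:

    DataStream<MyEvent> events = input
        .map(new JsonToAvroMap()).name("json-to-avro")
        // Chained with the map above by default: both run in the same task,
        // so records are handed over locally rather than through the network stack.
        .map(new EnrichMap()).name("enrich");

    // To force two separate tasks (and pay the serialization/hand-over cost):
    // .map(new EnrichMap()).startNewChain().name("enrich");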
Hi,
We have a Flink job that reads data from an input stream, converts each
event from a JSON string to an Avro object, and finally writes to Parquet files using
StreamingFileSink with OnCheckpointRollingPolicy (checkpoints every 5 minutes). It is
basically a stateless job. Initially, we used a single map operator to convert the JSON
string into the Avro object; after splitting that conversion into two map operators we
noticed a difference in performance.
Only one task slot is being occupied. Something I am doing is wrong...
3 Task Managers, 21 Task Slots, 20 Available Task Slots
Best regards,
Serhiy.
From: Robert Metzger [mailto:rmetz...@apache.org]
Sent: 13 May 2016 15:26
To: user@flink.apache.org
Subject: Re: Flink performance tuning
Hi,
One issue may be that the selection of YARN containers is not HDFS-locality-aware here.
Hence, Flink may read more splits remotely, whereas MR reads more splits locally.
On Fri, May 13, 2016 at 3:25 PM, Robert Metzger wrote:
Hi,
Can you try running the job with 8 slots, 7 GB (maybe you need to go down
to 6 GB) and only three TaskManagers (-n 3)?
I'm suggesting this because you have many small JVMs running on your
machines. On such small machines you can probably get much more use out of
your available memory by running fewer but larger JVMs.
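In command form that would be roughly the following; the -tm value is an assumption for "about 7 GB" and may need to drop towards 6 GB as mentioned above:

    ./yarn-session.sh -n 3 -s 8 -jm 1024 -tm 7168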
Hey,
I have successfully integrated Flink into our very small test cluster (3
machines with 8 cores, 8 GB of memory and 2x1 TB disks). Basically I started
the session to use YARN as the RM, and the data is being read from HDFS.
./yarn-session.sh -n 21 -s 1 -jm 1024 -tm 1024
My code is very simple.