Re: same parallelism with different taskmanager and slots, skew occurs

2019-01-04 Thread Till Rohrmann

Hi,

could you tell me exactly how you started the cluster and with which parameters (configured memory, maybe vcores, etc.)?

Cheers,
Till

On Thu, Jan 3, 2019 at 2:37 AM varuy322 wrote:
> Hi Till,
> Thank you for your kind reply. I got your point, and I'm sorry I did not
> make my issue clear.

Re: same parallelism with different taskmanager and slots, skew occurs

2019-01-02 Thread varuy322

Hi Till,

Thank you for your kind reply. I got your point, and I'm sorry I did not make my issue clear. I generated the data with the streaming benchmark, as in this link: https://github.com/dataArtisans/databricks-benchmark/blob/master/src/main/scala/com/databricks/benchmark/flink/EventGenerator.scala . Wh

Re: same parallelism with different taskmanager and slots, skew occurs

2019-01-02 Thread Till Rohrmann

Hi Rui,

such a situation can occur if you have data skew in your data set (differently sized partitions if you key by some key). Assume you have 2 TMs with 2 slots each and you key your data by some key x. The partition assignment could look like:

TM1: slot_1 = Partition_1, slot_2 = Partition_2
T
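
To make the skew argument concrete, here is a minimal Python sketch that is not part of the original thread. It assumes a simplified model in which each key is mapped to a partition by a stable hash modulo the parallelism, and partitions are packed onto TaskManagers in order. Flink's real key-group assignment and slot scheduling differ. The key names, record counts, and helper functions are hypothetical.

```python
import zlib
from collections import Counter


def assign_partition(key: str, parallelism: int) -> int:
    # Stable hash so the result is reproducible across runs.
    # Simplification: this is NOT Flink's actual key-group logic.
    return zlib.crc32(key.encode("utf-8")) % parallelism


def records_per_partition(key_counts: dict, parallelism: int) -> Counter:
    # Start every partition at zero so idle partitions stay visible.
    load = Counter({p: 0 for p in range(parallelism)})
    for key, n in key_counts.items():
        load[assign_partition(key, parallelism)] += n
    return load


def records_per_tm(load: Counter, slots_per_tm: int) -> Counter:
    # Assumed layout: partitions 0..slots_per_tm-1 on TM 0, the next group on TM 1, etc.
    tm_load = Counter()
    for partition, n in load.items():
        tm_load[partition // slots_per_tm] += n
    return tm_load


# Hypothetical skewed data set: one hot key dominates the stream.
key_counts = {"hot": 1000, "k1": 10, "k2": 10, "k3": 10}

load = records_per_partition(key_counts, parallelism=4)
for slots_per_tm in (1, 2, 4):  # same parallelism, different TM/slot layouts
    print(slots_per_tm, "slot(s) per TM:", dict(records_per_tm(load, slots_per_tm)))
```

Whichever partition receives the hot key carries at least 1000 of the 1030 records, so the TaskManager hosting that slot is loaded far more heavily than the others, even though the parallelism is the same in every layout.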