Thanks, I'll check more about job tuning.

On Mon, 16 Aug 2021 at 06:28, Caizhi Weng <tsreape...@gmail.com> wrote:
> Hi!
>
>> if I use parallelism of 2 or 4 - it takes the same time.
>
> It might be that there is no data in some parallel instances. You can
> click on the nodes in the Flink web UI and see whether this is the case
> for each parallel instance, or you can check the metrics of each operator.
>
>> if I don't increase parallelism and just run the job on a fixed number
>> of task slots, the job will fail (due to lack of memory on the task
>> manager) or it will just take longer time to process the data?
>
> It depends on many aspects, such as the type of source you are using,
> the type of operators you are running, etc. Ideally we hope it will just
> take longer, but some specific operators or connectors might fail. This
> is where users have to tune their jobs.
>
> Gorjan Todorovski <gor...@gmail.com> wrote on Fri, 13 Aug 2021 at 18:48:
>
>> Hi!
>>
>> I want to set up a Flink cluster as a native Kubernetes session
>> cluster, with the intention of executing Apache Beam jobs that process
>> only batch data, but I am not sure I understand how I would scale the
>> cluster if I need to process large datasets.
>>
>> My understanding is that to process a bigger dataset, you could run the
>> job with higher parallelism, so the processing is spread over multiple
>> task slots, which may span multiple nodes. But the Beam jobs in my case
>> execute TensorFlow Extended pipelines, so I have no control over
>> partitioning by keys, and I see no difference in throughput (the time
>> it takes to process a specific dataset) between a parallelism of 2 and
>> 4 - it takes the same time.
>>
>> Also, since the execution is of type "PIPELINED", does this mean that
>> if I don't increase parallelism and just run the job on a fixed number
>> of task slots, the job will fail (due to lack of memory on the task
>> manager), or will it just take longer to process the data?
>>
>> Thanks,
>> Gorjan
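A side note on Caizhi's point about empty parallel instances: keyed records are routed to parallel subtasks by hashing the key, so a dataset with only a few distinct keys can leave some subtasks with no data at all, no matter how high the parallelism is set. Here is a minimal illustrative sketch in plain Python, using simple modulo hashing as a stand-in for Flink's actual key-group assignment (the real mechanism differs, but the skew effect is the same):

```python
# Sketch only: simulate hash-partitioning keyed records across subtasks
# to show why raising parallelism may not help when keys are few/skewed.
from collections import Counter

def partition(keys, parallelism):
    """Assign each key to a subtask index by hash(key) mod parallelism."""
    counts = Counter(hash(k) % parallelism for k in keys)
    # Subtasks that received no records at all:
    idle = [i for i in range(parallelism) if counts[i] == 0]
    return counts, idle

# Only two distinct keys: with parallelism 4, at least two subtasks
# must stay idle, so going from parallelism 2 to 4 cannot speed things up.
keys = [1, 2] * 1000
counts, idle = partition(keys, 4)
print(sorted(idle))  # [0, 3]
```

If the web UI shows a pattern like this (some subtasks processing zero records), the bottleneck is key distribution, not the number of task slots.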