I think more runtime information would help locate the problem:
1) how many parallel subtasks are actually running
2) the metrics for each operator
3) JVM profiling information, etc.
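
For the first two points, one option is to read the job details from the JobManager's monitoring REST API and print the parallelism and record counts per operator. Below is a minimal Python sketch, not a definitive tool. It assumes the job details endpoint is `/jobs/<job-id>` and that the response has a `vertices` list whose entries carry `name`, `parallelism`, and a `metrics` object with `read-records` and `write-records`; please check these field names against your Flink version. The helper names `fetch_job` and `summarize_vertices` are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_job(base_url, job_id):
    """Fetch job details from the JobManager (assumed endpoint: /jobs/<job-id>)."""
    with urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


def summarize_vertices(job):
    """Return one (name, parallelism, records_in, records_out) tuple per operator.

    Missing fields fall back to defaults, since the JSON layout is an assumption.
    """
    rows = []
    for vertex in job.get("vertices", []):
        metrics = vertex.get("metrics", {})
        rows.append((
            vertex.get("name", "?"),
            vertex.get("parallelism", 0),
            metrics.get("read-records", 0),
            metrics.get("write-records", 0),
        ))
    return rows
```

To use it, call `summarize_vertices(fetch_job("http://localhost:8081", job_id))` with your own JobManager address and job id (the address shown is only a placeholder). An operator that reports parallelism 1, or whose record counts lag far behind the others, would be the first place to look.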

*Best Regards,*
*Zhenghua Gao*


On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
wrote:

> Thanks, Gao, for the reply. I tried the parallelism parameter with different
> values such as 6 and 8, but the execution time is still not comparable to
> that of a single-threaded Python script. What would be a reasonable value
> for the parallelism?
>
> Best,
>
> Habib
> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>
> The reason might be that the parallelism of your task is only 1, which is
> too low.
> See [1] for how to specify a proper parallelism for your job; the execution
> time should then be reduced significantly.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
> wrote:
>
>> Hi all,
>>
>> I am running Flink on a standalone cluster and getting very long
>> execution times for streaming queries like WordCount on a fixed text
>> file. My VM runs Debian 10 with 16 CPU cores and 32 GB of RAM. I
>> have a text file of 2 GB. When I run Flink on a standalone cluster,
>> i.e., one JobManager and one TaskManager with a 25 GB heap, it takes
>> around two hours to finish counting this file, while a simple
>> Python script can do it in around 7 minutes. I am wondering what is
>> wrong with my setup. I also ran the experiments on a cluster with six
>> TaskManagers, but I still get very long execution times of 25 minutes
>> or so. I tried increasing the JVM heap size to lower the execution
>> time, but it did not help. I attached the log file and the Flink
>> configuration file to this email.
>>
>> Best,
>>
>> Habib
>>
>> --
> Habib Mostafaei, Ph.D.
> Postdoctoral researcher
> TU Berlin,
> FG INET, MAR 4.003
> Marchstraße 23, 10587 Berlin
>
>
