The reason might be that the parallelism of your task is only 1, which is too low. See [1] for how to set a proper parallelism for your job; the execution time should then drop significantly.
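A minimal sketch of the cluster-side settings follows. It assumes a single TaskManager on the 16-core VM described in the quoted message. By default a TaskManager offers one task slot and jobs run with parallelism 1, so both values need raising. One slot per core is an assumed starting point, not a tuned value. A job can still override the default through `StreamExecutionEnvironment#setParallelism` or the `-p` option of `flink run`. On the six-TaskManager cluster, a job's parallelism is bounded by the total number of slots across all TaskManagers.

```yaml
# flink-conf.yaml (sketch; values assume one TaskManager on a 16-core machine)

# Number of parallel task slots each TaskManager offers (Flink's default is 1).
taskmanager.numberOfTaskSlots: 16

# Parallelism used by jobs that do not set their own (Flink's default is 1).
parallelism.default: 16
```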
[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html

*Best Regards,*
*Zhenghua Gao*

On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de> wrote:

> Hi all,
>
> I am running Flink on a standalone cluster and getting very long
> execution times for streaming queries like WordCount on a fixed text
> file. My VM runs Debian 10 with 16 CPU cores and 32GB of RAM. I
> have a text file of 2GB. When I run Flink on a standalone
> cluster, i.e., one JobManager and one TaskManager with a 25GB heap,
> it took around two hours to finish counting this file, while a simple
> Python script can do it in around 7 minutes. I'm wondering what is
> wrong with my setup. I also ran the experiments on a cluster with six
> TaskManagers, but I still get a very long execution time, 25 minutes
> or so. I tried increasing the JVM heap size to lower the execution
> time, but it did not help. I attached the log file and the Flink
> configuration file to this email.
>
> Best,
>
> Habib