Re: low performance in running queries

Habib Mostafaei Wed, 30 Oct 2019 05:26:26 -0700

Thanks Gao for the reply. I used the parallelism parameter with different values like 6 and 8, but the execution time is still not comparable with a single-threaded Python script. What would be a reasonable value for the parallelism?


Best,


Habib

On 10/30/2019 1:17 PM, Zhenghua Gao wrote:

The reason might be that the parallelism of your task is only 1, which is too low. See [1] to specify a proper parallelism for your job, and the execution time should be reduced significantly.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
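
For illustration, the default parallelism and the number of task slots can be set in flink-conf.yaml. The fragment below is only a sketch: the values are assumptions chosen for a single TaskManager on a 16-core machine, not settings taken from the configuration file attached to this thread. A job cannot be scheduled with a parallelism higher than the total number of available task slots, so the two keys usually need to be raised together.

```yaml
# flink-conf.yaml (illustrative values, assuming one TaskManager on a 16-core VM)

# Number of parallel slots offered by each TaskManager.
# One slot per CPU core is a common starting point.
taskmanager.numberOfTaskSlots: 16

# Parallelism used by jobs that do not set one explicitly.
# Must not exceed the total number of slots in the cluster.
parallelism.default: 16
```

The same value can also be passed for a single job with the -p option of bin/flink run, which takes precedence over parallelism.default.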


*Best Regards,*
*Zhenghua Gao*

On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de> wrote:


    Hi all,

    I am running Flink on a standalone cluster and getting very long
    execution times for streaming queries like WordCount on a fixed text
    file. My VM runs Debian 10 with 16 CPU cores and 32GB of RAM. I have
    a text file of 2GB. When I run Flink on a standalone cluster, i.e.,
    one JobManager and one TaskManager with a 25GB heap, it takes around
    two hours to finish counting this file, while a simple Python script
    can do it in around 7 minutes. I am wondering what is wrong with my
    setup. I also ran the experiments on a cluster with six TaskManagers,
    but I still get very long execution times, around 25 minutes. I tried
    increasing the JVM heap size to lower the execution time, but it did
    not help. I attached the log file and the Flink configuration file to
    this email.

    Best,

    Habib

--
Habib Mostafaei, Ph.D.
Postdoctoral researcher
TU Berlin,
FG INET, MAR 4.003
Marchstraße 23, 10587 Berlin
