I haven't run any benchmarks with Flink, or even used it enough to directly help with your question; however, I suspect that the following article might be relevant:

http://dsrg.pdos.csail.mit.edu/2016/06/26/scalability-cost/

Given that the computation you're performing is trivial, it's possible that the additional overhead of serialisation, inter-process communication, state management, etc. that distributed systems like Flink require is dominating the runtime here. Two hours (or even 25 minutes) still seems too long to me, however, so hopefully it really is just a configuration issue of some sort. Either way, if you do figure this out, or if anyone familiar with the article above can relate it to Flink, I'd be very interested in hearing more.
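For what it's worth, the single-threaded baseline being compared against is essentially just a loop over the file. This is a minimal sketch, not the actual script from the thread, but it illustrates how little work per byte is involved:

```python
# Minimal single-threaded word count, similar in spirit to the Python
# baseline mentioned in the thread.
from collections import Counter


def count_words(path):
    """Count whitespace-separated tokens in a text file, one line at a time."""
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            counts.update(line.split())
    return counts
```

There is no serialisation, no network shuffle, and no state backend here, which is exactly the kind of overhead the article above discusses.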

Regards,
Chris


------ Original Message ------
From: "Habib Mostafaei" <ha...@inet.tu-berlin.de>
To: "Zhenghua Gao" <doc...@gmail.com>
Cc: "user" <user@flink.apache.org>; "Georgios Smaragdakis" <georg...@inet.tu-berlin.de>; "Niklas Semmler" <nik...@inet.tu-berlin.de>
Sent: 30/10/2019 12:25:28
Subject: Re: low performance in running queries

Thanks, Gao, for the reply. I tried the parallelism parameter with different values, like 6 and 8, but the execution time is still not comparable with a single-threaded Python script. What would be a reasonable value for the parallelism?

Best,

Habib

On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
The reason might be that the parallelism of your task is only 1, which is too low. See [1] for how to specify a proper parallelism for your job; the execution time should then be reduced significantly.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
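Concretely, parallelism can be set cluster-wide in flink-conf.yaml or per job on the command line. A minimal sketch (the value 8 is just an illustration, not a recommendation for this workload):

```
# flink-conf.yaml: default parallelism for all jobs on this cluster
parallelism.default: 8
```

```
# or per submitted job, via the CLI -p flag
./bin/flink run -p 8 path/to/your-job.jar
```

Note that parallelism beyond the number of available task slots will leave the job waiting for resources, so it should not exceed taskmanager.numberOfTaskSlots times the number of TaskManagers.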

Best Regards,
Zhenghua Gao


On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de> wrote:
Hi all,

I am running Flink on a standalone cluster and getting very long execution times for streaming queries like WordCount on a fixed text file. My VM runs Debian 10 with 16 CPU cores and 32 GB of RAM, and I have a text file of about 2 GB. When I run Flink on a standalone cluster, i.e., one JobManager and one TaskManager with 25 GB of heap size, it takes around two hours to finish counting this file, while a simple Python script can do it in around 7 minutes. I am just wondering what is wrong with my setup. I also ran the experiments on a cluster with six TaskManagers, but I still get a very long execution time, around 25 minutes. I tried increasing the JVM heap size to lower the execution time, but it did not help. I have attached the log file and the Flink configuration file to this email.

Best,

Habib

--
Habib Mostafaei, Ph.D.
Postdoctoral researcher
TU Berlin,
FG INET, MAR 4.003
Marchstraße 23, 10587 Berlin
