Re: low performance in running queries

2019-11-04 Thread Piotr Nowojski
Hi, Unfortunately your VisualVM snapshot doesn’t contain the profiler output. It should look like this [1]. > Checking the timeline of execution shows that the source operation is done in > less than a second while Map and Reduce operations take long running time. It could well be that the ove

Re: low performance in running queries

2019-11-04 Thread Habib Mostafaei
Hi, On 11/1/2019 4:40 PM, Piotr Nowojski wrote: Hi, More important would be the code profiling output. I think VisualVM allows sharing the code profiling result as “snapshots”? If you could analyse or share this, it would be helpful. Enclosed is a snapshot of VisualVM. From the attached s

Re: low performance in running queries

2019-11-03 Thread Zhenghua Gao
Hi, I ran the streaming WordCount with a 2GB text file (copied /usr/share/dict/words 400 times) last weekend and didn't reproduce your result (16 minutes in my case). But I found some clues that may help you: The streaming WordCount job outputs all intermediate results to your output file (if specified)
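The point about intermediate results can be illustrated outside Flink. The following is a minimal Python sketch, not Flink code: it assumes a running per-word sum that emits an updated count for every incoming word, and compares that with a batch-style count that emits one record per distinct word. The function names are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def streaming_word_count(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit an updated (word, count) record for every input word."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        # One output record per input record: these are the intermediate results.
        yield word, counts[word]


def batch_word_count(words: Iterable[str]) -> Dict[str, int]:
    """Emit only the final count for each distinct word."""
    return dict(Counter(words))


if __name__ == "__main__":
    sample = "to be or not to be".split()
    updates = list(streaming_word_count(sample))
    finals = batch_word_count(sample)
    print(len(updates), "streaming records vs", len(finals), "batch records")
```

Under this assumption the streaming output grows with the number of input words rather than the number of distinct words, so writing it to a file can take a noticeable share of the run time for a large input.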

Re: low performance in running queries

2019-11-01 Thread Piotr Nowojski
Hi, More important would be the code profiling output. I think VisualVM allows sharing the code profiling result as “snapshots”? If you could analyse or share this, it would be helpful. From the attached screenshot the only thing that is visible is that there are no GC issues, and secondly th

Re: low performance in running queries

2019-11-01 Thread Habib Mostafaei
Hi Piotrek, Thanks for the list of profilers. I used VisualVM and here is the resource usage for taskManager. Habib On 11/1/2019 9:48 AM, Piotr Nowojski wrote: Hi, >  Is there a simple way to get profiling information in Flink? Flink doesn’t provide any special tooling for that. Just use

Re: low performance in running queries

2019-11-01 Thread Piotr Nowojski
Hi, > Is there a simple way to get profiling information in Flink? Flink doesn’t provide any special tooling for that. Just use your chosen profiler, for example: Oracle’s Mission Control (free on non production clusters, no need to install anything if already using Oracle’s JVM), VisualVM (I

Re: low performance in running queries

2019-11-01 Thread Habib Mostafaei
I used the streaming WordCount provided by Flink and the file contains text like "This is some text...". I just copied it several times. Best, Habib On 11/1/2019 6:03 AM, Zhenghua Gao wrote: 2019-10-30 15:59:52,122 INFO org.apache.flink.runtime.taskmanager.Task - Split Reader:

Re: low performance in running queries

2019-10-31 Thread Zhenghua Gao
2019-10-30 15:59:52,122 INFO org.apache.flink.runtime.taskmanager.Task - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING. 2019-10-30 17:45:10,943 INFO org.apache.flink.runtime.taskmanager.Task
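The gap between the two timestamps quoted above can be computed directly. This is a small Python sketch that only assumes the timestamp format visible in the excerpt (date, time, comma, milliseconds); the second log line is truncated here, so which state transition it records is not known, and the helper name is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted log lines, e.g. "2019-10-30 15:59:52,122".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    gap = elapsed_seconds("2019-10-30 15:59:52,122", "2019-10-30 17:45:10,943")
    print(f"{gap:.3f} s, about {gap / 60:.1f} minutes")
```

The two timestamps are roughly 1 hour 45 minutes apart.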

Re: low performance in running queries

2019-10-31 Thread Habib Mostafaei
I enclosed all logs from the run and for this run I used parallelism one. However, for other runs I checked and found that all parallel workers were working properly. Is there a simple way to get profiling information in Flink? Best, Habib On 10/31/2019 2:54 AM, Zhenghua Gao wrote: I think

Re: low performance in running queries

2019-10-30 Thread Zhenghua Gao
I think more runtime information would help figure out where the problem is: 1) how many parallel instances are actually working 2) the metrics for each operator 3) the JVM profiling information, etc. *Best Regards,* *Zhenghua Gao* On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei wrote: > Thanks Gao for t

Re: low performance in running queries

2019-10-30 Thread Piotr Nowojski
"Georgios Smaragdakis" <georg...@inet.tu-berlin.de>; "Niklas Semmler" <nik...@inet.tu-berlin.de> > Sent: 30/10/2019 12:25:28 > Subject: Re: low performance in running queries > >> Thanks Gao for the reply. I used the parallelis

Re: low performance in running queries

2019-10-30 Thread Chris Miller
To: "Zhenghua Gao" Cc: "user" ; "Georgios Smaragdakis" ; "Niklas Semmler" Sent: 30/10/2019 12:25:28 Subject: Re: low performance in running queries Thanks Gao for the reply. I used the parallelism parameter with different values like 6 and 8 but still the

Re: low performance in running queries

2019-10-30 Thread Zhenghua Gao
The reason might be that the parallelism of your task is only 1, which is too low. See [1] to specify a proper parallelism for your job, and the execution time should be reduced significantly. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html *Best Regards,* *Zhenghua Gao* On

Re: low performance in running queries

2019-10-30 Thread Habib Mostafaei
Thanks Gao for the reply. I used the parallelism parameter with different values like 6 and 8 but still the execution time is not comparable with a single-threaded Python script. What would be a reasonable value for the parallelism? Best, Habib On 10/30/2019 1:17 PM, Zhenghua Gao wrote: Th
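The Python script used for the comparison is not shown in the thread. For readers who want a comparable baseline, a single-threaded word count could look roughly like the sketch below. It is a hypothetical stand-in, and the tokenization (lowercasing, splitting on non-word characters) is an assumption rather than the poster's actual logic.

```python
import re
import sys
import time
from collections import Counter


def count_words(path: str) -> Counter:
    """Count word occurrences in a text file, reading it line by line."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Assumed tokenization: lowercase, split on runs of non-word characters.
            counts.update(tok for tok in re.split(r"\W+", line.lower()) if tok)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    start = time.perf_counter()
    result = count_words(sys.argv[1])
    duration = time.perf_counter() - start
    print(f"{len(result)} distinct words counted in {duration:.1f} s")
```

Run it with the path of the input file as the only argument. Note that it produces only final counts, so its output volume is not directly comparable with a job that also writes intermediate results.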

low performance in running queries

2019-10-29 Thread Habib Mostafaei
Hi all, I am running Flink on a standalone cluster and getting very long execution times for streaming queries like WordCount on a fixed text file. My VM runs Debian 10 with 16 CPU cores and 32GB of RAM. I have a 2GB text file. When I run Flink on a standalone cluste