> "For csv reading, i deliberately did not use csv reader since i want to run
> same code across spark and flink."
>
> If your objective deviates from writing and running the fastest Spark and
> fastest Flink programs, then your comparison is worthless.
Well, I don't really agree with this.
"For csv reading, i deliberately did not use csv reader since i want to run
same code across spark and flink."
If your objective deviates from writing and running the fastest Spark and
fastest Flink programs, then your comparison is worthless.
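To make that concrete: the fastest versions of both programs would use the
engine-specific CSV sources instead of generic text parsing. A rough sketch
of what I mean (the MovieLens field types and the path here are my
assumptions, not your code):

    import org.apache.flink.api.scala._
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val path = "hdfs:///data/ratings.csv"  // hypothetical location

    // Flink: typed CSV source, parsed directly into tuples.
    val env = ExecutionEnvironment.getExecutionEnvironment
    val flinkRatings: DataSet[(Int, Int, Double, Long)] =
      env.readCsvFile[(Int, Int, Double, Long)](path, ignoreFirstLine = true)

    // Spark 2.x: built-in CSV reader with an explicit schema, so it can
    // skip schema inference and use its optimized parsing path.
    val spark = SparkSession.builder().getOrCreate()
    val schema = StructType(Seq(
      StructField("userId", IntegerType),
      StructField("movieId", IntegerType),
      StructField("rating", DoubleType),
      StructField("timestamp", LongType)))
    val sparkRatings = spark.read
      .option("header", "true")
      .schema(schema)
      .csv(path)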
On Fri, Nov 18, 2016 at 5:37 AM, CPC wrote:
Thank you, Flavio. I will generate flame graphs for Flink and compare them.
On 18 November 2016 at 13:43, Flavio Pompermaier wrote:
I think this could be very helpful for your study:
http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs
Best,
Flavio
On Fri, Nov 18, 2016 at 11:37 AM, CPC wrote:
Hi Gabor,
Thank you for your kind response. I forgot to mention that I actually have
three workers. This is why I set the default parallelism to 6.
For CSV reading, I deliberately did not use the CSV reader, since I want to
run the same code across Spark and Flink. Collect is returning 40k records,
which is not that large.
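To show what I mean, the shared part looks roughly like this (a simplified
sketch, not my exact code; the path is hypothetical):

    import org.apache.flink.api.scala._  // brings the implicit TypeInformation

    // The parsing logic is shared verbatim between both engines: read each
    // line as plain text and split it, instead of using a CSV source.
    case class Rating(userId: Int, movieId: Int, rating: Double, timestamp: Long)

    def parse(line: String): Rating = {
      val f = line.split(',')
      Rating(f(0).toInt, f(1).toInt, f(2).toDouble, f(3).toLong)
    }

    val path = "hdfs:///data/ratings.csv"  // hypothetical location

    // Flink: read as plain text, then apply the shared parser.
    val flinkRatings =
      ExecutionEnvironment.getExecutionEnvironment.readTextFile(path).map(parse _)

    // Spark: identical structure, identical parser (sc is the SparkContext).
    val sparkRatings = sc.textFile(path).map(parse)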
Hello,
Your program looks mostly fine, but there are a few minor things that
might help a bit:
Parallelism: In your attached flink-conf.yaml, you have 2 task slots
per task manager, and if you have 1 task manager, then your total
number of task slots is also 2. However, your default parallelism is set to
6, which is more than the slots you have available.
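For the numbers to line up, something like this (a sketch; setting it per
job is an alternative to editing flink-conf.yaml):

    import org.apache.flink.api.scala._

    // Total task slots = task managers x slots per task manager.
    // E.g. 3 task managers x 2 slots each = 6 slots, so parallelism 6 fits;
    // with 1 task manager and 2 slots, parallelism 6 cannot be scheduled.
    // flink-conf.yaml equivalents: taskmanager.numberOfTaskSlots: 2
    //                              parallelism.default: 6
    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(6)  // must not exceed the total available slots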
Hi all,
In the meantime I have three workers. Any thoughts on improving Flink
performance?
Thank you...
On Nov 17, 2016 00:38, "CPC" wrote:
Hi all,
I am trying to compare Spark and Flink batch performance. In my test I am
using ratings.csv from the
http://files.grouplens.org/datasets/movielens/ml-latest.zip dataset. I
also concatenated ratings.csv 16 times to increase the dataset size (a
total of 390,465,536 records, almost 10 GB).