subject:"spark vs flink batch performance"

Re: spark vs flink batch performance

2016-11-18 Thread Gábor Gévay

> "For csv reading, i deliberately did not use csv reader since i want to run
> same code across spark and flink."
>
> If your objective deviates from writing and running the fastest Spark and
> fastest Flink programs, then your comparison is worthless.
Well, I don't really agree with this. I woul

Re: spark vs flink batch performance

2016-11-18 Thread Greg Hogan

"For csv reading, i deliberately did not use csv reader since i want to run same code across spark and flink."
If your objective deviates from writing and running the fastest Spark and fastest Flink programs, then your comparison is worthless.
On Fri, Nov 18, 2016 at 5:37 AM, CPC wrote:
> Hi G

Re: spark vs flink batch performance

2016-11-18 Thread CPC

Thank you, Flavio. I will generate a flame graph for Flink and compare them.
On 18 November 2016 at 13:43, Flavio Pompermaier wrote:
> I think this could be very helpful for your study:
>
> http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-gr

Re: spark vs flink batch performance

2016-11-18 Thread Flavio Pompermaier

I think this could be very helpful for your study:
http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs
Best, Flavio
On Fri, Nov 18, 2016 at 11:37 AM, CPC wrote:
> Hi Gabor,
>
> Thank you for your kind response. I forget to mention tha

Re: spark vs flink batch performance

2016-11-18 Thread CPC

Hi Gabor,
Thank you for your kind response. I forgot to mention that I actually have three workers. This is why I set the default parallelism to 6. For CSV reading, I deliberately did not use the CSV reader, since I want to run the same code across Spark and Flink. Collect is returning 40k records which is not

Re: spark vs flink batch performance

2016-11-18 Thread Gábor Gévay

Hello,
Your program looks mostly fine, but there are a few minor things that might help a bit:
Parallelism: In your attached flink-conf.yaml, you have 2 task slots per task manager, and if you have 1 task manager, then your total number of task slots is also 2. However, your default parallelism i
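The slot arithmetic in this reply can be sketched in a few lines. This is a minimal illustration in Python, not Flink code: the function names `total_slots` and `fits` are hypothetical, and the sketch assumes every task manager is configured with the same number of slots (the `taskmanager.numberOfTaskSlots` setting in flink-conf.yaml) and that a job needs at least as many slots as its parallelism.

```python
def total_slots(task_managers: int, slots_per_task_manager: int) -> int:
    """Total task slots, assuming every task manager has the same slot count."""
    return task_managers * slots_per_task_manager


def fits(parallelism: int, task_managers: int, slots_per_task_manager: int) -> bool:
    """True if the requested parallelism does not exceed the available slots.

    Assumption: the job needs one slot per parallel instance, which is the
    usual case with Flink's default slot sharing.
    """
    return parallelism <= total_slots(task_managers, slots_per_task_manager)


# The case described in the reply: 1 task manager with 2 slots.
print(total_slots(1, 2))   # 2
# Three workers with 2 slots each give 6 slots.
print(fits(6, 3, 2))       # True
print(fits(6, 1, 2))       # False
```

With three workers at 2 slots each, the total is 6 slots, which matches the default parallelism of 6 that CPC mentions elsewhere in the thread.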

Re: spark vs flink batch performance

2016-11-17 Thread CPC

Hi all,
In the meantime, I have three workers. Any thoughts about improving Flink performance? Thank you...
On Nov 17, 2016 00:38, "CPC" wrote:
> Hi all,
>
> I am trying to compare spark and flink batch performance. In my test i am
> using ratings.csv in http://files.grouplens.org/datasets/

spark vs flink batch performance

2016-11-16 Thread CPC

Hi all,
I am trying to compare Spark and Flink batch performance. In my test I am using ratings.csv from the http://files.grouplens.org/datasets/movielens/ml-latest.zip dataset. I also concatenated ratings.csv 16 times to increase the dataset size (a total of 390465536 records, almost 10 GB). I am reading from go
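One way to build such an enlarged input is sketched below. The message does not say how the file was actually concatenated, so this is only an assumption-laden illustration: the function names `concatenate_copies` and `records_per_copy` and the file paths are hypothetical, and the sketch copies the file byte for byte, which means any header line in the source is repeated in every copy.

```python
import shutil


def concatenate_copies(source_path: str, target_path: str, copies: int) -> None:
    """Write `copies` back-to-back copies of the source file to the target.

    Assumption: the source ends with a newline; otherwise the last line of
    one copy would be joined to the first line of the next.
    """
    with open(target_path, "wb") as target:
        for _ in range(copies):
            with open(source_path, "rb") as source:
                shutil.copyfileobj(source, target)


def records_per_copy(total_records: int, copies: int) -> int:
    """Records contributed by each copy, given the total after concatenation."""
    return total_records // copies


# Figures from the message: 390465536 records after 16 copies.
print(records_per_copy(390465536, 16))  # 24404096
```

A call such as `concatenate_copies("ratings.csv", "ratings_x16.csv", 16)` would produce the enlarged file; both file names here are placeholders.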


8 matches
