Taking it to a more basic level, I compared a simple transformation with
RDDs and with Datasets. This is far simpler than Renato's use case, and
it brings up two good questions:
1. Is the time it takes to "spin up" a standalone instance of Spark(SQL)
just an additional one-time overhead?
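A minimal sketch of the kind of comparison described above, assuming the Spark 2.x `SparkSession` API (the thread predates 2.0, so the original code likely used `SQLContext`); the object name, input list, and `time` helper are illustrative, not from the original benchmark:

```scala
import org.apache.spark.sql.SparkSession

object SimpleTransformBench {
  // Crude wall-clock timer; real benchmarks should warm up the JVM first.
  def time[A](label: String)(body: => A): A = {
    val t0 = System.nanoTime()
    val result = body
    println(s"$label took ${(System.nanoTime() - t0) / 1e6} ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("simple-transform-bench")
      .getOrCreate()
    import spark.implicits._

    val input = (1 to 1000).toList

    // Plain RDD path: no Catalyst planning involved.
    val rddOut = time("RDD map") {
      spark.sparkContext.parallelize(input).map(_ * 2).collect()
    }

    // Dataset path: the first action also pays the SparkSQL
    // warm-up cost (Catalyst, encoder code generation).
    val dsOut = time("Dataset map") {
      input.toDS().map(_ * 2).collect()
    }

    assert(rddOut.sorted.sameElements(dsOut.sorted))
    spark.stop()
  }
}
```

Timing the second and third Dataset action separately would help answer the question: if only the first action is slow, the cost is one-time spin-up rather than per-job overhead.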
Hi Amit,
This is very interesting indeed, because I got similar results. I tried
doing a filter + groupBy using a Dataset with a function, and using the
inner RDD of the DataFrame (RDD[Row]). I used the inner RDD of a DataFrame because
apparently there is no straightforward way to create an RDD of Par
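The filter + groupBy comparison described above could look like the following sketch, assuming the Spark 2.x API; the `Record` case class and column names are illustrative placeholders, not the original code:

```scala
import org.apache.spark.sql.SparkSession

object FilterGroupByBench {
  case class Record(key: String, value: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-groupby-bench")
      .getOrCreate()
    import spark.implicits._

    val data = Seq(Record("a", 1), Record("a", 2), Record("b", 3), Record("b", -1))
    val ds = data.toDS()

    // Typed Dataset path: lambdas go through encoders.
    val dsCounts = ds.filter(_.value > 0)
      .groupByKey(_.key)
      .count()
      .collect()
      .toMap

    // RDD[Row] path via the DataFrame's inner RDD.
    val rddCounts = ds.toDF().rdd
      .filter(row => row.getAs[Int]("value") > 0)
      .map(row => (row.getAs[String]("key"), 1L))
      .reduceByKey(_ + _)
      .collect()
      .toMap

    // Both paths should agree on the result; only the runtime differs.
    assert(dsCounts == rddCounts)
    println(dsCounts)
    spark.stop()
  }
}
```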
Somehow missed that ;)
Anything about the Datasets slowness?
On Wed, May 11, 2016, 21:02 Ted Yu wrote:
> Which release are you using ?
>
> You can use the following to disable UI:
> --conf spark.ui.enabled=false
>
> On Wed, May 11, 2016 at 10:59 AM, Amit Sela wrote:
>
>> I've run a simple WordCou
Which release are you using ?
You can use the following to disable UI:
--conf spark.ui.enabled=false
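For a self-contained benchmark it may be more convenient to set this in code rather than on the command line; a sketch of the programmatic equivalent (the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

object NoUiSession {
  def main(args: Array[String]): Unit = {
    // Equivalent of passing --conf spark.ui.enabled=false to spark-submit:
    // skips starting the web UI, removing its startup cost from the measurement.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("no-ui-bench")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    spark.stop()
  }
}
```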
On Wed, May 11, 2016 at 10:59 AM, Amit Sela wrote:
> I've run a simple WordCount example with a very small List as
> input lines and ran it in standalone (local[*]), and Datasets is very slow.
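The quoted WordCount-over-a-small-List scenario could look like this minimal sketch, assuming the Spark 2.x Dataset API; the input lines are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object DatasetWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ds-wordcount")
      .getOrCreate()
    import spark.implicits._

    // A tiny in-memory input, as in the thread: startup and planning
    // overhead dominates the actual work at this scale.
    val lines = List("to be or not to be", "that is the question")

    val counts = lines.toDS()
      .flatMap(_.split("\\s+"))
      .groupByKey(identity)
      .count()
      .collect()
      .toMap

    assert(counts("to") == 2L && counts("be") == 2L)
    println(counts)
    spark.stop()
  }
}
```

With inputs this small, most of the elapsed time is fixed cost (JVM startup, SparkSQL initialization, codegen), which is consistent with the spin-up-overhead question raised earlier in the thread.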