Taking it to a more basic level, I compared a simple transformation with
RDDs and with Datasets. This is far simpler than Renato's use case, and
it brings up two good questions:
1. Is the time it takes to "spin up" a standalone instance of Spark(SQL)
just an additional one-time overhead?
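A minimal sketch of the kind of comparison described above, assuming the Spark 2.x `SparkSession` API (the thread predates 2.0, so the original code likely used `SQLContext`); the object name, input list, and `time` helper are illustrative, not from the original benchmark:

```scala
import org.apache.spark.sql.SparkSession

object SimpleTransformBench {
  // Crude wall-clock timer; real benchmarks should warm up the JVM first.
  def time[A](label: String)(body: => A): A = {
    val t0 = System.nanoTime()
    val result = body
    println(s"$label took ${(System.nanoTime() - t0) / 1e6} ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("simple-transform-bench")
      .getOrCreate()
    import spark.implicits._

    val input = (1 to 1000).toList

    // Plain RDD path: no Catalyst planning involved.
    val rddOut = time("RDD map") {
      spark.sparkContext.parallelize(input).map(_ * 2).collect()
    }

    // Dataset path: the first action also pays the SparkSQL
    // warm-up cost (Catalyst, encoder code generation).
    val dsOut = time("Dataset map") {
      input.toDS().map(_ * 2).collect()
    }

    assert(rddOut.sorted.sameElements(dsOut.sorted))
    spark.stop()
  }
}
```

Timing the second and third Dataset action separately would help answer the question: if only the first action is slow, the cost is one-time spin-up rather than per-job overhead.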
Hi Amit,
This is very interesting indeed, because I got similar results. I tried
doing a filter + groupBy using a Dataset with a function, and using the
inner RDD of the DataFrame (RDD[Row]). I used the inner RDD of a DataFrame because
apparently there is no straightforward way to create an RDD of Par
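The filter + groupBy comparison described above could look like the following sketch, assuming the Spark 2.x API; the `Record` case class and column names are illustrative placeholders, not the original code:

```scala
import org.apache.spark.sql.SparkSession

object FilterGroupByBench {
  case class Record(key: String, value: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-groupby-bench")
      .getOrCreate()
    import spark.implicits._

    val data = Seq(Record("a", 1), Record("a", 2), Record("b", 3), Record("b", -1))
    val ds = data.toDS()

    // Typed Dataset path: lambdas go through encoders.
    val dsCounts = ds.filter(_.value > 0)
      .groupByKey(_.key)
      .count()
      .collect()
      .toMap

    // RDD[Row] path via the DataFrame's inner RDD.
    val rddCounts = ds.toDF().rdd
      .filter(row => row.getAs[Int]("value") > 0)
      .map(row => (row.getAs[String]("key"), 1L))
      .reduceByKey(_ + _)
      .collect()
      .toMap

    // Both paths should agree on the result; only the runtime differs.
    assert(dsCounts == rddCounts)
    println(dsCounts)
    spark.stop()
  }
}
```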
Somehow missed that ;)
Anything about the Datasets slowness?
On Wed, May 11, 2016, 21:02 Ted Yu wrote:
> Which release are you using ?
>
> You can use the following to disable UI:
> --conf spark.ui.enabled=false
>
> On Wed, May 11, 2016 at 10:59 AM, Amit Sela wrote:
>
>> I've run a simple WordCou
Which release are you using ?
You can use the following to disable UI:
--conf spark.ui.enabled=false
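For a self-contained benchmark it may be more convenient to set this in code rather than on the command line; a sketch of the programmatic equivalent (the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

object NoUiSession {
  def main(args: Array[String]): Unit = {
    // Equivalent of passing --conf spark.ui.enabled=false to spark-submit:
    // skips starting the web UI, removing its startup cost from the measurement.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("no-ui-bench")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    spark.stop()
  }
}
```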
On Wed, May 11, 2016 at 10:59 AM, Amit Sela wrote:
> I've run a simple WordCount example with a very small List as
> input lines and ran it in standalone (local[*]), and Datasets is very slow.
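The quoted WordCount-over-a-small-List scenario could look like this minimal sketch, assuming the Spark 2.x Dataset API; the input lines are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object DatasetWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ds-wordcount")
      .getOrCreate()
    import spark.implicits._

    // A tiny in-memory input, as in the thread: startup and planning
    // overhead dominates the actual work at this scale.
    val lines = List("to be or not to be", "that is the question")

    val counts = lines.toDS()
      .flatMap(_.split("\\s+"))
      .groupByKey(identity)
      .count()
      .collect()
      .toMap

    assert(counts("to") == 2L && counts("be") == 2L)
    println(counts)
    spark.stop()
  }
}
```

With inputs this small, most of the elapsed time is fixed cost (JVM startup, SparkSQL initialization, codegen), which is consistent with the spin-up-overhead question raised earlier in the thread.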