Thanks,
Ewan
From: Sumit Khanna [mailto:sumit.kha...@askme.in]
Sent: 29 July 2016 13:41
To: Gourav Sengupta
Cc: user
Subject: Re: how to save spark files as parquets efficiently
Hey Gourav,
Well, so I think it is my execution plan that is at fault. Basically,
df.write shows up as a Spark job on localhost:4040/, and since it is an
action it will include the time taken for all the umpteen transformations
before it, right? All I wanted to know is "what apt env/config params are
needed to some
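A minimal sketch of the timing question being raised here, assuming Spark 2.x's
SparkSession and a toy DataFrame standing in for the real pipeline (paths and
names are placeholders); materialising the DataFrame first separates the
transformation cost from the Parquet write itself:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-write-timing").getOrCreate()

    // Stand-in for the real pipeline: a DataFrame with some transformations on top.
    val df = spark.range(0L, 1000000L).toDF("id").filter("id % 2 = 0")

    // Materialise once so the transformation cost is paid here, not inside the write.
    df.cache()
    val rows = df.count()

    // Now the write job should mostly reflect Parquet serialisation and I/O.
    val t0 = System.nanoTime()
    df.write.mode("overwrite").parquet("/tmp/parquet_write_sketch")
    println(s"wrote $rows rows in ${(System.nanoTime() - t0) / 1e9} s")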
Hi,
The default write format in Spark is Parquet, and I have never faced any
issues writing over a billion records in Spark. Are you using
virtualization by any chance, or an obsolete hard disk, or maybe an Intel
Celeron?
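For what it's worth, a minimal sketch of that default, assuming an existing
DataFrame df and placeholder output paths: save() without an explicit format
falls back to spark.sql.sources.default, which is parquet.

    // These two writes produce the same Parquet output.
    df.write.mode("overwrite").save("/tmp/out_default")      // Parquet by default
    df.write.mode("overwrite").parquet("/tmp/out_explicit")  // Parquet, spelled out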
Regards,
Gourav Sengupta
On Fri, Jul 29, 2016 at 7:27 AM, Sumit Khanna wrote:
Hey,
So I believe this is the right format to save the file; the optimization is
never in the write part but in the head/body of my execution plan, isn't it?
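As a rough illustration of that point, assuming the same df as in the sketch
above, the plan behind the eventual write can be inspected before triggering it:

    // Prints the parsed, analyzed, optimized and physical plans; the write itself
    // only adds a sink on top of whatever transformations precede it.
    df.explain(true)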
Thanks,
On Fri, Jul 29, 2016 at 11:57 AM, Sumit Khanna
wrote:
> Hey,
>
> master=yarn
> mode=cluster
>
> spark.executor.memory=8g
>
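For context, those settings would correspond to a spark-submit invocation
roughly like the sketch below; the class and jar names are placeholders:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.executor.memory=8g \
      --class com.example.WriteParquetJob \
      my-job.jar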