I am using Spark 2.0. There are comments on the ticket since Oct 2016 which clearly mention that the issue still persists even in 2.0. I agree 1G is very small in today's world, and I have already resolved this by increasing *spark.driver.maxResultSize*. I was more intrigued as to why the data is being sent to the driver during save (similar to a collect() action). Are there any plans to fix this behavior/issue?
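For anyone hitting the same error, this is roughly what worked for me, a minimal sketch rather than my exact job: the "4g" value, the input/output paths, and the coalesce(200) figure are placeholders I picked, so tune them per job.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DatasetSaveSketch {
        public static void main(String[] args) {
            // Raise the cap on serialized task results returned to the
            // driver (default is 1g). This is a workaround, not a fix.
            SparkSession spark = SparkSession.builder()
                    .appName("DatasetSaveSketch")
                    .config("spark.driver.maxResultSize", "4g")
                    .getOrCreate();

            // "inputlocation" / "outputlocation" are placeholder paths.
            Dataset<Row> mydataset = spark.read().parquet("inputlocation");

            // Coalescing first also helps: fewer tasks means fewer
            // serialized per-task results accumulating on the driver
            // during the save.
            mydataset.coalesce(200).write().csv("outputlocation");

            spark.stop();
        }
    }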
Thanks,
Baahu

On Fri, Mar 17, 2017 at 8:17 AM, Yong Zhang <java8...@hotmail.com> wrote:

> Did you read the JIRA ticket? Are you confirming that it is fixed in
> Spark 2.0, or are you complaining that it still exists in Spark 2.0?
>
> First, you didn't tell us what version of Spark you are using. The JIRA
> clearly says that it is a bug in Spark 1.x and should be fixed in Spark
> 2.0. So help yourself and the community by confirming whether this is
> the case.
>
> If you are looking for a workaround, the JIRA ticket clearly shows you
> how to increase your driver heap. 1G in today's world really is kind of
> small.
>
> Yong
>
> ------------------------------
> *From:* Bahubali Jain <bahub...@gmail.com>
> *Sent:* Thursday, March 16, 2017 10:34 PM
> *To:* Yong Zhang
> *Cc:* user@spark.apache.org
> *Subject:* Re: Dataset : Issue with Save
>
> Hi,
> Was this not yet resolved?
> It's a very common requirement to save a dataframe. Is there a better
> way to save one that avoids data being sent to the driver?
>
> "Total size of serialized results of 3722 tasks (1024.0 MB) is bigger
> than spark.driver.maxResultSize (1024.0 MB)"
>
> Thanks,
> Baahu
>
> On Fri, Mar 17, 2017 at 1:19 AM, Yong Zhang <java8...@hotmail.com> wrote:
>
>> You can take a look at https://issues.apache.org/jira/browse/SPARK-12837
>> ("Spark driver requires large memory space for serialized results"):
>> executing a SQL statement with a large number of partitions requires a
>> high memory space for the driver even when there are no requests to
>> collect data back to the driver.
>>
>> Yong
>>
>> ------------------------------
>> *From:* Bahubali Jain <bahub...@gmail.com>
>> *Sent:* Thursday, March 16, 2017 1:39 PM
>> *To:* user@spark.apache.org
>> *Subject:* Dataset : Issue with Save
>>
>> Hi,
>> While saving a dataset using mydataset.write().csv("outputlocation")
>> I am running into an exception:
>>
>> "Total size of serialized results of 3722 tasks (1024.0 MB) is bigger
>> than spark.driver.maxResultSize (1024.0 MB)"
>>
>> Does it mean that to save a dataset the whole of its contents is being
>> sent to the driver, similar to a collect() action?
>>
>> Thanks,
>> Baahu
>>
>
> --
> Twitter: http://twitter.com/Baahu
>

--
Twitter: http://twitter.com/Baahu