I am using Spark 2.0. There are comments on the ticket since Oct 2016 which clearly mention that the issue still persists even in 2.0. I agree 1G is very small in today's world, and I have already resolved this by increasing *spark.driver.maxResultSize*. I was more intrigued as to why the data is being sent to the driver during save (similar to a collect() action). Are there any plans to fix this behavior/issue?
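For anyone hitting the same error, this is roughly what worked for me, a minimal sketch rather than my exact job: the "4g" value, the input/output paths, and the coalesce(200) figure are placeholders I picked, so tune them per job.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DatasetSaveSketch {
        public static void main(String[] args) {
            // Raise the cap on serialized task results returned to the
            // driver (default is 1g). This is a workaround, not a fix.
            SparkSession spark = SparkSession.builder()
                    .appName("DatasetSaveSketch")
                    .config("spark.driver.maxResultSize", "4g")
                    .getOrCreate();

            // "inputlocation" / "outputlocation" are placeholder paths.
            Dataset<Row> mydataset = spark.read().parquet("inputlocation");

            // Coalescing first also helps: fewer tasks means fewer
            // serialized per-task results accumulating on the driver
            // during the save.
            mydataset.coalesce(200).write().csv("outputlocation");

            spark.stop();
        }
    }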
Thanks,
Baahu

On Fri, Mar 17, 2017 at 8:17 AM, Yong Zhang <java8...@hotmail.com> wrote:

> Did you read the JIRA ticket? Are you confirming that it is fixed in
> Spark 2.0, or are you complaining that it still exists in Spark 2.0?
>
> First, you didn't tell us what version of Spark you are using. The JIRA
> clearly says that it is a bug in Spark 1.x and should be fixed in Spark
> 2.0. So help yourself and the community by confirming whether this is
> the case.
>
> If you are looking for a workaround, the JIRA ticket clearly shows you
> how to increase your driver heap. 1G in today's world really is kind of
> small.
>
> Yong
>
> ------------------------------
> *From:* Bahubali Jain <bahub...@gmail.com>
> *Sent:* Thursday, March 16, 2017 10:34 PM
> *To:* Yong Zhang
> *Cc:* user@spark.apache.org
> *Subject:* Re: Dataset : Issue with Save
>
> Hi,
> Was this not yet resolved?
> It's a very common requirement to save a dataframe. Is there a better
> way to save one that avoids data being sent to the driver?
>
> "Total size of serialized results of 3722 tasks (1024.0 MB) is bigger
> than spark.driver.maxResultSize (1024.0 MB)"
>
> Thanks,
> Baahu
>
> On Fri, Mar 17, 2017 at 1:19 AM, Yong Zhang <java8...@hotmail.com> wrote:
>
>> You can take a look at https://issues.apache.org/jira/browse/SPARK-12837
>> ("Spark driver requires large memory space for serialized results"):
>> executing a SQL statement with a large number of partitions requires a
>> high memory space for the driver even when there are no requests to
>> collect data back to the driver.
>>
>> Yong
>>
>> ------------------------------
>> *From:* Bahubali Jain <bahub...@gmail.com>
>> *Sent:* Thursday, March 16, 2017 1:39 PM
>> *To:* user@spark.apache.org
>> *Subject:* Dataset : Issue with Save
>>
>> Hi,
>> While saving a dataset using mydataset.write().csv("outputlocation")
>> I am running into an exception:
>>
>> "Total size of serialized results of 3722 tasks (1024.0 MB) is bigger
>> than spark.driver.maxResultSize (1024.0 MB)"
>>
>> Does it mean that to save a dataset the whole of its contents is being
>> sent to the driver, similar to a collect() action?
>>
>> Thanks,
>> Baahu
>>
>
> --
> Twitter: http://twitter.com/Baahu
>

--
Twitter: http://twitter.com/Baahu