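Hi, consider converting the DataFrame to an RDD and then using *rdd.toLocalIterator* to collect the data on the driver node. It fetches one partition at a time, so the driver needs only enough memory for the largest partition rather than for the whole dataset.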
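A minimal sketch of the idea, assuming the result should end up as a single CSV file on the driver's local disk. The helper name `saveOnDriver` and the parameter `localPath` are made up for illustration, and the CSV formatting is deliberately naive (no quoting or escaping), so treat it as a starting point rather than a finished writer.

```scala
import java.io.PrintWriter
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not part of Spark): streams `df` into one CSV file
// on the driver's local filesystem.
def saveOnDriver(df: DataFrame, localPath: String): Unit = {
  val out = new PrintWriter(localPath, "UTF-8")
  try {
    // Header row, mirroring option("header", "true") in the original save call.
    out.println(df.columns.mkString(","))
    // toLocalIterator pulls one partition at a time to the driver, so only
    // one partition's rows are held in driver memory at any moment.
    df.rdd.toLocalIterator.foreach { row =>
      // Naive formatting: nulls become empty strings, and values that
      // contain commas or newlines are NOT quoted or escaped.
      val fields = row.toSeq.map(v => if (v == null) "" else v.toString)
      out.println(fields.mkString(","))
    }
  } finally {
    out.close()
  }
}
```

Usage would look something like `saveOnDriver(joineddataframe.select("col1", "col2"), "/tmp/join_result.csv")`, where the column names and the path are placeholders. Note that toLocalIterator runs a separate job per partition, so it may be worth calling `cache()` on the joined DataFrame first to avoid recomputing the join for each partition.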

On Fri, Jul 15, 2016 at 9:05 AM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:

> Out of curiosity, is there a way to pull all the data back to the driver
> to save without collect()? That is, stream the data in chunks back to the
> driver so that maximum memory used comparable to a single node’s data, but
> all the data is saved on one node.
>
> —
> Pedro Rodriguez
> PhD Student in Large-Scale Machine Learning | CU Boulder
> Systems Oriented Data Scientist
> UC Berkeley AMPLab Alumni
>
> pedrorodriguez.io | 909-353-4423
> github.com/EntilZha | LinkedIn
> <https://www.linkedin.com/in/pedrorodriguezscience>
>
> On July 14, 2016 at 6:02:12 PM, Jacek Laskowski (ja...@japila.pl) wrote:
>
> Hi,
>
> Please re-consider your wish since it is going to move all the
> distributed dataset to the single machine of the driver and may lead
> to OOME. It's more pro to save your result to HDFS or S3 or any other
> distributed filesystem (that is accessible by the driver and
> executors).
>
> If you insist...
>
> Use collect() after select() and work with Array[T].
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jul 15, 2016 at 12:15 AM, vr.n. nachiappan
> <nachiappan_...@yahoo.com.invalid> wrote:
> > Hello,
> >
> > I am using data frames to join two cassandra tables.
> >
> > Currently when i invoke save on data frames as shown below it is saving the
> > join results on executor nodes.
> >
> > joineddataframe.select(<col1>, <col2>
> > ...).format("com.databricks.spark.csv").option("header",
> > "true").save(<path>)
> >
> > I would like to persist the results of the join on Spark Master/Driver node.
> > Is it possible to save the results on Spark Master/Driver and how to do it.
> >
> > I appreciate your help.
> >
> > Nachi

--
Quant | Engineer | Boy
blog: http://litaotao.github.io
github: www.github.com/litaotao