Re: Saving data frames on Spark Master/Driver

Taotao.Li Thu, 14 Jul 2016 22:14:26 -0700

Hi, consider converting the DataFrame to an RDD and then using rdd.toLocalIterator to
bring the data to the driver node one partition at a time.
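
To illustrate the toLocalIterator suggestion, here is a minimal sketch, assuming PySpark and a local CSV file as the target. The helper names write_rows_to_csv and save_dataframe_on_driver are hypothetical and exist only for this example. Because toLocalIterator fetches one partition at a time, the driver needs roughly enough memory for the largest partition, not for the whole dataset.

```python
import csv
from typing import Iterable, Sequence


def write_rows_to_csv(rows: Iterable[Sequence], header: Sequence[str], path: str) -> int:
    """Stream rows into a local CSV file one at a time; return the row count.

    Works with any iterable of tuple-like rows, so the full dataset is never
    held in memory at once.
    """
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def save_dataframe_on_driver(df, path: str) -> int:
    """Hypothetical helper: assumes `df` is a PySpark DataFrame.

    df.rdd.toLocalIterator() yields Row objects (tuple subclasses) and pulls
    partitions to the driver one by one instead of all at once like collect().
    """
    return write_rows_to_csv(df.rdd.toLocalIterator(), df.columns, path)
```

Usage on the driver would look like save_dataframe_on_driver(joineddataframe.select("col1", "col2"), "/tmp/join.csv"), where the column names and path are placeholders. The file ends up on the driver's local disk, so this only makes sense when the result fits there.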


On Fri, Jul 15, 2016 at 9:05 AM, Pedro Rodriguez <ski.rodrig...@gmail.com>
wrote:

> Out of curiosity, is there a way to pull all the data back to the driver
> to save without collect()? That is, stream the data back to the driver in
> chunks so that the maximum memory used is comparable to a single node’s
> data, but all the data is saved on one node.
>
> —
> Pedro Rodriguez
>
> On July 14, 2016 at 6:02:12 PM, Jacek Laskowski (ja...@japila.pl) wrote:
>
> Hi,
>
> Please reconsider, since this will move the entire distributed dataset
> to the driver's single machine and may lead to an OutOfMemoryError. It
> is better practice to save your result to HDFS, S3, or any other
> distributed filesystem (one that is accessible by the driver and
> executors).
>
> If you insist...
>
> Use collect() after select() and work with Array[T].
>
> Regards,
> Jacek Laskowski
>
>
> On Fri, Jul 15, 2016 at 12:15 AM, vr.n. nachiappan
> <nachiappan_...@yahoo.com.invalid> wrote:
> > Hello,
> >
> > I am using DataFrames to join two Cassandra tables.
> >
> > Currently, when I invoke save on the DataFrame as shown below, it saves
> > the join results on the executor nodes.
> >
> > joineddataframe.select(<col1>, <col2> ...)
> >   .write.format("com.databricks.spark.csv")
> >   .option("header", "true").save(<path>)
> >
> > I would like to persist the results of the join on the Spark Master/Driver
> > node. Is it possible to save the results on the Spark Master/Driver, and
> > if so, how?
> >
> > I appreciate your help.
> >
> > Nachi
> >


