Nicholas,

> (Can we make that a thing? Let's make that a thing. :)

Yes, we'll soon be releasing something called Distributed DataFrame (DDF)
to the community; it will make this (among other useful idioms) "a
(straightforward) thing" for Spark.

Sent from mobile. Please excuse typos.
On Mar 13, 2014 2:05 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com>
wrote:

> My fellow welders <https://www.google.com/search?q=welding+sparks&tbm=isch>,
>
> (Can we make that a thing? Let's make that a thing. :)
>
> I'm trying to wedge Spark into an existing model where we process and
> transform some data and then load it into an MPP database. I know that part
> of the sell of Spark and Shark is that you shouldn't have to copy data
> around like this, so please bear with me. :)
>
> Say I have an RDD of about 10GB in size that's cached in memory. What is
> the best/fastest way to push that data into an MPP database like
> Redshift <http://aws.amazon.com/redshift/>? Has anyone done something
> like this?
>
> I'm assuming that pushing the data straight from memory into the database
> is much faster than writing the RDD to HDFS and then COPY-ing it from there
> into the database.
>
> Is there, for example, a way to perform a bulk load into the database that
> runs on each partition of the in-memory RDD in parallel?
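>
> Something like this is what I'm picturing (an untested sketch, assuming
> a plain JDBC driver for the target database is on the executors'
> classpath; the URL, table, and column types are made up):
>
>     import java.sql.DriverManager
>
>     // Each partition opens its own connection, so inserts from all
>     // partitions run against the database in parallel.
>     val url = "jdbc:postgresql://my-cluster:5439/mydb?user=...&password=..."
>
>     rdd.foreachPartition { rows =>
>       val conn = DriverManager.getConnection(url)
>       conn.setAutoCommit(false)
>       val stmt = conn.prepareStatement(
>         "INSERT INTO my_table (a, b) VALUES (?, ?)")
>       try {
>         rows.foreach { case (a, b) =>
>           stmt.setString(1, a)
>           stmt.setInt(2, b)
>           stmt.addBatch()  // for huge partitions, executeBatch() every
>                            // few thousand rows instead of once at the end
>         }
>         stmt.executeBatch()  // one batched round trip, not one per row
>         conn.commit()
>       } finally {
>         stmt.close()
>         conn.close()
>       }
>     }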
>
> Nick
>
>
