Spark RDS data insertion

Bill Milan Thu, 25 Jun 2015 16:05:46 -0700

Hi all,

I am running a program that connects to Amazon RDS and generates some data
from S3 into an RDD. When I call rdd.collect and insert the results into RDS
using JDBC, I get a "communication link failure" error. I tried inserting
results into RDS from the master machine using both Python and the mysql
client, and everything went well. However, when I used Spark, the insertion
was not successful. My questions are:



1) When I establish the connection with RDS before the RDD is generated, is
this done on the master?

2) When I call rdd.collect, is the returned array on the master or on the
slave nodes?

3) When I insert the results of rdd.collect, where does the insertion
happen?
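
In case it helps, here is a minimal sketch of the flow I am describing. It is
not my actual program: sqlite3 stands in for the JDBC connection to RDS, a
plain list stands in for the array returned by rdd.collect, and the table
name, column names, and insert_rows helper are made up for illustration.

```python
import sqlite3


def insert_rows(conn, rows):
    """Insert (key, value) pairs in one batch and return the table's row count."""
    conn.executemany("INSERT INTO results (k, v) VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]


# Step 1: open the database connection before any data is generated.
# Assumption: an in-memory SQLite database replaces the real RDS endpoint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (k TEXT, v INTEGER)")

# Step 2: obtain the results as a local array.
# Assumption: this list replaces the output of rdd.collect on data from S3.
collected = [("a", 1), ("b", 2), ("c", 3)]

# Step 3: insert the collected results over the connection from step 1.
inserted = insert_rows(conn, collected)
```

Questions 1 to 3 correspond to steps 1 to 3 in the comments above.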

Thanks!

Bill
