Sean, would this work?

rdd.mapPartitions { partition => Iterator(partition) }.foreach { partition =>
  // Some setup code here
  // save partition to DB
  // Some cleanup code here
}

I tried a pretty simple example ... I can see that the setup and cleanup are
executed on the executor node, once per partition (I used mapPartitionsWithIndex
instead of mapPartitions to track this a little better). It seems like an easier
solution than Tobias's, but I'm wondering if it's perhaps incorrect.

On Mon, Aug 18, 2014 at 3:29 AM, Henry Hung <ythu...@winbond.com> wrote:
> I slightly modified the code to use while (partitions.hasNext) { ... } instead
> of partitions.map(func).
> I suppose this can eliminate the uncertainty from lazy execution.
>
> -----Original Message-----
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Monday, August 18, 2014 3:10 PM
> To: MA33 YTHung1
> Cc: user@spark.apache.org
> Subject: Re: a noob question for how to implement setup and cleanup in Spark map
>
> I think this was a more comprehensive answer recently. Tobias is right
> that it is not quite that simple:
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E
>
> On Mon, Aug 18, 2014 at 8:04 AM, Henry Hung <ythu...@winbond.com> wrote:
> > Hi All,
> >
> > Please ignore my question, I found a way to implement it via an old
> > archive mail:
> >
> > http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3CCAF_KkPzpU4qZWzDWUpS5r9bbh=-hwnze2qqg56e25p--1wv...@mail.gmail.com%3E
> >
> > Best regards,
> >
> > Henry
> >
> > From: MA33 YTHung1
> > Sent: Monday, August 18, 2014 2:42 PM
> > To: user@spark.apache.org
> > Subject: a noob question for how to implement setup and cleanup in Spark map
> >
> > Hi All,
> >
> > I'm new to Spark and Scala; I only recently started using this language and
> > love it, but there is a small coding problem when I want to convert my
> > existing MapReduce code from Java to Spark...
> >
> > In Java, I create a class by extending
> > org.apache.hadoop.mapreduce.Mapper and override the setup(), map() and
> > cleanup() methods.
> >
> > But in Spark there is no method called setup(), so I wrote the setup()
> > code inside map(), and it performs badly.
> >
> > The reason is that setup() creates the database connection once, run()
> > executes the SQL queries, and then cleanup() closes the connection.
> >
> > Could someone tell me how to do this in Spark?
> >
> > Best regards,
> >
> > Henry Hung
> >
> > ________________________________
> >
> > The privileged confidential information contained in this email is
> > intended for use only by the addressees as indicated by the original
> > sender of this email. If you are not the addressee indicated in this
> > email or are not responsible for delivery of the email to such a
> > person, please kindly reply to the sender indicating this fact and
> > delete all copies of it from your computer and network server
> > immediately. Your cooperation is highly appreciated. It is advised
> > that any unauthorized use of confidential information of Winbond is
> > strictly prohibited; and any information in this email irrelevant to
> > the official business of Winbond shall be deemed as neither given nor
> > endorsed by Winbond.
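The per-partition setup/cleanup pattern the thread converges on can be sketched
without a cluster. The sketch below is an illustration, not code from the thread:
partitions are simulated with grouped collections, and FakeConnection and
writePartition are hypothetical stand-ins for a real DB connection and for the
function one would pass to rdd.foreachPartition in Spark.

```scala
// Counts how many "connections" get opened, to show setup runs
// once per partition rather than once per record.
var connectionsOpened = 0

// Hypothetical stand-in for a real database connection.
class FakeConnection {
  connectionsOpened += 1               // setup cost is paid here
  def save(record: Int): Unit = ()     // pretend to write one record
  def close(): Unit = ()               // cleanup
}

// In Spark, this body would be the function passed to rdd.foreachPartition.
def writePartition(partition: Seq[Int]): Unit = {
  val conn = new FakeConnection()      // setup: once per partition
  try partition.foreach(conn.save)     // save every record in the partition
  finally conn.close()                 // cleanup: runs even if save fails
}

val data = (1 to 100).toSeq
data.grouped(25).foreach(writePartition) // simulate 4 RDD partitions

println(connectionsOpened)             // prints 4, not 100
```

On a real RDD, one common shape is `rdd.foreachPartition { it => ... }`, which
gives the same once-per-partition setup/cleanup without the
`mapPartitions`-then-`foreach` indirection asked about at the top of the thread.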