I have the following code:

stream.foreachRDD { rdd =>
  // Cheap emptiness check so we don't open connections for empty batches
  if (rdd.take(1).nonEmpty) {
    rdd.foreachPartition { iterator =>
      initDbConnection()          // open one connection per partition, on the worker
      iterator.foreach { record =>
        // write each record to the database
      }
      closeDbConnection()
    }
  }
}
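
For the actual write, a minimal sketch with plain JDBC could look like the
following. It assumes the stream carries (Long, String) pairs, and the table,
columns, connection URL and credentials are placeholders, not anything from
this thread. The point is that the connection and PreparedStatement are
created inside foreachPartition, on the worker, so nothing JDBC-related has
to be serialized, and the upsert keeps a replayed batch from corrupting data:

import java.sql.DriverManager

stream.foreachRDD { rdd =>
  if (rdd.take(1).nonEmpty) {
    rdd.foreachPartition { iterator =>
      // Created on the worker, so nothing JDBC-related is serialized
      // (placeholder URL, user and password)
      val conn = DriverManager.getConnection(
        "jdbc:mysql://dbhost:3306/mydb", "user", "password")
      // Upsert so a replayed batch is idempotent (hypothetical table/columns)
      val stmt = conn.prepareStatement(
        "INSERT INTO events (id, value) VALUES (?, ?) " +
        "ON DUPLICATE KEY UPDATE value = VALUES(value)")
      try {
        iterator.foreach { case (id, value) =>
          stmt.setLong(1, id)
          stmt.setString(2, value)
          stmt.executeUpdate()
        }
      } finally {
        stmt.close()
        conn.close()
      }
    }
  }
}

Batching the inserts (addBatch/executeBatch) or using a connection pool would
be the next step for throughput, but the structure above is the main thing.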

On Sun, Sep 7, 2014 at 1:26 PM, Sean Owen <so...@cloudera.com> wrote:

> ... I'd call out that last bit as actually tricky: "close off the driver"
>
> See this message for the right-est way to do that, along with the
> right way to open DB connections remotely instead of trying to
> serialize them:
>
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E
>
> On Sun, Sep 7, 2014 at 4:19 PM, Mayur Rustagi <mayur.rust...@gmail.com>
> wrote:
> > The standard pattern is to initialize the MySQL JDBC driver in your
> > mapPartitions call, update the database & then close off the driver.
> > Couple of gotchas:
> > 1. A new driver/connection is initialized for each of your partitions.
> > 2. If the effect (inserts & updates) is not idempotent and your server
> > crashes, Spark will replay the updates to MySQL & may cause data
> > corruption.
> >
> >
> > Regards
> > Mayur
> >
> > Mayur Rustagi
> > Ph: +1 (760) 203 3257
> > http://www.sigmoidanalytics.com
> > @mayur_rustagi
> >
> >
> > On Sun, Sep 7, 2014 at 11:54 AM, jchen <jc...@pivotal.io> wrote:
> >>
> >> Hi,
> >>
> >> Has someone tried using Spark Streaming with MySQL (or any other
> >> database/data store)? I can write to MySQL at the beginning of the
> >> driver application. However, when I try to write the result of every
> >> streaming processing window to MySQL, it fails with the following error:
> >>
> >> org.apache.spark.SparkException: Job aborted due to stage failure:
> >> Task not serializable: java.io.NotSerializableException:
> >> com.mysql.jdbc.JDBC4PreparedStatement
> >>
> >> I think it is because the statement object would need to be serializable
> >> in order to be executed on the worker nodes. Has someone tried similar
> >> cases? Example code would be very helpful. My intention is to execute
> >> INSERT/UPDATE/DELETE/SELECT statements for each sliding window.
> >>
> >> Thanks,
> >> JC
> >>
> >>
> >>
> >
>
