That should be OK, since the iterator is definitely consumed, and therefore the connection is actually done with, by the end of the 'foreach' call. You might still want to put the close in a finally block, though, so the connection is released even if one of the writes throws.
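
Something like this, roughly (an untested sketch: createConnection() is just a placeholder for however you open your JDBC connection, and stream stands in for your DStream):

stream.foreachRDD { rdd =>
  rdd.foreachPartition { iterator =>
    // placeholder: open a JDBC connection here on the worker, per partition,
    // e.g. via DriverManager.getConnection(...)
    val conn = createConnection()
    try {
      iterator.foreach { record =>
        // write the record to the database using conn
      }
    } finally {
      conn.close()  // runs even if one of the writes throws
    }
  }
}

The key points are that the connection is created and closed inside foreachPartition, so nothing needs to be serialized from the driver, and the close sits in a finally block.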
On Mon, Sep 8, 2014 at 12:29 AM, Soumitra Kumar <kumar.soumi...@gmail.com> wrote:
> I have the following code:
>
> stream foreachRDD { rdd =>
>   if (rdd.take (1).size == 1) {
>     rdd foreachPartition { iterator =>
>       initDbConnection ()
>       iterator foreach {
>         write to db
>       }
>       closeDbConnection ()
>     }
>   }
> }
>
> On Sun, Sep 7, 2014 at 1:26 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> ... I'd call out that last bit as actually tricky: "close off the driver"
>>
>> See this message for the right-est way to do that, along with the
>> right way to open DB connections remotely instead of trying to
>> serialize them:
>>
>> http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E
>>
>> On Sun, Sep 7, 2014 at 4:19 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>> > Standard pattern is to initialize the mysql jdbc driver in your
>> > mappartition call, update database & then close off the driver.
>> > Couple of gotchas
>> > 1. New driver initiated for all your partitions
>> > 2. If the effect (inserts & updates) is not idempotent, so if your server
>> > crashes, Spark will replay updates to mysql & may cause data corruption.
>> >
>> > Regards
>> > Mayur
>> >
>> > Mayur Rustagi
>> > Ph: +1 (760) 203 3257
>> > http://www.sigmoidanalytics.com
>> > @mayur_rustagi
>> >
>> > On Sun, Sep 7, 2014 at 11:54 AM, jchen <jc...@pivotal.io> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Has someone tried using Spark Streaming with MySQL (or any other
>> >> database/data store)? I can write to MySQL at the beginning of the driver
>> >> application. However, when I am trying to write the result of every
>> >> streaming processing window to MySQL, it fails with the following error:
>> >>
>> >> org.apache.spark.SparkException: Job aborted due to stage failure: Task not
>> >> serializable: java.io.NotSerializableException:
>> >> com.mysql.jdbc.JDBC4PreparedStatement
>> >>
>> >> I think it is because the statement object should be serializable, in order
>> >> to be executed on the worker node. Has someone tried the similar cases?
>> >> Example code will be very helpful. My intension is to execute
>> >> INSERT/UPDATE/DELETE/SELECT statements for each sliding window.
>> >>
>> >> Thanks,
>> >> JC