The standard pattern is to initialize the MySQL JDBC connection inside your foreachPartition (or mapPartitions) call, write to the database, and then close the connection. A couple of gotchas:
1. A new connection is initialized for each of your partitions.
2. If the effect (inserts & updates) is not idempotent, then when a node crashes Spark will replay the updates to MySQL and may cause data corruption.
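Something along these lines (a minimal, untested sketch; the socket source, the word_counts table, and the connection URL/credentials are placeholders you would replace with your own):

import java.sql.DriverManager
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

val conf = new SparkConf().setAppName("StreamToMySQL")
val ssc  = new StreamingContext(conf, Seconds(10))

// Placeholder source & computation: word counts over a 30s window.
val lines      = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" "))
                      .map((_, 1))
                      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30))

wordCounts.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Connection and statement are created on the worker, once per
    // partition, so nothing non-serializable is captured in the closure
    // that Spark ships from the driver.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/test", "user", "password")
    try {
      val stmt = conn.prepareStatement(
        "INSERT INTO word_counts (word, cnt) VALUES (?, ?)")
      records.foreach { case (word, count) =>
        stmt.setString(1, word)
        stmt.setInt(2, count)
        stmt.executeUpdate()
      }
      stmt.close()
    } finally {
      conn.close()
    }
  }
}

ssc.start()
ssc.awaitTermination()

The same approach works for UPDATE/DELETE/SELECT; the key point is that the connection and PreparedStatement are built and executed on the executor rather than serialized from the driver, which is what triggers the NotSerializableException on com.mysql.jdbc.JDBC4PreparedStatement below.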
Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Sun, Sep 7, 2014 at 11:54 AM, jchen <jc...@pivotal.io> wrote:
> Hi,
>
> Has someone tried using Spark Streaming with MySQL (or any other
> database/data store)? I can write to MySQL at the beginning of the driver
> application. However, when I am trying to write the result of every
> streaming processing window to MySQL, it fails with the following error:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not
> serializable: java.io.NotSerializableException:
> com.mysql.jdbc.JDBC4PreparedStatement
>
> I think it is because the statement object should be serializable, in order
> to be executed on the worker node. Has someone tried similar cases?
> Example code will be very helpful. My intention is to execute
> INSERT/UPDATE/DELETE/SELECT statements for each sliding window.
>
> Thanks,
> JC