The standard pattern is to initialize the MySQL JDBC connection inside your
mapPartitions (or foreachPartition) call, perform the database updates, and
then close the connection there. Couple of gotchas:
1. A new connection is opened for every partition.
2. If the effect (inserts & updates) is not idempotent, then if your server
crashes, Spark will replay the updates to MySQL & may cause data corruption.
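
A rough sketch of that pattern in Scala (assuming a DStream of (word, count)
pairs, a hypothetical table word_counts, and a jdbcUrl you supply; adjust the
SQL to your own schema):

import java.sql.DriverManager

import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper; counts and jdbcUrl come from your application.
def saveToMySQL(counts: DStream[(String, Int)], jdbcUrl: String): Unit = {
  counts.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // The connection is created on the worker, once per partition, so no
      // non-serializable JDBC objects are shipped from the driver.
      val conn = DriverManager.getConnection(jdbcUrl)
      try {
        val stmt = conn.prepareStatement(
          "INSERT INTO word_counts (word, cnt) VALUES (?, ?) " +
          "ON DUPLICATE KEY UPDATE cnt = ?")
        partition.foreach { case (word, count) =>
          stmt.setString(1, word)
          stmt.setInt(2, count)
          stmt.setInt(3, count)
          stmt.executeUpdate()
        }
        stmt.close()
      } finally {
        conn.close()  // always release the connection when the partition is done
      }
    }
  }
}

Using an upsert (ON DUPLICATE KEY UPDATE) keyed on the row keeps the writes
idempotent, which mitigates gotcha #2 if a batch gets replayed.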


Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Sun, Sep 7, 2014 at 11:54 AM, jchen <jc...@pivotal.io> wrote:

> Hi,
>
> Has someone tried using Spark Streaming with MySQL (or any other
> database/data store)? I can write to MySQL at the beginning of the driver
> application. However, when I am trying to write the result of every
> streaming processing window to MySQL, it fails with the following error:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not
> serializable: java.io.NotSerializableException:
> com.mysql.jdbc.JDBC4PreparedStatement
>
> I think it is because the statement object needs to be serializable in order
> to be executed on the worker node. Has someone tried similar cases?
> Example code would be very helpful. My intention is to execute
> INSERT/UPDATE/DELETE/SELECT statements for each sliding window.
>
> Thanks,
> JC
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-database-access-e-g-MySQL-tp13644.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
