I posted yesterday about a related issue but resolved it shortly after. I'm using Spark Streaming to summarize event data from Kafka and save it to a MySQL table. The bottleneck is currently the MySQL writes, and I'm not sure how to speed them up. I've tried repartitioning with several different values, but it looks like only one worker is actually doing the writing to MySQL. Obviously this is not ideal, because I need the parallelism to insert this data in a timely manner.
Here's the code: https://gist.github.com/maddenpj/5032c76aeb330371a6e6

I'm running this on a cluster of 6 Spark nodes (2 cores, 7.5 GB memory each) and have tried repartition sizes of 6, 12, and 48. How do I ensure that the writes to the database happen in parallel?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-No-parallelism-in-writing-to-database-MySQL-tp15174.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
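(For anyone landing on this thread later: a common cause of this symptom is that the writes happen on the driver, e.g. via `rdd.collect()` before inserting, or that the stage preceding the write left everything in one partition. Without seeing the gist I can't say which applies here, but the usual pattern is to open a JDBC connection inside `foreachPartition` so each executor writes its own slice. The stream, table, and column names below are placeholders, not taken from the original code.)

```scala
// Hypothetical sketch: write each partition of the summarized DStream to
// MySQL from the executors rather than the driver. The JDBC URL,
// credentials, and schema are illustrative assumptions.
import java.sql.DriverManager

summarizedStream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // One connection per partition, created on the executor that owns it,
    // since JDBC connections are not serializable across the cluster.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://db-host:3306/events", "user", "password")
    val stmt = conn.prepareStatement(
      "INSERT INTO event_summary (event_key, event_count) VALUES (?, ?)")
    try {
      partition.foreach { case (key, count) =>
        stmt.setString(1, key)
        stmt.setLong(2, count)
        stmt.addBatch() // batch inserts instead of one round trip per row
      }
      stmt.executeBatch()
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
```

If the writes still land on one worker after this, check in the Spark UI which executors run the `foreachPartition` tasks: a `reduceByKey`/`updateStateByKey` step upstream may have concentrated all the data into one partition, in which case repartitioning after that shuffle (not before) is what spreads the write tasks out.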