Thanks very much, Gerard and Manas, for your input. I'll keep the connection
pooling part in mind.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Create-one-DB-connection-per-executor-tp26588p26601.html
Sent from the Apache Spark User List mailing list archive at N
You are on the right track.
The only thing you will have to take care of is two of your partitions trying
to access the same connection at the same time.
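
To make that concern concrete, here is a rough sketch of the connection
pooling idea mentioned in this thread. It is not Spark code: plain Python
threads stand in for tasks running in parallel inside one executor, and
FakeConnection, ConnectionPool, get_pool, process_partition and run_tasks are
all made-up names for illustration. The pool size of 2 is also just an
assumption.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real DB connection (assumed NOT thread-safe)."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self._busy = threading.Lock()

    def execute(self, statement):
        # Fail loudly if two tasks ever use this connection at the same time.
        if not self._busy.acquire(blocking=False):
            raise RuntimeError("connection used by two tasks at once")
        try:
            return f"conn-{self.conn_id}: {statement}"
        finally:
            self._busy.release()


class ConnectionPool:
    """Fixed-size pool: a task borrows a connection and hands it back when done."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))

    @contextmanager
    def borrow(self):
        conn = self._idle.get()  # blocks until some connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it, even on error

    def idle_count(self):
        return self._idle.qsize()


_pool = None
_pool_lock = threading.Lock()


def get_pool(size=2):
    """Create the pool lazily, once per process (standing in for once per executor)."""
    global _pool
    with _pool_lock:
        if _pool is None:
            _pool = ConnectionPool(size)
        return _pool


def process_partition(records):
    # Each task holds one connection exclusively for the whole partition.
    with get_pool().borrow() as conn:
        return [conn.execute(f"INSERT {r}") for r in records]


def run_tasks(partitions):
    """Run one thread per partition to mimic tasks sharing one executor."""
    results = [None] * len(partitions)

    def worker(i):
        results[i] = process_partition(partitions[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(partitions))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In a real job, the body of process_partition would roughly correspond to what
you pass to foreachPartition, with a real connection pool library in place of
the toy one here. Because a borrowed connection is taken out of the queue, no
two tasks can hold the same one, and extra tasks simply wait for a free
connection.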
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Create-one-DB-connection-per-executor-tp26588p26593.html
Hi Manas,
The approach is correct, with one caveat: you may have several tasks
executing in parallel in one executor. Having a single connection per JVM
will either fail, if the connection is not thread-safe, or become a
bottleneck because all tasks will be competing for the same resource.
The best ap