When processing a DStream, the Spark Programming Guide suggests the following usage of a connection:
dstream.foreachRDD(rdd => {
  rdd.foreachPartition(partitionOfRecords => {
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  })
})

In this case, both the processing and the insertion happen on the workers, and we don't use a batch insert into the DB. What about a use case where we process the records (parse the JSON strings into objects), send those objects back to the master, and then issue a single bulk insert request (roughly like the sketch below)? Is there any benefit to sending records individually through a connection pool versus doing a bulk operation from the master?
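For concreteness, here is a rough sketch of the driver-side alternative I have in mind, assuming each micro-batch fits in driver memory; MyRecord and bulkInsert are placeholders, not real APIs:

// Sketch of the driver-side alternative (not from the guide).
// MyRecord and bulkInsert are hypothetical placeholders.
case class MyRecord(payload: String)

def bulkInsert(records: Array[MyRecord]): Unit = {
  // real code would issue one batched INSERT via JDBC or the DB's bulk API
  println(s"bulk inserting ${records.length} records")
}

dstream.foreachRDD(rdd => {
  // parsing still runs in parallel on the workers
  val parsed = rdd.map(json => MyRecord(json)) // real code would parse the JSON here
  // collect() ships the whole micro-batch back to the driver,
  // so each batch has to fit in driver memory
  val records = parsed.collect()
  // one bulk request from the driver instead of many individual sends
  bulkInsert(records)
})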