In the Spark Streaming Programming Guide, the suggested pattern for using connections when processing a DStream is the following,

dstream.foreachRDD(rdd => {
    rdd.foreachPartition(partitionOfRecords => {
        // ConnectionPool is a static, lazily initialized pool of connections
        val connection = ConnectionPool.getConnection()
        partitionOfRecords.foreach(record => connection.send(record))
        ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
    })
})

In this case both the processing and the insertion are done on the workers, and we don't use batch inserts into the db. How about this use case: we do the processing (parsing JSON strings into objects) on the workers, send those objects back to the driver (master), and then issue a single bulk insert request from there. Is there any benefit to sending records individually through a connection pool on the workers versus a bulk operation on the master?
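
Something like this rough sketch is what I have in mind (parseJson and bulkInsert are placeholders for our JSON parsing and for the db driver's batch insert API, not real methods):

dstream.foreachRDD(rdd => {
    // parse on the workers, then collect the resulting objects to the driver
    val objects = rdd.map(record => parseJson(record)).collect()
    if (objects.nonEmpty) {
        val connection = ConnectionPool.getConnection()
        connection.bulkInsert(objects)  // one bulk insert from the driver
        ConnectionPool.returnConnection(connection)
    }
})

I realize collect() pulls all the parsed objects into driver memory, so this only works while a batch is small enough to fit there, which is part of what I am wondering about.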
        
A.K.M. Ashrafuzzaman
Lead Software Engineer
NewsCred

(M) 880-175-5592433

