What's the expected number of partitions in your use case? Have you considered doing the batching in the workers themselves?
Cheers

On Sat, Mar 7, 2015 at 10:54 PM, A.K.M. Ashrafuzzaman <ashrafuzzaman...@gmail.com> wrote:

> For processing a DStream, the Spark Programming Guide suggests the
> following usage of a connection:
>
> dstream.foreachRDD(rdd => {
>   rdd.foreachPartition(partitionOfRecords => {
>     // ConnectionPool is a static, lazily initialized pool of connections
>     val connection = ConnectionPool.getConnection()
>     partitionOfRecords.foreach(record => connection.send(record))
>     ConnectionPool.returnConnection(connection) // return to the pool for future reuse
>   })
> })
>
> In this case the processing and the insertion are both done in the
> workers, and we don't use batch inserts into the db. How about a use case
> where we process (parse JSON strings into objects) in the workers, send
> those objects back to the master, and then issue a bulk insert request?
> Is there any benefit to sending records individually via a connection
> pool versus using a bulk operation on the master?
>
> A.K.M. Ashrafuzzaman
> Lead Software Engineer
> NewsCred <http://www.newscred.com/>
>
> (M) 880-175-5592433
> Twitter <https://twitter.com/ashrafuzzaman> | Blog
> <http://jitu-blog.blogspot.com/> | Facebook
> <https://www.facebook.com/ashrafuzzaman.jitu>
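To illustrate what "batching in the workers" could look like: instead of one `connection.send(record)` per record, each partition's iterator can be consumed in fixed-size chunks, with one bulk send per chunk (in Scala this is what `partitionOfRecords.grouped(batchSize)` gives you directly). Here is a minimal, self-contained sketch of that chunking idea in plain Python; the chunk size of 3 and the sample records are illustrative only, and the bulk-send call you would place inside the loop is whatever your db client provides.

```python
from itertools import islice

def batched(records, size):
    """Yield successive lists of up to `size` items from an iterator,
    so a worker can issue one bulk insert per list instead of one
    insert per record."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Simulating one partition's iterator with 7 records and batch size 3:
batches = list(batched(range(7), 3))
# Each element of `batches` would be handed to a single bulk-insert call.
```

The benefit over collecting everything to the driver is that the data never leaves the workers, so you keep the parallelism and avoid shipping the whole RDD back over the network, while still getting the round-trip savings of bulk inserts.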