I located the issue: the following declaration seems to be necessary in the pool object to make it serializable:
private transient ConcurrentLinkedQueue<KafkaProducer<String, String>> pool;

However, this means open connections cannot be reused in subsequent micro-batches, because transient fields are not serialized and the pool has to be recreated for every batch. How can we get around this problem? (A rough sketch of the lazy per-executor singleton I have in mind is included after the quoted message below.)

Thanks,
Nipun

On Tue, Feb 7, 2017 at 6:35 PM, Nipun Arora <nipunarora2...@gmail.com> wrote:
> Ryan,
>
> Apologies for coming back so late; I created a GitHub repo to work on
> this problem. On trying your solution of making the pool a singleton,
> I get a NullPointerException in the worker.
> Do you have any other suggestions, or a simpler mechanism for handling
> this?
>
> I have put all the current code, which was forked from an existing Git repo,
> here:
> https://github.com/nipunarora/Spark-Kafka-Writer
>
> There does seem to be duplicate creation of Kafka writers in every
> micro-batch.
>
> Thanks
> Nipun
>
> P.S. The version I shared before was writing JavaDStream<String>; the
> one in the GitHub project writes JavaDStream<Tuple2<>>
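
For reference, here is roughly the pattern I understand the singleton suggestion to mean: a pool held in a static field, initialized lazily on each executor. The class name KafkaProducerPool, the borrow/release methods, and the broker address are placeholders of mine, not code from the repo above:

    import java.util.Properties;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public final class KafkaProducerPool {

        // Static state is never serialized with the closure; each executor
        // JVM builds its own queue on first use and keeps it across
        // micro-batches.
        private static final ConcurrentLinkedQueue<KafkaProducer<String, String>> POOL =
                new ConcurrentLinkedQueue<>();

        private KafkaProducerPool() { }

        private static KafkaProducer<String, String> create() {
            Properties props = new Properties();
            // Placeholder broker address; replace with the real bootstrap servers.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            return new KafkaProducer<>(props);
        }

        // Take a producer from the pool, creating one only if none is idle.
        public static KafkaProducer<String, String> borrow() {
            KafkaProducer<String, String> producer = POOL.poll();
            return (producer != null) ? producer : create();
        }

        // Return a producer so later micro-batches can reuse the open connection.
        public static void release(KafkaProducer<String, String> producer) {
            POOL.offer(producer);
        }
    }

The idea, as I understand it, is that only the class reference crosses the serialization boundary: inside foreachPartition on the workers one would call KafkaProducerPool.borrow(), send the records, and then release() the producer, so the open connections survive from one micro-batch to the next.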