I located the issue: the following declaration seems to be necessary in the pool object to make it serializable:
private transient ConcurrentLinkedQueue<KafkaProducer<String, String>> pool;

However, this means open connections cannot be reused in subsequent micro-batches, because transient fields are not serialized and the pool has to be recreated for every batch. How can we get around this problem? (A rough sketch of the lazy per-executor singleton I have in mind is included after the quoted message below.)

Thanks,
Nipun

On Tue, Feb 7, 2017 at 6:35 PM, Nipun Arora <nipunarora2...@gmail.com> wrote:
> Ryan,
>
> Apologies for coming back so late; I created a GitHub repo to work on
> this problem. On trying your solution of making the pool a singleton,
> I get a NullPointerException in the worker.
> Do you have any other suggestions, or a simpler mechanism for handling
> this?
>
> I have put all the current code, which was forked from an existing Git repo,
> here:
> https://github.com/nipunarora/Spark-Kafka-Writer
>
> There does seem to be duplicate creation of Kafka writers in every
> micro-batch.
>
> Thanks
> Nipun
>
> P.S. The version I shared before was writing JavaDStream<String>; the
> one in the GitHub project writes JavaDStream<Tuple2<>>
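
For reference, here is roughly the pattern I understand the singleton suggestion to mean: a pool held in a static field, initialized lazily on each executor. The class name KafkaProducerPool, the borrow/release methods, and the broker address are placeholders of mine, not code from the repo above:

    import java.util.Properties;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public final class KafkaProducerPool {

        // Static state is never serialized with the closure; each executor
        // JVM builds its own queue on first use and keeps it across
        // micro-batches.
        private static final ConcurrentLinkedQueue<KafkaProducer<String, String>> POOL =
                new ConcurrentLinkedQueue<>();

        private KafkaProducerPool() { }

        private static KafkaProducer<String, String> create() {
            Properties props = new Properties();
            // Placeholder broker address; replace with the real bootstrap servers.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            return new KafkaProducer<>(props);
        }

        // Take a producer from the pool, creating one only if none is idle.
        public static KafkaProducer<String, String> borrow() {
            KafkaProducer<String, String> producer = POOL.poll();
            return (producer != null) ? producer : create();
        }

        // Return a producer so later micro-batches can reuse the open connection.
        public static void release(KafkaProducer<String, String> producer) {
            POOL.offer(producer);
        }
    }

The idea, as I understand it, is that only the class reference crosses the serialization boundary: inside foreachPartition on the workers one would call KafkaProducerPool.borrow(), send the records, and then release() the producer, so the open connections survive from one micro-batch to the next.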