[ https://issues.apache.org/jira/browse/KAFKA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178856#comment-14178856 ]
Bhavesh Mistry commented on KAFKA-1721:
---------------------------------------

I have filed https://github.com/xerial/snappy-java/issues/88 to track this in Snappy. A patch is provided there; thanks to [~ewencp] for testing it. Please see the link above for more details.

Thanks,
Bhavesh

> Snappy compressor is not thread safe
> ------------------------------------
>
>                 Key: KAFKA-1721
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1721
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>
> From the mailing list, it can generate this exception:
> 2014-10-20 18:55:21.841 [kafka-producer-network-thread] ERROR
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in
> kafka producer I/O thread:
> *java.lang.NullPointerException*
> at org.xerial.snappy.BufferRecycler.releaseInputBuffer(BufferRecycler.java:153)
> at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:317)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
> at org.apache.kafka.common.record.Compressor.close(Compressor.java:94)
> at org.apache.kafka.common.record.MemoryRecords.close(MemoryRecords.java:119)
> at org.apache.kafka.clients.producer.internals.RecordAccumulator.drain(RecordAccumulator.java:285)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> This appears to be an issue with the snappy-java library using a ThreadLocal
> for an internal buffer-recycling object, which results in that object being
> shared unsafely across threads if one thread sends to multiple producers:
> {quote}
> I think the issue is that you're using all your producers across a thread
> pool and the snappy library uses ThreadLocal BufferRecyclers. When new Snappy
> streams are allocated, they may be allocated from the same thread (e.g. one
> of your MyProducer classes calls Producer.send() on multiple producers from
> the same thread) and therefore use the same BufferRecycler. Eventually you
> hit the code in the stacktrace, and if two producer send threads hit it
> concurrently they improperly share the unsynchronized BufferRecycler.
>
> This seems like a pain to fix -- it's really a deficiency of the snappy
> library, and as far as I can see there's no external control over
> BufferRecycler in their API. One possibility is to record the thread ID
> when we generate a new stream in Compressor and use that to synchronize
> access to ensure no concurrent BufferRecycler access. That could be made
> specific to snappy so it wouldn't impact other codecs. Not exactly
> ideal, but it would work. Unfortunately I can't think of any way for you
> to protect against this in your own code since the problem arises in the
> producer send thread, which your code should never know about.
>
> Another option would be to set up your producers differently to avoid the
> possibility of unsynchronized access from multiple threads (i.e. don't
> use the same thread pool approach), but whether you can do that will
> depend on your use case.
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
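The thread-ID workaround proposed in the quoted comment can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Kafka's or snappy-java's code: `Recycler`, `CompressedStream`, `open`, `safeClose`, and `demo` are hypothetical stand-ins, and the toy `Recycler` is not snappy-java's real `BufferRecycler`. Each stream records the ID of the thread that created it, and both opening and closing lock on a per-thread-ID monitor. A thread's recycler is therefore never touched by two threads at once, even when streams created on one user thread are closed from several sender threads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadIdLockSketch {

    // Hypothetical stand-in for snappy-java's BufferRecycler (NOT the real
    // class): a single-slot buffer cache with no internal synchronization.
    static final class Recycler {
        private byte[] cached;

        byte[] allocate(int size) {
            byte[] b = cached;
            cached = null;
            return (b != null && b.length >= size) ? b : new byte[size];
        }

        void release(byte[] buf) {
            cached = buf;
        }
    }

    // One Recycler per thread, mirroring the ThreadLocal pattern in the issue.
    static final ThreadLocal<Recycler> RECYCLER = ThreadLocal.withInitial(Recycler::new);

    // Hypothetical compressed stream. It captures the Recycler of the thread
    // that creates it, so close() on another thread touches that same Recycler.
    static final class CompressedStream {
        final long creatorThreadId = Thread.currentThread().getId();
        final Recycler recycler = RECYCLER.get();
        final byte[] buffer = recycler.allocate(1024);

        void close() {
            recycler.release(buffer);
        }
    }

    // The workaround from the comment: one monitor per creating-thread ID.
    static final ConcurrentHashMap<Long, Object> LOCKS = new ConcurrentHashMap<>();

    static Object lockFor(long threadId) {
        return LOCKS.computeIfAbsent(threadId, id -> new Object());
    }

    // Opening uses the current thread's Recycler, so it takes that thread's lock.
    static CompressedStream open() {
        synchronized (lockFor(Thread.currentThread().getId())) {
            return new CompressedStream();
        }
    }

    // Closing may run on any thread, so it locks on the *creator's* thread ID.
    static void safeClose(CompressedStream s) {
        synchronized (lockFor(s.creatorThreadId)) {
            s.close();
        }
    }

    // Opens streams on the calling thread (like one user thread sending to
    // several producers), then closes them from two "sender" threads at once.
    static int demo(int streams) {
        CompressedStream[] opened = new CompressedStream[streams];
        for (int i = 0; i < streams; i++) {
            opened[i] = open();
        }
        AtomicInteger closed = new AtomicInteger();
        ExecutorService senders = Executors.newFixedThreadPool(2);
        for (CompressedStream s : opened) {
            senders.submit(() -> {
                safeClose(s);
                closed.incrementAndGet();
            });
        }
        senders.shutdown();
        try {
            senders.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(1000)); // prints 1000
    }
}
```

Locking on a monitor keyed by thread ID, rather than on the recycler object itself, follows the comment's reasoning: application code has no handle on snappy-java's internal recycler, but it can always see which thread created a stream. The lock map in this sketch is never pruned, and Java may reuse the ID of a terminated thread; reuse only adds serialization between unrelated streams, not a race.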