Thanks for the details. I ran the same test from the branch you used with oracle java 8 , java 7 and IBM java 8 on linux in a loop of 50 didn't see any issue. Can you give me details on which version of IBM jdk did you used and what OS.
java -version java version "1.8.0" Java(TM) SE Runtime Environment (build pxi3280sr1fp10- 20150711_01(SR1 FP10)) IBM J9 VM (build 2.8, JRE 1.8.0 Linux x86-32 20150630_255633 (JIT enabled, AOT enabled) J9VM - R28_jvm.28_20150630_1742_B255633 JIT - tr.r14.java_20150625_95081.01 GC - R28_jvm.28_20150630_1742_B255633 J9CL - 20150630_255633) JCL - 20150711_01 based on Oracle jdk8u51-b15 Thanks, Harsha On Mon, Aug 10, 2015, at 01:52 PM, Rajini Sivaram wrote: > > Harsha, > > I am using the code from > https://github.com/harshach/kafka/tree/KAFKA-1690-V1 with the latest > commit on 25 July which I think corresponds to the latest patch in KAFKA- > 1690. Is that correct? > > I have run SSLConsumerTest, SSLProducerSendTest and SSLSelectorTest > several times and the only one that fails to complete is > SSLSelectorTest.testRenegotiate(). I added some trace to the code and > the sequence in the failed run is: > > 1. Handshake starts > 2. Handshake finishes with handshakeStatus=FINISHED > appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] > *(there is no data in appReadBuffer)* > 3. Several reads and writes occur > 4. read() results in handshake() due to renegotiation. > appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] > *(no data in appReadBuffer)* > 5. Handshake status changes from NEED_TASK to NEED_WRAP to > NEED_UNWRAP > 6. Unwrap call during handshake results in data in appReadBuffer: > handshakeStatus=NEED_UNWRAP > appReadBuffer=java.nio.DirectByteBuffer[pos=100 lim=16916 > cap=16916] *(data is now in appReadBuffer)* > 7. Handshake status changes to NEED_UNWRAP followed by FINISHED. > Final call to handshake() returns with > handshakeStatus=NOT_HANDSHAKING > appReadBuffer=java.nio.DirectByteBuffer[pos=100 lim=16916 > cap=16916] *(data is still in appReadBuffer)* Test continues to > call selector.poll() forever. But no more data arrives on the > channel since all the data in the test has already arrived, and > hence SSLTransportLayer.read() is never invoked on the channel and > the data that was unwrapped earlier simply remains in > appReadBuffer. This is not a scenario that occurs in the other > tests where some data always arrives on the network after the > handshake() resulting in SSLTransportLayer.read() being invoked. > > I can recreate this problem quite easily with IBM JDK (it hangs around > one in ten runs), so if you require more trace, please let me know. > > Thank you, > > Rajini > > > > On Mon, Aug 10, 2015 at 5:29 PM, Sriharsha Chintalapani > <harsh...@fastmail.fm> wrote: >> Thanks for testing out Rajini. I did ran that test in a loop and it >> never hanged for me. I am hoping you are using the latest patch since >> the data left over issue is addressed in latest patch. Also if thats >> an issue SSLConsumerTest and SSLProducerTest will hang too. Did you >> notice those are having any issues? >> >> I am addressing left over reviews will be sending a new patch in a >> day or two. >> >> Thanks, Harsha >> >> On August 10, 2015 at 3:35:34 AM, Rajini Sivaram >> (rajinisiva...@googlemail.com) wrote: >>> >>> >>> I was running a Kafka cluster with the latest SSL patch over the >>> weekend >>> >>> with IBM JRE, and it has been running fine without any issues. >>> There was >>> >>> light load on the cluster throughout and intermittent heavy load, >>> all using >>> >>> SSL clients. >>> >>> >>> However I am seeing an intermittent unit test hang in >>> >>> org.apache.kafka.common.network.SSLSelectorTest.testRenegotiate(). >>> >>> I am not sure if that is related to the IBM JRE I am using for the >>> build. >>> >>> It looks like data left in appReadBuffer after a handshake may >>> not get >>> >>> processed if no more data arrives on the network, causing the test >>> to loop >>> >>> forever. It will be good if this can be fixed (or at least the test >>> >>> commented out) before the code is committed to avoid breaking the >>> build. >>> >>> >>> Even though there are quite a few outstanding review comments, I >>> do agree >>> >>> that being such a big patch, it will be good to commit the code >>> soon and >>> >>> work on minor issues afterwards. >>> >>> >>> >>> >>> >>> On Mon, Aug 10, 2015 at 12:19 AM, Jun Rao (JIRA) <j...@apache.org> >>> wrote: >>> >>> >>> > >>> >>> > [ >>> >>> > https://issues.apache.org/jira/browse/KAFKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679397#comment-14679397 >>> > ] >>> >>> > >>> >>> > Jun Rao commented on KAFKA-1690: >>> >>> > -------------------------------- >>> >>> > >>> >>> > [~rsivaram], this jira is getting pretty close to be committed. >>> > Could you >>> >>> > test this out on an IBM jvm to see if there is any issue >>> > especially with >>> >>> > respect to the usage of the sendfile api? >>> >>> > >>> >>> > >>> >>> > > new java producer needs ssl support as a client >>> >>> > > ----------------------------------------------- >>> >>> > > >>> >>> > > Key: KAFKA-1690 >>> >>> > > URL: >>> > > https://issues.apache.org/jira/browse/KAFKA-1690 >>> > > Project: Kafka >>> >>> > > Issue Type: Sub-task >>> >>> > > Reporter: Joe Stein >>> >>> > > Assignee: Sriharsha Chintalapani >>> >>> > > Fix For: 0.8.3 >>> >>> > > >>> >>> > > Attachments: KAFKA-1690.patch, KAFKA-1690.patch, >>> >>> > KAFKA-1690_2015-05-10_23:20:30.patch, KAFKA-1690_2015-05- >>> > 10_23:31:42.patch, >>> >>> > KAFKA-1690_2015-05-11_16:09:36.patch, KAFKA-1690_2015-05- >>> > 12_16:20:08.patch, >>> >>> > KAFKA-1690_2015-05-15_07:18:21.patch, KAFKA-1690_2015-05- >>> > 20_14:54:35.patch, >>> >>> > KAFKA-1690_2015-05-21_10:37:08.patch, KAFKA-1690_2015-06- >>> > 03_18:52:29.patch, >>> >>> > KAFKA-1690_2015-06-23_13:18:20.patch, KAFKA-1690_2015-07- >>> > 20_06:10:42.patch, >>> >>> > KAFKA-1690_2015-07-20_11:59:57.patch, KAFKA-1690_2015-07- >>> > 25_12:10:55.patch >>> >>> > > >>> >>> > > >>> >>> > >>> >>> > >>> >>> > >>> >>> > >>> >>> > -- >>> >>> > This message was sent by Atlassian JIRA >>> >>> > (v6.3.4#6332) >>> >>> > >>> > >