Harsha, The test is very timing sensitive and doesn't always go through a renegotiation. Here is the trace from a run that passed and a failed run (I added the logging to the end of SSLTransportLayer.handshake()). The successful run shows a single handshake at the start, the failed run that hangs shows a second handshake from the renegotiation. Can you check that your test runs do go through two handshakes and whether appReadBuffer has any data at the end of the second handshake?
Successful run: handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=FINISHED appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] <== End of first handshake Failed run that hangs: handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] handshake() status=FINISHED appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] <== End of first handshake handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] <== Start of renegotiation handshake handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=29 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=38 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=40 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=45 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=46 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=48 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=54 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=55 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=57 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=63 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=64 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=66 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=72 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=72 lim=16916 cap=16916] handshake() status=NEED_UNWRAP appReadBuffer=java.nio.DirectByteBuffer[pos=72 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=72 lim=16916 cap=16916] handshake() status=NEED_WRAP appReadBuffer=java.nio.DirectByteBuffer[pos=72 lim=16916 cap=16916] handshake() status=FINISHED appReadBuffer=java.nio.DirectByteBuffer[pos=72 lim=16916 cap=16916] handshake() status=NOT_HANDSHAKING appReadBuffer=java.nio.DirectByteBuffer[pos=72 lim=16916 cap=16916] <== End of renegotiation handshake, appReadBuffer contains data I have tried with IBM JDK 7.1 and IBM JDK 8.0 on Windows, as well as OpenJDK 8.0 on Linux (see versions below). All of them hang intermittently. Windows: java version "1.7.0" Java(TM) SE Runtime Environment (build pwa6470_27sr3fp10-20150708_01(SR3 FP10)) IBM J9 VM (build 2.7, JRE 1.7.0 Windows 7 amd64-64 Compressed References 20150630_255653 (JIT enabled, AOT enabled) J9VM - R27_Java727_SR3_20150630_2236_B255653 JIT - tr.r13.java_20150623_94888.01 GC - R27_Java727_SR3_20150630_2236_B255653_CMPRSS J9CL - 20150630_255653) JCL - 20150628_01 based on Oracle jdk7u85-b15 java version "1.8.0" Java(TM) SE Runtime Environment (build pwa6480sr1fp10-20150711_01(SR1 FP10)) IBM J9 VM (build 2.8, JRE 1.8.0 Windows 7 amd64-64 Compressed References 20150630_255633 (JIT enabled, AOT enabled) J9VM - R28_jvm.28_20150630_1742_B255633 JIT - tr.r14.java_20150625_95081.01 GC - R28_jvm.28_20150630_1742_B255633_CMPRSS J9CL - 20150630_255633) JCL - 20150711_01 based on Oracle jdk8u51-b15 Linux: openjdk version "1.8.0_51" OpenJDK Runtime Environment (build 1.8.0_51-b16) OpenJDK 64-Bit Server VM (build 25.51-b03, mixed mode) Thank you, Rajini On Tue, Aug 11, 2015 at 2:37 AM, Harsha <harsh...@fastmail.fm> wrote: > Thanks for the details. I ran the same test from the branch you used with > oracle java 8 , java 7 and IBM java 8 on linux in a loop of 50 didn't see > any issue. Can you give me details on which version of IBM jdk did you used > and what OS. > > java -version > java version "1.8.0" Java(TM) SE Runtime Environment (build > pxi3280sr1fp10-20150711_01(SR1 FP10)) > IBM J9 VM (build 2.8, JRE 1.8.0 Linux x86-32 20150630_255633 (JIT enabled, > AOT enabled) > J9VM - R28_jvm.28_20150630_1742_B255633 > JIT - tr.r14.java_20150625_95081.01 > GC - R28_jvm.28_20150630_1742_B255633 J9CL - 20150630_255633) > JCL - 20150711_01 based on Oracle jdk8u51-b15 > > Thanks, > Harsha > > > On Mon, Aug 10, 2015, at 01:52 PM, Rajini Sivaram wrote: > > > Harsha, > > I am using the code from > https://github.com/harshach/kafka/tree/KAFKA-1690-V1 with the latest > commit on 25 July which I think corresponds to the latest patch in > KAFKA-1690. Is that correct? > > I have run SSLConsumerTest, SSLProducerSendTest and SSLSelectorTest > several times and the only one that fails to complete > is SSLSelectorTest.testRenegotiate(). I added some trace to the code and > the sequence in the failed run is: > > > 1. Handshake starts > 2. Handshake finishes with handshakeStatus=FINISHED > appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] *(there > is no data in appReadBuffer)* > 3. Several reads and writes occur > 4. read() results in handshake() due to renegotiation. > appReadBuffer=java.nio.DirectByteBuffer[pos=0 lim=16916 cap=16916] *(no > data in appReadBuffer)* > 5. Handshake status changes from NEED_TASK to NEED_WRAP to NEED_UNWRAP > 6. Unwrap call during handshake results in data in appReadBuffer: > handshakeStatus=NEED_UNWRAP > appReadBuffer=java.nio.DirectByteBuffer[pos=100 lim=16916 cap=16916] *(data > is now in appReadBuffer)* > 7. Handshake status changes to NEED_UNWRAP followed by FINISHED. Final > call to handshake() returns with handshakeStatus=NOT_HANDSHAKING > appReadBuffer=java.nio.DirectByteBuffer[pos=100 lim=16916 cap=16916] *(data > is still in appReadBuffer)* > > Test continues to call selector.poll() forever. But no more data arrives > on the channel since all the data in the test has already arrived, and > hence SSLTransportLayer.read() is never invoked on the channel and the data > that was unwrapped earlier simply remains in appReadBuffer. This is not a > scenario that occurs in the other tests where some data always arrives on > the network after the handshake() resulting in SSLTransportLayer.read() > being invoked. > > I can recreate this problem quite easily with IBM JDK (it hangs around one > in ten runs), so if you require more trace, please let me know. > > Thank you, > > Rajini > > > > On Mon, Aug 10, 2015 at 5:29 PM, Sriharsha Chintalapani < > harsh...@fastmail.fm> wrote: > > Thanks for testing out Rajini. I did ran that test in a loop and it never > hanged for me. I am hoping you are using the latest patch since the data > left over issue is addressed in latest patch. > Also if thats an issue SSLConsumerTest and SSLProducerTest will hang too. > Did you notice those are having any issues? > > I am addressing left over reviews will be sending a new patch in a day or > two. > > Thanks, > Harsha > > > On August 10, 2015 at 3:35:34 AM, Rajini Sivaram ( > rajinisiva...@googlemail.com) wrote: > > > > I was running a Kafka cluster with the latest SSL patch over the weekend > with IBM JRE, and it has been running fine without any issues. There was > light load on the cluster throughout and intermittent heavy load, all > using > SSL clients. > > However I am seeing an intermittent unit test hang in > org.apache.kafka.common.network.SSLSelectorTest.testRenegotiate(). > I am not sure if that is related to the IBM JRE I am using for the build. > It looks like data left in appReadBuffer after a handshake may not get > processed if no more data arrives on the network, causing the test to loop > forever. It will be good if this can be fixed (or at least the test > commented out) before the code is committed to avoid breaking the build. > > Even though there are quite a few outstanding review comments, I do agree > that being such a big patch, it will be good to commit the code soon and > work on minor issues afterwards. > > > > > On Mon, Aug 10, 2015 at 12:19 AM, Jun Rao (JIRA) <j...@apache.org> wrote: > > > > > [ > > > https://issues.apache.org/jira/browse/KAFKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679397#comment-14679397 > > ] > > > > Jun Rao commented on KAFKA-1690: > > -------------------------------- > > > > [~rsivaram], this jira is getting pretty close to be committed. Could > you > > test this out on an IBM jvm to see if there is any issue especially with > > respect to the usage of the sendfile api? > > > > > > > new java producer needs ssl support as a client > > > ----------------------------------------------- > > > > > > Key: KAFKA-1690 > > > URL: https://issues.apache.org/jira/browse/KAFKA-1690 > > > Project: Kafka > > > Issue Type: Sub-task > > > Reporter: Joe Stein > > > Assignee: Sriharsha Chintalapani > > > Fix For: 0.8.3 > > > > > > Attachments: KAFKA-1690.patch, KAFKA-1690.patch, > > KAFKA-1690_2015-05-10_23:20:30.patch, > KAFKA-1690_2015-05-10_23:31:42.patch, > > KAFKA-1690_2015-05-11_16:09:36.patch, > KAFKA-1690_2015-05-12_16:20:08.patch, > > KAFKA-1690_2015-05-15_07:18:21.patch, > KAFKA-1690_2015-05-20_14:54:35.patch, > > KAFKA-1690_2015-05-21_10:37:08.patch, > KAFKA-1690_2015-06-03_18:52:29.patch, > > KAFKA-1690_2015-06-23_13:18:20.patch, > KAFKA-1690_2015-07-20_06:10:42.patch, > > KAFKA-1690_2015-07-20_11:59:57.patch, > KAFKA-1690_2015-07-25_12:10:55.patch > > > > > > > > > > > > > > > > -- > > This message was sent by Atlassian JIRA > > (v6.3.4#6332) > > > > > > > > > -- Thank you... Regards, Rajini