Re: Cluster freeze with SSL enabled and JDK 11

Loredana Radulescu Ivanoff Thu, 07 Feb 2019 15:32:00 -0800

Hello,

I would like to revive this thread because I can reproduce the issue on
Windows 10 with Java 11 and SSL enabled by starting two nodes from the
stock Ignite 2.7 distribution. I start the nodes via ignite.bat, and I have
only added a few JVM options to allow Ignite to start with Java 11, as
follows:


--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
--add-exports=java.base/sun.nio.ch=ALL-UNNAMED
-Djdk.tls.server.protocols="TLSv1.2" -Djdk.tls.client.protocols="TLSv1.2"
-Djdk.tls.acknowledgeCloseNotify=true -DIGNITE_QUIET=false
-DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=60000
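
To confirm that the TLS options above are actually picked up by the JVM on
each platform, a small standalone check along these lines could be run with
the same options on both Windows and CentOS. This is only a sketch: the
class name TlsSettingsCheck is hypothetical, it uses only the JDK (no Ignite
classes), and it assumes that a plain "TLS" SSLContext is representative of
the context the node ends up using.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical helper, not part of Ignite: reports the TLS settings this JVM sees.
final class TlsSettingsCheck {
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("os.name=").append(System.getProperty("os.name")).append('\n');

        // The TLS-related system properties from the option list; "null" means not set.
        String[] keys = {
            "jdk.tls.server.protocols",
            "jdk.tls.client.protocols",
            "jdk.tls.acknowledgeCloseNotify"
        };
        for (String key : keys) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }

        try {
            // Assumption: a plain "TLS" context stands in for the one the node builds.
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, null, null); // default key managers, trust managers, randomness

            SSLEngine client = ctx.createSSLEngine();
            client.setUseClientMode(true);
            SSLEngine server = ctx.createSSLEngine();
            server.setUseClientMode(false);

            sb.append("client enabled protocols=")
              .append(Arrays.toString(client.getEnabledProtocols())).append('\n');
            sb.append("server enabled protocols=")
              .append(Arrays.toString(server.getEnabledProtocols())).append('\n');
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create a TLS context", e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

With Java 11 this can be launched directly from source, passing the same -D
options as the node: java <options> TlsSettingsCheck.java. If the output
differs between the Windows and CentOS machines (for example, a property
showing as null or with stray quotes on one of them), that would point at
how the options reach the JVM rather than at Ignite itself.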

I'm attaching the logs from work/log and the configuration I've used. Could
you please take a look and let me know if you see something wrong in the
configuration, or a possible explanation?

Interestingly, the same setup with the same type of configuration works on
two CentOS machines: the nodes connect (with SSL and Java 11) without any
errors. Could this be a platform-specific issue?

Additionally, I confirmed that the nodes connect as expected on both
Windows and CentOS when SSL is disabled (I used the same configuration, but
with the sslContextFactory bean commented out).

Any help on the issue would be greatly appreciated. Thank you!



On Thu, Oct 18, 2018 at 2:56 PM Loredana Radulescu Ivanoff <
[email protected]> wrote:

> Hello,
>
> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and SSL
> enabled:
>
>
>    - the second node that I bring up joins, and then shortly after
>    freezes and prints this message every minute:
>
> "WARN ...[*Initialization*]
> processors.cache.GridCachePartitionExchangeManager: Still waiting for
> initial partition map exchange"
>
>
>    - once the second node joins, the first node starts experiencing very
>    frequent 100% CPU spikes; these are the messages I see:
>
> WARN 2018-10-18T13:50:52,728-0700 []
> communication.tcp.TcpCommunicationSpi: Communication SPI session write
> timed out (consider increasing 'socketWriteTimeout' configuration property)
> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
> WARN 2018-10-18T13:50:52,737-0700 []
> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
> engine status after closeOutbound call [status=OK,
> handshakeStatus=NEED_WRAP,
> WARN 2018-10-18T13:51:01,441-0700 []
> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
> release latch within timeout: ServerLatch [permits=1,
> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>
> Other observations:
>
> I can reproduce this every time I start the nodes, and it doesn't matter
> which node comes up first.
>
>
> The issue goes away if I disable SSL.
>
>
> Increasing the socketWriteTimeout, networkTimeout or the
> failureDetectionTimeout does not help.
>
> It seems to be happening only with JDK 11, and not with JDK 8.
>
>
> Do you have any suggestions/known issues about this?
>
> Thank you,
>
> Loredana

<<attachment: node_logs_feb_7.zip>>
