Bill Burcham created GEODE-8817:
-----------------------------------

             Summary: server hangs in cache close with ssl enabled due to 
active client connection; client side (CacheClientUpdater.close()) is hung in 
SSLSocketImpl$AppInputStream.deplete()
                 Key: GEODE-8817
                 URL: https://issues.apache.org/jira/browse/GEODE-8817
             Project: Geode
          Issue Type: Bug
          Components: client/server, security
    Affects Versions: 1.14.0
            Reporter: Bill Burcham


A proprietary TLS/SSL-enabled application encountered a network partition. A 
server hangs in cache close due to active client connection; client side 
({{CacheClientUpdater.close()}}) is hung in 
{{SSLSocketImpl$AppInputStream.deplete()}}

The configuration is:

{noformat}
==========================================
losingSide              |    survivingSide
==========================================
11110                   |    10627
11115                   |    10632
------------------------------------------
11139                   |    10655
                        |    10662
------------------------------------------
{noformat}

The stuck threads were stuck in sun's SSL code. Geode's client/Server framework 
uses old I/O and that was also part of where they were stuck. If the clients 
had closed their connections to the server then it would not have been stuck 
here. But the server shutdown shouldn't hang because of client that refuses to 
disconnect.

The Geode client-side of the connection is hung here:

{code:java}
\[warn 2020/11/06 14:56:56.577 PST <ThreadsMonitor> tid=0x18] Thread <50> 
(0x32) that was executed at <06 Nov 2020 14:55:43 PST> has been stuck for 
<72.81 seconds> and number of thread monitor iteration <1>
Thread Name <poolTimer-brclient-7> state <BLOCKED>
Waiting on <sun.security.ssl.SSLSocketImpl$AppInputStream@3a9b72d3>
Owned By <Cache Client Updater Thread  on 
10.32.108.224(bridgep2_host2_10627:10627)<v3>:41003 port 27636> with ID <43>
Executor Group <ScheduledThreadPoolExecutorWithKeepAlive>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread stack:
sun.security.ssl.SSLSocketImpl$AppInputStream.deplete(SSLSocketImpl.java:1016)
sun.security.ssl.SSLSocketImpl$AppInputStream.access$100(SSLSocketImpl.java:816)
sun.security.ssl.SSLSocketImpl.bruteForceCloseInput(SSLSocketImpl.java:702)
sun.security.ssl.SSLSocketImpl.duplexCloseOutput(SSLSocketImpl.java:553)
sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:485)
org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.close(CacheClientUpdater.java:546)
org.apache.geode.cache.client.internal.QueueConnectionImpl.internalDestroy(QueueConnectionImpl.java:112)
org.apache.geode.cache.client.internal.QueueManagerImpl.endpointCrashed(QueueManagerImpl.java:379)
org.apache.geode.cache.client.internal.QueueManagerImpl.connectionCrashed(QueueManagerImpl.java:357)
org.apache.geode.cache.client.internal.QueueConnectionImpl.destroy(QueueConnectionImpl.java:88)
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:645)
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:504)
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:334)
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:303)
org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:839)
org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:38)
org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90)
org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1329)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:279)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Lock owner thread stack
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:475)
sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:469)
sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:69)
sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1228)
sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:75)
sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:915)
org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:830)
org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndBody(Message.java:677)
org.apache.geode.internal.cache.tier.sockets.Message.receiveWithHeaderReadTimeout(Message.java:1140)
org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.processMessages(CacheClientUpdater.java:1591)
org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.run(CacheClientUpdater.java:483)
{code}

>From the Geode log we can see the expected exceptions for messages to losing 
>side (Operation not permitted) after the network is dropped:

{code:java}
[warn 2020/11/06 14:50:20.865 PST <Geode Failure Detection thread 3> tid=0x9b] 
Unable to send message to 10.32.110.94(bridgep1_host1_11115:11115)<v4>:41001
java.io.IOException: Operation not permitted (sendto failed)
        at java.net.PlainDatagramSocketImpl.send(Native Method)
        at java.net.DatagramSocket.send(DatagramSocket.java:693)
        at org.jgroups.protocols.UDP._send(UDP.java:224)
        at org.jgroups.protocols.UDP.sendUnicast(UDP.java:215)
        at org.jgroups.protocols.TP.sendToSingleMember(TP.java:1985)
        at org.jgroups.protocols.TP.doSend(TP.java:1962)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.Transport.doSend(Transport.java:85)
        at org.jgroups.protocols.TP.send(TP.java:1948)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.Transport._send(Transport.java:57)
        at org.jgroups.protocols.TP.down(TP.java:1515)
        at org.jgroups.stack.Protocol.down(Protocol.java:439)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.down(StatRecorder.java:87)
        at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:646)
        at org.jgroups.protocols.FlowControl.down(FlowControl.java:347)
        at org.jgroups.protocols.FRAG2.down(FRAG2.java:136)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1039)
        at org.jgroups.JChannel.down(JChannel.java:790)
        at org.jgroups.JChannel.send(JChannel.java:426)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:819)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:670)
        at 
org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.doCheckMember(GMSHealthMonitor.java:504)
        at 
org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.lambda$checkMember$1(GMSHealthMonitor.java:455)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to