[ 
https://issues.apache.org/jira/browse/CXF-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zander updated CXF-9115:
----------------------------
    Description: 
It is possible for {{HttpClientHTTPConduit.HttpClientBodyPublisher#subscribe}} 
to be called _after_ the underlying subscription has already been cancelled, 
for example, if a connect timeout happens _before_ 
{{HttpClientHTTPConduit.HttpClientBodyPublisher#subscribe}} is called.
In this case, the writing thread will be stuck in 
{{HttpClientHTTPConduit.HttpClientPipedOutputStream#write}}, waiting forever 
for space in the write buffer.

This happens every once in a while in our production system, causing it to 
hang. The threads are stuck here:
{code}
"demo.hw.client.ComplexClient.main()@4789" tid=0x3e nid=NA waiting
  java.lang.Thread.State: WAITING
          at java.lang.Object.wait0(Object.java:-1)
          at java.lang.Object.wait(Object.java:366)
          at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:279)
          at java.io.PipedInputStream.receive(PipedInputStream.java:237)
          at java.io.PipedOutputStream.write(PipedOutputStream.java:154)
          at org.apache.cxf.transport.http.HttpClientHTTPConduit$HttpClientPipedOutputStream.write(HttpClientHTTPConduit.java:554)
          at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
          at org.apache.cxf.io.AbstractThresholdOutputStream.write(AbstractThresholdOutputStream.java:69)
          at com.ctc.wstx.io.UTF8Writer.flush(UTF8Writer.java:100)
          at com.ctc.wstx.sw.BufferingXmlWriter.flush(BufferingXmlWriter.java:242)
          at com.ctc.wstx.sw.BaseStreamWriter.flush(BaseStreamWriter.java:260)
          at org.apache.cxf.interceptor.AbstractOutDatabindingInterceptor.writeParts(AbstractOutDatabindingInterceptor.java:107)
          at org.apache.cxf.wsdl.interceptors.BareOutInterceptor.handleMessage(BareOutInterceptor.java:68)
          at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
          - locked <0x2369> (a org.apache.cxf.phase.PhaseInterceptorChain)
          at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:530)
          at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:441)
          at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:356)
          at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:314)
          at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:334)
          at demo.hw.client.ComplexClient.main(ComplexClient.java:106)
{code}
The {{PipedInputStream}} is in the state shown in the screenshot: it is 
connected, but no thread has been registered as its {{readSide}} yet, and none 
ever will be. It therefore never considers the read end to be gone/dead and 
keeps looping forever in {{awaitSpace()}}:
!screenshot-1.png!
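
The pipe behaviour described above can be shown without CXF. The following is a minimal standalone sketch, not code from {{HttpClientHTTPConduit}}: the class name {{PipeHangDemo}}, the 8-byte buffer and the 64-byte payload are made up for illustration. It connects a {{PipedOutputStream}} to a {{PipedInputStream}} that no thread ever reads from, and checks that the writer is still blocked after a short wait. The second method shows that, in this isolated setup, closing the read end lets the blocked writer return; it is not meant as the fix for this issue.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeHangDemo {

    /** Starts a daemon thread that writes payloadSize bytes into the given pipe. */
    public static Thread startWriter(PipedOutputStream out, int payloadSize) {
        Thread writer = new Thread(() -> {
            try {
                out.write(new byte[payloadSize]);
            } catch (IOException e) {
                // Reached only once the pipe is closed or the read side is known to be dead.
            }
        }, "pipe-writer");
        writer.setDaemon(true); // do not keep the JVM alive if the writer never returns
        writer.start();
        return writer;
    }

    /** Returns true if the writer is still blocked after waitMillis when no thread ever reads. */
    public static boolean writerBlocksWithoutReader(int bufferSize, int payloadSize, long waitMillis)
            throws Exception {
        PipedInputStream in = new PipedInputStream(bufferSize);
        PipedOutputStream out = new PipedOutputStream(in); // connected, but read() is never called
        Thread writer = startWriter(out, payloadSize);
        writer.join(waitMillis);
        return writer.isAlive();
    }

    /** Returns true if closing the read end releases a writer blocked on a full buffer. */
    public static boolean closingReadEndReleasesWriter(int bufferSize, int payloadSize, long waitMillis)
            throws Exception {
        PipedInputStream in = new PipedInputStream(bufferSize);
        PipedOutputStream out = new PipedOutputStream(in);
        Thread writer = startWriter(out, payloadSize);
        writer.join(waitMillis);     // give the writer time to fill the buffer and block
        in.close();                  // marks the pipe as closed by the reader
        writer.join(waitMillis * 2); // allow time for the writer to notice the closed pipe
        return !writer.isAlive();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocked without reader: " + writerBlocksWithoutReader(8, 64, 1500));
        System.out.println("released by close():    " + closingReadEndReleasesWriter(8, 64, 1500));
    }
}
```

The payload is deliberately larger than the pipe buffer, mirroring the reproduction condition of a body larger than the chunk length: a payload that fits into the buffer would be written without blocking.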

I can reproduce this issue every time by
* placing a breakpoint on this line: 
https://github.com/apache/cxf/blob/7fb95ad266e4a5ced561a0dc56c038db43967ca4/rt/transports/http/src/main/java/org/apache/cxf/transport/http/HttpClientHTTPConduit.java#L637
* sending a request with a body that is larger than both the chunking 
threshold (4096 bytes by default) and the chunk length,
* waiting for the breakpoint to be hit,
* then waiting for the connect timeout to be exceeded (30s by default),
* then resuming the program.

I recommend running with 
{{-Djdk.httpclient.HttpClient.log=errors,requests,headers,frames:control:data:window,ssl,trace,channel}}. 
That way, the debug logs printed by the {{HttpClient}} show when timeouts 
happen and when subscriptions are cancelled.

As a reproducer project, you can use the 
[wsdl_first_dynamic_client sample|https://github.com/apache/cxf/tree/cxf-4.1.0/distribution/src/main/release/samples/wsdl_first_dynamic_client], 
with the following modification in the client to trigger chunking and to make 
the timeouts happen a little sooner:
{code}
Index: distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java b/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java
--- a/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java	(revision 7fb95ad266e4a5ced561a0dc56c038db43967ca4)
+++ b/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java	(date 1741191623466)
@@ -35,6 +35,7 @@
 import org.apache.cxf.service.model.BindingOperationInfo;
 import org.apache.cxf.service.model.MessagePartInfo;
 import org.apache.cxf.service.model.ServiceInfo;
+import org.apache.cxf.transport.http.HTTPConduit;
 
 /**
  *
@@ -70,6 +71,12 @@
         JaxWsDynamicClientFactory factory = JaxWsDynamicClientFactory.newInstance();
         Client client = factory.createClient(wsdlURL.toExternalForm(), SERVICE_NAME);
         ClientImpl clientImpl = (ClientImpl) client;
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setChunkingThreshold(8);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setChunkLength(8);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setConnectionTimeout(5000);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setReceiveTimeout(5000);
+
+
         Endpoint endpoint = clientImpl.getEndpoint();
         ServiceInfo serviceInfo = endpoint.getService().getServiceInfos().get(0);
         QName bindingName = new QName("http://Company.com/Application",
{code}
Start the server with {{mvn -Pserver}}, set the breakpoint as described above 
and start {{mvn -Pclient}} in the debugger. Once the breakpoint is hit, wait ~5 
seconds and resume. The process will now hang forever.


> Race Condition in HttpClientHttpConduit Causes Writing Thread to Hang Forever
> -----------------------------------------------------------------------------
>
>                 Key: CXF-9115
>                 URL: https://issues.apache.org/jira/browse/CXF-9115
>             Project: CXF
>          Issue Type: Bug
>          Components: Transports
>    Affects Versions: 4.1.0, 4.0.6
>            Reporter: Kai Zander
>            Priority: Major
>         Attachments: screenshot-1.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
