On 14/12/2011, at 10:33 PM, James Peach wrote:

> Hi all,
> 
> I've been looking at an issue where TSFetchUrl does not deliver the EOS 
> event to the caller until long after the origin server closes the read 
> socket. I made a small change to read_from_net() so that it notices when 
> read(2) returns 0 even if it returned > 0 on a previous pass through the 
> loop. This causes it to deliver the EOS event up to the HTTP state machine.
> 
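For reference, the read_from_net() change is conceptually the pattern below 
(a standalone sketch of the idea, not the actual ATS code; the names here 
are mine):

  #include <unistd.h>

  struct ReadResult {
    ssize_t total; // bytes consumed on this pass through the loop
    bool eos;      // read(2) returned 0 at some point on this pass
  };

  // Keep reading until the buffer is full or the socket runs dry, but
  // remember a zero return (peer closed) even when an earlier iteration
  // already returned data.
  static ReadResult
  drain_read(int fd, char *buf, size_t len)
  {
    ReadResult res = {0, false};
    while ((size_t)res.total < len) {
      ssize_t r = read(fd, buf + res.total, len - res.total);
      if (r > 0) {
        res.total += r;
        continue;
      }
      if (r == 0)
        res.eos = true; // remote close: don't lose this just because total > 0
      // r < 0 (EAGAIN et al.): nothing more to read on this pass
      break;
    }
    return res;
  }

The caller can then signal VC_EVENT_READ_READY for res.total > 0 and 
VC_EVENT_EOS when res.eos is set, all in the same pass instead of waiting 
for the next epoll wakeup.
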
> In the end, however, the EOS event is not delivered to the FetchSM. Here's 
> what happens …
> 
> We receive the response from the origin server:
> 
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http_tunnel) [0] 
> producer_handler [http server VC_EVENT_READ_READY]
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http_tunnel) [0] 
> producer_handler_dechunked [http server VC_EVENT_READ_READY]
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http_chunk) creating a 
> chunk of size 3296 bytes
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http_redirect) 
> [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> reenable Write
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (read_from_net) 
> attempting to read 32768 bytes
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (read_from_net) read 0 
> bytes
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (read_from_net) 
> triggering EOS
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http_tunnel) [0] 
> producer_handler [http server VC_EVENT_EOS]
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http_tunnel) [0] 
> producer_handler_dechunked [http server VC_EVENT_EOS]
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http_redirect) 
> [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 104
> [Dec 14 20:36:30.192] Server {0x7fff7b5f9960} DEBUG: (http) [0] 
> [&HttpSM::tunnel_handler_server, VC_EVENT_EOS]
> 
> At this point, we know that the read socket closed and we are trying to 
> deliver the EOS event up the stack. Then this happens:
> 
> [Dec 14 20:36:31.338] Server {0x7fff7b5f9960} DEBUG: (http_ss) [0] session 
> closed
> [Dec 14 20:36:31.362] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> reenable Write
> [Dec 14 20:36:31.362] Server {0x7fff7b5f9960} DEBUG: (pvc_event) [0] Passive: 
> Received event 1
> [Dec 14 20:36:31.362] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> process_write_side
> [Dec 14 20:36:31.363] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> process_write_side; act_on 3499
> [Dec 14 20:36:31.386] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> process_write_side; added 3499
> [Dec 14 20:36:31.386] Server {0x7fff7b5f9960} DEBUG: (http_tunnel) [0] 
> consumer_handler [user agent VC_EVENT_WRITE_COMPLETE]
> [Dec 14 20:36:31.387] Server {0x7fff7b5f9960} DEBUG: (http) [0] 
> [&HttpSM::tunnel_handler_ua, VC_EVENT_WRITE_COMPLETE]
> [Dec 14 20:36:31.387] Server {0x7fff7b5f9960} DEBUG: (http_cs) [0] session 
> released by sm [0]
> [Dec 14 20:36:31.387] Server {0x7fff7b5f9960} DEBUG: (http_cs) [0] initiating 
> io for next header
> [Dec 14 20:36:31.387] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> do_io_read for -1 bytes
> [Dec 14 20:36:31.387] Server {0x7fff7b5f9960} DEBUG: (http) [0] 
> [HttpSM::main_handler, HTTP_TUNNEL_EVENT_DONE]
> [Dec 14 20:36:31.387] Server {0x7fff7b5f9960} DEBUG: (http) [0] 
> [&HttpSM::tunnel_handler, HTTP_TUNNEL_EVENT_DONE]
> [Dec 14 20:36:31.387] Server {0x7fff7b5f9960} DEBUG: (http_redirect) 
> [HttpTunnel::deallocate_postdata_copy_buffers]
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (http_seq) 
> [HttpStateMachineGet::update_stats] Logging transaction
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (http) [0] dellocating sm
> 
> So at this point the HttpSM thinks it's all over: it is deallocating 
> itself in HttpSM::kill_this().
> 
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Active: 
> process_read_side
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Active: 
> process_read_side; act_on 3499
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Active: 
> process_read_side; added 3499
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (FetchSM) 
> [fetch_handler] calling fetch_plugin
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (FetchSM) 
> [process_fetch_read] I am here read
> [Dec 14 20:36:31.396] Server {0x7fff7b5f9960} DEBUG: (FetchSM) 
> [process_fetch_read] number of bytes in read ready 3499
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Active: 
> reenable Read
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> reenable Write
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc_event) [0] Passive: 
> Received event 1
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> process_read_side
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> process_read_side; act_on 0
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> closed? 0 shutdown? 0
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Passive: 
> process_write_side
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc_event) [0] Active: 
> Received event 1
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Active: 
> process_read_side
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Active: 
> process_read_side; act_on 0
> [Dec 14 20:36:31.397] Server {0x7fff7b5f9960} DEBUG: (pvc) [0] Active: 
> closed? 0 shutdown? 0
> 
> proxy/PluginVC.cc (around line 624):
> 
>   if (act_on <= 0) {
>     Debug("pvc", "[%u] %s: closed? %u shutdown? %u", PVC_ID, PVC_TYPE,
>           other_side->closed, other_side->write_state.shutdown);
>     if (other_side->closed || other_side->write_state.shutdown) {
>       read_state.vio._cont->handleEvent(VC_EVENT_EOS, &read_state.vio);
>     }
>     return;
>   }
> 
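As far as I can tell, other_side->closed is only ever set when do_io_close() 
is called on that side of the PluginVC pair. A simplified paraphrase (not 
the exact source):

  void
  PluginVC::do_io_close(int lerrno)
  {
    closed = true; // the flag that the peer's process_read_side() tests above
    // ... then wake the other side so it can observe the close ...
  }
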
> So the PluginVC does not think its other_side is closed (because no one 
> has called do_io_close()?). But eventually there is an HttpClientSession 
> timeout, which closes the right PluginVC and wakes it. Now the FetchSM is 
> kicked and the TSFetchUrl event triggers:
> 
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (pvc_event) [0] Passive: 
> Received event 2
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (http_cs) [0] 
> [&HttpClientSession::state_keep_alive, VC_EVENT_ACTIVE_TIMEOUT]
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (pvc) [0] Passive: 
> do_io_close
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (http_cs) [0] session closed
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (http_cs) [0] session 
> destroy
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (pvc) [0] Passive: 
> process_close
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (pvc_event) [0] Active: 
> Received event 1
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (pvc) [0] Active: 
> process_read_side
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (pvc) [0] Active: 
> process_read_side; act_on 0
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (pvc) [0] Active: closed? 1 
> shutdown? 0
> [Dec 14 20:38:26.380] Server {0x102602000} DEBUG: (FetchSM) [fetch_handler] 
> calling fetch_plugin
> 
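If I'm reading this trace right, the chain that finally unblocks the FetchSM 
is:

  VC_EVENT_ACTIVE_TIMEOUT
    -> HttpClientSession::state_keep_alive
    -> PluginVC::do_io_close()        (sets closed on the passive side)
    -> PluginVC::process_read_side()  (active side: act_on 0, other_side->closed 1)
    -> VC_EVENT_EOS -> FetchSM        (fetch_handler finally runs)

i.e. the caller only hears about the EOS when the keep-alive timeout fires, 
roughly two minutes after the origin server hung up.
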
> After beating my head against this for a while, I still don't see the 
> right way to deliver the EOS event from the HTTP layer up to the PluginVC. 
> Can anyone hold my hand a little and give me some clues about how this is 
> supposed to work?

This appears to do the trick:

diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc
index 624637b..b450755 100644
--- a/proxy/http/HttpSM.cc
+++ b/proxy/http/HttpSM.cc
@@ -2842,6 +2842,7 @@ HttpSM::tunnel_handler_server(int event, HttpTunnelProducer *
   if (close_connection) {
     p->vc->do_io_close();
     p->read_vio = NULL;
+    t_state.client_info.keep_alive = HTTP_NO_KEEPALIVE;
   } else {
     server_session->attach_hostname(t_state.current.server->name);
     server_session->server_trans_stat--;
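
If I'm reading the trace above correctly, the effect of this is that when 
the server connection closes, the transaction no longer offers keep-alive 
to the client side, so the HttpClientSession calls do_io_close() on the 
PluginVC right away instead of parking in state_keep_alive until the active 
timeout fires; that close is exactly what lets the EOS through to the 
FetchSM.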

I don't understand the HTTP state machine well enough to know whether this 
is a generally correct fix, so I'd love it if anyone could give it a thumbs 
up or down.

J
