Thanks all, Between the info from Alan and Chao I think I understand a bit better why we saw a max of 2 buffer reads (even with my temp tweak). I also was not clear on how the resumption of the read worked. Think I need to go back and look at that part again based on Alan’s comment below. Issue we hit was that we never saw the read resume. MIME posted a PARSE_CONT and then we never saw a resumption of the read operation. In a few cases we also saw SSL_ERROR_ZERO_RETURN. Debug trace below...
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:205 (ssl_read_from_net)> (ssl) trace=FALSE 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:210 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b->write_avail()=125 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:217 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=125 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:296 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] bytes_read=125 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpClientSession.cc:409 (state_keep_alive)> (http_cs) [4] [&HttpClientSession::state_keep_alive, VC_EVENT_READ_READY] 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpClientSession.cc:139 (new_transaction)> (http_cs) [4] Starting transaction 3 using sm [8] 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpSM.cc:2535 (main_handler)> (http) [8] [HttpSM::main_handler, VC_EVENT_READ_READY] 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpSM.cc:574 (state_read_client_request_header)> (http) [8] [&HttpSM::state_read_client_request_header, VC_EVENT_READ_READY] 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpSM.cc:625 (state_read_client_request_header)> (http) [8] HOOP calling parse_req. 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HdrTSOnly.cc:74 (parse_req)> (http) HOOP here is the data start[0]: 47 45 54 <<< GET 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HTTP.cc:883 (http_parser_parse_req)> (http) HOOP in http_parser_parse_req. 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HTTP.cc:885 (http_parser_parse_req)> (http) HOOP in if of http_parser_parse_req. 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HTTP.cc:904 (http_parser_parse_req)> (http) craigs http_parser_parse_req start: 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HTTP.cc:921 (http_parser_parse_req)> (http) HOOP in http_parser_parse_req returning PARSE CONT. <<< So above only indicated 125 bytes read? Where is the rest? 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HdrTSOnly.cc:67 (parse_req)> (http) HOOP no more data so breaking 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpSM.cc:629 (state_read_client_request_header)> (http) [8] HOOP after parse_req. 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpSM.cc:639 (state_read_client_request_header)> (http) [8] HOOP before is transparent passthrough allowed. 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpSM.cc:702 (state_read_client_request_header)> (http) [8] HOOP in switch parse cont. 2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1b0f700} DEBUG: <HttpSM.cc:716 (state_read_client_request_header)> (http) [8] HOOP in switch parse cont in else. 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:205 (ssl_read_from_net)> (ssl) trace=FALSE 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:210 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b->write_avail()=4096 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <SSLUtils.cc:1622 (ssl_callback_info)> (ssl) ssl_callback_info ssl: 0x55db68e492c0 where: 16388 ret: 256 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:217 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=0 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:281 (ssl_read_from_net)> (ssl.error) [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_ZERO_RETURN <<< tried to read more and got ZERO??? 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <HttpSM.cc:2535 (main_handler)> (http) [7] [HttpSM::main_handler, VC_EVENT_EOS] 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <HttpSM.cc:574 (state_read_client_request_header)> (http) [7] [&HttpSM::state_read_client_request_header, VC_EVENT_EOS] 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <HttpSM.cc:586 (state_read_client_request_header)> (http) EOS for ssl vc 0x55db68dffd70 at read_first_btye state 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DEBUG: <HttpClientSession.cc:325 (do_io_close)> (http_cs) [6] session closed 2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: {0x2b88b1f13700} DIAG: (drop_failed_auth) Session End Interesting part is we added the following tweak in ssl_read_from_net() and the complete packet was read (provided it fit within 2 buffers) and all was well… We understand that this is not the correct way to address this but it allowed us a workaround while this investigation continues... case SSL_ERROR_NONE: #if DEBUG SSLDebugBufferPrint("ssl_buff", b->end() + offset, nread, "SSL Read"); #endif ink_assert(nread); bytes_read += nread; offset += nread; block_write_avail -= nread; ink_assert(block_write_avail >= 0); // Temp tweak to work around read issue... if ((block_write_avail == 0) && (b->next == NULL)) { buf.writer()->add_block(); Debug("ssl", "[SSL_NetVConnection::ssl_read_from_net] add_block() b=%p, b->next=%p", b, bnxt); } // End temp tweak continue; I will dig further. Please let me know if there are any hints in the debug trace above that I may have overlooked… One question… Is it worth me trying Susan’s 629 [TS-4309] changes below? We are approaching product release so moving off of 6.0.0 is not a option at the moment as we do not have sufficient time to retest a new ATS release. If I can easily integrate the 629 [TS-4309] changes I am glad to give that a try. How close to commit ready do we believe the 629 [TS-4309] changes are? Are there any other changes TS that TS-4309 is dependent upon? Thanks, Craig S. On 5/27/16, 11:25 AM, "Alan Carroll" <solidwallofc...@yahoo-inc.com.INVALID> wrote: >New blocks should be added to the writer chain in >SSLNetVConnection::net_read_io with the call to buf.writer()->read_avail >(because of the type of buf, MIOBufferAccessor, buf.writer() is MIOBuffer* and >its write_avail() should extend the write chain if there is less than the low >water mark of space left to write). So after not reading all of the packet in >one pass, the socket should still have WRITE signaled and net_read_io should >be called again, calling MIOBuffer::write_avail() and then ssl_read_from_net. >This code is a bit messed up, Susan's PR is an attempt to make it better (more >efficient, robust, and understandable). > > > On Friday, May 27, 2016 10:20 AM, Susan Hinrichs > <shinr...@network-geographics.com> wrote: > > > There is a PR that uses the buffer interface instead of the block >interface which results in simpler code. We are running this code >internally in Yahoo. It fixed a performance problem introduced by a fix >not yet landed in open source. Since the current code works, I haven't >pushed this PR. But if debugging anything in this area, I'd suggest >first moving to the buffer interface. > >https://github.com/apache/trafficserver/pull/629/files > >Alan just rediscovered how additional blocks are added in the existing >code. I'll let him respond with details on that. > >Also, I'd suggest moving up to 6.1 or 6.2.x rather than 6.0.0. I don't >think many folks have deployed 6.0. We are starting to deploy 6.2 and >we tested a bit with 6.1 (and others have deployed 6.1). > > >On 5/27/2016 9:46 AM, Craig Schomburg (craigs) wrote: >> Hey folks, >> >> We are encountering a SSLVNetConnection IOBufferBlock buffer management >> issue in ATS 6.0.0 that we did not see in the earlier ATS 4.0.1 release >> Which we were using. >> >> What we see in ssl_read_from_net() is when we get multiple GET requests on >> a single SSL session, as each GET is processed and ACK/NACK’ed that the >> buffer is not reset and the space released for reuse. As a result, the >> available write_avail() space in the session IOBufferBlock buffer is >> reduced with each subsequent packet until we have insufficient space to >> buffer the packet. >> >> Also appears that ATS is set up to support a chain of 2 IOBufferBlock >> but since only 1 is allocated we bail out of the read loop in >> ssl_read_from_net() with a incomplete packet and then drop it. >> >> Request Response Txn-ID VC >> ---------------------------- ---------------- ------ -------------- >> GET /call/187972?debug=1 200 OK 4 0x560bb93e6420 >> b->write_avail()=4096, nread=0 >> b->write_avail()=4096, nread=1900 (2196 left in buffer) >> nread=0 PARSE DONE >> >> GET /call/widget.jsp... 200 OK 5 0x560bb93e6420 >> b->write_avail()=2196, nread=0 >> b->write_avail()=2196, nread=2120 ( 76 left in buffer) >> nread=0 PARSE DONE >> >> GET /call/js/libs/require.js 304 Not Modified 6 0x560bb93e6420 >> b->write_avail()=76, nread=0 >> b->write_avail()=76, nread=76 ( 0 left) >> b->next is NULL so ssl_read_from_net() bails on read loop and remainder >> of packet is not read >> >> We hacked the ssl_read_from_net() code in the SSL_ERROR_NONE case to >> add_block() if b->next == NULL and block_write_avail == 0 and that >> “appeared” to get us working again but I am not convinced that was the >> correct solution. Concerned because it appears that other areas of the >> code assume there will never be more than 2 buffers in the list and we >> did not put a limit on the list length. >> >> So my question is when should the IOBufferBlock _end and _start have >> been reset() to free the buffer space? I assumed that since we were seeing >> a serial Packet(GET), ACK, Packet(GET), ACK, Packet(GET), ACK, that the >> Buffer space could/should have been reset after each ACK? >> >> Also curious if this is a known issue with ATS 6.0.0 that has been >> addressed or is known/unaddressed? >> >> Continuing to dig through the code in the mean time. Any feedback, insight, >> etc. would be appreciated… >> >> Thanks, >> >> Craig S. >> > > > >