Thanks all,

Between the info from Alan and Chao I think I understand a bit better why we 
saw a max of 2 buffer reads (even with my temp tweak).  I also was not clear on 
how the resumption of the read worked.  Think I need to go back and look at 
that part again based on Alan’s comment below.  Issue we hit was that we never 
saw the read resume.  MIME posted a PARSE_CONT and then we never saw a 
resumption of the read operation.  In a few cases we also saw 
SSL_ERROR_ZERO_RETURN.  Debug trace below...

2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:205 (ssl_read_from_net)> (ssl) 
trace=FALSE
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:210 (ssl_read_from_net)> (ssl) 
[SSL_NetVConnection::ssl_read_from_net] b->write_avail()=125
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:217 (ssl_read_from_net)> (ssl) 
[SSL_NetVConnection::ssl_read_from_net] nread=125
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <SSLNetVConnection.cc:296 (ssl_read_from_net)> (ssl) 
[SSL_NetVConnection::ssl_read_from_net] bytes_read=125

2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpClientSession.cc:409 (state_keep_alive)> (http_cs) 
[4] [&HttpClientSession::state_keep_alive, VC_EVENT_READ_READY]
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpClientSession.cc:139 (new_transaction)> (http_cs) 
[4] Starting transaction 3 using sm [8]

2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpSM.cc:2535 (main_handler)> (http) [8] 
[HttpSM::main_handler, VC_EVENT_READ_READY]
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpSM.cc:574 (state_read_client_request_header)> 
(http) [8] [&HttpSM::state_read_client_request_header, VC_EVENT_READ_READY]
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpSM.cc:625 (state_read_client_request_header)> 
(http) [8] HOOP calling parse_req.

2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HdrTSOnly.cc:74 (parse_req)> (http) HOOP here is the 
data start[0]: 47 45 54 <<< GET

2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HTTP.cc:883 (http_parser_parse_req)> (http) HOOP in 
http_parser_parse_req.
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HTTP.cc:885 (http_parser_parse_req)> (http) HOOP in if 
of http_parser_parse_req.
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HTTP.cc:904 (http_parser_parse_req)> (http) craigs 
http_parser_parse_req start:
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HTTP.cc:921 (http_parser_parse_req)> (http) HOOP in 
http_parser_parse_req returning PARSE CONT.        <<< So above only indicated 
125 bytes read?  Where is the rest?
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HdrTSOnly.cc:67 (parse_req)> (http) HOOP no more data 
so breaking

2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpSM.cc:629 (state_read_client_request_header)> 
(http) [8] HOOP after parse_req.
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpSM.cc:639 (state_read_client_request_header)> 
(http) [8] HOOP before is transparent passthrough allowed.
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpSM.cc:702 (state_read_client_request_header)> 
(http) [8] HOOP in switch parse cont.
2016-05-25T19:30:48.711+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1b0f700} DEBUG: <HttpSM.cc:716 (state_read_client_request_header)> 
(http) [8] HOOP in switch parse cont in else.

2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:205 (ssl_read_from_net)> (ssl) 
trace=FALSE
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:210 (ssl_read_from_net)> (ssl) 
[SSL_NetVConnection::ssl_read_from_net] b->write_avail()=4096
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <SSLUtils.cc:1622 (ssl_callback_info)> (ssl) 
ssl_callback_info ssl: 0x55db68e492c0 where: 16388 ret: 256
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:217 (ssl_read_from_net)> (ssl) 
[SSL_NetVConnection::ssl_read_from_net] nread=0
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <SSLNetVConnection.cc:281 (ssl_read_from_net)> 
(ssl.error) [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_ZERO_RETURN      
<<< tried to read more and got ZERO???
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <HttpSM.cc:2535 (main_handler)> (http) [7] 
[HttpSM::main_handler, VC_EVENT_EOS]
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <HttpSM.cc:574 (state_read_client_request_header)> 
(http) [7] [&HttpSM::state_read_client_request_header, VC_EVENT_EOS]
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <HttpSM.cc:586 (state_read_client_request_header)> 
(http) EOS for ssl vc 0x55db68dffd70 at read_first_btye state
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DEBUG: <HttpClientSession.cc:325 (do_io_close)> (http_cs) [6] 
session closed
2016-05-25T19:30:53.794+08:00 jabberc-vcse-001 traffic_server[11959]: 
{0x2b88b1f13700} DIAG: (drop_failed_auth) Session End

Interesting part is we added the following tweak in ssl_read_from_net() and the 
complete packet was read (provided it fit within 2 buffers) and all was well…  
We understand that this is not the correct way to address this but it allowed 
us a workaround while this investigation continues...


      case SSL_ERROR_NONE:
#if DEBUG
        SSLDebugBufferPrint("ssl_buff", b->end() + offset, nread, "SSL Read");
#endif

        ink_assert(nread);

        bytes_read += nread;
        offset += nread;
        block_write_avail -= nread;
        ink_assert(block_write_avail >= 0);

        // Temp tweak to work around read issue...
        if ((block_write_avail == 0) && (b->next == NULL)) {
          buf.writer()->add_block();
          Debug("ssl", "[SSL_NetVConnection::ssl_read_from_net] add_block() 
b=%p, b->next=%p", b, bnxt);
        }
        // End temp tweak

        continue;

I will dig further.  Please let me know if there are any hints in the debug 
trace above that I may have overlooked…


One question… Is it worth me trying Susan’s 629 [TS-4309] changes below?  We 
are approaching product release so moving off of 6.0.0 is not a option at the 
moment as we do not have sufficient time to retest a new ATS release.  If I can 
easily integrate the 629 [TS-4309] changes I am glad to give that a try.  How 
close to commit ready do we believe the 629 [TS-4309] changes are?  Are there 
any other changes TS that TS-4309 is dependent upon?

Thanks,

Craig S.



On 5/27/16, 11:25 AM, "Alan Carroll" <solidwallofc...@yahoo-inc.com.INVALID> 
wrote:

>New blocks should be added to the writer chain in 
>SSLNetVConnection::net_read_io with the call to buf.writer()->read_avail 
>(because of the type of buf, MIOBufferAccessor, buf.writer() is MIOBuffer* and 
>its write_avail() should extend the write chain if there is less than the low 
>water mark of space left to write). So after not reading all of the packet in 
>one pass, the socket should still have WRITE signaled and net_read_io should 
>be called again, calling MIOBuffer::write_avail() and then ssl_read_from_net. 
>This code is a bit messed up, Susan's PR is an attempt to make it better (more 
>efficient, robust, and understandable).
> 
>
>    On Friday, May 27, 2016 10:20 AM, Susan Hinrichs 
> <shinr...@network-geographics.com> wrote:
> 
>
> There is a PR that uses the buffer interface instead of the block 
>interface which results in simpler code.  We are running this code 
>internally in Yahoo.  It fixed a performance problem introduced by a fix 
>not yet landed in open source.  Since the current code works, I haven't 
>pushed this PR.  But if debugging anything in this area, I'd suggest 
>first moving to the buffer interface.
>
>https://github.com/apache/trafficserver/pull/629/files
>
>Alan just rediscovered how additional blocks are added in the existing 
>code. I'll let him respond with details on that.
>
>Also, I'd suggest moving up to 6.1 or 6.2.x rather than 6.0.0.  I don't 
>think many folks have deployed 6.0.  We are starting to deploy 6.2 and 
>we tested a bit with 6.1 (and others have deployed 6.1).
>
>
>On 5/27/2016 9:46 AM, Craig Schomburg (craigs) wrote:
>> Hey folks,
>>
>> We are encountering a SSLVNetConnection IOBufferBlock buffer management
>> issue in ATS 6.0.0 that we did not see in the earlier ATS 4.0.1 release
>> Which we were using.
>>
>> What we see in ssl_read_from_net() is when we get multiple GET requests on
>> a single SSL session, as each GET is processed and ACK/NACK’ed that the
>> buffer is not reset and the space released for reuse.  As a result, the
>> available write_avail() space in the session IOBufferBlock buffer is
>> reduced with each subsequent packet until we have insufficient space to
>> buffer the packet.
>>
>> Also appears that ATS is set up to support a chain of 2 IOBufferBlock
>> but since only 1 is allocated we bail out of the read loop in
>> ssl_read_from_net() with a incomplete packet and then drop it.
>>
>> Request                      Response          Txn-ID  VC
>> ----------------------------  ----------------  ------  --------------
>> GET /call/187972?debug=1      200 OK            4      0x560bb93e6420
>>    b->write_avail()=4096, nread=0
>>    b->write_avail()=4096, nread=1900 (2196 left in buffer)
>>    nread=0                PARSE DONE
>>
>> GET /call/widget.jsp...      200 OK            5      0x560bb93e6420
>>    b->write_avail()=2196, nread=0
>>    b->write_avail()=2196, nread=2120 (  76 left in buffer)
>>    nread=0                PARSE DONE
>>
>> GET /call/js/libs/require.js  304 Not Modified  6      0x560bb93e6420
>>    b->write_avail()=76,  nread=0
>>    b->write_avail()=76,  nread=76  (  0 left)
>>    b->next is NULL so ssl_read_from_net() bails on read loop and remainder
>>      of packet is not read
>>
>> We hacked the ssl_read_from_net() code in the SSL_ERROR_NONE case to
>> add_block() if b->next == NULL and block_write_avail == 0 and that
>> “appeared” to get us working again but I am not convinced that was the
>> correct solution.  Concerned because it appears that other areas of the
>> code assume there will never be more than 2 buffers in the list and we
>> did not put a limit on the list length.
>>
>> So my question is when should the IOBufferBlock _end and _start have
>> been reset() to free the buffer space?  I assumed that since we were seeing
>> a serial Packet(GET), ACK, Packet(GET), ACK, Packet(GET), ACK, that the
>> Buffer space could/should have been reset after each ACK?
>>
>> Also curious if this is a known issue with ATS 6.0.0 that has been
>> addressed or is known/unaddressed?
>>
>> Continuing to dig through the code in the mean time.  Any feedback, insight,
>> etc. would be appreciated…
>>
>> Thanks,
>>
>> Craig S.
>>
>
>
>
>  

Reply via email to