Craig,

Of the bugs you listed below, I'd only look at TS-4309. There are still some issues there that oknet points out. But that might be easier to debug. If nothing else, your initial read shouldn't be limited to 125 bytes.

I looked more closely at your debug output again. Is this the first HTTP request after the SSL handshake, or a later request on a connection that has already carried other requests? Is this plain HTTP/1.1 over SSL, or HTTP/2 or SPDY? Does your HTTP client perhaps send parts of the HTTP request in separate packets? Is this normally generated client traffic, or traffic generated by some sort of client test harness? It would be interesting to reproduce this behavior outside of your environment.

It might be interesting to turn on "Wire tracing" to see the contents of processed SSL data. Look at the documentation for the proxy.config.ssl.wire_trace_enabled config element and related config items. This will place the traced data in error.log.
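
As a rough sketch of the setting mentioned above, assuming the usual records.config line syntax (the INT type and the value 1 are assumptions here; check the documentation for your release, and for the related items that select which connections get traced):

```
CONFIG proxy.config.ssl.wire_trace_enabled INT 1
```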

Susan

On 5/31/2016 9:11 AM, Craig Schomburg (craigs) wrote:
Susan,

Have a few questions in trying to determine what direction we need to go from 
our end on our issue.

Due to our pre-release test cycle time we are currently prepared to ship our 
next product update using the ATS 6.0.0 code base.  Unfortunately we do not 
have the test window to move to a later release at this time.

So based on your apparent intimate knowledge in this area as well as your 
knowledge of the changes that have been made over the past months and changes 
in the queue, were we to stick with ATS 6.0.0 what changes in this area do you 
recommend we try and pick up once committed and ready?  I saw the following TS 
cases mentioned in various threads, and all seemed to have dependencies of some 
sort on TS-4309.  So which of these should we target picking up and patching 
into our ATS 6.0.0 based release?

  - TS-4309 Reduced SSL upload/download speed after event loop change (TS-4260)
  - TS-4487 Don't reschedule read depend on needs & did not check the change of 
    lock at the return callback with wbe.
  - TS-4260 Change event loop to always stall on waiting for I/O.
  - TS-4424 ASAN: heap-buffer-overflow in 6.2.x branch

We did a temporary hack to get around the read issue we were hitting in 6.0.0, 
but based on internal testing we realize it is not the proper solution (i.e., 
it is not a root-cause fix but a temporary band-aid while we continue our 
investigation).  I am now digging into the previous ATS release we had 
working (4.1.0) to try and identify what changed that is leading to the issue 
we are now hitting (PARSE_CONT on an incomplete read, but the read does not 
continue with the remaining data).  I hope to have more/better data to work 
with on this issue later this week.

Any insight or words of wisdom from the experts in this area is greatly 
appreciated!

Thanks,

Craig S.





On 5/27/16, 11:19 AM, "Susan Hinrichs" <shinr...@network-geographics.com> wrote:

There is a PR that uses the buffer interface instead of the block
interface which results in simpler code.  We are running this code
internally in Yahoo.  It fixed a performance problem introduced by a fix
not yet landed in open source.  Since the current code works, I haven't
pushed this PR.  But if debugging anything in this area, I'd suggest
first moving to the buffer interface.

https://github.com/apache/trafficserver/pull/629/files

Alan just rediscovered how additional blocks are added in the existing
code. I'll let him respond with details on that.

Also, I'd suggest moving up to 6.1 or 6.2.x rather than 6.0.0.  I don't
think many folks have deployed 6.0.  We are starting to deploy 6.2 and
we tested a bit with 6.1 (and others have deployed 6.1).


On 5/27/2016 9:46 AM, Craig Schomburg (craigs) wrote:
Hey folks,

We are encountering an SSLNetVConnection IOBufferBlock buffer management
issue in ATS 6.0.0 that we did not see in the earlier ATS 4.0.1 release
which we were using.

What we see in ssl_read_from_net() is that when we get multiple GET
requests on a single SSL session, the buffer is not reset and its space is
not released for reuse after each GET is processed and ACK/NACK’ed.  As a
result, the available write_avail() space in the session IOBufferBlock
buffer is reduced with each subsequent packet until we have insufficient
space to buffer the packet.

It also appears that ATS is set up to support a chain of 2 IOBufferBlocks,
but since only 1 is allocated we bail out of the read loop in
ssl_read_from_net() with an incomplete packet and then drop it.

Request                       Response          Txn-ID  VC
----------------------------  ----------------  ------  --------------
GET /call/187972?debug=1      200 OK             4      0x560bb93e6420
    b->write_avail()=4096, nread=0
    b->write_avail()=4096, nread=1900 (2196 left in buffer)
    nread=0                PARSE DONE

GET /call/widget.jsp...       200 OK             5      0x560bb93e6420
    b->write_avail()=2196, nread=0
    b->write_avail()=2196, nread=2120 (  76 left in buffer)
    nread=0                PARSE DONE

GET /call/js/libs/require.js  304 Not Modified   6      0x560bb93e6420
    b->write_avail()=76,   nread=0
    b->write_avail()=76,   nread=76   (   0 left)
    b->next is NULL so ssl_read_from_net() bails on read loop and remainder
      of packet is not read
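
The arithmetic in the trace above can be replayed with a small toy model. This is only a sketch: ToyBlock and avail_before_each() are hypothetical names, not ATS classes or functions, and the 500-byte size of the third request is an assumption (the trace only shows that 76 bytes of it fit).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for one fixed-size buffer block. This is NOT the ATS
// IOBufferBlock; it only tracks the counter needed to reproduce the
// write_avail() values seen in the trace.
struct ToyBlock {
  std::size_t capacity;  // total bytes in the block
  std::size_t end = 0;   // write offset; never rewound in this model

  explicit ToyBlock(std::size_t cap) : capacity(cap) {}

  std::size_t write_avail() const { return capacity - end; }

  // Store up to `want` bytes; returns how many actually fit.
  std::size_t fill(std::size_t want) {
    std::size_t n = want < write_avail() ? want : write_avail();
    end += n;
    assert(end <= capacity);
    return n;
  }
};

// Replays request sizes against a single block that is never reset and
// returns the write_avail() value observed before each request.
std::vector<std::size_t> avail_before_each(std::size_t cap,
                                           const std::vector<std::size_t> &sizes) {
  ToyBlock b(cap);
  std::vector<std::size_t> seen;
  for (std::size_t s : sizes) {
    seen.push_back(b.write_avail());
    b.fill(s);  // excess bytes are silently dropped, as in the third GET
  }
  return seen;
}
```

Calling avail_before_each(4096, {1900, 2120, 500}) gives the same 4096, 2196, 76 sequence as the three GETs.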

We hacked the ssl_read_from_net() code in the SSL_ERROR_NONE case to
add_block() if b->next == NULL and block_write_avail == 0, and that
“appeared” to get us working again, but I am not convinced that was the
correct solution.  I am concerned because it appears that other areas of
the code assume there will never be more than 2 buffers in the list, and
we did not put a limit on the list length.
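
For illustration, the workaround described above, with an explicit cap on the chain length, might look like the toy model below. This is a sketch only, not the actual patch: ToyBlock, ToyChain, and max_blocks are hypothetical names, and the real ssl_read_from_net() loop is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Toy block with a link to the next block (not the ATS IOBufferBlock).
struct ToyBlock {
  std::size_t capacity;
  std::size_t end = 0;  // write offset
  std::unique_ptr<ToyBlock> next;

  explicit ToyBlock(std::size_t cap) : capacity(cap) {}
  std::size_t write_avail() const { return capacity - end; }
};

// Toy chain that appends a block only when the current one is full and has
// no successor, and refuses to grow past max_blocks.
struct ToyChain {
  std::unique_ptr<ToyBlock> head;
  std::size_t block_size;
  std::size_t max_blocks;
  std::size_t count = 1;  // blocks currently in the chain

  ToyChain(std::size_t bs, std::size_t mb)
    : head(new ToyBlock(bs)), block_size(bs), max_blocks(mb) {
    assert(bs > 0 && mb >= 1);
  }

  // Store up to `want` bytes; returns the number of bytes actually stored.
  std::size_t write(std::size_t want) {
    ToyBlock *b = head.get();
    std::size_t stored = 0;
    while (stored < want) {
      if (b->write_avail() == 0) {
        if (!b->next) {
          if (count == max_blocks) {
            break;  // cap reached: stop instead of growing without bound
          }
          b->next.reset(new ToyBlock(block_size));
          ++count;
        }
        b = b->next.get();
        continue;
      }
      std::size_t n = std::min(want - stored, b->write_avail());
      b->end += n;
      stored += n;
    }
    return stored;
  }
};
```

With ToyChain c(4096, 2), the three requests from the trace (taking 500 bytes as an assumed size for the third) all fit, and a later oversized write stops at the two-block cap rather than extending the list.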

So my question is: when should the IOBufferBlock _end and _start have
been reset() to free the buffer space?  I assumed that since we were seeing
a serial Packet(GET), ACK, Packet(GET), ACK, Packet(GET), ACK, the
buffer space could/should have been reset after each ACK?
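
The assumption in the question above (rewind both offsets once everything buffered has been consumed) can be sketched with a toy model. ToyBlock here is hypothetical and is not the ATS IOBufferBlock; whether and where the real code may safely reset depends on reader state that this model ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy block with separate read and write offsets.
struct ToyBlock {
  std::size_t capacity;
  std::size_t start = 0;  // read offset
  std::size_t end = 0;    // write offset

  explicit ToyBlock(std::size_t cap) : capacity(cap) {}

  std::size_t read_avail() const { return end - start; }
  std::size_t write_avail() const { return capacity - end; }

  // Store up to `want` bytes; returns how many fit.
  std::size_t fill(std::size_t want) {
    std::size_t n = std::min(want, write_avail());
    end += n;
    return n;
  }

  // Consume up to `want` bytes. Once the block is fully drained, rewind
  // both offsets so the whole capacity is writable again.
  std::size_t consume(std::size_t want) {
    std::size_t n = std::min(want, read_avail());
    start += n;
    assert(start <= end);
    if (start == end) {
      start = end = 0;
    }
    return n;
  }
};
```

In this model, filling 1900 bytes and then consuming all 1900 brings write_avail() back to 4096, while a partial consume leaves the offsets alone.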

Also curious if this is a known issue with ATS 6.0.0 that has been
addressed or is known/unaddressed?

Continuing to dig through the code in the meantime.  Any feedback, insight,
etc. would be appreciated…

Thanks,

Craig S.

