Susan, Have a few questions in trying to determine what direction we need to go from our end on our issue.
Due to our pre-release test cycle time we are currently prepared to ship our next product update using the ATS 6.0.0 code base. Unfortunately we do not have the test window to move to a later release at this time. So based on your apparent intimate knowledge in this area as well as your knowledge of the changes that have been made over the past months and changes in the queue, were we to stick with ATS 6.0.0 what changes in this area do you recommend we try and pick up once committed and ready? I saw the following TS cases mentioned in various threads and all seemed to have dependancies of some sorts on TS-4309. So what of these should we potentially target trying to pick up/patch into our ATS 6.0.0 based release? - TS-4309 Reduced SSL upload/download speed after event loop change (TS-4260) - TS-4487 Don't reschedule read depend on needs & did not check the change of lock at the return callback with wbe. - TS-4260 Change event loop to always stall on waiting for I/O. - TS-4424 ASAN: heap-buffer-overflow in 6.2.x branch We did a temporary hack to get around the read issue we were hitting in 6.0.0 but based on internal testing we realize it is not the proper solution (i.e. Was not the root cause fix but rather a temporary bandaid while we continue our investigation). I am now digging into the previous ATS release we had working (4.1.0) to try and identify what changed that is leading to the issue we are now hitting (PARSE_CONT on incomplete read but read does not continue to read the remaining data). Hope to have more/better data to work with on this issue later this week. Any insight or words of wisdom from the experts in this area is greatly appreciated! Thanks, Craig S. On 5/27/16, 11:19 AM, "Susan Hinrichs" <shinr...@network-geographics.com> wrote: >There is a PR that uses the buffer interface instead of the block >interface which results in simpler code. We are running this code >internally in Yahoo. It fixed a performance problem introduced by a fix >not yet landed in open source. Since the current code works, I haven't >pushed this PR. But if debugging anything in this area, I'd suggest >first moving to the buffer interface. > >https://github.com/apache/trafficserver/pull/629/files > >Alan just rediscovered how additional blocks are added in the existing >code. I'll let him respond with details on that. > >Also, I'd suggest moving up to 6.1 or 6.2.x rather than 6.0.0. I don't >think many folks have deployed 6.0. We are starting to deploy 6.2 and >we tested a bit with 6.1 (and others have deployed 6.1). > > >On 5/27/2016 9:46 AM, Craig Schomburg (craigs) wrote: >> Hey folks, >> >> We are encountering a SSLVNetConnection IOBufferBlock buffer management >> issue in ATS 6.0.0 that we did not see in the earlier ATS 4.0.1 release >> Which we were using. >> >> What we see in ssl_read_from_net() is when we get multiple GET requests on >> a single SSL session, as each GET is processed and ACK/NACK’ed that the >> buffer is not reset and the space released for reuse. As a result, the >> available write_avail() space in the session IOBufferBlock buffer is >> reduced with each subsequent packet until we have insufficient space to >> buffer the packet. >> >> Also appears that ATS is set up to support a chain of 2 IOBufferBlock >> but since only 1 is allocated we bail out of the read loop in >> ssl_read_from_net() with a incomplete packet and then drop it. >> >> Request Response Txn-ID VC >> ---------------------------- ---------------- ------ -------------- >> GET /call/187972?debug=1 200 OK 4 0x560bb93e6420 >> b->write_avail()=4096, nread=0 >> b->write_avail()=4096, nread=1900 (2196 left in buffer) >> nread=0 PARSE DONE >> >> GET /call/widget.jsp... 200 OK 5 0x560bb93e6420 >> b->write_avail()=2196, nread=0 >> b->write_avail()=2196, nread=2120 ( 76 left in buffer) >> nread=0 PARSE DONE >> >> GET /call/js/libs/require.js 304 Not Modified 6 0x560bb93e6420 >> b->write_avail()=76, nread=0 >> b->write_avail()=76, nread=76 ( 0 left) >> b->next is NULL so ssl_read_from_net() bails on read loop and remainder >> of packet is not read >> >> We hacked the ssl_read_from_net() code in the SSL_ERROR_NONE case to >> add_block() if b->next == NULL and block_write_avail == 0 and that >> “appeared” to get us working again but I am not convinced that was the >> correct solution. Concerned because it appears that other areas of the >> code assume there will never be more than 2 buffers in the list and we >> did not put a limit on the list length. >> >> So my question is when should the IOBufferBlock _end and _start have >> been reset() to free the buffer space? I assumed that since we were seeing >> a serial Packet(GET), ACK, Packet(GET), ACK, Packet(GET), ACK, that the >> Buffer space could/should have been reset after each ACK? >> >> Also curious if this is a known issue with ATS 6.0.0 that has been >> addressed or is known/unaddressed? >> >> Continuing to dig through the code in the mean time. Any feedback, insight, >> etc. would be appreciated… >> >> Thanks, >> >> Craig S. >> >