Yep, HttpSM::setup_blind_tunnel is being called less that SSLNextProtocolAccept::mainEvent. Sorry for confusion, I fixed comment recently.
stiple@proxy-test-proxy:~$ cat traffic.out | grep "tunnel" | wc -l 19829 stiple@proxy-test-proxy:~$ cat traffic.out | grep "mainEvent" | wc -l 19833 2015-03-20 16:45 GMT+02:00 Susan Hinrichs <shinr...@network-geographics.com>: > > > On 3/20/2015 9:39 AM, Lev Stipakov wrote: >> >> Hi Susan, >> >> I use master branch (commit b12b0399f0aeb85cec3a622ebf61e541d9c29912), >> so I assume that TS-3424 is fixed there. >> >> I have tried your suggestion from TS-3453 but unfortunately it didn't >> help. Moreover, in my use case (transparent SSL proxy) code branch “if >> (!vc->getSSLHandShakeComplete()) {}” never gets executed, so it does >> not reach code you've mentioned in TS-3424. >> >> I checked read.triggered and read.enabled values, they are sometimes >> zeros, sometimes ones. If “enabled” value is 0, it is set to 1 in >> UnixNetVConnection.reenable() call, which is triggered from >> HttpSM::setup_blind_tunnel. If “triggered” value is 0, it is set to 1 >> in NetHandler::mainNetEvent method. So far I won't be able to see a >> link between those values and stalled connections. >> >> I also noticed that HttpSM::setup_blind_tunnel is called N times less >> than SSLNextProtocolAccept:mainEvent, where N is an amount of stalled >> connections. > > > Just want to clarify. HttpSM::setup_blind_tunnel is being called less. Not > SSLNextProtocolAccept::mainEvent? That seems believable. I read your > comment on TS-3456 the other way. It was scratching my head on how > mainEvent could be called less than the setup_blind_tunnel. > > Good info. I spent much of yesterday getting Tsung setup, so should be able > to dig in more today. > >> >> JIRA ticked for this issue is >> https://issues.apache.org/jira/browse/TS-3456 >> >> -Lev >> >> 2015-03-19 18:16 GMT+02:00 Susan Hinrichs >> <shinr...@network-geographics.com>: >>> >>> Interesting. It appears that we are dropping the reenable action from >>> your >>> thread sometimes. I'm guessing there is an issue with the >>> readReschedule. >>> >>> I'll give this a go in my transparent dev environment. A couple things >>> you >>> might want to look at. >>> >>> * Brian uncovered a fairly serious SSL startup issue tracked in TS-3424. >>> It >>> is now fixed on main and is (or will be soon) fixed on 5.2.x. >>> >>> * If your read event got the trigger cleared, the readReschedule wouldn't >>> really do anything and things might stall. I saw odd things with regards >>> to >>> SSL and reschedules which I've reported in TS-3453. There is a proposed >>> fix >>> in there you might want to give a go. Not at all clear that is what you >>> are >>> running into. I'm still getting my head around the scheduling logic. >>> You >>> might check whether the this->read.triggered and this->read.enabled bits >>> are >>> set in SSLNetVConnection::reenable before you call readReschedule. >>> >>> In any case, could you file a bug for this. Definitely an issue we need >>> to >>> track and resolve. >>> >>> >>> On 3/19/2015 8:41 AM, Lev Stipakov wrote: >>>> >>>> Hello, >>>> >>>> I made a simple plugin that sets up TS_SSL_SNI_HOOK and creates a >>>> blind tunnel from a separate thread. With low load everything works >>>> fine, but with moderate load (100 simultaneous users, each user sends >>>> 200 HTTPS requests) I see somewhat strange behavior. >>>> >>>> On a client side I use Tsung, which creates users and sends number of >>>> requests per user. For each user Tsung waits for a response before >>>> sending a new request, so if response never arrives, a particular user >>>> (and the whole test) stalls. >>>> >>>> So, with load mentioned above I see few 'stalled' connections on both >>>> client and proxy – netstat shows them as ”established”, ATS seems to >>>> have data structures for those (checked >>>> proxy.process.net.connections_currently_open value), but no traffic >>>> goes between proxy and client. >>>> >>>> Client side (.175): >>>> >>>> tcp 0 0 10.133.3.175:40737 10.133.3.250:443 ESTABLISHED 14332/beam.smp >>>> (more similar connections here) >>>> >>>> Proxy side (.250 is a server): >>>> tcp 0 0 10.133.3.250:443 10.133.3.175:40737 ESTABLISHED >>>> 28117/traffic_serve >>>> (more similar connections here) >>>> >>>> I checked traffic.out log and found out that >>>> ”SSLNextProtocolAccept:mainEvent” does not get called as many times as >>>> it should. This can probably be explained by the fact that client does >>>> not send requests for given user anymore if response to previous >>>> request hasn't been received. Which, in turn, may indicate that at >>>> some point tunnel has not been created. >>>> >>>> The interesting thing is that everything works fine if a tunnel is >>>> created directly from TS_SSL_SNI_HOOK but not from the separate >>>> thread. >>>> >>>> The plugin code is very simple – I set up TS_SSL_SNI_HOOK and start a >>>> thread with TSThreadCreate. When hook got called, I push TSVConn to a >>>> thread-safe queue. The thread wakes up when item has been pushed, >>>> calls TSVConnTunnel / TSVConnReenable for given vconn and then waits >>>> for the next item. I have attached the code. >>>> >>>> Do you guys have any suggestions how I can investigate this problem >>>> further? >>>> >> >> > -- -Lev