Hi Kyle,

Great testing, btw.

So when you say "x5", did you change the settings as follows?
rxBufSize="125940" (=> 25188 x 5)

By any chance, have you analyzed heap dumps of Tomcat at periodic
intervals to see which class is hogging heap during the session
replication?

Thanks.
-Shanti

On Fri, Sep 7, 2012 at 12:19 PM, <kharp...@oreillyauto.com> wrote:

> Chris:
> >Assembling the sessions into a Collection is likely to be very fast,
> >since it's just copying references around: the size of the individual
> >sessions should not matter. Of course, pushing all those bytes to the
> >other servers...
>
> >Perhaps Tomcat does something like serialize the session to a big
> >binary structure and then sends that (which sounds insane -- streaming
> >binary data is how that should be done -- but I haven't checked the
> >code to be sure).
>
> It appears that Tomcat is serializing all the data into a single
> structure, rather than a collection of references. Watching VisualVM plot
> heap usage during replication, it nearly doubles (in my test env, this was
> the only thing running so that makes sense). If you're sure Tomcat is only
> making references, then I'd propose there is a problem with the JVM
> dereferencing the collection elements and double-counting the memory used.
> Either way, it's enough to make the JVM report a doubling of heap usage and
> an increase in the heap allocation. As soon as replication is done, heap use
> goes back to normal. I've attached a screenshot to the zip file.
>
> Now for data:
> I did tests of 200 sessions (~20 MB) at a time (200, 400, 600... up to
> 3000). I then tested in groups of 1000 (3000, 4000, 5000... up to 10k).
> At no point did I receive any exceptions or OOME issues. Heap usage never
> climbed above 60% Xmx. My lab was isolated to help give consistent
> results. Here are some points.
>
> 1. There is a pivotal point where replication performance degrades
> dramatically. In my tests, this happened around 2400-2600 sessions.
> I restarted Tomcat and was able to avoid the issue, until I hit 2800
> sessions (~300 MB total session data). There was a 153% jump in time
> required to perform replication at this point. From there, each subsequent
> test took marginally longer per session (15-25%) than the test before it.
> Chris was correct: it's not exponential, but the ms/session gets worse and
> worse as we climb. I have no explanation for the sharp jump or the
> continued degradation as we climb. I've seen similar performance issues
> with sort and comparative logic, but those don't make sense here. Perhaps
> this serialized object is being jerked around Young Gen/Old Gen and having
> to be constantly reallocated? Grasping at straws here...
>
> 2. Networking is a large portion of the bottleneck for large data sets.
> The thread size and pool size attributes on the sender/receiver had no
> impact on throughput. Also, a packet capture revealed nothing naughty
> happening. However, the rxBufSize and txBufSize values on the Nio receiver
> and the PooledParallel transport elements made a profound difference. I
> generated 7000 sessions (~700 MB) and used default settings: 74 sec.
> Increasing the rx/tx settings by x5, I was able to replicate the sessions
> in 33 sec. Gains beyond x5 were almost nil; x100 (which is absurd) only
> brought replication down to 29.3 sec.
> A simple SCP transfer of a 700 MB file (using tmpfs folders) between these
> same two systems took 13 seconds.
>
> My conclusion is that tuning the network was obviously a great help, but it
> still took 30 seconds to replicate 700 MB worth of session data on a network
> with enough throughput to perform the transfer in 13 seconds.
> I don't know if further network settings could be changed for the
> DeltaManager to aid in speeding up replication, but given the spike in
> memory use and the pivotal performance drop at a consistent point I'm
> inclined to think we're hitting some edge case regarding session size and
> memory settings (Xmx/Heap and NewSize/SurvivorRatio). As Chris said, if
> Tomcat isn't collecting just references, it probably should be.
>
> Feel free to pick apart my data or thoughts. I tried to be as analytical
> as possible, but there's a lot of conjecture in here.
>
> Attachment
> (See attached file: SessionResearch.zip)
> If the list strips it, find copy here:
> https://docs.google.com/open?id=0B876X8DOwh8peEkyZVd6RVc4cWc
>
> Thanks.
>
> Kyle Harper
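
For what it's worth, here is a quick sanity check on the timings quoted above, as a minimal sketch only. It just divides the ~700 MB figure by each reported duration, assuming "MB" is used consistently and that the whole payload moved within the measured time. The labels and function names are made up for illustration; none of this comes from Tomcat itself.

```python
# Effective throughput for the ~700 MB (7000 session) test,
# using only the durations reported in the quoted message.
PAYLOAD_MB = 700.0  # approximate session data size from the thread

# Reported wall-clock durations in seconds; labels are illustrative only.
DURATIONS_S = {
    "default rx/tx buffers": 74.0,
    "rx/tx buffers x5": 33.0,
    "rx/tx buffers x100": 29.3,
    "scp of a 700 MB file": 13.0,
}


def throughput_mb_per_s(payload_mb, seconds):
    """Average throughput in MB/s for a transfer of payload_mb taking seconds."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return payload_mb / seconds


def summarize(payload_mb, durations):
    """Map each label to (MB/s, duration relative to the fastest entry)."""
    fastest = min(durations.values())
    return {
        label: (throughput_mb_per_s(payload_mb, secs), secs / fastest)
        for label, secs in durations.items()
    }


if __name__ == "__main__":
    for label, (rate, ratio) in summarize(PAYLOAD_MB, DURATIONS_S).items():
        print(f"{label:24s} {rate:6.1f} MB/s  ({ratio:.1f}x the fastest time)")
```

On these figures, the x5 run still moves data at well under half the rate the scp test achieved on the same link, which is consistent with Kyle's point that the network is only part of the story.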