I wonder where the perceived bottleneck is. I mean, you have two boxes
connected by Ethernet (whatever speed), and you're running an sftp bulk
file transfer. What is the limiting factor? Are the boxes less than
20% idle? Is the network saturated, or is there room for more throughput?
Much of this is answered in the papers or presentations on the HPN-SSH
website (http://www.psc.edu/networking/projects/hpn-ssh/). The executive
summary is that once you start hitting GigE speeds the CPU becomes the
limiting factor. In our test environment (two 8-core machines hooked up
over a LAN), AES128-CTR got us around 500Mb/s. At that point the SSH
process was pegging a core at 100%. When we tried the same test with the
threaded/parallelized AES-CTR mode cipher, we saw the performance nearly
double to 938Mb/s.
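Why can AES-CTR be threaded at all? In counter mode, keystream block i depends only on the key, the nonce, and the counter value i, never on earlier ciphertext, so disjoint counter ranges can be generated on different cores and stitched back together in order. The sketch below illustrates that structure only; it is not HPN-SSH's code. Assumptions: Python's standard library has no AES, so a hypothetical SHA-256-based keystream_block() stands in for the AES block function, and all function names and the 4-worker default are illustrative. CPython's GIL also limits real speedup for small hash inputs, so this demonstrates that the split is correct, not that it is fast.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Keystream block size in bytes. This is the SHA-256 digest size used by the
# stand-in PRF below, NOT the 16-byte AES block size.
BLOCK = 32


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Hypothetical stand-in for AES_K(nonce || counter): SHA-256 over the same inputs.

    Illustration only; a real CTR cipher would call AES here.
    """
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()


def keystream_range(key: bytes, nonce: bytes, start: int, count: int) -> bytes:
    # Blocks start .. start+count-1 depend only on their own counter values,
    # so any range can be computed independently of every other range.
    return b"".join(keystream_block(key, nonce, start + i) for i in range(count))


def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    # zip() stops at len(data), discarding any unused tail of the keystream.
    return bytes(d ^ k for d, k in zip(data, keystream))


def ctr_xor_serial(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (the same operation in CTR mode) on one thread."""
    nblocks = -(-len(data) // BLOCK)  # ceiling division
    return xor_bytes(data, keystream_range(key, nonce, 0, nblocks))


def ctr_xor_parallel(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    """Same result as ctr_xor_serial, with the counter space split across threads."""
    nblocks = -(-len(data) // BLOCK)
    if nblocks == 0:
        return b""
    per_worker = -(-nblocks // workers)  # blocks handed to each thread
    starts = range(0, nblocks, per_worker)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the chunks reassemble
        # into one contiguous keystream even if threads finish out of order.
        chunks = pool.map(
            lambda s: keystream_range(key, nonce, s, min(per_worker, nblocks - s)),
            starts,
        )
        keystream = b"".join(chunks)
    return xor_bytes(data, keystream)
```

The property that matters is that the parallel and serial versions produce byte-identical output for any length and worker count; and because decryption is the same XOR, running ctr_xor_serial twice returns the original plaintext. A chained mode such as CBC encryption has no equivalent split, since each block needs the previous ciphertext block first.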
In other words, with a multiple-CPU box, how much would threads help?
A lot. Even in a dual core box I was able to essentially double my
throughput along a transatlantic path (nearly 800Mb/s fully encrypted
between Pittsburgh and Switzerland).
Within SSH as a whole, where does the CPU spend its time? Is it
crypto, or is it shuffling network packets? Would offloading the
crypto to a separate process (and therefore processor) help?
This is a good question, and we don't actually have numbers firm
enough that I feel comfortable sharing them. However, I can say that well
more than 50% of the time is spent in encryption and the HMAC cost is
probably under 20%.
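Whether offloading or threading the crypto helps can be estimated with Amdahl's law: if a fraction p of the per-byte CPU work can be spread across n cores and the rest stays on one core, the ideal speedup is 1/((1 - p) + p/n). The sketch below is a back-of-envelope model, not a measurement. Treating the "more than 50%" spent in encryption as the parallel fraction is an assumption, and the model ignores thread synchronization overhead and the network link itself. The function names are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when parallel_fraction of the CPU work is spread over workers cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction  # work that stays on one core
    return 1.0 / (serial + parallel_fraction / workers)


def fraction_needed(speedup: float, workers: int) -> float:
    """Invert Amdahl's law: the parallel fraction p that yields `speedup` on `workers` cores."""
    # 1 / ((1 - p) + p / n) = S  =>  p = (1 - 1/S) / (1 - 1/n)
    if workers < 2:
        raise ValueError("need at least 2 workers to gain any speedup")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    print(f"{amdahl_speedup(0.5, 2):.2f}")         # 1.33
    print(f"{amdahl_speedup(0.5, 8):.2f}")         # 1.78
    print(f"{fraction_needed(938 / 500, 8):.2f}")  # 0.53
```

With p = 0.5, even unlimited cores cannot beat 2x, because the serial half still runs at single-core speed. Working backwards, the 500Mb/s to 938Mb/s jump (about 1.88x) on the 8-core test machines would need p of roughly 0.53, which is consistent with "well more than 50%". Since 938Mb/s is already close to the practical TCP payload ceiling of gigabit Ethernet, the link rather than the CPU may have capped that run.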
So there are definite, distinct performance advantages to
parallelization.