On Sat, 2007-07-28 at 14:26 +0100, Dominic Bishop wrote:
> I've just been testing out GELI performance on an underlying RAID using a
> 3ware 9550SXU-12 running RELENG_6 as of yesterday and seem to be hitting a
> performance bottleneck, but I can't see where it is coming from.
>
> Testing with an unencrypted 100GB GPT partition (/dev/da0p1) gives me around
> 200-250MB/s read and write speeds, to give an idea of the capability of the
> disk device itself.
>
> Using GELI with a default 128-bit AES key seems to limit at ~50MB/s;
> changing the sector size all the way up to 128KB makes no difference
> whatsoever to the performance. If I use the threads sysctl in loader.conf
> and drop the geli threads to 1 thread only (instead of the usual 3 it spawns
> on this system) the performance still does not change at all. Monitoring
> during writes with systat confirms that it really is spawning 1 or 3 threads
> correctly in these cases.
>
> Here is a uname -a from the machine:
>
> FreeBSD 004 6.2-STABLE FreeBSD 6.2-STABLE #2: Fri Jul 27 20:10:05 CEST 2007
> [EMAIL PROTECTED]:/u1/obj/u1/src/sys/004 amd64
>
> The kernel is a copy of GENERIC with the GELI option added.
>
> Encrypted partition created using: geli init -s 65536 /dev/da0p1
>
> Simple write test done with: dd if=/dev/zero of=/dev/da0p1.eli bs=1m
> count=10000 (same as I did on the unencrypted; a full test with bonnie++
> shows similar speeds).
>
> Systat output whilst writing, showing 3 threads:
>
>                      /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
> Load Average         ||||
>
>                      /0  /10  /20  /30  /40  /50  /60  /70  /80  /90 /100
> root     idle: cpu3  XXXXXXXXX
> root     idle: cpu1  XXXXXXXX
>          <idle>      XXXXXXXX
> root     idle: cpu0  XXXXXXX
> root     idle: cpu2  XXXXXX
> root     g_eli[2] d  XXX
> root     g_eli[0] d  XXX
> root     g_eli[1] d  X
> root     g_up
> root     dd
>
> Output from vmstat -w 5:
>
>  procs      memory      page                    disks     faults      cpu
>  r b w     avm    fre   flt  re  pi  po    fr  sr ad4 da0   in   sy   cs us sy id
>  0 1 0   38124 3924428  208   0   1   0  9052   0   0   0 1758  451 6354  1 15 84
>  0 1 0   38124 3924428    0   0   0   0 13642   0   0 411 2613  128 9483  0 22 78
>  0 1 0   38124 3924428    0   0   0   0 13649   0   0 411 2614  130 9483  0 22 78
>  0 1 0   38124 3924428    0   0   0   0 13642   0   0 411 2612  128 9477  0 22 78
>  0 1 0   38124 3924428    0   0   0   0 13642   0   0 411 2611  128 9474  0 23 77
>
> Output from iostat -x 5:
>
>                         extended device statistics
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ad4        2.2   0.7    31.6     8.1    0   3.4   1
> da0        0.2 287.8     2.3 36841.5    0   0.4  10
> pass0      0.0   0.0     0.0     0.0    0   0.0   0
>                         extended device statistics
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ad4        0.0   0.0     0.0     0.0    0   0.0   0
> da0        0.0 411.1     0.0 52622.1    0   0.4  15
> pass0      0.0   0.0     0.0     0.0    0   0.0   0
>                         extended device statistics
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ad4        0.0   0.0     0.0     0.0    0   0.0   0
> da0        0.0 411.1     0.0 52616.2    0   0.4  15
> pass0      0.0   0.0     0.0     0.0    0   0.0   0
>
> Looking at these results myself, I cannot see where the bottleneck is. Since
> changing the sector size or the number of geli threads doesn't affect
> performance, I would assume there is some other single-threaded part
> limiting it, but I don't know enough about how it works to say what.
>
> The CPUs in the machine are a pair of these:
> CPU: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz (1603.92-MHz K8-class CPU)
>
> I've also come across some other strange issues with some other machines
> which have identical arrays but only a pair of 32-bit 3.0GHz Xeons in them
> (also RELENG_6 as of yesterday, just i386 rather than amd64).
> On those, geli will launch a single thread by default (cores - 1 seems to
> be the default), yet I cannot force it to launch 2 using the sysctl,
> although on the 4-core machine I can successfully use it to launch 4. It
> would be nice to be able to use both cores on the 32-bit machines for geli,
> but given the results I've shown here I'm not sure it would gain me much at
> the moment.
>
> Another problem I've found is that if I use a sector size for GELI > 8192
> bytes, then I'm unable to newfs the encrypted partition afterwards; it
> fails immediately with this error:
>
> newfs /dev/da0p1.eli
> increasing block size from 16384 to fragment size (65536)
> /dev/da0p1.eli: 62499.9MB (127999872 sectors) block size 65536, fragment
> size 65536
>         using 5 cylinder groups of 14514.56MB, 232233 blks, 58112 inodes.
> newfs: can't read old UFS1 superblock: read error from block device:
> Invalid argument
>
> The underlying device is readable/writable, however, as dd can read/write
> to it without any errors.
>
> If anyone has any suggestions/thoughts on any of these points it would be
> much appreciated; these machines will be performing backups over a 1Gbit
> LAN, so more speed than I can currently get would be preferable.
>
> I sent this to geom@ and meant to CC here, since geom@ seems to be a pretty
> quiet list and it might not get seen there; I forgot the CC, so apologies
> for sending separately here. I'll also add a few extra bits I sent to geom@
> in a follow-up response:
>
> Trying newfs with the -S option to specify a sector size matching the -s
> option given to geli init:
>
> newfs -S 65536 /dev/da0p1.eli
> increasing block size from 16384 to fragment size (65536)
> /dev/da0p1.eli: 62499.9MB (127999872 sectors) block size 65536, fragment
> size 65536
>         using 5 cylinder groups of 14514.56MB, 232233 blks, 58112 inodes.
> newfs: can't read old UFS1 superblock: read error from block device:
> Invalid argument
>
> diskinfo reports the correct sector size for the geli layer and 512 bytes
> for the underlying GPT partition:
>
> diskinfo -v /dev/da0p1
> /dev/da0p1
>         512             # sectorsize
>         65536000000     # mediasize in bytes (61G)
>         128000000       # mediasize in sectors
>         7967            # Cylinders according to firmware.
>         255             # Heads according to firmware.
>         63              # Sectors according to firmware.
>
> diskinfo -v /dev/da0p1.eli
> /dev/da0p1.eli
>         65536           # sectorsize
>         65535934464     # mediasize in bytes (61G)
>         999999          # mediasize in sectors
>         62              # Cylinders according to firmware.
>         255             # Heads according to firmware.
>         63              # Sectors according to firmware.
> Testing on a onetime geli encryption of the underlying raw device, to
> bypass the GPT, shows very similar poor results:
>
> dd if=/dev/da0.eli of=/dev/null bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 29.739186 secs (35259069 bytes/sec)
>
> dd if=/dev/zero of=/dev/da0.eli bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 23.501061 secs (44618241 bytes/sec)
>
> For comparison, the same test done on the unencrypted raw device:
>
> dd if=/dev/da0 of=/dev/null bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 5.802704 secs (180704717 bytes/sec)
>
> dd if=/dev/zero of=/dev/da0 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 4.026869 secs (260394859 bytes/sec)
>
> Looking at 'top -S -s1' whilst doing a long read/write using geli shows a
> geli thread for each core, but there only ever seems to be one in a running
> state at any given time; the others will be in a state of 'geli:w'. This
> would explain why performance is identical with 1 geli thread and with 4
> geli threads.
>
> Regards,
>
> Dominic Bishop
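For anyone wanting to reproduce those numbers, the whole test boils down to a handful of commands. This is only a sketch; the thread count, sector size and device name are simply the ones from the mail above, not recommendations, and geli onetime uses a throwaway random key, so only run it against a device whose contents you are happy to lose:

    # /boot/loader.conf -- kern.geom.eli.threads is a boot-time tunable
    kern.geom.eli.threads=2

    # after reboot: temporary encrypted layer on the raw device (random key,
    # no metadata written), then the same dd tests quoted above
    geli onetime -s 65536 da0
    dd if=/dev/zero of=/dev/da0.eli bs=1m count=1000   # write test
    dd if=/dev/da0.eli of=/dev/null bs=1m count=1000   # read test
    geli detach da0                                    # tear the .eli layer down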
A simple solution is just to add some crypto hardware into the mix to beef things up. Something like a Soekris VPN 1401 would do the trick. See hifn(4) and http://www.soekris.com/vpn1401.htm
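If you go that route, the wiring is roughly as follows. This is only a sketch based on how geli(8) is documented to use the crypto(9) framework; whether the card actually beats the software path at these request sizes is worth benchmarking before buying a stack of them:

    # /boot/loader.conf -- load the Hifn driver at boot (or "kldload hifn" now)
    hifn_load="YES"

    # confirm the card attached and registered with crypto(9)
    dmesg | grep -i hifn

    # geli should pick up the hardware automatically; re-run the same benchmark
    dd if=/dev/zero of=/dev/da0p1.eli bs=1m count=10000

As far as I know no geli-side configuration is needed, since the crypto(9) framework hands sessions to whichever driver supports the algorithm.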