A very interesting thread (http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/) and some thinking about the design of SSDs led to an experiment I did with the Intel X25-M SSD. The question was:
Is my data safe once it has reached the disk and has been reported as committed to my application?

All transactional safety in ZFS requires the correct implementation of the synchronize cache command (see http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg27264.html, where someone used OpenSolaris within VirtualBox, which by default ignores the cache flush command). Thus qualified hardware is VERY essential (also see http://www.snia.org/events/storage-developer2009/presentations/monday/JeffBonwick_zfs-What_Next-SDC09.pdf).

What I did (for an Intel X25-M G2 (default settings = write cache on) and a Seagate SATA drive (ST3500418AS)):

a) Create a pool
b) Create a program that opens a file synchronously and writes to the file. It also prints the latest record written successfully.
c) Pull the power of the SATA disk
d) Power cycle everything
e) Open the pool again and verify that the content of the file is the one that was reported to the application as written
   e1) if it is the same - nice hardware
   e2) if it is NOT the same - BAD hardware

What I found out was:

Intel X25-M G2:
- If I pull the power cable, much data is lost although it was committed to the app (a few hundred writes; 244 in the run logged below)
- If I pull the SATA cable, no data is lost

ST3500418AS:
- If I pull the power cable, almost no data is lost, but the last write is still lost (strange!)
- If I pull the SATA cable, no data is lost

Actually this result was partially expected. However, the one missing transaction on my SATA HDD (Seagate) is strange. Unfortunately I do not have "enterprise SAS hardware" handy to verify that my test procedure is correct. Maybe someone can run this test on a SAS test machine?
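Step e) can be done by hand with cat, but the comparison is easy to automate. The following is only a minimal sketch, not part of the original attachment: the file name verify.pl and the helpers round_of and lost_rounds are hypothetical, and the record format is assumed to be the "This is round number <n>" string that the test script writes.

```perl
#!/usr/bin/env perl
# verify.pl - hypothetical helper, not part of the original attachment.
use strict;
use warnings;

# Extract the round number from a record such as
# "This is round number                67246".
# Returns undef if the record does not have the assumed format.
sub round_of {
    my ($record) = @_;
    return $record =~ /^This is round number\s+(\d+)\s*$/ ? $1 : undef;
}

# Number of acknowledged rounds that did not survive on disk.
# Zero or negative means no acknowledged write was lost.
sub lost_rounds {
    my ($acked, $on_disk) = @_;
    return $acked - $on_disk;
}

# Usage (assumed): verify.pl <last acknowledged round> <file>
if (@ARGV == 2) {
    my ($acked, $file) = @ARGV;
    open(my $fh, '<', $file) or die "ERROR Opening file $file: $!\n";
    my $record = <$fh>;
    close($fh);

    my $on_disk = round_of(defined $record ? $record : '');
    die "ERROR: unexpected content in $file\n" unless defined $on_disk;

    my $lost = lost_rounds($acked, $on_disk);
    print $lost > 0
        ? "BAD: $lost acknowledged round(s) missing\n"
        : "OK: no acknowledged round is missing\n";
}
```

Called as "verify.pl 67490 /volumes/ssd/testfile" after the reboot, it would report how many acknowledged rounds are missing from the file.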
(See the attached test script below.)

--- Attachments ---

--- script (call it with script.pl --file /mypool/testfile) ---

#!/usr/bin/env perl
use strict;
use warnings;

# Fcntl provides O_SYNC and the other open flags (:DEFAULT) and SEEK_SET (:seek)
use Fcntl qw(:DEFAULT :seek);
use Getopt::Long;

my $pool      = "disk";
my $mountroot = "/volumes";
my $file;
my $abort = 0;
my $count = 0;

GetOptions(
    "pool=s"          => \$pool,
    "testfile|file=s" => \$file,
    "count=i"         => \$count,
);

# Derive the default path after option parsing, so that --pool takes effect
$file = "$mountroot/$pool/testfile" unless defined $file;

my $dir = $file;
$dir =~ s/[^\/]+$//;

if (-e $file) {
    print "ERROR: File $file already exists\n";
    exit 1;
}
if (! -d $dir) {
    print "ERROR: Directory $dir does not exist\n";
    exit 1;
}

# O_SYNC: each write returns only after the data is reported to be on stable storage
sysopen(my $fh, $file, O_RDWR | O_CREAT | O_EXCL | O_SYNC)
    or die "ERROR Opening file $file: $!\n";

# Ctrl-C requests a clean stop after the current write
$SIG{INT} = sub {
    print " ... signalling Abort ... (file: $file)\n";
    $abort = 1;
};

$| = 1;    # unbuffered STDOUT, so every round shows up on the console immediately

my $lastok;
my $i = 0;

while (!$abort) {
    $i++;
    last if $count && $i > $count;

    my $msg = sprintf("This is round number %20s", $i);

    # Always overwrite the same record at offset 0
    sysseek($fh, 0, SEEK_SET);
    print $msg;
    my $rc = syswrite($fh, $msg);
    if (!defined $rc) {
        print "ERROR\n";
        print "ERROR While writing $msg\n";
        print "ERROR: $!\n";
        last;
    }
    print " DONE \n";
    $lastok = $msg;
}
close($fh);

$lastok = "(none)" unless defined $lastok;
print "\nTHE LAST MESSAGE WRITTEN to file $file was:\n\n\t\"$lastok\"\n\n";

---- Here are the logs of my tests ----

1) Test the SATA SSD (Intel X25-M)
----------------------------------

.. start write.pl

This is round number 67482
This is round number 67483
This is round number 67484
This is round number 67485
This is round number 67486
This is round number 67487
This is round number 67488
This is round number 67489
This is round number 67490

( .. I pull the POWER CABLE of the SATA SSD .. )

.. I/O hangs
.. zpool status shows

zpool status -v
  pool: ssd
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-JQ
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         UNAVAIL      0    11     0  insufficient replicas
          c3t5d0    UNAVAIL      3     2     0  cannot open

errors: Permanent errors have been detected in the following files:

        ssd:<0x0>
        /volumes/ssd/
        /volumes/ssd/testfile

... now I power cycled the machine and put back the power cable ...
... let's see the pool status

  pool: ssd
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0

errors: No known data errors

.... let's look at the file content ...
.. remember, the last reported successful transaction was "67490"

r...@nexenta:/volumes/ssd# cat testfile
This is round number 67246
r...@nexenta:/volumes/ssd#

... Oops, 244 transactions missing - bummer ...

.. OK, repeat the test, pulling only the SATA cable!!! (thus the device, still powered, has time to write out the changes)

This is round number 39451
This is round number 39452
This is round number 39453
This is round number 39454
This is round number 39455
This is round number 39456
This is round number 39457
This is round number 39458
This is round number 39459

.. hangs .. reboot

  pool: ssd
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0

... cat ssd/testfile (last committed = 39459)

This is round number 39459

.. this is OK

2) Test the SATA HDD (Seagate ST3500418AS)
------------------------------------------

..... same test with an HDD ...

This is round number 3548
This is round number 3549
This is round number 3550
This is round number 3551
This is round number 3552
This is round number 3553
This is round number 3554
This is round number 3555
This is round number 3556
This is round number 3557
This is round number 3558

.. hangs

  pool: disk
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-JQ
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        disk        UNAVAIL      0    10     0  insufficient replicas
          c3t5d0    UNAVAIL      3     2     0  cannot open

.. reboot .. check file (last committed = 3558)

n...@nexenta:/disk$ cat testfile
This is round number 3557

.. Again one transaction missing, strange, test again ...

.. Again (Disk) ...

This is round number 1689 DONE
This is round number 1690 DONE
This is round number 1691 DONE
This is round number 1692 DONE
This is round number 1693 DONE
This is round number 1694 DONE
This is round number 1695

.. pull power cable .. reboot .. check

n...@nexenta:/$ cat disk/testfile
This is round number 1693

... again just one missing (the last round reported DONE was 1694)

.. test the SATA cable pull ....

This is round number 1269 DONE
This is round number 1270 DONE
This is round number 1271 DONE
This is round number 1272 DONE
This is round number 1273 DONE
This is round number 1274 DONE
This is round number 1275 DONE
This is round number 1276 DONE
This is round number 1277

.. pull SATA cable (not power)

n...@nexenta:/$ cat disk/testfile
This is round number 1276

.. this is OK

--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss