> From: Adam Levin [mailto:levi...@gmail.com]
>
> I'm not sure I understand exactly what you're doing. Are you using RDMs
> and giving each VM a direct LUN to the storage system, or are you presenting
> datastores via iSCSI? Are you saying you're presenting one datastore per
> VM?
Yeah, iSCSI, one datastore per VM. There's no requirement to have separate
datastores per VM; it's just nice to have each VM independent of the others,
so you can snapshot, roll back, or destroy a VM without affecting any other.

> Managing RDMs for 2500 VMs is simply impractical, and there's a limit to the
> number of datastores VMWare supports anyway.

When it's automated, there's no work impact on me that would make it
impractical. I don't know how many datastores VMware supports. Thanks for
mentioning it. Looks like:

    Virtual disks per datastore cluster    9000
    Datastores per datastore cluster       64
    Datastore clusters per vCenter         256

http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf

> As for the filesystems, it's true that most filesystems today can survive a
> hard reboot, but the applications may or may not.

Dunno what filesystems or applications you support, but these aren't concerns
for the *filesystems* ext3/4, btrfs, ntfs, xfs, zfs, hfs+... which is every
filesystem I can think of in current usage anywhere I've ever worked.

As for applications: most applications other than databases have no problems.
(It depends on what you're supporting. Transactional credit card processing,
for example, and probably some other applications, should be handled with
care. That hasn't been a concern for me.)

For databases, you make sure it's ACID compliant, and then it's not a problem.
Plus, you're backing up databases by other means in addition to OS snapshots,
right? So there are multiple safety nets.

SQLite is ACID compliant.
MySQL is ACID compliant when using InnoDB. Prior to 5.5, InnoDB was not the
default (but you could choose it if you wanted). As of MySQL 5.5, InnoDB is
the default.
Postgres is ACID compliant.
I haven't checked into MS SQL or Oracle. I would be very shocked if they
weren't.
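To make the ACID point concrete, here is a minimal sketch using Python's
built-in sqlite3 module. The table name "payments" and the helper
surviving_rows() are made up for illustration. Dropping the connection without
committing only stands in for a crash; it is not a real power-loss test, just a
demonstration of the atomicity contract the database promises.

```python
import os
import sqlite3
import tempfile


def surviving_rows(commit_second: bool) -> list:
    """Write one committed row and one more row, then drop the connection.

    Returns the rows visible after reopening the database file.
    """
    # Hypothetical scratch database in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, note TEXT)")
    conn.execute("INSERT INTO payments (note) VALUES ('committed')")
    conn.commit()  # this row is durable from here on

    # Second write: sqlite3 opens a transaction implicitly before the INSERT.
    conn.execute("INSERT INTO payments (note) VALUES ('in flight')")
    if commit_second:
        conn.commit()
    # Closing with an open transaction discards it (stand-in for a crash).
    conn.close()

    conn = sqlite3.connect(path)
    rows = [r[0] for r in conn.execute("SELECT note FROM payments ORDER BY id")]
    conn.close()
    return rows


if __name__ == "__main__":
    print(surviving_rows(commit_second=False))
    print(surviving_rows(commit_second=True))
```

The interrupted transaction simply isn't there after reopening, and the
committed row is untouched. That is the property a crash-consistent snapshot
relies on.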
> Crash consistency provides for the OS to come back, do some
> filesystem repairs, and hopefully most of your data is intact,

I would describe it differently: After a hard crash, a journaling or
intent-logging filesystem can instantly, without any effort, detect which
write operations (if any) were interrupted, and then either back each one out
as if it never happened or complete it as if it had never been interrupted.
That guarantees the filesystem is always in a consistent state, meaning a
state the filesystem actually passed through during normal operation before
it was suddenly interrupted.

After reading the rest of your post, it's clear that the difference between
your ideas and mine boils down to this: I think the filesystems and
applications can survive a crash. You don't believe that, but you do believe
that quiescing addresses the problem. I disagree on both points.

Quiescing is just flushing buffers. But the filesystem itself, and any ACID
compliant database, already knows which writes need to be flushed to ensure
consistency. They are already flushing those particular writes, just in case
of power loss or kernel crash, while allowing the other writes to sit in write
buffers. They aren't counting on *you* to trigger a quiesce before a crash.
They assume a crash could occur at any time, without any warning.

If you snapshot the storage without flushing the buffers, yes, there will be
data in memory that wasn't included in the snapshot, but no, there won't be an
inconsistent filesystem or unusable database after recovery.

Yes, it's true that the 15 seconds of in-memory buffered writes will be
excluded from the snapshot, but it's also true that the 3 hours and 59 minutes
of writes that occur before your next snapshot will be excluded from the
current snapshot too. Suddenly the 15 seconds of buffered writes you might
save by flushing buffers prior to the snapshot become less relevant.
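The "back it out or complete it" behavior can be sketched with a toy intent
log. This is only an illustration of the idea, not how ext4, xfs, or zfs
actually implement it. The record layout and the recover() function are
invented for the example.

```python
def recover(disk: dict, journal: list) -> dict:
    """Replay a toy intent log after a simulated crash.

    Journal records (hypothetical layout):
        ("begin", txid)
        ("write", txid, key, value)
        ("commit", txid)
    A transaction with a commit record is completed; one without a commit
    record is backed out, i.e. its writes are ignored.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    state = dict(disk)  # leave the caller's copy untouched
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    return state


if __name__ == "__main__":
    disk = {"a": 1}
    journal = [
        ("begin", 1),
        ("write", 1, "a", 2),
        ("write", 1, "b", 3),
        ("commit", 1),
        ("begin", 2),
        ("write", 2, "a", 99),
        # crash here: transaction 2 never committed
    ]
    print(recover(disk, journal))  # {'a': 2, 'b': 3}
```

Transaction 1 is completed and transaction 2 vanishes, so the recovered state
is one the system could have been in during normal operation. No quiesce was
needed to get there.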
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/