> From: Adam Levin [mailto:levi...@gmail.com]
> 
> I'm not sure I understand exactly what you're doing.  Are you using RDMs
> and giving each VM a direct LUN to the storage system, or are you presenting
> datastores via iSCSI?  Are you saying you're presenting one datastore per
> VM?

Yeah, iSCSI, one datastore per VM. There's no requirement to have separate
datastores per VM - it's just nice to have each VM independent of the
others, so you can snapshot/rollback/destroy VMs without affecting the
rest.


> Managing RDMs for 2500 VMs is simply impractical, and there's a limit to the
> number of datastores VMWare supports anyway.

When it's automated, there's no workload on me that would make it
impractical. I don't know how many datastores VMware supports - thanks for
mentioning it. Looks like:

Virtual disks per datastore cluster:  9000
Datastores per datastore cluster:     64
Datastore clusters per vCenter:       256
http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf


> As for the filesystems, it's true that most filesystems today can survive a
> hard reboot, but the applications may or may not.

Dunno what filesystems or applications you support, but these aren't concerns
for the *filesystems* ext3/4, btrfs, NTFS, XFS, ZFS, HFS+... which are all the
filesystems I can think of in current use anywhere I've ever worked.

As for applications - most applications other than databases have no problems.
(Depends on what applications you're supporting - transactional credit card
processing, for example, and probably some other applications, should be
handled with care. Haven't been a concern to me.) For databases, you make sure
the database is ACID compliant, and then it's not a problem. Plus, you're
backing up databases by other means in addition to OS snapshots, right? So
there are multiple safety nets.

SQLite is ACID compliant.
MySQL is ACID compliant when using InnoDB. Prior to 5.5, InnoDB was not the
default (but you could choose it if you wanted). As of MySQL 5.5, InnoDB is
the default.
Postgres is ACID compliant.

I haven't checked into MS SQL or Oracle. I would be very shocked if they 
weren't.
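
To make the "ACID compliant, then it's not a problem" point concrete, here is
a minimal sketch using Python's built-in sqlite3 module. The table name
`ledger` and the function `demo_atomicity` are made up for illustration, and
abandoning a connection is only a stand-in for a real crash, not a pulled
power cord. It shows the atomicity part: work that never reached commit is
simply absent when the database is reopened.

```python
import os
import sqlite3
import tempfile

def demo_atomicity():
    """Commit one row, abandon a second transaction, and report what survives."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway database file

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, note TEXT)")
    conn.execute("INSERT INTO ledger (note) VALUES ('committed')")
    conn.commit()  # this row is now durable

    # Second transaction: written but never committed (our stand-in for a crash).
    conn.execute("INSERT INTO ledger (note) VALUES ('in flight')")
    conn.close()  # closing without commit discards the open transaction

    # "After the crash": reopen and see which rows exist.
    conn = sqlite3.connect(path)
    rows = [r[0] for r in conn.execute("SELECT note FROM ledger ORDER BY id")]
    conn.close()
    return rows

print(demo_atomicity())  # only the committed row is listed
```

The database comes back in a state it actually passed through (one committed
row), not in a half-written one.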


> Crash consistency provides for the OS to come back, do some
> filesystem repairs, and hopefully most of your data is intact, 

I would describe it differently: After a hard crash, a journaling or intent
logging filesystem is able to quickly detect (without any effort on your part)
which, if any, write operations had been interrupted, and then either back
each one out as if it never existed, or complete it as if it were never
interrupted. That guarantees the filesystem is always in a consistent state -
meaning a state the filesystem actually passed through during normal
operation, before it got interrupted.
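
A toy model of that recovery step, as a sketch only: the record layout
("begin" / "write" / "commit" tuples) and the function name `recover` are
invented here, and real filesystems (ext4's journal, ZFS's intent log) are far
more involved. The idea it illustrates is the same, though: replay what was
committed, back out what was not.

```python
def recover(journal):
    """Replay a toy intent log, applying only transactions that reached 'commit'.

    Returns (state, interrupted) where `interrupted` lists the transaction ids
    that were still open when the log ends, i.e. when the "crash" happened.
    """
    state = {}
    pending = {}  # txid -> list of (key, value) writes not yet committed
    for record in journal:
        kind, txid = record[0], record[1]
        if kind == "begin":
            pending[txid] = []
        elif kind == "write":
            _, _, key, value = record
            pending[txid].append((key, value))
        elif kind == "commit":
            # The whole transaction is applied at once, never partially.
            for key, value in pending.pop(txid):
                state[key] = value
    # Anything still pending was interrupted: it is backed out by ignoring it.
    return state, sorted(pending)

log = [
    ("begin", 1), ("write", 1, "a", "old"), ("commit", 1),
    ("begin", 2), ("write", 2, "a", "new"), ("write", 2, "b", "x"),
    # crash here: transaction 2 never committed
]
print(recover(log))
```

Transaction 2 leaves no trace, so the recovered state is exactly the state
that existed right after transaction 1.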

After reading the rest of your post, it's clear that the difference between
your view and mine boils down to this: I think the filesystems and
applications can survive a crash. You don't believe that - but you also
believe that quiescing addresses that problem - I disagree on both points.
Quiescing is just flushing buffers. But the filesystem itself, and any ACID
compliant database, already knows which writes need to be flushed to ensure
consistency - and they are already flushing those particular writes, just in
case of power loss or kernel crash, while allowing the other writes to sit in
write buffers. They aren't counting on *you* to trigger a quiesce before a
crash. They assume a crash could occur at any time, without any warning.
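
Roughly what "already flushing those particular writes" looks like from an
application's side, as a sketch: the function names `durable_append` and
`buffered_append` are hypothetical, and this assumes the storage stack honors
the flush request. `os.fsync` is the standard call for it.

```python
import os

def durable_append(path, record):
    """Append bytes, returning only after the OS has been asked to persist them."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push its cache to the device

def buffered_append(path, record):
    """Append bytes and leave it to the OS to write them out whenever it likes."""
    with open(path, "ab") as f:
        f.write(record)
```

A database would use something like the first form for its journal or
write-ahead log and the second for everything that can be rebuilt from that
log, which is why it doesn't need an external quiesce to stay consistent.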

If you snapshot the storage without flushing the buffers, yes, there will be
data in memory that wasn't included in the snapshot, but no, there won't be an
inconsistent filesystem or unusable database after recovery. Yes, it's true
that the 15 seconds of in-memory buffered writes will be excluded from the
snapshot, but it's also true that the 3 hours and 59 minutes of writes that
will occur before your next snapshot will also be excluded from the current
snapshot. Suddenly the 15 seconds of buffered writes you might save by
flushing buffers prior to snapshot becomes less relevant.
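
Putting numbers on that, assuming a 4-hour snapshot interval and the 15
seconds of buffered writes used above (the function name is just for
illustration):

```python
def buffered_fraction(buffer_seconds, snapshot_interval_seconds):
    """Share of one snapshot interval that lives only in write buffers."""
    return buffer_seconds / snapshot_interval_seconds

# 15 seconds of buffered writes against a 4-hour snapshot interval.
print(f"{buffered_fraction(15, 4 * 3600):.2%}")  # prints 0.10%
```

So flushing before the snapshot recovers about a tenth of a percent of the
window you were going to lose anyway.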
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
