>>>>> "rsk" == Roy Sigurd Karlsbakk <r...@karlsbakk.net> writes:
>>>>> "dm" == David Magda <dma...@ee.ryerson.ca> writes:
>>>>> "tt" == Travis Tabbal <tra...@tabbal.net> writes:
rsk> Disabling ZIL is, according to ZFS best practice, NOT
rsk> recommended.

dm> As mentioned, you do NOT want to run with this in production,
dm> but it is a quick way to check.

REPEAT: I disagree.  Once you associate the disasterizing and dire
warnings on the developers' advice wiki with the specific problems that
disabling the ZIL causes for real sysadmins, rather than with abstract
notions of ``POSIX'' or ``the application'', a lot more people end up
wanting to disable their ZILs.

In fact, most of the SSDs sold seem to rely on exactly the trick that
disabled-ZIL ZFS plays for much of their high performance, if not for
their feasibility within their price bracket, period: provide a
guarantee of write ordering without durability, and many applications
are just, poof, happy.  If the SSD arranges that no writes are
reordered across a SYNC CACHE, but doesn't bother actually providing
durability, end users will go ``OMG, Windows is fast and no
corruption'' --> SSD sales.

The ``do-not-disable-buy-SSD!!!1!'' advice thus translates to ``buy one
of these broken SSDs, and you will be basically happy.  Almost everyone
is.  When you aren't, we can blame the SSD instead of ZFS.''  All that
bottlenecked SATA traffic host<->SSD is just CYA, of no real value
(except for kernel panics).

Now, if someone would make a Battery FOB that gives a broken SSD 60
seconds of power, we could use consumer-crap SSDs in servers again,
with real value instead of CYA value.  The FOB should work like this:

  states:
    RUNNING                    SATA port: pass    power to SSD: on
    POWER-LOST HOLD-DOWN       SATA port: block   power to SSD: on (battery)
    POWER OFF                                     power to SSD: off
    POWER-RESTORED HOLD-DOWN                      power to SSD: off

  transitions:
    RUNNING                  --(input power lost)------>  POWER-LOST HOLD-DOWN
    POWER-LOST HOLD-DOWN     --(60 seconds elapsed)---->  POWER OFF
    POWER-LOST HOLD-DOWN     --(input power restored)-->  POWER-RESTORED HOLD-DOWN
    POWER OFF                --(input power restored)-->  POWER-RESTORED HOLD-DOWN
    POWER-RESTORED HOLD-DOWN --(battery recharged?)---->  RUNNING

The device must know when its battery has gone bad and stick itself in
the ``power-restored hold-down'' state.  Knowing when the battery is
bad may require more states, to test the battery, but this is the
general idea (there is a little code sketch of this state machine
below).

I think it would be much cheaper to build an SSD with a supercap, and
simpler, because you can assume the supercap is good forever instead of
testing it.  However, because of ``market forces'' the FOB approach
might sell for less, because a FOB cannot be tied to a particular SSD
and used as a way to segment the market.  Only if there are two
companies making FOBs and not making SSDs will competition work the way
people want it to.  Otherwise FOBs will be $1000 or something, because
only ``enterprise'' users are smart/dumb enough to demand them.

Normally I would have a problem with the FOB and the SSD being
separable, but see, the FOB and the SSD can be put together with
double-sided tape: the tape only has to hold for 60 seconds after
$event, and there is no way to separate the two by tripping over a
cord.  You can safely move SSD+FOB from one chassis to another without
fearing that all is lost if you jiggle the connection.  I think it's
okay overall.

tt> This risk is mostly mitigated by UPS backup and auto-shutdown
tt> when the UPS detects power loss, correct?

No, no.  It's about cutting off a class of failure cases and
constraining ourselves to relatively sane forms of failure.  We are not
haggling about NO FAILURES EVAR yet.
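(Back to the FOB for a second.)  To make it concrete, here is a minimal
sketch of that state machine in Python.  It is only an illustration:
the state and event names are labels I made up for the boxes and arrows
above, a real FOB would be firmware driving a power FET and a SATA mux,
and actually testing the battery would need more states than shown.

# Minimal, hypothetical sketch of the battery-FOB state machine above.
from enum import Enum, auto

class State(Enum):
    RUNNING = auto()                  # SATA port: pass, power to SSD: on
    POWER_LOST_HOLDDOWN = auto()      # SATA port: block, power to SSD: on (battery)
    POWER_OFF = auto()                # power to SSD: off
    POWER_RESTORED_HOLDDOWN = auto()  # power to SSD: off; wait on the battery

class Event(Enum):
    INPUT_POWER_LOST = auto()
    INPUT_POWER_RESTORED = auto()
    SIXTY_SECONDS_ELAPSED = auto()
    BATTERY_RECHARGED = auto()

# (current state, event) -> next state; events not listed are ignored.
TRANSITIONS = {
    (State.RUNNING, Event.INPUT_POWER_LOST):                  State.POWER_LOST_HOLDDOWN,
    (State.POWER_LOST_HOLDDOWN, Event.SIXTY_SECONDS_ELAPSED): State.POWER_OFF,
    (State.POWER_LOST_HOLDDOWN, Event.INPUT_POWER_RESTORED):  State.POWER_RESTORED_HOLDDOWN,
    (State.POWER_OFF, Event.INPUT_POWER_RESTORED):            State.POWER_RESTORED_HOLDDOWN,
    (State.POWER_RESTORED_HOLDDOWN, Event.BATTERY_RECHARGED): State.RUNNING,
}

def step(state, event, battery_ok=True):
    """Advance one event.  A bad battery pins the FOB in
    POWER_RESTORED_HOLDDOWN so the SSD never comes back online."""
    nxt = TRANSITIONS.get((state, event), state)
    if nxt is State.RUNNING and not battery_ok:
        return State.POWER_RESTORED_HOLDDOWN
    return nxt

if __name__ == "__main__":
    s = State.RUNNING
    for ev in (Event.INPUT_POWER_LOST, Event.SIXTY_SECONDS_ELAPSED,
               Event.INPUT_POWER_RESTORED, Event.BATTERY_RECHARGED):
        s = step(s, ev)
        print(ev.name, "->", s.name)

Anyway, back to the UPS question.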
First, for STEP 1, we isolate the insane kinds of failure: the ones
that cost us days or months of data rather than just a few seconds, the
kinds that call for crazy, unplannable, ad-hoc recovery methods like
``Viktor plz help me'' and ``is anyone here a Postgres data recovery
expert?'' and ``is there a way I can invalidate the batch of billing
auth requests I uploaded yesterday so I can rerun it without
double-billing anyone?''  For STEP 1 we make the insane failures almost
impossible through clever software and planning.  A UPS never, never,
ever qualifies as ``almost impossible''.

Then, once that's done, we come back for STEP 2, where we try to
minimize the sane failures as well, and for STEP 2 things like a UPS
might be useful.  For STEP 2 it makes sense to talk about percent
availability, probability of failure, length of time to recover from
Scenario X.  But in STEP 1 all the failures are insane ones, so you
cannot measure any of those things.

A UPS is not about how ``paranoid'' you are or how far you want to take
STEP 1.  You take STEP 1 all the way to completion before worrying
about STEP 2.

For NFS, the STEP 1 risk on the table is ``server reboots, client does
not.''  It is okay if both reboot at once.  It is okay if neither
reboots.  But if you disable the ZIL OR have a broken SSD like the X25,
AND the NFS server reboots while the client doesn't, then you have a
STEP 1 insane failure case that can corrupt database files or virtual
disk images on the NFS clients (there is a toy illustration of this at
the bottom of this mail).

For example, if you fail to complete STEP 1, and then you plug the NFS
clients into a more expensive UPS with proper transfer switches for
maintenance and A/B power, and the server into a rather ordinary UPS,
then you will be at greater risk of this particular NFS problem than if
you used no UPS at all.  That's not intuitive!  But it's true!  This
comes from putting STEP 2 before STEP 1.  You must do them in order if
you want to stay sane.

If you do not care about this NFS problem (or the others), then maybe
you can just disable the ZIL.  It is a matter of working through
STEP 1.  Working through STEP 1 might end in ``doesn't affect us.
Disable the ZIL.''  Or it might end in ``get a slog with a supercap.''
STEP 1 will never end in ``plug in an OCZ Vertex cheapo slog that
ignores cache flushes'' if you are doing it right.  And STEP 2 has
nothing to do with anything until we finish STEP 1 and the insane
failure cases.
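Here is that toy illustration of the ``server reboots, client does
not'' case, assuming a made-up in-memory server whose ``stable'' writes
are acknowledged from RAM (ZIL disabled, or a slog that ignores cache
flushes).  None of the names below come from real NFS or ZFS code; it
is only meant to show why the surviving state can be one the
application never wrote.

# Toy model, not real NFS: a server that acks "stable" writes before they
# are durable, and a client that keeps running across the server's reboot.

class FakeServer:
    def __init__(self):
        self.disk = {}   # what survives a reboot
        self.ram = {}    # acknowledged but not yet durable

    def stable_write(self, key, value):
        # With the ZIL disabled (or a lying slog), the ack returns before
        # the data is actually on stable storage.
        self.ram[key] = value
        return "OK"                      # client is told the write is stable

    def flush_to_disk(self):
        self.disk.update(self.ram)
        self.ram.clear()

    def crash_and_reboot(self):
        self.ram.clear()                 # acked-but-unflushed data is gone

server = FakeServer()

# Client writes record A, is told it is stable, then writes record B that
# only makes sense if A exists (think: database page plus its index entry).
assert server.stable_write("record_A", "payload A") == "OK"
server.crash_and_reboot()                # server reboots; the client does NOT
assert server.stable_write("record_B", "refers to record_A") == "OK"
server.flush_to_disk()

# The surviving state contains B but not the A it depends on: a state the
# application never produced, i.e. a corrupt database file or disk image.
print(server.disk)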