>>>>> "cg" == Christopher George <cgeo...@ddrdrive.com> writes:
cg> I agree, it would be very informative if RAID HBA vendors
cg> would publish failure statistics of their Li-Ion based BBU
cg> products.

If they haven't, then on what are you basing your decision *not* to use one? Just the random thought that they might fail?

cg> inflexible proprietary nature of Li-Ion

On sparkfun.com you can get complete systems, with charging microcontroller and battery, without any undue encumbrances that I can detect. What does ``proprietary'' mean in this context?

cg> the ignition risk, thermal wear-out, and the inflexible
cg> proprietary nature of Li-Ion solutions simply outweigh the
cg> benefits of internal or all inclusive mounting for enterprise
cg> bound NVRAM.

Well...for *HOME* use, based on the failure modes I've observed, I'd prefer to keep the battery next to the SDRAM like ACARD and LSI do. For the enterprise, someone should warn netapp/hitachi/emc/storagetek, who are presumably Li-Ion based NVRAM users.

One thing on which I can agree: if the vendor has used Li-Ion, it's hard to tell if the implementation is proper, e.g. whether it will warn of an aged battery without enough capacity. For slog, IMHO the ideal behavior would be:

 1. weekly test-flushes to CF or USB stick or whatever is the NAND backing-store

 2. the device should shut itself off, as if the SATA cable were pulled, or in some other way ZFS detects instantly, if the battery's not got capacity left after the test flush completes. One way would be to require *two* consecutive successful test flushes each week.

 3. there should be a button you can press to simulate the battery-failure-powerdown behavior, so you can test that ZFS and your controller respond properly.

 4. ``redundant'' power should mean the device has (1) power from host, and (2) enough stored energy in the battery to do two consecutive flushes. Whenever the device does not have ``redundant'' power, it should:

    a. disable itself as in (3)

    b. flush SDRAM to NAND.
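To make points 1 to 4 above concrete, here is a toy sketch of that policy as a small state model. It is only an illustration under my own assumptions: the class, field, and method names are all hypothetical, battery capacity is simplified to "how many full flushes it can fund", and nothing here reflects the firmware of any real card.

```python
from dataclasses import dataclass

FLUSHES_REQUIRED = 2  # point 4: stored energy for two consecutive flushes


@dataclass
class SlogDevice:
    """Toy model of points 1-4; every name here is hypothetical."""
    host_power: bool = True
    battery_flushes: int = 2    # full SDRAM-to-NAND flushes the battery can fund
    test_button: bool = False   # point 3: simulated battery failure
    online: bool = True         # False looks like a pulled cable to the host
    dirty: bool = False         # SDRAM holds data not yet copied to NAND

    def redundant_power(self) -> bool:
        """Point 4: host power plus enough battery for two flushes."""
        return (self.host_power
                and self.battery_flushes >= FLUSHES_REQUIRED
                and not self.test_button)

    def update(self) -> None:
        """Apply point 4 after any change in power, battery, or button state."""
        if self.redundant_power():
            self.online = True              # hotplug back once power is redundant
            return
        self.online = False                 # (a) disable, as in point 3
        if self.dirty and (self.host_power or self.battery_flushes > 0):
            if not self.host_power:
                self.battery_flushes -= 1   # this flush is paid for by the battery
            self.dirty = False              # (b) SDRAM flushed to NAND

    def write(self) -> bool:
        """Accept a log write only while the device is online."""
        if self.online:
            self.dirty = True
        return self.online

    def weekly_test(self, flushes_completed: int) -> None:
        """Points 1-2: record how many consecutive test flushes succeeded."""
        self.battery_flushes = flushes_completed
        self.update()
```

For example, calling `weekly_test(1)` on a device with an aged battery takes it offline and flushes it even though host power is fine, and a later `weekly_test(2)` brings it back.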
This means, if the device's battery is exhausted, the system may boot with the device disconnected. The host will have to support hotplug so the slog can come back after the battery charges. So, (2) is really a special case of (4).

And AIUI Li-Ion will last longer if you don't charge it to 100%. Laptops usually want 100% because they compete on mAh/kg at initial purchase, but for this application charging to 70% should be fine, which from what I've heard will make them last a lot longer before crystallizing.

cg> can detect not only a disconnect but any loss of power. In
cg> all cases, the card throws an interrupt so that the device
cg> driver (and ultimately user space) can be immediately
cg> notified.

We need to look at the overall system, though. Does a ZFS system using the card disable the slog when this happens? Or does it just print a warning in dmesg and do nothing? When you're using an LSI BBU, the disks behind the controller have their write cache disabled. So, if you evil-tune ZFS to skip issuing SYNC CACHE, but then the BBU dies and the controller cache becomes write-through, the overall system is still safe (albeit slow).

Also, what you describe still doesn't seem to detect the failure case you brought up yourself, of a worn-out battery. UPSes do test their batteries, but ones with worn-out batteries enter bypass mode; they don't turn themselves off, which seems to be the only way your card would have to hear a warning.

cg> attaching/detaching the external power cable has no effect on
cg> data integrity as long as the host is powered on.

In other words, as long as you don't trip over both cables at once. :(

Does the device partially obey my (4) and immediately flush to NAND when the host is powered off? Or does it keep the data in SDRAM only for as long as possible, until told to do otherwise by ``the user'' or something?
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss