On Thu, 11 Sep 2003 14:03:17 -0400, Theodore Ts'o <[EMAIL PROTECTED]> wrote in message <[EMAIL PROTECTED]>:
> On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:
> > ..I still believe in raid-1, but, ext3fs???
> >
> > ..how do xfs, jfs and Reiserfs compare?
>
> If you have random disk corruptions happening as often as you are, no filesystem is going to be able to help you. The only question is how quickly the filesystem notices *before* user data starts getting irrecoverably lost. Ext3 generally tends to be one of the more paranoid filesystems about checking assertions and "should never happen" cases, although I don't know how it compares to reiserfs, jfs, et al.

..ok, how about ext3 versus ext2 on raid-1?

> > > Unless you're talking about *software* RAID-1 under Linux, and the
> >
> > ..bingo, I should have said so.
> >
> > > fact that you have to rebuild the mirror after an unclean shutdown, but that's arguably a defect in the software RAID-1 implementation. On other systems, such as AIX's software RAID-1, the RAID-1 is implemented with a journal,
> >
> > ..but software RAID-1 under Linux is not, or did I miss something here?
>
> No, software RAID-1 does not do journalling at the RAID level. That means that in the case of an unclean shutdown, the RAID system will need to re-establish the mirror.

..and after a journal death and fsck, the raid set will be able to re-establish itself, no? Or does the journal cover both/all disks in a raid set?

> As I said, this is a performance issue, since half the disk bandwidth of the RAID array will be diverted to re-establishing the mirror after the unclean shutdown. Note also that this is true *regardless* of what filesystem you use, journaling or non-journaling.

..noted, non-issue in my case.

> > ..ok, for my throttle boxes, here is where I should honk the horn, divert logging to a log server and schedule a fsck? (And of course just reboot my mailservers on the same error.)
>
> For your throttle boxes, do you need to have any writes to your filesystems at all?
> If what you care about is zero downtime, why not just run syslog over the network, and keep all of your filesystems mounted read-only? Some extreme configurations I've seen (especially where ISPs don't have direct/easy access to their systems at remote POPs) use a read-only flash filesystem, a ramdisk for /tmp, and no spinning disks at all. This significantly reduces outages caused by disk failures, since the hard drive is often the most vulnerable part of the system, especially in the face of heat, vibration, etc.

..sounds like an idea. The main point against it is geography; I like to arrive at stand-alone one-box solutions, but networked logging is a good way to verify the network status. What is used, ssh tunnels?

> > ..IMHO the debian bootstrap should first read the rpm database and generate a deb database, and then do 'apt-get update && apt-get dist-upgrade'. _Is_ there such a bootstrap beast?
>
> While this would be interesting for those people who are converting from Red Hat to Debian, it's a lot more complicated than that, since you also have to convert over the configuration files; Red Hat and Debian don't necessarily store files in the same location.

..I know. ;-)

> I generally find that for production systems, it's much safer and simpler to install Debian on a new disk (and on a new system), and then copy the configuration files over. That way, you can test the system and make sure everything is A-OK before cutting over something on a production system.

..yeah, my pipe dream. ;-)

> (By the way, it seems like 50% of your problems are that you're doing things on the cheap, and yet you still want 100% reliability. If you want "carrier-grade reliability", you need to pay a little bit extra, and do things like have hot spares, and installation scripts that allow you to create and configure new servers automatically, without needing manual handwork.)
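Ted's suggestion of running syslog over the network can be sketched from the sending side with Python's standard `logging.handlers.SysLogHandler`, which by default emits classic syslog datagrams over UDP to port 514. This is a minimal sketch, not a description of how anyone in the thread set things up; the `make_remote_logger` helper, the `throttle1` tag and the log host name are hypothetical. Plain UDP syslog is neither encrypted nor authenticated, which is presumably why Arnt asks about ssh tunnels (note that ssh port forwarding carries TCP, not UDP).

```python
import logging
import logging.handlers


def make_remote_logger(tag: str, loghost: str, port: int = 514) -> logging.Logger:
    """Return a logger whose records go only to a remote syslog daemon.

    Hypothetical helper: `tag` and `loghost` are placeholders for whatever
    names you use. SysLogHandler defaults to UDP, so nothing is written to
    the local disk; delivery is fire-and-forget.
    """
    handler = logging.handlers.SysLogHandler(
        address=(loghost, port),
        facility=logging.handlers.SysLogHandler.LOG_DAEMON,
    )
    # The handler prepends the <PRI> field; we supply the "tag: message" body.
    handler.setFormatter(logging.Formatter(tag + ": %(message)s"))

    logger = logging.getLogger(tag)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep records away from any local root handlers
    return logger
```

Usage would look like `make_remote_logger("throttle1", "loghost.example.net").warning("disk write error")`. On the receiving side, the traditional sysklogd daemon only accepts network messages when started with `-r`.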
..hey, the ISP shop is not mine, and it _is_ a small operation, so I need to grow it so I can charge'em. ;-) These guys are Wintendo converts, and I do the hard stuff for 'em. ;-)

> > ..256MB, but the disks may be marginal; on the known bad disks I get write errors. I have seen this same error on power "blinks", failures lasting about 1/3 of a second without losing monitor sync etc. on my desktops, once frying a power supply, but usually these "blinks" cause no harm.
>
> Sounds like you have marginal power. Do you have a UPS (preferably a continuous UPS) to protect your systems? If not, why not? (Again, it's a bad idea to expect "carrier-grade reliability" when you're not willing to pay for the basic high-quality equipment, backup equipment, and devices such as UPSes to protect your equipment.)

..2 different sites: I have marginal power in my lab, but the ISP gear is on a UPS, and that again is on a priority grid feed.

..will be producing my own power on this; geek code suggestions? http://crest.org/discussion/gasification/199903/msg00055.html ;-)

> > ..ah. So with a 30GB /var ext3fs raid-1 I would have 25% or 13% consumed by backup copies of the superblock and block group descriptors?
>
> It's an order n**2 problem, so it's not a linear relationship. And most people get annoyed by that kind of overhead long before it gets to 10% or above.

..so I'm tolerant. ;-)

> > ..how does the journalling system choose which blocks to work from? From what I've been able to see, the journal dies when the superblocks go bad?
>
> The filesystem needs the superblock in order to find the journal. If you have a single gigantic filesystem mounted on /, then if the primary superblock is corrupted, the kernel will not be able to mount /, and you're hosed. E2fsck will automatically try the primary superblock, and if that is corrupt, it will try the first backup superblock.
> If that is corrupted as well, a human will need to manually try one of the other backup superblocks.

..can this be tuned to try more blocks before whining for manpower?

> If your primary superblock is getting corrupted often, then first of all, you should try to figure out why this is happening, and take affirmative action to prevent it. (The fact that you're reporting marginal power is supremely suspicious; marginal power can cause disk corruption very easily. Getting higher-quality power supplies will help, but a UPS is the first thing I would get.)

..yeah, I'm working on the power bit. ;-)

> Secondly, you're better off using a small root filesystem that generally isn't modified often. What I normally do is use a 128 meg root filesystem, with a separate /var partition (or /var symlinked to /usr/var), and /tmp as a ram disk. With the root filesystem rarely changing, it's much less likely that it will be corrupted due to hardware problems. Then the root filesystem can come up, and e2fsck can repair the other filesystems.

..yeah, except for /tmp on ramdisk, that's how I do my boxes, and my ISP business client is learning his lesson well. ;-)

> But I repeat, your filesystems shouldn't be getting corrupted in the first place. Using a separate root filesystem is a good idea, and will help you recover from hardware problems, but your primary priority should be to avoid the hardware problems in the first place.
>
> - Ted

-- 
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three:
  best case, worst case, and just in case.
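Ted's two superblock points (the "order n**2" overhead and the backup superblocks a human may have to try by hand) both follow from the ext2/ext3 block-group layout, which the sketch below models. It assumes the classic layout of this era: 8 x block_size blocks per group, 32-byte group descriptors, a superblock plus a full descriptor-table copy in every group (or, with the sparse_super feature, only in groups 0, 1 and powers of 3, 5 and 7), no reserved descriptor blocks for online resizing, and a partial last group kept rather than dropped. Because the descriptor table grows with the group count n and, without sparse_super, is copied into all n groups, the overhead grows like n**2. The function names and /dev/md0 are hypothetical; generating the candidate list and trying each one in turn is one possible answer to Arnt's "try more blocks" question, not a built-in e2fsck feature.

```python
GROUP_DESC_SIZE = 32  # bytes per classic ext2/ext3 group descriptor (assumed)


def _is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock_copy(group: int, sparse_super: bool) -> bool:
    """Does this block group hold a superblock + descriptor table copy?"""
    if not sparse_super or group <= 1:
        return True
    return any(_is_power_of(group, b) for b in (3, 5, 7))


def layout(total_blocks: int, block_size: int):
    """Return (blocks_per_group, first_data_block, group_count)."""
    blocks_per_group = 8 * block_size  # one bitmap block covers this many blocks
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division: a partial last group still counts as a group here.
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)
    return blocks_per_group, first_data_block, group_count


def backup_superblocks(total_blocks: int, block_size: int, sparse_super: bool = True):
    """Block numbers of backup superblocks, in the units `e2fsck -b` expects."""
    bpg, first, groups = layout(total_blocks, block_size)
    return [first + g * bpg for g in range(1, groups)
            if has_superblock_copy(g, sparse_super)]


def metadata_copy_blocks(total_blocks: int, block_size: int, sparse_super: bool) -> int:
    """Blocks spent on superblock + descriptor table copies, primary included."""
    _, _, groups = layout(total_blocks, block_size)
    gdt_blocks = -(-(groups * GROUP_DESC_SIZE) // block_size)
    copies = sum(1 for g in range(groups) if has_superblock_copy(g, sparse_super))
    return copies * (1 + gdt_blocks)


if __name__ == "__main__":
    size = 30 * 2**30  # the 30 GB /var from the thread
    for bs in (1024, 4096):
        blocks = size // bs
        for sparse in (False, True):
            used = metadata_copy_blocks(blocks, bs, sparse)
            print(f"{bs}-byte blocks, sparse_super={sparse}: {used} blocks "
                  f"({100 * used / blocks:.4f}% of the filesystem)")
    # Commands a human could try in turn once the first backup is also bad.
    for blk in backup_superblocks(size // 4096, 4096)[:5]:
        print(f"e2fsck -b {blk} -B 4096 /dev/md0")
```

Under these assumptions the copy overhead for a 30 GB filesystem stays far below the 13-25% guessed in the thread, and doubling the size with 1 KB blocks and no sparse_super roughly quadruples it. On a real system, `dumpe2fs` (while the primary superblock is still readable) or `mke2fs -n` run with the original parameters reports the actual backup locations, which is safer than trusting a model.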

