Stan Hoeppner wrote:
> Dan Ritter wrote:
> > You can put cheap SATA disks in, instead of expensive SAS disks.
> > The performance may not be as good, but I suspect you are
> > looking at sheer capacity rather than IOPS.
>
> Stick with enterprise quality SATA disks. Throwing "drive of the week"
> consumer models, i.e. WD20EARS, in the chassis simply causes unnecessary
> heartache down the road.
There are inexpensive disks and then there are cheap disks. Those
"green" drives are definitely in the "cheap" category, and much harder
to deal with than the "inexpensive" category. I think there are three
big classifications of drives: good quality enterprise drives,
inexpensive drives, and cheap drives. The cheap drives are really
terrible!

> > Now, the next thing: I know it's tempting to make a single
> > filesystem over all these disks. Don't. The fsck times will be
> > horrendous. Make filesystems which are the size you need, plus a
> > little extra. It's rare to actually need a single gigantic fs.

Agreed. But for me it isn't about the fsck time. It is about the size
of the problem. If you have a full 100G filesystem and there is a
problem then you have a 100G problem. It is painful, but you can handle
it. If you have a full 10T filesystem and there is a problem then you
have a *HUGE* problem. It is so much more than painful. Therefore when
practical I like to compartmentalize things so that there is isolation
between problems, whether the problem is due to a hardware failure, a
software failure or a human failure, all of which are possible. Having
compartmentalization makes dealing with the problem smaller and easier.
(There is a rough sketch of what I mean below.)

> What? Are you talking crash recovery boot time "fsck"? With any
> modern journaled FS log recovery is instantaneous. If you're talking
> about an actual structure check, XFS is pretty quick regardless of
> inode count as the check is done in parallel. I can't speak to EXTx
> as I don't use them.

You should try an experiment: set up a terabyte ext3 and ext4
filesystem and then perform a few crash recovery reboots of the system.
It will change your mind. :-) (The experiment is sketched below.)

> For a multi terabyte backup server, XFS is the only way to go
> anyway. Using XFS also allows infinite growth without requiring
> array reshapes nor LVM, while maintaining striped write alignment
> and thus maintaining performance.

I agree that XFS is a superior filesystem for large filesystems. I have
used it there for years. XFS has one unfortunate missing feature: you
can't resize a filesystem to be smaller. You can resize them larger,
but not smaller. That is a feature I miss compared to other
filesystems. (Also sketched below.)

Unfortunately I have some recent FUD concerning xfs. I have had some
small idle xfs filesystems trigger kernel watchdog timer recoveries
recently. Emphasis on idle; active filesystems are always fine. I used
/tmp as a large xfs filesystem but swapped it to ext4 due to these
lockups. Squeeze, everything current. But when idle it would
periodically lock up, and the only messages in the syslog and on the
system console concerned xfs threads that had timed out. When the
kernel froze it always had these messages displayed[1]. It was simply
using /tmp as a hundred-gig-or-so xfs filesystem. Doing nothing but
changing /tmp from xfs to ext4 resolved the problem, and the machine
hasn't seen a kernel lockup since. I saw that problem on three
different machines, but they were effectively all mine, with very
similar software configurations. And by kernel lockup I mean
unresponsive; it took a power cycle to free it. I hesitated to say
anything for lack of real data, but it means I can't completely
recommend xfs today even though I have given it strong recommendations
in the past. I am thinking that recent kernels are not completely clean
specifically for idle xfs filesystems, while active ones seem to be
just fine.
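To make the compartmentalization point concrete, here is the sort of
thing I mean. Only a sketch; the volume group vg0 and the volume names
are made up:

    # Carve separate filesystems per data set instead of one giant one.
    # vg0 and the volume names are hypothetical.
    lvcreate -L 500G -n backup-host1 vg0
    lvcreate -L 500G -n backup-host2 vg0
    mkfs.xfs /dev/vg0/backup-host1
    mkfs.xfs /dev/vg0/backup-host2

That way a hardware, software or human failure in one 500G filesystem
stays a 500G problem instead of a 10T problem.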
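As for the crash recovery experiment, I mean something along these
lines. Untested sketch; /dev/sdX1 stands in for a scratch partition you
can afford to lose, and sysrq must be enabled:

    mkfs.ext3 /dev/sdX1             # repeat the whole run with mkfs.ext4
    mount /dev/sdX1 /mnt
    cp -a /usr /mnt                 # give the checker some inodes to walk
    echo b > /proc/sysrq-trigger    # simulate a crash: reboot, no sync
    # after the machine comes back up:
    time e2fsck -f /dev/sdX1        # force a full structure check

Obviously do not run that on a machine you care about.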
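On the resize point, the asymmetry looks like this. Device and mount
point names are again hypothetical:

    xfs_growfs /srv/backup          # XFS grows, even while mounted...
    # ...but there is no shrink. ext4, offline, can go both ways:
    umount /dev/sdX1
    e2fsck -f /dev/sdX1             # resize2fs wants a clean check first
    resize2fs /dev/sdX1 500G        # shrink to 500G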
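And the /tmp swap was nothing more exotic than a mkfs.ext4 over the
same volume and a one-word change in /etc/fstab, roughly like this
(device name hypothetical):

    # /etc/fstab -- before:
    /dev/vg0/tmp  /tmp  xfs   defaults  0  2
    # after reformatting the volume with mkfs.ext4:
    /dev/vg0/tmp  /tmp  ext4  defaults  0  2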
Would love to have this resolved one way or the other so I could go
back to recommending xfs again without reservations.

> There are hundreds of 30TB+ and dozens of 100TB+ XFS filesystems in
> production today, and I know of one over 300TB and one over 500TB,
> attached to NASA's two archival storage servers.

Definitely XFS can handle large filesystems. And when there is a good
version of everything all around it has definitely been a very good and
reliable performer for me. I wish my recent bad experiences were
resolved. But for large filesystems such as those I think you need a
very good and careful administrator to manage the disk farm. And that
includes disk use policies as much as it includes managing kernel
versions and disk hardware. Huge problems of any sort need more careful
management.

> When using correctly architected reliable hardware there's no reason
> one can't use a single 500TB XFS filesystem.

Although I am sure it would work, I would hate to have to deal with a
problem that large when there is a need for disaster recovery. I guess
that is why *I* don't manage storage farms that are that large. :-)

Bob

[1] Found an old log trace. Stock Squeeze, everything current. /tmp was
the only xfs filesystem on the machine. Most of the time the recovery
would work fine, but whenever the machine was locked up frozen this was
always displayed on the console. Doing nothing but replacing the xfs
/tmp with an ext4 /tmp made the system freeze problem disappear. I
could put it back and see if the kernel freeze reappears, but I don't
want to.

May 21 09:05:38 fs kernel: [3865560.844047] INFO: task xfssyncd:1794 blocked for more than 120 seconds.
May 21 09:05:38 fs kernel: [3865560.925322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 09:05:38 fs kernel: [3865561.021188] xfssyncd D 00000010 0 1794 2 0x00000000
May 21 09:05:38 fs kernel: [3865561.021204] f618d940 00000046 f6189980 00000010 f8311020 c13fc000 c13fc000 c13f7604
May 21 09:05:38 fs kernel: [3865561.021234] f618dafc c3508000 00000001 00036b3f 00000000 00000000 c3554700 d21b3280
May 21 09:05:38 fs kernel: [3865561.021265] c3503604 f618dafc 3997fc05 c35039c8 c106ef0b 00000000 00000000 00000000
May 21 09:05:38 fs kernel: [3865561.021295] Call Trace:
May 21 09:05:38 fs kernel: [3865561.021326] [<c106ef0b>] ? rcu_process_gp_end+0x27/0x63
May 21 09:05:38 fs kernel: [3865561.021339] [<c125d891>] ? schedule_timeout+0x20/0xb0
May 21 09:05:38 fs kernel: [3865561.021352] [<c1132b9b>] ? __lookup_tag+0x8e/0xee
May 21 09:05:38 fs kernel: [3865561.021362] [<c125d79a>] ? wait_for_common+0xa4/0x100
May 21 09:05:38 fs kernel: [3865561.021374] [<c102daad>] ? default_wake_function+0x0/0x8
May 21 09:05:38 fs kernel: [3865561.021405] [<f8bad6d2>] ? xfs_reclaim_inode+0xca/0x117 [xfs]
May 21 09:05:38 fs kernel: [3865561.021425] [<f8bade3c>] ? xfs_inode_ag_walk+0x44/0x73 [xfs]
May 21 09:05:38 fs kernel: [3865561.021445] [<f8bad71f>] ? xfs_reclaim_inode_now+0x0/0x4c [xfs]
May 21 09:05:38 fs kernel: [3865561.021465] [<f8badea1>] ? xfs_inode_ag_iterator+0x36/0x58 [xfs]
May 21 09:05:38 fs kernel: [3865561.021484] [<f8bad71f>] ? xfs_reclaim_inode_now+0x0/0x4c [xfs]
May 21 09:05:38 fs kernel: [3865561.021504] [<f8baded1>] ? xfs_reclaim_inodes+0xe/0x10 [xfs]
May 21 09:05:38 fs kernel: [3865561.021530] [<f8badef6>] ? xfs_sync_worker+0x23/0x5c [xfs]
May 21 09:05:38 fs kernel: [3865561.021549] [<f8bad901>] ? xfssyncd+0x134/0x17d [xfs]
May 21 09:05:38 fs kernel: [3865561.021569] [<f8bad7cd>] ? xfssyncd+0x0/0x17d [xfs]
May 21 09:05:38 fs kernel: [3865561.021580] [<c10441e0>] ? kthread+0x61/0x66
May 21 09:05:38 fs kernel: [3865561.021590] [<c104417f>] ? kthread+0x0/0x66
May 21 09:05:38 fs kernel: [3865561.021601] [<c1003d47>] ? kernel_thread_helper+0x7/0x10