Urgent interrupt processed; now I'm back to my questions :)
Thanks to Casper for the suggestion; the box is scheduled to reboot soon, and I'll try a newer Solaris (oi_151a3, probably) as well.

UPDATE: Yes, oi_151a3 sees all "2.73Tb" of the disk, so my old question is resolved: the original Thumper (Sun Fire X4500) does see 3Tb disks, at least with a current OS - there seem to be no hardware limitations. The disk is recognized as "ATA-Hitachi HUA72303-A580-2.73Tb". Booted back into snv_117, the box again sees the smaller disk size - so it is an OS thing indeed. OS migration goes into the upgrade plans, check! ;}

2012-05-15 13:41, Jim Klimov wrote:
Hello all,

I'd like some practical advice on migrating a Sun Fire X4500 (Thumper) from its aging data disks to a set of newer disks. Some questions below are my own, others are passed on from the customer, and I may not consider all of them sane - but I must ask anyway ;)

1) They hope to use 3Tb disks, and hotplugged an Ultrastar 3Tb for testing. However, the system only sees it as an 802Gb device, both via Solaris format/fdisk and via parted [1]. Is that a limitation of the Marvell controller, the disk, or the current OS (snv_117)? Would it be cleared by a reboot and proper disk detection on POST (I'll test tonight), or will these big disks not work in an X4500, period?

[1] http://code.google.com/p/solaris-parted/downloads/detail?name=solaris-parted-0.2.tar.gz&can=2&q=
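One possible explanation for the 802Gb figure (a guess on my part, assuming the drive's nominal capacity and 512-byte sectors): software limited to 32-bit LBAs would wrap the sector count modulo 2^32, which lands very close to the reported size.

```shell
# Hypothetical arithmetic only - no claim about where exactly the
# truncation happens (driver, sd, or fdisk label handling).
BYTES=3000592982016                 # assumed nominal capacity of a 3Tb drive
SECTORS=$((BYTES / 512))            # ~5.86 billion 512-byte sectors
WRAPPED=$((SECTORS % 4294967296))   # what survives a 32-bit sector counter
echo "$((WRAPPED * 512 / 1000000000)) GB"
```

This prints roughly 801 GB - suspiciously close to the 802Gb that format/fdisk report, which would point at the OS rather than the hardware.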
The Thumper box has 48 250Gb disks, which are beginning to die off, currently arranged as two ZFS pools: an rpool built over the two bootable drives, and the data pool built as a 45-drive array of 9*(4+1) raidz1 sets striped, plus one hotspare. AFAIK the number of raidz vdevs cannot be reduced without compromising data integrity/protection, and this is the only server around with ~9Tb of storage capacity - so there are no backups, and nowhere to temporarily and safely migrate to. The budget is tight.

We are evaluating assorted options and would like suggestions - perhaps some list users have gone through similar transitions, and/or know which options to avoid like fire ;)

We know that large redundancy is highly recommended for big HDDs, so in-place autoexpansion of the raidz1 pool onto 3Tb disks is out of the question. So far the plan is to migrate the current pool onto 3Tb drives, and it seems that with the recommended 3-disk redundancy for large drives, a raidz3 of 8+3 disks plus one hotspare would fit nicely onto the 6 controllers (2 disks each). Mirrors of 1+2 or 1+3 disks times 5 (the minimum desired new volume) would fill most of the box and cost a lot for relatively little capacity (though reads would be fast).

What would the experienced people suggest - would raidz3 be good? Would SSDs help? I'm primarily thinking of L2ARC, though there is NFS and iSCSI serving that might benefit from ZILs as well. What SSD sizing and models would people suggest for this 16GB-RAM server? AFAIK it might be possible to expand the RAM to 32GB (maybe costly), but sadly no more can be installed, according to the docs and the availability of compatible memory modules; should the RAM doubling be pursued?
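For concreteness, the proposed 8+3 raidz3 layout could be created along these lines (a sketch only - the cXtYdZ names and the pool name are placeholders, not the Thumper's real device paths):

```shell
# Hypothetical layout: 11 disks spread 2-per-controller across the
# 6 Marvell controllers, plus one hotspare in the 12th bay.
zpool create bigpool raidz3 \
    c0t0d0 c0t1d0 c1t0d0 c1t1d0 c2t0d0 c2t1d0 \
    c3t0d0 c3t1d0 c4t0d0 c4t1d0 c5t0d0
zpool add bigpool spare c5t1d0
```

With 8 data disks of 3Tb each, that gives roughly 24Tb of raw data capacity before ZFS metadata overhead.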
I know it is hard to give suggestions about something vague. The storage profile is "a bit of everything" in a software development company: home directories, regular rolling backups, images of produced software, VM images for test systems (executed on remote VM hosts, using the Thumper's storage via ZFS/NFS and ZFS/iSCSI), and some databases "of practically unlimited capacity" for the testbed systems. Fragmentation is rather high: a resilver of one disk took 15 hours, and weekly scrubs take about 85 hours. The server uses a 1Gbit LAN connection (it might become 4Gbit via aggregation, but the server has not produced disk-storage bursts big enough to saturate even the one uplink, even locally).

Now on to the migration options we brainstormed...

IDEA 1

By far the safest-seeming option: rent or buy a 12-disk eSATA enclosure and a PCI-X adapter (model suggestions welcome - it should support 3TB disks), configure the new pool in the enclosure, zfs send | zfs recv the data, then restart the local zones with their tasks (databases) and the nfs/iscsi services from the new pool. Ultimately, take out the old pool's disks, plug the new pool's disks (and SSDs) inside the Thumper, and live happily and fast :) This option requires an enclosure and adapter, with no clue what to choose or how much that would cost on top of the raw disk price.

IDEA 2

Split the original data into several pools, migrating onto mirrors that each start with one big disk. This idea proposes that the one hotspare disk bay is populated by one new big disk at a time (the first one is already inside), and a pool is created on top of that one disk. Up to 3Tb of data is sent to the new pool, then the next disk is inserted and the next pool created and filled. The original pool remains intact while the new pools are upgraded to N-way mirrors, and if some sectors do become corrupt, the data can be restored with some manual fuss about plugging the old pool back in. This also allows us to enforce tiering of information (i.e.
pour stale WORM data onto some pools, and dynamic data that tends to fragment onto others); however, free space would be individual to each such pool, while the cost overhead of mirrors may be considered prohibitive.

IDEA 3

This idea involves possible unsafety for the data at some moments, but it allows migrating the existing datasets onto a complete new 8+3 raidz3 pool with little downtime. The idea is this: allocate 9 250Gb partitions on the new big drive, and resilver one disk from each raidz1 set onto a new partition. This way all raidz1 sets remain redundant, but the big disk becomes a SPOF: if anything happens to it during the data transition, all of the pool's raidz sets become non-redundant at once. However, this frees 9 HDD bays where we can stick 9 new big disks and set up the 8+3 array with 2 missing devices, so there is also strength for one disk breaking down during migration. After the old pool's data has been synced to the new pool - if no two drives break during this time - the 250Gb disks can be taken out, and the 8+3 set gets its remaining two disks and the hotspare (the disk holding the copies of the original 9 partitions; it should remain untouched until the end).

IDEA 4

Similar to IDEA 3, except it has less risk to the data at the expense of server uptime: the 9 disks are DD'ed to partitions on the new big disk, and the pool is mounted read-only using these vdevs (hey, if it is possible to stick in restored images via lofi, then using partitions instead of the original drives should be possible, right?). DDing works a lot faster: I estimate 15 hours for all 9 disks, instead of 15 hours to resilver just one. Then the read-only pool is zfs-sent to the new 8+1(+2 missing) raidz3 pool as above. If anything bad happens during this migration, the original disks were not modified (unlike the replacement of disks with partitions in IDEA 3) and can easily be reinstated.
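The copy step of IDEA 4 might be sketched like this (all device names are placeholders; it assumes slices s0..s8 on the big disk were pre-created to match the old disks' sizes, and I have not verified that ZFS will actually import a pool from nine slices of one disk):

```shell
# Hypothetical: image one disk from each raidz1 set onto a slice of the
# new 3Tb disk. The old disks stay untouched, so they can be reinstated.
BIGDISK=c5t4d0
i=0
for olddisk in c0t4d0 c1t4d0 c2t4d0 c3t4d0 c4t4d0 \
               c5t5d0 c0t5d0 c1t5d0 c2t5d0; do
    dd if=/dev/rdsk/${olddisk}s0 of=/dev/rdsk/${BIGDISK}s${i} bs=1024k
    i=$((i + 1))
done
# With the original disks exported/offlined, try importing the pool from
# the slice copies (read-only, if the OS build supports that option):
zpool import -d /dev/dsk -o readonly=on pool
```

Note that read-only import was added in a build well past snv_117, so this would only work after the planned OS upgrade.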
The main problem is that services would be down for at least a week, although this can be remedied by migrating them off the box for a while. It is also questionable whether DD'ed images of pool disks would be picked up by ZFS from many partitions on one disk.

IDEA 5

Like ideas 3 or 4, but using SVM mirrors as the block vdevs for ZFS, instead of resilvering or DDing. These SVM mirrors would contain the current 9 disks on one side, and a partition on the new disk on the other. Since the SVM metadevice remains defined, the backend storage can be juggled freely.

------

So, a few wild options have been discussed - some risky to the data, some risky to the uptime of a critical server, some rather costly (or so it seems). I ask the community to please take them seriously and not let my friends make some predictable fatal mistake ;) Are any of these options (other than IDEA 1) viable and/or reasonable (i.e. would you ever do something similar)?

PS: Again, suggestions on L2ARC and ZIL are welcome for a 16GB-RAM server with big addressable storage and a relatively small working set (perhaps a hundred Gb is regularly needed more than once). Or do "16Gb RAM" and "~24Tb disks" not belong in one sentence?

PPS: As an orthogonal solution, how much can a used X4540 be bargained for, and how much RAM can be put into it?

//Jim Klimov

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss