Urgent interrupt processed; I'm back to my questions :)

Thanks to Casper for his suggestion; the box is scheduled to
reboot soon and I'll try a newer Solaris (oi_151a3 probably)
as well. UPDATE: Yes, oi_151a3 sees all "2.73Tb" of the disk,
so my old question is resolved: the original Thumper
(Sun Fire X4500) does see 3Tb disks, at least with a current
OS; there seem to be no hardware limitations. The disk is
recognized as "ATA-Hitachi HUA72303-A580-2.73Tb".

Booted back into snv_117, the box again sees only the smaller
disk size - so it is an OS thing indeed. OS migration goes
into the upgrade plans, check! ;}

2012-05-15 13:41, Jim Klimov wrote:
Hello all, I'd like some practical advice on migrating a
Sun Fire X4500 (Thumper) from its aging data disks to a set
of newer disks. Some questions below are my own, others are
passed on from the customer, and I don't consider all of them
sane myself - but I must ask anyway ;)

1) They hope to use 3Tb disks, and hotplugged an Ultrastar 3Tb
for testing. However, the system only sees it as an 802Gb
device, via Solaris format/fdisk as well as via parted [1].
Is that a limitation of the Marvell controller, the disk, or
the current OS (snv_117)? Would it be cleared by a reboot
and proper disk detection at POST (I'll test tonight), or
will these big disks not work in the X4500, period?

[1]
http://code.google.com/p/solaris-parted/downloads/detail?name=solaris-parted-0.2.tar.gz&can=2&q=
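
For reference, this is roughly how I check what size the OS
actually believes the disk to be (c5t4d0 is just a placeholder
device name here):

  # Reported capacity and vendor/model string, without touching the label
  iostat -En c5t4d0 | egrep -i 'size|product'
  # Interactive check: expert mode shows the full geometry/EFI label view
  format -e c5t4d0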


The Thumper box has 48 250Gb disks, which are beginning to
die off, currently arranged as two ZFS pools: an rpool built
on the two bootable drives, and the data pool built as a
45-drive array of nine striped (4+1) raidz1 vdevs, plus one
hotspare. AFAIK the number of raidz vdevs cannot be reduced
without compromising data integrity/protection, and it is the
only server around with ~9Tb of storage capacity - so there
are no backups, and nowhere to temporarily and safely migrate
the data to. The budget is tight. We are evaluating assorted
options and would like suggestions - perhaps some list users
have gone through similar transitions, and/or know which
options to avoid like fire ;)

We know that extra redundancy is highly recommended for
big HDDs, so in-place autoexpansion of the raidz1 pool
onto 3Tb disks is out of the question.

So far the plan is to migrate the current pool onto 3Tb
drives, and it seems that with the recommended 3-disk
redundancy for large drives, a raidz3 of 8+3 disks plus
one hotspare would fit nicely onto the 6 controllers
(2 disks each). Mirrors of 3 or 4 disks, times 5 vdevs
(the minimum desired new capacity), would fill most of the
box and cost a lot for relatively little capacity (reads
would be fast, though).

What would the experienced people here suggest - would raidz3
be a good fit?
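
For concreteness, the pool layout I have in mind would be
created roughly like this (the device names below are made-up
placeholders, meant as two disks per controller; also, IIRC
raidz3 needs a newer pool version than snv_117 ships, which is
one more reason for the OS upgrade):

  # 11 disks = 8 data + 3 parity, plus one hotspare
  zpool create newpool \
      raidz3 c0t1d0 c0t5d0 c1t1d0 c1t5d0 c2t1d0 c2t5d0 \
             c3t1d0 c3t5d0 c4t1d0 c4t5d0 c5t1d0 \
      spare c5t5d0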

Would SSDs help? I'm primarily thinking of L2ARC, though the
NFS and iSCSI serving might benefit from a dedicated ZIL
(SLOG) as well. What SSD sizes and models would people suggest
for a server with 16GB RAM? AFAIK the RAM might be upgradable
to 32GB (possibly at a high cost), but sadly no more than that
can be installed, according to the docs and the availability
of compatible memory modules; should the RAM doubling be
pursued?
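
For reference, adding the SSDs themselves is the easy part -
the sizing and models are what I'm unsure about. Roughly
(device names are placeholders):

  # L2ARC cache device: can be added or removed at any time, and its
  # loss only costs performance, so a single unmirrored SSD is fine
  zpool add newpool cache c6t0d0
  # Separate ZIL (SLOG) for synchronous NFS/iSCSI writes; mirrored,
  # since losing an unmirrored slog on older pool versions is painful
  zpool add newpool log mirror c6t1d0 c6t2d0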

I know it is hard to give suggestions about something this
vague. The storage profile is "a bit of everything" for a
software development company: home directories, regular
rolling backups, images of the software we produce, VM images
for test systems (which run on remote VM hosts and use the
Thumper's storage via ZFS/NFS and ZFS/iSCSI), and some
databases "of practically unlimited capacity" for the testbed
systems. Fragmentation is rather high: resilvering one disk
took 15 hours, and weekly scrubs take about 85 hours. The
server uses a 1Gbit LAN connection (which might become 4Gbit
via link aggregation, but even locally the server has not
produced bursts of disk traffic big enough to saturate the
single uplink).

Now on to the migration options we brainstormed...

IDEA 1

By far this seems like the safest option: rent or buy a
12-disk eSATA enclosure and a PCI-X adapter (model suggestions
welcome - they should support 3Tb disks), configure the new
pool in the enclosure, zfs send | zfs recv the data, and
restart the local zones with their workloads (databases) and
the NFS/iSCSI services from the new pool. Ultimately, take out
the old pool's disks, plug the new pool's disks (and SSDs)
into the Thumper, and live happily and fast :)

This option requires an enclosure and an adapter, and we have
no clue what to choose or how much that would cost on top of
the raw disk price.
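
Mechanically, I imagine the move itself would look roughly
like this, assuming the new pool ("newpool") has been created
in the enclosure as sketched above and calling the old data
pool "pond" here for illustration; the incremental pass keeps
the final downtime short:

  # Initial full replication while services keep running
  zfs snapshot -r pond@migrate1
  zfs send -R pond@migrate1 | zfs recv -Fdu newpool
  # Later: stop the services/zones, take a final snapshot and send
  # only the small increment, then point everything at newpool
  zfs snapshot -r pond@migrate2
  zfs send -R -I pond@migrate1 pond@migrate2 | zfs recv -Fdu newpool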

IDEA 2

Split the original data into several pools, migrate onto
mirrors starting with one big disk.

This idea proposes that the single hotspare bay is populated
with one new big disk at a time (the first one is already
inside), and a pool is created on top of that single disk.
Up to 3Tb of data is sent to the new pool; then the next disk
is inserted, the next pool created, and the next batch of data
sent. The original pool remains intact while the new pools are
later upgraded to N-way mirrors, and if some sectors do become
corrupt, the data can be restored with some manual fuss about
plugging the old pool back in.

This also allows enforcing some tiering of data (i.e. putting
stale WORM data on some pools, and dynamic, fragmentation-prone
data on others); however, free space would be split between
the individual pools rather than shared, and the cost overhead
of mirrors may be considered prohibitive.
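
In commands, each round of this idea would look roughly like
the following (pool, dataset and device names are placeholders):

  # Temporary single-disk pool on the new big drive in the spare bay
  zpool create tier1 c5t4d0
  # Move one ~3Tb batch of datasets over
  zfs snapshot -r pond/homes@move1
  zfs send -R pond/homes@move1 | zfs recv -du tier1
  # Later, as more big disks are freed up or bought, grow the pool
  # into an N-way mirror by attaching them to the original disk
  zpool attach tier1 c5t4d0 c5t5d0
  zpool attach tier1 c5t4d0 c5t6d0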

IDEA 3

This idea involves windows of reduced safety for the data;
however, it allows migrating the existing datasets onto a
completely new raidz3 8+3 pool with little downtime.

The idea is this: allocate nine 250Gb partitions on the new
big drive, and resilver one disk from each raidz1 set onto its
own partition. This way all raidz1 sets remain redundant, but
the big disk becomes a SPOF: if it fails during the transition,
all of the pool's raidz sets lose their redundancy at once.

This frees 9 HDD bays where we can put 9 new big disks and
set up the 8+3 array with 2 devices missing, so the new pool
can still survive one disk breaking down during migration.
After the old pool's data has been synced to the new pool
(and provided no two drives have broken in the meantime), the
remaining 250Gb disks can be taken out, and the 8+3 set gets
its two missing disks plus the hotspare (the big disk holding
the copies of the nine original partitions should remain
untouched until the very end).
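
To be explicit about the mechanics I have in mind (device and
slice names are placeholders, and the sparse-file trick for
the two missing raidz3 members is something I have only seen
described, not tried myself; also, I am not sure a single
EFI-labelled disk even offers nine usable slices, so the
layout of the carrier partitions is an open detail):

  # Step 1: migrate one member of each raidz1 set onto its own
  # ~250Gb slice of the big disk (repeat for all nine sets)
  zpool replace pond c0t1d0 c5t4d0s0
  zpool replace pond c0t2d0 c5t4d0s1
  # Step 2: build the 8+3 raidz3 from the nine freed bays plus two
  # sparse files standing in for the not-yet-available disks
  mkfile -n 2794g /var/tmp/fake1 /var/tmp/fake2
  zpool create -f newpool raidz3 c0t1d0 c0t2d0 c1t1d0 c1t2d0 c2t1d0 \
      c2t2d0 c3t1d0 c3t2d0 c4t1d0 /var/tmp/fake1 /var/tmp/fake2
  # Take the fakes offline immediately so they never get real writes
  zpool offline newpool /var/tmp/fake1
  zpool offline newpool /var/tmp/fake2
  # Step 3 (after the send/recv): swap the fakes for real disks
  zpool replace newpool /var/tmp/fake1 c5t1d0
  zpool replace newpool /var/tmp/fake2 c5t2d0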

IDEA 4

Similar to IDEA 3, except it carries less risk to the data at
the expense of server uptime: one disk from each raidz1 set is
dd'ed onto its own partition of the new big disk, and the pool
is then imported read-only using those vdevs (hey, if it is
possible to stick in restored images via lofi, then using
partitions instead of the original drives should be possible
too, right?)

dd'ing works a lot faster: I estimate about 15 hours for all
nine disks, versus 15 hours to resilver just one.

Then the read-only pool is zfs-sent to the new 8+3 raidz3
pool (with 2 devices missing), as above. If anything bad
happens during this migration, the original disks have not
been modified (unlike the replacement of disks with partitions
in IDEA 3) and can easily be reinstated.

The main problem is that services would be down for at least
a week, although this could be remedied by migrating them off
the box for a while. It is also questionable whether dd'ed
images of pool disks would be picked up by ZFS from many
partitions on one disk.
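
As far as I can tell, the mechanics would be roughly as
follows (names are placeholders; the read-only import needs a
ZFS version that supports it, which snv_117 IIRC does not, and
whether the imported pool actually accepts the dd'ed slices is
exactly the questionable part):

  # A recursive snapshot must be taken while the pool is still
  # writable - a read-only pool cannot create new snapshots
  zfs snapshot -r pond@migrate
  zpool export pond
  # Copy one member of each raidz1 set onto its own slice of the
  # big disk (repeat for all nine sets), then pull those nine
  # original disks so the import does not see duplicate labels
  dd if=/dev/rdsk/c0t1d0s0 of=/dev/rdsk/c5t4d0s0 bs=1024k
  dd if=/dev/rdsk/c0t2d0s0 of=/dev/rdsk/c5t4d0s1 bs=1024k
  # Re-import the pool read-only from the 36 untouched disks plus
  # the nine copies, and replicate it to the new raidz3 pool
  zpool import -o readonly=on pond
  zfs send -R pond@migrate | zfs recv -Fdu newpool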

IDEA 5

Like IDEA 3 or 4, but using SVM mirrors as the block vdevs for
ZFS instead of resilvering or dd'ing. Each SVM mirror would
have one of the current nine disks on one side and a partition
on the new big disk on the other. Since the SVM metadevice
remains defined, the backing storage behind it can be juggled
freely.
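
I only half-remember the SVM incantations, so take this as a
rough sketch of what I mean (metadevice and slice names are
made up, and whether ZFS will then happily run on top of the
md devices is the open question):

  # SVM needs state database replicas before any metadevice exists
  metadb -a -f -c 2 c0t0d0s7
  # One-way mirror whose only submirror is the existing pool member's
  # slice (-f because the slice is in use; its contents are preserved)
  metainit -f d11 1 1 c0t1d0s0
  metainit d10 -m d11
  # Second submirror on a slice of the new big disk; attach it and
  # let SVM resync in the background
  metainit d12 1 1 c5t4d0s0
  metattach d10 d12
  # Once synced, the old-disk side can be detached, freeing the bay
  metadetach d10 d11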

------


So, a few wild options have been discussed: some risky to the
data, some risky to the uptime of a critical server, some
rather costly - or so it seems.

I ask the community to please take them seriously and not let
my friends make some predictable fatal mistake ;)

Are any of these options (other than IDEA 1) viable and/or
reasonable (i.e. would you ever do something similar)?

PS: Again, suggestions on L2ARC and ZIL are welcome for a
16GB-RAM server with a lot of addressable storage and a
relatively small working set (perhaps a hundred Gb is
regularly needed more than once). Or do "16Gb RAM" and
"~24Tb of disks" simply not belong in one sentence?
PPS: As an orthogonal solution, how cheaply can a used X4540
be bargained for, and how much RAM can be put into it?

//Jim Klimov


