> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Matt Banks
>
> Am I crazy for putting something like this into production using
> Solaris 10/11? On paper, it really seems ideal for our needs.
Do you have an objection to Solaris 10/11 for some reason? No, it's not crazy (and I wonder why you would ask).

> Also, maybe I read it wrong, but why is it that (in the previous thread about
> hw raid and zpools) zpools with large numbers of physical drives (eg 20+)

Clarification that I know others have already added, but I'll reiterate: it's not the number of devices in a zpool that matters. It's the amount of data in the resilvering vdev, the number of devices inside that vdev, and your usage patterns (where the typical usage pattern is the worst-case usage pattern, especially for a database server). Together these of course have a relation to the number of devices in the pool, but that's not what matters.

The problem basically applies to HDDs. By building your pool out of SSDs, this problem should be eliminated.

Here is the problem: assuming the data in the pool is evenly distributed among the vdevs, the more vdevs you have, the less data is in each one. If you make your pool out of a small number of large raidzN vdevs, you're going to have relatively a lot of data in each vdev, and therefore a lot of data in the resilvering vdev. When a vdev resilvers, it reads each slab of data in essentially time order, which is approximately random disk order, in order to reconstruct the data that must be written to the resilvering device. This creates two problems: (a) since each disk must fetch a piece of each slab, the random access time of the vdev as a whole is approximately the random access time of the slowest individual device, so the more devices in the vdev, the worse the vdev's IOPS; and (b) the more data slabs in the vdev, the more iterations of random I/O must be completed. In other words, during a resilver you're IOPS limited.

If your pool is made entirely of SSDs, problem (a) is basically nonexistent, since the random access times of all the devices are equal and essentially zero. Problem (b) isn't necessarily a problem... It's like, if somebody is giving you $1,000 for free every month and then they suddenly drop down to only $500, you complain about what you've lost. ;-) (See below.)

In a hardware RAID system, resilvering is done sequentially across all disks in the array. Depending on your specs, a typical time might be 2 hours. All blocks are resilvered regardless of whether or not they're used. But in ZFS, only used blocks are resilvered. That means if your vdev is empty, your resilver completes instantly. Also, if your vdev is made of SSDs, the random access times will be just like the sequential access times, and your worst case is still equal to a hardware RAID rebuild.

The only time there's a problem is when you have a vdev made of HDDs, there's a bunch of data in it, and it's scattered randomly (which typically happens due to the nature of COW and snapshot creation/deletion over time). Then the HDDs thrash around spending all their time on random access, with very little payload per random op. In these cases, even HDD mirrors end up with resilver times several times longer than sequentially resilvering the whole disk, unused blocks included. And mirrors are the best case here, because they have both (a) minimal data in each vdev and (b) the minimal number of devices in the resilvering vdev. Even so, the mirror resilver time might be something like 12 hours, in my experience, instead of the 2 hours hardware would have needed to resilver the whole disk.
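To put rough numbers on the "IOPS limited" point, here is a back-of-envelope sketch in plain Python. Every figure in it (block size, IOPS, rebuild throughput, amount of used data) is an assumption I'm picking for illustration, not a measurement:

# Back-of-envelope resilver estimate.  All figures below are assumed,
# illustrative values.

def resilver_hours_zfs(used_bytes, avg_block_bytes, vdev_iops):
    # ZFS walks used blocks in roughly temporal (i.e. random disk) order,
    # so the vdev completes about one block per random I/O, and the vdev's
    # random IOPS is roughly that of its slowest member disk.
    blocks = used_bytes / avg_block_bytes
    return blocks / vdev_iops / 3600.0

def rebuild_hours_hw(disk_bytes, seq_bytes_per_s):
    # A hardware RAID rebuild streams the whole disk sequentially, used
    # or not, so it is throughput-limited rather than IOPS-limited.
    return disk_bytes / seq_bytes_per_s / 3600.0

TB = 10**12

# Mirror vdev of HDDs: 800 GB of used data, 128 KB average block, and
# ~150 random IOPS for the vdev (one 7200 rpm disk's worth, since the
# mirror seeks in lockstep).
print(resilver_hours_zfs(0.8 * TB, 128 * 1024, 150))   # ~11 hours
print(rebuild_hours_hw(1 * TB, 150 * 10**6))           # ~1.9 hours

With those assumptions the output lands in the same ballpark as the 12-hour-vs-2-hour experience above; the point is the ratio, not the exact figures.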
But if you were using a big vdev (raidzN) of a bunch of HDDs (let's say, 21 disks in a raidz3), you might get resilver times that are a couple of orders of magnitude too long... like 20 days instead of 10 hours. At that point, you should assume your resilver will never complete. So again: not a problem if you're making your pool out of SSDs.
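The 20-days-vs-10-hours figure falls straight out of the same kind of arithmetic. Here it is as a standalone snippet, with every input again an assumed, illustrative value:

# Same back-of-envelope arithmetic for a 21-disk raidz3 of HDDs.
# All inputs are assumptions chosen only to illustrate the scale.
used_bytes  = 20 * 10**12    # 20 TB of used data in the vdev
block_bytes = 64 * 1024      # 64 KB average slab (smaller, e.g. database
                             # records, makes it far worse)
vdev_iops   = 150            # whole vdev seeks in lockstep, so roughly
                             # one disk's worth of random IOPS
slabs    = used_bytes / block_bytes                 # ~305 million slabs
zfs_days = slabs / vdev_iops / 86400                # ~24 days
hw_hours = 3 * 10**12 / (85 * 10**6) / 3600         # ~10 hours to stream
                                                    # a 3 TB disk at 85 MB/s
print(zfs_days, hw_hours)

Tweak any of the inputs and the absolute numbers move around, but the IOPS-bound estimate stays one to two orders of magnitude above the sequential rebuild time.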