On 01/29/10 07:36 PM, Richard Elling wrote:
On Jan 29, 2010, at 12:45 AM, Henrik Johansen wrote:
On 01/28/10 11:13 PM, Lutz Schumann wrote:
While thinking about ZFS as the next-generation filesystem
without limits, I am wondering whether the real world is ready for
this kind of incredible technology ...
I'm actually speaking of hardware :)
ZFS can handle a lot of devices. Once the import bug
(http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6761786)
is fixed it should be able to handle a lot of disks.
That was fixed in build 125.
I want to ask the ZFS community and users what large scale
deployments are out there. How many disks? How much capacity?
Single pool or many pools on a server? How does resilver work in
those environments? How do you back up? What is the experience
so far? Any major headaches?
It would be great if large scale users would share their setups
and experiences with ZFS.
The largest ZFS deployment that we have currently comprises 22
Dell MD1000 enclosures (330 × 750 GB Nearline SAS disks). We have
3 head nodes and use one zpool per node, built from rather narrow
(5+2) RAIDZ2 vdevs. This setup is used exclusively for storing
backup data.
This is an interesting design. It looks like a good use of hardware
and redundancy for backup storage. Would you be able to share more of
the details? :-)
Each head node (Dell PE 2900's) has 3 PERC 6/E controllers (LSI 1078
based) with 512 MB cache each.
The PERC 6/E supports both load-balancing and path failover so each
controller has 2 SAS connections to a daisy chained group of 3 MD1000
enclosures.
The RAIDZ2 vdev layout was chosen because it gives a reasonable
performance-to-space ratio and it maps nicely onto the 15-disk MD1000s
(2 × (5+2) + 1).
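For illustration, one 15-disk MD1000 could map onto that layout roughly as sketched below. The c1t*d0 device names are hypothetical, and the leftover 15th disk is assumed here to be a hot spare; the arithmetic just shows the raw usable space one enclosure contributes.

```shell
# Hypothetical sketch: one 15-disk MD1000 as two (5+2) RAIDZ2 vdevs
# plus one spare. Device names c1t0d0..c1t14d0 are made up.
echo 'zpool add tank \'
echo '  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \'
echo '  raidz2 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0 \'
echo '  spare c1t14d0'

# Raw usable space per enclosure: 2 vdevs x 5 data disks x 750 GB each
# (before ZFS metadata overhead).
usable_gb=$((2 * 5 * 750))
echo "usable per enclosure: ${usable_gb} GB"
```

So each shelf contributes roughly 7.5 TB raw usable space while surviving two disk failures per vdev.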
There is room for improvement in the design (fewer disks per controller,
faster PCI Express slots, etc) but performance is good enough for our
current needs.
Resilver times could be better - I am sure that this will improve
once we upgrade from S10u9 to 2010.03.
Nit: Solaris 10 u9 is 10/03 or 10/04 or 10/05, depending on what you
read. Solaris 10 u8 is 11/09.
One of the things that I am missing in ZFS is the ability to
prioritize background operations like scrub and resilver. All our
disks are idle during daytime and I would love to be able to take
advantage of this, especially during resilver operations.
Scrub I/O is given the lowest priority and is throttled. However, I
am not sure that the throttle is in Solaris 10, because that source
is not publicly available. In general, you will not notice a resource
cap until the system utilization is high enough that the cap is
effective. In other words, if the system is mostly idle, the scrub
consumes the bulk of the resources.
That's not what I am seeing - resilver operations crawl even when the
pool is idle.
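For what it's worth, on OpenSolaris builds of that vintage the resilver throttle is controlled by private, undocumented kernel tunables; the names below (zfs_resilver_delay, zfs_resilver_min_time_ms) are an assumption that should be verified against the running build before use, and they are not a stable interface.

```shell
# ASSUMED tunable names - check they exist first, e.g.:
#   echo "zfs_resilver_delay/D" | mdb -k

# Remove the per-I/O delay the resilver throttle inserts:
echo "zfs_resilver_delay/W0" | mdb -kw

# Let each txg spend more time on resilver I/O (0t = decimal):
echo "zfs_resilver_min_time_ms/W0t10000" | mdb -kw
```

Changes made via mdb -kw revert on reboot; the same settings could go in /etc/system to persist, with the same caveat that these are private tunables.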
This setup has been running for about a year with no major issues
so far. The only hiccups we've had were all HW related (no fun in
upgrading firmware on 200+ disks).
ugh. -- richard
--
Med venlig hilsen / Best Regards
Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00
ScanNet Group
A/S ScanNet
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss