Re: [zfs-discuss] hardware sizing for a zfs-based system?
Heya Kent,

Kent Watsen wrote:
>> It sounds good, that way, but (in theory), you'll see random I/O
>> suffer a bit when using RAID-Z2: the extra parity will drag
>> performance down a bit.
> I know what you are saying, but I wonder if it would be noticeable?

Well, "noticeable" again comes back to your workflow. As you point out to Richard, it's (theoretically) a 2x IOPS difference, which can be very significant for some people.

> I think my worst case scenario would be 3 myth frontends watching 1080p
> content while 4 tuners are recording 1080p content - with each 1080p
> stream being 27Mb/s, that would be 108Mb/s writes and 81Mb/s reads (all
> sequential I/O) - does that sound like it would even come close to
> pushing a 4(4+2) array?

I would say no, not even close to pushing it. Remember, we're measuring array performance in MBytes/s, while video throughput is measured in Mbit/s (and even then, I imagine that a 27 Mbit/s stream over the air is going to be pretty rare). So I'm figuring you're just scratching the surface of even a minimal array.

Put it this way: can a single, modern hard drive keep up with an ADSL2+ (24 Mbit/s) connection? Throw 24 spindles at the problem, and I'd say you have headroom for a *lot* of streams.

>> The RAS guys will flinch at this, but have you considered 8*(2+1)
>> RAID-Z1?
> That configuration showed up in the output of the program I posted back
> in July
> (http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041778.html):
>
> 24 bays w/ 500 GB drives having MTBF=5 years
>  - can have 8 (2+1) w/ 0 spares providing 8000 GB with MTTDL of 95.05 years
>  - can have 4 (4+2) w/ 0 spares providing 8000 GB with MTTDL of 8673.50 years
>
> But it is 91 times more likely to fail, and this system will contain data
> that I don't want to risk losing

I wasn't sure, with your workload. I know with mine, I'm seeing the data store as being mostly temporary. With that much data streaming in and out, are you planning on archiving *everything*? Cos that's "only" one month's worth of HD video.

I'd consider tuning a portion of the array for high throughput, and another for high redundancy as an archive for whatever you don't want to lose. Whether that's by setting copies=2, or by having a mirrored zpool (smart for an archive, because you'll be less sensitive to the write performance hit there), it's up to you... ZFS gives us a *lot* of choices. (But then you knew that, and it's what brought you to the list :)

>> I don't want to over-pimp my links, but I do think my blogged
>> experiences with my server (also linked in another thread) might give
>> you something to think about:
>> http://lindsay.at/blog/archive/tag/zfs-performance/
> I see that you also set up a video server (myth?),

For the uncompressed HD test case, no. It'd be for storage/playout of Ultra-Grid-like streams, and really, that's there so our network guys can give their 10Gb links a little bit of a workout.

> from your blog, I
> think you are doing 5(2+1) (plus a hot-spare?) - this is what my
> program says about a 16-bay system:
>
> 16 bays w/ 500 GB drives having MTBF=5 years
>  - can have 5 (2+1) w/ 1 spares providing 5000 GB with MTTDL of 1825.00 years
> [snipped some interesting numbers]
> Note that our MTTDL isn't quite as bad as 8(2+1) since you have three
> fewer stripes.

I also committed to having at least one hot spare, which, after staring at relling's graphs for days on end, seems to be the cheapest, easiest way of upping the MTTDL for any array. I'd recommend it.
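For what it's worth, all three of those knobs are one-liners. A minimal sketch, assuming a main pool called "tank" and with the device names made up for illustration:

    # keep two copies of just the datasets you care about
    zfs create tank/archive
    zfs set copies=2 tank/archive

    # ...or keep a separate mirrored pool purely as the archive
    zpool create archive mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

    # and a hot spare is cheap to add after the fact
    zpool add tank spare c1t7d0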
> Also, it's interesting for me to note that you have 5
> stripes and my 4(4+2) setup would have just one less - so the question to
> answer is whether your extra stripe is better than my 2 extra disks in each
> raid-set?

As I understand it, 5(2+1) would scale to better IOPS performance than 4(4+2), and IOPS represents the performance baseline; as you ask the array to do more and more at once, it'll look more like random seeks.

What you get from those bigger RAID-Z groups of 4+2 is higher performance per group. That said, my few datapoints on 4+1 RAID-Z groups (running on 2 controllers) suggest that that configuration runs into a bottleneck somewhere, and underperforms relative to what's expected.

>> Testing 16 disks locally, however, I do run into noticeable I/O
>> bottlenecks, and I believe it's down to the top limits of the PCI-X bus.
> Yes, too bad Supermicro doesn't make a PCIe-based version... But
> still, the limit of a 64-bit, 133.3MHz PCI-X bus is 1067 MB/s whereas a
> 64-bit, 100MHz PCI-X bus is 800 MB/s - either way, it's much faster than
> my worst case scenario from above where 7 1080p streams would be 189Mb/s...

Oh, the bus will far exceed your needs, I think. The exercise is to specify something that handles what you need without breaking the bank, no?

BTW, where are these HDTV streams coming from/going to? Ethernet? A capture card? (and which ones will work with Solaris?)
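To put rough numbers behind "far exceed" (back-of-the-envelope only, using your own 27 Mbit/s worst case):

    # 7 concurrent 1080p streams at 27 Mbit/s each, converted to MB/s
    echo "scale=1; 7 * 27 / 8" | bc
    # => 23.6

    # ...and that worst case as a fraction of a 133 MHz PCI-X bus (1067 MB/s)
    echo "scale=4; (7 * 27 / 8) / 1067" | bc
    # => .0221

So the whole worst-case streaming load is about 2% of the bus, and well within what a single modern spindle can sustain sequentially on its own.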
[zfs-discuss] PLOGI errors
Hello,
today we made some tests with failed drives on a zpool.
(snv_60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)

On the log we found hundreds of the following errors:

Sep 16 12:04:23 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dca failed state=Timeout, reason=Hardware Error
Sep 16 12:04:23 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11dca failed. state=c reason=1.
Sep 16 12:04:24 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dcc failed state=Timeout, reason=Hardware Error
Sep 16 12:04:24 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 11dcc failed. state=c reason=1.
Sep 16 12:04:43 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11d01 failed state=Timeout, reason=Hardware Error
Sep 16 12:04:43 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11d01 failed. state=c reason=1.
Sep 16 12:04:44 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dca failed state=Timeout, reason=Hardware Error
Sep 16 12:04:44 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 11dca failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dd6 failed state=Timeout, reason=Hardware Error
Sep 16 12:05:04 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11dd6 failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dd6 failed state=Timeout, reason=Hardware Error

Could this be related to
http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ?

Gino
Re: [zfs-discuss] PLOGI errors
Gino,

although these messages show some similarity to the ones in the Sun Alert you are referring to, it looks like this is unrelated. Sun Alert 57773 describes symptoms of a problem seen in SAN configurations with specific switches (Brocade SilkWorm 12000, 24000, 3250, 3850, 3900) running specific FabOS versions (prior to 4.4.0b).

Gino writes:
> Hello,
> today we made some tests with failed drives on a zpool.
> (snv_60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)

Your switch model is different, so I believe Sun Alert 57773 is not applicable here.

Hth,
Victor

> On the log we found hundreds of the following errors:
>
> [PLOGI timeout log excerpt snipped]
>
> Could this be related to
> http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ?
>
> Gino
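If you want to dig further, a few commands that may help show whether the fabric still sees the JBOD ports after the drive pulls. (This is just a sketch: the WWN and the controller path below are placeholders, so substitute your own.)

    # list the local HBA ports and their link state
    fcinfo hba-port

    # show the remote ports a given HBA currently sees on the fabric
    fcinfo remote-port -p <hba-port-wwn>

    # check whether the FC device paths are still configured
    cfgadm -al -o show_FCP_dev

    # dump the fabric map as the fp driver sees it
    luxadm -e dump_map /dev/cfg/c2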
Re: [zfs-discuss] hardware sizing for a zfs-based system?
> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of
> 28911.68 years

This should, of course, set off one's common-sense alert.

> it is 91 times more likely to fail and this system will contain data
> that I don't want to risk losing

If you don't want to risk losing data, you need multiple -- off-site -- copies.

(Incidentally, I rarely see these discussions touch upon what sort of UPS is being used. Power fluctuations are a great source of correlated disk failures.)

Anton
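(The off-site part is less painful than it sounds with ZFS. A rough sketch - the host and dataset names here are invented for illustration:)

    # initial full copy of the archive to a pool on a remote box
    zfs snapshot tank/archive@2007-09-16
    zfs send tank/archive@2007-09-16 | ssh offsite-host zfs receive backup/archive

    # later runs only ship the changes since the previous snapshot
    zfs send -i @2007-09-16 tank/archive@2007-09-23 | \
        ssh offsite-host zfs receive backup/archive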
Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?
> One option I'm still holding on to is to also use the ZFS system as a
> Xen server - that is, OpenSolaris would be running in dom0... Given that
> the Xen hypervisor has a pretty small cpu/memory footprint, do you think
> it could share 2 cores + 4GB with ZFS, or should I allocate 3 cores to
> dom0 and bump the memory up by 512MB?

A dom0 with 4G and 2 cores should be plenty to run ZFS and the support necessary for a reasonable number (<16) of paravirtualised domains. If the guest domains end up using HVM then the dom0 load is higher, but we haven't done the work to quantify this properly yet.

dme.
--
David Edmondson, Solaris Engineering, http://dme.org
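(If you do decide to hold dom0 to that share rather than let it float, my understanding is that it's done with hypervisor options on the xVM boot entry. The snippet below is only illustrative, mirroring the 4G/2-core case; check the option names against the xVM docs for the build you're running:)

    # /boot/grub/menu.lst (illustrative excerpt)
    title Solaris xVM
    kernel$ /boot/$ISADIR/xen.gz dom0_mem=4096M dom0_max_vcpus=2
    module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix
    module$ /platform/i86pc/$ISADIR/boot_archive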
Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?
David Edmondson wrote:
>> One option I'm still holding on to is to also use the ZFS system as a
>> Xen server - that is, OpenSolaris would be running in dom0... Given that
>> the Xen hypervisor has a pretty small cpu/memory footprint, do you think
>> it could share 2 cores + 4GB with ZFS, or should I allocate 3 cores to
>> dom0 and bump the memory up by 512MB?
>
> A dom0 with 4G and 2 cores should be plenty to run ZFS and the support
> necessary for a reasonable number (<16) of paravirtualised domains. If the
> guest domains end up using HVM then the dom0 load is higher, but we haven't
> done the work to quantify this properly yet.

A tasty insight - a million thanks! I think if I get 2 quad-cores and 16GB of memory, I could stomach the overhead of 25% of the CPU and 25% of the memory going to the host, since a dedicated SAN plus another fully-redundant Xen box would cost more.

Cheers!
Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
>> I know what you are saying, but I wonder if it would be noticeable?
>
> Well, "noticeable" again comes back to your workflow. As you point out
> to Richard, it's (theoretically) a 2x IOPS difference, which can be very
> significant for some people.

Yeah, but my point is whether it would be noticeable to *me* (yes, I am a bit self-centered)

> I would say no, not even close to pushing it. Remember, we're
> measuring array performance in MBytes/s, while video throughput is measured
> in Mbit/s (and even then, I imagine that a 27 Mbit/s stream over the air
> is going to be pretty rare). So I'm figuring you're just scratching
> the surface of even a minimal array.
>
> Put it this way: can a single, modern hard drive keep up with an
> ADSL2+ (24 Mbit/s) connection?
> Throw 24 spindles at the problem, and I'd say you have headroom for a
> *lot* of streams.

Sweet! I should probably hang up this thread now, but there are too many other juicy bits to respond to...

> I wasn't sure, with your workload. I know with mine, I'm seeing the
> data store as being mostly temporary. With that much data streaming in
> and out, are you planning on archiving *everything*? Cos that's "only"
> one month's worth of HD video.

Well, not to down-play the importance of my TV recordings - which is really a laugh because I'm not a big TV watcher - I simply don't want to ever have to think about this again after getting it set up.

> I'd consider tuning a portion of the array for high throughput, and
> another for high redundancy as an archive for whatever you don't want
> to lose. Whether that's by setting copies=2, or by having a mirrored
> zpool (smart for an archive, because you'll be less sensitive to the
> write performance hit there), it's up to you...
> ZFS gives us a *lot* of choices. (But then you knew that, and it's
> what brought you to the list :)

All true, but if 4(4+2) serves all my needs, I think it's simpler to administrate, since I can arbitrarily allocate space as needed without worrying about what kind of space it is - all the space is "good and fast" space...

> I also committed to having at least one hot spare, which, after
> staring at relling's graphs for days on end, seems to be the cheapest,
> easiest way of upping the MTTDL for any array. I'd recommend it.

No doubt a hot spare gives you a bump in MTTDL, but double parity trumps it big time - check out Richard's blog...

> As I understand it, 5(2+1) would scale to better IOPS performance than
> 4(4+2), and IOPS represents the performance baseline; as you ask the
> array to do more and more at once, it'll look more like random seeks.
>
> What you get from those bigger RAID-Z groups of 4+2 is higher
> performance per group. That said, my few datapoints on 4+1 RAID-Z
> groups (running on 2 controllers) suggest that that configuration runs
> into a bottleneck somewhere, and underperforms relative to what's expected.

Er? Can anyone fill in the missing blank here?

> Oh, the bus will far exceed your needs, I think.
> The exercise is to specify something that handles what you need
> without breaking the bank, no?

Bank, smank - I build a system every 5+ years and I want it to kick ass all the way until I build the next one - cheers!

> BTW, where are these HDTV streams coming from/going to? Ethernet? A
> capture card? (and which ones will work with Solaris?)
Glad you asked. For the list's sake, I'm using two HDHomeRun tuners (http://www.silicondust.com/wiki/products/hdhomerun) - actually, I bought 3 of them because I felt like I needed a spare :-D

> Yeah, perhaps I've been a bit too circumspect about it, but I haven't
> been all that impressed with my PCI-X bus configuration. Knowing what
> I know now, I might've spec'd something different. Of all the
> suggestions that've gone out on the list, I was most impressed with
> Tim Cook's:
>
>> Won't come cheap, but this mobo comes with 6x PCI-X slots... should
>> get the job done :)
>>
>> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm
>
> That has 3x 133MHz PCI-X slots each connected to the Southbridge via a
> different PCIe bus, which sounds worthy of being the core of the
> demi-Thumper you propose.

Yeah, but getting back to PCIe, I see these tasty SAS/SATA HBAs from LSI:
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html
(note, LSI also sells matching PCI-X HBA controllers, in case you need to balance your mobo's architecture)

> ...But it all depends what you intend to spend. (This is what I
> was going to say in my next blog entry on the system:) We're talking
> about benchmarks that are really far past what you say is your most
> taxing work load. I say I'm "disappointed" with the contention on my
> bus putting limits on maximum throughputs, but really, what I have far
> outstrips my ability to get data into or out of the system.

So moving to the PCIe-based cards shoul
Re: [zfs-discuss] hardware sizing for a zfs-based system?
>> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of
>> 28911.68 years
>
> This should, of course, set off one's common-sense alert.

So true - I pointed the same thing out on this list a while back [sorry, can't find the link], where the number was beyond my lifetime, and folks responded that the "years" unit should not be taken literally - there are way too many variables that can cause wild mischief with these theoretical numbers.

> If you don't want to risk losing data, you need multiple -- off-site --
> copies.

Har, har - like I'm going to do that for my personal family archive ;)

> (Incidentally, I rarely see these discussions touch upon what sort of UPS is
> being used. Power fluctuations are a great source of correlated disk
> failures.)

Glad you brought that up - I currently have an APC 2200XL
(http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET)
- it's rated for 1600 watts, but my current case selections say they come with a 1500W 3+1 power supply. Should I be worried?

Thanks!
Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Kent Watsen wrote:
> Glad you brought that up - I currently have an APC 2200XL
> (http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET)
> - it's rated for 1600 watts, but my current case selections say they
> come with a 1500W 3+1 power supply. Should I be worried?

Probably not - my box has 10 drives and two very thirsty FX74 processors, and it draws 450W max. At 1500W, I'd be more concerned about power bills and cooling than the UPS!

Ian
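For a rough sanity check of your own build (the per-device figures below are typical guesses, not measurements):

    # 24 drives at ~12 W spinning, two quad-core CPUs at ~80 W,
    # plus ~100 W for motherboard, RAM, fans and HBAs
    echo "24*12 + 2*80 + 100" | bc    # steady state  => 548 (W)

    # worst case if all drives spin up at once (~25 W each)
    echo "24*25 + 2*80 + 100" | bc    # spin-up peak  => 860 (W)

Either way, comfortably inside a 1600 W UPS rating.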
Re: [zfs-discuss] NOINUSE_CHECK not working on ZFS
It is weird. Did you run the label subcommand after modifying the partition table? Did you try unsetting NOINUSE_CHECK before running format?

Larry

Bill Casale wrote:
> Sun Fire 280R
>
> Solaris 10 11/06, KU Generic_125100-08
>
> Created a ZFS pool with disk c5t0d5; format c5t0d5 shows the disk is
> part of a ZFS pool.
> Then ran format => partition => modify and was able to change the
> partitioning for it. This resulted in a panic and crash when a zpool
> status was run. From what I can tell, the in-use check (which setting
> NOINUSE_CHECK would disable) should prevent modification of a partition
> that's part of a ZFS pool. I verified that NOINUSE_CHECK=1 is not set
> in the environment. Also, this is on a non-clustered system.
>
> Any ideas on why this is happening?
>
> --
> Thanks,
> Bill
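For reference, a quick way to double-check before repartitioning (the pool name below is a placeholder; the disk name is the one from your report):

    # confirm the variable really isn't set in the shell that runs format
    env | grep NOINUSE_CHECK        # expect no output

    # confirm the disk is still claimed by the pool before touching it
    zpool status -v | grep c5t0d5

    # if the slices must change anyway, export the pool first so the
    # device isn't active while format rewrites the label
    zpool export <poolname>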