Re: [zfs-discuss] hardware sizing for a zfs-based system?
Heya Kent,

Kent Watsen wrote:
>> It sounds good, that way, but (in theory), you'll see random I/O
>> suffer a bit when using RAID-Z2: the extra parity will drag
>> performance down a bit.
> I know what you are saying, but I wonder if it would be noticeable?

Well, "noticeable" again comes back to your workflow. As you point out to Richard, it's (theoretically) a 2x IOPS difference, which can be very significant for some people.

> I think my worst case scenario would be 3 myth frontends watching 1080p
> content while 4 tuners are recording 1080p content - with each 1080p
> stream being 27Mb/s, that would be 108Mb/s writes and 81Mb/s reads (all
> sequential I/O) - does that sound like it would even come close to
> pushing a 4(4+2) array?

I would say no, not even close to pushing it. Remember, we're measuring array performance in MBytes/s, while video throughput is measured in Mbit/s (and even then, I imagine that a 27 Mbit/s stream over the air is going to be pretty rare). So I'm figuring you're just scratching the surface of even a minimal array.

Put it this way: can a single, modern hard drive keep up with an ADSL2+ (24 Mbit/s) connection? Throw 24 spindles at the problem, and I'd say you have headroom for a *lot* of streams.

>> The RAS guys will flinch at this, but have you considered 8*(2+1)
>> RAID-Z1?
> That configuration showed up in the output of the program I posted back
> in July
> (http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041778.html):
>
> 24 bays w/ 500 GB drives having MTBF=5 years
>  - can have 8 (2+1) w/ 0 spares providing 8000 GB with MTTDL of 95.05 years
>  - can have 4 (4+2) w/ 0 spares providing 8000 GB with MTTDL of 8673.50 years
>
> But it is 91 times more likely to fail, and this system will contain data
> that I don't want to risk losing

I wasn't sure, with your workload. I know with mine, I'm seeing the data store as being mostly temporary. With that much data streaming in and out, are you planning on archiving *everything*? Cos that's "only" one month's worth of HD video.

I'd consider tuning a portion of the array for high throughput, and another for high redundancy as an archive for whatever you don't want to lose. Whether that's by setting copies=2, or by having a mirrored zpool (smart for an archive, because you'll be less sensitive to the write performance hit there), it's up to you... ZFS gives us a *lot* of choices. (But then you knew that, and it's what brought you to the list :)

>> I don't want to over-pimp my links, but I do think my blogged
>> experiences with my server (also linked in another thread) might give
>> you something to think about:
>> http://lindsay.at/blog/archive/tag/zfs-performance/
> I see that you also set up a video server (myth?),

For the uncompressed HD test case, no. It'd be for storage/playout of Ultra-Grid-like streams, and really, that's there so our network guys can give their 10Gb links a little bit of a workout.

> from your blog, I
> think you are doing 5(2+1) (plus a hot-spare?) - this is what my
> program says about a 16-bay system:
>
> 16 bays w/ 500 GB drives having MTBF=5 years
>  - can have 5 (2+1) w/ 1 spares providing 5000 GB with MTTDL of 1825.00 years
> [snipped some interesting numbers]
> Note that our MTTDL isn't quite as bad as 8(2+1) since you have three
> fewer stripes.

I also committed to having at least one hot spare, which, after staring at relling's graphs for days on end, seems to be the cheapest, easiest way of upping the MTTDL for any array. I'd recommend it.
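For what it's worth, all three of those knobs are one-liners. A minimal sketch, assuming a main pool called "tank" and with the device names made up for illustration:

    # keep two copies of just the datasets you care about
    zfs create tank/archive
    zfs set copies=2 tank/archive

    # ...or keep a separate mirrored pool purely as the archive
    zpool create archive mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

    # and a hot spare is cheap to add after the fact
    zpool add tank spare c1t7d0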
> Also, it's interesting for me to note that you have 5
> stripes and my 4(4+2) setup would have just one less - so the question to
> answer is whether your extra stripe is better than my 2 extra disks in each
> raid-set?

As I understand it, 5(2+1) would scale to better IOPS performance than 4(4+2), and IOPS represents the performance baseline; as you ask the array to do more and more at once, it'll look more like random seeks.

What you get from those bigger RAID-Z groups of 4+2 is higher performance per group. That said, my few datapoints on 4+1 RAID-Z groups (running on 2 controllers) suggest that that configuration runs into a bottleneck somewhere, and underperforms relative to what's expected.

>> Testing 16 disks locally, however, I do run into noticeable I/O
>> bottlenecks, and I believe it's down to the top limits of the PCI-X bus.
> Yes, too bad Supermicro doesn't make a PCIe-based version... But
> still, the limit of a 64-bit, 133.3MHz PCI-X bus is 1067 MB/s whereas a
> 64-bit, 100MHz PCI-X bus is 800 MB/s - either way, it's much faster than
> my worst case scenario from above where 7 1080p streams would be 189Mb/s...

Oh, the bus will far exceed your needs, I think. The exercise is to specify something that handles what you need without breaking the bank, no?

BTW, where are these HDTV streams coming from/going to? Ethernet? A capture card? (and which ones will work with Solaris?)
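To put rough numbers behind "far exceed" (back-of-the-envelope only, using your own 27 Mbit/s worst case):

    # 7 concurrent 1080p streams at 27 Mbit/s each, converted to MB/s
    echo "scale=1; 7 * 27 / 8" | bc
    # => 23.6

    # ...and that worst case as a fraction of a 133 MHz PCI-X bus (1067 MB/s)
    echo "scale=4; (7 * 27 / 8) / 1067" | bc
    # => .0221

So the whole worst-case streaming load is about 2% of the bus, and well within what a single modern spindle can sustain sequentially on its own.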
[zfs-discuss] PLOGI errors
Hello,
today we made some tests with failed drives on a zpool.
(snv_60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)

On the log we found hundreds of the following errors:

Sep 16 12:04:23 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dca failed state=Timeout, reason=Hardware Error
Sep 16 12:04:23 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11dca failed. state=c reason=1.
Sep 16 12:04:24 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dcc failed state=Timeout, reason=Hardware Error
Sep 16 12:04:24 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 11dcc failed. state=c reason=1.
Sep 16 12:04:43 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11d01 failed state=Timeout, reason=Hardware Error
Sep 16 12:04:43 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11d01 failed. state=c reason=1.
Sep 16 12:04:44 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dca failed state=Timeout, reason=Hardware Error
Sep 16 12:04:44 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 11dca failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dd6 failed state=Timeout, reason=Hardware Error
Sep 16 12:05:04 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11dd6 failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dd6 failed state=Timeout, reason=Hardware Error

Could this be related to
http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ?

Gino
Re: [zfs-discuss] PLOGI errors
Gino,

although these messages show some similarity to the ones in the Sun Alert you are referring to, it looks like this is unrelated. Sun Alert 57773 describes symptoms of a problem seen in SAN configurations with specific switches (Brocade SilkWorm 12000, 24000, 3250, 3850, 3900) running specific FabOS versions (prior to 4.4.0b).

Gino writes:
> Hello,
> today we made some tests with failed drives on a zpool.
> (snv_60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)

Your switch model is different, so I believe Sun Alert 57773 is not applicable here.

Hth,
Victor

> On the log we found hundreds of the following errors:
>
> [PLOGI timeout log excerpt snipped]
>
> Could this be related to
> http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ?
>
> Gino
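If you want to dig further, a few commands that may help show whether the fabric still sees the JBOD ports after the drive pulls. (This is just a sketch: the WWN and the controller path below are placeholders, so substitute your own.)

    # list the local HBA ports and their link state
    fcinfo hba-port

    # show the remote ports a given HBA currently sees on the fabric
    fcinfo remote-port -p <hba-port-wwn>

    # check whether the FC device paths are still configured
    cfgadm -al -o show_FCP_dev

    # dump the fabric map as the fp driver sees it
    luxadm -e dump_map /dev/cfg/c2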
Re: [zfs-discuss] hardware sizing for a zfs-based system?
> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of
> 28911.68 years

This should, of course, set off one's common-sense alert.

> it is 91 times more likely to fail and this system will contain data
> that I don't want to risk losing

If you don't want to risk losing data, you need multiple -- off-site -- copies.

(Incidentally, I rarely see these discussions touch upon what sort of UPS is being used. Power fluctuations are a great source of correlated disk failures.)

Anton
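(The off-site part is less painful than it sounds with ZFS. A rough sketch - the host and dataset names here are invented for illustration:)

    # initial full copy of the archive to a pool on a remote box
    zfs snapshot tank/archive@2007-09-16
    zfs send tank/archive@2007-09-16 | ssh offsite-host zfs receive backup/archive

    # later runs only ship the changes since the previous snapshot
    zfs send -i @2007-09-16 tank/archive@2007-09-23 | \
        ssh offsite-host zfs receive backup/archive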
Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?
> One option I'm still holding on to is to also use the ZFS system as a
> Xen server - that is, OpenSolaris would be running in dom0... Given that
> the Xen hypervisor has a pretty small cpu/memory footprint, do you think
> it could share 2 cores + 4GB with ZFS, or should I allocate 3 cores to
> dom0 and bump the memory up by 512MB?

A dom0 with 4G and 2 cores should be plenty to run ZFS and the support necessary for a reasonable number (<16) of paravirtualised domains. If the guest domains end up using HVM then the dom0 load is higher, but we haven't done the work to quantify this properly yet.

dme.
--
David Edmondson, Solaris Engineering, http://dme.org
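(If you do decide to hold dom0 to that share rather than let it float, my understanding is that it's done with hypervisor options on the xVM boot entry. The snippet below is only illustrative, mirroring the 4G/2-core case; check the option names against the xVM docs for the build you're running:)

    # /boot/grub/menu.lst (illustrative excerpt)
    title Solaris xVM
    kernel$ /boot/$ISADIR/xen.gz dom0_mem=4096M dom0_max_vcpus=2
    module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix
    module$ /platform/i86pc/$ISADIR/boot_archive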
Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?
David Edmondson wrote:
>> One option I'm still holding on to is to also use the ZFS system as a
>> Xen server - that is, OpenSolaris would be running in dom0... Given that
>> the Xen hypervisor has a pretty small cpu/memory footprint, do you think
>> it could share 2 cores + 4GB with ZFS, or should I allocate 3 cores to
>> dom0 and bump the memory up by 512MB?
>
> A dom0 with 4G and 2 cores should be plenty to run ZFS and the support
> necessary for a reasonable number (<16) of paravirtualised domains. If the
> guest domains end up using HVM then the dom0 load is higher, but we haven't
> done the work to quantify this properly yet.

A tasty insight - a million thanks! I think if I get 2 quad-cores and 16GB of memory, I could stomach the overhead of 25% of the CPU and 25% of the memory going to the host, since a dedicated SAN plus another fully-redundant Xen box would cost more.

Cheers!
Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
>> I know what you are saying, but I wonder if it would be noticeable?
>
> Well, "noticeable" again comes back to your workflow. As you point out
> to Richard, it's (theoretically) a 2x IOPS difference, which can be very
> significant for some people.

Yeah, but my point is whether it would be noticeable to *me* (yes, I am a bit self-centered)

> I would say no, not even close to pushing it. Remember, we're
> measuring array performance in MBytes/s, while video throughput is measured
> in Mbit/s (and even then, I imagine that a 27 Mbit/s stream over the air
> is going to be pretty rare). So I'm figuring you're just scratching
> the surface of even a minimal array.
>
> Put it this way: can a single, modern hard drive keep up with an
> ADSL2+ (24 Mbit/s) connection?
> Throw 24 spindles at the problem, and I'd say you have headroom for a
> *lot* of streams.

Sweet! I should probably hang up this thread now, but there are too many other juicy bits to respond to...

> I wasn't sure, with your workload. I know with mine, I'm seeing the
> data store as being mostly temporary. With that much data streaming in
> and out, are you planning on archiving *everything*? Cos that's "only"
> one month's worth of HD video.

Well, not to down-play the importance of my TV recordings - which is really a laugh because I'm not a big TV watcher - I simply don't want to ever have to think about this again after getting it set up.

> I'd consider tuning a portion of the array for high throughput, and
> another for high redundancy as an archive for whatever you don't want
> to lose. Whether that's by setting copies=2, or by having a mirrored
> zpool (smart for an archive, because you'll be less sensitive to the
> write performance hit there), it's up to you...
> ZFS gives us a *lot* of choices. (But then you knew that, and it's
> what brought you to the list :)

All true, but if 4(4+2) serves all my needs, I think it's simpler to administrate, since I can arbitrarily allocate space as needed without worrying about what kind of space it is - all the space is "good and fast" space...

> I also committed to having at least one hot spare, which, after
> staring at relling's graphs for days on end, seems to be the cheapest,
> easiest way of upping the MTTDL for any array. I'd recommend it.

No doubt a hot spare gives you a bump in MTTDL, but double parity trumps it big time - check out Richard's blog...

> As I understand it, 5(2+1) would scale to better IOPS performance than
> 4(4+2), and IOPS represents the performance baseline; as you ask the
> array to do more and more at once, it'll look more like random seeks.
>
> What you get from those bigger RAID-Z groups of 4+2 is higher
> performance per group. That said, my few datapoints on 4+1 RAID-Z
> groups (running on 2 controllers) suggest that that configuration runs
> into a bottleneck somewhere, and underperforms relative to what's expected.

Er? Can anyone fill in the missing blank here?

> Oh, the bus will far exceed your needs, I think.
> The exercise is to specify something that handles what you need
> without breaking the bank, no?

Bank, smank - I build a system every 5+ years and I want it to kick ass all the way until I build the next one - cheers!

> BTW, where are these HDTV streams coming from/going to? Ethernet? A
> capture card? (and which ones will work with Solaris?)
Glad you asked. For the list's sake, I'm using two HDHomeRun tuners (http://www.silicondust.com/wiki/products/hdhomerun) - actually, I bought 3 of them because I felt like I needed a spare :-D

> Yeah, perhaps I've been a bit too circumspect about it, but I haven't
> been all that impressed with my PCI-X bus configuration. Knowing what
> I know now, I might've spec'd something different. Of all the
> suggestions that've gone out on the list, I was most impressed with
> Tim Cook's:
>
>> Won't come cheap, but this mobo comes with 6x PCI-X slots... should
>> get the job done :)
>>
>> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm
>
> That has 3x 133MHz PCI-X slots each connected to the Southbridge via a
> different PCIe bus, which sounds worthy of being the core of the
> demi-Thumper you propose.

Yeah, but getting back to PCIe, I see these tasty SAS/SATA HBAs from LSI:
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html
(note, LSI also sells matching PCI-X HBA controllers, in case you need to balance your mobo's architecture)

> ...But it all depends what you intend to spend. (This is what I
> was going to say in my next blog entry on the system:) We're talking
> about benchmarks that are really far past what you say is your most
> taxing work load. I say I'm "disappointed" with the contention on my
> bus putting limits on maximum throughputs, but really, what I have far
> outstrips my ability to get data into or out of the system.

So moving to the PCIe-based cards shoul
Re: [zfs-discuss] hardware sizing for a zfs-based system?
>> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of
>> 28911.68 years
>
> This should, of course, set off one's common-sense alert.

So true - I pointed the same thing out on this list a while back [sorry, can't find the link], where the number was beyond my lifetime, and folks responded that the "years" unit should not be taken literally - there are way too many variables that can cause wild mischief with these theoretical numbers.

> If you don't want to risk losing data, you need multiple -- off-site --
> copies.

Har, har - like I'm going to do that for my personal family archive ;)

> (Incidentally, I rarely see these discussions touch upon what sort of UPS is
> being used. Power fluctuations are a great source of correlated disk
> failures.)

Glad you brought that up - I currently have an APC 2200XL
(http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET)
- it's rated for 1600 watts, but my current case selections say they come with a 1500W 3+1 power supply. Should I be worried?

Thanks!
Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Kent Watsen wrote:
> Glad you brought that up - I currently have an APC 2200XL
> (http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET)
> - it's rated for 1600 watts, but my current case selections say they
> come with a 1500W 3+1 power supply. Should I be worried?

Probably not - my box has 10 drives and two very thirsty FX74 processors, and it draws 450W max. At 1500W, I'd be more concerned about power bills and cooling than the UPS!

Ian
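For a rough sanity check of your own build (the per-device figures below are typical guesses, not measurements):

    # 24 drives at ~12 W spinning, two quad-core CPUs at ~80 W,
    # plus ~100 W for motherboard, RAM, fans and HBAs
    echo "24*12 + 2*80 + 100" | bc    # steady state  => 548 (W)

    # worst case if all drives spin up at once (~25 W each)
    echo "24*25 + 2*80 + 100" | bc    # spin-up peak  => 860 (W)

Either way, comfortably inside a 1600 W UPS rating.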
Re: [zfs-discuss] NOINUSE_CHECK not working on ZFS
It is weird. Did you run the label subcommand after modifying the partition table? Did you try unsetting NOINUSE_CHECK before running format?

Larry

Bill Casale wrote:
> Sun Fire 280R
>
> Solaris 10 11/06, KU Generic_125100-08
>
> Created a ZFS pool with disk c5t0d5; format c5t0d5 shows the disk is
> part of a ZFS pool.
> Then ran format => partition => modify and was able to change the
> partitioning for it. This resulted in a panic and crash when a zpool
> status was run. From what I can tell, the in-use check (which setting
> NOINUSE_CHECK would disable) should prevent modification of a partition
> that's part of a ZFS pool. I verified that NOINUSE_CHECK=1 is not set
> in the environment. Also, this is on a non-clustered system.
>
> Any ideas on why this is happening?
>
> --
> Thanks,
> Bill
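For reference, a quick way to double-check before repartitioning (the pool name below is a placeholder; the disk name is the one from your report):

    # confirm the variable really isn't set in the shell that runs format
    env | grep NOINUSE_CHECK        # expect no output

    # confirm the disk is still claimed by the pool before touching it
    zpool status -v | grep c5t0d5

    # if the slices must change anyway, export the pool first so the
    # device isn't active while format rewrites the label
    zpool export <poolname>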