Re: [zfs-discuss] panic with zfs

2007-01-29 Thread Ihsan Dogan
On 24.1.2007 at 15:49, Michael Schuster wrote:

>> I am going to create the same conditions here but with snv_55b and
>> then yank
>> a disk from my zpool.  If I get a similar response then I will *hope*
>> for a
>> crash dump.
>>
>> You must be kidding about the "open a case" however.  This is
>> OpenSolaris.
> 
> no, I'm not. That's why I said "If you have a supported version of
> Solaris". Also, Ihsan seems to disagree about OpenSolaris:

I opened a case this morning. Let's see what the support guys say.



Ihsan

-- 
[EMAIL PROTECTED]   http://ihsan.dogan.ch/
http://gallery.dogan.ch/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?

2007-01-29 Thread Roch - PAE
Anantha N. Srirama writes:
 > Agreed, I guess I didn't articulate my point/thought very well. The
 > best config is to present JBODs and let ZFS provide the data
 > protection. This has been a very stimulating conversation thread; it
 > is shedding new light on how best to use ZFS. 
 >  
 >  

I would say:

To enable the unique ZFS feature of self-healing,
ZFS must be allowed to manage a level of
redundancy: mirroring or RAID-Z.

The type of LUNs (JBOD/RAID-*/iSCSI) used is not
relevant to this statement.

Now, if one also relies on ZFS to reconstruct data in the
face of disk failures (as opposed to storage-based
reconstruction), better make sure that single/double disk
failures do not bring down multiple LUNs at once. So better
protection is achieved by configuring LUNs that map to
segregated sets of physical things (disks & controllers).
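
For illustration, a minimal sketch of letting ZFS own that redundancy
across LUNs presented from separate controllers (device names are
hypothetical):

  # zpool create tank mirror c2t0d0 c3t0d0
or, RAID-Z across four segregated LUNs instead:
  # zpool create tank raidz c2t0d0 c2t1d0 c3t0d0 c3t1d0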

-r

 > This message posted from opensolaris.org
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ftruncate is failing on ZFS

2007-01-29 Thread dudekula mastan
Hi All,
   
  In my test setup, I have one zpool of size 1000 MB, and it has only 30 MB of 
free space (970 MB is used for some other purpose). On this zpool I created a 
file (using the open() call) and attempted to write 2 MB of data to it (with the 
write() call), but the write failed. It wrote only about 1.3 MB (the return value 
of the write() call) because of "No space left on the device". After that I tried 
to truncate the file down to the 1.3 MB that was written, but the truncate is 
failing as well.
   
  Any clues on this?
   
  -Masthan

 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: can I use zfs on just a partition?

2007-01-29 Thread Brian Hechinger
On Sun, Jan 28, 2007 at 01:53:04PM +0100, [EMAIL PROTECTED] wrote:
> 
> >is this tuneable somehow/somewhere? can i enable writecache if only using a 
> >dedicated partition ?
> 
> It does put the additional data at somewhat of a risk; not really
> for swap but perhaps not nice for UFS.

How about two partitions used in two different ZPOOLs?  Once ZFS boot comes
along, I'm sure RAIDZ(2) won't be supported as a boot device.  If that's the
case, I wouldn't mind splitting the disks into a mirrored OS portion and a
RAIDZ data portion (think of a system with 3 or 4 disks).
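
A minimal sketch of that split, assuming four disks each sliced into an s0
(OS) slice and an s1 (data) slice, and assuming ZFS boot has arrived (all
names hypothetical):

  # zpool create rpool mirror c0t0d0s0 c0t1d0s0
  # zpool create data raidz c0t0d0s1 c0t1d0s1 c0t2d0s1 c0t3d0s1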

-brian
-- 
"The reason I don't use Gnome: every single other window manager I know of is
very powerfully extensible, where you can switch actions to different mouse
buttons. Guess which one is not, because it might confuse the poor users?
Here's a hint: it's not the small and fast one."--Linus
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-29 Thread George Wilson

Ihsan,

If you are running Solaris 10 then you are probably hitting:

6456939 sd_send_scsi_SYNCHRONIZE_CACHE_biodone() can issue TUR which 
calls biowait() and deadlock/hangs host


This was fixed in opensolaris (build 48) but a patch is not yet 
available for Solaris 10.


Thanks,
George

Ihsan Dogan wrote:

On 24.1.2007 at 15:49, Michael Schuster wrote:


I am going to create the same conditions here but with snv_55b and
then yank
a disk from my zpool.  If I get a similar response then I will *hope*
for a
crash dump.

You must be kidding about the "open a case" however.  This is
OpenSolaris.

no, I'm not. That's why I said "If you have a supported version of
Solaris". Also, Ihsan seems to disagree about OpenSolaris:


I opened a case this morning. Let's see what the support guys say.



Ihsan


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Darren Dunham
> > Our Netapp does double-parity RAID.  In fact, the filesystem design is
> > remarkably similar to that of ZFS.  Wouldn't that also detect the
> > error?  I suppose it depends if the `wrong sector without notice'
> > error is repeated each time.  Or is it random?
> 
> On most (all?) other systems the parity only comes into effect when a  
> drive fails. When all the drives are reporting "OK" most (all?) RAID  
> systems don't use the parity data at all. ZFS is the first (only?)  
> system that actively checks the data returned from disk, regardless  
> of whether the drives are reporting they're okay or not.
> 
> I'm sure I'll be corrected if I'm wrong. :)

Netapp/OnTAP does do read verification, but it does it outside the
raid-4/raid-dp protection (just like ZFS does it outside the raidz
protection).  So it's correct that the parity data is not read at all in
either OnTAP or ZFS, but both attempt to do verification of the data on
all reads.

See also: http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data for a
few more specifics on it and the differences from the ZFS data check.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Adding my own compression to zfs

2007-01-29 Thread Dick Davies

Have a look at:

 http://blogs.sun.com/ahl/entry/a_little_zfs_hack

On 27/01/07, roland <[EMAIL PROTECTED]> wrote:

is it planned to add some other compression algorithm to zfs ?

lzjb is quite good and especially performs very well, but i`d like to have 
better compression (bzip2?) - no matter how much performance drops with this.

regards
roland


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] data wanted: disk kstats

2007-01-29 Thread Richard Elling

Robert Milkowski wrote:

Hello Richard,

Friday, January 26, 2007, 11:36:07 PM, you wrote:

RE> We've been talking a lot recently about failure rates and types of
RE> failures.  As you may know, I do look at field data and generally don't
RE> ask the group for more data.  But this time, for various reasons (I
RE> might have found a bug or deficiency) I'm soliciting for more data at
RE> large.

RE> What I'd like to gather is the error rates per bytes transferred. This
RE> data is collected in kstats, but is reset when you reboot.  One of the
RE> features of my vast collection of field data is that it is often collected
RE> rather soon after a reboot. Thus, there aren't very many bytes transferred
RE> yet, and the corresponding error rates tend to be small (often 0).  A 
perfect
RE> collection would be from a machine connected to lots of busy disks which
RE> has been up for a very long time.

RE> Can you help?  It is real simple.  Just email me the output of:

I've sent you off list.


Thanks.


Will those results (total statistics, not site specific) be publicly
provided by you (here?)?


Sure.
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Adding my own compression to zfs

2007-01-29 Thread Cindy . Swearingen

See the following bug:

http://bugs.opensolaris.org/view_bug.do?bug_id=6280662

Cindy

roland wrote:

is it planned to add some other compression algorithm to zfs ?

lzjb is quite good and especially performs very well, but i`d like to have better compression (bzip2?) - no matter how much performance drops with this. 


regards
roland
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-29 Thread Jonathan Edwards

On Jan 26, 2007, at 09:16, Jeffery Malloch wrote:


Hi Folks,

I am currently in the midst of setting up a completely new file  
server using a pretty well loaded Sun T2000 (8x1GHz, 16GB RAM)  
connected to an Engenio 6994 product (I work for LSI Logic so  
Engenio is a no brainer).  I have configured a couple of zpools  
from Volume groups on the Engenio box - 1x2.5TB and 1x3.75TB.  I  
then created sub zfs systems below that and set quotas and  
sharenfs'd them so that it appears that these "file systems" are  
dynamically shrinkable and growable.


ah - the 6994 is the controller we use in the 6140/6540 if i'm not  
mistaken .. i guess this thread will go down in a flaming JBOD vs  
RAID controller religious war again .. oops, too late :P


yes - the dynamic LUN expansion bits in ZFS are quite nice and handy  
for managing dynamic growth of a pool or file system.  so going back  
to Jeffery's original questions:




1.  How stable is ZFS?  The Engenio box is completely configured  
for RAID5 with hot spares and write cache (8GB) has battery backup  
so I'm not too concerned from a hardware side.  I'm looking for an  
idea of how stable ZFS itself is in terms of corruptability, uptime  
and OS stability.


I think the stability issue has already been answered pretty well ..

8GB battery backed cache is nice .. performance wise you might find  
some odd interactions with the ZFS adaptive cache integration and the  
way in which the intent log operates (O_DSYNC writes can potentially  
impose a lot of in flight commands for relatively little work) -  
there's a max blocksize of 128KB (also maxphys), so you might want to  
experiment with tuning back the stripe width .. i seem to recall the  
6994 controller seemed to perform best with 256KB or 512KB stripe  
width .. so there may be additional tuning on the read-ahead or write- 
behind algorithms.


2.  Recommended config.  Above, I have a fairly simple setup.  In  
many of the examples the granularity is home directory level and  
when you have many many users that could get to be a bit of a  
nightmare administratively.  I am really only looking for high  
level dynamic size adjustability and am not interested in its built  
in RAID features.  But given that, any real world recommendations?


Not being interested in the RAID functionality as Roch points out  
eliminates the self-healing functionality and reconstruction bits in  
ZFS .. but you still get other nice benefits like dynamic LUN expansion


As i see it, since we seem to have excess CPU and bus capacity on  
newer systems (most applications haven't quite caught up to impose  
enough of a load yet) .. we're back to the mid '90s where host based  
volume management and caching makes sense and is being proposed  
again.  Being proactive, we might want to consider putting an  
embedded Solaris/ZFS on a RAID controller to see if we've really got  
something novel in the caching and RAID algorithms for when the  
application load really does catch up and impose more of a load on  
the host.  Additionally - we're seeing that there's a big benefit in  
moving the filesystem closer to the storage array since most users  
care more about their consistency of their data (upper level) than  
the reliability of the disk subsystem or RAID controller.   
Implementing a RAID controller that's more intimately aware of the  
upper data levels seems like the next logical evolutionary step.


3.  Caveats?  Anything I'm missing that isn't in the docs that  
could turn into a BIG gotchya?


I would say be careful of the ease at which you can destroy file  
systems and pools .. while convenient - there's typically no warning  
if you or an administrator does a zfs or zpool destroy .. so i could  
see that turning into an issue.  Also if a LUN goes offline, you may  
not see this right away and you would have the potential to corrupt  
your pool or panic your system.  Hence the self-healing and scrub  
options to detect and repair failure a little bit faster.  People on  
this forum have been finding RAID controller inconsistencies .. hence  
the religious JBOD vs RAID ctlr "disruptive paradigm shift"


4.  Since all data access is via NFS we are concerned that 32 bit  
systems (Mainly Linux and Windows via Samba) will not be able to  
access all the data areas of a 2TB+ zpool even if the zfs quota on  
a particular share is less then that.  Can anyone comment?


Doing 2TB+ shouldn't be a problem for the NFS or Samba mounted  
filesystem regardless if the host is 32bit or not.  The only place  
where you can run into a problem is if the size of an individual file  
crosses 2 or 4TB on a 32bit system.  I know we've implemented file  
systems (QFS in this case) that were samba shared to 32bit windows  
hosts in excess of 40-100TB without any major issues.  I'm sure  
there's similar cases with ZFS and thumper .. i just don't have that  
data.


a little late to the discussion, but hth
---
.je
___

[zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Jeffery Malloch
Hi Guys,

SO...

From what I can tell from this thread, ZFS is VERY fussy about managing 
writes, reads and failures.  It wants to be bit perfect.  So if you use the 
hardware that comes with a given solution (in my case an Engenio 6994) to 
manage failures, you risk a) bad writes that don't get picked up due to 
corruption from write cache to disk, and b) failures due to data changes that 
ZFS is unaware of, which the hardware imposes when it tries to fix itself.

So now I have a $70K+ lump that's useless for what it was designed for.  I 
should have spent $20K on a JBOD.  But since I didn't do that, it sounds like a 
traditional model works best (ie. UFS et al) for the type of hardware I have.  
No sense paying for something and not using it.  And by using ZFS just as a 
method for ease of file system growth and management I risk much more 
corruption.

The other thing I haven't heard is why NOT to use ZFS.  Or people who don't 
like it for some reason or another.

Comments?

Thanks,

Jeff

PS - the responses so far have been great and are much appreciated!  Keep 'em 
coming...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Jason J. W. Williams

Hi Jeff,

Maybe I mis-read this thread, but I don't think anyone was saying that
using ZFS on-top of an intelligent array risks more corruption. Given
my experience, I wouldn't run ZFS without some level of redundancy,
since it will panic your kernel in a RAID-0 scenario where it detects
a LUN is missing and can't fix it. That being said, I wouldn't run
anything but ZFS anymore. When we had some database corruption issues
awhile back, ZFS made it very simple to prove it was the DB. Just did
a scrub and boom, verification that the data was laid down correctly.
RAID-5 will have better random read performance than RAID-Z for reasons
Robert had to beat into my head. ;-) But if you really need that
performance, perhaps RAID-10 is what you should be looking at? Someone
smarter than I can probably give a better idea.

Regarding failure detection, does anyone on the list have the
ZFS/FMA traps fed into a network management app yet? I'm curious what
the experience with it is.

Best Regards,
Jason

On 1/29/07, Jeffery Malloch <[EMAIL PROTECTED]> wrote:

Hi Guys,

SO...

From what I can tell from this thread, ZFS is VERY fussy about managing 
writes, reads and failures.  It wants to be bit perfect.  So if you use the 
hardware that comes with a given solution (in my case an Engenio 6994) to manage 
failures you risk a) bad writes that don't get picked up due to corruption from 
write cache to disk b) failures due to data changes that ZFS is unaware of that 
the hardware imposes when it tries to fix itself.

So now I have a $70K+ lump that's useless for what it was designed for.  I 
should have spent $20K on a JBOD.  But since I didn't do that, it sounds like a 
traditional model works best (ie. UFS et al) for the type of hardware I have.  
No sense paying for something and not using it.  And by using ZFS just as a 
method for ease of file system growth and management I risk much more 
corruption.

The other thing I haven't heard is why NOT to use ZFS.  Or people who don't 
like it for some reason or another.

Comments?

Thanks,

Jeff

PS - the responses so far have been great and are much appreciated!  Keep 'em 
coming...


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-29 Thread Jason J. W. Williams

Thank you for the detailed explanation. It is very helpful to
understand the issue. Is anyone successfully using SNDR with ZFS yet?

Best Regards,
Jason

On 1/26/07, Jim Dunham <[EMAIL PROTECTED]> wrote:

Jason J. W. Williams wrote:
> Could the replication engine eventually be integrated more tightly
> with ZFS?
Not in its present form. The architecture and implementation of
Availability Suite is driven off block-based replication at the device
level (/dev/rdsk/...), something that allows the product to replicate
any Solaris file system, database, etc., without any knowledge of what
it is actually replicating.

To pursue ZFS replication in the manner of Availability Suite, one needs
to see what replication looks like from an abstract point of view. So
simplistically, remote replication is like the letter 'h', where the
left side of the letter is the complete I/O path on the primary node,
the horizontal part of the letter is the remote replication network
link, and the right side of the letter is only the bottom half of the
complete I/O path on the secondary node.

Next ZFS would have to have its functional I/O path split into two
halves, a top and bottom piece.  Next we configure replication, the
letter 'h', between two given nodes, running both a top and bottom piece
of ZFS on the source node, and just the bottom half of ZFS on the
secondary node.

Today, the SNDR component of Availability Suite works like the letter
'h', where we split the Solaris I/O stack into a top and bottom
half. The top half is that software (file system, database or
application I/O) that directs its I/Os to the bottom half (raw device,
volume manager or block device).

So all that needs to be done is to design and build a new variant of the
letter 'h', and find the place to separate ZFS into two pieces.

- Jim Dunham

>
> That would be slick alternative to send/recv.
>
> Best Regards,
> Jason
>
> On 1/26/07, Jim Dunham <[EMAIL PROTECTED]> wrote:
>> Project Overview:
>>
>> I propose the creation of a project on opensolaris.org, to bring to
>> the community two Solaris host-based data services; namely volume
>> snapshot and volume replication. These two data services exist today
>> as the Sun StorageTek Availability Suite, a Solaris 8, 9 & 10,
>> unbundled product set, consisting of Instant Image (II) and Network
>> Data Replicator (SNDR).
>>
>> Project Description:
>>
>> Although Availability Suite is typically known as just two data
>> services (II & SNDR), there is an underlying Solaris I/O filter
>> driver framework which supports these two data services. This
>> framework provides the means to stack one or more block-based, pseudo
>> device drivers on to any pre-provisioned cb_ops structure, [
>> 
http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs
>> ], thereby shunting all cb_ops I/O into the top of a developed filter
>> driver, (for driver specific processing), then out the bottom of this
>> filter driver, back into the original cb_ops entry points.
>>
>> Availability Suite was developed to interpose itself on the I/O stack
>> of a block device, providing a filter driver framework with the means
>> to intercept any I/O originating from an upstream file system,
>> database or application layer I/O. This framework provided the means
>> for Availability Suite to support snapshot and remote replication
>> data services for UFS, QFS, VxFS, and more recently the ZFS file
>> system, plus various databases like Oracle, Sybase and PostgreSQL,
>> and also application I/Os. By providing a filter driver at this point
>> in the Solaris I/O stack, it allows for any number of data services
>> to be implemented, without regard to the underlying block storage
>> that they will be configured on. Today, as a snapshot and/or
>> replication solution, the framework allows both the source and
>> destination block storage device to not only differ in physical
>> characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical
>> characteristics such as in RAID type, volume managed storage (i.e.,
>> SVM, VxVM), lofi, zvols, even ram disks.
>>
>> Community Involvement:
>>
>> By providing this filter-driver framework, two working filter drivers
>> (II & SNDR), and an extensive collection of supporting software and
>> utilities, it is envisioned that those individuals and companies that
>> adopt OpenSolaris as a viable storage platform, will also utilize and
>> enhance the existing II & SNDR data services, plus have offered to
>> them the means in which to develop their own block-based filter
>> driver(s), further enhancing the use and adoption on OpenSolaris.
>>
>> A very timely example that is very applicable to Availability Suite
>> and the OpenSolaris community, is the recent announcement of the
>> Project Proposal: lofi [ compression & encryption ] -
>> http://www.opensolaris.org/jive/click.jspa&messageID=26841. By
>> leveraging both the Availability Suite and the lofi OpenSolar

Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Jonathan Edwards


On Jan 29, 2007, at 14:17, Jeffery Malloch wrote:


Hi Guys,

SO...

From what I can tell from this thread, ZFS is VERY fussy about  
managing writes, reads and failures.  It wants to be bit perfect.   
So if you use the hardware that comes with a given solution (in my  
case an Engenio 6994) to manage failures you risk a) bad writes  
that don't get picked up due to corruption from write cache to  
disk b) failures due to data changes that ZFS is unaware of that  
the hardware imposes when it tries to fix itself.


So now I have a $70K+ lump that's useless for what it was designed  
for.  I should have spent $20K on a JBOD.  But since I didn't do  
that, it sounds like a traditional model works best (ie. UFS et al)  
for the type of hardware I have.  No sense paying for something and  
not using it.  And by using ZFS just as a method for ease of file  
system growth and management I risk much more corruption.


The other thing I haven't heard is why NOT to use ZFS.  Or people  
who don't like it for some reason or another.


Comments?


I put together this chart a while back .. i should probably update it  
for RAID6 and RAIDZ2


#   ZFS  ARRAY HW     CAPACITY  COMMENTS
--  ---  --------     --------  --------
1   R0   R1           N/2       hw mirror - no zfs healing
2   R0   R5           N-1       hw R5 - no zfs healing
3   R1   2 x R0       N/2       flexible, redundant, good perf
4   R1   2 x R5       (N/2)-1   flexible, more redundant, decent perf
5   R1   1 x R5       (N-1)/2   parity and mirror on same drives (XXX)

6   RZ   R0           N-1       standard RAID-Z no mirroring
7   RZ   R1 (tray)    (N/2)-1   RAIDZ+1
8   RZ   R1 (drives)  (N/2)-1   RAID1+Z (highest redundancy)
9   RZ   3 x R5       N-4       triple parity calculations (XXX)
10  RZ   1 x R5       N-2       double parity calculations (XXX)

(note: I included the cases where you have multiple arrays with a  
single lun per vdisk (say) and where you only have a single array  
split into multiple LUNs.)


The way I see it, you're better off picking either controller parity  
or zfs parity .. there's no sense in computing parity multiple times  
unless you have cycles to spare and don't mind the performance hit ..  
so the questions you should really answer before you choose the  
hardware is what level of redundancy to capacity balance do you want?  
and whether or not you want to compute RAID in ZFS host memory or out  
on a dedicated blackbox controller?  I would say something about  
double caching too, but I think that's moot since you'll always cache  
in the ARC if you use ZFS the way it's currently written.


Other feasible filesystem options for Solaris - UFS, QFS, or vxfs  
with SVM or VxVM for volume mgmt if you're so inclined .. all depends  
on your budget and application.  There's currently tradeoffs in each  
one, and contrary to some opinions, the death of any of these has  
been grossly exaggerated.


---
.je
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS panics system during boot, after 11/06 upgrade

2007-01-29 Thread Jeff Victor

Hi,

I'm looking for assistance troubleshooting an x86 laptop that I upgraded 
from Solaris 10 6/06 to 11/06 using standard upgrade.


The upgrade went smoothly, but all attempts to boot it since then have 
failed.  Every time, it panics, leaving a partial stack trace on the 
screen for a few seconds.


The stack trace includes:

zfs:vdev_mirror_open+44
zpool:vdev_open+b2
zpool:vdev_vertex_load+b4
zpool:vdev_graph_traverse+25
zpool:vdev_graph_load+2d
zpool:spa_load+57
zpool:spa_open+10f
zpool:spa_directory_next_pool+ca
zpool:spa_open_all_pools+4f
zpool:spa_name_lock+19
zpool:spa_open+7c
zpool:spa_directory_next_pool+ca
zpool:dmu_objset_find+1f4
zvol:zvol_attach+7b
genunix:devi_attach+8f
genunix:attach_node+71
genunix:i_ndi_config_node+ab
genunix:i_ddi_attachchild+41
genunix:devi_attach_node+71
genunix:config_immediate_children+d7
genunix:devi_config_common+66
genunix:mt_config_thread+11a
unix:thread_start+8

I can boot the system into Solaris failsafe mode and mount the root file 
system.

There are ZFS file systems.  There are no zones.

Any help would be greatly appreciated, this is my everyday computer.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS panics system during boot, after 11/06 upgrade

2007-01-29 Thread Jim Walker
> There are ZFS file systems.  There are no zones.
> 
> Any help would be greatly appreciated, this is my
> everyday computer.
 
Take a look at page 167 of the admin guide:
http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

You need to delete /etc/zfs/zpool.cache. And, use 
zpool import to recover.
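
For anyone else hitting this, a minimal sketch of that recovery, assuming
the failsafe boot has mounted the real root on /a and the pool is named
"tank" (both hypothetical here):

  # rm /a/etc/zfs/zpool.cache
  # reboot
then, once the system is back up:
  # zpool import          (lists importable pools)
  # zpool import tank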

Cheers,
Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS panics system during boot, after 11/06 upgrade

2007-01-29 Thread Jeff Victor

Jim Walker wrote:

There are ZFS file systems.  There are no zones.

Any help would be greatly appreciated, this is my
everyday computer.

 
Take a look at page 167 of the admin guide:

http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

You need to delete /etc/zfs/zpool.cache. And, use 
zpool import to recover.


Cheers,
Jim
  

Thanks.

In Solaris Failsafe, I have mounted the root of the real OS instance 
onto /a.  There is a /a/etc/zfs directory, but it is empty.  No zpool.cache.


The doc doesn't describe a solution for that situation.  Any other ideas?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Albert Chin
On Mon, Jan 29, 2007 at 11:17:05AM -0800, Jeffery Malloch wrote:
> From what I can tell from this thread, ZFS is VERY fussy about
> managing writes, reads and failures.  It wants to be bit perfect.  So
> if you use the hardware that comes with a given solution (in my case
> an Engenio 6994) to manage failures you risk a) bad writes that
> don't get picked up due to corruption from write cache to disk b)
> failures due to data changes that ZFS is unaware of that the
> hardware imposes when it tries to fix itself.
> 
> So now I have a $70K+ lump that's useless for what it was designed
> for.  I should have spent $20K on a JBOD.  But since I didn't do
> that, it sounds like a traditional model works best (ie. UFS et al)
> for the type of hardware I have.  No sense paying for something and
> not using it.  And by using ZFS just as a method for ease of file
> system growth and management I risk much more corruption.

Well, ZFS with HW RAID makes sense in some cases. However, it seems
that if you are unwilling to lose 50% disk space to RAID 10 or two
mirrored HW RAID arrays, you either use RAID 0 on the array with ZFS
RAIDZ/RAIDZ2 on top of that or a JBOD with ZFS RAIDZ/RAIDZ2 on top of
that.

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Frank Cusack
On January 29, 2007 11:17:05 AM -0800 Jeffery Malloch 
<[EMAIL PROTECTED]> wrote:

Hi Guys,

SO...


From what I can tell from this thread, ZFS is VERY fussy about managing
writes, reads and failures.  It wants to be bit perfect.


It's funny to call that "fussy".  All filesystems WANT to be bit perfect,
zfs actually does something to ensure it.


 So if you use
the hardware that comes with a given solution (in my case an Engenio
6994) to manage failures you risk a) bad writes that don't get picked up
due to corruption from write cache to disk


You would always have that problem, JBOD or RAID.  There are many places
data can get corrupted, not just in the RAID write cache.  zfs will correct
it, or at least detect it depending on your configuration.


b) failures due to data
changes that ZFS is unaware of that the hardware imposes when it tries
to fix itself.


If that happens, you will be lucky to have ZFS to fix it.  If the array
changes data, it is broken.  This is not the same thing as correcting data.


The other thing I haven't heard is why NOT to use ZFS.  Or people who
don't like it for some reason or another.


If you need per-user quotas, zfs might not be a good fit.  (In many cases
per-filesystem quotas can be used effectively though.)
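
For example, a per-filesystem quota is just (hypothetical dataset names):

  # zfs create tank/home/foo
  # zfs set quota=10G tank/home/foo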

If you need NFS clients to traverse mount points on the server
(eg /home/foo), then this won't work yet.  Then again, does this work
with UFS either?  Seems to me it wouldn't.  The difference is that zfs
encourages you to create more filesystems.  But you don't have to.

If you have an application that is very highly tuned for a specific
filesystem (e.g. UFS with directio), you might not want to replace
it with zfs.

If you need incremental restore, you might need to stick with UFS.
(snapshots might be enough for you though)

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS panics system during boot, after 11/06 upgrade

2007-01-29 Thread Jeff Victor

More diagnostic information:

Before the afore-listed stack dump, the console displays many lines of 
text similar to the following, that scroll by very quickly.  I was only 
able to capture them with the help of a digital camera.


WARNING: kstat_create('unix', 0, 'zio_buf_#'): namespace collision



Jeff Victor wrote:

Jim Walker wrote:

There are ZFS file systems.  There are no zones.

Any help would be greatly appreciated, this is my
everyday computer.

 
Take a look at page 167 of the admin guide:

http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

You need to delete /etc/zfs/zpool.cache. And, use zpool import to 
recover.


Cheers,
Jim
  

Thanks.

In Solaris Failsafe, I have mounted the root of the real OS instance 
onto /a.  There is a /a/etc/zfs directory, but it is empty.  No 
zpool.cache.


The doc doesn't describe a solution for that situation.  Any other ideas?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] dumpadm and using dumpfile on zfs?

2007-01-29 Thread Peter Buckingham

Hi All,

I'd like to set up dumping to a file. This file is on a mirrored pool 
using zfs. It seems that the dump setup doesn't work with zfs. This 
worked for both a standard UFS slice and an SVM mirror using UFS.


Is there something that I'm doing wrong, or is this not yet supported on 
ZFS?


Note this is Solaris 10 Update 3, but I don't think that should matter..

thanks,

peter

Using ZFS

HON hcb116 ~ $  mkfile  -n 1g /var/adm/crash/dump-file
HON hcb116 ~ $ dumpadm -d /var/adm/crash/dump-file
dumpadm: dumps not supported on /var/adm/crash/dump-file


Using UFS

HON hcb115 ~ $  mkfile  -n 1g /data/0/test
HON hcb115 ~ $ dumpadm -d /data/0/test
 Dump content: kernel pages
  Dump device: /data/0/test (dedicated)
Savecore directory: /var/crash/stuff
 Savecore enabled: yes
HON hcb115 ~ $

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dumpadm and using dumpfile on zfs?

2007-01-29 Thread Lori Alt

Dumping to a file in a zfs file system is not supported yet.
The zfs file system does not support the VOP_DUMP and
VOP_DUMPCTL operations. This is bug 5008936 (ZFS and/or
zvol should support dumps).
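
Until that is fixed, dumping to a dedicated raw slice (rather than a file
on ZFS) still works, e.g. (the slice name is just an example):

  # dumpadm -d /dev/dsk/c0t0d0s1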

Lori

Peter Buckingham wrote:

Hi All,

I'd like to set up dumping to a file. This file is on a mirrored pool 
using zfs. It seems that the dump setup doesn't work with zfs. This 
worked for both a standard UFS slice and an SVM mirror using UFS.


Is there something that I'm doing wrong, or is this not yet supported 
on ZFS?


Note this is Solaris 10 Update 3, but I don't think that should matter..

thanks,

peter

Using ZFS

HON hcb116 ~ $  mkfile  -n 1g /var/adm/crash/dump-file
HON hcb116 ~ $ dumpadm -d /var/adm/crash/dump-file
dumpadm: dumps not supported on /var/adm/crash/dump-file


Using UFS

HON hcb115 ~ $  mkfile  -n 1g /data/0/test
HON hcb115 ~ $ dumpadm -d /data/0/test
 Dump content: kernel pages
  Dump device: /data/0/test (dedicated)
Savecore directory: /var/crash/stuff
 Savecore enabled: yes
HON hcb115 ~ $

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dumpadm and using dumpfile on zfs?

2007-01-29 Thread Cindy . Swearingen

Hi Peter,

This operation isn't supported yet. See this bug:

http://bugs.opensolaris.org/view_bug.do?bug_id=5008936

Both the zfs man page and the ZFS Admin Guide identify
swap and dump limitations, here:

http://docs.sun.com/app/docs/doc/817-2271/6mhupg6gl?q=dump&a=view

Cindy


Peter Buckingham wrote:

Hi All,

I'd like to set up dumping to a file. This file is on a mirrored pool 
using zfs. It seems that the dump setup doesn't work with zfs. This 
worked for both a standard UFS slice and an SVM mirror using UFS.


Is there something that I'm doing wrong, or is this not yet supported on 
ZFS?


Note this is Solaris 10 Update 3, but I don't think that should matter..

thanks,

peter

Using ZFS

HON hcb116 ~ $  mkfile  -n 1g /var/adm/crash/dump-file
HON hcb116 ~ $ dumpadm -d /var/adm/crash/dump-file
dumpadm: dumps not supported on /var/adm/crash/dump-file


Using UFS

HON hcb115 ~ $  mkfile  -n 1g /data/0/test
HON hcb115 ~ $ dumpadm -d /data/0/test
  Dump content: kernel pages
   Dump device: /data/0/test (dedicated)
Savecore directory: /var/crash/stuff
  Savecore enabled: yes
HON hcb115 ~ $

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dumpadm and using dumpfile on zfs?

2007-01-29 Thread Peter Buckingham

Lori Alt wrote:

Dumping to a file in a zfs file system is not supported yet.
The zfs file system does not support the VOP_DUMP and
VOP_DUMPCTL operations. This is bug 5008936 (ZFS and/or
zvol should support dumps).


Ok, that's sort of what I expected thanks for the info.

peter
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Problems adding drive

2007-01-29 Thread zfs
I attempted to increase my zraid from 2 disks to 3, but it looks like I added 
the drive outside of the raid:

# zpool list

NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
amber   1.36T    879G    516G    63%  ONLINE  -
home    65.5G   1.30M   65.5G     0%  ONLINE  -
[EMAIL PROTECTED]:/export/home/michael#
[EMAIL PROTECTED]:/export/home/michael# zpool status
  pool: amber
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        amber       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c0d0    ONLINE       0     0     0
          c4d0      ONLINE       0     0     0

errors: No known data errors


I can't even seem to get rid of c4d0, and I have not written anything to "amber" 
since adding c4d0. Any suggestions on how to remove it and re-add it correctly?

Sincerely,
Michael
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Adding my own compression to zfs

2007-01-29 Thread roland
> Have a look at:
>
>  http://blogs.sun.com/ahl/entry/a_little_zfs_hack

thanks for the link, dick !

this sounds fantastic !

is the source for that (yet) available somewhere ?

>Adam Leventhal's Weblog
>inside the sausage factory

btw - just wondering - is this some english phrase or some running gag ? i 
have seen it once before on another blog and so i`m wondering

greetings from the beer and sausage nation ;)

roland
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems adding drive

2007-01-29 Thread Wade . Stuart

[EMAIL PROTECTED] wrote on 01/29/2007 03:45:58 PM:

> I attempted to increase my zraid from 2 disks to 3, but it looks
> like I added the drive outside of the raid:
>
> # zpool list
>
> NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
> amber   1.36T    879G    516G    63%  ONLINE  -
> home    65.5G   1.30M   65.5G     0%  ONLINE  -
> [EMAIL PROTECTED]:/export/home/michael#
> [EMAIL PROTECTED]:/export/home/michael# zpool status
>   pool: amber
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         amber       ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             c1d0    ONLINE       0     0     0
>             c0d0    ONLINE       0     0     0
>           c4d0      ONLINE       0     0     0
>
> errors: No known data errors
>
>
> I can't even seem to get rid of c4d0, I have not written anything to
> "amber" since adding c4d0. Any suggestions on how to remove it and
> re add it correctly?
>

Sure, just run:  zpool evacuate amber c4d0.  =)  Sorry.  This was just covered in
a few threads here; you will need to dump your data to tape (or another
disk), destroy your pool and then recreate it.
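
A minimal sketch of those last steps, once the data is safely copied off
(destructive - double-check the backup first):

  # zpool destroy amber
  # zpool create amber raidz c1d0 c0d0 c4d0
then restore the data from the backup.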


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Adding my own compression to zfs

2007-01-29 Thread Matt Ingenthron

roland wrote:




Adam Leventhal's Weblog
inside the sausage factory



btw - just wondering - is this some english phrase or some running gag ? i 
have seen it once ago on another blog and so i`m wondering


greetings from the beer and sausage nation ;)

  
It's a response to a common English colloquialism which says 'nearly 
everybody likes eating sausage, but many people would probably rather 
not see how it's made'.


Adam is a Sausage maker in the Solaris world.  Open Solaris is the newly 
expanded, room for everyone, Solaris sausage factory.  His blog covers 
topics relating to what goes on in his sausage making duties.


- Matt

p.s.: The web says a German word for colloquialism is umgangssprachlich.


--
Matt Ingenthron - Web Infrastructure Solutions Architect
Sun Microsystems, Inc. - Global Systems Practice
http://blogs.sun.com/mingenthron/
email: [EMAIL PROTECTED] Phone: 310-242-6439

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] hot spares - in standby?

2007-01-29 Thread Toby Thain

Hi,

This is not exactly ZFS specific, but this still seems like a  
fruitful place to ask.


It occurred to me today that hot spares could sit in standby (spun  
down) until needed (I know ATA can do this, I'm supposing SCSI does  
too, but I haven't looked at a spec recently). Does anybody do this?  
Or does everybody do this already?


Does the tub curve (chance of early life failure) imply that hot  
spares should be burned in, instead of sitting there doing nothing  
from new? Just like a data disk, seems to me you'd want to know if a  
hot spare fails while waiting to be swapped in. Do they get tested  
periodically?


--Toby
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Bill Moore
You could easily do this in Solaris today by just using power.conf(4).
Just have it spin down any drives that have been idle for a day or more.
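
A minimal sketch of what that could look like in /etc/power.conf, assuming
the spare's physical device path (the path below is hypothetical) and that
power.conf(4) on your build accepts an hours-based threshold:

  autopm enable
  device-thresholds /pci@0,0/pci1000,30@3/sd@4,0 24h

followed by running pmconfig to make it take effect.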

The periodic testing part would be an interesting project to kick off.


--Bill


On Mon, Jan 29, 2007 at 08:21:16PM -0200, Toby Thain wrote:
> Hi,
> 
> This is not exactly ZFS specific, but this still seems like a  
> fruitful place to ask.
> 
> It occurred to me today that hot spares could sit in standby (spun  
> down) until needed (I know ATA can do this, I'm supposing SCSI does  
> too, but I haven't looked at a spec recently). Does anybody do this?  
> Or does everybody do this already?
> 
> Does the tub curve (chance of early life failure) imply that hot  
> spares should be burned in, instead of sitting there doing nothing  
> from new? Just like a data disk, seems to me you'd want to know if a  
> hot spare fails while waiting to be swapped in. Do they get tested  
> periodically?
> 
> --Toby
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Adding my own compression to zfs

2007-01-29 Thread Sudarsan
The lzjb compression implementation (IMO) is the fastest one on SPARC Solaris 
systems. I've seen it beat lzo in speed while not necessarily in 
compressibility. I've measured both implementations inside Solaris SPARC 
kernels, and would love to hear from others about their experiences. As 
someone else alluded, multithreading the compression implementation will 
certainly improve performance.

Sri
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Adding my own compression to zfs

2007-01-29 Thread roland
hey, thanks for your overwhelming private lesson for english colloquialism  :D

now back to the technical :)

> # zfs create pool/gzip
> # zfs set compression=gzip pool/gzip
> # cp -r /pool/lzjb/* /pool/gzip
> # zfs list
> NAMEUSED  AVAIL  REFER  MOUNTPOINT
> pool/gzip  64.9M  33.2G  64.9M  /pool/gzip
> pool/lzjb   128M  33.2G   128M  /pool/lzjb
> 
> That's with a 1.2G crash dump (pretty much the most compressible file 
> imaginable). Here are the compression ratios with a pile of ELF binaries 
> (/usr/bin and /usr/lib):

> # zfs get compressratio
> NAME   PROPERTY   VALUE  SOURCE
> pool/gzip  compressratio  3.27x  -
> pool/lzjb  compressratio  1.89x  -

this looks MUCH better than i would have ever expected for smaller files. 

any real-world data on how well compressratio does with lots of very small 
but highly compressible files, for example some (evil for those solaris 
evangelists) untarred linux source tree ?

i'm rather excited to see how effectively gzip will compress here.

for comparison:

sun1:/comptest #  bzcat /tmp/linux-2.6.19.2.tar.bz2 |tar xvf -
--snipp--

sun1:/comptest # du -s -k *
143895  linux-2.6.19.2
1   pax_global_header

sun1:/comptest # du -s -k --apparent-size *
224282  linux-2.6.19.2
1   pax_global_header

sun1:/comptest # zfs get compressratio comptest
NAME      PROPERTY       VALUE  SOURCE
comptest  compressratio  1.79x  -
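
For comparison, a minimal sketch of the same test against the gzip bits from
the hack linked above, assuming a compression=gzip value is accepted
(dataset names are hypothetical):

  # zfs create tank/linux-gzip
  # zfs set compression=gzip tank/linux-gzip
  # (cd /tank/linux-gzip && bzcat /tmp/linux-2.6.19.2.tar.bz2 | tar xf -)
  # zfs get compressratio tank/linux-gzip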
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Adding my own compression to zfs

2007-01-29 Thread Bill Sommerfeld
On Mon, 2007-01-29 at 14:15 -0800, Matt Ingenthron wrote:

> > > inside the sausage factory
> > > 
> > 
> > btw - just wondering - is this some english phrase or some running gag ? i 
> > have seen it once ago on another blog and so i`m wondering
> > 
> > greetings from the beer and sausage nation ;)
> > 
> >   
> It's a response to a common English colloquialism which says 'nearly
> everybody likes eating sausage, but many people would probably rather
> not see how it's made'.

I've actually seen the quote attributed to a German: Otto von Bismarck,
rendered in English as:

"Laws are like sausages -- it is better not to see them being made."

or

"If you like laws and sausages, you should never watch either one being
made."

Of course, the same can, and has, been said about software...

- Bill


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Marion Hakanson
Albert Chin said:
> Well, ZFS with HW RAID makes sense in some cases. However, it seems that if
> you are unwilling to lose 50% disk space to RAID 10 or two mirrored HW RAID
> arrays, you either use RAID 0 on the array with ZFS RAIDZ/RAIDZ2 on top of
> that or a JBOD with ZFS RAIDZ/RAIDZ2 on top of that. 

I've been re-evaluating our local decision on this question (how to layout
ZFS on pre-existing RAID hardware).  In our case, the array does not allow
RAID-0 of any type, and we're unwilling to give up the expensive disk
space to a mirrored configuration.  In fact, in our last decision, we
came to the conclusion that we didn't want to layer RAID-Z on top of
HW RAID-5, thinking that the added loss of space is too high, given any
of the "XXX" layouts in Jonathan Edwards' chart:
> #   ZFS  ARRAY HW  CAPACITY  COMMENTS
> --  ---  --------  --------  --------
> . . .
> 5   R1   1 x R5    (N-1)/2   parity and mirror on same drives (XXX)
> 9   RZ   3 x R5    N-4       triple parity calculations (XXX)
> . . .
> 10  RZ   1 x R5    N-2       double parity calculations (XXX)


So, we ended up (some months ago) deciding to go with only HW RAID-5,
using ZFS to stripe together large-ish LUN's made up of independent HW
RAID-5 groups.  We'd have no ZFS redundancy, but at least ZFS would catch
any corruption that may come along.  We can restore individual corrupted
files from tape backups (which we're already doing anyway), if necessary.

However, given that the default behavior of ZFS (as of Solaris-10U3) is to
panic/halt when it encounters a corrupted block that it can't repair,
I'm re-thinking our options, weighing them against the possibility of
significant downtime caused by a single-block corruption.

Today I've been pondering a variant of #10 above, the variation being
to slice a RAID-5 volume into more than N LUN's, i.e. LUN's smaller than the
size of the individual disks that make up the HW R5 volume.  A larger
number of small LUN's results in less space given up to ZFS parity, which
is nice when overall disk space is important to us.

We're not expecting RAID-Z across these LUN's to make it possible to
survive failure of a whole disk, rather we only "need" RAID-Z to repair
the occasional block corruption, in the hopes that this might head off the
need to restore a whole multi-TB pool.  We'll rely on the HW RAID-5 to
protect against whole-disk failure.

Just thinking out loud here.  Now I'm off to see what kind of performance
cost there is, comparing (with 400GB disks):
Simple ZFS stripe on one 2198GB LUN from a 6+1 HW RAID5 volume
8+1 RAID-Z on 9 244.2GB LUN's from a 6+1 HW RAID5 volume
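
As a rough sketch, those two layouts would be created along these lines
(LUN names are hypothetical):

  # zpool create bigstripe c6t0d0
  # zpool create zraid raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 \
        c7t5d0 c7t6d0 c7t7d0 c7t8d0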

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Richard Elling

Toby Thain wrote:

Hi,

This is not exactly ZFS specific, but this still seems like a fruitful 
place to ask.


It occurred to me today that hot spares could sit in standby (spun down) 
until needed (I know ATA can do this, I'm supposing SCSI does too, but I 
haven't looked at a spec recently). Does anybody do this? Or does 
everybody do this already?


"luxadm stop" will work for many SCSI and FC JBODs.  If your drive doesn't
support it, it won't hurt anything, it will just claim "Unsupported" --
not very user friendly, IMHO.

I think it is a good idea, with one potential gotcha.  The gotcha is that
it can take 30 seconds or more to spin up. By default, the sd and ssd timeouts
are such that a pending iop will not notice that it took a while to spin up.
However, if you have changed those defaults, as sometimes occurs in high
availability requirements, then you probably shouldn't do this.

Does the tub curve (chance of early life failure) imply that hot spares 
should be burned in, instead of sitting there doing nothing from new?


Good question. If you consider that mechanical wear out is what ultimately
causes many failure modes, then the argument can be made that a spun down
disk should last longer. The problem is that there are failure modes which
are triggered by a spin up.  I've never seen field data showing the difference
between the two.  I spin mine down because they are too loud and consume
more electricity, and electricity is expensive in Southern California.

Just like a data disk, seems to me you'd want to know if a hot spare 
fails while waiting to be swapped in. Do they get tested periodically?


Another good question.  AFAIK, they are not accessed until needed.

Note: they will be queried on boot which will cause a spin up.  I use a cron
job to spin mine down in the late evening.
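
For example, a hand spin-down and a nightly cron entry might look like this
(device path hypothetical; check that your enclosure accepts luxadm stop):

  # luxadm stop /dev/rdsk/c1t8d0s2
  30 23 * * * /usr/sbin/luxadm stop /dev/rdsk/c1t8d0s2
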
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Al Hopper
On Mon, 29 Jan 2007, Toby Thain wrote:

> Hi,
>
> This is not exactly ZFS specific, but this still seems like a
> fruitful place to ask.
>
> It occurred to me today that hot spares could sit in standby (spun
> down) until needed (I know ATA can do this, I'm supposing SCSI does
> too, but I haven't looked at a spec recently). Does anybody do this?
> Or does everybody do this already?

I don't work with enough disk storage systems to know what is the industry
norm.  But there are 3 broad categories of disk drive spares:

a) Cold Spare.  A spare where the power is not connected until it is
required.  [1]

b) Warm Spare.  A spare that is active but placed into a low power mode,
or into a "low mechanical wear & tear" mode.  In the case of a disk drive,
the controller board is active but the HDA (Head Disk Assembly) is
inactive (platters are stationary, heads unloaded [if the heads are
physically unloaded]); it has power applied and can be made "hot" by a
command over its data/command (bus) connection.  The supervisory
hardware/software/firmware "knows" how long it *should* take the drive to
go from warm to hot.

c) Hot Spare.  A spare that is spun up and ready to accept
read/write/position (etc) requests.

> Does the tub curve (chance of early life failure) imply that hot
> spares should be burned in, instead of sitting there doing nothing
> from new? Just like a data disk, seems to me you'd want to know if a
> hot spare fails while waiting to be swapped in. Do they get tested
> periodically?

The ideal scenario, as you already allude to, would be for the disk
subsystem to initially configure the drive as a hot spare and send it
periodic "test" events for, say, the first 48 hours.  This would get it
past the first segment of the "bathtub" reliability curve - often referred
to as the "infant mortality" phase.  After that, (ideally) it would be
placed into "warm standby" mode and it would be periodically tested (once
a month??).

If saving power was the highest priority, then the ideal situation would
be where the disk subsystem could apply/remove power to the spare and move
it from warm to cold upon command.

One "trick" with disk subsystems, like ZFS that have yet to have the FMA
type functionality added and which (today) provide for hot spares only, is
to initially configure a pool with one (hot) spare, and then add a 2nd hot
spare, based on installing a brand new device, say, 12 months later.  And
another spare 12 months later.  What you are trying to achieve, with this
strategy, is to avoid the scenario whereby mechanical systems, like disk
drives, tend to "wear out" within the same general, relatively short,
timeframe.

One (obvious) issue with this strategy, is that it may be impossible to
purchase the same disk drive 12 and 24 months later.  However, it's always
possible to purchase a larger disk drive and simply commit to the fact
that the extra space provided by the newer drive will be wasted.
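
In ZFS terms the staggering amounts to an occasional zpool add, e.g.
(hypothetical pool and device names):

  # zpool add tank spare c5t0d0
twelve months later:
  # zpool add tank spare c5t1d0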

[1] The most common example is a disk drive mounted on a carrier but not
seated within the disk drive enclosure.  Simple "push in" when required.

Off Topic: To go off on a tangent - the same strategy applies to a UPS
(Uninterruptable Power Supply).  As per the following time line:

year 0: purchase the UPS and one battery cabinet
year 1: purchase and attach an additional battery cabinet
year 2: purchase and attach an additional battery cabinet
year 3: purchase and attach an additional battery cabinet
year 4: purchase and attach an additional battery cabinet and remove the
oldest battery cabinet
year 5 ... N: repeat year 4s scenario until its time to replace the UPS.

The advantage of this scenario is that you can budget a *fixed* cost for
the UPS and your management understands that there is a recurring cost so
that, when the power fails, your UPS will have working batteries!!

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Toby Thain


On 29-Jan-07, at 9:04 PM, Al Hopper wrote:


On Mon, 29 Jan 2007, Toby Thain wrote:


Hi,

This is not exactly ZFS specific, but this still seems like a
fruitful place to ask.

It occurred to me today that hot spares could sit in standby (spun
down) until needed (I know ATA can do this, I'm supposing SCSI does
too, but I haven't looked at a spec recently). Does anybody do this?
Or does everybody do this already?


I don't work with enough disk storage systems to know what is the  
industry

norm.  But there are 3 broad categories of disk drive spares:

a) Cold Spare.  A spare where the power is not connected until it is
required.  [1]

b) Warm Spare.  A spare that is active but placed into a low power  
mode. ...


c) Hot Spare.  A spare that is spun up and ready to accept
read/write/position (etc) requests.


Hi Al,

Thanks for reminding me of the distinction. It seems very few  
installations would actually require (c)?





Does the tub curve (chance of early life failure) imply that hot
spares should be burned in, instead of sitting there doing nothing
from new? Just like a data disk, seems to me you'd want to know if a
hot spare fails while waiting to be swapped in. Do they get tested
periodically?


The ideal scenario, as you already allude to, would be for the disk
subsystem to initially configure the drive as a hot spare and send it
periodic "test" events for, say, the first 48 hours.


For some reason that's a little shorter than I had in mind, but I  
take your word that that's enough burn-in for semiconductors, motors,  
servos, etc.



This would get it past the first segment of the "bathtub" reliability curve ...

If saving power was the highest priority, then the ideal situation would
be where the disk subsystem could apply/remove power to the spare and move
it from warm to cold upon command.


I am surmising that it would also considerably increase the spare's  
useful lifespan versus "hot" and spinning.




One "trick" with disk subsystems, like ZFS that have yet to have  
the FMA
type functionality added and which (today) provide for hot spares  
only, is
to initially configure a pool with one (hot) spare, and then add a  
2nd hot
spare, based on installing a brand new device, say, 12 months  
later.  And
another spare 12 months later.  What you are trying to achieve,  
with this
strategy, is to avoid the scenario whereby mechanical systems, like  
disk

drives, tend to "wear out" within the same general, relatively short,
timeframe.

One (obvious) issue with this strategy, is that it may be impossible to
purchase the same disk drive 12 and 24 months later.  However, it's always
possible to purchase a larger disk drive


...which is not guaranteed to be compatible with your storage  
subsystem...!


--Toby


and simply commit to the fact
that the extra space provided by the newer drive will be wasted.

[1] The most common example is a disk drive mounted on a carrier  
but not
seated within the disk drive enclosure.  Simple "push in" when  
required.

...
Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Jason J. W. Williams

Hi Guys,

I seem to remember the Massive Array of Independent Disk guys ran into
a problem I think they called static friction, where idle drives would
fail on spin up after being idle for a long time:
http://www.eweek.com/article2/0,1895,1941205,00.asp

Would that apply here?

Best Regards,
Jason

On 1/29/07, Toby Thain <[EMAIL PROTECTED]> wrote:


On 29-Jan-07, at 9:04 PM, Al Hopper wrote:

> On Mon, 29 Jan 2007, Toby Thain wrote:
>
>> Hi,
>>
>> This is not exactly ZFS specific, but this still seems like a
>> fruitful place to ask.
>>
>> It occurred to me today that hot spares could sit in standby (spun
>> down) until needed (I know ATA can do this, I'm supposing SCSI does
>> too, but I haven't looked at a spec recently). Does anybody do this?
>> Or does everybody do this already?
>
> I don't work with enough disk storage systems to know what is the
> industry
> norm.  But there are 3 broad categories of disk drive spares:
>
> a) Cold Spare.  A spare where the power is not connected until it is
> required.  [1]
>
> b) Warm Spare.  A spare that is active but placed into a low power
> mode. ...
>
> c) Hot Spare.  A spare that is spun up and ready to accept
> read/write/position (etc) requests.

Hi Al,

Thanks for reminding me of the distinction. It seems very few
installations would actually require (c)?

>
>> Does the tub curve (chance of early life failure) imply that hot
>> spares should be burned in, instead of sitting there doing nothing
>> from new? Just like a data disk, seems to me you'd want to know if a
>> hot spare fails while waiting to be swapped in. Do they get tested
>> periodically?
>
> The ideal scenario, as you already allude to, would be for the disk
> subsystem to initially configure the drive as a hot spare and send it
> periodic "test" events for, say, the first 48 hours.

For some reason that's a little shorter than I had in mind, but I
take your word that that's enough burn-in for semiconductors, motors,
servos, etc.

> This would get it
> past the first segment of the "bathtub" reliability curve ...
>
> If saving power was the highest priority, then the ideal situation
> would
> be where the disk subsystem could apply/remove power to the spare
> and move
> it from warm to cold upon command.

I am surmising that it would also considerably increase the spare's
useful lifespan versus "hot" and spinning.

>
> One "trick" with disk subsystems, like ZFS that have yet to have
> the FMA
> type functionality added and which (today) provide for hot spares
> only, is
> to initially configure a pool with one (hot) spare, and then add a
> 2nd hot
> spare, based on installing a brand new device, say, 12 months
> later.  And
> another spare 12 months later.  What you are trying to achieve,
> with this
> strategy, is to avoid the scenario whereby mechanical systems, like
> disk
> drives, tend to "wear out" within the same general, relatively short,
> timeframe.
>
> One (obvious) issue with this strategy, is that it may be
> impossible to
> purchase the same disk drive 12 and 24 months later.  However, it's
> always
> possible to purchase a larger disk drive

...which is not guaranteed to be compatible with your storage
subsystem...!

--Toby

> and simply commit to the fact
> that the extra space provided by the newer drive will be wasted.
>
> [1] The most common example is a disk drive mounted on a carrier
> but not
> seated within the disk drive enclosure.  Simple "push in" when
> required.
> ...
> Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
>Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
> OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
>  OpenSolaris Governing Board (OGB) Member - Feb 2006



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Toby Thain


On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote:


Hi Guys,

I seem to remember the Massive Array of Independent Disk guys ran into
a problem I think they called static friction, where idle drives would
fail on spin up after being idle for a long time:


You'd think that probably wouldn't happen to a spare drive that was  
spun up from time to time. In fact this problem would be (mitigated  
and/or) caught by the periodic health check I suggested.
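
A minimal sketch of such a check, run weekly from cron (the device name
is hypothetical, and 'zpool status -x' only reports devices ZFS already
considers unhealthy, hence the extra read to exercise the idle spare):

  #!/bin/sh
  # spare-check.sh - exercise the idle spare with a harmless 1 GB read
  dd if=/dev/rdsk/c1t4d0s2 of=/dev/null bs=1024k count=1024 > /dev/null 2>&1
  # then report whatever ZFS itself thinks is unhealthy
  zpool status -x | mailx -s "pool/spare check" root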


--T


http://www.eweek.com/article2/0,1895,1941205,00.asp

Would that apply here?

Best Regards,
Jason

On 1/29/07, Toby Thain <[EMAIL PROTECTED]> wrote:


On 29-Jan-07, at 9:04 PM, Al Hopper wrote:

> On Mon, 29 Jan 2007, Toby Thain wrote:
>
>> Hi,
>>
>> This is not exactly ZFS specific, but this still seems like a
>> fruitful place to ask.
>>
>> It occurred to me today that hot spares could sit in standby (spun
>> down) until needed (I know ATA can do this, I'm supposing SCSI does
>> too, but I haven't looked at a spec recently). Does anybody do this?

>> Or does everybody do this already?
>
> I don't work with enough disk storage systems to know what is the
> industry
> norm.  But there are 3 broad categories of disk drive spares:
>
> a) Cold Spare.  A spare where the power is not connected until it is
> required.  [1]
>
> b) Warm Spare.  A spare that is active but placed into a low power
> mode. ...
>
> c) Hot Spare.  A spare that is spun up and ready to accept
> read/write/position (etc) requests.

Hi Al,

Thanks for reminding me of the distinction. It seems very few
installations would actually require (c)?

>
>> Does the tub curve (chance of early life failure) imply that hot
>> spares should be burned in, instead of sitting there doing nothing
>> from new? Just like a data disk, seems to me you'd want to know if a
>> hot spare fails while waiting to be swapped in. Do they get tested
>> periodically?
>
> The ideal scenario, as you already allude to, would be for the disk
> subsystem to initially configure the drive as a hot spare and send it
> periodic "test" events for, say, the first 48 hours.

For some reason that's a little shorter than I had in mind, but I
take your word that that's enough burn-in for semiconductors, motors,
servos, etc.

> This would get it
> past the first segment of the "bathtub" reliability curve ...
>
> If saving power was the highest priority, then the ideal situation
> would
> be where the disk subsystem could apply/remove power to the spare
> and move
> it from warm to cold upon command.

I am surmising that it would also considerably increase the spare's
useful lifespan versus "hot" and spinning.

>
> One "trick" with disk subsystems, like ZFS that have yet to have
> the FMA
> type functionality added and which (today) provide for hot spares
> only, is
> to initially configure a pool with one (hot) spare, and then add a
> 2nd hot
> spare, based on installing a brand new device, say, 12 months
> later.  And
> another spare 12 months later.  What you are trying to achieve,
> with this
> strategy, is to avoid the scenario whereby mechanical systems, like
> disk
> drives, tend to "wear out" within the same general, relatively  
short,

> timeframe.
>
> One (obvious) issue with this strategy, is that it may be
> impossible to
> purchase the same disk drive 12 and 24 months later.  However, it's
> always
> possible to purchase a larger disk drive

...which is not guaranteed to be compatible with your storage
subsystem...!

--Toby

> and simply commit to the fact
> that the extra space provided by the newer drive will be wasted.
>
> [1] The most common example is a disk drive mounted on a carrier
> but not
> seated within the disk drive enclosure.  Simple "push in" when
> required.
> ...
> Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED] 
approach.com

>Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
> OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
>  OpenSolaris Governing Board (OGB) Member - Feb 2006




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread David Magda

On Jan 29, 2007, at 20:27, Toby Thain wrote:


On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote:

I seem to remember the Massive Array of Independent Disk guys ran into
a problem I think they called static friction, where idle drives would
fail on spin up after being idle for a long time:


You'd think that probably wouldn't happen to a spare drive that was  
spun up from time to time. In fact this problem would be (mitigated  
and/or) caught by the periodic health check I suggested.


What about a rotating spare?

When setting up a pool a lot of people would (say) balance things  
around buses and controllers to minimize single  points of failure,  
and a rotating spare could disrupt this organization, but would it be  
useful at all?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Wee Yeh Tan

On 1/30/07, David Magda <[EMAIL PROTECTED]> wrote:

What about a rotating spare?

When setting up a pool a lot of people would (say) balance things
around buses and controllers to minimize single  points of failure,
and a rotating spare could disrupt this organization, but would it be
useful at all?


The costs involved in "rotating" spares in terms of IOPS reduction may
not be worth it.


--
Just me,
Wire ...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Nathan Kroenert

Random thoughts:

If we were to use some intelligence in the design, we could perhaps have 
a monitor that profiles the workload on the system (a pool, for example) 
over a [week|month|whatever] and selects a point in time, based on 
history, at which it would expect the disks to be quiet, and can 'pre-build' 
the spare with the contents of the disk it's about to swap out. At the 
point of switch-over, it could be pretty much instantaneous... It could 
also bail if it happened that the system actually started to get 
genuinely busy...


That might actually be quite cool. Though if all disks are rotated, we 
end up with a whole bunch of disks that are evenly worn out again, which 
is just what we are really trying to avoid! ;)
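
A rough manual approximation is already possible today, assuming c2t0d0
is an idle disk that is not currently part of the pool (device names
hypothetical):

  (swap the rested disk in; the old one is detached automatically
   once the resilver completes)
  # zpool replace tank c1t3d0 c2t0d0
  (then turn the old disk into the new spare)
  # zpool add tank spare c1t3d0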


Nathan.

Wee Yeh Tan wrote:

On 1/30/07, David Magda <[EMAIL PROTECTED]> wrote:

What about a rotating spare?

When setting up a pool a lot of people would (say) balance things
around buses and controllers to minimize single  points of failure,
and a rotating spare could disrupt this organization, but would it be
useful at all?


The costs involved in "rotating" spares in terms of IOPS reduction may
not be worth it.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-29 Thread Jim Dunham

Jason,

> Thank you for the detailed explanation. It is very helpful to
> understand the issue. Is anyone successfully using SNDR with ZFS yet?

Of the opportunities I've been involved with, the answer is yes, but so
far I've not seen SNDR with ZFS in a production environment; that does
not mean such deployments don't exist. It was not until late June '06
that AVS 4.0, Solaris 10 and ZFS were generally available, and to date
AVS has not been made available for the Solaris Express, Community
Release, but it will be real soon.


While I have your attention, there are two issues between ZFS and AVS 
that need mentioning.


1). When ZFS is given an entire LUN to place in a ZFS storage pool, ZFS
detects this, enabling SCSI write-caching on the LUN, and also opens the
LUN with exclusive access, preventing other data services (like AVS)
from accessing this device. The work-around is to manually format the
LUN, typically placing all the blocks into a single partition, and then
place just this partition into the ZFS storage pool. ZFS detects that it
does not own the entire LUN, so it doesn't enable write-caching, which
means it also doesn't open the LUN with exclusive access, and therefore
AVS and ZFS can share the same LUN.
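
A minimal sketch of the two cases, with a hypothetical device name (the
slice would be created by partitioning the LUN with format(1M) first):

  (whole LUN: ZFS enables the write cache and takes exclusive access)
  # zpool create tank c1t2d0
  (single slice: the write cache is left alone, so AVS can be
   configured against the same LUN)
  # zpool create tank c1t2d0s0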


I thought about submitting an RFE to have ZFS provide a means to
override this restriction, but I am not 100% certain that a ZFS
filesystem directly accessing a write-cache-enabled LUN is the same
thing as a replicated ZFS filesystem accessing a write-cache-enabled
LUN. Even though AVS is write-order consistent, there are disaster
recovery scenarios which, when enacted, issue block-order rather than
write-order I/Os.


2). One has to be very cautious in using "zpool import -f " (forced
import), especially on a LUN or LUNs into which SNDR is actively
replicating. If ZFS complains that the storage pool was not cleanly
exported when issuing a "zpool import ...", and one attempts a "zpool
import -f " without checking the active replication state, they are
sure to panic Solaris. Of course this failure scenario is no different
than accessing a LUN or LUNs on dual-ported or SAN-based storage while
another Solaris host is still accessing the ZFS filesystem, or than
controller-based replication; they are all just different operational
scenarios of the same issue: data blocks changing out from underneath
the ZFS filesystem and its checksumming mechanisms.
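
A sketch of the safer sequence (pool name hypothetical); the point is
simply to quiesce replication into the LUNs before forcing anything:

  (1. put the SNDR replica into logging/suspended mode per the AVS docs,
      so nothing is still writing into the LUNs)
  (2. see what ZFS thinks of the pool without forcing anything)
  # zpool import
  (3. only then force the import of the replicated pool)
  # zpool import -f tank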


Jim



Best Regards,
Jason


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Adding my own compression to zfs

2007-01-29 Thread Adam Leventhal
On Mon, Jan 29, 2007 at 02:39:13PM -0800, roland wrote:
> > # zfs get compressratio
> > NAME   PROPERTY   VALUE  SOURCE
> > pool/gzip  compressratio  3.27x  -
> > pool/lzjb  compressratio  1.89x  -
> 
> this looks MUCH better than i would have ever expected for smaller files. 
> 
> any real-world data how good or bad compressratio goes with lots of very 
> small but good compressible files , for example some (evil for those solaris 
> evangelists) untarred linux-source tree ?
> 
> i'm rather excited how effective gzip will compress here.
> 
> for comparison:
> 
> sun1:/comptest #  bzcat /tmp/linux-2.6.19.2.tar.bz2 |tar xvf -
> --snipp--
> 
> sun1:/comptest # du -s -k *
> 143895  linux-2.6.19.2
> 1   pax_global_header
> 
> sun1:/comptest # du -s -k --apparent-size *
> 224282  linux-2.6.19.2
> 1   pax_global_header
> 
> sun1:/comptest # zfs get compressratio comptest
> NAME  PROPERTY   VALUE  SOURCE
> comptest tank  compressratio  1.79x  -

Don't start sending me your favorite files to compress (it really should
work about the same as gzip), but here's the result for the above (I found
a tar file that's about 235M uncompressed):

# du -ks linux-2.6.19.2/
80087   linux-2.6.19.2
# zfs get compressratio pool/gzip
NAME   PROPERTY   VALUE  SOURCE
pool/gzip  compressratio  3.40x  -

Doing a gzip with the default compression level (6 -- the same setting I'm
using in ZFS) yields a file that's about 52M. The small files are hurting
a bit here, but it's still pretty good -- and considerably better than LZJB.
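
For reference, the datasets above would presumably be set up with
something like the following once the gzip support integrates (treat the
gzip property value as an assumption; the final name and level syntax
may differ):

  # zfs set compression=gzip pool/gzip
  # zfs set compression=lzjb pool/lzjb
  # zfs get compression,compressratio pool/gzip pool/lzjb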

Adam

-- 
Adam Leventhal, Solaris Kernel Development   http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-29 Thread Jason J. W. Williams

Hi Jim,

Thank you very much for the heads up. Unfortunately, we need the
write-cache enabled for the application I was thinking of combining
this with. Sounds like SNDR and ZFS need some more soak time together
before you can use both to their full potential?

Best Regards,
Jason

On 1/29/07, Jim Dunham <[EMAIL PROTECTED]> wrote:

Jason,
> Thank you for the detailed explanation. It is very helpful to
> understand the issue. Is anyone successfully using SNDR with ZFS yet?
Of the opportunities I've been involved with the answer is yes, but so
far I've not seen SNDR with  ZFS in a production environment, but that
does not mean they don't exists. It was not until late June '06, that
AVS 4.0, Solaris 10 and ZFS were generally available, and to date AVS
has not been made available for the Solaris Express, Community Release,
but it will be real soon.

While I have your attention, there are two issues between ZFS and AVS
that needs mentioning.

1). When ZFS is given an entire LUN to place in a ZFS storage pool, ZFS
detect this, enabling SCSI write-caching on the LUN, and also opens the
LUN with exclusive access, preventing other data services (like AVS)
from accessing this device. The work-around is to manually format the
LUN, typically placing all the blocks into a single partition, then just
place this partition into the ZFS storage pool. ZFS detect this, not
owning the entire LUN, and doesn't enable write-caching, which means it
also doesn't open the LUN with exclusive access, and therefore AVS and
ZFS can share the same LUN.

I thought about submitting an RFE to have ZFS provide a means to
override this restriction, but I am not 100% certain that a ZFS
filesystem directly accessing a write-cached enabled LUN is the same
thing as a replicated ZFS filesystem accessing a write-cached enabled
LUN. Even though AVS is write-order consistent, there are disaster
recovery scenarios, when enacted, where block-order, verses write-order
I/Os are issued.

2). One has to be very cautious in using "zpool import -f  " (forced
import), especially on a LUN or LUNs in which SNDR is actively
replicating into. If ZFS complains that the storage pool was not cleanly
exported when issuing a "zpool import ...", and one attempts a "zpool
import -f ", without checking the active replication state, they are
sure to panic Solaris. Of  course this failure scenario is no different
then accessing a LUN or LUNs on dual-ported, or SAN based storage when
another Solaris host is still accessing the ZFS filesystem, or
controller based replication, as they are all just different operational
scenarios of the same issue, data blocks changing out from underneath
the ZFS filesystem, and its CRC checking mechanisms.

Jim

>
> Best Regards,
> Jason



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Jason J. W. Williams

Hi Toby,

You're right. The healthcheck would definitely find any issues. I
misinterpreted your comment to that effect as a question and didn't
quite latch on. A zpool MAID-mode with that healthcheck might also be
interesting on something like a Thumper for pure-archival, D2D backup
work. Would dramatically cut down on the power. What do y'all think?

Best Regards,
Jason

On 1/29/07, Toby Thain <[EMAIL PROTECTED]> wrote:


On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote:

> Hi Guys,
>
> I seem to remember the Massive Array of Independent Disk guys ran into
> a problem I think they called static friction, where idle drives would
> fail on spin up after being idle for a long time:

You'd think that probably wouldn't happen to a spare drive that was
spun up from time to time. In fact this problem would be (mitigated
and/or) caught by the periodic health check I suggested.

--T

> http://www.eweek.com/article2/0,1895,1941205,00.asp
>
> Would that apply here?
>
> Best Regards,
> Jason
>
> On 1/29/07, Toby Thain <[EMAIL PROTECTED]> wrote:
>>
>> On 29-Jan-07, at 9:04 PM, Al Hopper wrote:
>>
>> > On Mon, 29 Jan 2007, Toby Thain wrote:
>> >
>> >> Hi,
>> >>
>> >> This is not exactly ZFS specific, but this still seems like a
>> >> fruitful place to ask.
>> >>
>> >> It occurred to me today that hot spares could sit in standby (spun
>> >> down) until needed (I know ATA can do this, I'm supposing SCSI
>> does
>> >> too, but I haven't looked at a spec recently). Does anybody do
>> this?
>> >> Or does everybody do this already?
>> >
>> > I don't work with enough disk storage systems to know what is the
>> > industry
>> > norm.  But there are 3 broad categories of disk drive spares:
>> >
>> > a) Cold Spare.  A spare where the power is not connected until
>> it is
>> > required.  [1]
>> >
>> > b) Warm Spare.  A spare that is active but placed into a low power
>> > mode. ...
>> >
>> > c) Hot Spare.  A spare that is spun up and ready to accept
>> > read/write/position (etc) requests.
>>
>> Hi Al,
>>
>> Thanks for reminding me of the distinction. It seems very few
>> installations would actually require (c)?
>>
>> >
>> >> Does the tub curve (chance of early life failure) imply that hot
>> >> spares should be burned in, instead of sitting there doing nothing
>> >> from new? Just like a data disk, seems to me you'd want to know
>> if a
>> >> hot spare fails while waiting to be swapped in. Do they get tested
>> >> periodically?
>> >
>> > The ideal scenario, as you already allude to, would be for the disk
>> > subsystem to initially configure the drive as a hot spare and
>> send it
>> > periodic "test" events for, say, the first 48 hours.
>>
>> For some reason that's a little shorter than I had in mind, but I
>> take your word that that's enough burn-in for semiconductors, motors,
>> servos, etc.
>>
>> > This would get it
>> > past the first segment of the "bathtub" reliability curve ...
>> >
>> > If saving power was the highest priority, then the ideal situation
>> > would
>> > be where the disk subsystem could apply/remove power to the spare
>> > and move
>> > it from warm to cold upon command.
>>
>> I am surmising that it would also considerably increase the spare's
>> useful lifespan versus "hot" and spinning.
>>
>> >
>> > One "trick" with disk subsystems, like ZFS that have yet to have
>> > the FMA
>> > type functionality added and which (today) provide for hot spares
>> > only, is
>> > to initially configure a pool with one (hot) spare, and then add a
>> > 2nd hot
>> > spare, based on installing a brand new device, say, 12 months
>> > later.  And
>> > another spare 12 months later.  What you are trying to achieve,
>> > with this
>> > strategy, is to avoid the scenario whereby mechanical systems, like
>> > disk
>> > drives, tend to "wear out" within the same general, relatively
>> short,
>> > timeframe.
>> >
>> > One (obvious) issue with this strategy, is that it may be
>> > impossible to
>> > purchase the same disk drive 12 and 24 months later.  However, it's
>> > always
>> > possible to purchase a larger disk drive
>>
>> ...which is not guaranteed to be compatible with your storage
>> subsystem...!
>>
>> --Toby
>>
>> > and simply commit to the fact
>> > that the extra space provided by the newer drive will be wasted.
>> >
>> > [1] The most common example is a disk drive mounted on a carrier
>> > but not
>> > seated within the disk drive enclosure.  Simple "push in" when
>> > required.
>> > ...
>> > Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
>> approach.com
>> >Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
>> > OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
>> >  OpenSolaris Governing Board (OGB) Member - Feb 2006
>>




Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?

2007-01-29 Thread Boyd Adamson

On 29/01/2007, at 12:50 AM, [EMAIL PROTECTED] wrote:





On 28-Jan-07, at 7:59 AM, [EMAIL PROTECTED] wrote:





On 27-Jan-07, at 10:15 PM, Anantha N. Srirama wrote:


... ZFS will not stop alpha particle induced memory corruption
after data has been received by server and verified to be correct.
Sadly I've been hit with that as well.



My brother points out that you can use a rad hardened CPU. ECC  
should

take care of the RAM. :-)

I wonder when the former will become data centre best practice?


Alpha particles which "hit" CPUs must have their origin inside said
CPU.

(Alpha particles do not penetrate skin or paper, let alone system
cases or CPU packaging.)


Thanks. But what about cosmic rays?



I was just in pedantic mode; "cosmic rays" is the term covering
all different particles, including alpha, beta and gamma rays.

Alpha rays don't reach us from the "cosmos"; they are caught
long before they can do any harm.  Ditto beta rays.  Both have
an electrical charge that makes passing magnetic fields or passing
through materials difficult.  Both do exist "in the wild" but are
commonly caused by slow radioactive decay of our natural environment.

Gamma rays are photons with high energy; they are not deflected by
magnetic fields (such as those created by the charged particles in
atoms: electrons, protons).  They need to take a direct hit before
they're stopped; they can only be stopped by dense materials, such as
lead.  Unfortunately, naturally occurring lead is polluted by polonium
and uranium and is an alpha/beta source in its own right.  That's why
100 year old lead from roofs is worth more money than new lead: its
radioisotopes have been depleted.




Ok, I'll bite. It's been a long day, so that may be why I can't see  
why the radioisotopes in lead that was dug up 100 years ago would be  
any more depleted than the lead that sat in the ground for the  
intervening 100 years. Half-life is half-life, no?


Now if it were something about the modern extraction process that  
added contaminants, then I can see it.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss