Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?

2007-01-30 Thread Casper . Dik

>Ok, I'll bite. It's been a long day, so that may be why I can't see  
>why the radioisotopes in lead that was dug up 100 years ago would be  
>any more depleted than the lead that sat in the ground for the  
>intervening 100 years. Half-life is half-life, no?

>Now if it were something about the modern extraction process that  
>added contaminants, then I can see it.


In nature, lead is found in deposits containing trace amounts of other
heavy radionuclides (U-235, U-238, Th-232).  These are removed in
processing, but one of their decay products is Pb-210, which cannot be
chemically separated from lead (natural lead otherwise consists mostly of
the stable isotopes Pb-204, Pb-206, Pb-207 and Pb-208).  New lead may also
contain trace amounts of Polonium-210.

So lead, when mined, has trace amounts of radioactive Pb-210; as the
half-life of Pb-210 is only about 22 years, it's fairly radioactive at
first but also decays quickly (roughly 1/20th of the activity is left
after 100 years, about 1/500th after 200).
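
(A quick worked check, assuming a Pb-210 half-life of about 22.3 years:

    remaining fraction after t years = 2^(-t / 22.3)
    t = 100:  2^(-4.5)  is roughly 1/22
    t = 200:  2^(-9.0)  is roughly 1/500 )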

Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Need Help on device structure

2007-01-30 Thread dudekula mastan
Hi All,
   
   
  I'm not sure whether this is the right place to ask my question.

  I opened a device (in raw mode) and filled the entire space, from the
first block to the last block, with some random data.  While writing the
data, I see the following warning messages in the dmesg buffer:

  Jan 30 08:32:36 masthan scsi: [ID 107833 kern.warning] WARNING:
/scsi_vhci/[EMAIL PROTECTED] (ssd175):
  Jan 30 08:32:36 masthan Corrupt label; wrong magic number

  Any idea what this means?

  I suspect my application is corrupting the device structure (the device
structure holds the disk label, partition table, etc.).

  On Linux, the first block of the device holds this structure.  Do you
know which blocks hold the device structure on Solaris?

  Your help is appreciated.

  Thanks & Regards
  Masthan



 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] restore pool from detached disk from mirror

2007-01-30 Thread Robert Milkowski
Hello zfs-discuss,

  I had a pool with only two disks in a mirror.  I detached one disk
  and later erased the first disk.  Now I would really like to quickly
  get the data on the second (detached) disk available again.  Other
  than detaching it, nothing else was done to that disk.

  Has anyone written such a tool in-house?  I guess only the vdev label
  was erased, so it should be possible to do it.

  ?

-- 
Best regards,
 Robert  mailto:[EMAIL PROTECTED]
 http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Actual (cache) memory use of ZFS?

2007-01-30 Thread Bjorn Munch
Hello,

I am doing some tests using ZFS for the data files of a database
system, and ran into the memory problems that were discussed in a
thread here a few weeks ago.

When creating a new database, the data files are first initialized to
their configured size (written in full), and then the servers are
started.  They then need to allocate shared memory for the database
cache.  I am running two database nodes per host, each trying to use
512 MB of memory.

They use so-called "Intimate Shared Memory", which requires that the
requested amount be available in physical memory.  Since ZFS has just
gobbled up memory for its cache, that memory is not available and the
database won't start.

This was on a host with 2 GB of memory.

I gave up and switched to other hosts with 8 GB of memory each.  They
are running Solaris 10 U3 (SPARC).  Based on what was said in the
previous thread (and the source code!), I assumed that ZFS would use up
to 7 GB for caching, which would be used up if the database files being
written were large enough.

But this does not happen.  I'm now running with database files of 5 GB
and a database memory cache of 1 GB, plus some smaller files and shared
memory segments, per node (and two nodes per host).  And it works
fine.  Even if I increase the file size to 10 GB each, and also
increase the memory cache, the system seems to stabilize at around
3/4 GB of free memory (according to vmstat).

Now I can see with mdb that the value of arc.c (which is the amount
ZFS will use for cache) is actually only about half of arc.c_max:

---
> arc::print -a "struct arc" c_max
70400370 c_max = 0x1bca74000
> arc::print -a "struct arc" c
70400360 c = 0xe86b6be1
---

How did this happen, and what's the rule here?  What would happen if I
had 4 GB of memory?  I would like to come up with some requirements or
limitations for running with ZFS.
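
(For reference, on builds recent enough to support the zfs_arc_max
tunable -- an assumption worth verifying against your particular Solaris
update or Nevada build -- a common way to put a hard cap on the ARC is
an /etc/system entry such as:

    * cap the ARC at 1 GB; 0x40000000 is just an example value
    set zfs:zfs_arc_max = 0x40000000

followed by a reboot.  The effect can be checked with the same mdb
"arc::print" incantation shown above.)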

-- 
Bjorn Munch
Sun Microsystems, Trondheim, Norway
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] restore pool from detached disk from mirror

2007-01-30 Thread Jeremy Teo

Hello,

On 1/30/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:

Hello zfs-discuss,

  I had a pool with only two disks in a mirror. I detached one disks
  and have erased later first disk. Now i would really like to quickly
  get data from the second disk available again. Other than detaching
  the second disk nothing else was done to it.

  Has anyone written such a tool in-house? I guess only vdev label was
  erased so it should be possible to do it.


My grokking of the code confirms that the vdev labels of a detached
vdev are wiped.  The vdev label contains the uberblock; without the
uberblock, we can't access the rest of the zpool/zfs data on the
detached vdev.
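
(One way to confirm this on the detached disk itself, with the device
name below adjusted to match your second disk, is to dump its vdev
labels with zdb:

    # zdb -l /dev/rdsk/c1t1d0s0

On a disk whose labels were wiped by the detach, zdb should report that
it cannot unpack any of the four label copies.)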


--
Regards,
Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-30 Thread Luke Scharf

David Magda wrote:

What about a rotating spare?

When setting up a pool a lot of people would (say) balance things 
around buses and controllers to minimize single  points of failure, 
and a rotating spare could disrupt this organization, but would it be 
useful at all?


Functionally, that sounds a lot like raidz2!

"Hey, I can take a double-drive failure now!  And I don't even need to 
rebuild!  Just like having a hot spare with raid5, but without the 
rebuild time!"


Though I can see a "raidz sub N" being useful -- "just tell ZFS how many 
parity drives you want, and we'll take care of the rest."


-Luke




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Actual (cache) memory use of ZFS?

2007-01-30 Thread Roch - PAE

Bjorn Munch writes:
 > Hello,
 > 
 > I am doing some tests using ZFS for the data files of a database
 > system, and ran into memory problems which has been discussed in a
 > thread here a few weeks ago.
 > 
 > When creating a new database, the data files are first initialized to
 > their configured size (written in full), then the servers are started.
 > They will then need to allocate shared memory for database cache.  I
 > am running two database nodes per host, trying to use 512Mb memory
 > each.
 > 
 > They are using so-called "Intimate Shared Memory" which requires that
 > the requested amount is available in physical memory.  Since ZFS has
 > just gobbled up memory for cache, it is not available and the database
 > won't start.
 > 
 > This was on a host with 2Gb memory.
 > 

That seems like a bug.  ZFS is designed to release memory
upon demand by the DB.  Which OS release was this running?
It could be related to:

MrNumber: 4034947
Synopsis: anon_swap_adjust(),  anon_resvmem() should
  call kmem_reap() if availrmem is low.
Fixed in snv_42
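
(To check what a given machine is running: "cat /etc/release" shows the
Solaris release, e.g. Solaris 10 6/06 for U2 or 11/06 for U3, and on
Nevada/OpenSolaris "uname -v" reports the build, e.g. snv_42 or later.
These release names are given as examples, not taken from the thread.)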

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Actual (cache) memory use of ZFS?

2007-01-30 Thread Bjorn Munch
ZFS does release memory if I e.g. do a simple malloc(), but with this
"intimate shared memory" (the SHM_SHARE_MMU flag in the call to shmat()),
that does not happen.  BTW, the OS here was Solaris 10 U2; the 8 GB
machines I'm using now are running U3.

Hmm, it looks like this may have been fixed in U3; that would be good
news!  That could also explain why one of my colleagues didn't have
similar problems.  I'll check with him tomorrow, as I'm not sure which
machines he was using.

If this is the solution, I guess I'll have to kick myself (*ouch*) for
not having asked here before.  I did spend quite some time on this,
including a "hack" to force the memory to be freed before starting the
servers.

- Bjorn
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-30 Thread Albert Chin
On Mon, Jan 29, 2007 at 09:37:57PM -0500, David Magda wrote:
> On Jan 29, 2007, at 20:27, Toby Thain wrote:
> 
> >On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote:
> >
> >>I seem to remember the Massive Array of Independent Disk guys ran  
> >>into
> >>a problem I think they called static friction, where idle drives  
> >>would
> >>fail on spin up after being idle for a long time:
> >
> >You'd think that probably wouldn't happen to a spare drive that was  
> >spun up from time to time. In fact this problem would be (mitigated  
> >and/or) caught by the periodic health check I suggested.
> 
> What about a rotating spare?
> 
> When setting up a pool a lot of people would (say) balance things  
> around buses and controllers to minimize single  points of failure,  
> and a rotating spare could disrupt this organization, but would it be  
> useful at all?

Agami Systems has the concept of "Enterprise Sparing", where the hot
spare is distributed amongst data drives in the array. When a failure
occurs, the rebuild occurs in parallel across _all_ drives in the
array:
  http://www.issidata.com/specs/agami/enterprise-classreliability.pdf

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Need Help on device structure

2007-01-30 Thread Richard Elling

dudekula mastan wrote:

I don't know whether it's the right place or not to discuss my doubts.
 
I opened a device ( in raw mode) and I filled the entire space (from 1 
block to last block) with  some random data. While writing data, I am 
seeing the following warning messages in dmesg buffer.
 
Jan 30 08:32:36 masthan scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/[EMAIL PROTECTED] (ssd175):

Jan 30 08:32:36 masthan Corrupt label; wrong magic number
Any idea on this ?
 
I thought  my application is corrupting device structure (device 
structure has disk label, partition table..etc).
 
In linux, the first block of the device has device structure. Do you 
know the blocks which has device structure in solaris ?


For Solaris (VTOC) labeled disks, block 0 contains the VTOC.  If you
overwrite block 0 with junk, this is exactly the error message you
should expect to see.
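
(A quick way to check whether the label is still intact, with the device
name here only an example, is:

    # prtvtoc /dev/rdsk/c1t0d0s2

If the label has been overwritten, prtvtoc will complain that it cannot
read it, and a fresh label can be written back with format(1M).)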
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Roch - PAE

Nicolas Williams writes:
 > On Thu, Jan 25, 2007 at 10:57:17AM +0800, Wee Yeh Tan wrote:
 > > On 1/25/07, Bryan Cantrill <[EMAIL PROTECTED]> wrote:
 > > >...
 > > >after all, what was ZFS going to do with that expensive but useless
 > > >hardware RAID controller?  ...
 > > 
 > > I almost rolled over reading this.
 > > 
 > > This is exactly what I went through when we moved our database server
 > > out from Vx** to ZFS.  We had a 3510 and were thinking how best to
 > > configure the RAID.  In the end, we ripped out the controller board
 > > and used the 3510 as a JBOD directly attached to the server.  My DBA
 > > was so happy with this setup (especially with the snapshot capability)
 > > he is asking for another such setup.
 > 
 > The only benefit of using a HW RAID controller with ZFS is that it
 > reduces the I/O that the host needs to do, but the trade off is that ZFS
 > cannot do combinatorial parity reconstruction so that it could only
 > detect errors, not correct them.  It would be cool if the host could
 > offload the RAID I/O to a HW controller but still be able to read the
 > individual stripes to perform combinatorial parity reconstruction.


Right, but in this situation, if the "cosmic ray / bit flip" hits on the
way to the controller, the array will store wrong data and we will not
be able to reconstruct the correct block.

So having multiple I/Os here improves the time-to-data-loss metric.

-r

 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Nicolas Williams
On Tue, Jan 30, 2007 at 06:32:14PM +0100, Roch - PAE wrote:
>  > The only benefit of using a HW RAID controller with ZFS is that it
>  > reduces the I/O that the host needs to do, but the trade off is that ZFS
>  > cannot do combinatorial parity reconstruction so that it could only
>  > detect errors, not correct them.  It would be cool if the host could
>  > offload the RAID I/O to a HW controller but still be able to read the
>  > individual stripes to perform combinatorial parity reconstruction.
> 
> right but in this situation, if the "cosmic ray / bit flip" hits on the
> way to the controller, the array will store wrong data and
> we will not be able to reconstruct the correct block.
> 
> So having multiple I/Os here improves the time to data
> loss metric.

You missed my point.  Assume _new_ RAID HW that allows the host to read
the individual stripes.  ZFS could offload I/O to the RAID HW but,
when a checksum fails to validate on read, THEN go read the individual
stripes and parity and do the combinatorial reconstruction as if the
RAID HW didn't exist.

I don't believe such RAID HW exists, therefore the point is moot.  But
if such HW ever comes along...

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Roch - PAE

Nicolas Williams writes:
 > On Tue, Jan 30, 2007 at 06:32:14PM +0100, Roch - PAE wrote:
 > >  > The only benefit of using a HW RAID controller with ZFS is that it
 > >  > reduces the I/O that the host needs to do, but the trade off is that ZFS
 > >  > cannot do combinatorial parity reconstruction so that it could only
 > >  > detect errors, not correct them.  It would be cool if the host could
 > >  > offload the RAID I/O to a HW controller but still be able to read the
 > >  > individual stripes to perform combinatorial parity reconstruction.
 > > 
 > > right but in this situation, if the "cosmic ray / bit flip" hits on the
 > > way to the controller, the array will store wrong data and
 > > we will not be able to reconstruct the correct block.
 > > 
 > > So having multiple I/Os here improves the time to data
 > > loss metric.
 > 
 > You missed my point.  Assume _new_ RAID HW that allows the host to read
 > the individual stripes.  The ZFS could offload I/O to the RAID HW but,
 > when a checksum fails to validate on read, THEN go read the individual
 > stripes and parity and do the combinatorial reconstruction as if the
 > RAID HW didn't exist.
 > 
 > I don't believe such RAID HW exists, therefore the point is moot.  But
 > if such HW ever comes along...
 > 
 > Nico
 > -- 

I think I got the point. Mine was that if the data travels a 
single time toward the storage and is corrupted along the
way then there will be no hope of recovering it since the
array was given bad data. Having the data travel twice is a
benefit for MTTDL.

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Nicolas Williams
On Tue, Jan 30, 2007 at 06:41:25PM +0100, Roch - PAE wrote:
> I think I got the point. Mine was that if the data travels a 
> single time toward the storage and is corrupted along the
> way then there will be no hope of recovering it since the
> array was given bad data. Having the data travel twice is a
> benefit for MTTDL.

Well, this is certainly true, so I missed your point :)

Mirroring would help.  A mirror with RAID-Z members would only double
the I/O and would still provide for combinatorial reconstruction when
the errors arise from bit rot on the rotating rust, or on the path from
the RAID HW to the individual disks, as opposed to on the path from the
host to the RAID HW.  Depending on the relative error rates this could
be a useful trade-off to make (plus, mirroring should halve access
times, while RAID-Z, if disk heads can be synchronized and the disks
have similar geometries, can provide an N-fold increase in bandwidth,
though I'm told that disk head synchronization is no longer a common
feature).

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Export ZFS over NFS ?

2007-01-30 Thread Neal Pollack

I've got my first server deployment with ZFS.
Consolidating a pair of other file servers that used to have
a dozen or so NFS exports in /etc/dfs/dfstab, similar to:

/export/solaris/images
/export/tools
/export/ws
. and so on

For the new server, I have one large ZFS pool:
-bash-3.00# df -hl
bigpool                16T   1.5T    15T    10%    /export

that I am starting to populate.  Should I simply share /export, or
should I separately share the individual dirs in /export like the old
dfstab did?

I am assuming that a single command:
# zfs set sharenfs=ro bigpool
would share /export as a read-only NFS point?

Opinions/comments/tutoring?

Thanks,

Neal

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Need Help on device structure

2007-01-30 Thread Eric Schrock
On Tue, Jan 30, 2007 at 09:24:26AM -0800, Richard Elling wrote:
> 
> For Solaris labeled disks, block 0 contains the vtoc.  If you overwrite
> block 0 with junk, then this is the error message you should see.
>

Also note that for EFI labelled disks, Solaris will create a 'bare' dev
link that corresponds to the whole disk, label included.  So if you
write to /dev/rdsk/c0t0d0s0 on an EFI labelled disk, you won't corrupt
the label.  But if you write to /dev/rdsk/c0t0d0, then you will be
overwriting the label portion.

Note that ZFS never writes to the first 8k of any device in case you are
using disks with a Solaris VTOC.
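
(As an illustration, with the pool and device names below purely made
up: giving ZFS a whole disk lets it write an EFI label itself, whereas
giving it a slice leaves the existing label alone.

    # zpool create tank c1t2d0      # whole disk: ZFS puts an EFI label on it
    # zpool create tank c1t2d0s0    # slice: the existing disk label is untouched
)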

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unable to boot zone

2007-01-30 Thread Karen Chau

I'm unable to boot a zone after I did a sys-unconfig.  How do I recover?

GLOBAL ZONE:
---
dmpk14a603# zoneadm list -cv
  ID NAME             STATUS     PATH
   0 global           running    /
   1 snitch-zone02    running    /snitch-zone02
   4 snitch-zone04    down       /snitch-zone04

dmpk14a603# zoneadm -z snitch-zone04 boot
zoneadm: zone 'snitch-zone04': zone is already booted

dmpk14a603# zoneadm -z snitch-zone04 halt
zoneadm: zone 'snitch-zone04': unable to unmount '/snitch-zone04/root/tmp'
zoneadm: zone 'snitch-zone04': unable to unmount file systems in zone
zoneadm: zone 'snitch-zone04': unable to destroy zone

dmpk14a603# zoneadm -z snitch-zone04 reboot
zoneadm: zone 'snitch-zone04': unable to unmount '/snitch-zone04/root/tmp'
zoneadm: zone 'snitch-zone04': unable to unmount file systems in zone
zoneadm: zone 'snitch-zone04': unable to destroy zone

dmpk14a603# uname -a
SunOS dmpk14a603 5.10 Generic_118822-25 sun4u sparc SUNW,Sun-Fire-V440

dmpk14a603# more /etc/release
  Solaris 10 1/06 s10s_u1wos_19a SPARC
  Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
   Use is subject to license terms.
  Assembled 07 December 2005


snitch-zone04% su
# sys-unconfig
   WARNING

This program will unconfigure your system.  It will cause it
to revert to a "blank" system - it will not have a name or know
about other systems or networks.

This program will also halt the system.

Do you want to continue (y/n) ? y
svc.startd: The system is coming down.  Please wait.
svc.startd: 58 system services are now being stopped.
svc.startd: The system is down.


[NOTICE: Zone rebooting]




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Export ZFS over NFS ?

2007-01-30 Thread Neal Pollack

Neal Pollack wrote:

I've got my first server deployment with ZFS.
Consolidating a pair of other file servers that used to have
a dozen or so NFS exports in /etc/dfs/dfstab similar to;

/export/solaris/images
/export/tools
/export/ws
. and so on

For the new server, I have one large zfs pool;
-bash-3.00# df -hl
bigpool                16T   1.5T    15T    10%    /export

that I am starting to populate.   Should I simply share /export,
or should I separately share the individual dirs in /export
like the old dfstab did?

I am assuming that one single command;
# zfs set sharenfs=ro bigpool
would share /export as a read-only NFS point?

Opinions/comments/tutoring?


The only thing I found in the docs was page 99 of the admin guide.
It says I should do:

zfs set sharenfs=on bigpool

to get all sub-dirs shared read-write via NFS, and then

zfs set sharenfs=ro bigpool/dirname

for those I want to protect as read-only.

Is that the current best practice?
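
(For what it's worth, a minimal sketch of that layout, assuming each
export is its own dataset under a pool mounted at /export, and with the
dataset names purely illustrative:

    # zfs create bigpool/tools
    # zfs create bigpool/images
    # zfs set sharenfs=on bigpool          # children inherit rw sharing
    # zfs set sharenfs=ro bigpool/images   # override just this one to read-only

Since sharenfs is an inherited property, only the exceptions need to be
set on individual datasets.)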

Thanks



Thanks,

Neal

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Export ZFS over NFS ?

2007-01-30 Thread Frank Cusack
On January 30, 2007 9:59:45 AM -0800 Neal Pollack <[EMAIL PROTECTED]> 
wrote:

I've got my first server deployment with ZFS.
Consolidating a pair of other file servers that used to have
a dozen or so NFS exports in /etc/dfs/dfstab similar to;

/export/solaris/images
/export/tools
/export/ws
. and so on

For the new server, I have one large zfs pool;
-bash-3.00# df -hl
bigpool                16T   1.5T    15T    10%    /export

that I am starting to populate.   Should I simply share /export,
or should I separately share the individual dirs in /export
like the old dfstab did?


Just share /export.


I am assuming that one single command;
# zfs set sharenfs=ro bigpool
would share /export as a read-only NFS point?

Opinions/comments/tutoring?


Unless you have different share option requirements for different
dirs (say rw vs ro or different network access rules), just sharing
the top level is probably better (easier to manage).  Clients can
still mount subdirs and not the entire pool.

Now, if you create each dir under /export as a ZFS filesystem, the
clients will HAVE to mount the individual filesystems.  If they just
mount /export they will not traverse the filesystem mounts on the server
when they descend into /export.  (Today, anyway.)
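
(A sketch of what that means on a client, with the server name and
mount points as placeholders:

    # mount -F nfs fileserver:/export /mnt           # sees only the top-level fs
    # mount -F nfs fileserver:/export/tools /mnt2    # per-filesystem mount works

On a Linux client the equivalent would be
"mount fileserver:/export/tools /mnt2".)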

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] yet another blog: ZFS space, performance, MTTDL

2007-01-30 Thread Richard Elling

I've blogged about the trade-offs for space, performance, and MTTDL (RAS)
for ZFS and RAID in general.

http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
Enjoy.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Richard Elling

Nicolas Williams wrote:

On Tue, Jan 30, 2007 at 06:41:25PM +0100, Roch - PAE wrote:
I think I got the point. Mine was that if the data travels a 
single time toward the storage and is corrupted along the

way then there will be no hope of recovering it since the
array was given bad data. Having the data travel twice is a
benefit for MTTDL.


Well, this is certainly true, so I missed your point :)


This technique is used in many situations where the BER is non-zero
(pretty much always) and the data is very important.  For example,
consider command sequences being sent to a deep space probe -- you
really, really, really want the correct commands to be received, so
you use ECC and repeat the commands many times.  There are mathematical
models for this.  Slow? Yes.  Correct? More likely.


Mirroring would help.  A mirror with RAID-Z members would only double
the I/O and still provide for combinatorial reconstruction when the
errors arise from bit rot on the rotating rust or on the path from the
RAID HW to the individual disks, as opposed to on the path from the host
to the RAID HW.  Depending on the relative error rates this could be a
useful trade-off to make (plus, mirroring should halve access times
while RAID-Z, if disk heads can be synchronized and the disks have
similar geometries, can provide an N multiple increase in bandwidth,
though I'm told that disk head synchronization is no longer a common
feature).


One of the benefits of ZFS is that not only is head synchronization not
needed, but also block offsets do not have to be the same.  For example,
in a traditional mirror, block 1 on device 1 is paired with block 1 on
device 2.  In ZFS, this 1:1 mapping is not required.  I believe this will
result in ZFS being more resilient to disks with multiple block failures.
In order for a traditional RAID to implement this, it would basically
need to [re]invent a file system.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Toby Thain


On 30-Jan-07, at 5:48 PM, Richard Elling wrote:
...


One of the benefits of ZFS is that not only is head synchronization not
needed, but also block offsets do not have to be the same.  For example,
in a traditional mirror, block 1 on device 1 is paired with block 1 on
device 2.  In ZFS, this 1:1 mapping is not required.  I believe this will
result in ZFS being more resilient to disks with multiple block failures.
In order for a traditional RAID to implement this, it would basically
need to [re]invent a file system.


Yes, this is another feature for the "Why ZFS can beat RAID" FAQ.

--T


 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Ian Collins
Richard Elling wrote:

>
> One of the benefits of ZFS is that not only is head synchronization not
> needed, but also block offsets do not have to be the same.  For example,
> in a traditional mirror, block 1 on device 1 is paired with block 1 on
> device 2.  In ZFS, this 1:1 mapping is not required.  I believe this will
> result in ZFS being more resilient to disks with multiple block failures.
> In order for a traditional RAID to implement this, it would basically
> need to [re]invent a file system.


This may well offer better protection against drive firmware bugs.

Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Wade . Stuart





> One of the benefits of ZFS is that not only is head synchronization not
> needed, but also block offsets do not have to be the same.  For example,
> in a traditional mirror, block 1 on device 1 is paired with block 1 on
> device 2.  In ZFS, this 1:1 mapping is not required.  I believe this will
> result in ZFS being more resilient to disks with multiple block failures.
> In order for a traditional RAID to implement this, it would basically
> need to [re]invent a file system.
>   -- richard


Richard,

  This (non-1:1 mapping) does not seem to be enforced in the code
anywhere that I can see.  By "not required", do you mean that this could
be done in the future, or is it the case right now and I am missing the
code that accomplishes this?

-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Darren Dunham
> > One of the benefits of ZFS is that not only is head synchronization not
> > needed, but also block offsets do not have to be the same.  For example,
> > in a traditional mirror, block 1 on device 1 is paired with block 1 on
> > device 2.  In ZFS, this 1:1 mapping is not required.  I believe this will
> > result in ZFS being more resilient to disks with multiple block failures.
> > In order for a traditional RAID to implement this, it would basically
> > need to [re]invent a file system.
> >   -- richard
> 
>   This does not seem to be enforced (! 1:1) in code anywhere that I can
> see.  By not required are you pointing that this is able to be done in the
> future,  or is this the case right now and I am missing the code that
> accomplishes this?

I think he means that if a block fails to write on a VDEV, ZFS can write
that data elsewhere and is not forced to use that location.  As opposed
to SVM as an example, where the mirror must try to write at a particular
offset or fail.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Wade . Stuart




> > > One of the benefits of ZFS is that not only is head synchronization not
> > > needed, but also block offsets do not have to be the same.  For example,
> > > in a traditional mirror, block 1 on device 1 is paired with block 1 on
> > > device 2.  In ZFS, this 1:1 mapping is not required.  I believe this will
> > > result in ZFS being more resilient to disks with multiple block failures.
> > > In order for a traditional RAID to implement this, it would basically
> > > need to [re]invent a file system.
> > >   -- richard
> >
> >   This does not seem to be enforced (! 1:1) in code anywhere that I can
> > see.  By not required are you pointing that this is able to be done in
> > the future, or is this the case right now and I am missing the code that
> > accomplishes this?
>
> I think he means that if a block fails to write on a VDEV, ZFS can write
> that data elsewhere and is not forced to use that location.  As opposed
> to SVM as an example, where the mirror must try to write at a particular
> offset or fail.

Understood.  I am asking whether the current code base actually does
this, as I do not see the code path that deals with this case.

-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Darren Dunham
> > I think he means that if a block fails to write on a VDEV, ZFS can write
> > that data elsewhere and is not forced to use that location.  As opposed
> > to SVM as an example, where the mirror must try to write at a particular
> > offset or fail.
> 
> Understood,  I am asking if the current code base actually does this as I
> do not see the code path that deals with this case.

Got it.  So what does happen with a block write failure now on one side
of a mirror?  Does it retry forever or eventually fail the device?


-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Cheap ZFS homeserver.

2007-01-30 Thread Wes Williams
> On 1/18/07, . <[EMAIL PROTECTED]> wrote:
> 
> SYBA SD-SATA-4P PCI SATA Controller Card (
> http://www.newegg.com/product/Product.asp?item=N82E168
> 15124020 )
> 
>

From my home ZFS server setup, I had tried two Syba SD-SATA2-2E2I PCI-X
SATA II controller cards without any luck; both cards' BIOSes wouldn't
recognize my SATA drives.
http://www.newegg.com/Product/Product.asp?Item=N82E16816124003R
Although the Syba tech support was reasonable, they only helped conclude
that BOTH of the cards I had received were defective - yep, that was
0 for 2.

Perhaps the other model listed above works better; I don't know.  In the
end, I just stuck with my onboard SATA I/O, which is only SATA 150 but
was still fast enough for my network.

For my ZFS home server, I was able to find an excellent Sun W1100z on
eBay for ~$360 that came with a 2.4GHz Opteron, 1 GB of ECC RAM, a
Quadro FX 500, and an 80 GB IDE drive - perfect as the basis for my ZFS
server build-up.  To that I've added two Seagate Barracuda ES
ST3400620NS 400 GB SATA II drives for about $360, which now run my
400 GB whole-disk ZFS mirror.  Overall I'm in for $720 for a great
server/workstation with 400 GB of redundant storage (not bad), and
performance is much better, with much greater functionality, than most
$1,000+ home NAS systems I've seen.

At first I thought I'd toss the IDE drive, but I ended up just putting
the OS on it, with all my data and zones on the ZFS mirror... at least
until ZFS root.

One of these days I'll get around to finishing the blog about it:  
http://classiarius.com/W1100z%20Home%20ZFS%20Server/W1100z%20Home%20ZFS%20Server.html
 

HTH.

Wes W.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: restore pool from detached disk from mirror

2007-01-30 Thread Rainer Heilke
Jeremy is correct.  There is actually an RFE open for a "zpool split"
that would have allowed you to detach the second disk while keeping the
vdev data (and thus let you pull the data in from the detached disk
using some sort of "import"-type command).

Rainer
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to boot zone

2007-01-30 Thread Mike Gerdts

On 1/30/07, Karen Chau <[EMAIL PROTECTED]> wrote:

dmpk14a603# zoneadm -z snitch-zone04 halt
zoneadm: zone 'snitch-zone04': unable to unmount '/snitch-zone04/root/tmp'
zoneadm: zone 'snitch-zone04': unable to unmount file systems in zone
zoneadm: zone 'snitch-zone04': unable to destroy zone


I would suspect that someone in the global zone did:

# cd /snitch-zone04/root/tmp

and is still sitting there.  Perhaps "fuser -c
/snitch-zone04/root/tmp" is in order, to see if it lists any PIDs.
If this weren't under a zonepath, I would also look for mounts and
NFS shares in subdirectories of the filesystem that won't unmount.
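
(A sketch of that check; the PID shown is hypothetical:

    # fuser -c /snitch-zone04/root/tmp
    /snitch-zone04/root/tmp:    12345c
    # ps -fp 12345        # see whose process has its cwd there
    # kill 12345          # or just have the owner cd elsewhere

after which "zoneadm -z snitch-zone04 halt" followed by "boot" should
succeed.)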

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-30 Thread Richard Elling

[EMAIL PROTECTED] wrote:

One of the benefits of ZFS is that not only is head synchronization not
needed, but also block offsets do not have to be the same.  For example,
in a traditional mirror, block 1 on device 1 is paired with block 1 on
device 2.  In ZFS, this 1:1 mapping is not required.  I believe this will
result in ZFS being more resilient to disks with multiple block failures.
In order for a traditional RAID to implement this, it would basically
need to [re]invent a file system.
  -- richard


Richard,
  This does not seem to be enforced (! 1:1) in code anywhere that I can
see.  By not required are you pointing that this is able to be done in the
future,  or is this the case right now and I am missing the code that
accomplishes this?


IMHO, the best description of how mirrors work is:
http://blogs.sun.com/bonwick/entry/smokin_mirrors

The ditto block code is interesting, too.
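
(Related: on builds recent enough to have the "copies" property -- an
assumption to check against your release -- you can ask for ditto copies
of user data too, e.g.

    # zfs set copies=2 tank/important    # "tank/important" is just a placeholder

so even a single-disk pool keeps two copies of each data block.)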
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss