Re: [zfs-discuss] [storage-discuss] A few questions : RAID set width

2008-09-18 Thread Nils Goroll
Hi all,

Ben Rockwood wrote:
> You want to keep stripes wide to reduce wasted disk space but you
> also want to keep them narrow to reduce the elements involved in parity
> calculation.

I second Ben's argument, and the main point IMHO is how the RAID behaves in the
degraded state. When a disk fails, that disk's data has to be reconstructed by
reading from ALL the other disks of the RAID set. Effectively, for the degraded
RAID case, the N disks of a RAID set are reduced to the performance of a single
disk. Also, this situation lasts until the RAID is reconstructed after replacing
the failed disk, which is an argument for not using overly large disks (see
another thread on this list).
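
To put a rough number on the "not too large disks" point, here is a minimal
back-of-the-envelope sketch (plain Python, nothing ZFS-specific; the disk
sizes, the 60 MB/s rebuild rate and the 50% efficiency factor are assumptions
to replace with your own figures):

  # Rough estimate of the degraded-mode window after a disk failure.
  # All inputs are illustrative assumptions, not measured values.

  def rebuild_hours(disk_size_gb, rebuild_mb_per_s, efficiency=0.5):
      """Hours spent resilvering, assuming the rebuild only achieves a
      fraction ('efficiency') of the sequential rate because it competes
      with application I/O."""
      seconds = (disk_size_gb * 1024) / (rebuild_mb_per_s * efficiency)
      return seconds / 3600

  # Example: 250 GB vs. 1 TB disks at an assumed 60 MB/s rebuild rate.
  for size_gb in (250, 1000):
      print("%5d GB disk: ~%.1f h degraded"
            % (size_gb, rebuild_hours(size_gb, 60)))

The exact numbers don't matter; the point is that the degraded window grows
linearly with disk size.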

Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] A few questions - small read I/O performance on RAIDZ

2008-09-18 Thread Nils Goroll
Hi Peter,

Sorry, I read your post only after posting a reply myself.

Peter Tribble wrote:
> No. The number of spindles is constant. The snag is that for random reads,
> the performance of a raidz1/2 vdev is essentially that of a single disk. (The
> writes are fast because they're always full-stripe; but so are the reads.)

Can you elaborate on this?

My understanding is that with RAIDZ the writes are always full-stripe for as 
much data as can be agglomerated into a single contiguous write, but I thought 
this did not imply that all of the data has to be read at once except with a 
degraded RAID.

What about, for instance, writing 16MB chunks and reading 8K randomly? Wouldn't
RAIDZ access only the disks containing those 8K?

Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] typo: [storage-discuss] A few questions : RAID set width

2008-09-18 Thread Nils Goroll
> I Ben's argument, and the main point IMHO is how the RAID behaves in the
    ^
    second
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] A few questions - small read I/O performance on RAIDZ

2008-09-18 Thread Robert Milkowski
Hello Nils,

Thursday, September 18, 2008, 11:15:37 AM, you wrote:

NG> Hi Peter,

NG> Sorry, I read your post only after posting a reply myself.

NG> Peter Tribble wrote:
>> No. The number of spindles is constant. The snag is that for random reads,
>> the performance of a raidz1/2 vdev is essentially that of a single disk. (The
>> writes are fast because they're always full-stripe; but so are the reads.)

NG> Can you elaborate on this?

NG> My understanding is that with RAIDZ the writes are always full-stripe for as
NG> much data as can be agglomerated into a single contiguous write, but I thought
NG> this did not imply that all of the data has to be read at once except with a
NG> degraded RAID.

NG> What about, for instance, writing 16MB chunks and reading 8K randomly? Wouldn't
NG> RAIDZ access only the disks containing those 8K?

Basically, the way RAID-Z works is that it spreads each FS block across all
the disks in a given vdev (minus the parity disks). When you read data back,
ZFS verifies its checksum (the FS checksum, not a RAID-Z one) before the data
gets to the application, so it needs the entire FS block... which is spread
across all the data disks in that vdev.
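
For illustration, a minimal sketch of that layout (simplified: it ignores
skip/padding sectors, assumes single-parity RAID-Z with 512-byte sectors, and
the 7-disk width is just an example):

  import math

  SECTOR = 512  # bytes; simplifying assumption

  def raidz_columns(block_size, ndisks, nparity=1):
      """Roughly how one ZFS block is spread over a raidz vdev: the data
      is striped across all the non-parity columns."""
      data_cols = ndisks - nparity
      sectors = int(math.ceil(float(block_size) / SECTOR))
      per_col = int(math.ceil(float(sectors) / data_cols))
      return data_cols, per_col

  # One 128K record on a 7-disk raidz1: every data disk holds a piece, so
  # reading and checksumming that single block touches all 6 data disks.
  cols, per_col = raidz_columns(128 * 1024, ndisks=7)
  print("%d data disks x %d sectors each for one 128K block" % (cols, per_col))

So even an 8K application read that falls inside that record means reading
(and verifying) the whole record from every data column, which is why the
random-read performance of a raidz vdev tends toward that of a single disk.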



-- 
Best regards,
 Robert Milkowski                        mailto:[EMAIL PROTECTED]
                                         http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RAIDZ read-optimized write?

2008-09-18 Thread Nils Goroll
Hi Robert,

> Basically, the way RAID-Z works is that it spreads each FS block across all
> the disks in a given vdev (minus the parity disks). When you read data back,
> ZFS verifies its checksum (the FS checksum, not a RAID-Z one) before the data
> gets to the application, so it needs the entire FS block... which is spread
> across all the data disks in that vdev.

Thank you very much for correcting my long-time misconception.

On the other hand, isn't there room for improvement here? If it were possible to
break large writes into smaller blocks with individual checksums (for instance
those which are larger than a preferred_read_size parameter), we could still
write all of them as a single RAIDZ(2) stripe, avoid the RAID write penalty,
and improve read performance because we'd only need to issue a single read I/O
for each requested block - needing to access the full RAIDZ stripe only in the
degraded case.

I think that this could make a big difference for write-once, read-many
random-access applications like DSS systems.
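
To illustrate, here is a toy model of the idea (illustrative Python only, not
ZFS internals; the 7-disk raidz1, the 128K record size and the assumption that
a sub-block would sit contiguously within a column are all mine):

  import math

  def data_disks_touched(record_kb, read_kb, ndisks, nparity=1,
                         per_sub_block_checksums=False):
      """Toy model comparing today's raidz read behaviour with the
      proposed per-sub-block checksums."""
      data_cols = ndisks - nparity
      if not per_sub_block_checksums:
          # Today: one checksum per record, so the whole record (spread
          # over every data column) has to be read and verified.
          return data_cols
      # Proposal: each read-sized sub-block carries its own checksum and
      # is assumed to live contiguously on as few columns as possible.
      col_chunk_kb = float(record_kb) / data_cols
      return min(data_cols, int(math.ceil(read_kb / col_chunk_kb)))

  # 8K random reads from 128K records on a 7-disk raidz1 (assumed numbers):
  print("today   : %d data disks per 8K read"
        % data_disks_touched(128, 8, 7))
  print("proposal: %d data disk(s) per 8K read"
        % data_disks_touched(128, 8, 7, per_sub_block_checksums=True))

Under these (made-up) assumptions the proposal would turn a six-disk read into
a one-disk read.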

Is this feasible at all?

Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Procedure to follow after zpool upgrade on rpool (was: zpool upgrade wrecked GRUB)

2008-09-18 Thread Nils Goroll
(not sure if this has already been answered)

> I have a similar situation and would love some concise suggestions:
> 
> Had a working version of 2008.05 running svn_93 with the updated grub. I did 
> a pkg-update to svn_95 and ran the zfs update when it was suggested. System 
> ran fine until I did a reboot, then no boot, only the grub command line shows 
> up.

IMHO, after a ZFS upgrade an easy way to fix this is:

touch /etc/system # make bootadm re-create archive
bootadm update-archive
/boot/solaris/bin/update_grub

If you're already lost after an upgrade, try the following (commands from
memory, no syntax guarantee):

* Boot from a current snv CD (needs to support the zpool version you have
   upgraded to)

   ISOs available at http://www.genunix.org/

* Import your rpool

   mkdir /tmp/rpool
   zpool import -R /tmp/rpool rpool

   - if this fails, get the pool ID with zpool import, then use

     zpool import -f -R /tmp/rpool <pool-id>

* Mount your root-fs

   mount -F zfs rpool/ROOT/opensolaris-X /mnt

(now the same as above, but with the root filesystem mounted on /mnt)

* update boot-archive

   touch /mnt/etc/system
   bootadm update-archive -R /mnt

* update grub

   /mnt/boot/solaris/bin/update_grub

* umount, export

   umount /mnt
   zpool export rpool

At least this has worked for me.

Would it be a good idea to put this into the Indiana release notes?

Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Procedure to follow after zpool upgrade on rpool

2008-09-18 Thread Nils Goroll
Not knowing of a better place to put this, I have created

http://www.genunix.org/wiki/index.php/ZFS_rpool_Upgrade_and_GRUB

Please make any corrections there.

Thanks, Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?

2008-09-18 Thread Nils Goroll
Hi,

> It is important to remember that ZFS is ideal for writing new files from 
> scratch.

IIRC, maildir MTAs never overwrite mail files. But courier-imap does maintain 
some additional index files which will be overwritten, and I guess other IMAP 
servers probably do the same.

Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ read-optimized write?

2008-09-18 Thread Bob Friesenhahn
On Thu, 18 Sep 2008, Nils Goroll wrote:
>
> On the other hand, isn't there room for improvement here? If it were possible to
> break large writes into smaller blocks with individual checksums (for instance
> those which are larger than a preferred_read_size parameter), we could still
> write all of them as a single RAIDZ(2) stripe, avoid the RAID write penalty,
> and improve read performance because we'd only need to issue a single read I/O
> for each requested block - needing to access the full RAIDZ stripe only in the
> degraded case.
>
> I think that this could make a big difference for write-once, read-many
> random-access applications like DSS systems.

I imagine that this is indeed possible but that the law of diminishing 
returns would prevail.  The level of per-block overhead would become 
much greater, so sequential throughput would be reduced and more disk 
space would be wasted.
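
One way to put a rough number on the overhead argument (a crude model that
only counts the 128-byte block pointers holding the checksums, and ignores
raidz parity rounding and the indirect-block structure):

  BLKPTR_SIZE = 128  # bytes per block pointer, which holds the checksum

  def blkptr_bytes(data_bytes, block_size):
      """Very rough: one block pointer per data block."""
      nblocks = -(-data_bytes // block_size)   # ceiling division
      return nblocks * BLKPTR_SIZE

  GB = 1024 ** 3
  for bs in (128 * 1024, 8 * 1024):
      print("%4dK blocks: ~%.1f MB of block pointers per GB of data"
            % (bs // 1024, blkptr_bytes(GB, bs) / float(1024 ** 2)))

Going from 128K to 8K blocks means sixteen times as many pointers to store,
checksum and update, which is where much of the throughput and space cost
would come from.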

You can be sure that the ZFS inventors thoroughly explored all of 
these issues and it would surprise me if someone didn't prototype it 
to see how it actually performs.

ZFS is designed for the present and the future.  Legacy filesystems 
were designed for the past.  In the present, the cost of memory is 
dramatically reduced, and in the future it will be even more so. 
This means that systems will contain massive cache RAM which 
dramatically reduces the number of read (and write) accesses.  Also, solid 
state disks (SSDs) will eventually become common, and since SSDs don't exhibit 
a seek penalty, designing the filesystem to avoid seeks does not carry over 
into the long-term future.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ read-optimized write?

2008-09-18 Thread A Darren Dunham
On Thu, Sep 18, 2008 at 01:26:09PM +0200, Nils Goroll wrote:
> Thank you very much for correcting my long-time misconception.
> 
> On the other hand, isn't there room for improvement here? If it were
> possible to break large writes into smaller blocks with individual
> checksums (for instance those which are larger than a
> preferred_read_size parameter), we could still write all of them as
> a single RAIDZ(2) stripe, avoid the RAID write penalty and improve read
> performance because we'd only need to issue a single read I/O for each
> requested block - needing to access the full RAIDZ stripe only in the
> degraded case.

Don't forget that the parent block contains the checksum so that it can
be compared.  There isn't room in the parent for an arbitrary number of
checksums as would be required with an arbitrary number of columns.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ read-optimized write?

2008-09-18 Thread Richard Elling
Nils Goroll wrote:
> Hi Robert,
>
>   
>> Basically, the way RAID-Z works is that it spreads each FS block across all
>> the disks in a given vdev (minus the parity disks). When you read data back,
>> ZFS verifies its checksum (the FS checksum, not a RAID-Z one) before the data
>> gets to the application, so it needs the entire FS block... which is spread
>> across all the data disks in that vdev.
>> 
>
> Thank you very much for correcting my long-time misconception.
>
> On the other hand, isn't there room for improvement here? If it were possible to
> break large writes into smaller blocks with individual checksums (for instance
> those which are larger than a preferred_read_size parameter), we could still
> write all of them as a single RAIDZ(2) stripe, avoid the RAID write penalty,
> and improve read performance because we'd only need to issue a single read I/O
> for each requested block - needing to access the full RAIDZ stripe only in the
> degraded case.
>
> I think that this could make a big difference for write-once, read-many
> random-access applications like DSS systems.
>
> Is this feasible at all?
>   

Someone in the community was supposedly working on this, at one
time. It gets brought up about every 4-5 months or so.  Lots of detail
in the archives.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A couple basic questions re: zfs sharenfs

2008-09-18 Thread Michael Stalnaker
All;

I'm sure I'm missing something basic here. I need to do the following
things, and can't for the life of me figure out how:

1. Export a zfs filesystem over NFS, but restrict access to a limited set of
hosts and/or subnets, i.e. let 10.9.8.0/24 and 10.9.9.5 in.
2. Give root access to a zfs file system over NFS.

I'm sure this is doable with the right options, but I can't figure out how.

Any suggestions?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A couple basic questions re: zfs sharenfs

2008-09-18 Thread Dave
Try something like this:

zfs set sharenfs=<options> mypool/mydata

where <options> is:

sharenfs='rw=@10.9.8.0/24:@10.9.9.5/32,root=@10.9.8.0/24:@10.9.9.5/32'

--
Dave

Michael Stalnaker wrote:
> All;
> 
> I’m sure I’m missing something basic here. I need to do the following 
> things, and can’t for the life of me figure out how:
> 
>1. Export a zfs filesystem over NFS, but restrict access to a limited
>   set of hosts and/or subnets, i.e. let 10.9.8.0/24 and 10.9.9.5 in.
>2. Give root access to a zfs file system over NFS.
> 
> 
> I’m sure this is doable with the right options, but I can’t figure out how.
> 
> Any suggestions?
> 
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A couple basic questions re: zfs sharenfs

2008-09-18 Thread Johnson Earls
I believe this is just:

zfs set sharenfs='root=host1:host2,rw=@10.9.8.0/24:@10.9.9.5' filesystem

See the man pages for zfs(1M) (especially the last example) and share_nfs(1M).

- Johnson


Michael Stalnaker wrote:
> All;
> 
> I'm sure I'm missing something basic here. I need to do the following
> things, and can't for the life of me figure out how:
> 
> 1. Export a zfs filesystem over NFS, but restrict access to a limited set of
> hosts and/or subnets, i.e. let 10.9.8.0/24 and 10.9.9.5 in.
> 2. Give root access to a zfs file system over NFS.
> 
> I'm sure this is doable with the right options, but I can't figure out how.
> 
> Any suggestions? 
> 
> 
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


-- 
- Johnson Earls
System Support Lead for Sun Labs West
MPK16-1205  x88965  650/786-8965
[EMAIL PROTECTED]

~~
NOTICE:  This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information.  Any unauthorized review, use, disclosure or
distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy
all copies of the original message.
~~
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-18 Thread Brent Jones
On Tue, Sep 16, 2008 at 11:51 PM, Ralf Ramge <[EMAIL PROTECTED]> wrote:
> Jorgen Lundman wrote:
>
>> If we were interested in finding a method to replicate data to a 2nd
>> x4500, what other options are there for us?
>
> If you already have an X4500, I think the best option for you is a cron
> job with incremental 'zfs send'. Or rsync.
>
> --
>
> Ralf Ramge
> Senior Solaris Administrator, SCNA, SCSA
>

We had some Sun reps come out the other day to talk to us about
storage options, and part of the discussion was AVS replication with
ZFS.
I brought up the question of whether the resilvering process gets replicated,
and the reps said it does not. They may be mistaken, but I'm hopeful they are
correct.
Could this behavior have been changed recently in AVS to make replication
'smarter' with ZFS as the underlying filesystem?

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to remove any references to a zpool that's gone

2008-09-18 Thread Glenn Lagasse
I had a disk that contained a zpool.  For reasons that we won't go in
to, that disk had zeros written all over it (at least enough to cover
the entirety of the zpool space).  Now when I run zpool status the
command hangs when it tries to display information about the now
non-existent pool.  Similarly, trying to destroy the pool hangs as well.

Is there some way to remove the pool from zfs's pool of knowledge?
Also, is it a bug that the failure mode for this situation isn't more
graceful?  Surely zfs should figure out that 'something really bad
happened' and give up the ghost gracefully?

Thanks!

-- 
Glenn
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to remove any references to a zpool that's gone

2008-09-18 Thread Mark J Musante

Hi Glenn,

Where is it hanging?  Could you provide a stack trace?  It's possible  
that it's just a bug and not a configuration issue.


On 18 Sep, 2008, at 16.12, Glenn Lagasse wrote:


I had a disk that contained a zpool.  For reasons that we won't go in
to, that disk had zeros written all over it (at least enough to cover
the entirety of the zpool space).  Now when I run zpool status the
command hangs when it tries to display information about the now
non-existent pool.  Similarly, trying to destroy the pool hangs as  
well.


Is there some way to remove the pool from zfs's pool of knowledge?
Also, is it a bug that the failure mode for this situation isn't more
graceful?  Surely zfs should figure out that 'something really bad
happened' and give up the ghost gracefully?

Thanks!

--
Glenn
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





Regards,
markm


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to remove any references to a zpool that's gone

2008-09-18 Thread Richard Elling
Glenn Lagasse wrote:
> Hey Mark,
>
> * Mark J Musante ([EMAIL PROTECTED]) wrote:
>   
>> Hi Glenn,
>>
>> Where is it hanging?  Could you provide a stack trace?  It's possible  
>> that it's just a bug and not a configuration issue.
>> 
>
> I'll have to recreate the situation (won't be able to do so until next
> week).  I had a zpool status (and subsequently a zpool destroy) command
> that hung; subsequent zfs commands would also hang.  I couldn't even
> do a zpool export (which someone privately told me should work).  What
> worked was to reboot (in fact I had to power the machine off physically;
> init and reboot did nothing), and then I could export the 'broken' pool.
> So I'm not sure where the bug is, but this shouldn't be too hard to
> replicate: I believe running zpool status with this type of setup will
> cause a hang, and then you're stuck until you power off the machine and
> reboot to do the export.  I'll report back next week once I replicate
> this.
>   

Probably a bug like:
http://bugs.opensolaris.org/view_bug.do?bug_id=6667208
Your workaround works.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to remove any references to a zpool that's gone

2008-09-18 Thread Glenn Lagasse
Hey Mark,

* Mark J Musante ([EMAIL PROTECTED]) wrote:
> Hi Glenn,
>
> Where is it hanging?  Could you provide a stack trace?  It's possible  
> that it's just a bug and not a configuration issue.

I'll have to recreate the situation (won't be able to do so until next
week).  I had a zpool status (and subsequently a zpool destroy) command
that hung; subsequent zfs commands would also hang.  I couldn't even
do a zpool export (which someone privately told me should work).  What
worked was to reboot (in fact I had to power the machine off physically;
init and reboot did nothing), and then I could export the 'broken' pool.
So I'm not sure where the bug is, but this shouldn't be too hard to
replicate: I believe running zpool status with this type of setup will
cause a hang, and then you're stuck until you power off the machine and
reboot to do the export.  I'll report back next week once I replicate
this.

Thanks,

Glenn
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] doing HDS shadow copy of a zpool

2008-09-18 Thread chad . campbell
I apologize if this has been answered already, but I've tried to RTFM and 
haven't found much.  I'm trying to get HDS shadow copy to work for zpool 
replication.  We do this with VxVM by modifying each target disk ID after 
it's been shadowed from the source LUN.  This allows us to import each 
target disk into the target diskgroup and then have its volumes mounted 
for backup over the network.  From what I can tell, each LUN in a zpool 
will have two 256K vdev labels at the front and two at the end.  Is there a 
way to modify the vdev labels so that the target LUNs don't end up with 
the same zpool ID as the source LUNs?  Better yet, is there a way to 
import and rename a zpool that has the exact same ID and name as an 
existing one?  As it stands now, after the shadow copy, format can tell that 
each target LUN is labeled to be part of the source zpool, but that is 
invisible to zpool import.

Thanks,

Chad
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss