Re: [zfs-discuss] [storage-discuss] A few questions : RAID set width
Hi all,

Ben Rockwood wrote:
> You want to keep stripes wide to reduce wasted disk space but you
> also want to keep them narrow to reduce the elements involved in parity
> calculation.

I second Ben's argument, and the main point IMHO is how the RAID behaves in the degraded state. When a disk fails, that disk's data has to be reconstructed by reading from ALL the other disks of the RAID set. Effectively, in the degraded case, the N disks of the RAID are reduced to the performance of a single disk. This situation also lasts until the RAID has been reconstructed after replacing the failed disk, which is an argument against using very large disks (see another thread on this list).

Nils
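To illustrate why the reconstruction window grows with disk size, here is a back-of-the-envelope sketch (Python; the 50 MB/s rebuild rate and the disk sizes are purely illustrative assumptions, and real resilver times also depend on pool fill level and competing I/O):

    # Lower-bound resilver estimate: the replacement disk must be written
    # end to end, while every surviving disk in the set is read.
    def resilver_hours(disk_size_gb, rebuild_mb_per_s):
        seconds = disk_size_gb * 1024 / rebuild_mb_per_s
        return seconds / 3600

    for size_gb in (250, 500, 1000):          # hypothetical disk sizes
        print(f"{size_gb:>4} GB disk: ~{resilver_hours(size_gb, 50):.1f} h at 50 MB/s")

The degraded (single-disk) performance described above applies for that entire window.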
Re: [zfs-discuss] [storage-discuss] A few questions - small read I/O performance on RAIDZ
Hi Peter,

Sorry, I read your post only after posting a reply myself.

Peter Tribble wrote:
> No. The number of spindles is constant. The snag is that for random reads,
> the performance of a raidz1/2 vdev is essentially that of a single disk. (The
> writes are fast because they're always full-stripe; but so are the reads.)

Can you elaborate on this? My understanding is that with RAIDZ the writes are always full-stripe for as much data as can be agglomerated into a single contiguous write, but I thought this did not imply that all of the data has to be read back at once, except with a degraded RAID. What about, for instance, writing 16 MB chunks and reading 8K randomly? Wouldn't RAIDZ access only the disks containing the 8K in question?

Nils
[zfs-discuss] typo: [storage-discuss] A few questions : RAID set width
> I Ben's argument, and the main point IMHO is how the RAID behaves in the
  ^ the word "second" is missing here, i.e. "I second Ben's argument".
Re: [zfs-discuss] [storage-discuss] A few questions - small read I/O performance on RAIDZ
Hello Nils,

Thursday, September 18, 2008, 11:15:37 AM, you wrote:

NG> Hi Peter,
NG> Sorry, I read your post only after posting a reply myself.
NG> Peter Tribble wrote:
>> No. The number of spindles is constant. The snag is that for random reads,
>> the performance of a raidz1/2 vdev is essentially that of a single disk. (The
>> writes are fast because they're always full-stripe; but so are the reads.)
NG> Can you elaborate on this?
NG> My understanding is that with RAIDZ the writes are always full-stripe for as
NG> much data as can be agglomerated into a single contiguous write, but I thought
NG> this did not imply that all of the data has to be read at once except with a
NG> degraded RAID.
NG> What about for instance writing 16MB chunks and reading 8K random? Wouldn't
NG> RAIDZ access only the disks containing the 8K bits?

Basically, the way RAID-Z works is that it spreads each FS block across all disks in a given vdev (minus the parity disks). When you read data back, ZFS checks its checksum (the filesystem checksum, not a RAID-Z one) before the data reaches the application, so it needs the entire FS block... which is spread across all data disks in that vdev.

--
Best regards,
Robert Milkowski    mailto:[EMAIL PROTECTED]    http://milek.blogspot.com
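A rough way to picture the consequence for small random reads (Python sketch; the per-disk IOPS figure and the vdev layouts are illustrative assumptions, and caching is ignored):

    # For small random reads, each raidz vdev delivers roughly the IOPS of a
    # single disk, because the whole FS block -- and therefore every data disk
    # in the vdev -- is needed to verify the block checksum.
    def pool_random_read_iops(vdevs, per_disk_iops):
        return vdevs * per_disk_iops

    PER_DISK_IOPS = 150                        # hypothetical 7200 rpm disk
    print("2 x 8-disk raidz1:", pool_random_read_iops(2, PER_DISK_IOPS), "IOPS")   # ~300
    print("8 x 2-way mirror :", pool_random_read_iops(8, PER_DISK_IOPS), "IOPS")   # ~1200

(The mirror figure is conservative, since a mirror can also read from either side.)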
[zfs-discuss] RAIDZ read-optimized write?
Hi Robert,

> Basically, the way RAID-Z works is that it spreads each FS block across all
> disks in a given vdev (minus the parity disks). When you read data back, ZFS
> checks its checksum (the filesystem checksum, not a RAID-Z one) before the
> data reaches the application, so it needs the entire FS block... which is
> spread across all data disks in that vdev.

Thank you very much for correcting my long-time misconception.

On the other hand, isn't there room for improvement here? If it were possible to break large writes (for instance those larger than a preferred_read_size parameter) into smaller blocks with individual checksums, we could still write all of these with a single RAIDZ(2) line, avoid the RAIDx write penalty, and improve read performance, because we'd only need to issue a single read I/O for each requested block - needing to access the full RAIDZ line only in the degraded RAID case.

I think this could make a big difference for write-once/read-many random-access applications like DSS systems etc.

Is this feasible at all?

Nils
[zfs-discuss] Procedure to follow after zpool upgrade on rpool (was: zpool upgrade wrecked GRUB)
(not sure if this has already been answered)

> I have a similar situation and would love some concise suggestions:
>
> Had a working version of 2008.05 running snv_93 with the updated grub. I did
> a pkg-update to snv_95 and ran the zfs update when it was suggested. System
> ran fine until I did a reboot, then no boot, only the grub command line shows
> up.

IMHO, after a ZFS upgrade an easy way to avoid this is:

    touch /etc/system        # make bootadm re-create the archive
    bootadm update-archive
    /boot/solaris/bin/update_grub

If you're already lost after an upgrade (commands from memory, no syntax guarantee):

* Boot from a current snv CD (it needs to support the zpool version you have
  upgraded to); ISOs are available at http://www.genunix.org/

* Import your rpool:

    mkdir /tmp/rpool
    zpool import -R /tmp/rpool rpool

  If this fails, get the pool ID with "zpool import", then use
  "zpool import -f -R /tmp/rpool <pool-ID>".

* Mount your root fs:

    mount -F zfs rpool/opensolaris-X /mnt

* Update the boot archive (same as above, but on the root fs mounted at /mnt):

    touch /mnt/etc/system
    bootadm update-archive -R /mnt

* Update grub:

    /mnt/boot/solaris/bin/update_grub

* Unmount and export:

    umount /mnt
    zpool export rpool

At least this has worked for me. Would it be a good idea to put this into the Indiana release notes?

Nils
Re: [zfs-discuss] Procedure to follow after zpool upgrade on rpool
Not knowing of a better place to put this, I have created
http://www.genunix.org/wiki/index.php/ZFS_rpool_Upgrade_and_GRUB
Please make any corrections there.

Thanks, Nils
Re: [zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
Hi,

> It is important to remember that ZFS is ideal for writing new files from
> scratch. IIRC, maildir MTAs never overwrite mail files.

But courier-imap does maintain some additional index files which will be overwritten, and I guess other IMAP servers probably do the same.

Nils
Re: [zfs-discuss] RAIDZ read-optimized write?
On Thu, 18 Sep 2008, Nils Goroll wrote:
>
> On the other hand, isn't there room for improvement here? If it were possible
> to break large writes (for instance those larger than a preferred_read_size
> parameter) into smaller blocks with individual checksums, we could still
> write all of these with a single RAIDZ(2) line, avoid the RAIDx write penalty
> and improve read performance because we'd only need to issue a single read I/O
> for each requested block - needing to access the full RAIDZ line only in the
> degraded RAID case.
>
> I think this could make a big difference for write-once/read-many
> random-access applications like DSS systems etc.

I imagine that this is indeed possible, but that the law of diminishing returns would prevail. The per-block overhead would become much greater, so sequential throughput would be reduced and more disk space would be wasted. You can be sure that the ZFS inventors thoroughly explored these issues, and it would surprise me if no one prototyped it to see how it actually performs.

ZFS is designed for the present and the future. Legacy filesystems were designed for the past. In the present, the cost of memory is dramatically reduced, and in the future it will be even more so. This means that systems will contain massive cache RAM, which dramatically reduces the number of read (and write) accesses. Also, solid state disks (SSDs) will eventually become common, and SSDs don't exhibit a seek penalty, so designing the filesystem to avoid seeks does not carry over into the long-term future.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
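To put a rough number on that per-block overhead, a small sketch (Python; the 128-byte size of a ZFS block pointer, which carries the checksum, is the only ZFS-specific figure, and the rest is illustrative):

    # Metadata cost of checksumming a 16 MB write in smaller chunks:
    # each chunk needs its own block pointer (and thus its own checksum).
    BLKPTR_BYTES = 128

    def blkptr_bytes(file_bytes, chunk_bytes):
        chunks = -(-file_bytes // chunk_bytes)     # ceiling division
        return chunks * BLKPTR_BYTES

    FILE_BYTES = 16 * 1024 * 1024                  # the 16 MB write from the thread
    for chunk_kb in (128, 8):
        cost = blkptr_bytes(FILE_BYTES, chunk_kb * 1024)
        print(f"{chunk_kb:>3} KB chunks -> {cost // 1024} KB of block pointers")
    # 128 KB chunks -> 16 KB; 8 KB chunks -> 256 KB, i.e. 16x the metadata.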
Re: [zfs-discuss] RAIDZ read-optimized write?
On Thu, Sep 18, 2008 at 01:26:09PM +0200, Nils Goroll wrote:
> Thank you very much for correcting my long-time misconception.
>
> On the other hand, isn't there room for improvement here? If it were
> possible to break large writes (for instance those larger than a
> preferred_read_size parameter) into smaller blocks with individual
> checksums, we could still write all of these with a single RAIDZ(2)
> line, avoid the RAIDx write penalty and improve read performance
> because we'd only need to issue a single read I/O for each requested
> block - needing to access the full RAIDZ line only in the degraded
> RAID case.

Don't forget that the parent block contains the checksum so that it can be compared. There isn't room in the parent for an arbitrary number of checksums, as would be required with an arbitrary number of columns.

--
Darren
Re: [zfs-discuss] RAIDZ read-optimized write?
Nils Goroll wrote:
> Hi Robert,
>
>> Basically, the way RAID-Z works is that it spreads each FS block across all
>> disks in a given vdev (minus the parity disks). When you read data back, ZFS
>> checks its checksum (the filesystem checksum, not a RAID-Z one) before the
>> data reaches the application, so it needs the entire FS block... which is
>> spread across all data disks in that vdev.
>
> Thank you very much for correcting my long-time misconception.
>
> On the other hand, isn't there room for improvement here? If it were possible
> to break large writes (for instance those larger than a preferred_read_size
> parameter) into smaller blocks with individual checksums, we could still
> write all of these with a single RAIDZ(2) line, avoid the RAIDx write penalty
> and improve read performance because we'd only need to issue a single read I/O
> for each requested block - needing to access the full RAIDZ line only in the
> degraded RAID case.
>
> I think this could make a big difference for write-once/read-many
> random-access applications like DSS systems etc.
>
> Is this feasible at all?

Someone in the community was supposedly working on this at one time. It gets brought up about every 4-5 months or so. There is a lot of detail in the archives.

-- richard
[zfs-discuss] A couple basic questions re: zfs sharenfs
All;

I'm sure I'm missing something basic here. I need to do the following things, and can't for the life of me figure out how:

1. Export a ZFS filesystem over NFS, but restrict access to a limited set of hosts and/or subnets, e.g. 10.9.8.0/24 and 10.9.9.5.
2. Give root access to a ZFS file system over NFS.

I'm sure this is doable with the right options, but I can't figure out how.

Any suggestions?
Re: [zfs-discuss] A couple basic questions re: zfs sharenfs
Try something like this:

    zfs set sharenfs=options mypool/mydata

where options is:

    sharenfs="[EMAIL PROTECTED]/24:@10.9.9.5/32,[EMAIL PROTECTED]/24:@10.9.9.5/32"

-- Dave

Michael Stalnaker wrote:
> All;
>
> I'm sure I'm missing something basic here. I need to do the following
> things, and can't for the life of me figure out how:
>
> 1. Export a ZFS filesystem over NFS, but restrict access to a limited
>    set of hosts and/or subnets, e.g. 10.9.8.0/24 and 10.9.9.5.
> 2. Give root access to a ZFS file system over NFS.
>
> I'm sure this is doable with the right options, but I can't figure out how.
>
> Any suggestions?
Re: [zfs-discuss] A couple basic questions re: zfs sharenfs
I believe this is just:

    zfs set sharenfs='root=host1:host2,[EMAIL PROTECTED]/24:@10.9.9.5' filesystem

See the man pages for zfs(1M) (especially the last example) and share_nfs(1M).

- Johnson

Michael Stalnaker wrote:
> All;
>
> I'm sure I'm missing something basic here. I need to do the following
> things, and can't for the life of me figure out how:
>
> 1. Export a ZFS filesystem over NFS, but restrict access to a limited set of
>    hosts and/or subnets, e.g. 10.9.8.0/24 and 10.9.9.5.
> 2. Give root access to a ZFS file system over NFS.
>
> I'm sure this is doable with the right options, but I can't figure out how.
>
> Any suggestions?

--
- Johnson Earls
System Support Lead for Sun Labs West
MPK16-1205 x88965 650/786-8965
[EMAIL PROTECTED]
Re: [zfs-discuss] x4500 vs AVS ?
On Tue, Sep 16, 2008 at 11:51 PM, Ralf Ramge <[EMAIL PROTECTED]> wrote:
> Jorgen Lundman wrote:
>
>> If we were interested in finding a method to replicate data to a 2nd
>> x4500, what other options are there for us?
>
> If you already have an X4500, I think the best option for you is a cron
> job with incremental 'zfs send'. Or rsync.
>
> --
> Ralf Ramge
> Senior Solaris Administrator, SCNA, SCSA

We had some Sun reps come out the other day to talk to us about storage options, and part of the discussion was AVS replication with ZFS. I brought up the question of whether the resilvering process gets replicated, and the reps said it does not. They may be mistaken, but I'm hopeful they are correct. Could this behavior have been changed recently in AVS to make replication 'smarter' with ZFS as the underlying filesystem?

--
Brent Jones
[EMAIL PROTECTED]
[zfs-discuss] How to remove any references to a zpool that's gone
I had a disk that contained a zpool. For reasons that we won't go into, that disk had zeros written all over it (at least enough to cover the entirety of the zpool space). Now when I run zpool status, the command hangs when it tries to display information about the now non-existent pool. Similarly, trying to destroy the pool hangs as well.

Is there some way to remove the pool from zfs's pool of knowledge? Also, is it a bug that the failure mode for this situation isn't more graceful? Surely zfs should figure out that 'something really bad happened' and give up the ghost gracefully?

Thanks!

--
Glenn
Re: [zfs-discuss] How to remove any references to a zpool that's gone
Hi Glenn,

Where is it hanging? Could you provide a stack trace? It's possible that it's just a bug and not a configuration issue.

On 18 Sep, 2008, at 16.12, Glenn Lagasse wrote:

> I had a disk that contained a zpool. For reasons that we won't go into,
> that disk had zeros written all over it (at least enough to cover the
> entirety of the zpool space). Now when I run zpool status, the command
> hangs when it tries to display information about the now non-existent
> pool. Similarly, trying to destroy the pool hangs as well.
>
> Is there some way to remove the pool from zfs's pool of knowledge? Also,
> is it a bug that the failure mode for this situation isn't more graceful?
> Surely zfs should figure out that 'something really bad happened' and
> give up the ghost gracefully?
>
> Thanks!
>
> --
> Glenn

Regards,
markm
Re: [zfs-discuss] How to remove any references to a zpool that's gone
Glenn Lagasse wrote:
> Hey Mark,
>
> * Mark J Musante ([EMAIL PROTECTED]) wrote:
>
>> Hi Glenn,
>>
>> Where is it hanging? Could you provide a stack trace? It's possible
>> that it's just a bug and not a configuration issue.
>
> I'll have to recreate the situation (won't be able to do so until next
> week). I had a zpool status (and subsequently a zpool destroy) command
> that was hung; subsequent zfs commands would also hang. I couldn't even
> do a zpool export (which someone privately told me should work). What
> worked was to reboot (I actually had to power the machine off physically;
> init and reboot did nothing), and then I could export the 'broken' pool.
> So I'm not sure where the bug is, but this shouldn't be too hard to
> replicate: I believe running zpool status with this type of setup will
> cause a hang, and then you're stuck until you power off the machine and
> reboot to do the export. I'll report back next week once I replicate this.

Probably a bug like:
http://bugs.opensolaris.org/view_bug.do?bug_id=6667208
Your workaround works.

-- richard
Re: [zfs-discuss] How to remove any references to a zpool that's gone
Hey Mark,

* Mark J Musante ([EMAIL PROTECTED]) wrote:
> Hi Glenn,
>
> Where is it hanging? Could you provide a stack trace? It's possible
> that it's just a bug and not a configuration issue.

I'll have to recreate the situation (won't be able to do so until next week). I had a zpool status (and subsequently a zpool destroy) command that was hung; subsequent zfs commands would also hang. I couldn't even do a zpool export (which someone privately told me should work). What worked was to reboot (I actually had to power the machine off physically; init and reboot did nothing), and then I could export the 'broken' pool. So I'm not sure where the bug is, but this shouldn't be too hard to replicate: I believe running zpool status with this type of setup will cause a hang, and then you're stuck until you power off the machine and reboot to do the export. I'll report back next week once I replicate this.

Thanks,

Glenn
[zfs-discuss] doing HDS shadow copy of a zpool
I apologize if this has been answered already, but I've tried to RTFM and haven't found much.

I'm trying to get HDS shadow copy to work for zpool replication. We do this with VxVM by modifying each target disk ID after it has been shadowed from the source LUN. This allows us to import each target disk into the target disk group and then have its volumes mounted for backup over the network.

From what I can tell, each LUN in a zpool will have two 256K vdev labels at the front and two at the end. Is there a way to modify the vdev labels so that the target LUNs don't end up with the same zpool ID as the source LUNs? Better yet, is there a way to import and rename a zpool that has the exact same ID and name as an existing one? As it stands now, after the shadow copy, format can tell that each target LUN is labeled as part of the source zpool, but it is invisible to zpool import.

Thanks,
Chad
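For orientation, here is a minimal sketch (Python, not an official tool) of where those four labels sit on a device, assuming the standard ZFS layout of two 256 KB labels at the front and two at the end; the device size used is a hypothetical example, editing labels by hand is unsupported, and "zdb -l <device>" can at least dump them for inspection:

    # Byte offsets of the four ZFS vdev labels (L0..L3), each 256 KiB:
    # two at the start of the device and two at the very end.
    LABEL_SIZE = 256 * 1024

    def vdev_label_offsets(device_size_bytes):
        return [
            0,                                   # L0
            LABEL_SIZE,                          # L1
            device_size_bytes - 2 * LABEL_SIZE,  # L2
            device_size_bytes - LABEL_SIZE,      # L3
        ]

    print(vdev_label_offsets(146 * 10**9))       # e.g. a 146 GB LUN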