[zfs-discuss] ZFS compression API (Was Re: [osol-discuss] Re: Re: where to start?)
[EMAIL PROTECTED] wrote:
>> Robert Milkowski wrote:
>> But only if compression is turned on for a filesystem.
>> Of course, and the default is off.
>> However I think it would be good to have an API so an application can
>> decide what to compress and what not.
>> I agree that an API would be good. However, I don't think using the API
>> should allow an application to write compressed data if the file system
>> has that functionality turned off. It's a policy thing: if the admin has
>> compression off, it is off for a reason. Or maybe what we need is another
>> property value for compression that allows the app to request it, but by
>> default we don't do it.
>
> *Only* if we fail to come up with a mechanism to do this properly,
> efficiently, automagically.

Which bit did you mean? The API itself or the policy part?

--
Darren J Moffat
[zfs-discuss] Re: ZFS compression API (Was Re: [osol-discuss] Re: Re: where to start?)
>[EMAIL PROTECTED] wrote:
>>> Robert Milkowski wrote:
>>> But only if compression is turned on for a filesystem.
>>> Of course, and the default is off.
>>> However I think it would be good to have an API so an application can
>>> decide what to compress and what not.
>>> I agree that an API would be good. However, I don't think using the API
>>> should allow an application to write compressed data if the file system
>>> has that functionality turned off. It's a policy thing: if the admin has
>>> compression off, it is off for a reason. Or maybe what we need is another
>>> property value for compression that allows the app to request it, but by
>>> default we don't do it.
>>
>> *Only* if we fail to come up with a mechanism to do this properly,
>> efficiently, automagically.
>
> Which bit did you mean? The API itself or the policy part?

The API (and therefore also the policy). If we can make it work without, then
that is much better.

Casper
Re: [zfs-discuss] tracking error to file
Can that same method be used to figure out what files changed between
snapshots?

Wout.

On 22 May 2006, at 08:25, Matthew Ahrens wrote:

> On Fri, May 19, 2006 at 01:23:02PM -0600, Gregory Shaw wrote:
>> DATASET  OBJECT  RANGE
>> 1b       2402    lvl=0 blkid=1965
>>
>> I haven't found a way to report in human terms what the above object
>> refers to. Is there such a method?
>
> There isn't any great method currently, but you can use 'zdb' to find this
> information. The quickest way would be to first determine the name of
> dataset 0x1b (=27):
>
>   # zdb local | grep "ID 27,"
>   Dataset local/ahrens [ZPL], ID 27, ...
>
> Then get info on that particular object in that filesystem:
>
>   # zdb -vvv local/ahrens 2402
>   ...
>       Object  lvl   iblk   dblk  lsize  asize  type
>         2402    1    16K  3.50K  3.50K  2.50K  ZFS plain file
>                                           264  bonus  ZFS znode
>       path    /raidz/usr/src/uts/common/fs/zfs/dmu.c
>   ...
>
> The "path" listed is relative to the filesystem's mountpoint.
>
> --matt
[zfs-discuss] Re: iostat numbers for ZFS disks, build 39
> is anyone else seeing this? I couldn't find any references to this in
> the bug database.

I'm also seeing this behavior on occasion with b36 and b38... from the b36 box:

Sun Microsystems Inc.   SunOS 5.11      snv_36  October 2007

-bash-3.00$ iostat 1
   tty          sd0           sd1           sd2           sd3          cpu
 tin tout  kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy wt id
 [per-second samples garbled by the archive: most intervals show all four
  disks idle, with occasional bursts of activity, and the CPU roughly
  93-97% idle throughout]
^C
-bash-3.00$
[zfs-discuss] ZFS desktop integration demo
hey all,

I just posted some stuff I'd been playing around with wrt. more desktop
integration of ZFS functionality, full story at:

http://blogs.sun.com/roller/page/timf?entry=zfs_on_your_desktop

it's not much, but it's a start...

cheers,
tim

[ Oh, and if you hadn't noticed, Alo also did some desktop integration work,
concerning ACLs recently, at
http://blogs.sun.com/roller/page/alvaro#gnome_zfs_and_the_acl ]

--
Tim Foster, Sun Microsystems Inc, Operating Platforms Group
Engineering Operations          http://blogs.sun.com/timf
[zfs-discuss] Re: ZFS compression API (Was Re: [osol-discuss] Re: Re: where to start?)
Hello Darren,

Tuesday, May 23, 2006, 11:12:15 AM, you wrote:

DJM> [EMAIL PROTECTED] wrote:
>>> Robert Milkowski wrote:
>>> But only if compression is turned on for a filesystem.
>>> Of course, and the default is off.
>>> However I think it would be good to have an API so an application can
>>> decide what to compress and what not.
>>> I agree that an API would be good. However, I don't think using the API
>>> should allow an application to write compressed data if the file system
>>> has that functionality turned off. It's a policy thing: if the admin has
>>> compression off, it is off for a reason. Or maybe what we need is another
>>> property value for compression that allows the app to request it, but by
>>> default we don't do it.
>>
>> *Only* if we fail to come up with a mechanism to do this properly,
>> efficiently, automagically.

It's not a good idea IMHO.

The problem is that with stronger compression algorithms, due to performance
reasons, I want to decide which algorithms ZFS should use and which files it
should try to compress. For some files I write a lot of data, and even if they
compress quite well I don't want it - too much CPU would be consumed. However,
for other files in the same pool I know we do not write them that often, so
compression for them would be useful (and cheap in terms of CPU).

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
[zfs-discuss] Re: ZFS Web administration interface
Steve,

Thanks for the update, I will try again with Build 40. I generally use the
CLI anyway, but when showing ZFS to others it is always nice to include the
web GUI, as some people feel more comfortable with it.

Bob
[zfs-discuss] Re: ZFS compression API (Was Re: [osol-discuss] Re: Re: where to start?)
Robert Milkowski wrote:
> The problem is that with stronger compression algorithms, due to performance
> reasons, I want to decide which algorithms ZFS should use and which files it
> should try to compress. For some files I write a lot of data, and even if
> they compress quite well I don't want it - too much CPU would be consumed.
> However, for other files in the same pool I know we do not write them that
> often, so compression for them would be useful (and cheap in terms of CPU).

This is exactly the reason I don't want applications running as a normal user
to be able to turn on more expensive compression algorithms if the
administrator of the data set has explicitly not turned on compression.

I think there are three possible modes here:

1) No compression ever.
2) Current behaviour.
3) New: best effort from ZFS (ie current behaviour) with hints from the
   application on the data, eg, "don't bother trying, this is an MP3".

--
Darren J Moffat
[zfs-discuss] RFE filesystem ownership
Hi

I think ZFS should add the concept of ownership to a ZFS filesystem, so if I
create a filesystem for joe, he should be able to use his space however he
sees fit. If he wants to turn on compression or take 5000 snapshots, it's his
filesystem - let him. If he wants to destroy snapshots he created, it should
be allowed, but he should not be allowed to do the same with carol's
filesystem. The current filesystem management is not fine-grained enough to
deal with this. Of course, if we don't assign an owner, the filesystem should
behave much as it does today.

zfs set owner=joe pool/joe
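To make the proposal concrete, here is a sketch of what joe might then be able
to do himself, without root. The owner property above is hypothetical, but the
commands he would run are today's ordinary zfs commands:

  # administrator sets things up once (owner= is the proposed, not yet
  # existing, property)
  zfs create pool/joe
  zfs set owner=joe pool/joe

  # joe, as an unprivileged user, manages his own filesystem
  zfs set compression=on pool/joe
  zfs snapshot pool/joe@before-cleanup
  zfs destroy pool/joe@before-cleanup

  # ...but the same commands against pool/carol would still be refused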
Re: [zfs-discuss] tracking error to file
On Tue, May 23, 2006 at 11:49:47AM +0200, Wout Mertens wrote:
> Can that same method be used to figure out what files changed between
> snapshots?

To figure out what files changed, we need to (a) figure out what object
numbers changed, and (b) do the object number to file name translation.

The method I described (using zdb) will not be involved in either step. zdb
is an undocumented interface, and using it for this purpose is only a
workaround. However, the same algorithms implemented in zdb will be used to
do step (b), the object number to file name translation.

--matt
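In the meantime, a crude userland approximation - not the zdb-based mechanism
described above, and it only catches differences visible through POSIX
metadata - is to compare a snapshot against the live filesystem, for example
with rsync in dry-run mode (assuming rsync is installed; the dataset and
snapshot names are made up):

  # list files that differ between snapshot "snap1" and the live filesystem
  rsync -avn /tank/fs/.zfs/snapshot/snap1/ /tank/fs/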
[zfs-discuss] Re: ZFS Web administration interface
Just a note about build 39. I tried Tim's smreg add suggestion and it worked;
I now see the ZFS Admin page. But when I click on the link, the next page
shows:

  Application Error
  org.apache.jasper.JasperException: /jsp/zfsmodule/DevicesTree.jsp(51,2)
  The end tag "[remainder of the error message lost in the list archive]
Re: [zfs-discuss] RFE filesystem ownership
James Dickens wrote:
> I think ZFS should add the concept of ownership to a ZFS filesystem, so if
> I create a filesystem for joe, he should be able to use his space however
> he sees fit. If he wants to turn on compression or take 5000 snapshots,
> it's his filesystem - let him. If he wants to destroy snapshots he created,
> it should be allowed, but he should not be allowed to do the same with
> carol's filesystem. The current filesystem management is not fine-grained
> enough to deal with this. Of course, if we don't assign an owner, the
> filesystem should behave much as it does today.

Yes, we do need something like this.

This is already covered by the following CRs: 6280676, 6421209.

--
Darren J Moffat
[zfs-discuss] Mirror options pros and cons
Hi,

I have these two pools, four LUNs each. One has two mirrors x two LUNs, the
other is one mirror x four LUNs.

I am trying to figure out what the pros and cons are of these two configs.

One thing I have noticed is that the single-mirror, four-LUN config can
survive as many as three LUN failures; the other config only two. I am
thinking that space efficiency is similar because ZFS stripes across all the
LUNs in both configs.

So that being said, I would like to hear from others on the pros and cons of
these two approaches.

Thanks ahead,
-tomg

        NAME              STATE     READ WRITE CKSUM
        mypool            ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            /export/lun5  ONLINE       0     0     0
            /export/lun2  ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            /export/lun3  ONLINE       0     0     0
            /export/lun4  ONLINE       0     0     0

        NAME              STATE     READ WRITE CKSUM
        newpool           ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            /export/luna  ONLINE       0     0     0
            /export/lunb  ONLINE       0     0     0
            /export/lund  ONLINE       0     0     0
            /export/lunc  ONLINE       0     0     0
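For reference, the two layouts above would have been built with commands along
these lines (my reconstruction from the status output; the /export/lunX files
themselves would have been created first, e.g. with mkfile):

  # two two-way mirrors, striped together (RAID-10 style)
  zpool create mypool mirror /export/lun5 /export/lun2 \
                      mirror /export/lun3 /export/lun4

  # a single four-way mirror
  zpool create newpool mirror /export/luna /export/lunb /export/lund /export/lunc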
[zfs-discuss] Re[2]: ZFS compression API (Was Re: [osol-discuss] Re: Re: where to start?)
Hello Darren,

Tuesday, May 23, 2006, 4:19:05 PM, you wrote:

DJM> Robert Milkowski wrote:
>> The problem is that with stronger compression algorithms, due to
>> performance reasons, I want to decide which algorithms ZFS should use and
>> which files it should try to compress. For some files I write a lot of
>> data, and even if they compress quite well I don't want it - too much CPU
>> would be consumed. However, for other files in the same pool I know we do
>> not write them that often, so compression for them would be useful (and
>> cheap in terms of CPU).

DJM> This is exactly the reason I don't want applications running as a
DJM> normal user to be able to turn on more expensive compression algorithms
DJM> if the administrator of the data set has explicitly not turned on
DJM> compression.

DJM> I think there are three possible modes here:

DJM> 1) No compression ever.
DJM> 2) Current behaviour.
DJM> 3) New: best effort from ZFS (ie current behaviour) with hints from the
DJM>    application on the data, eg, "don't bother trying, this is an MP3".

Definitely that's something I had in mind.

To be more precise, case #3 should be something like this: either ZFS tries to
determine itself which compression to use for each file (block?), or the
administrator can set up a default compression for the dataset. However, in
both these cases the application is able to either hint ZFS or even force a
specific compression algorithm for a file (block? a given write? what about
mmap()?).

In some cases an algorithm per block (per write()) would be preferred - for
example, if our application writes an email and it already knows what kind of
attachments there are, it doesn't make sense to even try to compress that
block (however, the rest of the mail could be compressed). Something like
this:

   ioctl(fd, strong compression)
   write(fd, buf, s1)     /* headers and body */
   ioctl(fd, no compression)
   write(fd, buf, s1)     /* jpeg attachment */

However, applications may prefer writev() or sendfilev() - I'm not sure how to
cope with that situation.

To make the API more universal, applications should have an option not only to
request a specific compression but to request some kind of compression level
without actually specifying the algorithm - like "I want light compression" or
"I don't care about CPU and want heavy compression" (or something in between).

ps. and of course if the admin chooses option #1 the application can't force
ZFS to use compression. And in #2 only the compression selected by the
sysadmin is used.

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
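For what it's worth, the only knob available today is per dataset, so the
closest current approximation to the above is to split the data across
filesystems and set compression per filesystem. A sketch (the dataset names
are made up):

  # mail headers and bodies: cheap to compress, so compress them
  zfs create pool/mail
  zfs set compression=on pool/mail

  # already-compressed attachments: don't waste CPU on them
  zfs create pool/mail/attachments
  zfs set compression=off pool/mail/attachments

The proposed API would remove the need for this kind of data layout gymnastics
by letting the application make the call per file or per write.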
Re: [zfs-discuss] RFE filesystem ownership
Hello James,

Tuesday, May 23, 2006, 6:43:11 PM, you wrote:

JD> Hi

JD> I think ZFS should add the concept of ownership to a ZFS filesystem, so
JD> if I create a filesystem for joe, he should be able to use his space
JD> however he sees fit. If he wants to turn on compression or take 5000
JD> snapshots, it's his filesystem - let him. If he wants to destroy
JD> snapshots he created, it should be allowed, but he should not be allowed
JD> to do the same with carol's filesystem. The current filesystem management
JD> is not fine-grained enough to deal with this. Of course, if we don't
JD> assign an owner, the filesystem should behave much as it does today.

JD> zfs set owner=joe pool/joe

IIRC it's planned - however I'm not sure if a user should be able to turn on
compression, especially when it has been explicitly turned off by the
sysadmin.

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
Re: [zfs-discuss] Mirror options pros and cons
Hello Tom,

Tuesday, May 23, 2006, 9:46:24 PM, you wrote:

TG> I have these two pools, four LUNs each. One has two mirrors x two LUNs,
TG> the other is one mirror x four LUNs.

TG> I am trying to figure out what the pros and cons are of these two configs.

TG> One thing I have noticed is that the single-mirror, four-LUN config can
TG> survive as many as three LUN failures; the other config only two. I am
TG> thinking that space efficiency is similar because ZFS stripes across all
TG> the LUNs in both configs.

TG> [zpool status output snipped]

In the first config you should get a pool with usable capacity equal to 2x the
LUN size. In the second config, only 1x the LUN size. So in the second config
you get better redundancy, but only half the usable storage.

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
Re: [zfs-discuss] RFE filesystem ownership
On 5/23/06, Robert Milkowski <[EMAIL PROTECTED]> wrote:
> IIRC it's planned - however I'm not sure if a user should be able to turn on
> compression, especially when it has been explicitly turned off by the
> sysadmin.

Perhaps if compression is turned/forced off by the admin then it shouldn't be
allowed, but enabling compression on a filesystem that just had it off by
default should be allowed - but of course this complicates the implementation.
They could create a system-wide config file disallowing compression/encryption
etc.

James
uadmin.blogspot.com
Re: [zfs-discuss] RFE filesystem ownership
Darren J Moffat wrote:
> James Dickens wrote:
> > I think ZFS should add the concept of ownership to a ZFS filesystem,
> > so if I create a filesystem for joe, he should be able to use his
> > space however he sees fit, [...]
>
> Yes, we do need something like this.
>
> This is already covered by the following CRs: 6280676, 6421209.

That could be done if "zfs" were based on ksh93... you could simply run it as
a "profile shell" (pfksh93) and make a profile for that user + ZFS
filesystem...

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
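For illustration, the RBAC plumbing behind such a profile might look roughly
like this (file formats per prof_attr(4), exec_attr(4) and user_attr(4); the
profile name is made up):

  # /etc/security/prof_attr
  Joe ZFS Admin:::Allow joe to run zfs(1M):

  # /etc/security/exec_attr -- zfs runs with euid 0 under this profile
  Joe ZFS Admin:solaris:cmd:::/usr/sbin/zfs:euid=0

  # /etc/user_attr
  joe::::type=normal;profiles=Joe ZFS Admin

Note that this grants joe the whole zfs command, not just his own dataset,
which is the limitation raised further down the thread.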
Re: [zfs-discuss] Mirror options pros and cons
Robert Milkowski wrote:
> In the first config you should get a pool with usable capacity equal to 2x
> the LUN size. In the second config, only 1x the LUN size. So in the second
> config you get better redundancy, but only half the usable storage.

Ok, I see that; df shows it explicitly.

[EMAIL PROTECTED]> df -F zfs -h
Filesystem             size   used  avail capacity  Mounted on
mypool                 2.0G    39M   1.9G     2%    /mypool
newpool               1000M     8K  1000M     1%    /newpool

What confused me is that ZFS does dynamic striping, and if I write to the
2x-LUN mirror pool all of the disks get I/O. But my error in thought was in
how the data gets spread out. It must be that the writes get striped for
bandwidth utilization, but the blocks and their copies are not spread across
the mirrors. I'd like to understand that better.

It sure is good to be able to experiment with devices.

-tomg
Re[2]: [zfs-discuss] RFE filesystem ownership
Hello James,

Tuesday, May 23, 2006, 10:25:10 PM, you wrote:

JD> Perhaps if compression is turned/forced off by the admin then it
JD> shouldn't be allowed, but enabling compression on a filesystem that just
JD> had it off by default should be allowed - but of course this complicates
JD> the implementation. They could create a system-wide config file
JD> disallowing compression/encryption etc.

Something like 'zfs set compression=user dataset', or instead of 'user' ->
'allow'.

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
Re[2]: [zfs-discuss] RFE filesystem ownership
Hello Roland,

Tuesday, May 23, 2006, 10:31:37 PM, you wrote:

RM> That could be done if "zfs" were based on ksh93... you could simply run
RM> it as a "profile shell" (pfksh93) and make a profile for that user + ZFS
RM> filesystem...

Maybe I'm missing something, but it has nothing to do with ksh93 or any other
shell. It should just work for a given user regardless of their shell, etc. -
just a uid and the proper privileges.

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
Re[2]: [zfs-discuss] Mirror options pros and cons
Hello Tom,

Tuesday, May 23, 2006, 10:37:31 PM, you wrote:

TG> What confused me is that ZFS does dynamic striping, and if I write to the
TG> 2x-LUN mirror pool all of the disks get I/O. But my error in thought was
TG> in how the data gets spread out. It must be that the writes get striped
TG> for bandwidth utilization, but the blocks and their copies are not spread
TG> across the mirrors. I'd like to understand that better.

Well,

  mirror A B mirror C D

with ZFS actually behaves like RAID-10 (a stripe over mirrors). The main
difference here is the variable stripe width, but when it comes to protection
it's just RAID-10 plus checksums for data and metadata.

You can imagine such a config as stacked RAID - the same as if you had created
two mirrors on a HW RAID array, exposed those two "disks" to the host, and
then just did the striping with ZFS (zpool create pool X Y - where X is one
mirror made from two disks and Y is another mirror made from two disks). The
difference is in the variable stripe width and checksums (and a more clever
I/O scheduler?).

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
Re[9]: [zfs-discuss] Re: Re: Due to 128KB limit in ZFS it can't saturate disks
Hello Roch,

Monday, May 22, 2006, 3:42:41 PM, you wrote:

RBPE> Robert says:
RBPE>   Just to be sure - you did reconfigure the system to actually allow
RBPE>   larger IO sizes?
RBPE>
RBPE> Sure enough, I messed up (I had no tuning to get the above data); so
RBPE> 1 MB was my max transfer size. Using 8 MB I now see:
RBPE>
RBPE>   Bytes Sent; Elapse of phys IO; Size
RBPE>
RBPE>     8 MB; 3576 ms of phys; avg sz :    16 KB; throughput  2 MB/s
RBPE>     9 MB; 1861 ms of phys; avg sz :    32 KB; throughput  4 MB/s
RBPE>    31 MB; 3450 ms of phys; avg sz :    64 KB; throughput  8 MB/s
RBPE>    78 MB; 4932 ms of phys; avg sz :   128 KB; throughput 15 MB/s
RBPE>   124 MB; 4903 ms of phys; avg sz :   256 KB; throughput 25 MB/s
RBPE>   178 MB; 4868 ms of phys; avg sz :   512 KB; throughput 36 MB/s
RBPE>   226 MB; 4824 ms of phys; avg sz :  1024 KB; throughput 46 MB/s
RBPE>   226 MB; 4816 ms of phys; avg sz :  2048 KB; throughput 54 MB/s (was 46 MB/s)
RBPE>    32 MB;  686 ms of phys; avg sz :  4096 KB; throughput 58 MB/s (was 46 MB/s)
RBPE>   224 MB; 4741 ms of phys; avg sz :  8192 KB; throughput 59 MB/s (was 47 MB/s)
RBPE>   272 MB; 4336 ms of phys; avg sz : 16384 KB; throughput 58 MB/s (new data)
RBPE>   288 MB; 4327 ms of phys; avg sz : 32768 KB; throughput 59 MB/s (new data)
RBPE>
RBPE> Data was corrected after it was pointed out that physio will be
RBPE> throttled by maxphys. New data was obtained after setting:
RBPE>
RBPE>   /etc/system:          set maxphys=8388608
RBPE>   /kernel/drv/sd.conf:  sd_max_xfer_size=0x800000
RBPE>   /kernel/drv/ssd.conf: ssd_max_xfer_size=0x800000
RBPE>
RBPE> And setting un_max_xfer_size in "struct sd_lun". That address was
RBPE> figured out using dtrace and knowing that sdmin() calls
RBPE> ddi_get_soft_state (details avail upon request).
RBPE>
RBPE> And of course disabling the write cache (using format -e).
RBPE>
RBPE> With this in place I verified that each sdwrite() up to 8M would lead
RBPE> to a single biodone interrupt, using this:
RBPE>
RBPE>   dtrace -n 'biodone:entry,sdwrite:[EMAIL PROTECTED], stack(20)]=count()}'
RBPE>
RBPE> Note that for 16M and 32M raw device writes, each default_physio will
RBPE> issue a series of 8M I/Os, and so we don't expect any more throughput
RBPE> from that.
RBPE>
RBPE> The script used to measure the rates (phys.d) was also modified, since
RBPE> I was counting the bytes before the I/O had completed and that made a
RBPE> big difference for the very large I/O sizes.
RBPE>
RBPE> If you take the 8M case, the above rates correspond to the time it
RBPE> takes to issue and wait for a single 8M I/O to the sd driver. So this
RBPE> time certainly does include 1 seek and ~0.13 seconds of data transfer,
RBPE> then the time to respond to the interrupt, and finally the wakeup of
RBPE> the thread waiting in default_physio(). Given that the data transfer
RBPE> rate using 4 MB is very close to the one using 8 MB, I'd say that at
RBPE> 60 MB/sec all the fixed-cost elements are well amortized. So I would
RBPE> conclude from this that the limiting factor is now at the device itself
RBPE> or on the data channel between the disk and the host.
RBPE>
RBPE> Now recall the throughput that ZFS gets during an spa_sync when
RBPE> submitted to a single dd, knowing that ZFS will work with 128K I/O:
RBPE>
RBPE>   1431 MB; 23723 ms of spa_sync; avg sz : 127 KB; throughput 60 MB/s
RBPE>   1387 MB; 23044 ms of spa_sync; avg sz : 127 KB; throughput 60 MB/s
RBPE>   2680 MB; 44209 ms of spa_sync; avg sz : 127 KB; throughput 60 MB/s
RBPE>   1359 MB; 24223 ms of spa_sync; avg sz : 127 KB; throughput 56 MB/s
RBPE>   1143 MB; 19183 ms of spa_sync; avg sz : 126 KB; throughput 59 MB/s
RBPE>
RBPE> My disk is .

Is it over FC or just SCSI/SAS? I have to try again with SAS/SCSI - maybe due
to more overhead in FC, larger I/Os give better results there than on SCSI?

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
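As an aside, a quick way to confirm what maxphys the running kernel actually
ended up with after the tuning quoted above (a root-only sketch using mdb, not
part of Roch's procedure):

  # print the current value of the maxphys kernel variable, in decimal
  echo 'maxphys/D' | mdb -k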
Re[9]: [zfs-discuss] Re: Re: Due to 128KB limit in ZFS it can't saturate disks
Hello Roch,

Friday, May 19, 2006, 3:53:35 PM, you wrote:

RBPE> Robert Milkowski writes:
>> Hello Roch,
>>
>> Monday, May 15, 2006, 3:23:14 PM, you wrote:
>>
>> RBPE> The question put forth is whether the ZFS 128K blocksize is
>> RBPE> sufficient to saturate a regular disk. There is a great body of
>> RBPE> evidence showing that bigger write sizes and a matching large FS
>> RBPE> cluster size lead to more throughput. The counterpoint is that ZFS
>> RBPE> schedules its I/O like nothing else seen before and manages to
>> RBPE> saturate a single disk using enough concurrent 128K I/Os.
>>
>> Nevertheless I get much more throughput using UFS and writing with
>> large blocks than using ZFS on the same disk. And the difference is
>> actually quite big, in favor of UFS.

RBPE> Absolutely. Isn't this the issue, though?
RBPE>
RBPE>   6415647 Sequential writing is jumping
RBPE>
RBPE> We will have to fix this to allow dd to get more throughput.
RBPE> I'm pretty sure the fix won't need to increase the blocksize though.

Maybe - but it also means that until this is addressed it doesn't make any
sense to compare ZFS to other filesystems on sequential writing...

The question is how well the above problem is understood, and when it is
going to be corrected?

And why, in your test cases, which are similar to mine, do you not see dd to
the raw device being faster by any significant factor? (Again, maybe you are
using SCSI and I use FC.)

--
Best regards,
 Robert                 mailto:[EMAIL PROTECTED]
                        http://milek.blogspot.com
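For reference, the kind of raw-device large-block sequential write being
compared against here is simply something like the following (the device path
and sizes are illustrative only, and writing to a raw disk is destructive):

  # ~1 GB of 8 MB writes straight to the raw device, bypassing any filesystem
  dd if=/dev/zero of=/dev/rdsk/c1t0d0s0 bs=8192k count=128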
[zfs-discuss] Misc questions
Some miscellaneous questions:

* When you share a ZFS fs via NFS, what happens to files and filesystems that
  exceed the limits of NFS?

* Is there a recommendation or some guidelines to help answer the question
  "how full should a pool be before deciding it's time to add disk space to
  the pool?"

* Migrating pre-ZFS backups to ZFS backups: is there a better method than
  "restore the old backup into a ZFS fs, then back it up using 'zfs send'"?

* Are ZFS quotas enforced against data as compressed, or uncompressed? The
  former seems to imply that the following would create a mess:
  1) Turn on compression
  2) Store data in the pool until the pool is almost full
  3) Turn off compression
  4) Read and re-write every file (thus expanding each file)

* What block sizes will ZFS use? Is there an explanation somewhere about its
  method of choosing a blocksize for a particular workload?
Re: [zfs-discuss] RFE filesystem ownership
Darren J Moffat wrote:
> James Dickens wrote:
>> I think ZFS should add the concept of ownership to a ZFS filesystem, so if
>> I create a filesystem for joe, he should be able to use his space however
>> he sees fit. [...]
>
> Yes, we do need something like this.
>
> This is already covered by the following CRs: 6280676, 6421209.

These RFEs are currently being investigated. The basic idea is that an
administrator will be allowed to grant specific users/groups the ability to
perform various ZFS administrative tasks, such as create, destroy, clone,
changing properties and so on.

After the zfs team is in agreement as to what the interfaces should be, I
will forward it to zfs-discuss for further feedback.

-Mark
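Purely as an illustration of the kind of interface under discussion - no such
command exists as of this thread, and the syntax below is made up:

  # grant joe the right to manage his own filesystem, and nothing else
  zfs allow joe create,destroy,snapshot,compression pool/joe

  # review what has been delegated on that dataset
  zfs allow pool/joe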
Re: [zfs-discuss] Misc questions
On Tue, May 23, 2006 at 02:34:30PM -0700, Jeff Victor wrote:
> * When you share a ZFS fs via NFS, what happens to files and filesystems
>   that exceed the limits of NFS?

What limits do you have in mind? I'm not an NFS expert, but I think that
NFSv4 (and probably v3) supports 64-bit file sizes, so there would be no
limit mismatch there.

> * Is there a recommendation or some guidelines to help answer the question
>   "how full should a pool be before deciding it's time to add disk space to
>   the pool?"

I'm not sure, but I'd guess around 90%.

> * Migrating pre-ZFS backups to ZFS backups: is there a better method than
>   "restore the old backup into a ZFS fs, then back it up using 'zfs send'"?

No.

> * Are ZFS quotas enforced against data as compressed, or uncompressed?

Quotas apply to the amount of space used, after compression. This is the
space reported by 'zfs list', 'zfs get used', 'df', 'du', etc.

> The former seems to imply that the following would create a mess:
>   1) Turn on compression
>   2) Store data in the pool until the pool is almost full
>   3) Turn off compression
>   4) Read and re-write every file (thus expanding each file)

Since this example doesn't involve quotas, their behavior is not applicable
here. In this example, there will be insufficient space in the pool to store
your data, so your write operation will fail with ENOSPC. Perhaps a messy
situation, but I don't see any alternative. If this is a concern, don't use
compression.

If you filled up a filesystem's quota rather than a pool, the behavior would
be the same except you would get EDQUOT rather than ENOSPC.

> * What block sizes will ZFS use? Is there an explanation somewhere about
>   its method of choosing a blocksize for a particular workload?

Files smaller than 128k will be stored in a single block, whose size is
rounded up to the nearest sector (512 bytes). Files larger than 128k will be
stored in multiple 128k blocks (unless the recordsize property has been set --
see the zfs(1m) manpage for an explanation of this).

Thanks for using zfs!

--matt
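For example, tuning the recordsize property mentioned above looks like this
(the dataset name and value are just illustrative; 8k is a common choice for
a filesystem holding a database that does fixed 8K I/O):

  # set a smaller record size before writing the data, then verify it
  zfs set recordsize=8k tank/db
  zfs get recordsize tank/db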