Re: [zfs-discuss] single memory allocation in the ZFS intent log
Casper Dik,

Yes, I am familiar with Bonwick's slab allocators and tried it for a
wirespeed test of 64-byte pieces, first for 1Gb, then 100Mb, and lastly
10Mb Ethernet. My results were not encouraging. I assume it has improved
over time.

First, let me ask what happens to the FS if the allocs in the intent log
code are sleeping waiting for memory.

IMO, the general problem with memory allocators is:

- getting memory from a "cache" of one's own size/type costs orders of
  magnitude more than just getting some off one's own freelist,
- there is a built-in latency to recuperate/steal memory from other
  processes,
- this stealing forces a sleep and context switches,
- the amount of time to sleep is indeterminate with a single call per
  struct. How long can you sleep for? 100ms, 250ms, or more?
- no process can guarantee a working set.

In the time when memory was expensive, maybe a global sharing mechanism
would make sense, but when the amount of memory is somewhat plentiful and
cheap,

*** it then makes sense to use a two-stage implementation: preallocation
of a working set, and then normal allocation with the added latency.

So, it makes sense to pre-allocate a working set of allocs with a single
alloc call, break that allocation up into the needed sizes, and then alloc
from your own free list.

-> If that freelist then empties, maybe then take the extra overhead of
the kmem call. Consider this an expected cost of exceeding a certain
watermark.

But otherwise, I bet if I give you some code for the pre-alloc, 10 allocs
from the freelist can be done in the time of one kmem_alloc call, and at
least 100 to 10k allocs if a sleep occurs on your side. Actually, I think
it is so bad that you should time 1 kmem_free versus grabbing elements off
the freelist. However, don't trust me: I will drop a snapshot of the code
to you tomorrow if you want, and you can make a single-CPU benchmark
comparison.

Your multiple-CPU issue forces me to ask: is it a common occurrence that 2
or more CPUs are simultaneously requesting memory for the intent log? If
it is, then there should be a freelist with a low-watermark set of
elements per CPU. However, one thing at a time..

So, do you want that code? It will be a single alloc of X units which are
then placed on a freelist. You then time how long it takes to remove Y
elements from the freelist versus 1 kmem_alloc with a NO_SLEEP arg and
report the numbers. Then I would suggest the call with the smallest sleep
possible. How many allocs can then be done? 25k, 35k, more...

Oh, the reason why we aren't timing the initial kmem_alloc call for the
freelist is that I expect it to occur during init, which does not proceed
until the memory is alloc'ed.

	Mitchell Erblich

[EMAIL PROTECTED] wrote:
>
> > at least one location:
> >
> > When adding a new dva node into the tree, a kmem_alloc is done with
> > a KM_SLEEP argument.
> >
> > Thus, this process thread could block waiting for memory.
> >
> > I would suggest adding a pre-allocated pool of dva nodes.
>
> This is how the Solaris memory allocator works. It keeps pools of
> "pre-allocated" nodes about until memory conditions are low.
>
> > When a new dva node is needed, first check this pre-allocated
> > pool and allocate from there.
>
> There are two reasons why this is a really bad idea:
>
> - the system will run out of memory even sooner if people
>   start building their own free-lists
>
> - a single freelist does not scale; at two CPUs it becomes
>   the allocation bottleneck (I've measured and removed two
>   such bottlenecks from Solaris 9)
>
> You might want to learn about how the Solaris memory allocator works;
> it pretty much works like you want, except that it is all part of the
> framework. And, just as in your case, it does run out some times but
> a private freelist does not help against that.
>
> > Why? This would eliminate a possible sleep condition if memory
> > is not immediately available. The pool would add a working
> > set of dva nodes that could be monitored. Per-alloc latencies
> > could be amortized over a chunk allocation.
>
> That's how the Solaris memory allocator already works.
>
> Casper
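For readers following along, here is a minimal sketch of the
pre-allocation scheme being proposed above: one up-front kmem_alloc
carved into a private freelist, with a fallback to the allocator when the
list runs dry. None of this is ZFS code; the names (dva_node_t,
dva_pool_*) and the pool size are hypothetical, and the single mutex is
precisely the scaling problem the replies below point at.

/*
 * Hypothetical sketch of the proposed pre-allocated dva-node pool.
 * Not ZFS code; names and sizes are illustrative only.
 */
#include <sys/types.h>
#include <sys/kmem.h>
#include <sys/mutex.h>

#define	DVA_POOL_NNODES	1024		/* working-set size (arbitrary) */

typedef struct dva_node {
	struct dva_node	*dn_next;	/* freelist linkage */
	/* ... payload fields would go here ... */
} dva_node_t;

static kmutex_t		dva_pool_lock;	/* single lock: the bottleneck */
static dva_node_t	*dva_pool_free;

void
dva_pool_init(void)
{
	dva_node_t *chunk;
	int i;

	mutex_init(&dva_pool_lock, NULL, MUTEX_DEFAULT, NULL);

	/* One KM_SLEEP allocation at init time, carved into a freelist. */
	chunk = kmem_alloc(DVA_POOL_NNODES * sizeof (dva_node_t), KM_SLEEP);

	for (i = 0; i < DVA_POOL_NNODES; i++) {
		chunk[i].dn_next = dva_pool_free;
		dva_pool_free = &chunk[i];
	}
}

dva_node_t *
dva_node_get(void)
{
	dva_node_t *dn;

	mutex_enter(&dva_pool_lock);
	dn = dva_pool_free;
	if (dn != NULL)
		dva_pool_free = dn->dn_next;
	mutex_exit(&dva_pool_lock);

	/*
	 * Pool exhausted: fall back to the general-purpose allocator.
	 * (A real version would also have to remember which allocator
	 * owns each node so it can be freed to the right place.)
	 */
	if (dn == NULL)
		dn = kmem_alloc(sizeof (dva_node_t), KM_NOSLEEP);

	return (dn);
}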
Re: [zfs-discuss] single memory allocation in the ZFS intent log
>Casper Dik,
>
> Yes, I am familiar with Bonwick's slab allocators and tried
> it for a wirespeed test of 64-byte pieces for 1Gb and then
> 100Mb Eths and lastly 10Mb Eth. My results were not
> encouraging. I assume it has improved over time.

Nothing which tries to send 64-byte pieces over 1Gb ethernet or 100Mb
ethernet will give encouraging results.

> First, let me ask what happens to the FS if the allocs
> in the intent log code are sleeping waiting for memory

How are you going to guarantee that there is *always* memory available?
I think that's barking up the wrong tree.

I think that a proper solution is not trying to find a way which prevents
memory from running out but rather a way of dealing with the case of it
running out. If KM_SLEEP is used in a path where it is causing problems,
then no amount of freelists is going to solve that. There needs to be a
solution which does not sleep.

> - getting memory from a "cache" of one's own size/type
>   costs orders of magnitude more than just getting some
>   off one's own freelist,

Actually, that's not true; Bonwick's allocator is *FASTER* by a *wide*
margin than your own freelist. Believe me, I've measured this; I've seen
"my own freelist" collapse on the floor when confronted with as few as
two CPUs.

As a minimum, you will need *per CPU* free lists. And that's precisely
what the kernel memory allocator gives you.

> In the time when memory was expensive, maybe a global
> sharing mechanism would make sense, but when the amount
> of memory is somewhat plentiful and cheap,

Not if all bits of the system are going to keep their own freelists *
#CPUs. Then you are suddenly faced with a *MUCH* higher memory demand.
The Bonwick allocator does keep quite a bit cached and keeps more memory
unavailable already.

> *** It then makes sense for a 2-stage implementation of
> preallocation of a working set and then normal allocation
> with the added latency.

But the normal Bonwick allocation *is* two-stage; you are proposing to
add a 3rd stage.

> So, it makes sense to pre-allocate a working set of allocs
> by a single alloc call, break up the alloc into needed sizes,
> and then alloc from your own free list,

That's what the Bonwick allocator does; so why are you duplicating this?

Apart from the questionable performance gain (I believe there to be
none), the loss of the kernel memory allocator debugging functionality
is severe:

- you can no longer track where the individual blocks are allocated
- you can no longer track buffer overruns
- buffers run into one another, so one overrun buffer corrupts another
  without trace

> -> if that freelist then empties, maybe then take the extra
> overhead of the kmem call. Consider this an expected cost of
> exceeding a certain watermark.

This is exactly how the magazine layer works.

> But otherwise, I bet if I give you some code for the pre-alloc, I bet
> 10 allocs from the freelist can be done versus the kmem_alloc call,
> and at least 100 to 10k allocs if sleep occurs on your side.

I hope you're not designing this with a single lock per queue.
I have eradicated code in Solaris 9 which looked like this:

struct au_buff *
au_get_buff(void)
{
	au_buff_t *buffer = NULL;

	mutex_enter(&au_free_queue_lock);

	if (au_free_queue == NULL) {
		if (au_get_chunk(1)) {
			mutex_exit(&au_free_queue_lock);
			return (NULL);
		}
	}
	buffer = au_free_queue;
	au_free_queue = au_free_queue->next_buf;
	mutex_exit(&au_free_queue_lock);
	buffer->next_buf = NULL;
	return (buffer);
}

(with a corresponding free routine which never returned memory to the
system but kept it in the freelist)

This was replaced with essentially:

	buffer = kmem_cache_alloc(au_buf_cache, KM_SLEEP);

The first bit of code stopped scaling at 1 CPU (the performance with two
CPUs was slightly worse than with one CPU). The second bit of code was
both FASTER in the single-CPU case and scaled to the twelve CPUs I had
for testing.

> Actually, I think it is so bad, that why don't you time 1 kmem_free
> versus grabbing elements off the freelist,

I did; it's horrendous.

Don't forget that in the typical case, when the magazine layer is
properly sized after the system has been running for a while, no locks
need to be grabbed to get memory, as the magazines are per-CPU. But with
your single freelist, you must grab a lock. Somewhere in the grab/release
lock cycle there's at least one atomic operation and memory barrier.
Those are perhaps cheap on single-CPU systems but run in the hundreds of
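In case it helps to see the whole pattern being described here, a minimal
sketch of the kmem_cache-based replacement end to end follows. This is
not the actual Solaris audit code: the cache name and the init/fini
placement are assumptions, and au_buff_t stands for whatever object type
the cache holds.

/*
 * Sketch only: the kmem_cache pattern that replaces a hand-rolled
 * freelist. Names are illustrative, not the real audit code.
 */
#include <sys/types.h>
#include <sys/kmem.h>

static kmem_cache_t *au_buf_cache;

void
au_buf_init(void)
{
	/* One cache per object type; per-CPU magazines come for free. */
	au_buf_cache = kmem_cache_create("au_buf_cache",
	    sizeof (au_buff_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
}

au_buff_t *
au_get_buff(void)
{
	/* Fast path hits the per-CPU magazine; no global lock is taken. */
	return (kmem_cache_alloc(au_buf_cache, KM_SLEEP));
}

void
au_free_buff(au_buff_t *buf)
{
	/* Hand the buffer back to the cache, not a private freelist. */
	kmem_cache_free(au_buf_cache, buf);
}

void
au_buf_fini(void)
{
	kmem_cache_destroy(au_buf_cache);
}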
Re: [zfs-discuss] single memory allocation in the ZFS intent log
On Wed, 4 Oct 2006, Erblichs wrote:

> Casper Dik,
>
> Yes, I am familiar with Bonwick's slab allocators and tried
> it for a wirespeed test of 64-byte pieces for 1Gb and then
> 100Mb Eths and lastly 10Mb Eth. My results were not
> encouraging. I assume it has improved over time.
>
> First, let me ask what happens to the FS if the allocs
> in the intent log code are sleeping waiting for memory

The same as would happen to the FS with your proposed additional
allocator layer if that "freelist" of yours runs out - it'll wait,
you'll see a latency bubble.

You seem to think it's likely that a kmem_alloc(..., KM_SLEEP) will
sleep. It's not. Anything but. See below.

> IMO, the general problem with memory allocators is:
>
> - getting memory from a "cache" of one's own size/type
>   is orders of magnitude higher than just getting some
>   off one's own freelist,

This is why the kernel memory allocator in Solaris has two such
freelists:

- the per-CPU kmem magazines (you say below 'one step at a time', but
  that step is already done in Solaris kmem)
- the slab cache

> - there is a built-in latency to recuperate/steal memory
>   from other processes,

Stealing ("reclaim" in Solaris kmem terms) happens if all of the
following conditions are true:

- nothing in the per-CPU magazines
- nothing in the slab cache
- nothing in the quantum caches
- on the attempt to grow the quantum cache, the request to the vmem
  backend finds no readily-available heap to satisfy the growth demand
  immediately

> - this stealing forces a sleep and context switches,
> - the amount of time to sleep is indeterminate with a single call per
>   struct. How long can you sleep for? 100ms or 250ms or more..
> - no process can guarantee a working set,

Yes and no. If your working set is small, use the stack.

> In the time when memory was expensive, maybe a global
> sharing mechanism would make sense, but when the amount
> of memory is somewhat plentiful and cheap,
>
> *** It then makes sense for a 2-stage implementation of
> preallocation of a working set and then normal allocation
> with the added latency.
>
> So, it makes sense to pre-allocate a working set of allocs
> by a single alloc call, break up the alloc into needed sizes,
> and then alloc from your own free list,

See above - all of that _IS_ already done in Solaris kmem/vmem, with
more parallelism and more intermediate caching layers designed to bring
down allocation latency than your simple freelist approach would
achieve.

> -> if that freelist then empties, maybe then take the extra
> overhead of the kmem call. Consider this an expected cost of
> exceeding a certain watermark.
>
> But otherwise, I bet if I give you some code for the pre-alloc, I bet
> 10 allocs from the freelist can be done versus the kmem_alloc call,
> and at least 100 to 10k allocs if sleep occurs on your side.

The same statistics can be made for Solaris kmem - you satisfy the
request from the per-CPU magazine, or you satisfy it from the slab
cache, or you satisfy it via immediate vmem backend allocation and a
growth of the slab cache. All of these come with increased latency but
without sleeping. Sleeping only comes in if you're so tight on memory
that you need to perform coalescing in the backend, and purge
least-recently-used things from other kmem caches in favour of new
backend requests.

Just because you chose to say kmem_alloc(..., KM_SLEEP) doesn't mean you
_will_ sleep. Normally you won't.
> Actually, I think it is so bad, that why don't you time 1 kmem_free
> versus grabbing elements off the freelist. However, don't trust me, I
> will drop a snapshot of the code to you tomorrow if you want and you
> can make a single-CPU benchmark comparison.
>
> Your multiple-CPU issue forces me to ask, is it a common occurrence
> that 2 or more CPUs are simultaneously requesting memory for the
> intent log? If it is, then there should be a freelist with a
> low-watermark set of elements per CPU. However, one thing at a time..

Of course it's common - have two or more threads do filesystem I/O at
the same time and you're already there.

Which is why, one thing at a time, Solaris kmem has had the magazine
layer for, I think (predates my time at Sun), around 12 years now, to
get SMP scalability. Been there, done that ...

> So, do you want that code? It will be a single alloc of X units which
> are then placed on a freelist. You then time how long it takes to
> remove Y elements from the freelist versus 1 kmem_alloc with a
> NO_SLEEP arg and report the numbers. Then I would suggest the call
> with the smallest sleep possible.
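Since the thread keeps coming back to "time it yourself", here is a
self-contained userland analogue of the proposed measurement, with
malloc() standing in for kmem_alloc(). It is only a sketch: it measures
the uncontended single-threaded case (the case the replies above argue is
not the interesting one), and a real benchmark would also have to defeat
compiler optimisation and add contention from multiple threads.

/*
 * Userland sketch of the proposed benchmark: Y pops from a pre-allocated
 * freelist versus Y individual allocator calls.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define	NELEM	100000
#define	ELSIZE	64

struct node {
	struct node *next;
	char pad[ELSIZE - sizeof (struct node *)];
};

static struct node *freelist;

static long
elapsed_ns(struct timespec *t0, struct timespec *t1)
{
	return ((long)(t1->tv_sec - t0->tv_sec) * 1000000000L +
	    (t1->tv_nsec - t0->tv_nsec));
}

int
main(void)
{
	struct node *chunk = malloc(NELEM * sizeof (struct node));
	struct node *p;
	struct timespec t0, t1;
	int i;

	/* One up-front allocation, carved into a freelist. */
	for (i = 0; i < NELEM; i++) {
		chunk[i].next = freelist;
		freelist = &chunk[i];
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NELEM; i++) {	/* pop from the private freelist */
		p = freelist;
		freelist = p->next;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("freelist pops: %ld ns\n", elapsed_ns(&t0, &t1));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NELEM; i++)	/* individual allocator calls */
		free(malloc(ELSIZE));
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("malloc/free:   %ld ns\n", elapsed_ns(&t0, &t1));

	free(chunk);
	return (0);
}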
[zfs-discuss] Re: panic during recv
Hi,

Yes I have a lot of trouble with zfs send .. zfs recv too. (sol 10 6/06,
SUNWzfsu 11.10.0,REV=2006.05.18.02.15). All too often there is a panic of
the host doing zfs recv. When this happens for a certain snapshot
combination, i.e. zfs send -i snapA snapB, then it *always* happens for
that combination. In my experience about 1 combination in 30 leads to a
crash. This might not seem very frequent, but I'm using zfs to try to
keep a server in sync with an on-line backup host, with a dozen or so
filesystems, every 2 hours.. and inevitably I get a crash every day.
These panics are very inconvenient, and given that some combinations of
snapshots never work, I have to roll back a step or two on the backup
server and then try another snapshot combination to move forward again.
Tedious!

My core dumps look a bit different though (but always in the same ...)

# echo '$C' | mdb 5
02a1011bc8d1 bcopy+0x1564(fcffead61c00, 3001529e400, 0, 140, 2, 7751e)
02a1011bcad1 dbuf_dirty+0x100(30015299a40, 3000e400420, , 300152a0638, 300152a05f0, 3)
02a1011bcb81 dnode_reallocate+0x150(108, 13, 300152a0598, 108, 0, 3000e400420)
02a1011bcc31 dmu_object_reclaim+0x80(0, 0, 13, 200, 11, 7bb7a400)
02a1011bccf1 restore_object+0x1b8(2a1011bd710, 30009834a70, 2a1011bd6c8, 11, 3000e400420, 200)
02a1011bcdb1 dmu_recvbackup+0x608(300014fca00, 300014fccd8, 300014fcb30, 3000f492f18, 1, 0)
02a1011bcf71 zfs_ioc_recvbackup+0x38(300014fc000, 0, 0, 0, 9, 0)
02a1011bd021 zfsdev_ioctl+0x160(70362c00, 5d, ffbfeeb0, 1f, 7c, e68)
02a1011bd0d1 fop_ioctl+0x20(3000b61d540, 5a1f, ffbfeeb0, 13, 3000aa3d4d8, 11f86c8)
02a1011bd191 ioctl+0x184(4, 3000a0a4978, ffbfeeb0, ff38db68, 40350, 5a1f)
02a1011bd2e1 syscall_trap32+0xcc(4, 5a1f, ffbfeeb0, ff38db68, 40350, ff2eb3dc)

Gary
Re: [zfs-discuss] Questions of ZFS mount point and import Messages
On Tue, Oct 03, 2006 at 11:22:24PM -0700, Tejung Ted Chiang wrote:
> Hi Experts,
>
> I got two questions below.
>
> 1. Is there any mechanism to protect the zfs mount point from being
> renamed via the command mv? Right now I can use "mv" to rename the
> mount point which has a zfs filesystem currently mounted. Of course
> solaris will find no mnt-pt to mount the zfs filesystem. If I then
> proceed with any property changes and do a ZFS destroy, I'll have some
> errors that are not easy to recover from. So is there any way to
> protect the mnt-pt from undesired manipulation?

No, there is no way to prevent this.

> 2. How do we know which system is currently using a zpool? The
> "zpool import" command does not tell us the system id information. It
> is hard to tell the dependencies of pools and systems in a SAN
> environment, and to know which pools are actually exported.

This has been discussed at length recently on this list. The underlying
RFE is:

6282725 hostname/hostid should be stored in the label

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
[zfs-discuss] Overview (rollup) of recent activity on zfs-discuss
For background on what this is, see:

http://www.opensolaris.org/jive/message.jspa?messageID=24416#24416
http://www.opensolaris.org/jive/message.jspa?messageID=25200#25200

= zfs-discuss 09/16 - 09/30 =

Size of all threads during period:

Thread size  Topic
-----------  -----
 23  ZFS and HDS ShadowImage
 16  mkdir == zfs create
 15  Proposal: multiple copies of user data
 13  problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
 13  low disk performance
 13  Metaslab alignment on RAID-Z
 12  Newbie in ZFS
 11  jbod questions
  9  slow reads question...
  9  live upgrade incompability
  8  drbd using zfs send/receive?
  7  destroy pool by id?
  6  zpool always thinks it's mounted on another system
  6  zfs clones
  6  Veritas NetBackup Support for ZFS
  6  Possible file corruption on a ZFS mirror
  6  Building large home file server with SATA
  6  Automounting ? (idea ?)
  5  zpool wrongly recognizes disk size
  5  how do I find out if I am on a zfs filesystem
  5  Snapshotting a pool ?
  5  Info on OLTP Perf
  5  Fastest way to send 100gb ( with ZFS send )
  5  Disk Layout for New Storage Server
  4  please remove my ignorance of raiding and mirroring
  3  zfs gets confused with multiple faults involving hot spares
  3  tracking error to file
  3  slow zpool create ( and format )
  3  panic during recv
  3  is there any way to merge pools and zfs file systems together?
  3  ZFS vs. Apple XRaid
  3  ZFS layout on hardware RAID-5?
  3  Some questions about how to organize ZFS-based filestorage
  3  Question: created non global zone with ZFS underneath the root filesystem
  3  Importing ZFS filesystems across architectures...
  3  I'm dancin' in the streets
  2  ztune
  2  zpool iostat
  2  zpool df mypool
  2  zfs scrub question
  2  moving fs from a dead box
  2  mounting during boot
  2  incorrect link for dmu_tx.c in ZFS Source Code tour
  2  [Fwd: Queston: after installing SunMC 3.6.1 ability to view the ZFS gui has disappeared]
  2  ZFS imported simultanously on 2 systems...
  2  Question: vxvm/dmp or zfs/mpxio
  2  How to make an extended LUN size known to ZFS and Solaris
  2  Filesystem structure
  2  Customer problem with zfs
  2  Comments on a ZFS multiple use of a pool, RFE.
  2  Bizzare problem with ZFS filesystem
  1  versioning with zfs like clearcase is this possible?
  1  reslivering, how long will it take?
  1  no automatic clearing of "zoned" eh?
  1  mirror issues
  1  create ZFS pool(s)/volume(s) during jumpstart
  1  [Fwd: RESEND: [Fwd: Queston: after installing SunMC 3.6.1 ability to view the ZFS gui has disappeared]]
  1  ZFS Available Space
  1  ZFS 'quot' command
  1  Recommendations for ZFS and databases
  1  Re[2]: System hang caused by a "bad" snapshot
  1  Pool shrinking
  1  Physical Clone of zpool
  1  Permissions on snapshot directories
  1  Overview (rollup) of recent activity on zfs-discuss
  1  Good PCI controllers for Nevada?
  1  Comments on a ZFS multiple use of a pool,

Posting activity by person for period:

# of posts  By
----------  --
 18  richard.elling at sun.com (richard elling - pae)
 17  eric.schrock at sun.com (eric schrock)
 12  torrey.mcmahon at sun.com (torrey mcmahon)
 11  ginoruopolo at hotmail.com (gino ruopolo)
  8  roch.bourbonnais at sun.com (roch)
  7  rmilkowski at task.gda.pl (robert milkowski)
  7  patrick at xsinet.co.za (patrick)
  7  matthew.ahrens at sun.com (matthew ahrens)
  7  chad at shire.net (chad leigh -- shire.net llc)
  6  krzys at perfekt.net (krzys)
  6  fcusack at fcusack.com (frank cusack)
  5  milek at task.gda.pl (robert milkowski)
  5  dd-b at dd-b.net (david dyer-bennet)
  5  clayk at acu.edu (keith clay)
  5  anantha.srirama at cdc.hhs.gov (anantha n. srirama)
  4  weeyeh at gmail.com (wee yeh tan)
  4  san2rini at fastwebnet.it (alf)
  4  rincebrain at gmail.com (rich)
  4  nicolas.williams at sun.com (nicolas williams)
  4  neelakanth.nadgir at sun.com (neelakanth nadgir)
  4  mike.kupfer at sun.com (mike kupfer)
  4  jhm at
Re: [zfs-discuss] Re: panic during recv
Hey Gary,

Can we get access to your core files?

-Mark

Gary Mitchell wrote:
> Hi,
>
> Yes I have a lot of trouble with zfs send .. zfs recv too. (sol 10 6/06,
> SUNWzfsu 11.10.0,REV=2006.05.18.02.15). All too often there is a panic
> of the host doing zfs recv. When this happens for a certain snapshot
> combination, i.e. zfs send -i snapA snapB, then it *always* happens for
> that combination. In my experience about 1 combination in 30 leads to a
> crash. This might not seem very frequent, but I'm using zfs to try to
> keep a server in sync with an on-line backup host, with a dozen or so
> filesystems, every 2 hours.. and inevitably I get a crash every day.
> These panics are very inconvenient, and given that some combinations of
> snapshots never work, I have to roll back a step or two on the backup
> server and then try another snapshot combination to move forward again.
> Tedious!
>
> My core dumps look a bit different though (but always in the same ...)
>
> # echo '$C' | mdb 5
> 02a1011bc8d1 bcopy+0x1564(fcffead61c00, 3001529e400, 0, 140, 2, 7751e)
> 02a1011bcad1 dbuf_dirty+0x100(30015299a40, 3000e400420, , 300152a0638, 300152a05f0, 3)
> 02a1011bcb81 dnode_reallocate+0x150(108, 13, 300152a0598, 108, 0, 3000e400420)
> 02a1011bcc31 dmu_object_reclaim+0x80(0, 0, 13, 200, 11, 7bb7a400)
> 02a1011bccf1 restore_object+0x1b8(2a1011bd710, 30009834a70, 2a1011bd6c8, 11, 3000e400420, 200)
> 02a1011bcdb1 dmu_recvbackup+0x608(300014fca00, 300014fccd8, 300014fcb30, 3000f492f18, 1, 0)
> 02a1011bcf71 zfs_ioc_recvbackup+0x38(300014fc000, 0, 0, 0, 9, 0)
> 02a1011bd021 zfsdev_ioctl+0x160(70362c00, 5d, ffbfeeb0, 1f, 7c, e68)
> 02a1011bd0d1 fop_ioctl+0x20(3000b61d540, 5a1f, ffbfeeb0, 13, 3000aa3d4d8, 11f86c8)
> 02a1011bd191 ioctl+0x184(4, 3000a0a4978, ffbfeeb0, ff38db68, 40350, 5a1f)
> 02a1011bd2e1 syscall_trap32+0xcc(4, 5a1f, ffbfeeb0, ff38db68, 40350, ff2eb3dc)
>
> Gary
Re: [zfs-discuss] Re: panic during recv
Gary Mitchell wrote:
> Hi,
>
> Yes I have a lot of trouble with zfs send .. zfs recv too. (sol 10 6/06,
> SUNWzfsu 11.10.0,REV=2006.05.18.02.15). All too often there is a panic
> of the host doing zfs recv.

This is certainly a bug! Can you point us to the crash dumps?

Also it might be helpful to have the actual 'zfs send' streams so we can
reproduce the panic, if you can send Sun your data. If
'zfs send -i A B | zfs recv ...' causes a panic, we would need the output
of 'zfs send A' and 'zfs send -i A B'. If you don't have a server handy,
you can upload to ftp://sunsolve.sun.com/cores and let me know the
location.

--matt
[zfs-discuss] q: zfs on das
actually zfs going over a vtrak 200i promise array with 12x250g as scsi
das. i have each disk on its own volume; has anyone had any experience
running zfs on top of such a setup? any links and/or notes on a similar
setup, esp. performance & reliability, would be helpful. [if noone has
done this, i will be glad to share my notes in a few weeks.]

oz
--
ozan s. yigit | [EMAIL PROTECTED] | 416 977 1414 x 1540
an open mind is no substitute for hard work -- nelson goodman
[zfs-discuss] ZFS patches for S10 6/06
Hi,

I am about to create a mirrored pool on an amd64 machine running S10 6/06
(no other patches). I plan to install the latest kernel patch (118855).
Are there any ZFS patches already out that I should also install first?

(No, I don't want to move to Nevada, but I will upgrade to S10 11/06 as
soon as it is out)

Andreas.
[zfs-discuss] Re: [request-sponsor] request sponsor for #4890717
Jeremy Teo wrote:
> Hello,
>
> request sponsor for #4890717 "want append-only files".
>
> I have a working prototype where the administrator can put a zfs fs
> into "append only" mode by setting the zfs "appendonly" property to
> "on" using zfs(1M).
>
> "append only" mode in this case means
>
> 1. Applications can only append to any existing files, but cannot
>    truncate files by creating a new file with the same filename as an
>    existing file, or by writing in a file at an offset other than the
>    end of the file. (Applications can still create new files)
> 2. Applications cannot remove existing files/directories.
> 3. Applications cannot rename/move existing files/directories.
>
> Thanks! I hope this is still wanted. :)

How does this interact with the append_only ACL that ZFS supports?

How does this property work in the face of inheritance?

How does this property work in the user delegation environment?

--
Darren J Moffat
Re: [zfs-discuss] Questions of ZFS mount point and import Messages
On 10/5/06, Eric Schrock <[EMAIL PROTECTED]> wrote:
> On Tue, Oct 03, 2006 at 11:22:24PM -0700, Tejung Ted Chiang wrote:
> > 2. How do we know which system is currently using a zpool? The
> > "zpool import" command does not tell us the system id information. It
> > is hard to tell the dependencies of pools and systems in a SAN
> > environment, and to know which pools are actually exported.
>
> This has been discussed at length recently on this list. The underlying
> RFE is:
>
> 6282725 hostname/hostid should be stored in the label

We are starting to build up our SAN and ZFS will be a heavy player here.
For now, what we intend to do is to tag the pool with the hostname on
import. It's still susceptible to human error, so 6282725 is going to be
really helpful.

--
Just me,
Wire ...
Re: [zfs-discuss] Re: [request-sponsor] request sponsor for #4890717
On 05/10/2006, at 8:10 AM, Darren J Moffat wrote:
> Jeremy Teo wrote:
> > Hello,
> >
> > request sponsor for #4890717 "want append-only files".
> >
> > I have a working prototype where the administrator can put a zfs fs
> > into "append only" mode by setting the zfs "appendonly" property to
> > "on" using zfs(1M).
> >
> > "append only" mode in this case means
> >
> > 1. Applications can only append to any existing files, but cannot
> >    truncate files by creating a new file with the same filename as an
> >    existing file, or by writing in a file at an offset other than the
> >    end of the file. (Applications can still create new files)
> > 2. Applications cannot remove existing files/directories.
> > 3. Applications cannot rename/move existing files/directories.
> >
> > Thanks! I hope this is still wanted. :)
>
> How does this interact with the append_only ACL that ZFS supports?
>
> How does this property work in the face of inheritance?
>
> How does this property work in the user delegation environment?

I was wondering the same thing. Personally, I'd rather see the
append_only ACL work than a whole new fs property. Last time I looked
there was some problem with append_only, but I can't remember what it
was.

Boyd
Re: [zfs-discuss] Re: [request-sponsor] request sponsor for #4890717
Boyd Adamson wrote:
> On 05/10/2006, at 8:10 AM, Darren J Moffat wrote:
> > Jeremy Teo wrote:
> > > Hello,
> > >
> > > request sponsor for #4890717 "want append-only files".
> > >
> > > I have a working prototype where the administrator can put a zfs fs
> > > into "append only" mode by setting the zfs "appendonly" property to
> > > "on" using zfs(1M).
> > >
> > > "append only" mode in this case means
> > >
> > > 1. Applications can only append to any existing files, but cannot
> > >    truncate files by creating a new file with the same filename as
> > >    an existing file, or by writing in a file at an offset other
> > >    than the end of the file. (Applications can still create new
> > >    files)
> > > 2. Applications cannot remove existing files/directories.
> > > 3. Applications cannot rename/move existing files/directories.
> > >
> > > Thanks! I hope this is still wanted. :)
> >
> > How does this interact with the append_only ACL that ZFS supports?
> >
> > How does this property work in the face of inheritance?
> >
> > How does this property work in the user delegation environment?
>
> I was wondering the same thing. Personally, I'd rather see the
> append_only ACL work than a whole new fs property. Last time I looked
> there was some problem with append_only, but I can't remember what it
> was.

The basic problem at the moment with append_only via ACLs is the
following:

We have a problem with the NFS server, where there is no notion of
O_APPEND. An open operation over NFS does not convey whether the client
wishes to append or do a general write; only at the time of a write
operation can the server see whether the client is appending. Therefore,
a process could receive an error, e.g. ERANGE, EOVERFLOW, or ENOSPC,
upon issuing an attempted write() somewhere other than at EOF. This adds
unwanted overhead in the write path.

I recently created a prototype that adds support for append-only files
in local ZFS file systems via ACLs. However, NFS clients will receive
EACCES when attempting to open append-only files.

-Mark
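To illustrate the problem Mark describes: because the append intent is
not visible at open time over NFS, any enforcement has to happen per
write(). A purely conceptual sketch of such a check follows; it is not
the actual ZFS/NFS prototype, and the function name, parameters, and the
choice of EPERM are assumptions made only for illustration.

/*
 * Conceptual sketch only -- not the ZFS/NFS prototype. Shows why
 * append-only enforcement without O_APPEND at open time degenerates
 * into a per-write offset check on the server.
 */
#include <sys/types.h>
#include <sys/errno.h>

static int
appendonly_write_check(uint64_t file_size, uint64_t write_offset,
    boolean_t append_only)
{
	if (!append_only)
		return (0);

	/*
	 * A local O_APPEND writer is always positioned at EOF, so this
	 * check is free. An NFS client only reveals its offset here, so
	 * a non-EOF write can only be rejected now, with an error the
	 * application may not expect -- the overhead and awkwardness
	 * described above.
	 */
	if (write_offset != file_size)
		return (EPERM);

	return (0);
}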
Re: [zfs-discuss] Re: [request-sponsor] request sponsor for #4890717
On 05/10/2006, at 11:28 AM, Mark Shellenbaum wrote:
> Boyd Adamson wrote:
> > I was wondering the same thing. Personally, I'd rather see the
> > append_only ACL work than a whole new fs property. Last time I
> > looked there was some problem with append_only, but I can't remember
> > what it was.
>
> The basic problem at the moment with append_only via ACLs is the
> following:
>
> We have a problem with the NFS server, where there is no notion of
> O_APPEND. An open operation over NFS does not convey whether the
> client wishes to append or do a general write; only at the time of a
> write operation can the server see whether the client is appending.
> Therefore, a process could receive an error, e.g. ERANGE, EOVERFLOW,
> or ENOSPC, upon issuing an attempted write() somewhere other than at
> EOF. This adds unwanted overhead in the write path.
>
> I recently created a prototype that adds support for append-only files
> in local ZFS file systems via ACLs. However, NFS clients will receive
> EACCES when attempting to open append-only files.

Ah, that's right... it was NFS over ZFS.

Am I the only person who sees it as odd that an ACL feature derived from
NFSv4 is, in fact, not implemented in NFSv4?
Re: [zfs-discuss] ZFS patches for S10 6/06
Andreas,

The first ZFS patch will be released in the upcoming weeks. For now, the
latest available bits are the ones from S10 6/06.

Thanks,
George

Andreas Sterbenz wrote:
> Hi,
>
> I am about to create a mirrored pool on an amd64 machine running S10
> 6/06 (no other patches). I plan to install the latest kernel patch
> (118855). Are there any ZFS patches already out that I should also
> install first?
>
> (No, I don't want to move to Nevada, but I will upgrade to S10 11/06
> as soon as it is out)
>
> Andreas.