Re: [zfs-discuss] single memory allocation in the ZFS intent log

2006-10-04 Thread Erblichs
Casper Dik,

Yes, I am familiar with Bonwick's slab allocators and tried
them in a wirespeed test of 64-byte pieces, first on 1Gb Ethernet,
then on 100Mb, and lastly on 10Mb Ethernet. My results were not
encouraging. I assume it has improved over time.

First, let me ask: what happens to the FS if the allocs
in the intent log code are sleeping, waiting for memory?

IMO, the general problems with memory allocators are:

- getting memory from a "cache" of one's own size/type
  is orders of magnitude more expensive than just getting some
  off one's own freelist,

- there is a built-in latency to recuperate/steal memory
  from other processes,

- this stealing forces a sleep and context switches,

- the amount of time to sleep is indeterminate with a single
  call per struct. How long can you sleep for? 100ms or
  250ms or more..

- no process can guarantee a working set,

In the days when memory was expensive, maybe a global
sharing mechanism made sense, but now that the amount
of memory is somewhat plentiful and cheap,

*** it then makes sense to do a two-stage implementation:
preallocation of a working set, and then normal allocation
with the added latency.

So, it makes sense to pre-allocate a working set of allocs
with a single alloc call, break up the allocation into the needed sizes,
and then alloc from your own free list.

-> If that freelist then empties, maybe then take the extra
overhead of the kmem call. Consider this an expected cost of exceeding
a certain watermark.
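
Roughly, I am thinking of the shape sketched below. All of the names
(zil_node_t, zil_node_prealloc(), and so on) are made up for illustration;
this is only a sketch of the two-stage scheme, not ZFS code:

typedef struct zil_node {
    struct zil_node *zn_next;   /* freelist linkage */
    /* ... payload ... */
} zil_node_t;

static zil_node_t *zil_node_freelist;
static kmutex_t   zil_node_freelist_lock;

/* Stage 1: one large allocation at init, carved into nodes. */
void
zil_node_prealloc(int count)
{
    zil_node_t *chunk;
    int i;

    mutex_init(&zil_node_freelist_lock, NULL, MUTEX_DEFAULT, NULL);
    chunk = kmem_alloc(count * sizeof (zil_node_t), KM_SLEEP);
    for (i = 0; i < count; i++) {
        chunk[i].zn_next = zil_node_freelist;
        zil_node_freelist = &chunk[i];
    }
}

/* Stage 2: pop from the private freelist; fall back to kmem if empty. */
zil_node_t *
zil_node_get(void)
{
    zil_node_t *zn;

    mutex_enter(&zil_node_freelist_lock);
    if ((zn = zil_node_freelist) != NULL)
        zil_node_freelist = zn->zn_next;
    mutex_exit(&zil_node_freelist_lock);

    if (zn == NULL)     /* below the watermark: pay the kmem cost */
        zn = kmem_alloc(sizeof (zil_node_t), KM_NOSLEEP);
    return (zn);
}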

But otherwise, I bet that if I give you some code for the pre-alloc,
10 allocs from the freelist can be done in the time of one kmem_alloc call,
and at least 100 to 10k allocs if a sleep occurs on your side.

Actually, I think it is so bad that you should time one kmem_free
versus grabbing elements off the freelist.

However, don't take my word for it; I will drop a snapshot of the code to you
tomorrow if you want, and you can make a single-CPU benchmark comparison.

Your multiple-CPU issue forces me to ask: is it a common
occurrence that two or more CPUs are simultaneously requesting
memory for the intent log? If it is, then there should be a
freelist holding a low-watermark set of elements per CPU. However,
one thing at a time..

So, do you want that code? It will be a single alloc of X units
which are then placed on a freelist. You then time how long it takes to
remove Y elements from the freelist versus one kmem_alloc with
a KM_NOSLEEP arg, and report the numbers. Then I would suggest the
call with the smallest sleep possible. How many allocs can then
be done? 25k, 35k, more...

Oh, the reason we aren't timing the initial kmem_alloc call
for the freelist is that I expect it to occur during init,
which does not proceed until the memory is alloc'ed.
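
The timing comparison itself would be nothing fancier than this sketch
(NITER and the two result arrays are placeholders; gethrtime() just
supplies the timestamps):

    hrtime_t t0, t1, t2;
    zil_node_t *from_freelist[NITER];   /* NITER: pick your Y */
    zil_node_t *from_kmem[NITER];
    int i;

    t0 = gethrtime();
    for (i = 0; i < NITER; i++)         /* Y removals from the freelist */
        from_freelist[i] = zil_node_get();
    t1 = gethrtime();
    for (i = 0; i < NITER; i++)         /* same count straight from kmem */
        from_kmem[i] = kmem_alloc(sizeof (zil_node_t), KM_NOSLEEP);
    t2 = gethrtime();

    cmn_err(CE_NOTE, "freelist: %lld ns  kmem_alloc: %lld ns",
        (longlong_t)(t1 - t0), (longlong_t)(t2 - t1));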


Mitchell Erblich








[EMAIL PROTECTED] wrote:
> 
> >   at least one location:
> >
> >   When adding a new dva node into the tree, a kmem_alloc is done with
> >   a KM_SLEEP argument.
> >
> >   thus, this process thread could block waiting for memory.
> >
> >   I would suggest adding a  pre-allocated pool of dva nodes.
> 
> This is how the Solaris memory allocator works.  It keeps pools of
> "pre-allocated" nodes about until memory conditions are low.
> 
> >   When a new dva node is needed, first check this pre-allocated
> >   pool and allocate from there.
> 
> There are two reasons why this is a really bad idea:
> 
> - the system will run out of memory even sooner if people
>   start building their own free-lists
> 
> - a single freelist does not scale; at two CPUs it becomes
>   the allocation bottleneck (I've measured and removed two
>   such bottlenecks from Solaris 9)
> 
> You might want to learn about how the Solaris memory allocator works;
> it pretty much works like you want, except that it is all part of the
> framework.  And, just as in your case, it does run out sometimes, but
> a private freelist does not help against that.
> 
> >   Why? This would eliminate a possible sleep condition if memory
> >   is not immediately available. The pool would add a working
> >   set of dva nodes that could be monitored. Per alloc latencies
> >   could be amortized over a chunk allocation.
> 
> That's how the Solaris memory allocator already works.
> 
> Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] single memory allocation in the ZFS intent log

2006-10-04 Thread Casper . Dik

>Casper Dik,
>
>   Yes, I am familiar with Bonwick's slab allocators and tried
>   them in a wirespeed test of 64-byte pieces, first on 1Gb Ethernet,
>   then on 100Mb, and lastly on 10Mb Ethernet. My results were not
>   encouraging. I assume it has improved over time.

Nothing which tries to send 64 byte pieces over 1Gb ethernet or 100Mb
ethernet will give encouraging results.

>   First, let me ask: what happens to the FS if the allocs
>   in the intent log code are sleeping, waiting for memory?

How are you going to guarantee that there is *always* memory available?

I think that's barking up the wrong tree.  I think that a proper solution is
not trying to find a way which prevents memory from running out but
rather a way of dealing with the case of it running out.

If KM_SLEEP is used in a path where it is causing problems, then no
amount of freelists is going to solve that.  There needs to be a solution
which does not sleep.
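
As a sketch only (the cache name and function name are placeholders, not
actual ZFS code), a non-sleeping path has to look something like this, with
the burden of handling failure moved onto the caller:

    int
    alloc_log_node_nosleep(void **bufp)
    {
        void *buf;

        /* KM_NOSLEEP returns NULL instead of blocking */
        buf = kmem_cache_alloc(some_cache, KM_NOSLEEP);
        if (buf == NULL)
            return (ENOMEM);    /* caller must cope, not block */
        *bufp = buf;
        return (0);
    }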

>   - getting memory from a "cache" of one's own size/type
> is orders of magnitude more expensive than just getting some
> off one's own freelist,

Actually, that's not true; Bonwick's allocator is *FASTER* by a *wide*
margin than your own freelist.

Believe me, I've measured this, I've seen "my own freelist" collapse
on the floor when confronted with as few as two CPUs.

As a minimum, you will need *per CPU* free lists.

And that's precisely what the kernel memory allocator gives you.
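
For illustration (the cache and structure names below are placeholders,
not code from ZFS), using the allocator directly is simply:

    static kmem_cache_t *dva_node_cache;

    /* one-time setup, typically at module init */
    void
    dva_node_cache_init(void)
    {
        dva_node_cache = kmem_cache_create("dva_node_cache",
            sizeof (dva_node_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
    }

    dva_node_t *
    dva_node_alloc(void)
    {
        /* normally satisfied from the per-CPU magazine, no lock taken */
        return (kmem_cache_alloc(dva_node_cache, KM_SLEEP));
    }

    void
    dva_node_free(dva_node_t *dn)
    {
        kmem_cache_free(dva_node_cache, dn);
    }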

>   In the days when memory was expensive, maybe a global
>   sharing mechanism made sense, but now that the amount
>   of memory is somewhat plentiful and cheap,

Not if all bits of the system are going to keep their own freelists * #CPUs.

Then you are suddenly faced with a *MUCH* higher memory demand.  The
Bonwick allocator already keeps quite a bit cached, and so already holds
some memory unavailable.

>   *** it then makes sense to do a two-stage implementation:
>   preallocation of a working set, and then normal allocation
>   with the added latency.

But the normal Bonwick allocation *is* two-stage; you are proposing to
add a 3rd stage.

>   So, it makes sense to pre-allocate a working set of allocs
>   with a single alloc call, break up the allocation into the needed sizes,
>   and then alloc from your own free list,

That's what the Bonwick allocator does; so why are you duplicating this?

Apart from the questionable performance gain (I believe there to be none),
the loss of the kernel memory allocator debugging functionality is severe:

- you can no longer track where the individual blocks are allocated
- you can no longer track buffer overruns
- buffers run into one another, so one overrun buffer corrupts another
  without trace

>   -> If that freelist then empties, maybe then take the extra
>   overhead of the kmem call. Consider this an expected cost of exceeding
>   a certain watermark.

This is exactly how the magazine layer works.

>   But otherwise, I bet that if I give you some code for the pre-alloc,
>   10 allocs from the freelist can be done in the time of one kmem_alloc call,
>   and at least 100 to 10k allocs if a sleep occurs on your side.

I hope you're not designing this with a single lock per queue.

I have eradicated code in Solaris 9 which looked like this:

struct au_buff *
au_get_buff(void)
{
    au_buff_t *buffer = NULL;

    /* single global lock: every CPU contends here */
    mutex_enter(&au_free_queue_lock);

    if (au_free_queue == NULL) {
        /* freelist empty: try to grow it by one chunk */
        if (au_get_chunk(1)) {
            mutex_exit(&au_free_queue_lock);
            return (NULL);
        }
    }

    /* pop the head of the single shared freelist */
    buffer = au_free_queue;
    au_free_queue = au_free_queue->next_buf;
    mutex_exit(&au_free_queue_lock);
    buffer->next_buf = NULL;
    return (buffer);
}

(with a corresponding free routine which never returned memory to the
system but kept it in the freelist)

This was replaced with essentially:

buffer = kmem_cache_alloc(au_buf_cache, KM_SLEEP);

The first bit of code stopped scaling at one CPU (the performance
with two CPUs was slightly worse than with one CPU).

The second bit of code was both FASTER in the single-CPU case and
scaled to the twelve CPUs I had for testing.

>   Actually, I think it is so bad that you should time one kmem_free
>   versus grabbing elements off the freelist.

I did, it's horrendous.

Don't forget that in the typical case, when the magazine layer is properly
sized after the system has been running for a while, no locks need to be
grabbed to get memory, as the magazines are per-CPU.

But with your single freelist, you must grab a lock.  Somewhere in
the grab/release lock cycle there's at least one atomic operation
and memory barrier.

Those are perhaps cheap on single CPU systems but run in the hundreds
of

Re: [zfs-discuss] single memory allocation in the ZFS intent log

2006-10-04 Thread Frank Hofmann

On Wed, 4 Oct 2006, Erblichs wrote:


Casper Dik,

Yes, I am familiar with Bonwick's slab allocators and tried
them in a wirespeed test of 64-byte pieces, first on 1Gb Ethernet,
then on 100Mb, and lastly on 10Mb Ethernet. My results were not
encouraging. I assume it has improved over time.

First, let me ask: what happens to the FS if the allocs
in the intent log code are sleeping, waiting for memory?


The same as would happen to the FS with your proposed additional allocator
layer if that "freelist" of yours runs out - it'll wait, and you'll see a
latency bubble.


You seem to think it's likely that a kmem_alloc(..., KM_SLEEP) will sleep. 
It's not. Anything but. See below.




IMO, the general problems with memory allocators are:

- getting memory from a "cache" of one's own size/type
  is orders of magnitude more expensive than just getting some
  off one's own freelist,


This is why the kernel memory allocator in Solaris has two such freelists:

- the per-CPU kmem magazines (you say below 'one thing at a time',
  but that step is already done in Solaris kmem)
- the slab cache



- there is a built-in latency to recuperate/steal memory
  from other processes,


Stealing ("reclaim" in Solaris kmem terms) happens if the following three 
conditions are true:


- nothing in the per-CPU magazines
- nothing in the slab cache
- nothing in the quantum caches
- on the attempt to grow the quantum cache, the request to the
  vmem backend finds no readily-available heap to satisfy the
  growth demand immediately



- this stealing forces a sleep and context switches,

- the amount of time to sleep is indeterminate with a single
  call per struct. How long can you sleep for? 100ms or
  250ms or more..

- no process can guarantee a working set,


Yes and no. If your working set is small, use the stack.



In the days when memory was expensive, maybe a global
sharing mechanism made sense, but now that the amount
of memory is somewhat plentiful and cheap,

*** it then makes sense to do a two-stage implementation:
preallocation of a working set, and then normal allocation
with the added latency.

So, it makes sense to pre-allocate a working set of allocs
with a single alloc call, break up the allocation into the needed sizes,
and then alloc from your own free list,


See above - all of that _IS_ already done in Solaris kmem/vmem, with more 
parallelism and more intermediate caching layers designed to bring down 
allocation latency than your simple freelist approach would achieve.




-> If that freelist then empties, maybe then take the extra
overhead of the kmem call. Consider this an expected cost of exceeding
a certain watermark.

But otherwise, I bet that if I give you some code for the pre-alloc,
10 allocs from the freelist can be done in the time of one kmem_alloc call,
and at least 100 to 10k allocs if a sleep occurs on your side.


The same kind of statistics can be produced for Solaris kmem - you satisfy the
request from the per-CPU magazine, or you satisfy it from the slab cache, or
you satisfy it via an immediate vmem backend allocation and a growth of the
slab cache. All of these come with increased latency but without sleeping.
Sleeping only comes in if you're so tight on memory that you need to perform
coalescing in the backend, and purge least-recently-used things from other
kmem caches in favour of new backend requests. Just because you chose to say
kmem_alloc(..., KM_SLEEP) doesn't mean you _will_ sleep. Normally you won't.




Actually, I think it is so bad that you should time one kmem_free
versus grabbing elements off the freelist.

However, don't take my word for it; I will drop a snapshot of the code to you
tomorrow if you want, and you can make a single-CPU benchmark comparison.

Your multiple-CPU issue forces me to ask: is it a common
occurrence that two or more CPUs are simultaneously requesting
memory for the intent log? If it is, then there should be a
freelist holding a low-watermark set of elements per CPU. However,
one thing at a time..


Of course it's common - have two or more threads do filesystem I/O at the
same time and you're already there. Which is why, one thing at a time,
Solaris kmem has had the magazine layer for, I think (it predates my time at
Sun), around 12 years now, to get SMP scalability. Been there, done that ...




So, do you want that code? It will be a single alloc of X units
which are then placed on a freelist. You then time how long it takes to
remove Y elements from the freelist versus one kmem_alloc with
a KM_NOSLEEP arg, and report the numbers. Then I would suggest the
call with the smallest sleep possible. 

[zfs-discuss] Re: panic during recv

2006-10-04 Thread Gary Mitchell
Hi,

Yes, I have a lot of trouble with zfs send .. zfs recv too.
(sol 10 6/06, SUNWzfsu 11.10.0,REV=2006.05.18.02.15).  All too often there is
a panic of the host doing zfs recv.
When this happens for a certain snapshot combination, i.e. zfs send -i snapA snapB,
then it *always* happens for that combination.  In my experience about 1
combination in 30 leads to a crash.  This might not seem very frequent, but I'm
using zfs to try to keep a server in sync with an on-line backup host with a
dozen or so filesystems every 2 hours.. and inevitably I get a crash every day.
These panics are very inconvenient, and given that some combinations of
snapshots never work, I have to roll back a step or two on the backup server and
then try another snapshot combination to move forward again. Tedious!

My core dumps look a bit different though (but always in the same ...)

# echo '$C' | mdb 5
02a1011bc8d1 bcopy+0x1564(fcffead61c00, 3001529e400, 0, 140, 2, 7751e)
02a1011bcad1 dbuf_dirty+0x100(30015299a40, 3000e400420, ,
300152a0638, 300152a05f0, 3)
02a1011bcb81 dnode_reallocate+0x150(108, 13, 300152a0598, 108, 0,
3000e400420)
02a1011bcc31 dmu_object_reclaim+0x80(0, 0, 13, 200, 11, 7bb7a400)
02a1011bccf1 restore_object+0x1b8(2a1011bd710, 30009834a70, 2a1011bd6c8, 11
, 3000e400420, 200)
02a1011bcdb1 dmu_recvbackup+0x608(300014fca00, 300014fccd8, 300014fcb30,
3000f492f18, 1, 0)
02a1011bcf71 zfs_ioc_recvbackup+0x38(300014fc000, 0, 0, 0, 9, 0)
02a1011bd021 zfsdev_ioctl+0x160(70362c00, 5d, ffbfeeb0, 1f, 7c, e68)
02a1011bd0d1 fop_ioctl+0x20(3000b61d540, 5a1f, ffbfeeb0, 13, 
3000aa3d4d8, 11f86c8)
02a1011bd191 ioctl+0x184(4, 3000a0a4978, ffbfeeb0, ff38db68, 40350, 5a1f)
02a1011bd2e1 syscall_trap32+0xcc(4, 5a1f, ffbfeeb0, ff38db68, 40350,
ff2eb3dc)


Gary
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions of ZFS mount point and import Messages

2006-10-04 Thread Eric Schrock
On Tue, Oct 03, 2006 at 11:22:24PM -0700, Tejung Ted Chiang wrote:
> Hi Experts,
> 
> I have two questions below.
> 
> 1. [b]Is there any mechanism to protect the zfs mount point from being
> renamed via the command mv?[/b] Right now I can use "mv" to rename the mount
> point on which a zfs filesystem is currently mounted. Of course Solaris
> will then find no mount point for the zfs filesystem. If I then change any
> properties or do a zfs destroy, I get errors that are hard to recover
> from. So is there any way to protect the mount point from undesired
> manipulation?

No, there is no way to prevent this.

> 2. [b]How do we know which system is currently using a zpool?[/b]
> The "zpool import" command does not tell us the system id information. It
> is hard to tell the dependencies between pools and systems in a SAN
> environment, and we do not know exactly which pools are exported.

This has been discussed at length recently on this list.  The
underlying RFE is:

6282725 hostname/hostid should be stored in the label

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Overview (rollup) of recent activity on zfs-discuss

2006-10-04 Thread Eric Boutilier

For background on what this is, see:

http://www.opensolaris.org/jive/message.jspa?messageID=24416#24416
http://www.opensolaris.org/jive/message.jspa?messageID=25200#25200

=
zfs-discuss 09/16 - 09/30
=

Size of all threads during period:

Thread size Topic
--- -
 23   ZFS and HDS ShadowImage
 16   mkdir == zfs create
 15   Proposal: multiple copies of user data
 13   problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS 
server not resp
 13   low disk performance
 13   Metaslab alignment on RAID-Z
 12   Newbie in ZFS
 11   jbod questions
  9   slow reads question...
  9   live upgrade incompability
  8   drbd using zfs send/receive?
  7   destroy pool by id?
  6   zpool always thinks it's mounted on another system
  6   zfs clones
  6   Veritas NetBackup Support for ZFS
  6   Possible file corruption on a ZFS mirror
  6   Building large home file server with SATA
  6   Automounting ? (idea ?)
  5   zpool wrongly recognizes disk size
  5   how do I find out if I am on a zfs filesystem
  5   Snapshotting a pool ?
  5   Info on OLTP Perf
  5   Fastest way to send 100gb ( with ZFS send )
  5   Disk Layout for New Storage Server
  4   please remove my ignorance of raiding and mirroring
  3   zfs gets confused with multiple faults involving hot spares
  3   tracking error to file
  3   slow zpool create ( and format )
  3   panic during recv
  3   is there any way to merge pools and zfs file systems together?
  3   ZFS vs. Apple XRaid
  3   ZFS layout on hardware RAID-5?
  3   Some questions about how to organize ZFS-based filestorage
  3   Question: created non global zone with ZFS underneath the root 
filesystem
  3   Importing ZFS filesystems across architectures...
  3   I'm dancin' in the streets
  2   ztune
  2   zpool iostat
  2   zpool df mypool
  2   zfs scrub question
  2   moving fs from a dead box
  2   mounting during boot
  2   incorrect link for dmu_tx.c in ZFS Source Code tour
  2   [Fwd: Queston: after installing SunMC 3.6.1 ability to view the 
ZFS gui has disappeared]
  2   ZFS imported simultanously on 2 systems...
  2   Question: vxvm/dmp or zfs/mpxio
  2   How to make an extended LUN size known to ZFS and Solaris
  2   Filesystem structure
  2   Customer problem with zfs
  2   Comments on a ZFS multiple use of a pool, RFE.
  2   Bizzare problem with ZFS filesystem
  1   versioning with zfs like clearcase is this possible?
  1   reslivering, how long will it take?
  1   no automatic clearing of "zoned" eh?
  1   mirror issues
  1   create ZFS pool(s)/volume(s) during jumpstart
  1   [Fwd: RESEND: [Fwd: Queston: after installing SunMC 3.6.1 ability 
to view the ZFS gui has disappeared]]
  1   ZFS Available Space
  1   ZFS 'quot' command
  1   Recommendations for ZFS and databases
  1   Re[2]: System hang caused by a "bad" snapshot
  1   Pool shrinking
  1   Physical Clone of zpool
  1   Permissions on snapshot directories
  1   Overview (rollup) of recent activity on zfs-discuss
  1   Good PCI controllers for Nevada?
  1   Comments on a ZFS multiple use of a pool,


Posting activity by person for period:

# of posts  By
--   --
 18   richard.elling at sun.com (richard elling - pae)
 17   eric.schrock at sun.com (eric schrock)
 12   torrey.mcmahon at sun.com (torrey mcmahon)
 11   ginoruopolo at hotmail.com (gino ruopolo)
  8   roch.bourbonnais at sun.com (roch)
  7   rmilkowski at task.gda.pl (robert milkowski)
  7   patrick at xsinet.co.za (patrick)
  7   matthew.ahrens at sun.com (matthew ahrens)
  7   chad at shire.net (chad leigh -- shire.net llc)
  6   krzys at perfekt.net (krzys)
  6   fcusack at fcusack.com (frank cusack)
  5   milek at task.gda.pl (robert milkowski)
  5   dd-b at dd-b.net (david dyer-bennet)
  5   clayk at acu.edu (keith clay)
  5   anantha.srirama at cdc.hhs.gov (anantha n. srirama)
  4   weeyeh at gmail.com (wee yeh tan)
  4   san2rini at fastwebnet.it (alf)
  4   rincebrain at gmail.com (rich)
  4   nicolas.williams at sun.com (nicolas williams)
  4   neelakanth.nadgir at sun.com (neelakanth nadgir)
  4   mike.kupfer at sun.com (mike kupfer)
  4   jhm at 

Re: [zfs-discuss] Re: panic during recv

2006-10-04 Thread Mark Maybee

Hey Gary,

Can we get access to your core files?

-Mark

Gary Mitchell wrote:

Hi,

Yes, I have a lot of trouble with zfs send .. zfs recv too.
(sol 10 6/06, SUNWzfsu 11.10.0,REV=2006.05.18.02.15).  All too often there is
a panic of the host doing zfs recv.
When this happens for a certain snapshot combination, i.e. zfs send -i snapA snapB,
then it *always* happens for that combination.  In my experience about 1
combination in 30 leads to a crash.  This might not seem very frequent, but I'm
using zfs to try to keep a server in sync with an on-line backup host with a
dozen or so filesystems every 2 hours.. and inevitably I get a crash every day.
These panics are very inconvenient, and given that some combinations of
snapshots never work, I have to roll back a step or two on the backup server and
then try another snapshot combination to move forward again. Tedious!


My core dumps look a bit different though (but always in the same ...)

# echo '$C' | mdb 5
02a1011bc8d1 bcopy+0x1564(fcffead61c00, 3001529e400, 0, 140, 2, 7751e)
02a1011bcad1 dbuf_dirty+0x100(30015299a40, 3000e400420, ,
300152a0638, 300152a05f0, 3)
02a1011bcb81 dnode_reallocate+0x150(108, 13, 300152a0598, 108, 0,
3000e400420)
02a1011bcc31 dmu_object_reclaim+0x80(0, 0, 13, 200, 11, 7bb7a400)
02a1011bccf1 restore_object+0x1b8(2a1011bd710, 30009834a70, 2a1011bd6c8, 11
, 3000e400420, 200)
02a1011bcdb1 dmu_recvbackup+0x608(300014fca00, 300014fccd8, 300014fcb30,
3000f492f18, 1, 0)
02a1011bcf71 zfs_ioc_recvbackup+0x38(300014fc000, 0, 0, 0, 9, 0)
02a1011bd021 zfsdev_ioctl+0x160(70362c00, 5d, ffbfeeb0, 1f, 7c, e68)
02a1011bd0d1 fop_ioctl+0x20(3000b61d540, 5a1f, ffbfeeb0, 13, 
3000aa3d4d8, 11f86c8)
02a1011bd191 ioctl+0x184(4, 3000a0a4978, ffbfeeb0, ff38db68, 40350, 5a1f)
02a1011bd2e1 syscall_trap32+0xcc(4, 5a1f, ffbfeeb0, ff38db68, 40350,
ff2eb3dc)


Gary
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: panic during recv

2006-10-04 Thread Matthew Ahrens

Gary Mitchell wrote:

Hi,

Yes, I have a lot of trouble with zfs send .. zfs recv too. (sol 10
6/06, SUNWzfsu 11.10.0,REV=2006.05.18.02.15).  All too often there is
a panic of the host doing zfs recv.


This is certainly a bug!  Can you point us to the crash dumps?  Also it 
might be helpful to have the actual 'zfs send' streams so we can 
reproduce the panic, if you can send Sun your data.  If 'zfs send -i A B 
| zfs recv ...' causes a panic, we would need the output of 'zfs send A' 
and 'zfs send -i A B'.


If you don't have a server handy, you can upload to 
ftp://sunsolve.sun.com/cores and let me know the location.


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] q: zfs on das

2006-10-04 Thread ozan s. yigit

actually, zfs is going over a vtrak 200i promise array with 12x250g
as scsi das. i have each disk on its own volume; has anyone had any
experience running zfs on top of such a setup? any links and/or notes
on a similar setup, esp. performance & reliability, would be helpful.

[if no one has done this, i will be glad to share my notes in a few
weeks.]

oz
--
ozan s. yigit | [EMAIL PROTECTED] | 416 977 1414 x 1540
an open mind is no substitute for hard work -- nelson goodman
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS patches for S10 6/06

2006-10-04 Thread Andreas Sterbenz

Hi,

I am about to create a mirrored pool on an amd64 machine running S10 6/06
(no other patches). I plan to install the latest kernel patch (118855).

Are there any ZFS patches already out that I should also install first?

(No, I don't want to move to Nevada, but I will upgrade to S10 11/06 as
soon as it is out)

Andreas.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: [request-sponsor] request sponsor for #4890717

2006-10-04 Thread Darren J Moffat

Jeremy Teo wrote:

Hello,

request sponsor for #4890717 want append-only files.

I have a working prototype where the administrator can put a zfs fs
into "append only" mode by setting the zfs "appendonly" property to
"on" using zfs(1M).

"append only" mode in this case means

1. Applications can only append to existing files; they cannot
truncate files by creating a new file with the same filename as an
existing file, or by writing to a file at an offset other than the end
of the file. (Applications can still create new files.)

2. Applications cannot remove existing files/directories.

3. Applications cannot rename/move existing files/directories.

Thanks! I hope this is still wanted. :)



How does this interact with the append_only ACL that ZFS supports?

How does this property work in the face of inheritance?

How does this property work in the user delegation environment?

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions of ZFS mount point and import Messages

2006-10-04 Thread Wee Yeh Tan

On 10/5/06, Eric Schrock <[EMAIL PROTECTED]> wrote:

On Tue, Oct 03, 2006 at 11:22:24PM -0700, Tejung Ted Chiang wrote:
> 2. [b]How do we know which system is currently using a zpool?[/b]
> The "zpool import" command does not tell us the system id information. It
> is hard to tell the dependencies between pools and systems in a SAN
> environment, and we do not know exactly which pools are exported.

This has been discussed at length recently on this list.  The
underlying RFE is:

6282725 hostname/hostid should be stored in the label


We are starting to build up our SAN and ZFS will be a heavy player
here.  For now, what we intend to do is to tag the pool with the hostname
on import.  It's still susceptible to human error, so 6282725 is going
to be really helpful.


--
Just me,
Wire ...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [request-sponsor] request sponsor for #4890717

2006-10-04 Thread Boyd Adamson

On 05/10/2006, at 8:10 AM, Darren J Moffat wrote:

Jeremy Teo wrote:

Hello,
request sponsor for #4890717 want append-only files.
I have a working prototype where the administrator can put a zfs fs
into "append only" mode by setting the zfs "appendonly" property to
"on" using zfs(1M).
"append only" mode in this case means
1. Applications can only append to existing files; they cannot
truncate files by creating a new file with the same filename as an
existing file, or by writing to a file at an offset other than the end
of the file. (Applications can still create new files.)
2. Applications cannot remove existing files/directories.
3. Applications cannot rename/move existing files/directories.
Thanks! I hope this is still wanted. :)


How does this interact with the append_only ACL that ZFS supports?

How does this property work in the face of inheritance?

How does this property work in the user delegation environment?


I was wondering the same thing. Personally, I'd rather see the  
append_only ACL work than a whole new fs property.


Last time I looked there was some problem with append_only, but I  
can't remember what it was.


Boyd

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [request-sponsor] request sponsor for #4890717

2006-10-04 Thread Mark Shellenbaum

Boyd Adamson wrote:

On 05/10/2006, at 8:10 AM, Darren J Moffat wrote:

Jeremy Teo wrote:

Hello,
request sponsor for #4890717 want append-only files.
I have a working prototype where the administrator can put a zfs fs
into "append only" mode by setting the zfs "appendonly" property to
"on" using zfs(1M).
"append only" mode in this case means
1. Applications can only append to existing files; they cannot
truncate files by creating a new file with the same filename as an
existing file, or by writing to a file at an offset other than the end
of the file. (Applications can still create new files.)
2. Applications cannot remove existing files/directories.
3. Applications cannot rename/move existing files/directories.
Thanks! I hope this is still wanted. :)


How does this interact with the append_only ACL that ZFS supports?

How does this property work in the face of inheritance?

How does this property work in the user delegation environment?


I was wondering the same thing. Personally, I'd rather see the 
append_only ACL work than a whole new fs property.


Last time I looked there was some problem with append_only, but I can't 
remember what it was.




The basic problem at the moment with append_only via ACLs is the following:

We have a problem with the NFS server, where there is no notion of 
O_APPEND.  An open operation over NFS does not convey whether the client 
wishes to append or do a general write; only at the time of a write 
operation can the server see whether the client is appending. 
Therefore, a process could receive an error, e.g. ERANGE, EOVERFLOW, or 
ENOSPC, upon issuing an attempted write() somewhere other than at EOF. 
This adds unwanted overhead in the write path.
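
In other words (purely an illustration; the names are hypothetical and not
taken from the prototype), the check has to sit in the write path itself:

    /*
     * NFS gave us no O_APPEND hint at open time, so the only place to
     * enforce append-only is when the write arrives.
     */
    if (append_only && write_offset != current_eof)
        return (EPERM);     /* write does not start at EOF */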


I recently created a prototype that adds support for append only files 
in local ZFS file systems via ACLs.  However, NFS clients will receive 
EACCES when attempting to open append only files.



   -Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [request-sponsor] request sponsor for #4890717

2006-10-04 Thread Boyd Adamson

On 05/10/2006, at 11:28 AM, Mark Shellenbaum wrote:

Boyd Adamson wrote:

On 05/10/2006, at 8:10 AM, Darren J Moffat wrote:

Jeremy Teo wrote:

Hello,
request sponsor for #4890717 want append-only files.
I have a working prototype where the administrator can put a zfs fs
into "append only" mode by setting the zfs "appendonly" property to
"on" using zfs(1M).
"append only" mode in this case means
1. Applications can only append to existing files; they cannot
truncate files by creating a new file with the same filename as an
existing file, or by writing to a file at an offset other than the end
of the file. (Applications can still create new files.)
2. Applications cannot remove existing files/directories.
3. Applications cannot rename/move existing files/directories.
Thanks! I hope this is still wanted. :)


How does this interact with the append_only ACL that ZFS supports?

How does this property work in the face of inheritance?

How does this property work in the user delegation environment?
I was wondering the same thing. Personally, I'd rather see the  
append_only ACL work than a whole new fs property.
Last time I looked there was some problem with append_only, but I  
can't remember what it was.


The basic problem at the moment with append_only via ACLs is the  
following:


We have a problem with the NFS server, where there is no notion of  
O_APPEND.  An open operation over NFS does not convey whether the  
client wishes to append or do a general write; only at the time of  
a write operation can the server see whether the client is  
appending. Therefore, a process could receive an error, e.g.  
ERANGE, EOVERFLOW, or ENOSPC, upon issuing an attempted write()  
somewhere other than at EOF. This adds unwanted overhead in the  
write path.


I recently created a prototype that adds support for append only  
files in local ZFS file systems via ACLs.  However, NFS clients  
will receive EACCES when attempting to open append only files.


Ah, that's right... it was NFS over ZFS. Am I the only person who  
sees it as odd that an ACL feature derived from NFSv4 is, in fact,  
not implemented in NFSv4?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS patches for S10 6/06

2006-10-04 Thread George Wilson

Andreas,

The first ZFS patch will be released in the upcoming weeks. For now, the 
latest available bits are the ones from s10 6/06.


Thanks,
George

Andreas Sterbenz wrote:

Hi,

I am about to create a mirrored pool on an amd64 machine running S10 6/06
(no other patches). I plan to install the latest kernel patch (118855).

Are there any ZFS patches already out that I should also install first?

(No, I don't want to move to Nevada, but I will upgrade to S10 11/06 as
soon as it is out)

Andreas.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss