Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Roch Bourbonnais - Performance Engineering
From: Gregory Shaw, Thu, 11 May 2006 13:15:48 -0600: Regarding directio and quickio, is there

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Jeff Bonwick
> Are you saying that copy-on-write doesn't apply for mmap changes, but > only file re-writes? I don't think that gels with anything else I > know about ZFS. No, you're correct -- everything is copy-on-write. Jeff

Re: [zfs-discuss] ZFS RAM requirements?

2006-05-11 Thread Mike Gerdts
On 5/11/06, Roch Bourbonnais - Performance Engineering <[EMAIL PROTECTED]> wrote: Certainly something we'll have to tackle. How about a zpool memstat (or zpool -m iostat) variation that would report at least freemem and the amount of evictable cached data ? Would that work for you ? -r Suppose

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Boyd Adamson
On 12/05/2006, at 3:59 AM, Richard Elling wrote: On Thu, 2006-05-11 at 10:27 -0700, Richard Elling wrote: On Thu, 2006-05-11 at 10:31 -0600, Gregory Shaw wrote: A couple of points/additions with regard to oracle in particular: When talking about large database installations, copy-on-wr

Re: [zfs-discuss] remote replication with huge data using zfs?

2006-05-11 Thread Jeff Bonwick
> plan A. To mirror on iSCSI devices: > keep one server with a set of zfs file systems > with 2 (sub)mirrors each, one of the mirrors use > devices physically on remote site accessed as > iSCSI LUNs. > > How does ZFS handle remote replication? > If
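
A minimal sketch of plan A, assuming c1t0d0 is a local disk and c2t1d0 is the iSCSI LUN presented from the remote site (both device names hypothetical):

    # mirror a local disk with a remote iSCSI LUN
    zpool create tank mirror c1t0d0 c2t1d0
    zpool status tank

ZFS then resilvers and self-heals across the mirror as it would with two local disks; write latency to the remote LUN becomes the limiting factor, since mirror writes must complete on both sides.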

Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-11 Thread Spencer Shepler
So, there is a set of us that have taken this discussion off list while we collect/share more data. Once we get a little further in diagnosis, we will summarize and bring the discussion back to the alias. Spencer On Thu, Joe Little wrote: > well, here's my first pass result: > > [EMAIL PROTECT

[zfs-discuss] remote replication with huge data using zfs?

2006-05-11 Thread Max Holm
Hi, We are to archive a huge amount, say 100TB, of data/images and keep a replica at a remote site. I thought ZFS would be a good choice. Can someone comment and advise whether it's practical: plan A. To mirror on iSCSI devices: keep one server with a set of zfs file systems with 2

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Torrey McMahon
This thread is useless without data. This thread is useless without data. This thread is useless without data. This thread is useless without data. This thread is useless without data. :-P

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Gregory Shaw
Regarding directio and quickio, is there a way with ZFS to skip the system buffer cache? I've seen big benefits from using directio when the data files have been segregated from the log files. Having the system compete with the DB for read-ahead results in double work. On May 10, 2006, at
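
For comparison, bypassing the buffer cache on UFS is a mount option; ZFS has no equivalent switch as of this thread (device path hypothetical):

    mount -F ufs -o forcedirectio /dev/dsk/c0t0d0s6 /oradata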

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Tao Chen
On 5/11/06, Peter Rival <[EMAIL PROTECTED]> wrote: Richard Elling wrote: > Oracle will zero-fill the tablespace with 128kByte iops -- it is not > sparse. I've got a scar. Has this changed in the past few years? Multiple parallel tablespace creates are usually a big pain point for filesystem /
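
A rough way to reproduce that write pattern -- sequential zero-fill in 128 kByte I/Os -- so it can be watched with iostat (file name and size hypothetical):

    # mimic Oracle zero-filling a 1 GB tablespace
    dd if=/dev/zero of=/tank/ora/ts01.dbf bs=128k count=8192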

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Peter Rival
Richard Elling wrote: On Thu, 2006-05-11 at 10:27 -0700, Richard Elling wrote: On Thu, 2006-05-11 at 10:31 -0600, Gregory Shaw wrote: A couple of points/additions with regard to oracle in particular: When talking about large database installations, copy-on-write may or may not apply. The

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Richard Elling
On Thu, 2006-05-11 at 10:27 -0700, Richard Elling wrote: > On Thu, 2006-05-11 at 10:31 -0600, Gregory Shaw wrote: > > A couple of points/additions with regard to oracle in particular: > > > > When talking about large database installations, copy-on-write may > > or may not apply. The files

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Peter Rival
Richard Elling wrote: On Thu, 2006-05-11 at 10:31 -0600, Gregory Shaw wrote: A couple of points/additions with regard to oracle in particular: When talking about large database installations, copy-on-write may or may not apply. The files are never completely rewritten, only changed inter

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Richard Elling
On Thu, 2006-05-11 at 10:31 -0600, Gregory Shaw wrote: > A couple of points/additions with regard to oracle in particular: > > When talking about large database installations, copy-on-write may > or may not apply. The files are never completely rewritten, only > changed internally via

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread George Wilson
Darren J Moffat wrote: How would you apply these diffs ? How do you select which files to apply and which not to ? For example you want the log files to be "merged" somehow but you certainly don't want the binaries to be merged. This would have to be a decision by the user when the sync tak

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Gregory Shaw
A couple of points/additions with regard to oracle in particular: When talking about large database installations, copy-on-write may or may not apply. The files are never completely rewritten, only changed internally via mmap(). When you lay down your database, you will generally alloca
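
On Solaris the preallocation step described above can be approximated with mkfile, which lays the file down as zero-filled blocks (path and size hypothetical):

    mkfile 1g /tank/ora/users01.dbf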

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread Nicolas Williams
On Thu, May 11, 2006 at 11:15:12AM -0400, Bill Sommerfeld wrote: > This situation is analogous to the "merge with common ancestor" > operations performed on source code by most SCM systems; with a named > snapshot as the clone base, the ancestor is preserved and can easily be > retrieved. Yes, and

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread Bill Sommerfeld
On Thu, 2006-05-11 at 10:38, Darren J Moffat wrote: > George Wilson wrote: > > This would be comparable to what live upgrade does with its sync option. > > With lu, certain files get synced to the newly activated BE just prior > > to booting it up. (see /etc/lu/synclist) > > even in that file th

ZFS diffs (Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006])

2006-05-11 Thread Nicolas Williams
On Thu, May 11, 2006 at 03:38:59PM +0100, Darren J Moffat wrote: > What would the output of zfs diffs be ? My original conception was:
 - dnode # + changed blocks
 - some naming hints so that one could quickly find changed dnodes in clones
I talked about this with Bill Moore and he came up

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread Nicolas Williams
6370738 zfs diffs filesystems

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread Darren J Moffat
George Wilson wrote: This would be comparable to what live upgrade does with its sync option. With lu, certain files get synced to the newly activated BE just prior to booting it up. (see /etc/lu/synclist) even in that file there are three different policies: OVERWRITE, APPEND, PREPEND. Note

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread George Wilson
This would be comparable to what live upgrade does with its sync option. With lu, certain files get synced to the newly activated BE just prior to booting it up. (see /etc/lu/synclist) Let's take a filesystem which contains both static application data as well as constantly changing files such

Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-11 Thread Joe Little
well, here's my first pass result:

[EMAIL PROTECTED] loges1]# time tar xf /root/linux-2.2.26.tar

real    114m6.662s
user    0m0.049s
sys     0m1.354s

On 5/11/06, Roch Bourbonnais - Performance Engineering <[EMAIL PROTECTED]> wrote: Joe Little writes: > How did you get the average time for a

RE: [zfs-discuss] ZFS and databases

2006-05-11 Thread Gehr, Chuck R
Absolutely, I have done hot-spot tests using a Poisson random distribution. With that pattern (where there are many cache hits), the writes are 3 to 10 times faster than sequential speed. My comment was regarding purely random I/O across a large (at least much larger than available memory cache) are

Re: [zfs-discuss] ZFS RAM requirements?

2006-05-11 Thread Roch Bourbonnais - Performance Engineering
I think there are 2 potential issues here. The ZFS cache or ARC manages memory for all pools on a system but the data is not really organized per pool. So on a pool export we don't free up buffers associated with that pool. The memory is actually returned to the system either when press

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread Darren J Moffat
George Wilson wrote: Matt, This is really cool! One thing that I can think of that would be nice to have is the ability to 'promote' and 'sync'. In other words, just prior to promoting the clone, bring any files that are newer on the original parent up-to-date on the clone. I suspect you coul

Re: [zfs-discuss] ZFS RAM requirements?

2006-05-11 Thread Roch Bourbonnais - Performance Engineering
Certainly something we'll have to tackle. How about a zpool memstat (or zpool -m iostat) variation that would report at least freemem and the amount of evictable cached data ? Would that work for you ? -r Philip Beevers writes: > Roch Bourbonnais - Performance Engineering wrote: > > >Reported
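
Until such a subcommand exists, both numbers can be approximated with stock tools (a sketch, not the proposed zpool output):

    kstat -p unix:0:system_pages:freemem   # freemem, in pages
    echo ::memstat | mdb -k                # page usage summary (needs root)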

Re: [zfs-discuss] fwd: ZFS Clone Promotion [PSARC/2006/303 Timeout: 05/12/2006]

2006-05-11 Thread George Wilson
Matt, This is really cool! One thing that I can think of that would be nice to have is the ability to 'promote' and 'sync'. In other words, just prior to promoting the clone, bring any files that are newer on the original parent up-to-date on the clone. I suspect you could utilize zfs diffs (
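
The command sequence the PSARC case adds, sketched with hypothetical dataset names:

    zfs snapshot tank/fs@base        # capture the parent
    zfs clone tank/fs@base tank/alt  # diverge in a clone
    zfs promote tank/alt             # swap the parent and clone roles

The proposed 'sync' step -- copying newer files from the old parent into the clone before promoting -- would sit between the clone and promote steps; it does not exist as a zfs subcommand.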

[zfs-discuss] The 12.5% compression rule

2006-05-11 Thread Darren J Moffat
Where does the 12.5% compression rule in zio_compress_data() come from ? Given that this is in the generic function for all compression algorithms rather than in the implementation of lzjb I wonder where the number comes from ? Just curious. -- Darren J Moffat
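
For what it's worth, 12.5% is one eighth, so the threshold is computable with a shift. One plausible reading of the check, worked for a 128K block (this is an assumed reading, not the actual source):

    # compressed output must fit in 7/8 of the source buffer
    $ echo $((131072 - (131072 >> 3)))
    114688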

Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-11 Thread Roch Bourbonnais - Performance Engineering
Joe Little writes: > How did you get the average time for async writes? My client (lacking > ptime; it's Linux) comes in at 50 minutes, not 50 seconds. I'm running > again right now for a more accurate number. I'm untarring from a local > file on the directory to the NFS share. > I used dtra
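
One hedged guess at how such an average could be gathered with DTrace, using fbt probes on zio_wait (the probe choice is an assumption, not Roch's actual script):

    dtrace -n '
    fbt::zio_wait:entry  { self->ts = timestamp; }
    fbt::zio_wait:return /self->ts/ {
        @["avg zio_wait (ns)"] = avg(timestamp - self->ts);
        self->ts = 0;
    }'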

Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-11 Thread Joe Little
How did you get the average time for async writes? My client (lacking ptime; it's Linux) comes in at 50 minutes, not 50 seconds. I'm running again right now for a more accurate number. I'm untarring from a local file on the directory to the NFS share. On 5/11/06, Roch Bourbonnais - Performance En

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Roch Bourbonnais - Performance Engineering
- Description of why I don't need directio, quickio, or ODM. The two main benefits that came out of using directio were reducing memory consumption by avoiding the page cache AND bypassing the UFS single-writer behavior. ZFS does not have the single-writer lock. As for memory, the UFS code

Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-11 Thread Roch Bourbonnais - Performance Engineering
# ptime tar xf linux-2.2.22.tar

real       50.292
user        1.019
sys        11.417

# ptime tar xf linux-2.2.22.tar

real       56.833
user        1.056
sys        11.581

# avg time waiting for async writes is

[zfs-discuss] Re: Trying to replicate ZFS self-heal demo and not seeing fixed error

2006-05-11 Thread Yusuf Goolamabbas
> > bash-3.00# dd if=/dev/urandom of=/dev/dsk/c1t10d0 > bs=1024 count=20480 > > A couple of things: > > (1) When you write to /dev/dsk, rather than > /dev/rdsk, the results > are cached in memory. So the on-disk state may > have been unaltered. That's why I also did a zpool export followed by
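
The demo sequence under discussion, roughly (pool and disk names hypothetical; the pool must be mirrored or raidz so self-heal has a good copy to repair from):

    zpool export tank
    dd if=/dev/urandom of=/dev/rdsk/c1t10d0s0 bs=1024 count=20480
    zpool import tank
    zpool scrub tank
    zpool status -v tank   # repaired checksum errors should appear here

Writing through /dev/rdsk rather than /dev/dsk sidesteps the in-memory caching caveat raised above.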

RE: [zfs-discuss] ZFS and databases

2006-05-11 Thread Roch Bourbonnais - Performance Engineering
Gehr, Chuck R writes: > One word of caution about random writes. From my experience, they are > not nearly as fast as sequential writes (like 10 to 20 times slower) > unless they are carefully aligned on the same boundary as the file > system record size. Otherwise, there is a heavy read pena
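
One way to get the alignment the mail describes on ZFS is to match the dataset recordsize to the application's write size before the files are created (dataset name and block size hypothetical):

    zfs set recordsize=8k tank/db   # e.g. for an 8K database block size
    zfs get recordsize tank/db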