Re: [zfs-discuss] gang blocks at will?

2010-05-26 Thread Jeff Bonwick
You can set metaslab_gang_bang to (say) 8k to force lots of gang block allocations. Jeff On May 25, 2010, at 11:42 PM, Andriy Gapon wrote: > > I am working on improving some ZFS-related bits in FreeBSD boot chain. > At the moment it seems that the things work mostly fine except for a case > w
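A hedged sketch of one way to set that tunable, assuming it lives in the zfs kernel module and is a 64-bit variable (use /W instead of /Z if it turns out to be 32 bits); the 8192 value is just the example from above:

    # echo "metaslab_gang_bang/Z 0t8192" | mdb -kw     (on a live kernel)

or persistently, in /etc/system:

    set zfs:metaslab_gang_bang = 8192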

Re: [zfs-discuss] Apple cans ZFS project

2009-10-24 Thread Jeff Bonwick
> Apple can currently just take the ZFS CDDL code and incorporate it > (like they did with DTrace), but it may be that they wanted a "private > license" from Sun (with appropriate technical support and > indemnification), and the two entities couldn't come to mutually > agreeable terms. I

Re: [zfs-discuss] dedupe is in

2009-11-02 Thread Jeff Bonwick
> Terrific! Can't wait to read the man pages / blogs about how to use it... Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup Enjoy, and let me know if you have any questions or suggestions for follow-on posts. Jeff

Re: [zfs-discuss] heads-up: dedup=fletcher4,verify was broken

2009-11-23 Thread Jeff Bonwick
And, for the record, this is my fault. There is an aspect of endianness that I simply hadn't thought of. When I have a little more time I will blog about the whole thing, because there are many useful lessons here. Thank you, Matt, for all your help with this. And my apologies to everyone else

Re: [zfs-discuss] heads-up: dedup=fletcher4,verify was broken

2009-11-23 Thread Jeff Bonwick
erify) are unaffected. Jeff On Mon, Nov 23, 2009 at 09:44:41PM -0800, Jeff Bonwick wrote: > And, for the record, this is my fault. There is an aspect of endianness > that I simply hadn't thought of. When I have a little more time I will > blog about the whole thing, because

Re: [zfs-discuss] Deduplication - deleting the original

2009-12-08 Thread Jeff Bonwick
> i am no pro in zfs, but to my understanding there is no original. That is correct. From a semantic perspective, there is no change in behavior between dedup=off and dedup=on. Even the accounting remains the same: each reference to a block is charged to the dataset making the reference. The on

Re: [zfs-discuss] Doing ZFS rollback with preserving later created clones/snapshot?

2009-12-11 Thread Jeff Bonwick
Yes, although it's slightly indirect: - make a clone of the snapshot you want to roll back to - promote the clone See 'zfs promote' for details. Jeff On Fri, Dec 11, 2009 at 08:37:04AM +0100, Alexander Skwar wrote: > Hi. > > Is it possible on Solaris 10 5/09, to rollback to a Z
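A minimal sketch of that clone-and-promote sequence, with hypothetical names (tank/data and @old):

    # zfs clone tank/data@old tank/data_rb     (clone the snapshot you want to return to)
    # zfs promote tank/data_rb                 (the clone becomes the origin; later snapshots stay intact)
    # zfs rename tank/data tank/data_prev      (optional: swap the names so users see the rolled-back tree)
    # zfs rename tank/data_rb tank/data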

Re: [zfs-discuss] compressratio vs. dedupratio

2009-12-13 Thread Jeff Bonwick
It is by design. The idea is to report the dedup ratio for the data you've actually attempted to dedup. To get a 'diluted' dedup ratio of the sort you describe, just compare the space used by all datasets to the space allocated in the pool. For example, on my desktop, I have a pool called 'build
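A hedged sketch of computing that diluted figure, with a hypothetical pool name -- compare the space charged to all datasets against the space the pool has actually allocated:

    # zfs list tank      (USED: space charged to datasets, with every dedup reference counted)
    # zpool list tank    (the pool's actually-allocated space)

The ratio of the first number to the second approximates the pool-wide, diluted dedup ratio.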

Re: [zfs-discuss] Pool import with failed ZIL device now possible ?

2010-02-16 Thread Jeff Bonwick
> People used fastfs for years in specific environments (hopefully > understanding the risks), and disabling the ZIL is safer than fastfs. > Seems like it would be a useful ZFS dataset parameter. We agree. There's an open RFE for this: 6280630 zil synchronicity No promise on date, but it will

Re: [zfs-discuss] Dedup - Does "on" imply "sha256?"

2010-08-24 Thread Jeff Bonwick
Correct. Jeff On Aug 24, 2010, at 9:45 PM, Peter Taps wrote: > Folks, > > One of the articles on the net says that the following two commands are > exactly the same: > > # zfs set dedup=on tank > # zfs set dedup=sha256 tank > > Essentially, "on" is just a pseudonym for "sha256" and "verify"

Re: [zfs-discuss] zpool file corruption

2008-09-24 Thread Jeff Bonwick
It's almost certainly the SIL3114 controller. Google "SIL3114 data corruption" -- it's nasty. Jeff On Thu, Sep 25, 2008 at 07:50:01AM +0200, Mikael Karlsson wrote: > I have a strange problem involving changes in large file on a mirrored > zpool in > Open solaris snv96. > We use it at storage in

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Jeff Bonwick
> The circumstances where I have lost data have been when ZFS has not > handled a layer of redundancy. However, I am not terribly optimistic > of the prospects of ZFS on any device that hasn't committed writes > that ZFS thinks are committed. FYI, I'm working on a workaround for broken devices.

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Jeff Bonwick
> Or is there a way to mitigate a checksum error on non-redundant zpool? It's just like the difference between non-parity, parity, and ECC memory. Most filesystems don't have checksums (non-parity), so they don't even know when they're returning corrupt data. ZFS without any replication can detec
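For a pool with no mirror or RAID-Z, one hedged mitigation is the copies property, which keeps extra ditto copies of user data so a damaged block can be repaired from a surviving copy; the dataset name here is hypothetical, and it only protects data written after the property is set:

    # zfs set copies=2 tank/important
    # zfs get copies,used tank/important     (expect roughly double the space for new writes)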

Re: [zfs-discuss] questions about replacing a raidz2 vdev disk with a larger one

2008-10-11 Thread Jeff Bonwick
ZFS will allow the replacement. The available size is, however, determined by the smallest of the lot. Once you've replaced *all* 500GB disks with 1TB disks, the available space will double. One suggestion: replace as many disks as you intend to at the same time, so that ZFS only has to do on

Re: [zfs-discuss] questions about replacing a raidz2 vdev disk with a larger one

2008-10-11 Thread Jeff Bonwick
pulling an old one -- then Eric is right, and in fact I'd go further: in that case, replace only one at a time so you maintain the ability to survive a disk failing while you're doing all this. Jeff On Sat, Oct 11, 2008 at 06:37:17PM -0700, Erik Trimble wrote: > Jeff Bonwick wrote: &

Re: [zfs-discuss] Lost Disk Space

2008-11-02 Thread Jeff Bonwick
Are you running this on a live pool? If so, zdb can't get a reliable block count -- and zdb -L [live pool] emits a warning to that effect. Jeff On Thu, Oct 16, 2008 at 03:36:25AM -0700, Ben Rockwood wrote: > I've been struggling to fully understand why disk space seems to vanish. > I've dug t

Re: [zfs-discuss] Fwd: [osol-announce] IMPT: Do not use SXCE Build 102

2008-11-16 Thread Jeff Bonwick
These are the conditions: (1) The bug is specific to the root pool. Other pools are unaffected. (2) It is triggered by doing a 'zpool online' while I/O is in flight. (3) Item (2) can be triggered by syseventd. (4) The bug is new in build 102. Builds 101 and earlier are fine. I believe the follo

Re: [zfs-discuss] "ZFS, Smashing Baby" a fake???

2008-11-25 Thread Jeff Bonwick
I think we (the ZFS team) all generally agree with you. The current nevada code is much better at handling device failures than it was just a few months ago. And there are additional changes that were made for the FishWorks (a.k.a. Amber Road, a.k.a. Sun Storage 7000) product line that will make

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Jeff Bonwick
> If you have more comments, or especially if you think I reached the wrong > conclusion, please do post it. I will post my continuing results. I think your conclusions are correct. The main thing you're seeing is the combination of gzip-9 being incredibly CPU-intensive with our I/O pipeline all

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-12 Thread Jeff Bonwick
> I'm going to pitch in here as devil's advocate and say this is hardly > revolution. 99% of what zfs is attempting to do is something NetApp and > WAFL have been doing for 15 years+. Regardless of the merits of their > patents and prior art, etc., this is not something revolutionarily new. It >

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-13 Thread Jeff Bonwick
> Off the top of my head nearly all of them. Some of them have artificial > limitations because they learned the hard way that if you give customers > enough rope they'll hang themselves. For instance "unlimited snapshots". Oh, that's precious! It's not an arbitrary limit, it's a safety feature

Re: [zfs-discuss] zpol mirror creation after non-mirrored zpool is setup

2008-12-13 Thread Jeff Bonwick
On Sat, Dec 13, 2008 at 04:44:10PM -0800, Mark Dornfeld wrote: > I have installed Solaris 10 on a ZFS filesystem that is not mirrored. Since I > have an identical disk in the machine, I'd like to add that disk to the > existing pool as a mirror. Can this be done, and if so, how do I do it? Yes:
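The reply is truncated here; as a hedged sketch of the usual route (zpool attach), with hypothetical device names -- and note that a root pool also needs boot blocks installed on the new disk:

    # zpool attach rpool c0t0d0s0 c0t1d0s0     (attach the second disk as a mirror of the first)
    # zpool status rpool                       (wait for the resilver to finish)
    # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0     (x86; SPARC uses installboot)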

Re: [zfs-discuss] Where does set the value to zio->io_offset?

2009-01-24 Thread Jeff Bonwick
Each ZFS block pointer contains up to three DVAs (data virtual addresses), to implement 'ditto blocks' (multiple copies of the data, above and beyond any replication provided by mirroring or RAID-Z). Semantically, ditto blocks are a lot like mirrors, so we actually use the mirror code to read them

Re: [zfs-discuss] ZFS core contributor nominations

2009-02-02 Thread Jeff Bonwick
> I would like to nominate roch.bourbonn...@sun.com for his work on > improving the performance of ZFS over the last few years. Absolutely. Jeff

Re: [zfs-discuss] snapshot identity

2009-02-03 Thread Jeff Bonwick
> The Validated Execution project is investigating how to utilize ZFS > snapshots as the basis of a "validated" filesystem. Given that the > blocks of the dataset form a Merkle tree of hashes, it seemed > straightforward to validate the individual objects in the snapshot and > then sign the hash o

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Jeff Bonwick
> There is no substitute for cord-yank tests - many and often. The > weird part is, the ZFS design team simulated millions of them. > So the full explanation remains to be uncovered? We simulated power failure; we did not simulate disks that simply blow off write ordering. Any disk that you'd e

Re: [zfs-discuss] Does your device honor write barriers?

2009-02-10 Thread Jeff Bonwick
> well... if you want a write barrier, you can issue a flush-cache and > wait for a reply before releasing writes behind the barrier. You will > get what you want by doing this for certain. Not if the disk drive just *ignores* barrier and flush-cache commands and returns success. Some consumer d

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Jeff Bonwick
> I'm rather tired of hearing this mantra. > [...] > Every file system needs a repair utility Hey, wait a minute -- that's a mantra too! I don't think there's actually any substantive disagreement here -- stating that one doesn't need a separate program called /usr/sbin/fsck is not the same as sa

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Jeff Bonwick
> > This is CR 6667683 > > http://bugs.opensolaris.org/view_bug.do?bug_id=6667683 > > I think that would solve 99% of ZFS corruption problems! Based on the reports I've seen to date, I think you're right. > Is there any EDT for this patch? Well, because of this thread, this has gone from "on my

Re: [zfs-discuss] Forensics related ZFS questions

2009-03-15 Thread Jeff Bonwick
>1. Does variable FSB block sizing extend to files larger than record size, >concerning the last FSB allocated? > >In other words, for files larger than 128KB, that utilize more than one >full recordsize FSB, will the LAST FSB allocated be `right-sized' to fit >the remaining da

Re: [zfs-discuss] RFE: creating multiple clones in one zfs(1) call and one txg

2009-03-29 Thread Jeff Bonwick
I agree with Chris -- I'd much rather do something like: zfs clone snap1 clone1 snap2 clone2 snap3 clone3 ... than introduce a pattern grammar. Supporting multiple snap/clone pairs on the command line allows you to do just about anything atomically. Jeff On Fri, Mar 27, 2009 at 10:46:3

Re: [zfs-discuss] Data size grew.. with compression on

2009-03-30 Thread Jeff Bonwick
Right. Another difference to be aware of is that ZFS reports the total space consumed, including space for metadata -- typically around 1%. Traditional filesystems like ufs and ext2 preallocate metadata and don't count it as using space. I don't know how reiserfs does its bookkeeping, but I would

Re: [zfs-discuss] Data size grew.. with compression on

2009-04-08 Thread Jeff Bonwick
> > Yes, I made note of that in my OP on this thread. But is it enough to > > end up with 8gb of non-compressed files measuring 8gb on > > reiserfs(linux) and the same data showing nearly 9gb when copied to a > > zfs filesystem with compression on. > > whoops.. a hefty exaggeration it only show

Re: [zfs-discuss] Peculiarities of COW over COW?

2009-04-27 Thread Jeff Bonwick
> ZFS blocksize is dynamic, power of 2, with a max size == recordsize. Minor clarification: recordsize is restricted to powers of 2, but blocksize is not -- it can be any multiple of sector size (512 bytes). For small files, this matters: a 37k file is stored in a 37k block. For larger, multi-bloc

Re: [zfs-discuss] Resilver Performance and Behavior

2009-05-03 Thread Jeff Bonwick
> >According to the ZFS documentation, a resilver operation > >includes what is effectively a dirty region log (DRL) so that if the > >resilver is interrupted, by a snapshot or reboot, the resilver can > >continue where it left off. > > That is not the case. The dirty region log keeps tra

Re: [zfs-discuss] Replacing a failed drive

2009-06-19 Thread Jeff Bonwick
Yep, you got it. Jeff On Fri, Jun 19, 2009 at 04:15:41PM -0700, Simon Breden wrote: > Hi, > > I have a ZFS storage pool consisting of a single RAIDZ2 vdev of 6 drives, and > I have a question about replacing a failed drive, should it occur in future. > > If a drive fails in this double-parity
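A hedged sketch of the replacement itself, with hypothetical names; with double parity the pool stays online while it resilvers:

    # zpool replace tank c1t3d0             (new disk inserted in the same physical slot)
    # zpool replace tank c1t3d0 c2t0d0      (or: the replacement lives at a different device path)
    # zpool status tank                     (watch resilver progress)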

Re: [zfs-discuss] Mobo SATA migration to AOC-SAT2-MV8 SATA card

2009-06-19 Thread Jeff Bonwick
Yep, right again. Jeff On Fri, Jun 19, 2009 at 04:21:42PM -0700, Simon Breden wrote: > Hi, > > I'm using 6 SATA ports from the motherboard but I've now run out of SATA > ports, and so I'm thinking of adding a Supermicro AOC-SAT2-MV8 8-port SATA > controller card. > > What is the procedure for

Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Jeff Bonwick
On Fri, Jan 26, 2007 at 10:57:19PM -0800, Frank Cusack wrote: > On January 27, 2007 12:27:17 AM -0200 Toby Thain <[EMAIL PROTECTED]> wrote: > >On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote: > >>3. I created file system with huge amount of data, where most of the > >>data is read-only. I chan

Re: [zfs-discuss] Re: ZFS vs NFS vs array caches, revisited

2007-02-11 Thread Jeff Bonwick
> How the ZFS striped on 7 slices of FC-SATA LUN via NFS worked 146 times > faster than the ZFS on 1 slice of the same LUN via NFS??? Without knowing more I can only guess, but most likely it's a simple matter of working set. Suppose the benchmark in question has a 4G working set,

Re: [zfs-discuss] zfs corruption -- odd inum?

2007-02-11 Thread Jeff Bonwick
The object number is in hex. 21e282 hex is 2220674 decimal -- give that a whirl. This is all better now thanks to some recent work by Eric Kustarz: 6410433 'zpool status -v' would be more useful with filenames This was integrated into Nevada build 57. Jeff On Sat, Feb 10, 2007 at 05:18:05PM -
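On builds older than 57, one hedged way to turn that decimal object number back into a filename -- assuming the object is a plain file in a mounted dataset, under a hypothetical /tank mountpoint -- is:

    # find /tank -xdev -inum 2220674 -print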

Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick
Toby Thain wrote: I'm no guru, but would not ZFS already require strict ordering for its transactions ... which property Peter was exploiting to get "fbarrier()" for free? Exactly. Even if you disable the intent log, the transactional nature of ZFS ensures preservation of event ordering. Not

Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick
Do you agree that there is a major tradeoff of "builds up a wad of transactions in memory"? I don't think so. We trigger a transaction group commit when we have lots of dirty data, or 5 seconds elapse, whichever comes first. In other words, we don't let updates get stale. Jeff

Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick
That is interesting. Could this account for disproportionate kernel CPU usage for applications that perform I/O one byte at a time, as compared to other filesystems? (Nevermind that the application shouldn't do that to begin with.) No, this is entirely a matter of CPU efficiency in the current c

Re: [zfs-discuss] Does running redundancy with ZFS use as much disk space as doubling drives?

2007-02-26 Thread Jeff Bonwick
On Mon, Feb 26, 2007 at 01:53:17AM -0800, Tor wrote: > [...] if using redundancy on ZDF The ZFS Document Format? ;-) > uses less disk space as simply getting extra drives and do identical copies, > with periodic CRC checks of the source material to check the health. If you create a 2-disk mirro

Re: [zfs-discuss] Does running redundancy with ZFS use as much disk space as doubling drives?

2007-02-26 Thread Jeff Bonwick
> My plan was to have 8-10 cheap drives, most of them IDE drives from > 120 gig and up to 320 gig. Does that mean that I can get 7-9 drives > with data plus full redundancy from the last drive? It sounds almost > like magic to me to be able to have the data on maybe 1 TB of drives > and have one dr

Re: [zfs-discuss] FAULTED ZFS volume even though it is mirrored

2007-03-01 Thread Jeff Bonwick
> However, I logged in this morning to discover that the ZFS volume could > not be read. In addition, it appears to have marked all drives, mirrors > & the volume itself as 'corrupted'. One possibility: I've seen this happen when a system doesn't shut down cleanly after the last change to the pool

Re: [zfs-discuss] ZFS stalling problem

2007-03-04 Thread Jeff Bonwick
Jesse, This isn't a stall -- it's just the natural rhythm of pushing out transaction groups. ZFS collects work (transactions) until either the transaction group is full (measured in terms of how much memory the system has), or five seconds elapse -- whichever comes first. Your data would seem t

Re: [zfs-discuss] Multiple filesystem costs? Directory sizes?

2007-05-01 Thread Jeff Bonwick
Mario, For the reasons you mentioned, having a few different filesystems (on the order of 5-10, I'd guess) can be handy. Any time you want different behavior for different types of data, multiple filesystems are the way to go. For maximum directory size, it turns out that the practical limits a

Re: [zfs-discuss] Re: zfs reports small st_size for directories?

2007-06-09 Thread Jeff Bonwick
> What was the reason to make ZFS use directory sizes as the number of > entries rather than the way other Unix filesystems use it? In UFS, the st_size is the size of the directory inode as though it were a file. The only reason it's like that is that UFS is sloppy and lets you cat directories --

Re: [zfs-discuss] ZFS raid is very slow???

2007-07-06 Thread Jeff Bonwick
A couple of questions for you: (1) What OS are you running (Solaris, BSD, MacOS X, etc)? (2) What's your config? In particular, are any of the partitions on the same disk? (3) Are you copying a few big files or lots of small ones? (4) Have you measured UFS-to-UFS and ZFS-to-ZFS performance

Re: [zfs-discuss] Mysterious corruption with raidz2 vdev

2007-07-30 Thread Jeff Bonwick
I suspect this is a bug in raidz error reporting. With a mirror, each copy either checksums correctly or it doesn't, so we know which drives gave us bad data. With RAID-Z, we have to infer which drives have damage. If the number of drives returning bad data is less than or equal to the number of

Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-11 Thread Jeff Bonwick
> As you can see, two independent ZFS blocks share one parity block. > COW won't help you here, you would need to be sure that each ZFS > transaction goes to a different (and free) RAID5 row. > > This is I belive the main reason why poor RAID5 wasn't used in the first > place. Exactly right. RAI

Re: [zfs-discuss] ZFS panic when trying to import pool

2007-09-18 Thread Jeff Bonwick
Basically, it is complaining that there aren't enough disks to read the pool metadata. This would suggest that in your 3-disk RAID-Z config, either two disks are missing, or one disk is missing *and* another disk is damaged -- due to prior failed writes, perhaps. (I know there's at least one disk

Re: [zfs-discuss] Best option for my home file server?

2007-09-26 Thread Jeff Bonwick
I would keep it simple. Let's call your 250GB disks A, B, C, D, and your 500GB disks X and Y. I'd either make them all mirrors: zpool create mypool mirror A B mirror C D mirror X Y or raidz the little ones and mirror the big ones: zpool create mypool raidz A B C D mirror X Y or, as yo

Re: [zfs-discuss] ZFS Roadmap - thoughts on expanding raidz / restriping / defrag

2007-12-17 Thread Jeff Bonwick
In short, yes. The enabling technology for all of this is something we call bp rewrite -- that is, the ability to rewrite an existing block pointer (bp) to a new location. Since ZFS is COW, this would be trivial in the absence of snapshots -- just touch all the data. But because a block may appea

Re: [zfs-discuss] x4500 recommendations for netbackup dsu?

2007-12-20 Thread Jeff Bonwick
Yep, compression is generally a nice win for backups. The amount of compression will depend on the nature of the data. If it's all mpegs, you won't see any advantage because they're already compressed. But for just about everything else, 2-3x is typical. As for hot spares, they are indeed globa

Re: [zfs-discuss] Issue fixing ZFS corruption

2008-01-23 Thread Jeff Bonwick
The Silicon Image 3114 controller is known to corrupt data. Google for "silicon image 3114 corruption" to get a flavor. I'd suggest getting your data onto different h/w, quickly. Jeff On Wed, Jan 23, 2008 at 12:34:56PM -0800, Bertrand Sirodot wrote: > Hi, > > I have been experiencing corruption

Re: [zfs-discuss] Issue fixing ZFS corruption

2008-01-23 Thread Jeff Bonwick
Actually s10_72, but it's not really a fix, it's a workaround for a bug in the hardware. I don't know how effective it is. Jeff On Wed, Jan 23, 2008 at 04:54:54PM -0800, Erast Benson wrote: > I believe issue been fixed in snv_72+, no? > > On Wed, 2008-01-23 at 16:41 -

Re: [zfs-discuss] Lost intermediate snapshot; incremental backup still possible?

2008-02-12 Thread Jeff Bonwick
I think so. On your backup pool, roll back to the last snapshot that was successfully received. Then you should be able to send an incremental between that one and the present. Jeff On Thu, Feb 07, 2008 at 08:38:38AM -0800, Ian wrote: > I keep my system synchronized to a USB disk from time to t
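A hedged sketch of that sequence, with hypothetical dataset and snapshot names:

    # zfs rollback -r backup/home@last_good                             (discard anything received after the last good snapshot)
    # zfs send -i @last_good tank/home@today | zfs recv backup/home     (incremental from that snapshot to the present)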

Re: [zfs-discuss] raidz2 resilience on 3 disks

2008-02-21 Thread Jeff Bonwick
> 1) If i create a raidz2 pool on some disks, start to use it, then the disks' > controllers change. What will happen to my zpool? Will it be lost or is > there some disk tagging which allows zfs to recognise the disks? It'll be fine. ZFS opens by path, but then checks both the devid and the on-d
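A hedged sketch of the cautious route when re-cabling, with a hypothetical pool name -- export first if you can; import re-discovers the disks even when their paths have changed:

    # zpool export tank
      (move the disks to the new controller)
    # zpool import tank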

Re: [zfs-discuss] moving zfs filesystems between disks

2008-02-27 Thread Jeff Bonwick
Yes. Just say this: # zpool replace mypool disk1 disk2 This will do all the intermediate steps you'd expect: attach disk2 as a mirror of disk1, resilver, detach disk1, and grow the pool to reflect the larger size of disk2. Jeff On Wed, Feb 27, 2008 at 04:48:59PM -0800, Bill Shannon wrote: > I'

Re: [zfs-discuss] moving zfs filesystems between disks

2008-02-27 Thread Jeff Bonwick
flect the larger size of newdisk. Jeff On Wed, Feb 27, 2008 at 05:04:02PM -0800, Jeff Bonwick wrote: > Yes. Just say this: > > # zpool replace mypool disk1 disk2 > > This will do all the intermediate steps you'd expect: attach disk2 > as a mirror of disk1, resilver, detach d

Re: [zfs-discuss] Cause for data corruption?

2008-02-29 Thread Jeff Bonwick
> I thought RAIDZ would correct data errors automatically with the parity data. Right. However, if the data is corrupted while in memory (e.g. on a PC with non-parity memory), there's nothing ZFS can do to detect that. I mean, not even theoretically. The best we could do would be to narrow the w

Re: [zfs-discuss] [dtrace-discuss] periodic ZFS disk accesses

2008-03-01 Thread Jeff Bonwick
> I recently converted my home directory to zfs on an external disk drive. > Approximately every three seconds I can hear the disk being accessed, > even if I'm doing nothing. The noise is driving me crazy! > [...] > Anyway, anyone have any ideas of how I can use dtrace or some other tool > to tra

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-02 Thread Jeff Bonwick
Nathan: yes. Flipping each bit and recomputing the checksum is not only possible, we actually did it in early versions of the code. The problem is that it's really expensive. For a 128K block, that's a million bits, so you have to re-run the checksum a million times, on 128K of data. That's 128G

Re: [zfs-discuss] ZFS performance lower than expected

2008-03-26 Thread Jeff Bonwick
> The disks in the SAN servers were indeed striped together with Linux LVM > and exported as a single volume to ZFS. That is really going to hurt. In general, you're much better off giving ZFS access to all the individual LUNs. The intermediate LVM layer kills the concurrency that's native to ZF

Re: [zfs-discuss] Per filesystem scrub

2008-03-31 Thread Jeff Bonwick
Peter, That's a great suggestion. And as fortune would have it, we have the code to do it already. Scrubbing in ZFS is driven from the logical layer, not the physical layer. When you scrub a pool, you're really just scrubbing the pool-wide metadata, then scrubbing each filesystem. At 50,000 fe

Re: [zfs-discuss] Per filesystem scrub

2008-04-04 Thread Jeff Bonwick
> Aye, or better yet -- give the scrub/resilver/snap reset issue fix very > high priority. As it stands snapshots are impossible when you need to > resilver and scrub (even on supposedly sun supported thumper configs). No argument. One of our top engineers is working on this as we speak. I say

Re: [zfs-discuss] Performance of one single 'cp'

2008-04-14 Thread Jeff Bonwick
No, that is definitely not expected. One thing that can hose you is having a single disk that performs really badly. I've seen disks as slow as 5 MB/sec due to vibration, bad sectors, etc. To see if you have such a disk, try my diskqual.sh script (below). On my desktop system, which has 8 drive
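The diskqual.sh script itself is truncated out of this snippet; as a rough stand-in (not the original), a sketch that times a sequential read from each drive so an outlier stands out -- device names and slice suffixes are hypothetical, adjust them for your disk labels:

    #!/bin/ksh
    # crude per-disk check: time a 64 MB sequential read from each drive
    for d in c1t0d0 c1t1d0 c1t2d0 c1t3d0
    do
            echo "=== $d ==="
            ptime dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=64 2>&1 | grep real
    done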

Re: [zfs-discuss] zfs filesystem metadata checksum

2008-04-14 Thread Jeff Bonwick
Not at present, but it's a good RFE. Unfortunately it won't be quite as simple as just adding an ioctl to report the dnode checksum. To see why, consider a file with one level of indirection: that is, it consists of a dnode, a single indirect block, and several data blocks. The indirect block cont

Re: [zfs-discuss] recovering data from a dettach mirrored vdev

2008-04-28 Thread Jeff Bonwick
If your entire pool consisted of a single mirror of two disks, A and B, and you detached B at some point in the past, you *should* be able to recover the pool as it existed when you detached B. However, I just tried that experiment on a test pool and it didn't work. I will investigate further and

Re: [zfs-discuss] recovering data from a dettach mirrored vdev

2008-04-29 Thread Jeff Bonwick
Urgh. This is going to be harder than I thought -- not impossible, just hard. When we detach a disk from a mirror, we write a new label to indicate that the disk is no longer in use. As a side effect, this zeroes out all the old uberblocks. That's the bad news -- you have no uberblocks. The go

Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-04-30 Thread Jeff Bonwick
Indeed, things should be simpler with fewer (generally one) pool. That said, I suspect I know the reason for the particular problem you're seeing: we currently do a bit too much vdev-level caching. Each vdev can have up to 10MB of cache. With 132 pools, even if each pool is just a single iSCSI de

Re: [zfs-discuss] recovering data from a dettach mirrored vdev

2008-05-03 Thread Jeff Bonwick
Oh, you're right! Well, that will simplify things! All we have to do is convince a few bits of code to ignore ub_txg == 0. I'll try a couple of things and get back to you in a few hours... Jeff On Fri, May 02, 2008 at 03:31:52AM -0700, Benjamin Brumaire wrote: > Hi, > > while diving deeply in

Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Jeff Bonwick
It's OK that you're missing labels 2 and 3 -- there are four copies precisely so that you can afford to lose a few. Labels 2 and 3 are at the end of the disk. The fact that only they are missing makes me wonder if someone resized the LUNs. Growing them would be OK, but shrinking them would indee

Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Jeff Bonwick
> Looking at the txg numbers, it's clear that labels on two devices that > are unavailable now may be stale: Actually, they look OK. The txg values in the label indicate the last txg in which the pool configuration changed for devices in that top-level vdev (e.g. mirror or raid-z group), not the l

Re: [zfs-discuss] recovering data from a dettach mirrored vdev

2008-05-04 Thread Jeff Bonwick
s the name of the missing device. Good luck, and please let us know how it goes! Jeff On Sat, May 03, 2008 at 10:48:34PM -0700, Jeff Bonwick wrote: > Oh, you're right! Well, that will simplify things! All we have to do > is convince a few bits of code to ignore ub_txg == 0. I'

Re: [zfs-discuss] recovering data from a dettach mirrored vdev

2008-05-04 Thread Jeff Bonwick
; label_write(fd, offsetof(vdev_label_t, vl_uberblock), 1ULL << UBERBLOCK_SHIFT, ub); label_write(fd, offsetof(vdev_label_t, vl_vdev_phys), VDEV_PHYS_SIZE, &vl.vl_vdev_phys); fsync(fd); return (0); } Jeff On Sun, May 04, 2008 at 01:2

Re: [zfs-discuss] recovering data from a dettach mirrored vdev

2008-05-07 Thread Jeff Bonwick
Yes, I think that would be useful. Something like 'zpool revive' or 'zpool undead'. It would not be completely general-purpose -- in a pool with multiple mirror devices, it could only work if all replicas were detached in the same txg -- but for the simple case of a single top-level mirror vdev,

Re: [zfs-discuss] ZFS with raidz

2008-05-30 Thread Jeff Bonwick
Very cool! Just one comment. You said: > We'll try compression level #9. gzip-9 is *really* CPU-intensive, often for little gain over gzip-1. As in, it can take 100 times longer and yield just a few percent gain. The CPU cost will limit write bandwidth to a few MB/sec per core. I'd suggest tha
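A hedged sketch, with a hypothetical dataset name, of backing off to a cheaper level and checking what it actually buys you:

    # zfs set compression=gzip-1 tank/backup     (or compression=lzjb for the lightest CPU load)
    # zfs get compressratio tank/backup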

Re: [zfs-discuss] [caiman-discuss] disk names?

2008-06-04 Thread Jeff Bonwick
I agree with that. format(1M) and cfgadm(1M) are, ah, not the most user-friendly tools. It would be really nice to have 'zpool disks' go out and taste all the drives to see which ones are available. We already have most of the code to do it. 'zpool import' already contains the taste-all-disks-a

Re: [zfs-discuss] Cannot delete errored file

2008-06-10 Thread Jeff Bonwick
That's odd -- the only way the 'rm' should fail is if it can't read the znode for that file. The znode is metadata, and is therefore stored in two distinct places using ditto blocks. So even if you had one unlucky copy that was damaged on two of your disks, you should still have another copy elsew

Re: [zfs-discuss] zfs mirror broken?

2008-06-20 Thread Jeff Bonwick
If you say 'zpool online <pool> <device>' that should tell ZFS that the disk is healthy again and automatically kick off a resilver. Of course, that should have happened automatically. What version of ZFS / Solaris are you running? Jeff On Fri, Jun 20, 2008 at 06:01:25PM +0200, Justin Vassallo wrote: > Hi, >

Re: [zfs-discuss] [caiman-discuss] swap & dump on ZFS volume

2008-06-30 Thread Jeff Bonwick
> Neither swap or dump are mandatory for running Solaris. Dump is mandatory in the sense that losing crash dumps is criminal. Swap is more complex. It's certainly not mandatory. Not so long ago, swap was typically larger than physical memory. But in recent years, we've essentially moved to a w

Re: [zfs-discuss] zpool with RAID-5 from intelligent storage arrays

2008-06-30 Thread Jeff Bonwick
Using ZFS to mirror two hardware RAID-5 LUNs is actually quite nice. Because the data is mirrored at the ZFS level, you get all the benefits of self-healing. Moreover, you can survive a great variety of hardware failures: three or more disks can die (one in the first array, two or more in the seco

Re: [zfs-discuss] ZFS Deferred Frees

2008-06-30 Thread Jeff Bonwick
When a block is freed as part of transaction group N, it can be reused in transaction group N+1. There's at most a one-txg (few-second) delay. Jeff On Mon, Jun 16, 2008 at 01:02:53PM -0400, Torrey McMahon wrote: > I'm doing some simple testing of ZFS block reuse and was wondering when > deferre

Re: [zfs-discuss] [caiman-discuss] swap & dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
> To be honest, it is not quite clear to me, how we might utilize > dumpadm(1M) to help us to calculate/recommend size of dump device. > Could you please elaborate more on this ? dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus current process, or all memory. If the dum
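For reference, the content selections dumpadm(1M) offers (run with no arguments it just prints the current configuration):

    # dumpadm                (show the current dump device, content, and savecore directory)
    # dumpadm -c kernel      (kernel pages only)
    # dumpadm -c curproc     (kernel pages plus the panicking process)
    # dumpadm -c all         (all of memory -- needs the largest dump device)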

Re: [zfs-discuss] [caiman-discuss] swap & dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
> The problem is that size-capping is the only control we have over > thrashing right now. It's not just thrashing, it's also any application that leaks memory. Without a cap, the broken application would continue plowing through memory until it had consumed every free block in the storage pool.

Re: [zfs-discuss] Changing GUID

2008-07-02 Thread Jeff Bonwick
> How difficult would it be to write some code to change the GUID of a pool? As a recreational hack, not hard at all. But I cannot recommend it in good conscience, because if the pool contains more than one disk, the GUID change cannot possibly be atomic. If you were to crash or lose power in th

Re: [zfs-discuss] bug id 6343667

2008-07-05 Thread Jeff Bonwick
FYI, we are literally just days from having this fixed. Matt: after putback you really should blog about this one -- both to let people know that this long-standing bug has been fixed, and to describe your approach to it. It's a surprisingly tricky and interesting problem. Jeff On Sat, Jul 05,

Re: [zfs-discuss] is it possible to add a mirror device later?

2008-07-06 Thread Jeff Bonwick
I would just swap the physical locations of the drives, so that the second half of the mirror is in the right location to be bootable. ZFS won't mind -- it tracks the disks by content, not by pathname. Note that SATA is not hotplug-happy, so you're probably best off doing this while the box is powe

Re: [zfs-discuss] confusion and frustration with zpool

2008-07-06 Thread Jeff Bonwick
As a first step, 'fmdump -ev' should indicate why it's complaining about the mirror. Jeff On Sun, Jul 06, 2008 at 07:55:22AM -0700, Pete Hartman wrote: > I'm doing another scrub after clearing "insufficient replicas" only to find > that I'm back to the report of insufficient replicas, which basi

Re: [zfs-discuss] scrub failing to initialise

2008-07-11 Thread Jeff Bonwick
If the cabling outage was transient, the disk driver would simply retry until they came back. If it's a hotplug-capable bus and the disks were flagged as missing, ZFS would by default wait until the disks came back (see 'zpool get failmode <pool>'), and complete the I/O then. There would be no missing di
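A hedged sketch of inspecting and changing that behavior on a hypothetical pool:

    # zpool get failmode tank
    # zpool set failmode=wait tank         (default: block I/O until the devices return)
    # zpool set failmode=continue tank     (return EIO for new writes instead of blocking)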

Re: [zfs-discuss] Remove log device?

2008-07-13 Thread Jeff Bonwick
You are correct, and it is indeed annoying. I hope to have this fixed by the end of the month. Jeff On Sun, Jul 13, 2008 at 10:16:55PM -0500, Mike Gerdts wrote: > It seems as though there is no way to remove a log device once it is > added. Is this correct? > > Assuming this is correct, is the

Re: [zfs-discuss] scrub never finishes

2008-07-13 Thread Jeff Bonwick
ZFS co-inventor Matt Ahrens recently fixed this: 6343667 scrub/resilver has to start over when a snapshot is taken Trust me when I tell you that solving this correctly was much harder than you might expect. Thanks again, Matt. Jeff On Sun, Jul 13, 2008 at 07:08:48PM -0700, Anil Jangity wrote:

Re: [zfs-discuss] remote replication with huge data using zfs?

2006-05-11 Thread Jeff Bonwick
> plan A. To mirror on iSCSI devices: > keep one server with a set of zfs file systems > with 2 (sub)mirrors each, one of the mirrors use > devices physically on remote site accessed as > iSCSI LUNs. > > How does ZFS handle remote replication? > If

Re: [zfs-discuss] ZFS and databases

2006-05-11 Thread Jeff Bonwick
> Are you saying that copy-on-write doesn't apply for mmap changes, but > only file re-writes? I don't think that gels with anything else I > know about ZFS. No, you're correct -- everything is copy-on-write. Jeff

Re: [zfs-discuss] Re: where has all my space gone? (with zfs mountroot + b38)

2006-05-21 Thread Jeff Bonwick
> I've had a "zdb -bv root_pool" running for about 30 minutes now.. it > just finished and of course told me that everything adds up: This is definitely the delete queue problem: > Blocks LSIZE PSIZE ASIZE avg comp %Total Type > 4.18M 357G 222G 223G 53.2K 1.61 99.

Re: [zfs-discuss] Re: where has all my space gone? (with zfs mountroot + b38)

2006-05-22 Thread Jeff Bonwick
> > 6420204 root filesystem's delete queue is not running > The workaround for this bug is to issue the following command... > > # zfs set readonly=off / > > This will cause the delete queue to start up and should flush your queue. Tabriz, Thanks for the update. James, please let us know if th
