You can set metaslab_gang_bang to (say) 8k to force lots of gang block
allocations.
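If you want to try it, a rough sketch: set the tunable in /etc/system and reboot,

set zfs:metaslab_gang_bang = 0x2000

or patch the running kernel with mdb (the 8k value here is just an example):

# echo 'metaslab_gang_bang/Z 0x2000' | mdb -kw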
Jeff
On May 25, 2010, at 11:42 PM, Andriy Gapon wrote:
>
> I am working on improving some ZFS-related bits in FreeBSD boot chain.
> At the moment it seems that the things work mostly fine except for a case
> w
> Apple can currently just take the ZFS CDDL code and incorporate it
> (like they did with DTrace), but it may be that they wanted a "private
> license" from Sun (with appropriate technical support and
> indemnification), and the two entities couldn't come to mutually
> agreeable terms.
I
> Terrific! Can't wait to read the man pages / blogs about how to use it...
Just posted one:
http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
Enjoy, and let me know if you have any questions or suggestions for
follow-on posts.
Jeff
And, for the record, this is my fault. There is an aspect of endianness
that I simply hadn't thought of. When I have a little more time I will
blog about the whole thing, because there are many useful lessons here.
Thank you, Matt, for all your help with this. And my apologies to
everyone else
erify) are unaffected.
Jeff
On Mon, Nov 23, 2009 at 09:44:41PM -0800, Jeff Bonwick wrote:
> And, for the record, this is my fault. There is an aspect of endianness
> that I simply hadn't thought of. When I have a little more time I will
> blog about the whole thing, because
> i am no pro in zfs, but to my understanding there is no original.
That is correct. From a semantic perspective, there is no change
in behavior between dedup=off and dedup=on. Even the accounting
remains the same: each reference to a block is charged to the dataset
making the reference. The on
Yes, although it's slightly indirect:
- make a clone of the snapshot you want to roll back to
- promote the clone
See 'zfs promote' for details.
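For example (dataset and snapshot names hypothetical):

# zfs clone tank/data@wanted tank/data_rb
# zfs promote tank/data_rb

After the promote, the clone owns the snapshot history and the original
filesystem can be renamed or destroyed.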
Jeff
On Fri, Dec 11, 2009 at 08:37:04AM +0100, Alexander Skwar wrote:
> Hi.
>
> Is it possible on Solaris 10 5/09, to rollback to a Z
It is by design. The idea is to report the dedup ratio for the data
you've actually attempted to dedup. To get a 'diluted' dedup ratio
of the sort you describe, just compare the space used by all datasets
to the space allocated in the pool. For example, on my desktop,
I have a pool called 'build
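A rough sketch of that comparison (pool name hypothetical):

# zfs list -o name,used tank
# zpool list tank

The ratio of the datasets' used space to the pool's allocated space gives
the 'diluted' figure.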
> People used fastfs for years in specific environments (hopefully
> understanding the risks), and disabling the ZIL is safer than fastfs.
> Seems like it would be a useful ZFS dataset parameter.
We agree. There's an open RFE for this:
6280630 zil synchronicity
No promise on date, but it will
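Until that RFE integrates, the only knob is the system-wide zil_disable
tunable, which affects every pool and dataset on the box -- shown here
purely as a sketch, not a recommendation:

set zfs:zil_disable = 1

in /etc/system, followed by a reboot.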
Correct.
Jeff
On Aug 24, 2010, at 9:45 PM, Peter Taps wrote:
> Folks,
>
> One of the articles on the net says that the following two commands are
> exactly the same:
>
> # zfs set dedup=on tank
> # zfs set dedup=sha256 tank
>
> Essentially, "on" is just a pseudonym for "sha256" and "verify"
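For completeness, the verifying forms are set the same way (pool name
hypothetical):

# zfs set dedup=verify tank
# zfs set dedup=sha256,verify tank

Both checksum with sha256 and then do a byte-for-byte comparison before
sharing a block.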
It's almost certainly the SIL3114 controller.
Google "SIL3114 data corruption" -- it's nasty.
Jeff
On Thu, Sep 25, 2008 at 07:50:01AM +0200, Mikael Karlsson wrote:
> I have a strange problem involving changes in large file on a mirrored
> zpool in
> Open solaris snv96.
> We use it at storage in
> The circumstances where I have lost data have been when ZFS has not
> handled a layer of redundancy. However, I am not terribly optimistic
> of the prospects of ZFS on any device that hasn't committed writes
> that ZFS thinks are committed.
FYI, I'm working on a workaround for broken devices.
> Or is there a way to mitigate a checksum error on non-redundant zpool?
It's just like the difference between non-parity, parity, and ECC memory.
Most filesystems don't have checksums (non-parity), so they don't even
know when they're returning corrupt data. ZFS without any replication
can detec
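One middle ground worth noting, even without a second disk (dataset name
hypothetical):

# zfs set copies=2 tank/important

This stores ditto copies of newly written data, so localized corruption
can be repaired; it does nothing for a whole-disk failure.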
ZFS will allow the replacement. The available size will, however,
be determined by the smallest of the lot. Once you've replaced
*all* 500GB disks with 1TB disks, the available space will double.
One suggestion: replace as many disks as you intend to at the same time,
so that ZFS only has to do on
pulling
an old one -- then Eric is right, and in fact I'd go further: in that
case, replace only one at a time so you maintain the ability to survive
a disk failing while you're doing all this.
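A sketch of the conservative, one-at-a-time path (device names hypothetical):

# zpool replace tank c1t2d0 c2t2d0
# zpool status tank

Wait for the resilver reported by 'zpool status' to finish before
replacing the next disk.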
Jeff
On Sat, Oct 11, 2008 at 06:37:17PM -0700, Erik Trimble wrote:
> Jeff Bonwick wrote:
Are you running this on a live pool? If so, zdb can't get a reliable
block count -- and zdb -L [live pool] emits a warning to that effect.
Jeff
On Thu, Oct 16, 2008 at 03:36:25AM -0700, Ben Rockwood wrote:
> I've been struggling to fully understand why disk space seems to vanish.
> I've dug t
These are the conditions:
(1) The bug is specific to the root pool. Other pools are unaffected.
(2) It is triggered by doing a 'zpool online' while I/O is in flight.
(3) Item (2) can be triggered by syseventd.
(4) The bug is new in build 102. Builds 101 and earlier are fine.
I believe the follo
I think we (the ZFS team) all generally agree with you. The current
nevada code is much better at handling device failures than it was
just a few months ago. And there are additional changes that were
made for the FishWorks (a.k.a. Amber Road, a.k.a. Sun Storage 7000)
product line that will make
> If you have more comments, or especially if you think I reached the wrong
> conclusion, please do post it. I will post my continuing results.
I think your conclusions are correct. The main thing you're seeing is
the combination of gzip-9 being incredibly CPU-intensive with our I/O
pipeline all
> I'm going to pitch in here as devil's advocate and say this is hardly
> revolution. 99% of what zfs is attempting to do is something NetApp and
> WAFL have been doing for 15 years+. Regardless of the merits of their
> patents and prior art, etc., this is not something revolutionarily new. It
>
> Off the top of my head nearly all of them. Some of them have artificial
> limitations because they learned the hard way that if you give customers
> enough rope they'll hang themselves. For instance "unlimited snapshots".
Oh, that's precious! It's not an arbitrary limit, it's a safety feature
On Sat, Dec 13, 2008 at 04:44:10PM -0800, Mark Dornfeld wrote:
> I have installed Solaris 10 on a ZFS filesystem that is not mirrored. Since I
> have an identical disk in the machine, I'd like to add that disk to the
> existing pool as a mirror. Can this be done, and if so, how do I do it?
Yes:
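A typical recipe for a Solaris 10 x86 root pool looks roughly like this
(device names hypothetical):

# zpool attach rpool c0t0d0s0 c0t1d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0

The attach kicks off a resilver; installgrub (installboot on SPARC) makes
the new half of the mirror bootable.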
Each ZFS block pointer contains up to three DVAs (data virtual addresses),
to implement 'ditto blocks' (multiple copies of the data, above and beyond
any replication provided by mirroring or RAID-Z). Semantically, ditto blocks
are a lot like mirrors, so we actually use the mirror code to read them
> I would like to nominate roch.bourbonn...@sun.com for his work on
> improving the performance of ZFS over the last few years.
Absolutely.
Jeff
> The Validated Execution project is investigating how to utilize ZFS
> snapshots as the basis of a "validated" filesystem. Given that the
> blocks of the dataset form a Merkle tree of hashes, it seemed
> straightforward to validate the individual objects in the snapshot and
> then sign the hash o
> There is no substitute for cord-yank tests - many and often. The
> weird part is, the ZFS design team simulated millions of them.
> So the full explanation remains to be uncovered?
We simulated power failure; we did not simulate disks that simply
blow off write ordering. Any disk that you'd e
> well, if you want a write barrier, you can issue a flush-cache and
> wait for a reply before releasing writes behind the barrier. You will
> get what you want by doing this for certain.
Not if the disk drive just *ignores* barrier and flush-cache commands
and returns success. Some consumer d
> I'm rather tired of hearing this mantra.
> [...]
> Every file system needs a repair utility
Hey, wait a minute -- that's a mantra too!
I don't think there's actually any substantive disagreement here -- stating
that one doesn't need a separate program called /usr/sbin/fsck is not the
same as sa
> > This is CR 6667683
> > http://bugs.opensolaris.org/view_bug.do?bug_id=6667683
>
> I think that would solve 99% of ZFS corruption problems!
Based on the reports I've seen to date, I think you're right.
> Is there any EDT for this patch?
Well, because of this thread, this has gone from "on my
>1. Does variable FSB block sizing extend to files larger than record size,
>concerning the last FSB allocated?
>
>In other words, for files larger than 128KB, that utilize more than one
>full recordsize FSB, will the LAST FSB allocated be `right-sized' to fit
>the remaining da
I agree with Chris -- I'd much rather do something like:
zfs clone snap1 clone1 snap2 clone2 snap3 clone3 ...
than introduce a pattern grammar. Supporting multiple snap/clone pairs
on the command line allows you to do just about anything atomically.
Jeff
On Fri, Mar 27, 2009 at 10:46:3
Right.
Another difference to be aware of is that ZFS reports the total
space consumed, including space for metadata -- typically around 1%.
Traditional filesystems like ufs and ext2 preallocate metadata and
don't count it as using space. I don't know how reiserfs does its
bookkeeping, but I would
> > Yes, I made note of that in my OP on this thread. But is it enough to
> > end up with 8gb of non-compressed files measuring 8gb on
> > reiserfs(linux) and the same data showing nearly 9gb when copied to a
> > zfs filesystem with compression on.
>
> whoops.. a hefty exaggeration it only show
> ZFS blocksize is dynamic, power of 2, with a max size == recordsize.
Minor clarification: recordsize is restricted to powers of 2, but
blocksize is not -- it can be any multiple of sector size (512 bytes).
For small files, this matters: a 37k file is stored in a 37k block.
For larger, multi-bloc
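A quick sanity check (paths hypothetical): 37k is 74 512-byte sectors,
and that's roughly what gets charged:

# dd if=/dev/urandom of=/tank/fs/smallfile bs=1k count=37
# du -h /tank/fs/smallfile

du should report on the order of 37K, not 128K.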
> >According to the ZFS documentation, a resilver operation
> >includes what is effectively a dirty region log (DRL) so that if the
> >resilver is interrupted, by a snapshot or reboot, the resilver can
> >continue where it left off.
>
> That is not the case. The dirty region log keeps tra
Yep, you got it.
Jeff
On Fri, Jun 19, 2009 at 04:15:41PM -0700, Simon Breden wrote:
> Hi,
>
> I have a ZFS storage pool consisting of a single RAIDZ2 vdev of 6 drives, and
> I have a question about replacing a failed drive, should it occur in future.
>
> If a drive fails in this double-parity
Yep, right again.
Jeff
On Fri, Jun 19, 2009 at 04:21:42PM -0700, Simon Breden wrote:
> Hi,
>
> I'm using 6 SATA ports from the motherboard but I've now run out of SATA
> ports, and so I'm thinking of adding a Supermicro AOC-SAT2-MV8 8-port SATA
> controller card.
>
> What is the procedure for
On Fri, Jan 26, 2007 at 10:57:19PM -0800, Frank Cusack wrote:
> On January 27, 2007 12:27:17 AM -0200 Toby Thain <[EMAIL PROTECTED]> wrote:
> >On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
> >>3. I created file system with huge amount of data, where most of the
> >>data is read-only. I chan
> How the ZFS striped on 7 slices of FC-SATA LUN via NFS worked 146 times
> faster than the ZFS on 1 slice of the same LUN via NFS???
Without knowing more I can only guess, but most likely it's a simple
matter of working set. Suppose the benchmark in question has a 4G
working set,
The object number is in hex. 21e282 hex is 2220674 decimal --
give that a whirl.
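For plain files the object number doubles as the inode number, so you can
usually map it back to a pathname with something like (mountpoint
hypothetical):

# find /tank/home -inum 2220674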
This is all better now thanks to some recent work by Eric Kustarz:
6410433 'zpool status -v' would be more useful with filenames
This was integrated into Nevada build 57.
Jeff
On Sat, Feb 10, 2007 at 05:18:05PM -
Toby Thain wrote:
> I'm no guru, but would not ZFS already require strict ordering for its
> transactions ... which property Peter was exploiting to get "fbarrier()"
> for free?
Exactly. Even if you disable the intent log, the transactional nature
of ZFS ensures preservation of event ordering. Not
> Do you agree that there is a major tradeoff of
> "builds up a wad of transactions in memory"?
I don't think so. We trigger a transaction group commit when we
have lots of dirty data, or 5 seconds elapse, whichever comes first.
In other words, we don't let updates get stale.
Jeff
> That is interesting. Could this account for disproportionate kernel
> CPU usage for applications that perform I/O one byte at a time, as
> compared to other filesystems? (Nevermind that the application
> shouldn't do that to begin with.)
No, this is entirely a matter of CPU efficiency in the current c
On Mon, Feb 26, 2007 at 01:53:17AM -0800, Tor wrote:
> [...] if using redundancy on ZDF
The ZFS Document Format? ;-)
> uses less disk space as simply getting extra drives and do identical copies,
> with periodic CRC checks of the source material to check the health.
If you create a 2-disk mirro
> My plan was to have 8-10 cheap drives, most of them IDE drives from
> 120 gig and up to 320 gig. Does that mean that I can get 7-9 drives
> with data plus full redundancy from the last drive? It sounds almost
> like magic to me to be able to have the data on maybe 1 TB of drives
> and have one dr
> However, I logged in this morning to discover that the ZFS volume could
> not be read. In addition, it appears to have marked all drives, mirrors
> & the volume itself as 'corrupted'.
One possibility: I've seen this happen when a system doesn't shut down
cleanly after the last change to the pool
Jesse,
This isn't a stall -- it's just the natural rhythm of pushing out
transaction groups. ZFS collects work (transactions) until either
the transaction group is full (measured in terms of how much memory
the system has), or five seconds elapse -- whichever comes first.
Your data would seem t
Mario,
For the reasons you mentioned, having a few different filesystems
(on the order of 5-10, I'd guess) can be handy. Any time you want
different behavior for different types of data, multiple filesystems
are the way to go.
For maximum directory size, it turns out that the practical limits
a
> What was the reason to make ZFS use directory sizes as the number of
> entries rather than the way other Unix filesystems use it?
In UFS, the st_size is the size of the directory inode as though it
were a file. The only reason it's like that is that UFS is sloppy
and lets you cat directories --
A couple of questions for you:
(1) What OS are you running (Solaris, BSD, MacOS X, etc)?
(2) What's your config? In particular, are any of the partitions
on the same disk?
(3) Are you copying a few big files or lots of small ones?
(4) Have you measured UFS-to-UFS and ZFS-to-ZFS performance
I suspect this is a bug in raidz error reporting. With a mirror,
each copy either checksums correctly or it doesn't, so we know
which drives gave us bad data. With RAID-Z, we have to infer
which drives have damage. If the number of drives returning bad
data is less than or equal to the number of
> As you can see, two independent ZFS blocks share one parity block.
> COW won't help you here, you would need to be sure that each ZFS
> transaction goes to a different (and free) RAID5 row.
>
> This is I belive the main reason why poor RAID5 wasn't used in the first
> place.
Exactly right. RAI
Basically, it is complaining that there aren't enough disks to read
the pool metadata. This would suggest that in your 3-disk RAID-Z
config, either two disks are missing, or one disk is missing *and*
another disk is damaged -- due to prior failed writes, perhaps.
(I know there's at least one disk
I would keep it simple. Let's call your 250GB disks A, B, C, D,
and your 500GB disks X and Y. I'd either make them all mirrors:
zpool create mypool mirror A B mirror C D mirror X Y
or raidz the little ones and mirror the big ones:
zpool create mypool raidz A B C D mirror X Y
or, as yo
In short, yes. The enabling technology for all of this is something
we call bp rewrite -- that is, the ability to rewrite an existing
block pointer (bp) to a new location. Since ZFS is COW, this would
be trivial in the absence of snapshots -- just touch all the data.
But because a block may appea
Yep, compression is generally a nice win for backups. The amount of
compression will depend on the nature of the data. If it's all mpegs,
you won't see any advantage because they're already compressed. But
for just about everything else, 2-3x is typical.
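To see what a given backup dataset is actually achieving (name hypothetical):

# zfs get compressratio tank/backup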
As for hot spares, they are indeed globa
The Silicon Image 3114 controller is known to corrupt data.
Google for "silicon image 3114 corruption" to get a flavor.
I'd suggest getting your data onto different h/w, quickly.
Jeff
On Wed, Jan 23, 2008 at 12:34:56PM -0800, Bertrand Sirodot wrote:
> Hi,
>
> I have been experiencing corruption
Actually s10_72, but it's not really a fix, it's a workaround
for a bug in the hardware. I don't know how effective it is.
Jeff
On Wed, Jan 23, 2008 at 04:54:54PM -0800, Erast Benson wrote:
> I believe the issue has been fixed in snv_72+, no?
>
> On Wed, 2008-01-23 at 16:41 -
I think so. On your backup pool, roll back to the last snapshot that
was successfully received. Then you should be able to send an incremental
between that one and the present.
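Concretely, something along these lines (dataset and snapshot names
hypothetical):

# zfs rollback -r backup/data@lastgood
# zfs send -i @lastgood tank/data@today | zfs recv backup/data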
Jeff
On Thu, Feb 07, 2008 at 08:38:38AM -0800, Ian wrote:
> I keep my system synchronized to a USB disk from time to t
> 1) If i create a raidz2 pool on some disks, start to use it, then the disks'
> controllers change. What will happen to my zpool? Will it be lost or is
> there some disk tagging which allows zfs to recognise the disks?
It'll be fine. ZFS opens by path, but then checks both the devid and
the on-d
Yes. Just say this:
# zpool replace mypool disk1 disk2
This will do all the intermediate steps you'd expect: attach disk2
as a mirror of disk1, resilver, detach disk2, and grow the pool
to reflect the larger size of disk1.
Jeff
On Wed, Feb 27, 2008 at 04:48:59PM -0800, Bill Shannon wrote:
> I'
flect the larger size of newdisk.
Jeff
On Wed, Feb 27, 2008 at 05:04:02PM -0800, Jeff Bonwick wrote:
> Yes. Just say this:
>
> # zpool replace mypool disk1 disk2
>
> This will do all the intermediate steps you'd expect: attach disk2
> as a mirror of disk1, resilver, detach d
> I thought RAIDZ would correct data errors automatically with the parity data.
Right. However, if the data is corrupted while in memory (e.g. on a PC
with non-parity memory), there's nothing ZFS can do to detect that.
I mean, not even theoretically. The best we could do would be to
narrow the w
> I recently converted my home directory to zfs on an external disk drive.
> Approximately every three seconds I can hear the disk being accessed,
> even if I'm doing nothing. The noise is driving me crazy!
> [...]
> Anyway, anyone have any ideas of how I can use dtrace or some other tool
> to tra
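For what it's worth, a DTrace one-liner of the sort being asked about --
it tallies I/O by process and file, so the periodic culprit stands out:

# dtrace -n 'io:::start { @[execname, args[2]->fi_pathname] = count(); }'

Let it run for a minute or so, then interrupt it and read the aggregation.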
Nathan: yes. Flipping each bit and recomputing the checksum is not only
possible, we actually did it in early versions of the code. The problem
is that it's really expensive. For a 128K block, that's a million bits,
so you have to re-run the checksum a million times, on 128K of data.
That's 128G
> The disks in the SAN servers were indeed striped together with Linux LVM
> and exported as a single volume to ZFS.
That is really going to hurt. In general, you're much better off
giving ZFS access to all the individual LUNs. The intermediate
LVM layer kills the concurrency that's native to ZF
Peter,
That's a great suggestion. And as fortune would have it, we have the
code to do it already. Scrubbing in ZFS is driven from the logical
layer, not the physical layer. When you scrub a pool, you're really
just scrubbing the pool-wide metadata, then scrubbing each filesystem.
At 50,000 fe
> Aye, or better yet -- give the scrub/resilver/snap reset issue fix very
> high priority. As it stands snapshots are impossible when you need to
> resilver and scrub (even on supposedly sun supported thumper configs).
No argument. One of our top engineers is working on this as we speak.
I say
No, that is definitely not expected.
One thing that can hose you is having a single disk that performs
really badly. I've seen disks as slow as 5 MB/sec due to vibration,
bad sectors, etc. To see if you have such a disk, try my diskqual.sh
script (below). On my desktop system, which has 8 drive
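If you don't have the script handy, plain iostat gives a similar read
while the pool is busy:

# iostat -xn 5

Look for one device whose service time (asvc_t) or %b is far worse than
its peers.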
Not at present, but it's a good RFE. Unfortunately it won't be
quite as simple as just adding an ioctl to report the dnode checksum.
To see why, consider a file with one level of indirection: that is,
it consists of a dnode, a single indirect block, and several data blocks.
The indirect block cont
If your entire pool consisted of a single mirror of two disks, A and B,
and you detached B at some point in the past, you *should* be able to
recover the pool as it existed when you detached B. However, I just
tried that experiment on a test pool and it didn't work. I will
investigate further and
Urgh. This is going to be harder than I thought -- not impossible,
just hard.
When we detach a disk from a mirror, we write a new label to indicate
that the disk is no longer in use. As a side effect, this zeroes out
all the old uberblocks. That's the bad news -- you have no uberblocks.
The go
Indeed, things should be simpler with fewer (generally one) pool.
That said, I suspect I know the reason for the particular problem
you're seeing: we currently do a bit too much vdev-level caching.
Each vdev can have up to 10MB of cache. With 132 pools, even if
each pool is just a single iSCSI de
Oh, you're right! Well, that will simplify things! All we have to do
is convince a few bits of code to ignore ub_txg == 0. I'll try a
couple of things and get back to you in a few hours...
Jeff
On Fri, May 02, 2008 at 03:31:52AM -0700, Benjamin Brumaire wrote:
> Hi,
>
> while diving deeply in
It's OK that you're missing labels 2 and 3 -- there are four copies
precisely so that you can afford to lose a few. Labels 2 and 3
are at the end of the disk. The fact that only they are missing
makes me wonder if someone resized the LUNs. Growing them would
be OK, but shrinking them would indee
> Looking at the txg numbers, it's clear that labels on to devices that
> are unavailable now may be stale:
Actually, they look OK. The txg values in the label indicate the
last txg in which the pool configuration changed for devices in that
top-level vdev (e.g. mirror or raid-z group), not the l
s the name of the missing device.
Good luck, and please let us know how it goes!
Jeff
On Sat, May 03, 2008 at 10:48:34PM -0700, Jeff Bonwick wrote:
> Oh, you're right! Well, that will simplify things! All we have to do
> is convince a few bits of code to ignore ub_txg == 0. I'
	;
	/* Write the uberblock and vdev_phys regions of the label, then flush. */
	label_write(fd, offsetof(vdev_label_t, vl_uberblock),
	    1ULL << UBERBLOCK_SHIFT, ub);
	label_write(fd, offsetof(vdev_label_t, vl_vdev_phys),
	    VDEV_PHYS_SIZE, &vl.vl_vdev_phys);
	fsync(fd);
	return (0);
}
Jeff
On Sun, May 04, 2008 at 01:2
Yes, I think that would be useful. Something like 'zpool revive'
or 'zpool undead'. It would not be completely general-purpose --
in a pool with multiple mirror devices, it could only work if
all replicas were detached in the same txg -- but for the simple
case of a single top-level mirror vdev,
Very cool! Just one comment. You said:
> We'll try compression level #9.
gzip-9 is *really* CPU-intensive, often for little gain over gzip-1.
As in, it can take 100 times longer and yield just a few percent gain.
The CPU cost will limit write bandwidth to a few MB/sec per core.
I'd suggest tha
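For example (dataset name hypothetical):

# zfs set compression=gzip-1 tank/archive

In practice gzip-1 gets most of gzip-9's ratio at a small fraction of the
CPU cost.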
I agree with that. format(1M) and cfgadm(1M) are, ah, not the most
user-friendly tools. It would be really nice to have 'zpool disks'
go out and taste all the drives to see which ones are available.
We already have most of the code to do it. 'zpool import' already
contains the taste-all-disks-a
That's odd -- the only way the 'rm' should fail is if it can't
read the znode for that file. The znode is metadata, and is
therefore stored in two distinct places using ditto blocks.
So even if you had one unlucky copy that was damaged on two
of your disks, you should still have another copy elsew
If you say 'zpool online <pool> <device>', that should tell ZFS that
the disk is healthy again and automatically kick off a resilver.
Of course, that should have happened automatically. What version
of ZFS / Solaris are you running?
Jeff
On Fri, Jun 20, 2008 at 06:01:25PM +0200, Justin Vassallo wrote:
> Hi,
>
> Neither swap or dump are mandatory for running Solaris.
Dump is mandatory in the sense that losing crash dumps is criminal.
Swap is more complex. It's certainly not mandatory. Not so long ago,
swap was typically larger than physical memory. But in recent years,
we've essentially moved to a w
Using ZFS to mirror two hardware RAID-5 LUNs is actually quite nice.
Because the data is mirrored at the ZFS level, you get all the benefits
of self-healing. Moreover, you can survive a great variety of hardware
failures: three or more disks can die (one in the first array, two or
more in the seco
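A sketch of that layout (device names hypothetical, each one being a
hardware RAID-5 LUN):

# zpool create tank mirror c2t0d0 c3t0d0

ZFS then checksums every block and can repair from the other array when
one LUN returns bad data.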
When a block is freed as part of transaction group N, it can be reused
in transaction group N+1. There's at most a one-txg (few-second) delay.
Jeff
On Mon, Jun 16, 2008 at 01:02:53PM -0400, Torrey McMahon wrote:
> I'm doing some simple testing of ZFS block reuse and was wondering when
> deferre
> To be honest, it is not quite clear to me, how we might utilize
> dumpadm(1M) to help us to calculate/recommend size of dump device.
> Could you please elaborate more on this ?
dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus
current process, or all memory. If the dum
> The problem is that size-capping is the only control we have over
> thrashing right now.
It's not just thrashing, it's also any application that leaks memory.
Without a cap, the broken application would continue plowing through
memory until it had consumed every free block in the storage pool.
> How difficult would it be to write some code to change the GUID of a pool?
As a recreational hack, not hard at all. But I cannot recommend it
in good conscience, because if the pool contains more than one disk,
the GUID change cannot possibly be atomic. If you were to crash or
lose power in th
FYI, we are literally just days from having this fixed.
Matt: after putback you really should blog about this one --
both to let people know that this long-standing bug has been
fixed, and to describe your approach to it.
It's a surprisingly tricky and interesting problem.
Jeff
On Sat, Jul 05,
I would just swap the physical locations of the drives, so that the
second half of the mirror is in the right location to be bootable.
ZFS won't mind -- it tracks the disks by content, not by pathname.
Note that SATA is not hotplug-happy, so you're probably best off
doing this while the box is powe
As a first step, 'fmdump -ev' should indicate why it's complaining
about the mirror.
Jeff
On Sun, Jul 06, 2008 at 07:55:22AM -0700, Pete Hartman wrote:
> I'm doing another scrub after clearing "insufficient replicas" only to find
> that I'm back to the report of insufficient replicas, which basi
If the cabling outage was transient, the disk driver would simply retry
until they came back. If it's a hotplug-capable bus and the disks were
flagged as missing, ZFS would by default wait until the disks came back
(see 'zpool get failmode <pool>'), and complete the I/O then. There would
be no missing di
You are correct, and it is indeed annoying. I hope to have this
fixed by the end of the month.
Jeff
On Sun, Jul 13, 2008 at 10:16:55PM -0500, Mike Gerdts wrote:
> It seems as though there is no way to remove a log device once it is
> added. Is this correct?
>
> Assuming this is correct, is the
ZFS co-inventor Matt Ahrens recently fixed this:
6343667 scrub/resilver has to start over when a snapshot is taken
Trust me when I tell you that solving this correctly was much harder
than you might expect. Thanks again, Matt.
Jeff
On Sun, Jul 13, 2008 at 07:08:48PM -0700, Anil Jangity wrote:
> plan A. To mirror on iSCSI devices:
> keep one server with a set of zfs file systems
> with 2 (sub)mirrors each, one of the mirrors use
> devices physically on remote site accessed as
> iSCSI LUNs.
>
> How does ZFS handle remote replication?
> If
> Are you saying that copy-on-write doesn't apply for mmap changes, but
> only file re-writes? I don't think that gels with anything else I
> know about ZFS.
No, you're correct -- everything is copy-on-write.
Jeff
> I've had a "zdb -bv root_pool" running for about 30 minutes now.. it
> just finished and of course told me that everything adds up:
This is definitely the delete queue problem:
> Blocks  LSIZE  PSIZE  ASIZE    avg   comp  %Total  Type
>  4.18M   357G   222G   223G  53.2K   1.61    99.
> > 6420204 root filesystem's delete queue is not running
> The workaround for this bug is to issue to following command...
>
> # zfs set readonly=off /
>
> This will cause the delete queue to start up and should flush your queue.
Tabriz,
Thanks for the update. James, please let us know if th