to reclaim space used by snapshots, they have to be removed.
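For example (pool/dataset names made up; the 'used' column is roughly the
space that would come back):

  # list snapshots and how much space each one holds exclusively
  zfs list -t snapshot -o name,used,referenced
  # destroy one that is no longer needed; its unique blocks are freed
  zfs destroy tank/home@2008-01-01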
--
/ Peter Schuller
this is the ARC in general or something FreeBSD-specific, I don't
know. But at this point it does not have to do with ARC sizing, since
the ARC is reasonably large.
(I realize I should investigate properly and report back, but I'm not
likely to have time to dig into this right now.)
all cache size. Given MRU+MFU and without knowing further details
right now, I accept that the ARC may fundamentally need a bigger cache
size in relation to the working set in order to be effective in the
way I am using it here. I was basing my expectations on LRU-style
behavior.
Thanks!
--
ing reported by FreeBSD, but as
far as I can tell these are different issues. Sure, a bigger ARC might
hide the behavior I happen to see; but I want the cache to behave in a
way where I do not need gigabytes of extra ARC size to "lure" it into
caching the data necessary for 'urxvt' wi
what you're implying in your response.
--
/ Peter Schuller
er, can someone confirm/deny? If the
latter, is there some way to tweak it? I have not found one (other
than changing the code). Is there any particular reason why such knobs
are not exposed? Am I missing something?
--
/ Peter Schuller
s only recently that some write
barrier/write caching issues started being seriously discussed in the
Linux kernel community for example).
--
/ Peter Schuller
tically massively
faster on ZFS... but this won't happen until operating systems start
exposing the necessary interface.
What does one need to do to get something happening here? Other than
whine on mailing lists...
--
/ Peter Schuller
ess that's actually what's going on
> though, just an interesting creepy speculation.
This would be another case where battery-backed (local to the machine)
NVRAM fundamentally helps even in a situation where you are only
concerned with the barrier, since there is no problem having a
batt
barriers and thus corruption-proofness.
Agreed.
Btw, a great example of a "non-enterprisy" case where you do care
about persistence is the pretty common case of simply running a mail
server. Just for anyone reading the above paragraph and concluding it
doesn't matter to mere mortals
en if you do want to have correctness when it
comes to write barriers and/or honoring fsync().
That said, as I stated in another post, I wouldn't be surprised if it
turned out that the USB device was ignoring sync commands. But I have
no idea what the case was for the original poster,
. Though correctness cannot be proven, you can at least
test for common cases of systematically incorrect behavior.
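One rough sanity check, assuming GNU dd (for conv=fsync) and a file on
the suspect disk: time a run of tiny synchronous writes. A 7200 rpm
drive that really flushes its cache cannot complete much more than
100-150 flushes per second (rotational latency), so if the loop below
finishes almost instantly, the flushes are very likely not reaching
stable storage. Whether fsync() is turned into a cache-flush command at
all also depends on the filesystem, of course.

  # 200 one-sector writes, each followed by a flush (GNU dd assumed)
  time sh -c 'i=0; while [ $i -lt 200 ]; do
      dd if=/dev/zero of=./syncprobe bs=512 count=1 conv=fsync 2>/dev/null
      i=$((i+1))
  done'
  rm -f ./syncprobe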
--
/ Peter Schuller
outage, from problems
arising from misbehaving hardware or bugs in software. ZFS cannot
magically overcome such problems, nor can UFS/reiserfs/xfs/whatever
else.
--
/ Peter Schuller
ms a couple of weeks later.
While I don't know what is going on in your case, blaming a problem on
the introduction of a piece of software/hardware/procedure without
identifying a causal relationship is a common mistake to make.
--
/ Peter Schuller
tee that you won't get completely
broken behavior.
I think it boils down to the fact that 99% of customers who aren't
doing integration of the individual components into overall packages
probably don't care/understand/bother with it, so as long as the
benchmarks say it's "fa
of course would affect filesystems other than ZFS as well. What is
worse, I was unable to completely disable write caching either,
because that, too, did not actually propagate to the underlying device
when attempted.
(I could not say for certain whether this was fundamental to the
device or in comb
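For reference, this is roughly how one would inspect and try to toggle
the on-disk write cache on FreeBSD (device names made up; camcontrol
applies to SCSI/da devices, the loader tunable to the legacy ata(4)
driver) - with the caveat, as above, that the device may not actually
honor the change:

  # show the caching mode page; WCE is the write cache enable bit
  camcontrol modepage da0 -m 8
  # edit the mode page (e.g. clear WCE) in $EDITOR
  camcontrol modepage da0 -m 8 -e
  # for ata(4) disks, in /boot/loader.conf:
  hw.ata.wc="0"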
detect corruption, but also
correct it. You can choose your desired level of redundancy expressed
as a percentage of the file size.
[1] http://en.wikipedia.org/wiki/Parchive
[2] http://en.wikipedia.org/wiki/Forward_error_correction
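As a concrete illustration with par2cmdline (file name and redundancy
level arbitrary):

  # create recovery data worth ~10% of the input size
  par2 create -r10 archive.tar.par2 archive.tar
  # later: check integrity, and repair if some blocks went bad
  par2 verify archive.tar.par2
  par2 repair archive.tar.par2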
--
/ Peter Schuller
nt cache flushing/write barriers?)
--
/ Peter Schuller
course
the cost of these bays is added (~ $60-$70 I believe for the 5-bay
Supermicro; the Lian Li stuff is cheaper, but not hot-swap and such).
--
/ Peter Schuller
er some people complaining here and on FreeBSD lists about not seeing
the added space. In my case I always rebooted anyway so I could never tell
the difference.
I stand corrected. Thanks!
--
/ Peter Schuller
l replace.
--
/ Peter Schuller
e RAID, if you intend to
take advantage of the self-healing properties of ZFS with multiple disks, you
must expose the individual disks through the virtualization environment and
use them directly in your mirror/raidz/raidz2 pool.
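That is, something along these lines (device names hypothetical),
rather than handing ZFS one pre-built RAID volume:

  # ZFS sees both disks and can rewrite a bad copy from the good one
  zpool create tank mirror da1 da2
  # versus a single virtualized/hardware-RAID LUN: corruption is
  # detected, but there is nothing to heal it from
  zpool create tank da1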
--
/ Peter Schuller
> I was just wondering if maybe there's a problem with just one
> disk...
No, this is something I have observed on at least four different systems, with
vastly varying hardware. Probably just the effects of the known problem.
Thanks,
--
/ Peter Schuller
k. It wasn't a poke at the
streaming performance. Very interesting to hear there's a bug open for it
though.
> Can you also post iostat -xnz 1 while you're doing dd?
> and zpool status
This was FreeBSD, but I can provide iostat -x if you still want it for some
reason.
ably ZFS waits for a little while before
resuming writes.
Note that this is also being run on plain hardware; it's not even PCI Express.
During throughput peaks, but not constantly, the bottleneck is probably the
PCI bus.
--
/ Peter Schuller
problems.
If this theory is correct, a scrub (zpool scrub fatty) should encounter
checksum errors on da3 and da6.
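That is (pool name as in the thread):

  zpool scrub fatty
  # once the scrub completes, look at the per-device checksum counters
  zpool status -v fatty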
--
/ Peter Schuller
ked at in the next firmware release.
--
/ Peter Schuller
if you also care about disk space, it's a showstopper unless you
can throw money at the problem.
--
/ Peter Schuller
earlier...
[1] Because of course all serious players use proper UPS and a power outage
should never happen unless you suck. (This has actually been advocated to me.
Seriously.)
[2] Because of [1] and because of course you only run stable software that is
well tested and will never be buggy. (This
having two levels of volume
management).
> Please let us know what you find out...
If I get anything confirmed from LSI I'll post an update.
--
/ Peter Schuller
ible with all PCI slots/motherboards), and (2) they are not
PCI Express ;)
The one I am using has been great so far though (on FreeBSD; never got a
chance to try on Solaris).
--
/ Peter Schuller
it for JBOD use with ZFS? Or avenues of investigation? Is there
any chance of a lowly consumer getting any information out of LSI? Is there
some other manufacturer that provides low-budget stuff that you can get some
technical information about? Does anyone have some specific knowledge of a
suitable prod
art and avoiding flushing the cache with
useless data? I am not read up on the details of the ARC. But in this
particular case it was clear that a simple LRU would have been much more
useful - unless there was some other problem related to my setup or the
FreeBSD integration that somehow broke proper caching.
--
/ Peter Schuller
date and don't
remember the source of this information).
AFAIK the ZFS pools themselves are fully portable.
--
/ Peter Schuller
de all drives
in the set, or stripe the pool across multiple sets (so e.g. you could add
7x750 GB and have the pool striped over that and the 3x300 GB set).
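A sketch with made-up device names:

  # existing pool: one raidz set of 3x300 GB
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0
  # later: add a second raidz set of 7x750 GB; the pool then stripes
  # writes across both sets
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0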
--
/ Peter Schuller
der what those 8-ways
cost new...
--
/ Peter Schuller
ased on performance observations
if it actually did flush caches.
The ability to get decent performance *AND* reliability on cheap disks
is one of the major reasons why I love ZFS :)
--
/ Peter Schuller
by adding an additional
raidz/raidz2 set that the pool then stripes across.
I believe zpool should have warned you about trying to add a
non-redundant component alongside the redundant raidz, requiring you to
force (-f) the addition.
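Roughly (device name hypothetical):

  # adding a single bare disk to a raidz pool should be refused with a
  # mismatched-replication warning...
  zpool add tank c3t0d0
  # ...unless forced, which leaves that one disk with no redundancy
  zpool add -f tank c3t0d0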
--
/ Peter Schuller
e was made aware of this and allowed a weaker form of
COMMIT where you drop the persistence requirement, but keep the
consistency requirement.)
--
/ Peter Schuller
is of course a normal problem with shell scripting (unless the zfs
command is documented to guarantee backward-compatible output?), but in
cases like this it really becomes critical.
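For what it is worth, the scripting-oriented output mode helps a lot
here: -H drops the header and uses tab separation, and naming the
columns explicitly protects against reordering (dataset and snapshot
names made up):

  # tab-separated, no header, fixed column order
  zfs list -H -t snapshot -o name,used
  # e.g. select only the automatic daily snapshots by name
  zfs list -H -t snapshot -o name | grep '@daily-'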
--
/ Peter Schuller
the safety of the out-of-the-box tool, regardless of the
local policy for privilege delegation.
--
/ Peter Schuller
s on FreeBSD.
--
/ Peter Schuller
be more specific about your intended
actions in cases where you want to destroy snapshots or clones.
--
/ Peter Schuller
"zfs destroy --nofs" or "zfs destroy --safe".
Another option is to allow something along the lines of:
zfs destroy snapshot:/path/to/filesystem@snapname
where the use of "snapshot:" would guarantee that non-snapshots are not
affected.
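In the meantime a tiny wrapper gives the same guarantee (a sketch; the
script name is made up):

  #!/bin/sh
  # zfs-destroy-snapshot: refuse to destroy anything without an '@'
  target="$1"
  case "$target" in
      *@*) exec zfs destroy "$target" ;;
      *)   echo "refusing: '$target' is not a snapshot name" >&2
           exit 1 ;;
  esac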
--
/ Peter Schuller
ial
casing "rm -rf /" and similar, which is generally not considered a good
idea.
But somehow the snapshot situation feels a lot more risky.
--
/ Peter Schuller
ium sized business type use. Yes
you should avoid it, but shit (always) happens.
--
/ Peter Schuller
1-20 semi-structured
servers with 5-20 or so terabytes each, you probably don't need it
all that much - even if it would be nice (speaking from experience).
--
/ Peter Schuller
ython client) that was
completely CPU bound in kernel space, and tracing showed single-byte
I/O.
Regardless, the above stats are interesting and I suppose consistent
with what one might expect, from previous discussion on this list.
--
/ Peter Schuller
e fbarrier() simply
fsync()).
--
/ Peter Schuller
) exposes an
asynchronous method of ensuring relative order of I/O operations to
userland, which is often useful.
--
/ Peter Schuller
s thus far limited to my home storage server. But I have wished for an
fbarrier() many many times over the past few years...)
--
/ Peter Schuller
correct the error if it manages the
> redundancy.
So now with ZFS, can anyone with a 400 drive array confirm that a
"scrub" has to fix roughly one problem a day? (Or adjust appropriately
for whatever number of drives.)
--
/ Peter Schuller
new drive in B.
--
/ Peter Schuller
> your ideas. Feel free to contact me directly.
Thanks. It's not that I have any particular situation where this becomes more
important than usual. It is just a general observation of a behavior which,
in cases where availability is not important, is sub-optimal from a data
safet
he case of the clone the old data is not
removed.
--
/ Peter Schuller, InfiDyne Technologies HB
all data on n drives (for n levels of
redundancy), or alternatively to the unlikely event of bad blocks coinciding
on multiple drives, wouldn't reliability be significantly increased in cases
where this is an acceptable practice?
Opinions?
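For what it is worth, recent ZFS versions do expose something in this
direction: the per-dataset 'copies' property stores extra copies of
each block, though as far as I know it does not guarantee that the
copies end up on n distinct drives the way real mirroring does. A
sketch (dataset name made up):

  # keep two copies of every block subsequently written to this fs
  zfs set copies=2 tank/important
  zfs get copies tank/important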
--
/ Peter Schuller, InfiDyne Technologies HB
If there is a problem specific to ZFS that is NOT just an obvious
consequence of some general principle, that's very relevant for the ZFS
administration guide IMO (and man pages for that matter).
--
/ Peter Schuller, InfiDyne Technologies HB
hat with more than 9 drives, the statistical
probability of failure is too high for raidz (or raid5). It's a shame the
statement in the guide is not further qualified to actually explain that
there is a concrete issue at play.
(I haven't looked into the archives to find the previously men
de:
http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf
--
/ Peter Schuller, InfiDyne Technologies HB
e I actually want it
disabled (the NFS server).
--
/ Peter Schuller, InfiDyne Technologies HB
that
article in response to my statement above...
--
/ Peter Schuller, InfiDyne Technologies HB
a normalized
performance of n for small reads.
Is there some reason why a small read on a raidz2 is not statistically very
likely to require I/O on only one device? Assuming a non-degraded pool of
course.
--
/ Peter Schuller, InfiDyne Technologies HB
nd I don't know how flushes are handled by zfs.
But it's most definitely faster than the 4 MB/second seen when dd:ing to the
pool.
--
/ Peter Schuller, InfiDyne Technologies HB
; and "reading" means dd:ing (to /dev/null, from /dev/zero)
with bs=$((1024*1024)).
Pools created with "zpool create speedtest c4d0 c5d0 c6d0 c7d0" and
variations of that for the different combinations. The pool with all
four drives is 1.16T in size.
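For completeness, the kind of dd invocations meant above (file path and
count made up; this measures streaming throughput through the
filesystem, not the raw devices):

  # write ~4 GB of zeroes into the pool
  dd if=/dev/zero of=/speedtest/bigfile bs=$((1024*1024)) count=4096
  # read it back (after remounting/rebooting so the cache is cold)
  dd if=/speedtest/bigfile of=/dev/null bs=$((1024*1024))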
--
/ Peter Schuller, InfiDyne Technologies HB
few
people actually seem to care about this. Since I wanted to confirm my
understanding of ZFS semantics w.r.t. write caching anyway, I thought I might
as well also ask about the general tendency among drives since, if anywhere,
people here might know.
--
/ Peter Schuller, InfiDyne Technologies
perform actual powerloss tests, it would be interesting
to hear from anybody whether it is generally expected to be safe.
--
/ Peter Schuller, InfiDyne Technologies HB