ZFS has always done a certain amount of "write throttling". In the past
(or the present, for those of you running S10 or pre-build-87 bits) this
throttling was controlled by a timer and the size of the ARC: we would
"cut" a transaction group every 5 seconds based on our timer, and
we would als
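As a rough way to watch that transaction group cadence, a DTrace one-liner
along these lines (a sketch, assuming DTrace and the fbt provider are
available) prints a line each time a txg sync starts:
  # spa_sync() is entered once per transaction group sync; arg1 is the txg number.
  dtrace -qn 'fbt::spa_sync:entry { printf("%Y  txg %d\n", walltimestamp, arg1); }'
On pre-build-87 bits you should see roughly one line per pool every 5 seconds
under light load.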
ugh, thanks for exploring this and isolating the problem. We will look
into what is going on (wrong) here. I have filed bug:
6545015 RAID-Z resilver broken
to track this problem.
-Mark
Marco van Lienen wrote:
On Sat, Apr 07, 2007 at 05:05:18PM -0500, in a galaxy far far away, Chris Csanady
Joseph Barbey wrote:
Matthew Ahrens wrote:
Joseph Barbey wrote:
Robert Milkowski wrote:
JB> So, normally, when the script runs, all snapshots finish in maybe a minute
JB> total. However, on Sundays, it continues to take longer and longer. On
JB> 2/25 it took 30 minutes, and this last Sund
Anton B. Rang wrote:
This sounds a lot like:
6417779 ZFS: I/O failure (write on ...) -- need to reallocate writes
Which would allow us to retry write failures on alternate vdevs.
Of course, if there's only one vdev, the write should be retried to a different
block on the original vdev ... ri
Atul Vidwansa wrote:
Hi,
I have few questions about the way a transaction group is created.
1. Is it possible to group transactions related to multiple operations
in the same group? For example, an "rmdir foo" followed by "mkdir bar",
can these end up in the same transaction group?
Yes.
2. Is it
Frederic Payet - Availability Services wrote:
Hi gurus,
When creating some small files in a ZFS directory, the number of used
blocks is not what could be expected:
hinano# zfs list
NAME USED AVAIL REFER MOUNTPOINT
pool2 702K 16.5G 26.5K /pool2
pool2/new
This issue has been discussed a number of times in this forum.
To summarize:
ZFS (specifically, the ARC) will try to use *most* of the system's
available memory to cache file system data. The default is to
max out at physmem-1GB (i.e., use all of physical memory except
for 1GB). In the face of m
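For reference, the usual way to cap the ARC below that default is a
zfs_arc_max entry in /etc/system; the value below is only an example
(1GB, in bytes) and takes effect after a reboot:
  set zfs:zfs_arc_max = 0x40000000
The current and maximum ARC sizes can be checked with:
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max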
Robert,
This doesn't look like cache flushing; rather, it looks like we are
trying to finish up some writes... but are having a hard time allocating
space for them. Is this pool almost 100% full? There are lots of
instances of zio_write_allocate_gang_members(), which indicates a very
high degree
Peter Buckingham wrote:
Hi Eric,
eric kustarz wrote:
The first thing I would do is see if any I/O is happening ('zpool
iostat 1'). If there's none, then perhaps the machine is hung (in which
case you would want to grab a couple of '::threadlist -v 10's from mdb
to figure out if there are hung t
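Spelled out, those two checks look roughly like this (mdb -k needs root
privileges):
  # Watch pool I/O once per second; all-zero columns suggest nothing is moving.
  zpool iostat 1
  # Dump kernel thread stacks; run it a couple of times, a few seconds apart,
  # and compare to spot threads that are stuck.
  echo "::threadlist -v 10" | mdb -k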
Jason J. W. Williams wrote:
Hi Mark,
That does help tremendously. How does ZFS decide which zio cache to
use? I apologize if this has already been addressed somewhere.
The ARC caches data blocks in the zio_buf_xxx() cache that matches
the block size. For example, dnode data is stored on disk
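A rough way to see the per-blocksize zio_buf_xxx() caches in question is to
filter ::kmastat output from the live kernel:
  # The buf size column shows the block size each cache serves.
  echo "::kmastat" | mdb -k | grep zio_buf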
Al Hopper wrote:
On Wed, 10 Jan 2007, Mark Maybee wrote:
Jason J. W. Williams wrote:
Hi Robert,
Thank you! Holy mackerel! That's a lot of memory. With that type of a
calculation my 4GB arc_max setting is still in the danger zone on a
Thumper. I wonder if any of the ZFS developers could
Jason J. W. Williams wrote:
Hi Robert,
Thank you! Holy mackerel! That's a lot of memory. With that type of a
calculation my 4GB arc_max setting is still in the danger zone on a
Thumper. I wonder if any of the ZFS developers could shed some light
on the calculation?
In a worst-case scenario, Ro
Tomas Ögren wrote:
On 05 January, 2007 - Mark Maybee sent me these 1,5K bytes:
So it looks like this data does not include ::kmastat info from *after*
you reset arc_reduce_dnlc_percent. Can I get that?
Yeah, attached. (although about 18 hours after the others)
Excellent, this confirms #3. You are tying the ARC's hands
here, so it has no ability to reduce its size.
Number 3 is the most difficult issue. We are looking into that at the
moment as well.
-Mark
Tomas Ögren wrote:
On 05 January, 2007 - Mark Maybee sent me these 0,8K bytes:
Thomas,
This could be fragmentation in
Thomas,
This could be fragmentation in the meta-data caches. Could you
print out the results of ::kmastat?
-Mark
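For reference, ::kmastat can be collected from either the live kernel or a
saved crash dump, e.g.:
  # Live kernel:
  echo "::kmastat" | mdb -k > kmastat.live
  # Crash dump 0 in /var/crash/<hostname>:
  echo "::kmastat" | mdb unix.0 vmcore.0 > kmastat.dump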
Tomas Ögren wrote:
On 05 January, 2007 - Robert Milkowski sent me these 3,8K bytes:
Hello Tomas,
I saw the same behavior here when ncsize was increased from default.
Try with de
Ah yes! Thank you Casper. I knew this looked familiar! :-)
Yes, this is almost certainly what is happening here. The
bug was introduced in build 51 and fixed in build 54.
[EMAIL PROTECTED] wrote:
Hmmm, so there is lots of evictable cache here (mostly in the MFU
part of the cache)... could yo
Hmmm, so there is lots of evictable cache here (mostly in the MFU
part of the cache)... could you make your core file available?
I would like to take a look at it.
-Mark
Tomas Ögren wrote:
On 03 January, 2007 - Mark Maybee sent me these 5,0K bytes:
Tomas,
There are a couple of things going
Tomas,
There are a couple of things going on here:
1. There is a lot of fragmentation in your meta-data caches (znode,
dnode, dbuf, etc). This is burning up about 300MB of space in your
hung kernel. This is a known problem that we are currently working
on.
2. While the ARC has set its desired
[EMAIL PROTECTED] wrote:
Hello Casper,
Tuesday, December 12, 2006, 10:54:27 AM, you wrote:
So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
become corrupt through something that doesn't also make all the data
corrupt or inaccessible.
CDSC> So how does this work for
Andrew Miller wrote:
Quick question about the interaction of ZFS filesystem compression and the filesystem cache. We have an OpenSolaris (actually Nexenta alpha-6) box running RRD collection. These files seem to be quite compressible. A test filesystem containing about 3,000 of these files sho
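The compression behaviour being asked about can be reproduced with the
standard properties (a sketch; the pool/filesystem names below are placeholders):
  # Only blocks written after compression is enabled are compressed.
  zfs set compression=on tank/rrd
  # After copying the RRD files in, check the achieved ratio and space used.
  zfs get compressratio,used,referenced tank/rrd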
Jeremy Teo wrote:
On 12/5/06, Bill Sommerfeld <[EMAIL PROTECTED]> wrote:
On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> mypool2/[EMAIL PROTECTED] 34.4M - 151G -
> mypool2/[EMAIL PROTECTED] 141K - 189G -
> mypool2/d3 492G 254G 11.5G legacy
>
> I am so
Robert Milkowski wrote:
Hello John,
Thursday, November 9, 2006, 12:03:58 PM, you wrote:
JC> Hi all,
JC> When testing our programs, I got a problem. On UFS, we get the number of
JC> free inodes via 'df -e', then do some things based on this value, such as
JC> create an empty file, the value will de
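As an aside, the difference is easy to see on ZFS, where the "free files"
figure reported by df -e is derived from available space rather than from a
fixed inode table (the pool name below is a placeholder):
  # The files-free count shrinks and grows with free space on a ZFS filesystem.
  df -e /tank
  df -h /tank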
Matthew Flanagan wrote:
Matt,
Matthew Flanagan wrote:
mkfile 100m /data
zpool create tank /data
...
rm /data
...
panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure
(write on off 0: zio 60007432bc0 [L0
unallocated] 4000L/400P DVA[0]=<0:b000:400>
DVA[1]=<0:120a000:400> fletcher4 lzjb
This is:
6483887 without direct management, arc ghost lists can run amok
The fix I have in mind is to control the ghost lists as part of
the arc_buf_hdr_t allocations. If you want to test out my fix,
I can send you some diffs...
-Mark
Juergen Keil wrote:
Jürgen Keil writes:
> > ZFS 11.0 on S
Ceri Davies wrote:
On Wed, Oct 11, 2006 at 11:49:48PM -0700, Matthew Ahrens wrote:
James McPherson wrote:
On 10/12/06, Steve Goldberg <[EMAIL PROTECTED]> wrote:
Where is the ZFS configuration (zpools, mountpoints, filesystems,
etc) data stored within Solaris? Is there something akin to vfs
Hey Gary,
Can we get access to your core files?
-Mark
Gary Mitchell wrote:
Hi,
Yes I have a lot of trouble with zfs send .. zfs recv too.
(sol 10 6/06, SUNWzfsu 11.10.0,REV=2006.05.18.02.15). All too often there is
a panic of the host doing zfs recv.
When this happens for a certain snapshot c
Patrick wrote:
Hi,
So recently, I decided to test out some of the ideas I've been toying
with, and decided to create 50,000 and 100,000 filesystems. The test
machine was a nice V20Z with dual 1.8GHz Opterons, 4GB RAM, connected to a
SCSI 3310 RAID array via two SCSI controllers.
Now creating the ma
Yup, it's almost certain that this is the bug you are hitting.
-Mark
Alan Hargreaves wrote:
I know, bad form replying to myself, but I am wondering if it might be
related to
6438702 error handling in zfs_getpage() can trigger "page not locked"
Which is marked "fix in progress" with
Jill Manfield wrote:
My customer is running java on a ZFS file system. His platform is Solaris 10
x86, SF X4200. When he enabled ZFS, his memory of 18 gigs drops to 2 gigs rather
quickly. I had him do a # ps -e -o pid,vsz,comm | sort -n +1 and it came back:
The culprit application you see is
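The "missing" memory in situations like this is usually the ARC rather than
the java process; its current size is visible via kstat:
  # Current ARC size, target size, and ceiling, in bytes.
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max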
Bill Sommerfeld wrote:
One question for Matt: when ditto blocks are used with raidz1, how well
does this handle the case where you encounter one or more single-sector
read errors on other drive(s) while reconstructing a failed drive?
for a concrete example
A0 B0 C0 D0 P0
A1 B1 C
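For context, ditto blocks for user data are controlled by the copies
property, layered on top of whatever redundancy the raidz1 vdev already
provides (the dataset name below is a placeholder):
  # Keep two copies of every data block in addition to the parity protection.
  zfs set copies=2 tank/important
  zfs get copies tank/important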
Robert Milkowski wrote:
Hello Philippe,
It was recommended to lower ncsize and I did (to the default, ~128K).
So far it has worked OK for the last few days, staying at about 1GB of free RAM
(fluctuating between 900MB-1.4GB).
Do you think it's a long-term solution, or with more load and more data
the problem can s
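For reference, the ncsize change discussed here is an /etc/system tunable
(the value shown is roughly the default the poster mentions; a reboot is
required):
  # Size of the directory name lookup cache (DNLC).
  set ncsize = 131072
  # Check the value on the running kernel:
  echo "ncsize/D" | mdb -k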
Robert Milkowski wrote:
Hello Mark,
Monday, September 11, 2006, 4:25:40 PM, you wrote:
MM> Jeremy Teo wrote:
Hello,
how are writes distributed as the free space within a pool reaches a
very small percentage?
I understand that when free space is available, ZFS will batch writes
and then issu
Thomas Burns wrote:
On Sep 12, 2006, at 2:04 PM, Mark Maybee wrote:
Thomas Burns wrote:
Hi,
We have been using zfs for a couple of months now, and, overall, really
like it. However, we have run into a major problem -- zfs's memory
requirements
crowd out our primary applic
Thomas Burns wrote:
Hi,
We have been using zfs for a couple of months now, and, overall, really
like it. However, we have run into a major problem -- zfs's memory
requirements
crowd out our primary application. Ultimately, we have to reboot the
machine
so there is enough free memory to st
Jeremy Teo wrote:
Hello,
how are writes distributed as the free space within a pool reaches a
very small percentage?
I understand that when free space is available, ZFS will batch writes
and then issue them in sequential order, maximising write bandwidth.
When free space reaches a minimum, what
Gavin,
Please file a bug on this.
Thanks,
-Mark
Gavin Maltby wrote:
Hi,
My desktop paniced last night during a zfs receive operation. This
is a dual opteron system running snv_47 and bfu'd to DEBUG project bits
that
are in sync with the onnv gate as of two days ago. The project bits
are f
Ivan Debnár wrote:
Hi, thanks for the response.
As this is a closed-source mail server (CommuniGate Pro), I can't give a 100% answer,
but the writes that I see taking too much time (15-30 secs) are writes from the
temp queue to final storage, and from my understanding they are sync, so the
queue manager c
Ivan,
What mail clients use your mail server? You may be seeing the
effects of:
6440499 zil should avoid txg_wait_synced() and use dmu_sync() to issue
parallel IOs when fsyncing
This bug was fixed in Nevada build 43, and I don't think it made it into
s10 update 2. It will, of course, be in upd
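The fsync stalls this bug causes can be measured directly with DTrace
(a sketch, assuming DTrace is available):
  # Distribution of fsync(2) latency, in nanoseconds, broken down by process name.
  dtrace -n 'syscall::fsync:entry { self->ts = timestamp; }
             syscall::fsync:return /self->ts/ { @[execname] = quantize(timestamp - self->ts); self->ts = 0; }'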
Jürgen Keil wrote:
We are trying to obtain a mutex that is currently held
by another thread trying to get memory.
Hmm, reminds me a bit of the zvol swap hang I got
some time ago:
http://www.opensolaris.org/jive/thread.jspa?threadID=11956&tstart=150
I guess if the other thread is stuck trying
Robert Milkowski wrote:
On Wed, 6 Sep 2006, Mark Maybee wrote:
Robert Milkowski wrote:
::dnlc!wc
1048545 3145811 76522461
Well, that explains half your problem... and maybe all of it:
After I reduced the vdev prefetch from 64K to 8K, the system has been
working properly for the last few hours.
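My assumption (not stated in the thread) is that the knob in question is
zfs_vdev_cache_bshift, which controls the vdev read-ahead granularity as a
power of two; if so, the 64K-to-8K reduction described would look like this
in /etc/system:
  # 2^16 = 64K is the default; 2^13 = 8K. This is a guess at the tunable used.
  set zfs:zfs_vdev_cache_bshift = 13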
Robert Milkowski wrote:
::dnlc!wc
1048545 3145811 76522461
Well, that explains half your problem... and maybe all of it:
We have a thread that *should* be trying to free up these entries
in the DNLC, however it appears to be blocked:
stack pointer for thread 2a10014fcc0: 2a10014edd1
[
Hmmm, interesting data. See comments in-line:
Robert Milkowski wrote:
Yes, server has 8GB of RAM.
Most of the time there's about 1GB of free RAM.
bash-3.00# mdb 0
Loading modules: [ unix krtld genunix dtrace specfs ufs sd md ip sctp usba fcp
fctl qlc ssd lofs zfs random logindmux ptm cpc nfs
Robert,
I would be interested in seeing your crash dump. ZFS will consume much
of your memory *in the absence of memory pressure*, but it should be
responsive to memory pressure, and give up memory when this happens. It
looks like you have 8GB of memory on your system? ZFS should never
consum
Michael Schuster - Sun Microsystems wrote:
Pawel Jakub Dawidek wrote:
On Tue, Aug 22, 2006 at 04:30:44PM +0200, Jeremie Le Hen wrote:
I don't know much about ZFS, but Sun states this is a "128-bit"
filesystem. How will you handle this with regard to the FreeBSD
kernel interface that is alrea
Robert,
Are you sure that nfs-s5-p0/d5110 and nfs-s5-p0/d5111 are mounted
following the import? These messages imply that the d5110 and d5111
directories in the top-level filesystem of pool nfs-s5-p0 are not
empty. Could you verify that 'df /nfs-s5-p0/d5110' displays
nfs-s5-p0/d5110 as the "Fil
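The mount state itself is quick to confirm (paths taken from the message
above):
  # List mounted ZFS filesystems and see what backs the directory in question.
  zfs mount | grep d5110
  df -h /nfs-s5-p0/d5110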
Robert Milkowski wrote:
Hello Mark,
Sunday, August 13, 2006, 8:00:31 PM, you wrote:
MM> Robert Milkowski wrote:
Hello zfs-discuss,
bash-3.00# zpool status nfs-s5-s6
pool: nfs-s5-s6
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made
Robert Milkowski wrote:
Hello zfs-discuss,
bash-3.00# zpool status nfs-s5-s6
pool: nfs-s5-s6
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device need
Eric Lowe wrote:
Eric Schrock wrote:
Well the fact that it's a level 2 indirect block indicates why it can't
simply be removed. We don't know what data it refers to, so we can't
free the associated blocks. The panic on move is quite interesting -
after BFU give it another shot and file a bug
Luke Lonergan wrote:
Robert,
On 8/8/06 9:11 AM, "Robert Milkowski" <[EMAIL PROTECTED]> wrote:
1. UFS, noatime, HW RAID5 6 disks, S10U2: 70MB/s
2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same LUN as in #1): 87MB/s
3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2: 130MB/s
4. ZFS, atime
Jürgen Keil wrote:
I've tried to use "dmake lint" on on-src-20060731, and was running out of swap
on my Tecra S1 laptop, 32-bit x86, 768MB main memory, with a 512MB swap slice.
The "FULL KERNEL: global crosschecks:" lint run consumes lots (~800MB) of space
in /tmp, so the system was running out
Yup, you're probably running up against the limitations of 32-bit kernel
addressability. We are currently very conservative in this environment,
and so tend to end up with a small cache as a result. It may be
possible to tweak things to get larger cache sizes, but you run the risk
of starving out
Nate,
Thanks for investigating this. Sounds like ZFS is either conflicting
with the Linux partition or running off the end of its partition in the
VMware configuration you set up. The result is the CKSUM errors you
are observing. This could well lead to errors when we try to pagefault
in the i
Nathanael,
This looks like a bug. We are trying to clean up after an error in
zfs_getpage() when we trigger this panic. Can you make a core file
available? I'd like to take a closer look.
I've filed a bug to track this:
6438702 error handling in zfs_getpage() can trigger "page not lo
This will be handled as a read-modify-write: we will still create a new
block with the original data from the old block plus the modifications,
and then destroy the old block.
-Mark
Andrzej Butkiewicz wrote:
I have some questions about modifying a filesystem block.
When we want to modify an existing
Sorry guys, I have to take the blame for letting this slip. I have
been working with the VM folks on some comprehensive changes to the
way ZFS works with the VM system (still a ways out I'm afraid), and
let this bug slip into the background.
I'm afraid it's probably too late to get this into the