Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Edward Ned Harvey
> From: Richard Elling [mailto:richard.ell...@gmail.com]
> 
> > Worse yet, your arc consumption could be so large, that
> > PROCESSES don't fit in ram anymore.  In this case, your processes get
> > pushed out to swap space, which is really bad.
> 
> This will not happen. The ARC will be asked to shrink when other memory
> consumers demand memory. The lower bound of ARC size is c_min

Makes sense.  Is c_min a constant?  Suppose processes are consuming a lot of
memory.  Will c_min protect L2ARC entries in the ARC?  At least on my
systems, it seems that c_min is fixed at 10% of the total system memory.
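(For anyone who wants to check on their own box, the ARC floor, target, and
current size are all exposed through the arcstats kstats on Solaris-derived
builds; exact stat names can vary by release, so treat this as a sketch:)

    kstat -p zfs:0:arcstats:c_min    # lower bound the ARC can be shrunk to
    kstat -p zfs:0:arcstats:c        # current ARC target size
    kstat -p zfs:0:arcstats:size     # actual ARC size right now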

If c_min is sufficiently small relative to the amount of ARC that would be
necessary to index the L2ARC...  Since every entry in the L2ARC requires an
entry in the ARC, this seems to imply that if process memory consumption is
high, then both the ARC and the L2ARC become effectively useless.
 
Things sometimes get evicted from the ARC completely, and sometimes they get
evicted into the L2ARC with only a reference still remaining in the ARC.  But
if processes consume enough memory to shrink the ARC until it is effectively
nonexistent, then the L2ARC must also become effectively nonexistent.
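(As a rough sanity check, the amount of ARC currently devoted to L2ARC
headers is reported by arcstats on builds that carry the counter; the stat
name below is from recent OpenSolaris-era kernels and may differ elsewhere:)

    kstat -p zfs:0:arcstats:l2_hdr_size   # bytes of ARC spent indexing the L2ARC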


> L2ARC is populated by a thread that watches the soon-to-be-evicted list.

This seems to imply that if processes start consuming a lot of memory, the
first thing to disappear is the ARC, and the second thing to disappear is the
L2ARC (because the L2ARC references stored in the ARC get evicted from the
ARC after the other things in the ARC).


> AVL trees

Good to know.  Thanks.


> > So the point is - Whenever you do a write, and the calculated DDT is not
> > already in ARC/L2ARC, the system will actually perform several small reads
> > looking for the DDT entry before it finally knows that the DDT entry
> > actually exists.  So the penalty of performing a write, with dedup enabled,
> > and the relevant DDT entry not already in ARC/L2ARC is a very large
> > penalty.
> 
> "very" is a relative term, 

Agreed.  Here is what I was implying:
Suppose you don't have enough RAM to hold the complete DDT, and you perform
a bunch of random writes (whether sync or async).  Then you will suffer a
lot of cache misses while searching for DDT entries, and the consequence is
that every little write, which could have incurred only the disk penalty of
one little write, instead incurs the disk penalty of several reads plus a
write.  So your random write performance is effectively several times slower
than it could have been, if only you had more RAM.
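(To put rough numbers on that, a purely illustrative sketch; the IOPS figure
and the number of extra reads per DDT miss are assumptions, not measurements:)

    # assume a disk good for ~100 random IOPS, and 2 extra random reads
    # for every deduped write whose DDT entry misses ARC/L2ARC
    echo 'scale=1; 100 / (1 + 2)' | bc    # ~33.3 effective write IOPS vs. 100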

Reads are unaffected, except if there's random-write congestion hogging disk
time.



Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Roy Sigurd Karlsbakk
> Controls whether deduplication is in effect for a
> dataset. The default value is off. The default checksum
> used for deduplication is sha256 (subject to change).

> 
> This is from b159.

This was fletcher4 earlier, and still is in opensolaris/openindiana. Given a 
combination with verify (which I would use anyway, since there are always tiny 
chances of collisions), why would sha256 be a better choice?
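(For reference, both variants are expressed through the dedup property; the
dataset name here is hypothetical:)

    zfs set dedup=sha256,verify tank/data   # sha256 plus byte-for-byte verify
    zfs set dedup=on tank/data              # hash-only, using the default dedup checksum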

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.


Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Brandon High
On Fri, Apr 29, 2011 at 7:10 AM, Roy Sigurd Karlsbakk  
wrote:
> This was fletcher4 earlier, and still is in opensolaris/openindiana. Given a 
> combination with verify (which I would use anyway, since there are always 
> tiny chances of collisions), why would sha256 be a better choice?

fletcher4 was only an option for snv_128, which was quickly pulled and
replaced with snv_128b which removed fletcher4 as an option.

The official post is here:
http://www.opensolaris.org/jive/thread.jspa?threadID=118519&tstart=0#437431

It looks like fletcher4 is still an option in snv_151a for non-dedup
datasets, and is in fact the default.
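(If you want to confirm on a given dataset — names here are hypothetical —
note that the default displays as "on", which currently maps to fletcher4:)

    zfs get checksum tank/data
    zfs set checksum=fletcher4 tank/data    # pin it explicitly, if desired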

As an aside: Erik, any idea when the 159 bits will make it to the public?

-B

-- 
Brandon High : bh...@freaks.com


[zfs-discuss] Faster copy from UFS to ZFS

2011-04-29 Thread Dan Shelton
Is anyone aware of any freeware program that can speed up copying tons 
of data (2 TB) from UFS to ZFS on same server?


Thanks.



Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-04-29 Thread Freddie Cash
On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton  wrote:
> Is anyone aware of any freeware program that can speed up copying tons of
> data (2 TB) from UFS to ZFS on same server?

rsync, with --whole-file --inplace (and other options), works well for
the initial copy.

rsync, with --no-whole-file --inplace (and other options), works
extremely fast for updates.
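(A sketch of the sort of invocations meant — the source and destination paths
are hypothetical, and ACL/xattr preservation depends on how your rsync was
built:)

    # initial copy: transfer whole files and write them in place
    rsync -aH --whole-file --inplace /ufs/data/ /tank/data/
    # later passes: delta-transfer changed files, updating in place
    rsync -aH --no-whole-file --inplace /ufs/data/ /tank/data/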

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-04-29 Thread Joerg Schilling
Dan Shelton  wrote:

> Is anyone aware of any freeware program that can speed up copying tons 
> of data (2 TB) from UFS to ZFS on same server?

Try star -copy 

Note that because of the problems ZFS has dealing with stable states, I
recommend using -no-fsync, and it may of course help to specify a larger FIFO
than the default if you have plenty of RAM.
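(Something along these lines, with hypothetical directories; check star(1)
for the exact option set on your build:)

    star -copy -p -no-fsync fs=256m -C /ufs/data . /tank/data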

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


[zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Freddie Cash
Is there any way, yet, to import a pool with corrupted space_map
errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?

I have a pool comprised of 4 raidz2 vdevs of 6 drives each.  I have
almost 10 TB of data in the pool (3 TB actual disk space used due to
dedup and compression).  While testing various failure modes, I have
managed to corrupt the pool to the point where it won't import.  So
much for being bulletproof.  :(

If I try to import the pool normally, it gives corrupted space_map errors.

If I try to "import -F" the pool, it complains that "zio-io_type !=
ZIO_TYPE_WRITE".

I've also tried the above with "-o readonly=on" and "-R
some/other/root" variations.

There's also no zfs.cache file anywhere to be found, and creating a
blank file doesn't help.
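(For the record, the attempts above correspond roughly to the following, with
a hypothetical pool name and altroot:)

    zpool import tank
    zpool import -F tank
    zpool import -F -o readonly=on -R /mnt/recovery tank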

Does this mean that a 10 TB pool can be lost due to a single file
being corrupted, or a single piece of pool metadata being corrupted?
And that there's *still* no recovery tools for situations like this?

Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

For the curious, the failure mode that causes this?  Rebooting while 8
simultaneous rsyncs were running, which were not killed by the
shutdown process for some reason, which prevented 8 ZFS filesystems
from being unmounted, which prevented the pool from being exported
(even though I have a "zfs unmount -f" and "zpool export -f"
fail-safe), which locked up the shutdown process requiring a power
reset.

:(

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-04-29 Thread Ian Collins

 On 04/30/11 06:00 AM, Freddie Cash wrote:

> On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton  wrote:
>> Is anyone aware of any freeware program that can speed up copying tons of
>> data (2 TB) from UFS to ZFS on same server?
>
> rsync, with --whole-file --inplace (and other options), works well for
> the initial copy.


Is rsync ACL aware yet?

I always use find piped to cpio.

--
Ian.



Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-04-29 Thread Brandon High
On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton  wrote:
> Is anyone aware of any freeware program that can speed up copying tons of
> data (2 TB) from UFS to ZFS on same server?

Setting 'sync=disabled' for the initial copy will help, since it will
make all writes asynchronous.

You will probably want to set it back to default after you're done.
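(For example, with a hypothetical dataset name; 'zfs inherit' puts the
property back to its default afterwards:)

    zfs set sync=disabled tank/data
    # ... run the copy ...
    zfs inherit sync tank/data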

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Erik Trimble

On 4/29/2011 9:44 AM, Brandon High wrote:

> On Fri, Apr 29, 2011 at 7:10 AM, Roy Sigurd Karlsbakk  wrote:
>> This was fletcher4 earlier, and still is in opensolaris/openindiana. Given a
>> combination with verify (which I would use anyway, since there are always tiny
>> chances of collisions), why would sha256 be a better choice?
>
> fletcher4 was only an option for snv_128, which was quickly pulled and
> replaced with snv_128b which removed fletcher4 as an option.
>
> The official post is here:
> http://www.opensolaris.org/jive/thread.jspa?threadID=118519&tstart=0#437431
>
> It looks like fletcher4 is still an option in snv_151a for non-dedup
> datasets, and is in fact the default.
>
> As an aside: Erik, any idea when the 159 bits will make it to the public?
>
> -B


Yup, fletcher4 is still the default for any fileset not using dedup.
It's "good enough", and I can't see any reason to change it for those
purposes (since its collision problems aren't much of an issue when
just doing data integrity checks).


Sorry, no idea on release date stuff. I'm completely out of the loop on 
release info.  I'm lucky if I can get a heads up before it actually gets 
published internally.


:-(


I'm just a lowly Java Platform Group dude.   Solaris ain't my silo.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-04-29 Thread Richard Elling
On Apr 29, 2011, at 1:37 PM, Brandon High wrote:

> On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton  wrote:
>> Is anyone aware of any freeware program that can speed up copying tons of
>> data (2 TB) from UFS to ZFS on same server?
> 
> Setting 'sync=disabled' for the initial copy will help, since it will
> make all writes asynchronous.

Few local commands are sync. cp, cpio, tar, rsync, etc are all effectively
async for local copies. Disabling the ZIL will do nothing to speed these.
-- richard



Re: [zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Freddie Cash
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash  wrote:
> Is there any way, yet, to import a pool with corrupted space_map
> errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>
> I have a pool comprised of 4 raidz2 vdevs of 6 drives each.  I have
> almost 10 TB of data in the pool (3 TB actual disk space used due to
> dedup and compression).  While testing various failure modes, I have
> managed to corrupt the pool to the point where it won't import.  So
> much for being bulletproof.  :(
>
> If I try to import the pool normally, it gives corrupted space_map errors.
>
> If I try to "import -F" the pool, it complains that "zio-io_type !=
> ZIO_TYPE_WRITE".
>
> I've also tried the above with "-o readonly=on" and "-R
> some/other/root" variations.
>
> There's also no zfs.cache file anywhere to be found, and creating a
> blank file doesn't help.
>
> Does this mean that a 10 TB pool can be lost due to a single file
> being corrupted, or a single piece of pool metadata being corrupted?
> And that there's *still* no recovery tools for situations like this?
>
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> For the curious, the failure mode that causes this?  Rebooting while 8
> simultaneous rsyncs were running, which were not killed by the
> shutdown process for some reason, which prevented 8 ZFS filesystems
> from being unmounted, which prevented the pool from being exported
> (even though I have a "zfs unmount -f" and "zpool export -f"
> fail-safe), which locked up the shutdown process requiring a power
> reset.

Well, by commenting out the VERIFY line for zio->io_type !=
ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
only with -F and -o readonly=on.  :(
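(That is, with a hypothetical pool name, the only import that succeeds is
something like:)

    zpool import -F -o readonly=on tank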

Trying to import it read-write gives dmu_free_range errors and panics
the system.

Compiling a kernel with that assertion commented out allows the pool
to be imported read-only.  Importing it read-write gives a bunch of
other dmu panics.  :( :( :(

How can it be that after 28 pool format revisions and 5+ years of
development, ZFS is still this brittle?  I've found lots of threads
from 2007 about this very issue, with "don't do that" and "it's not an
issue" and "there's no need for a pool consistency checker" and other
similar "head in the sand" responses.  :(  But still no way to prevent
or fix this form of corruption.

It's great that I can get the pool to import read-only, so the data is
still available.  But that really doesn't help when I've already
rebuilt this pool twice due to this issue.

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Alexander J. Maidak
On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash  wrote:
> > Is there any way, yet, to import a pool with corrupted space_map
> > errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?

>...
> Well, by commenting out the VERIFY line for zio->io_type !=
> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
> only with -F and -o readonly=on.  :(
> 
> It's great that I can get the pool to import read-only, so the data is
> still available.  But that really doesn't help when I've already
> rebuilt this pool twice due to this issue.
> 

Just curious, did you try an import or recovery with Solaris 11 Express
build 151a?  I expect it wouldn't have made a difference, but I'd be
curious to know.

-Alex



Re: [zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Freddie Cash
On Fri, Apr 29, 2011 at 5:00 PM, Alexander J. Maidak  wrote:
> On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
>> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash  wrote:
>> > Is there any way, yet, to import a pool with corrupted space_map
>> > errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>
>>...
>> Well, by commenting out the VERIFY line for zio->io_type !=
>> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
>> only with -F and -o readonly=on.  :(
>>
>> It's great that I can get the pool to import read-only, so the data is
>> still available.  But that really doesn't help when I've already
>> rebuilt this pool twice due to this issue.
>>
>
> Just curious, did you try an import or recovery with Solaris 11 Express
> build 151a?  I expect it wouldn't have made a difference, but I'd be
> curious to know.

No, that's on the menu for next week: trying a couple of OpenSolaris,
Solaris Express, and Nexenta LiveCDs to see if they make a difference.


-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Brandon High
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash  wrote:
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

I'd suggest trying to import the pool into snv_151a (Solaris 11
Express), which is the reference and development platform for ZFS.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Edward Ned Harvey
> From: Edward Ned Harvey
> I saved the core and ran again.  This time it spewed "leaked space" messages
> for an hour, and completed.  But the final result was physically impossible (it
> counted up 744k total blocks, which means something like 3Megs per block in
> my 2.39T used pool.  I checked compressratio is 1.00x and I have no
> compression.)
> 
> I ran again.
> 
> Still spewing messages.  This can't be a good sign.
> 
> Anyone know what it means, or what to do about it?

After running again, I get an even more impossible number ... 45.4K total
blocks, which would mean something like 50 megs per block.

This pool does scrub regularly (every other week).  In fact, it's scheduled
to scrub this weekend.



Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
> 
> What does it mean / what should you do, if you run that command, and it
> starts spewing messages like this?
> leaked space: vdev 0, offset 0x3bd8096e00, size 7168

And one of these:
Assertion failed: space_map_load(&msp->ms_map, &zdb_space_map_ops, 0x0,
&msp->ms_smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 1439, function
zdb_leak_init
Abort (core dumped)

I saved the core and ran again.  This time it spewed "leaked space" messages
for an hour, and completed.  But the final result was physically impossible
(it counted up 744k total blocks, which means something like 3Megs per block
in my 2.39T used pool.  I checked compressratio is 1.00x and I have no
compression.)

I ran again.

Still spewing messages.  This can't be a good sign.

Anyone know what it means, or what to do about it?



Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Edward Ned Harvey
> From: Neil Perrin [mailto:neil.per...@oracle.com]
>
> The size of these structures will vary according to the release you're running.
> You can always find out the size for a particular system using ::sizeof within
> mdb. For example, as super user:
> 
> : xvm-4200m2-02 ; echo ::sizeof ddt_entry_t | mdb -k
> sizeof (ddt_entry_t) = 0x178
> : xvm-4200m2-02 ; echo ::sizeof arc_buf_hdr_t | mdb -k
> sizeof (arc_buf_hdr_t) = 0x100
> : xvm-4200m2-02 ;

I can do the echo | mdb -k.  But what is that : xvm-4200 command?  
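(As an aside, the ddt_entry_t size quoted above — 0x178, i.e. 376 bytes —
makes a back-of-envelope estimate easy. The block count below is purely an
assumption for illustration, not a measurement:)

    # e.g. 16M unique blocks (~2 TiB at 128 KB recordsize) * 376 bytes per DDT entry
    echo '16 * 1024^2 * 376 / 1024^3' | bc -l    # ~5.9 GiB of DDT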



Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-29 Thread Brandon High
On Thu, Apr 28, 2011 at 6:48 PM, Edward Ned Harvey
 wrote:
> What does it mean / what should you do, if you run that command, and it
> starts spewing messages like this?
> leaked space: vdev 0, offset 0x3bd8096e00, size 7168

I'm not sure there's much you can do about it short of deleting
datasets and/or snapshots.

-B

-- 
Brandon High : bh...@freaks.com