Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
>> Worse yet, your arc consumption could be so large, that PROCESSES don't
>> fit in ram anymore. In this case, your processes get pushed out to swap
>> space, which is really bad.
>
> This will not happen. The ARC will be asked to shrink when other memory
> consumers demand memory. The lower bound of ARC size is c_min

Makes sense. Is c_min a constant? Suppose processes are consuming a lot of
memory. Will c_min protect L2ARC entries in the ARC? At least on my systems,
it seems that c_min is fixed at 10% of total system memory.

If c_min is sufficiently small relative to the amount of ARC that would be
necessary to index the L2ARC... Since every entry in the L2ARC requires an
entry in the ARC, this seems to imply that if process memory consumption is
high, then both the ARC and the L2ARC are effectively useless. Things
sometimes get evicted from the ARC completely, and sometimes they get evicted
into the L2ARC with only a reference still remaining in the ARC. But if
processes consume enough memory to shrink the ARC to effectively nothing,
then the L2ARC must also be effectively nothing.

> L2ARC is populated by a thread that watches the soon-to-be-evicted list.

This seems to imply that if processes start consuming a lot of memory, the
first thing to disappear is the ARC, and the second thing to disappear is the
L2ARC (because the L2ARC references stored in the ARC get evicted from the
ARC after other things in the ARC).

> AVL trees

Good to know. Thanks.

>> So the point is - whenever you do a write, and the calculated DDT entry
>> is not already in ARC/L2ARC, the system will actually perform several
>> small reads looking for the DDT entry before it finally knows whether the
>> DDT entry actually exists. So the penalty of performing a write, with
>> dedup enabled, and the relevant DDT entry not already in ARC/L2ARC, is a
>> very large penalty.
>
> "very" is a relative term,

Agreed. Here is what I was implying: suppose you don't have enough RAM to
hold the complete DDT, and you perform a bunch of random writes (whether sync
or async). Then you will suffer a lot of cache misses searching for DDT
entries, and the consequence is that every little write, which could have
cost only the disk penalty of one little write, instead costs the disk
penalty of several reads plus a write. So your random write performance is
effectively several times slower than it could have been if you had more RAM.
Reads are unaffected, except when random-write congestion is hogging disk
time.
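(For anyone wanting to check these values on their own system: the ARC
limits are exposed through the arcstats kstat. The statistic names below are
an assumption based on common Solaris-derived builds and may differ by
release; run as root.)

  kstat -p zfs:0:arcstats:size    # current ARC size
  kstat -p zfs:0:arcstats:c_min   # lower bound the ARC will shrink to
  kstat -p zfs:0:arcstats:c_max   # upper bound

On releases that support it, c_min can also be raised at boot with the
zfs_arc_min tunable in /etc/system, though whether that is advisable here is
a separate question.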
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> Controls whether deduplication is in effect for a dataset. The default
> value is off. The default checksum used for deduplication is sha256
> (subject to change).
>
> This is from b159.

This was fletcher4 earlier, and still is in opensolaris/openindiana. Given a
combination with verify (which I would use anyway, since there is always a
tiny chance of collisions), why would sha256 be a better choice?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases adequate and
relevant synonyms exist in Norwegian.
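(For reference, the verify behaviour mentioned above is selected together
with the checksum through the dedup property. A minimal sketch, with a
made-up pool/dataset name:)

  zfs set dedup=sha256,verify tank/data   # dedup, with byte-for-byte verify on checksum match
  zfs get dedup,checksum tank/data        # confirm what is actually in effect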
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Fri, Apr 29, 2011 at 7:10 AM, Roy Sigurd Karlsbakk wrote:
> This was fletcher4 earlier, and still is in opensolaris/openindiana. Given
> a combination with verify (which I would use anyway, since there are
> always tiny chances of collisions), why would sha256 be a better choice?

fletcher4 was only an option for snv_128, which was quickly pulled and
replaced with snv_128b, which removed fletcher4 as an option. The official
post is here:
http://www.opensolaris.org/jive/thread.jspa?threadID=118519&tstart=0#437431

It looks like fletcher4 is still an option in snv_151a for non-dedup
datasets, and is in fact the default.

As an aside: Erik, any idea when the 159 bits will make it to the public?

-B
--
Brandon High : bh...@freaks.com
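(If you want to see what a given dataset is actually using, something like
the following works; the dataset name is made up, and the mapping of
checksum=on to fletcher4 for non-dedup datasets is as described above:)

  zfs get checksum,dedup tank/data
  # checksum=on resolves to fletcher4 when dedup is off; datasets with
  # dedup enabled use sha256 for the dedup checksum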
[zfs-discuss] Faster copy from UFS to ZFS
Is anyone aware of any freeware program that can speed up copying tons of
data (2 TB) from UFS to ZFS on the same server?

Thanks.
Re: [zfs-discuss] Faster copy from UFS to ZFS
On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton wrote:
> Is anyone aware of any freeware program that can speed up copying tons of
> data (2 TB) from UFS to ZFS on same server?

rsync, with --whole-file --inplace (and other options), works well for the
initial copy.

rsync, with --no-whole-file --inplace (and other options), works extremely
fast for updates.

--
Freddie Cash
fjwc...@gmail.com
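(A concrete sketch of those two invocations, with made-up source and
destination paths; add whatever other options you normally use, e.g.
-H/-A/-X for hard links, ACLs and xattrs where your rsync build supports
them:)

  # initial bulk copy: stream whole files
  rsync -a --whole-file --inplace /ufs/data/ /tank/data/

  # subsequent update passes: rewrite only changed blocks, in place
  rsync -a --no-whole-file --inplace /ufs/data/ /tank/data/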
Re: [zfs-discuss] Faster copy from UFS to ZFS
Dan Shelton wrote:
> Is anyone aware of any freeware program that can speed up copying tons
> of data (2 TB) from UFS to ZFS on same server?

Try star -copy

Note that due to the problems ZFS has with reaching stable states, I
recommend using -no-fsync, and it may of course help to specify a larger
FIFO than the default in case you have plenty of RAM.

Jörg
--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
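(A sketch of what is described above, with made-up directory names and an
arbitrary FIFO size; check the star man page for the exact option spelling
on your version:)

  # copy the tree, skip fsync on the ZFS side, use a 256 MB FIFO
  star -copy -no-fsync fs=256m -C /ufs/data . /tank/data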
[zfs-discuss] Still no way to recover a "corrupted" pool
Is there any way, yet, to import a pool with corrupted space_map errors, or
"zio->io_type != ZIO_TYPE_WRITE" assertions?

I have a pool comprised of 4 raidz2 vdevs of 6 drives each. I have almost
10 TB of data in the pool (3 TB actual disk space used due to dedup and
compression). While testing various failure modes, I have managed to corrupt
the pool to the point where it won't import. So much for being bulletproof.
:(

If I try to import the pool normally, it gives corrupted space_map errors.

If I try to "import -F" the pool, it complains that "zio->io_type !=
ZIO_TYPE_WRITE".

I've also tried the above with "-o readonly=on" and "-R some/other/root"
variations.

There's also no zfs.cache file anywhere to be found, and creating a blank
file doesn't help.

Does this mean that a 10 TB pool can be lost due to a single file being
corrupted, or a single piece of pool metadata being corrupted? And that
there are *still* no recovery tools for situations like this?

Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

For the curious, the failure mode that caused this? Rebooting while 8
simultaneous rsyncs were running, which were not killed by the shutdown
process for some reason, which prevented 8 ZFS filesystems from being
unmounted, which prevented the pool from being exported (even though I have
a "zfs unmount -f" and "zpool export -f" fail-safe), which locked up the
shutdown process, requiring a power reset. :(

--
Freddie Cash
fjwc...@gmail.com
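(For reference, the import attempts described above look roughly like this;
the pool name is made up:)

  zpool import -F tank                          # ask for recovery / txg rollback
  zpool import -F -o readonly=on -R /mnt tank   # read-only, under an alternate root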
Re: [zfs-discuss] Faster copy from UFS to ZFS
On 04/30/11 06:00 AM, Freddie Cash wrote:
> On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton wrote:
>> Is anyone aware of any freeware program that can speed up copying tons
>> of data (2 TB) from UFS to ZFS on same server?
>
> rsync, with --whole-file --inplace (and other options), works well for
> the initial copy.

Is rsync ACL aware yet? I always use find piped to cpio.

--
Ian.
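(The find-piped-to-cpio approach is typically something along these lines;
paths are made up, and ACLs are preserved only to the extent the cpio on
your platform preserves them:)

  cd /ufs/data && find . -depth -print | cpio -pdm /tank/data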
Re: [zfs-discuss] Faster copy from UFS to ZFS
On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton wrote:
> Is anyone aware of any freeware program that can speed up copying tons of
> data (2 TB) from UFS to ZFS on same server?

Setting 'sync=disabled' for the initial copy will help, since it will make
all writes asynchronous. You will probably want to set it back to default
after you're done.

-B
--
Brandon High : bh...@freaks.com
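(A sketch of that workflow, with a made-up dataset name. Note that
sync=disabled discards synchronous write semantics, so it only makes sense
for a bulk copy you could restart from scratch:)

  zfs set sync=disabled tank/data    # before the bulk copy
  # ... run the copy ...
  zfs set sync=standard tank/data    # restore the default afterwards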
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 4/29/2011 9:44 AM, Brandon High wrote:
> On Fri, Apr 29, 2011 at 7:10 AM, Roy Sigurd Karlsbakk wrote:
>> This was fletcher4 earlier, and still is in opensolaris/openindiana.
>> Given a combination with verify (which I would use anyway, since there
>> are always tiny chances of collisions), why would sha256 be a better
>> choice?
>
> fletcher4 was only an option for snv_128, which was quickly pulled and
> replaced with snv_128b, which removed fletcher4 as an option. The
> official post is here:
> http://www.opensolaris.org/jive/thread.jspa?threadID=118519&tstart=0#437431
>
> It looks like fletcher4 is still an option in snv_151a for non-dedup
> datasets, and is in fact the default.
>
> As an aside: Erik, any idea when the 159 bits will make it to the public?
>
> -B

Yup, fletcher4 is still the default for any fileset not using dedup. It's
"good enough", and I can't see any reason to change it for those purposes
(since its collision problems aren't much of an issue when just doing data
integrity checks).

Sorry, no idea on release date stuff. I'm completely out of the loop on
release info. I'm lucky if I can get a heads up before it actually gets
published internally. :-( I'm just a lowly Java Platform Group dude.
Solaris ain't my silo.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] Faster copy from UFS to ZFS
On Apr 29, 2011, at 1:37 PM, Brandon High wrote:
> On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton wrote:
>> Is anyone aware of any freeware program that can speed up copying tons
>> of data (2 TB) from UFS to ZFS on same server?
>
> Setting 'sync=disabled' for the initial copy will help, since it will
> make all writes asynchronous.

Few local commands are sync. cp, cpio, tar, rsync, etc. are all effectively
async for local copies. Disabling the ZIL will do nothing to speed these up.

-- richard
Re: [zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash wrote:
> Is there any way, yet, to import a pool with corrupted space_map errors,
> or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>
> I have a pool comprised of 4 raidz2 vdevs of 6 drives each. I have almost
> 10 TB of data in the pool (3 TB actual disk space used due to dedup and
> compression). While testing various failure modes, I have managed to
> corrupt the pool to the point where it won't import. So much for being
> bulletproof. :(
>
> If I try to import the pool normally, it gives corrupted space_map errors.
>
> If I try to "import -F" the pool, it complains that "zio->io_type !=
> ZIO_TYPE_WRITE".
>
> I've also tried the above with "-o readonly=on" and "-R some/other/root"
> variations.
>
> There's also no zfs.cache file anywhere to be found, and creating a blank
> file doesn't help.
>
> Does this mean that a 10 TB pool can be lost due to a single file being
> corrupted, or a single piece of pool metadata being corrupted? And that
> there are *still* no recovery tools for situations like this?
>
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> For the curious, the failure mode that caused this? Rebooting while 8
> simultaneous rsyncs were running, which were not killed by the shutdown
> process for some reason, which prevented 8 ZFS filesystems from being
> unmounted, which prevented the pool from being exported (even though I
> have a "zfs unmount -f" and "zpool export -f" fail-safe), which locked up
> the shutdown process, requiring a power reset.

Well, by commenting out the VERIFY line for zio->io_type != ZIO_TYPE_WRITE
and compiling a new kernel, I can import the pool, but only with -F and
-o readonly=on. :(

Trying to import it read-write gives dmu_free_range errors and panics the
system. Compiling a kernel with that assertion commented out as well allows
the pool to be imported read-only; importing it read-write gives a bunch of
other dmu panics. :( :( :(

How can it be that after 28 pool format revisions and 5+ years of
development, ZFS is still this brittle? I've found lots of threads from 2007
about this very issue, with "don't do that" and "it's not an issue" and
"there's no need for a pool consistency checker" and other similar "head in
the sand" responses. :( But there is still no way to prevent or fix this
form of corruption.

It's great that I can get the pool to import read-only, so the data is still
available. But that really doesn't help when I've already rebuilt this pool
twice due to this issue.

--
Freddie Cash
fjwc...@gmail.com
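(One thing that can sometimes be tried before patching kernels is examining
the un-importable pool from userland with zdb's -e option, which operates on
a pool that is not in the cache file. It walks the pool read-only, but it
asserts on the same sort of damage, so it may abort too; the pool name is
made up:)

  zdb -e tank        # examine the exported/un-imported pool
  zdb -e -bb tank    # read-only block traversal / space accounting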
Re: [zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash wrote:
>> Is there any way, yet, to import a pool with corrupted space_map errors,
>> or "zio->io_type != ZIO_TYPE_WRITE" assertions?
> ...
> Well, by commenting out the VERIFY line for zio->io_type !=
> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
> only with -F and -o readonly=on. :(
>
> It's great that I can get the pool to import read-only, so the data is
> still available. But that really doesn't help when I've already rebuilt
> this pool twice due to this issue.

Just curious, did you try an import or recovery with Solaris 11 Express
build 151a? I expect it wouldn't have made a difference, but I'd be curious
to know.

-Alex
Re: [zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 5:00 PM, Alexander J. Maidak wrote:
> On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
>> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash wrote:
>>> Is there any way, yet, to import a pool with corrupted space_map
>>> errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>>
>> Well, by commenting out the VERIFY line for zio->io_type !=
>> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
>> only with -F and -o readonly=on. :(
>>
>> It's great that I can get the pool to import read-only, so the data is
>> still available. But that really doesn't help when I've already rebuilt
>> this pool twice due to this issue.
>
> Just curious, did you try an import or recovery with Solaris 11 Express
> build 151a? I expect it wouldn't have made a difference, but I'd be
> curious to know.

No, that's on the menu for next week: trying a couple of OpenSolaris,
Solaris Express, and Nexenta LiveCDs to see if they make a difference.

--
Freddie Cash
fjwc...@gmail.com
Re: [zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash wrote:
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

I'd suggest trying to import the pool into snv_151a (Solaris 11 Express),
which is the reference and development platform for ZFS.

-B
--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Edward Ned Harvey
>
> I saved the core and ran again. This time it spewed "leaked space"
> messages for an hour, and completed. But the final result was physically
> impossible (it counted up 744k total blocks, which means something like
> 3 megs per block in my 2.39T used pool. I checked compressratio, which is
> 1.00x, and I have no compression.)
>
> I ran again.
>
> Still spewing messages. This can't be a good sign.
>
> Anyone know what it means, or what to do about it?

After running again, I get an even more impossible number ... 45.4K total
blocks, which would mean something like 50 megs per block.

This pool does scrub regularly (every other week). In fact, it's scheduled
to scrub this weekend.
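(For what it's worth, the back-of-the-envelope arithmetic behind "physically
impossible" checks out; a quick sanity check using the figures quoted above,
with the 2.39T expressed in MiB:)

  echo "2.39 * 1024 * 1024 / 744000" | bc -l   # ~3.4 MiB per block (first run)
  echo "2.39 * 1024 * 1024 / 45400"  | bc -l   # ~55 MiB per block (second run)

Both are far above the 128 KiB maximum recordsize, so the reported block
totals cannot be right.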
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> What does it mean / what should you do, if you run that command, and it
> starts spewing messages like this?
> leaked space: vdev 0, offset 0x3bd8096e00, size 7168

And one of these:

Assertion failed: space_map_load(&msp->ms_map, &zdb_space_map_ops, 0x0,
&msp->ms_smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 1439,
function zdb_leak_init
Abort (core dumped)

I saved the core and ran again. This time it spewed "leaked space" messages
for an hour, and completed. But the final result was physically impossible
(it counted up 744k total blocks, which means something like 3 megs per
block in my 2.39T used pool. I checked compressratio, which is 1.00x, and I
have no compression.)

I ran again.

Still spewing messages. This can't be a good sign.

Anyone know what it means, or what to do about it?
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Neil Perrin [mailto:neil.per...@oracle.com]
>
> The size of these structures will vary according to the release you're
> running. You can always find out the size for a particular system using
> ::sizeof within mdb. For example, as super user:
>
> : xvm-4200m2-02 ; echo ::sizeof ddt_entry_t | mdb -k
> sizeof (ddt_entry_t) = 0x178
> : xvm-4200m2-02 ; echo ::sizeof arc_buf_hdr_t | mdb -k
> sizeof (arc_buf_hdr_t) = 0x100
> : xvm-4200m2-02 ;

I can do the echo | mdb -k. But what is that ": xvm-4200" command?
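(Those two sizes are what the rough DDT/L2ARC sizing math in this thread
rests on. For example, taking an arbitrary, hypothetical 10 million unique
blocks:)

  echo "10000000 * 376" | bc   # ddt_entry_t (0x178) per unique block: ~3.8 GB of in-core DDT
  echo "10000000 * 256" | bc   # arc_buf_hdr_t (0x100) per block referenced from L2ARC: ~2.6 GB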
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Thu, Apr 28, 2011 at 6:48 PM, Edward Ned Harvey wrote:
> What does it mean / what should you do, if you run that command, and it
> starts spewing messages like this?
> leaked space: vdev 0, offset 0x3bd8096e00, size 7168

I'm not sure there's much you can do about it short of deleting datasets
and/or snapshots.

-B
--
Brandon High : bh...@freaks.com