On Wed, Jan 27, 2010 at 12:01:36PM -0800, Gregory Durham wrote:
> Hello All,
> I read through the attached threads and found a solution by a poster and
> decided to try it.

That may have been mine - good to know it helped, or at least started to.

> The solution was to use 3 files (in my case I made them sparse)

yep - writes to allocate space for them up front are pointless with CoW.
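
Something like this, for illustration (sizes and paths are made up,
assuming Solaris mkfile):

  # one sparse file per tape, sized to roughly a tape's capacity
  mkfile -n 400g /staging/t1 /staging/t2 /staging/t3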

> I then created a raidz2 pool across these 3 files 

Really?  If you want one tape's worth of space, written to 3 tapes,
you might as well just write the same file to three tapes, I think.
(I'm assuming here the files are the size you expect to write to
a single tape - otherwise I'm even more confused about this bit).

Perhaps it's easier to let zfs cope with repairing small media errors
here and there, but the main idea of using a redundant pool of files
was to cope with loss or damage to whole tapes, for a backup that
already needed to span multiple tapes. If you want this three-way copy
of a single tape, plus easy recovery from bad spots by reading back
multiple tapes, then use a 3-way mirror.  But consider the
error-recovery mode of whatever you're using to write to tape - some
skip to the next file on a read error.
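
For illustration, the 3-way-mirror shape would be something like this
(pool name and file paths are made up):

  # three identical copies, one per tape-sized file
  zpool create backup mirror /staging/t1 /staging/t2 /staging/t3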

I'd expect similar data:parity ratios of files/tapes as in typical
disk setups, at least for "wide stripes" - say raidz2 in sets of 10,
i.e. 8 data + 2 parity, or so.  (As an aside, I like this for disks
too, since striping 128k blocks across a power-of-two-wide data
stripe has to be more efficient.)
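
Roughly, again with made-up pool name and paths:

  # 8 data + 2 parity across ten tape-sized files
  zpool create backup raidz2 \
      /staging/t01 /staging/t02 /staging/t03 /staging/t04 /staging/t05 \
      /staging/t06 /staging/t07 /staging/t08 /staging/t09 /staging/t10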

> and started a zfs send | recv. The performance is horrible

There can be several reasons for this, and we'd need to know more
about your setup.

The first critical thing is going to be the setup of the staging
filesystem that holds your pool files.  If this is itself a raidz,
perhaps you're IOPS-limited - you're expecting three files' worth of
disk concurrency from a pool that may not have it, though since this
should be a write-mostly workload it's less sensitive.  You'll be
seeking a lot either way, though.
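
One quick way to see whether that's biting, while the recv runs
(pool name made up):

  # per-vdev throughput and operation counts, every 5 seconds
  zpool iostat -v staging 5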

If this is purely staging to tape, consider making the staging pool
out of non-redundant single-disk vdevs.  Alternately, if the staging
pool is safe, there's another trick you might consider: create the
pool, then offline 2 files while you recv, leaving the pool-of-files
degraded.  Then when you're done, you can let the pool resilver and
fill in the redundancy.  This might change the IO pattern enough to
take less time overall, or at least allow you some flexibility with
windows to schedule backup and tapes.
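
A sketch of that trick, with made-up names (a 3-wide raidz2 stays
usable with two members offline):

  zpool offline backup /staging/t2
  zpool offline backup /staging/t3
  zfs send -R tank@snap | zfs recv -d backup
  zpool online backup /staging/t2
  zpool online backup /staging/t3
  zpool status backup     # watch the resilver rebuild the parity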

Next is dedup - make sure you have the memory and l2arc capacity to
dedup the incoming write stream.  Dedup within the pool of files if
you want and can (because this will dedup your tapes), but don't
dedup under it as well.  I've found that to produce completely
pathological disk thrashing in a related configuration (a pool on a
lofi crypto file).  Stacking dedup like this doubles the performance
cliff under memory pressure that we've been talking about recently.
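
i.e. something like this (pool names made up):

  zfs set dedup=on backup      # dedup the pool-of-files, and so the tapes
  zfs get dedup staging        # make sure it's off underneath
  zdb -DD backup               # rough idea of DDT size and memory needs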

(If you really do want 3-way-mirror files, then by all means dedup
them in the staging pool.) 

Related to this is ARC usage - I haven't investigated this carefully
myself, but you may well be double-caching: the backup pool's data,
as well as the staging pool's view of the files.  Again, since it's a
write-mostly workload zfs should hopefully figure out that few blocks
are being re-read, but you might experiment with
primarycache=metadata for the staging pool holding the files.
Perhaps zpool-on-files is smart enough to use direct I/O and bypass
the cache anyway; I'm not sure.
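
i.e. something along the lines of (dataset name made up):

  # cache only metadata for the filesystem holding the pool files,
  # so the data blocks aren't held twice in ARC
  zfs set primarycache=metadata staging/tapefiles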

How's your CPU usage?  Check that you're not trying to
double-compress the files (again, compress within the backup pool but
not outside it) and consider using a lightweight checksum rather than
sha256 outside.
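
Something like (pool names made up):

  zfs set compression=on backup        # compress inside the backup pool
  zfs set compression=off staging      # ...but not underneath it again
  zfs set checksum=fletcher4 staging   # lightweight checksum outside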

Then there's streaming and concurrency - try piping through buffer
and using bigger socket and TCP buffers.  TCP stalls and slow-start
will amplify latency many-fold.
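
For example, with mbuffer as the buffering stage (buffer sizes,
host and pool names are only illustrative):

  zfs send -R tank@snap | mbuffer -s 128k -m 1G | \
      ssh backuphost 'mbuffer -s 128k -m 1G | zfs recv -d backup'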

A good ZIL device on the staging pool might also help: the backup
pool will be doing sync writes to close its txgs, though probably not
many others.  I haven't experimented here, either.
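
e.g. (device name made up):

  # add a fast separate log device to the staging pool
  zpool add staging log c4t2d0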

> This pool is temporary as it will be sent to tape, deleted and
> recreated.

I tend not to do that, since I can incrementally update the pool
contents before rewriting tapes.  This hides the performance issues
dramatically, since after the first time much less data is
transferred and written to the files.
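
i.e. snapshot and send incrementals rather than starting over each
time (snapshot and pool names made up):

  zfs send -R tank@monday | zfs recv -d backup
  # ...later, only the changes since @monday:
  zfs send -R -I @monday tank@tuesday | zfs recv -d backup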

> Is it possible to zfs send to two destination simultaneously?

Yes, though it's less convenient than using -R on the top of the
pool, since you have to solve any dependencies (including clone
renames) yourself.  Whether this helps or hurts depends on your
bottleneck: it will help with network and buffering issues, but hurt
(badly) if you're limited by thrashing seeks (at the writer, since you
already know the reader can sustain higher rates).
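
e.g. one pipeline per dataset, run in parallel, instead of a single
-R at the top of the pool (dataset and pool names made up):

  zfs send tank/data@snap | zfs recv -d backup1 &
  zfs send tank/home@snap | zfs recv -d backup2 &
  wait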

> Or am I stuck. Any pointers would be great!

Never. Always! :-)

--
Dan.
