On Fri, 2022-11-11 at 21:55 -0800, David Christensen wrote:
> [...]
> As with most filesystems, performance of ZFS drops dramatically as you
> approach 100% usage. So, you need a data destruction policy that keeps
> storage usage and performance at acceptable levels.
> 
> Lots of snapshots slows down commands that involve snapshots (e.g. 'zfs
> list -r -t snapshot ...'). This means sysadmin tasks take longer when
> the pool has more snapshots.
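(For a sense of scale, that slowdown can be measured directly by timing a
recursive snapshot listing -- shown here against the pool 'moon' that is
created further down:)

# time zfs list -r -t snapshot moon > /dev/null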
Hm, how long does it take? It's not like I'm planning on making hundreds
of snapshots ...

> > > I have considered switching to one Intel Optane Memory
> > > Series and a PCIe 4x adapter card in each server [for a ZFS cache].
> > 
> > Isn't that very expensive and wears out just as well?
> 
> The Intel Optane Memory Series products are designed to be cache devices
> -- when using compatible hardware, Windows, and Intel software. My
> hardware should be compatible (Dell PowerEdge T30), but I am unsure if
> FreeBSD 12.3-R will see the motherboard NVMe slot or an installed Optane
> Memory Series product.

Try it out?

> Intel Optane Memory M10 16 GB PCIe M.2 80mm are US $18.25 on Amazon.
> 
> Intel Optane Memory M.2 2280 32GB PCIe NVMe 3.0 x2 are US $69.95 on Amazon.

I thought Optane only came as very expensive PCIe cards. I don't have any
M.2 slots, and it seems difficult to even find mainboards with at least
two of them that support the same cards -- which would be a requirement,
because there's no storing data without redundancy.

> > Wouldn't it be better to have the cache in RAM?
> 
> Adding memory should help in more ways than one. Doing so might reduce
> ZFS cache device usage, but I am not certain. But, more RAM will not
> address the excessive wear problems when using a desktop SSD as a ZFS
> cache device.

Well, after some fruitless experimentation with btrfs, I finally decided
to go with ZFS for the backups. It's the better choice because btrfs can't
reliably do RAID5, and its deduplication still seems very experimental. I
tried that, too, and after about 5 hours, deduplication with bees had
freed only about 0.1% of the disk space, which was ridiculous. ZFS gives
me almost twice as much storage capacity as btrfs and also has snapshots.

> 8 GB ECC memory modules to match the existing modules in my SOHO server
> are $24.95 each on eBay. I have two free memory slots.

Apparently ZFS gets really memory-hungry with deduplication. I'll go
without it; with snapshots alone I'll have plenty of space left.

> > > Please run and post the relevant command for LVM, btrfs, whatever.
> > 
> > Well, what would that tell you?
> 
> That would provide accurate information about the storage configuration
> of your backup server.

Oh, I only created that this morning:

# zpool status
  pool: moon
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        moon          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdg       ONLINE       0     0     0
          raidz1-1    ONLINE       0     0     0
            sdl       ONLINE       0     0     0
            sdm       ONLINE       0     0     0
            sdn       ONLINE       0     0     0
            sdp       ONLINE       0     0     0
            sdq       ONLINE       0     0     0
            sdr       ONLINE       0     0     0
          raidz1-2    ONLINE       0     0     0
            sdd       ONLINE       0     0     0
            sde       ONLINE       0     0     0
            sdf       ONLINE       0     0     0
            sdh       ONLINE       0     0     0
            sdi       ONLINE       0     0     0
            sdj       ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            sdk       ONLINE       0     0     0
            sdo       ONLINE       0     0     0

Some of the disks are 15 years old ... It made sense to me to group the
disks that are the same size and model, and to use raidz or mirror
depending on how many disks are in each group. I don't know if that's
ideal. Would ZFS have figured that out by itself if I had added all of the
disks as one raidz? With two groups of only two disks each, that might
have wasted space?
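(For reference, a layout like the one above boils down to a single create
command. This is only a sketch using the same sdX names as in the status
output; stable /dev/disk/by-id names would be more robust, and -f is
needed because mixing mirror and raidz vdevs counts as a mismatched
replication level:)

# zpool create -f moon \
      mirror sdc sdg \
      raidz1 sdl sdm sdn sdp sdq sdr \
      raidz1 sdd sde sdf sdh sdi sdj \
      mirror sdk sdo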
> Here is the pool in my backup server. mirror-0 and mirror-1 each use
> two Seagate 3 TB HDD's. dedup and cache each use partitions on two
> Intel SSD 520 Series 180 GB SSD's:
> 
> 2022-11-11 20:41:09 toor@f1 ~
> # zpool status p1
>   pool: p1
>  state: ONLINE
>   scan: scrub repaired 0 in 7 days 22:18:11 with 0 errors on Sun Sep 4
> 14:18:21 2022
> config:
> 
>         NAME                              STATE     READ WRITE CKSUM
>         p1                                ONLINE       0     0     0
>           mirror-0                        ONLINE       0     0     0
>             gpt/p1a.eli                   ONLINE       0     0     0
>             gpt/p1b.eli                   ONLINE       0     0     0
>           mirror-1                        ONLINE       0     0     0
>             gpt/p1c.eli                   ONLINE       0     0     0
>             gpt/p1d.eli                   ONLINE       0     0     0
>         dedup
>           mirror-2                        ONLINE       0     0     0
>             gpt/CVCV******D0180EGN-2.eli  ONLINE       0     0     0
>             gpt/CVCV******7K180EGN-2.eli  ONLINE       0     0     0
>         cache
>           gpt/CVCV******D0180EGN-1.eli    ONLINE       0     0     0
>           gpt/CVCV******7K180EGN-1.eli    ONLINE       0     0     0
> 
> errors: No known data errors

Is the SSD cache even relevant for a backup server? I might have two
unused 80 GB SSDs I could plug in to use as cache. Once I get my network
card to stop stalling every now and then, I can at least sustain about
200 MB/s of writes, which is about as much as the disks I'm reading from
can deliver.

> > > I suggest creating a ZFS pool with a mirror vdev of two HDD's. If you
> > > can get past your dislike of SSD's, add a mirror of two SSD's as a
> > > dedicated dedup vdev. (These will not see the hard usage that cache
> > > devices get.) Create a filesystem 'backup'. Create child filesystems,
> > > one for each host. Create grandchild filesystems, one for the root
> > > filesystem on each host.
> > 
> > Huh? What's with these relationships?
> 
> ZFS datasets can be organized into hierarchies. Child dataset
> properties can be inherited from the parent dataset. Commands can be
> applied to an entire hierarchy by specifying the top dataset and using a
> "recursive" option. Etc..

Ah, OK, that's what you mean.

> When a host is decommissioned and you no longer need the backups, you
> can destroy the backups for just that host. When you add a new host,
> you can create filesystems for just that host. You can use different
> backup procedures for different hosts. Etc..

I'll probably make filesystems for host-specific data and some for other
types of data. Some of it doesn't need compression, so I can turn that off
per filesystem.

> > > Set up daily rsync backups of the root
> > > filesystems on the various hosts to the ZFS grandchild filesystems. Set
> > > up zfs-auto-snapshot to take daily snapshots of everything, and retain
> > > 10 snapshots. Then watch what happens.
> > 
> > What do you expect to happen?
> 
> I expect the first full backup and snapshot will use an amount of
> storage that is something less than the sum of the sizes of the source
> filesystems (due to compression). The second through tenth backups and
> snapshots will each increase the storage usage by something less than
> the sum of the daily churn of the source filesystems. On day 11, and
> every day thereafter, the oldest snapshot will be destroyed, daily churn
> will be added, and usage will stabilize. Any source system upgrades and
> software installs will cause an immediate backup storage usage increase.
> Any source system cleanings and software removals will cause a backup
> storage usage decrease after 10 days.

Makes sense ...
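(Something like this untested sketch is what I have in mind for the
nightly run -- the host names, paths, and the retention count of 10 are
only placeholders, and the pruning relies on GNU head:)

#!/bin/sh
# Nightly backup: rsync each host's root into its own dataset, then take a
# recursive snapshot and prune everything but the newest $KEEP snapshots.
POOL=moon
KEEP=10
TODAY=$(date +%Y-%m-%d)

for HOST in alpha beta; do
    # -x stays on the source root filesystem (skips /proc, other mounts)
    rsync -aHAX -x --delete --numeric-ids \
        "root@$HOST:/" "/$POOL/backup/$HOST/root/"
done

zfs snapshot -r "$POOL/backup@$TODAY"

# List this dataset's snapshots oldest-first and destroy all but the
# newest $KEEP, recursively covering the per-host child datasets.
zfs list -H -o name -t snapshot -s creation -d 1 "$POOL/backup" |
    head -n "-$KEEP" |
    while read -r SNAP; do
        zfs destroy -r "$SNAP"
    done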
How does destroying the oldest snapshot work, though? IIRC, when a
snapshot is removed (destroyed? That's strange wording, "merge" seems
better ...), it's supposed to somehow merge back into the data it was
created from, so that the "first data" becomes what the snapshot was --
unless the snapshot is destroyed outright (that wording would make sense
then), meaning it simply ceases to exist without any merging and the
"first data" stays as it was. I remember trying to do stuff with snapshots
a long time ago, and ZFS would freak out telling me that I couldn't merge
a snapshot because other snapshots were getting in the way (as if I'd
care, just figure it out yourself, darn it, that's your job, not mine
...), and it was a nightmare to get rid of those.

> > I'm thinking about changing my backup server ...
> > In any case, I need to do more homework first.
> 
> Keep your existing backup server and procedures operational.

Nah, I haven't used it in a long time, and I've switched it from Fedora to
Debian now, so it's all very flexible. Backups are better done in the
winter; there's a reason why there are 8 fans in it, and then some. Since
my memory is bad, I had even forgotten that I swapped out the HP Smart
Array controllers some time ago for 3ware controllers that support JBOD.
That was quite a pleasant surprise ...

> If you do
> not have offline copies of your backups (e.g. drives in removable racks,
> external drives), implement that now.

Electricity is way too expensive to keep this server running.

> Then work on ZFS. ZFS looks simple enough going in, but you soon
> realize that ZFS has a large feature set, new concepts, and a
> non-trivial learning curve. Incantations get long and repetitive; you
> will want to script common tasks.

It all depends on how deep you want to get into it. I'll write a script
for the backups and perhaps take snapshots by hand for now.

> Expect to make mistakes. It would be
> wise to do your ZFS evaluation in a VM. Using a VM would also allow you
> to use any OS supported by the hypervisor (which may work-around the
> problem of FreeBSD not having drivers for the HP smart array P410).

Nah, I can do whatever I want with this backup server, and it was good for
experimenting with btrfs and its deduplication.
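(And roughly the dataset layout I have in mind, following the parent/child
scheme suggested above -- the names are placeholders, and the compression
settings are just examples of per-filesystem tuning:)

# zfs create -o compression=lz4 moon/backup
# zfs create moon/backup/alpha
# zfs create moon/backup/alpha/root
# zfs create -o compression=off moon/backup/alpha/media

The children inherit compression=lz4 from moon/backup unless it is
overridden, as for the already-compressed media data.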