Hi Richard, hi Daniel, hi Roy

thanks for your responses.

Sorry for the delay; I was ill in bed for quite a while and didn't have time to 
continue testing the system:


On 19.06.2011 at 01:39, Richard Elling wrote:

> You're better off disabling dedup for this workload. If the dedup ratio was 
> more like
> 10, 342, or some number > 2 dedup can be worthwhile.  IMHO, dedup ratios < 2 
> are
> not good candidates for dedup.
> 
> You can look at existing, non-deduped data to get an estimate of the 
> potential dedup
> savings using:
>       zdb -S poolname
> 

Because the source data isn't on a ZFS volume, I can't estimate the potential 
dedup savings before copying it onto the ZFS volume...

Do I interpret the numbers correctly?
Does a dedup ratio of 1.18x mean an 18% saving in space?
Combined with the compression ratio of 1.48x, do I get a combined ratio of 
1.67x, i.e. roughly 67% space saving?
I think that would be a good value.
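
For reference: these ratios are the ones ZFS itself reports as properties, so 
they can be read directly with something like the following ("poolname" is just 
a placeholder here):

      zpool get dedupratio poolname
      zfs get compressratio poolname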

The data consists mostly of pictures (JPEG, TIFF, RAW), Photoshop and InDesign 
files, and about 1 TB of e-mails.

On System 2, zdb -DD shows the following result for the already deduplicated 
and compressed data:

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     174M   21.8T   15.4T   15.4T     174M   21.8T   15.4T   15.4T
     2    22.7M   2.84T   1.97T   1.98T    48.6M   6.08T   4.22T   4.23T
     4    1.59M    204G    132G    132G    7.40M    947G    612G    614G
     8     228K   28.5G   18.5G   18.6G    2.29M    293G    190G    190G
    16    45.0K   5.63G   3.67G   3.68G     927K    116G   75.6G   75.9G
    32    5.58K    714M    359M    362M     233K   29.1G   14.4G   14.5G
    64    2.61K    334M    152M    153M     220K   27.4G   11.9G   12.0G
   128      924    116M   30.0M   30.5M     154K   19.3G   5.19G   5.28G
   256      496     62M   28.4M   28.6M     174K   21.7G   9.90G   9.98G
   512      268   33.5M   14.3M   14.4M     168K   21.0G   9.22G   9.29G
    1K       44   5.50M     94K    124K    52.2K   6.52G    116M    152M
    2K        3    384K   1.50K   3.59K    7.19K    920M   3.59M   8.59M
    4K        2    256K      1K   2.39K    10.9K   1.36G   5.43M   13.0M
   16K        2    256K      1K   2.39K    35.2K   4.40G   17.6M   42.0M
  256K        1    128K     512   1.20K     321K   40.2G    161M    384M
 Total     199M   24.9T   17.5T   17.6T     235M   29.4T   20.5T   20.6T

dedup = 1.17, compress = 1.43, copies = 1.00, dedup * compress / copies = 1.67

On System 2, zdb -D shows:

DDT-sha256-zap-duplicate: 25742321 entries, size 445 on disk, 174 in core
DDT-sha256-zap-unique: 182847350 entries, size 449 on disk, 186 in core

dedup = 1.17, compress = 1.43, copies = 1.00, dedup * compress / copies = 1.67
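
A side thought: if the "in core" figures above really are bytes per DDT entry, 
then the complete dedup table for System 2 would need roughly

      echo '(25742321*174 + 182847350*186) / 1024^3' | bc -l

i.e. about 36 GiB, which would be far more than the 24 GB of RAM Roy mentioned, 
even with an L2ARC in front of the disks. Please correct me if I'm misreading 
the zdb output; this is only my own back-of-the-envelope calculation.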


I can't find a real explanation of the results in this table in any manual.

Do I interpret the table correctly?
199 million blocks allocated, occupying 24.9 TB of space on the volume, 
referencing 29.4 TB of logical ("real") data?


For further testing I have now disabled dedup, leaving just the gzip-6 
compression.
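
To be precise about what I did: dedup was switched off on the dataset with the 
command below (dataset name shortened to a placeholder); as far as I know this 
only affects newly written blocks, the already deduplicated data stays 
referenced through the DDT.

      zfs set dedup=off poolname/dataset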


The values on System 1 should be much the same, because it holds the same data; 
only the compression ratio there is just 1.10x (about 10%).

>> Dedup is on "always" (not only at the start), also compression is activated:
>> System 1 = compression on (lzjb?)
>> System 2 = compression on (gzip-6)
>> compression rates:
>> System 1 = 1.10x
>> System 2 = 1.48x
>> 
>> compression and dedup were some of the primary reasons to choose zfs in this 
>> situation.
> 
> Compression is a win for a large number of cases.  Dedup, not so much.
> 

I was quite impressed by the compression ratio with gzip-6 on System 2. Because 
it is (planned as) the long-term backup system in a data center, the space on 
this system should be used as efficiently as possible. We don't want to be 
forced to add drives to the system every few weeks; the system was planned to 
hold the current data plus the new data of the next 1-2 years without adding 
more drives.

But dedup must be a factor, because with dedup=on I only get about 10-15 MB/s 
writing data to the iSCSI volume on the same system, while with dedup=off I get 
about 45-50 MB/s. Considering the gzip compression, that looks like a 
reasonable value.

>> ok, I understand, more RAM will be no fault.. ;)
>> 
>> On System 1 there is no such massive change in RAM usage while copying files 
>> to and from the volume.
>> But the Performance is only about 20MB/s via GigaBit (iSCSI).
> 
> This is likely not a dedup issue. More likely, Nagle is biting you or there 
> is a
> serialization that is not immediately obvious. I have also seen questionable
> network configurations cause strange slowdowns for iSCSI.
> -- richard

OK, but why didn't that occur at the beginning? We ran a lot of tests with 
several TB of data, and performance was satisfying...
This Nagle problem shouldn't depend on the amount of storage already used on 
the system.

As I understand the Nagle problem, it can occur on the client side as well as 
on the server side of the iSCSI chain, so I need to look at both the OS X 
settings and the OpenSolaris settings to rule it out.

On the OS X server the usual workaround (disabling delayed ACKs, which interact 
badly with Nagle) has been in place for a long time, well before we started 
using iSCSI:
/etc/sysctl.conf -> net.inet.tcp.delayed_ack=0
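
On the OpenSolaris side I haven't changed anything yet. If I understand the 
tunables correctly, Nagle can be disabled system-wide for a quick test with 
something like the following (my assumption from the docs, not verified here, 
and not persistent across reboots):

      ndd -set /dev/tcp tcp_naglim_def 1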


What do you mean by "serialization"?

For System 1 we already planned to put the iSCSI traffic on its own network: 
dedicate one Ethernet interface on the OS X server to iSCSI only and place it 
in a physically separate network (with its own switch, etc.).



The slowdowns appear with different client servers. I tried:

OS X 10.5.8 Server on an Xserve (dual Xeon, 2 GHz)
OS X 10.6.7 Server on a Mac Pro Server (single quad-core Xeon, 2.8 GHz)
OS X 10.6.7 Server on a Mac mini (Core 2 Duo, 2.66 GHz)

The Xserve and the Mac Pro use an LACP link aggregation of two Gigabit 
connections; the Mac mini uses a single Gigabit connection.

The read transfer rates are comparable, about 70-80 MB/s.

But on all systems the write rate is really low...

>> On Wed, Jun 15, 2011 at 07:19:05PM +0200, Roy Sigurd Karlsbakk wrote:
>>> 
>>> Dedup is known to require a LOT of memory and/or L2ARC, and 24GB isn't 
>>> really much with 34TBs of data.
>> 
>> The fact that your second system lacks the l2arc cache device is absolutely 
>> your prime suspect.

I added an SSD as L2ARC and didn't see any change in speed...

So the problem must be somewhere else.
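
Before looking elsewhere I will at least check whether the L2ARC is being used 
at all during a copy, e.g. with ("poolname" again just a placeholder):

      zpool iostat -v poolname 5
      kstat -p zfs:0:arcstats | grep l2_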

Today I made a scrub on System 1:
root@ACStorage:~# zpool status -v
  pool: ACBackPool_1
 state: ONLINE
 scan: scrub in progress since Sat Jul 23 15:48:32 2011
    35,9G scanned out of 45,4T at 5,36M/s, (scan is slow, no estimated time)
    0 repaired, 0,08% done
config:

        NAME                       STATE     READ WRITE CKSUM
        ACBackPool_1               ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c1t5000CCA369C76401d0  ONLINE       0     0     0
            c1t5000CCA369C9E6A4d0  ONLINE       0     0     0
            c1t5000CCA369CAD435d0  ONLINE       0     0     0
            c1t5000CCA369CAF8CCd0  ONLINE       0     0     0
            c1t5000CCA369CBA08Dd0  ONLINE       0     0     0
            c1t5000CCA369CBA666d0  ONLINE       0     0     0
            c1t5000CCA369CBAC4Ed0  ONLINE       0     0     0
            c1t5000CCA369CBB08Ad0  ONLINE       0     0     0
            c1t5000CCA369CBB102d0  ONLINE       0     0     0
            c1t5000CCA369CBB30Fd0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c1t5000CCA369C9EC38d0  ONLINE       0     0     0
            c1t5000CCA369C9FA35d0  ONLINE       0     0     0
            c1t5000CCA369C9FA83d0  ONLINE       0     0     0
            c1t5000CCA369CA0188d0  ONLINE       0     0     0
            c1t5000CCA369CA12FBd0  ONLINE       0     0     0
            c1t5000CCA369CA3071d0  ONLINE       0     0     0
            c1t5000CCA369CAB044d0  ONLINE       0     0     0
            c1t5000CCA369CB928Ad0  ONLINE       0     0     0
            c1t5000CCA369CBA1D5d0  ONLINE       0     0     0
            c1t5000CCA369CBAAC1d0  ONLINE       0     0     0
          raidz2-2                 ONLINE       0     0     0
            c1t5000CCA369C9F9B3d0  ONLINE       0     0     0
            c1t5000CCA369CA09A6d0  ONLINE       0     0     0
            c1t5000CCA369CA12AFd0  ONLINE       0     0     0
            c1t5000CCA369CA1384d0  ONLINE       0     0     0
            c1t5000CCA369CAAEC0d0  ONLINE       0     0     0
            c1t5000CCA369CAD93Ad0  ONLINE       0     0     0
            c1t5000CCA369CAD950d0  ONLINE       0     0     0
            c1t5000CCA369CADA7Dd0  ONLINE       0     0     0
            c1t5000CCA369CADA89d0  ONLINE       0     0     0
            c1t5000CCA369CADA93d0  ONLINE       0     0     0
        cache
          c1t5E83A97FEFD3F27Bd0    ONLINE       0     0     0
        spares
          c1t5000CCA369C9ED1Bd0    AVAIL   
          c1t5000CCA369CA09B3d0    AVAIL   
          c1t5000CCA369CADA1Fd0    AVAIL   
          c1t5000CCA369CADA88d0    AVAIL   
          c1t5000CCA369CBA0E0d0    AVAIL   
          c1t5000CCA369CBB15Dd0    AVAIL   

errors: No known data errors


Awfully slow...
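
Maybe watching the individual disks while the scrub runs would show whether a 
single drive is dragging the whole pool down; I plan to check with the standard 
tools:

      iostat -xn 5
      zpool iostat -v ACBackPool_1 5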

I tried some of the hints I found here
(http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45685.html):

echo "metaslab_min_alloc_size/Z 1000" | mdb -kw
echo "zfs_scrub_delay/W0" | mdb -kw

but that didn't change anything...
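
If I got the mdb syntax right, the values can at least be read back to verify 
that the writes took effect:

      echo "zfs_scrub_delay/D" | mdb -k
      echo "metaslab_min_alloc_size/J" | mdb -k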

A dd test shows:
root@ACStorage:~# dd if=/dev/zero of=/ACBackPool_1/ds.test bs=1024k count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5,2 GB) copied, 3,20443 s, 1,6 GB/s


1.6 GB/s seems to be a really good value... or does the SSD distort this result?

Why is dd so fast while the scrub is so slow?
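
One thing I realise while writing this: /dev/zero produces data that gzip-6 
(and ZFS's handling of all-zero blocks) shrinks to practically nothing, so the 
dd above probably measures the ARC and the compression path more than the 
disks. A test with incompressible data might be fairer, something like this 
sketch (file names are just examples; /dev/urandom itself may become the 
bottleneck, hence the intermediate file in /tmp):

      dd if=/dev/urandom of=/tmp/random.5g bs=1024k count=5000
      dd if=/tmp/random.5g of=/ACBackPool_1/dd.test2 bs=1024k count=5000 && sync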

Any hints are appreciated.

Many thanks in advance.

Best Regards

Sven C. Merckens

-- 

Sven C. Merckens
Michael-Ende-Straße 16
52499 Baesweiler
Tel.:    +49 2401 896074
Fax:     +49 2401 801115
mercken...@mac.com

