Hi All,

Just a follow-up: it seems that whatever it was doing eventually finished and the speed picked back up again. The send/recv has now completed -- I guess I could do with a little patience :)
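For anyone who hits the same thing and wants to check whether a slow receive is actually making progress rather than hung, a minimal sketch is to watch pool I/O and per-thread CPU microstates on the receiving side (this uses the pool name from the commands quoted below, and the pgrep assumes the receive is the only "zfs" process running on that box):

    # write activity on the receiving pool, sampled every 10 seconds
    zpool iostat tank 10

    # per-LWP microstates (USR/SYS/etc.) for the newest zfs process
    prstat -mL -p `pgrep -n zfs`

A single LWP pinned at ~100% SYS in prstat would match the mpstat picture quoted below.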
Lachlan

On Mon, Dec 5, 2011 at 10:47 AM, Lachlan Mulcahy <lmulc...@marinsoftware.com> wrote:
> Hi All,
>
> We are currently doing a zfs send/recv with mbuffer to send incremental
> changes across, and it seems to be running quite slowly, with zfs receive
> the apparent bottleneck.
>
> The process itself seems to be using almost 100% of a single CPU in "sys"
> time.
>
> Wondering if anyone has any ideas whether this is normal, or if it is just
> going to run forever and never finish...
>
> Details: two machines connected via Gigabit Ethernet on the same LAN.
>
> Sending server:
>
> zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090
>
> Receiving server:
>
> mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data
>
> mbuffer showing:
>
> in @ 256 KiB/s, out @ 256 KiB/s, 306 GiB total, buffer 100% full
>
>
> My debug:
>
> DTraceToolkit hotkernel reports:
>
> zfs`lzjb_decompress            10   0.0%
> unix`page_nextn                31   0.0%
> genunix`fsflush_do_pages       37   0.0%
> zfs`dbuf_free_range           183   0.1%
> genunix`list_next            5822   3.7%
> unix`mach_cpu_idle         150261  96.1%
>
>
> Top shows:
>
>   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> 22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>   550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd
>
> I'd say the 3.7% or so here only looks low because hotkernel reports
> aggregate CPU usage rather than per-CPU usage. mpstat seems to show the
> real story.
>
> mpstat 1 shows output much like this each second:
>
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
>   1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
>   2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
>   3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
>   4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
>   5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
>   6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>   7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
>   8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
>   9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
>  10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
>  11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
>  13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
>  15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
>  16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
>  18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
>  20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
>  22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
>  23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>
>
> So I'm led to believe that zfs receive is spending almost 100% of a
> single CPU's time doing a lot of genunix`list_next...
>
> Any ideas what is going on here?
>
> Best Regards,
> --
> Lachlan Mulcahy
> Senior DBA,
> Marin Software Inc.
> San Francisco, USA
>
> AU Mobile: +61 458 448 721
> US Mobile: +1 (415) 867 2839
> Office   : +1 (415) 671 6080

--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office   : +1 (415) 671 6080
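P.S. For the archives: hotkernel only reports flat per-function sample counts, so it can't tell you who is calling genunix`list_next. A minimal DTrace stack-sampling sketch with the profile provider (30-second sample; the execname predicate is an assumption that the receive is the only "zfs" process doing real work) would look something like:

    dtrace -n '
      /* sample at ~997 Hz; arg0 != 0 means the CPU was in the kernel */
      profile-997
      /arg0 && execname == "zfs"/
      {
          @[stack(20)] = count();
      }

      /* after 30 seconds, keep the 10 hottest kernel stacks, print, and stop */
      tick-30s
      {
          trunc(@, 10);
          printa(@);
          exit(0);
      }'

If the hot stacks show list_next being walked from dbuf_free_range, that would line up with the hotkernel counts quoted above.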