Hi All,

Just a follow-up: it seems that whatever it was doing eventually finished and the speed picked back up again. The send/recv has now completed -- I guess I could do with a little patience :)
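For anyone who hits the same thing and wants to check whether a slow receive is actually making progress rather than hung, a minimal sketch is to watch pool I/O and per-thread CPU microstates on the receiving side (this uses the pool name from the commands quoted below, and the pgrep assumes the receive is the only "zfs" process running on that box):

    # write activity on the receiving pool, sampled every 10 seconds
    zpool iostat tank 10

    # per-LWP microstates (USR/SYS/etc.) for the newest zfs process
    prstat -mL -p `pgrep -n zfs`

A single LWP pinned at ~100% SYS in prstat would match the mpstat picture quoted below.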
Lachlan

On Mon, Dec 5, 2011 at 10:47 AM, Lachlan Mulcahy <lmulc...@marinsoftware.com> wrote:
> Hi All,
>
> We are currently doing a zfs send/recv with mbuffer to send incremental
> changes across, and it seems to be running quite slowly, with zfs receive
> the apparent bottleneck.
>
> The process itself seems to be using almost 100% of a single CPU in "sys"
> time.
>
> Wondering if anyone has any ideas whether this is normal, or if it is just
> going to run forever and never finish...
>
> Details: two machines connected via Gigabit Ethernet on the same LAN.
>
> Sending server:
>
> zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090
>
> Receiving server:
>
> mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data
>
> mbuffer showing:
>
> in @ 256 KiB/s, out @ 256 KiB/s, 306 GiB total, buffer 100% full
>
>
> My debug:
>
> DTraceToolkit hotkernel reports:
>
> zfs`lzjb_decompress            10   0.0%
> unix`page_nextn                31   0.0%
> genunix`fsflush_do_pages       37   0.0%
> zfs`dbuf_free_range           183   0.1%
> genunix`list_next            5822   3.7%
> unix`mach_cpu_idle         150261  96.1%
>
>
> Top shows:
>
>   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> 22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>   550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd
>
> I'd say the 3.7% or so here only looks low because hotkernel reports
> aggregate CPU usage rather than per-CPU usage. mpstat seems to show the
> real story.
>
> mpstat 1 shows output much like this each second:
>
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
>   1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
>   2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
>   3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
>   4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
>   5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
>   6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>   7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
>   8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
>   9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
>  10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
>  11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
>  13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
>  15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
>  16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
>  18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
>  20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
>  22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
>  23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>
>
> So I'm led to believe that zfs receive is spending almost 100% of a
> single CPU's time doing a lot of genunix`list_next...
>
> Any ideas what is going on here?
>
> Best Regards,
> --
> Lachlan Mulcahy
> Senior DBA,
> Marin Software Inc.
> San Francisco, USA
>
> AU Mobile: +61 458 448 721
> US Mobile: +1 (415) 867 2839
> Office   : +1 (415) 671 6080

--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office   : +1 (415) 671 6080
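P.S. For the archives: hotkernel only reports flat per-function sample counts, so it can't tell you who is calling genunix`list_next. A minimal DTrace stack-sampling sketch with the profile provider (30-second sample; the execname predicate is an assumption that the receive is the only "zfs" process doing real work) would look something like:

    dtrace -n '
      /* sample at ~997 Hz; arg0 != 0 means the CPU was in the kernel */
      profile-997
      /arg0 && execname == "zfs"/
      {
          @[stack(20)] = count();
      }

      /* after 30 seconds, keep the 10 hottest kernel stacks, print, and stop */
      tick-30s
      {
          trunc(@, 10);
          printa(@);
          exit(0);
      }'

If the hot stacks show list_next being walked from dbuf_free_range, that would line up with the hotkernel counts quoted above.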