On Tue, 13 Mar 2012 00:22:23 +0100 Phil Regnauld wrote:

PR> Mikolaj Golub (to.my.trociny) writes:
>>
>> It looks like in the case of hastd this was send(2) who returned ENOMEM, but
>> it would be good to check. Could you please start synchronization again,
>> ktrace primary worker process when ENOMEM errors are observed and show
>> output here?

PR> Ok, took a little while, as running ktrace on the hastd does slow it down
PR> significantly, and the error normally occurs at 30-90 sec intervals.

PR> 0x0f90 b2f3 3ad5 e657 7f0f 3e50 698f 5deb 12af |..:..W..>Pi.]...|
PR> 0x0fa0 740d c343 6e80 75f3 e1a7 bfdf a4c1 f6a6 |t..Cn.u.........|
PR> 0x0fb0 ea85 655d e423 bd5e 42f7 7e9a 05d2 363a |..e].#.^B.~...6:|
PR> 0x0fc0 025e a7b5 0956 417c f31c a6eb 2cd9 d073 |.^...VA|....,..s|
PR> 0x0fd0 2589 e8c0 d76a 889f 8345 eeaf f2a0 c2d6 |%....j...E......|
PR> 0x0fe0 b89e aaef fee2 6593 e515 7271 88aa cf66 |......e...rq...f|
PR> 0x0ff0 d272 411a 7289 d6c9 6643 bdbe 3c8c 8ae8 |.rA.r...fC..<...|
PR> 50959 hastd RET sendto 32768/0x8000
PR> 50959 hastd CALL sendto(0x6,0x8024bf000,0x8000,0x20000<MSG_NOSIGNAL>,0,0)
PR> 50959 hastd RET sendto -1 errno 12 Cannot allocate memory
PR> 50959 hastd CALL clock_gettime(0xd,0x7fffff3f86f0)
PR> 50959 hastd RET clock_gettime 0
PR> 50959 hastd CALL getpid
PR> 50959 hastd RET getpid 50959/0xc70f
PR> 50959 hastd CALL sendto(0x3,0x7fffff3f8780,0x84,0,0,0)
PR> 50959 hastd GIO fd 3 wrote 132 bytes
PR> "<27>Mar 12 23:42:43 hastd[50959]: [hvol] (primary) Unable to sen\
PR> d request (Cannot allocate memory): WRITE(8626634752, 131072)."
PR> 50959 hastd RET sendto 132/0x84
PR> 50959 hastd CALL close(0x7)
PR> 50959 hastd RET close 0

Ok. So it is send(2) that fails. I suppose the network driver could be
generating the error. Did you say which network adapter you have?

>> If it is send(2) who fails then monitoring netstat and network driver
>> statistics might be helpful. Something like
>>
>> netstat -nax
>> netstat -naT
>> netstat -m
>> netstat -nid

PR> I could run this in a loop, but that would be a lot of data, and might
PR> not be appropriate to paste here.

PR> I didn't see any obvious errors, but I'm not sure what I'm looking for.
PR> netstat -m didn't show anything close to running out of buffers or
PR> clusters...

>> sysctl -a dev.<nic>
>>
>> And may be
>>
>> vmstat -m
>> vmstat -z

PR> No obvious errors there either, but again what should I look out for ?

I would look at the sysctl -a dev.<nic> statistics and check whether the
ENOMEM failures correlate with growth in any of the error counters.

PR> In the meantime, I've also experimented with a few different scenarios, and
PR> I'm quite puzzled.

PR> For instance, I configured one of the other gigabit cards on each host to
PR> provide a dedicated replication network. The main difference is that up
PR> until now this has been running using tagged vlans. To be on the safe side,
PR> I decided to use an untagged interface (the second gigabit adapter in each
PR> machine).
PR>
PR> Here's where I observed, and it is very odd:
PR>
PR> - doing a dd ... | ssh dd fails in the same fashion as before

PR> - I created a second zvol + hast resource of just 1 GB, and it replicated
PR> without any problems, peaking at 75 MB / sec (!) - maybe 1GB is too small ?
PR>
PR> (side note: hastd doesn't pick up configuration changes even with SIGHUP,
PR> which makes it hard to provision new resources on the fly)

PR> - I restarted replication on the 100 G hast resource, and it's currently
PR> replicating without any problems over the second ethernet, but it's
PR> dragging along at 9-10 MB/sec, peaking at 29 MB/sec occasionally.

Looking at buffer usage in 'netstat -nax' output collected during
synchronization (on both hosts) could show where the bottleneck is.
top -HS output might be useful too.

PR> Earlier, I was observing peaks at 65-70 MB sec in between failures...

PR> So I don't really know what to conclude :-|

-- 
Mikolaj Golub
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
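
The counter comparison suggested above (checking whether any dev.<nic>
counters grow around the ENOMEM failures) could be done roughly as follows.
This is only a sketch: it assumes the sysctl output has been saved to text
as "name: value" lines, once before and once after a burst of errors, and
the function names are made up for illustration.

```python
def parse_counters(text):
    """Parse 'name: value' lines; keep only entries with integer values."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        value = value.strip()
        if value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def changed_counters(before, after):
    """Return {name: delta} for counters whose value differs between snapshots."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {
        name: new[name] - old[name]
        for name in new
        if name in old and new[name] != old[name]
    }
```

Feeding it two snapshots taken around the time hastd logs "Cannot allocate
memory" would leave only the counters that moved, which is a much shorter
list to read than the full output.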