Hi, I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to stable if told to, but want to check here first), ZFS and HAST. HAST is configured to run on top of zvols configured on each host, as illustrated:

     FS                          FS
  +------+                    +------+
  | hvol | <---- hastd -----> | hvol |
  +------+                    +------+
  | zvol |                    | zvol |
  +------+                    +------+
  | zfs  |                    | zfs  |
  +------+                    +------+
     h1                          h2

Both hosts are connected at gigabit speed to the same switch. There are no issues with large TCP transfers such as SCP/FTP.

The config is vanilla:

# zfs create -V 10G zfs/hvol

hast.conf:

resource hvol {
        on h1 {
                local /dev/zvol/zfs/hvol
                remote tcp4://192.168.1.100
        }
        on h2 {
                local /dev/zvol/zfs/hvol
                remote tcp4://192.168.1.200
        }
}

h1 behaves fine as primary, either with h2 turned off or with h2 in the init role. But as soon as I set the role to secondary on h2, the receiver repeatedly crashes and restarts; see the traces below.

I've seen:

http://lists.freebsd.org/pipermail/freebsd-current/2011-May/024871.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2012-01/msg00510.html

... but in the first case the fix has been in 9 since last year, and the second refers to async replication, whereas I'm using the default (fullsync).

hastctl status on the primary shows the dirty size diminishing slowly, but obviously this isn't optimal: it causes freezes on I/O to the primary hvol, which in turn causes all kinds of issues for the consumers of the hvol.

Any ideas? Am I doing something wrong?

Primary:

Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31649693696, 131072).
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31691243520, 131072).
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31783256064, 131072).
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31782731776, 131072).
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31803441152, 131072).
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31881953280, 131072).
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.

Secondary:

Mar 11 01:01:30 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2874, exitcode=75).
Mar 11 01:01:38 h2 hastd[2875]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:44 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2875, exitcode=75).
Mar 11 01:01:45 h2 hastd[2876]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:50 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2876, exitcode=75).
Mar 11 01:01:56 h2 hastd[2877]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:01 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2877, exitcode=75).
Mar 11 01:02:05 h2 hastd[2878]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:11 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2878, exitcode=75).
Mar 11 01:02:15 h2 hastd[2879]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:20 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2879, exitcode=75).
Mar 11 01:02:30 h2 hastd[2880]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:34 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2880, exitcode=75).
Mar 11 01:02:41 h2 hastd[2881]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:47 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2881, exitcode=75).
Mar 11 01:02:48 h2 hastd[2882]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:54 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2882, exitcode=75).
Mar 11 01:02:59 h2 hastd[2883]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:04 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2883, exitcode=75).
Mar 11 01:03:13 h2 hastd[2884]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:17 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2884, exitcode=75).
Mar 11 01:03:18 h2 hastd[2885]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:23 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2885, exitcode=75).
Mar 11 01:03:28 h2 hastd[2886]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:33 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2886, exitcode=75).
Mar 11 01:03:42 h2 hastd[2887]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:48 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2887, exitcode=75).
Mar 11 01:03:48 h2 hastd[2888]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
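
The traces are repetitive, so a tally per host and message type may be easier to compare than the raw lines. Below is a minimal Python sketch for that; it is not part of HAST. The names `LINE_RE`, `classify` and `tally`, and the regular expression itself, are assumptions based only on the syslog layout seen in the traces above, and the sample lines are copied from them.

```python
import re
from collections import Counter

# Assumed layout, inferred from the traces above:
#   "Mon DD HH:MM:SS host hastd[pid]: [resource] (role) message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\shastd\[(?P<pid>\d+)\]:\s"
    r"\[(?P<res>[^\]]+)\]\s\((?P<role>\w+)\)\s(?P<msg>.*)$"
)


def classify(msg):
    """Reduce a message to a stable key by masking per-event details."""
    msg = re.sub(r"WRITE\(\d+, \d+\)", "WRITE(offset, length)", msg)
    msg = re.sub(r"pid=\d+", "pid=N", msg)
    return msg.rstrip(".")


def tally(lines):
    """Count messages per (host, role, message type); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            key = (m.group("host"), m.group("role"), classify(m.group("msg")))
            counts[key] += 1
    return counts


# A few lines copied from the traces above.
sample = [
    "Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).",
    "Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.",
    "Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31649693696, 131072).",
    "Mar 11 01:02:47 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2881, exitcode=75).",
]

if __name__ == "__main__":
    for key, n in sorted(tally(sample).items()):
        print(n, *key)
```

Feeding it the full contents of /var/log/messages from each host (for example via `tally(open(path))`) should show whether every failed send on the primary lines up with one worker exit on the secondary.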