Re: HAST instability

2011-06-14 Thread Daniel Kalchev
On 14.06.11 17:56, Mikolaj Golub wrote: It turned out that automatic receive buffer sizing works only for connections in the ESTABLISHED state. With a small receive buffer, the connection might get stuck sending data only via TCP window probes -- one byte every few seconds (see "Scenario to make r

Re: HAST instability

2011-06-14 Thread Mikolaj Golub
On Tue, 14 Jun 2011 16:39:11 +0300 Daniel Kalchev wrote: DK> On 10.06.11 20:07, Mikolaj Golub wrote: >> On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev: >> >> MG> Could you please try this patch? >> >> MG> http://people.freebsd.org/~trociny/hastd.no_shutdown.p

Re: HAST instability

2011-06-14 Thread Daniel Kalchev
On 10.06.11 20:07, Mikolaj Golub wrote: On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev: MG> Could you please try this patch? MG> http://people.freebsd.org/~trociny/hastd.no_shutdown.patch Sure you still have to have your kernel patched with uipc_socket.c.patch :

Re: HAST instability

2011-06-10 Thread Mikolaj Golub
On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev: MG> Could you please try this patch? MG> http://people.freebsd.org/~trociny/hastd.no_shutdown.patch Sure you still have to have your kernel patched with uipc_socket.c.patch :-) -- Mikolaj Golub ___

Re: HAST instability

2011-06-10 Thread Mikolaj Golub
On Fri, 03 Jun 2011 19:18:29 +0300 Daniel Kalchev wrote: DK> Well, apparently my HAST joy was short. On a second run, I got stuck with DK> Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive DK> reply header: Operation timed out. DK> on the primary. No messages on the secon

Re: HAST instability

2011-06-03 Thread Daniel Kalchev
Well, apparently my HAST joy was short. On a second run, I got stuck with Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive reply header: Operation timed out. on the primary. No messages on the secondary. On primary: # netstat -an | grep 8457 tcp4 0 0 10.2.101.

Re: HAST instability

2011-06-03 Thread Daniel Kalchev
Decided to apply the patch proposed in -current by Mikolaj Golub: http://people.freebsd.org/~trociny/uipc_socket.c.patch This apparently fixed my issue as well. Running without checksums for a full bonnie++ run (~100GB write/rewrite) produced no disconnects, no stalls and generated up to 280MB

Re: HAST instability

2011-05-31 Thread Daniel Kalchev
Here goes the second run, without checksums. systat -if /0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10 Load Average Interface Traffic PeakTotal lo0 in 0.000 KB/s 71.666 KB/s 361.825

Re: HAST instability

2011-05-31 Thread Daniel Kalchev
On 31.05.11 17:08, Mikolaj Golub wrote: As I wrote privately, it would be nice to see both netstat and hast logs (from both nodes) for the same rather long period, when several cases occurred. It would be good to place them somewhere on the web so other guys could access them too, as I will be offli

Re: HAST instability

2011-05-31 Thread Mikolaj Golub
On Tue, 31 May 2011 15:51:07 +0300 Daniel Kalchev wrote: DK> On 30.05.11 21:42, Mikolaj Golub wrote: >> DK> One strange thing is that there is never established TCP connection >> DK> between both nodes: >> >> DK> tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 >>

Re: HAST instability

2011-05-31 Thread Daniel Kalchev
On 30.05.11 21:42, Mikolaj Golub wrote: DK> One strange thing is that there is never established TCP connection DK> between both nodes: DK> tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2 DK> tcp4 0 1288 10.2.101.11.57008 10.2.101.12.8457

Re: HAST instability

2011-05-30 Thread Mikolaj Golub
On Mon, 30 May 2011 17:43:04 +0300 Daniel Kalchev wrote: DK> tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2 DK> tcp4 0 1288 10.2.101.11.57008 10.2.101.12.8457 CLOSE_WAIT DK> tcp4 0 0 10.2.101.11.46346 10.2.101.12.8457 FI

Re: HAST instability

2011-05-30 Thread Mikolaj Golub
On Mon, 30 May 2011 17:43:04 +0300 Daniel Kalchev wrote: DK> Some further investigation: DK> The HAST nodes do not disconnect when checksum is enabled (either DK> crc32 or sha256). DK> One strange thing is that there is never established TCP connection DK> between both nodes: DK> tcp4

Re: HAST instability

2011-05-30 Thread Daniel Kalchev
Some further investigation: The HAST nodes do not disconnect when checksum is enabled (either crc32 or sha256). One strange thing is that there is never established TCP connection between both nodes: tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2 tcp4 0
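
For readers following the netstat observations above, here is a minimal sketch (not part of the original thread) of how one might tally TCP states for the HAST port from saved `netstat -an` output. The function name `count_states` and the six-column row layout are assumptions made for illustration; the two sample rows are the ones quoted in the message, and port 8457 is the port that appears in them.

```python
from collections import Counter

HAST_PORT = "8457"  # port seen in the netstat rows quoted in this thread


def count_states(netstat_output, port=HAST_PORT):
    """Count TCP states for rows whose local or foreign address ends in .<port>.

    Assumes rows shaped like: proto recv-q send-q local foreign state
    (hypothetical helper, written only for this illustration).
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and anything that is not a 6-column TCP row.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        suffix = "." + port
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states


# Sample rows taken from the message above.
sample = """\
tcp4       0      0 10.2.101.11.48939      10.2.101.12.8457       FIN_WAIT_2
tcp4       0   1288 10.2.101.11.57008      10.2.101.12.8457       CLOSE_WAIT
"""

if __name__ == "__main__":
    for state, count in sorted(count_states(sample).items()):
        print(state, count)
```

Run against output captured periodically on both nodes, a tally like this would make it easier to see whether any connection on the HAST port ever reaches ESTABLISHED.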

HAST instability

2011-05-29 Thread Daniel Kalchev
I am trying to get a basic HAST setup working on 8-stable (as of today). Hardware is two Supermicro blades, each with 2x Xeon E5620 processors, 48GB RAM, integrated LSI2008 controller, two 600GB SAS2 Toshiba drives, two Intel gigabit interfaces and two Intel 10Gbit interfaces. On each of the d