On Sat, Oct 30, 2010 at 03:25:56PM +0300, Mikolaj Golub wrote:
>
> On Thu, 28 Oct 2010 22:08:54 +0300 Mikolaj Golub wrote to Pawel Jakub Dawidek:
>
>  PJD>> I looked at the code and the keepalive packets are sent from another
>  PJD>> thread. Could you try turning them off in primary.c and see if that
>  PJD>> helps?
>
>  MG> At first I set RETRY_SLEEP to 1 sec to have more keepalive packets. The errors
>  MG> started to observe frequently:
>
>  MG> Oct 28 21:35:53 bolek hastd[1709]: [storage] (secondary) Unable to receive request header: RPC version wrong.
>  MG> Oct 28 21:35:54 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1709, exitcode=75).
>  MG> Oct 28 21:36:12 bolek hastd[1722]: [storage] (secondary) Unable to receive request header: RPC version wrong.
>  MG> Oct 28 21:36:12 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1722, exitcode=75).
>  MG> ...
>
>  MG> Now I have been running synchronization for more then a half an hour with
>  MG> keepalive_send disabled and have not seen any error.
>
> So :-) What do you think about sending keepalive in remote_send_thread() to
> avoid this problem and sending them only when a connection is idle (it looks
> like there is no much use to send them all the time)? Something like in the
> patch below (it works for me).

I like your patch, and I agree that it is better to send keepalive packets
only when the connection is idle. The only thing I'd change is to modify the
QUEUE_TAKE1() macro to take an additional 'timeout' argument; if we don't
want it to time out, we pass 0. Could you modify your patch?

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!