Hi Craig
I am able to reproduce the issue now . I have postgres-8.1.2 installed in
cluster setup.

I have started the below query from one system let say A to system B in
cluster .
psql -U<dbname> -h<ip of system B> -c "select sleep(300);"

while this command is going on , system B is stopped abruptly by taking out
the power cable from it . This caused the above query on system A to hang.
This is still showing in 'ps -eaf' output after one day.  I think the tcp
keepalive mechanism which has been set at system level should have closed
this connection. But it didnt . Following keepalive values have been set on
system A :

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
Why system level keepalive is not working in this case. Well, I learnt
, from the link you have provided, that programs must request keepalive
control for their sockets using the setsockopt interface. I wonder if
postgres8.1.2 supports / request for system level keepalive control ?? If
not, then which release/version of  postgres supports that ??

Thanks...
Tamanna


On Tue, Nov 29, 2011 at 4:56 PM, tamanna madaan <
tamanna.mad...@globallogic.com> wrote:

> well, one question : Is tcp-keep-alive enabled by default in postgres-8.1.2 .
>
> I am using postgres on linux platform .
>
>
>
> On Tue, Nov 29, 2011 at 8:51 AM, tamanna madaan <
> tamanna.mad...@globallogic.com> wrote:
>
>> Hi Craig
>>
>> Thanks for your reply . But unfortunately I dont have that process
>> running right now. I have already killed that process . But I have seen
>> this problem sometimes on my setup.
>> It generally happens when the remote system is going slow for some reason
>> (CPU utilization high etc.)  . But whatever is the reason , I would assume
>> that the query should return with some error or so
>> in case the system, the query is running on , is rebooted .  But  it
>> doesn't return and remain stuck. Moreover, the same query sometimes hangs
>> even if it is run on local postgres  database so I dont think
>> network issues have any role in that . Please help.
>>
>> Thanks....
>>
>> Regards
>> Tamanna
>>
>>
>> On Tue, Nov 29, 2011 at 7:58 AM, Craig Ringer <ring...@ringerc.id.au>wrote:
>>
>>> On 11/28/2011 05:30 PM, tamanna madaan wrote:
>>>
>>>> Hi All
>>>> I have postgres installed in cluster setup. My system has a script
>>>> which executes the below query on remote system in cluster.
>>>> psql -t -q -Uslon -h<hostip> -d<dbname> -c"select 1;"
>>>> But somehow this query got stuck. It didnt return even after the remote
>>>> system( on which this query was supposed to execute) is rebooted . What
>>>> could be the reason ??
>>>>
>>>
>>> I relised just after sending my last message:
>>>
>>> You should use ps to find out what exactly psql is doing and which
>>> system call it's blocked in in the kernel (if it's waiting on a syscall).
>>> As you didn't mention your OS I'll assume you're on Linux, where you'd use:
>>>
>>>  ps -C psql -o wchan:80=
>>>
>>> or
>>>
>>>  ps -p 1234 -o wchan:80=
>>>
>>> ... where "1234" is the pid of the stuck psql process. In a psql waiting
>>> for command line input I see it blocked in the kernel routine "n_tty_read"
>>> for example.
>>>
>>>
>>> If you really want to know what it's doing you can also attach gdb and
>>> get a backtrace to see what code it's paused in inside psql:
>>>
>>> gdb -q -p 1234 <<__END__
>>> bt
>>> q
>>> __END__
>>>
>>> If you get a message about "missing debuginfos", lots of lines reading
>>> "no debugging symbols found" or lots of lines ending in "?? ()" then you
>>> need to install debug symbols. How to do that depends on your OS/distro so
>>> I won't go into that; it's documented on the PostgreSQL wiki under "how to
>>> get a stack trace" but you probably won't want to bother if this is just
>>> for curiosity's sake.
>>>
>>> You're looking for output that looks like:
>>>
>>> #1  0x000000369d22a131 in rl_getc () from /lib64/libreadline.so.6
>>> #2  0x000000369d22a8e9 in rl_read_key () from /lib64/libreadline.so.6
>>> #3  0x000000369d215b11 in readline_internal_char () from
>>> /lib64/libreadline.so.6
>>> #4  0x000000369d216065 in readline () from /lib64/libreadline.so.6
>>>
>>> ... etc ...
>>>
>>>
>>> --
>>> Craig Ringer
>>>
>>
>>
>>
>> --
>> Tamanna Madaan | Associate Consultant | GlobalLogic Inc.
>> Leaders in Software R&D Services
>> ARGENTINA | CHILE | CHINA | GERMANY | INDIA | ISRAEL | UKRAINE | UK | USA
>>
>> Office: +0-120-406-2000 x 2971
>>
>> www.globallogic.com
>>
>>
>>
>
>
> --
> Tamanna Madaan | Associate Consultant | GlobalLogic Inc.
> Leaders in Software R&D Services
> ARGENTINA | CHILE | CHINA | GERMANY | INDIA | ISRAEL | UKRAINE | UK | USA
>
> Office: +0-120-406-2000 x 2971
>
> www.globallogic.com
>
>
>


-- 
Tamanna Madaan | Associate Consultant | GlobalLogic Inc.
Leaders in Software R&D Services
ARGENTINA | CHILE | CHINA | GERMANY | INDIA | ISRAEL | UKRAINE | UK | USA

Office: +0-120-406-2000 x 2971

www.globallogic.com

Reply via email to