Re: [389-users] Skipped request ...

Rich Megginson Fri, 14 May 2010 07:23:36 -0700

Reinhard Nappert wrote:
> Actually, in that case I get:
> [14/May/2010:10:12:43 -0400] - slapd_poll(81) timed out
> [14/May/2010:10:12:44 -0400] - slapd_poll(84) timed out
> [14/May/2010:10:12:44 -0400] - slapd_poll(65) timed out
>
> In errors
>   
That's what I would expect to see - that means the server timed out 
trying to send results to the client.
> -Reinhard 
>
> -----Original Message-----
> From: 389-users-boun...@lists.fedoraproject.org 
> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of Reinhard 
> Nappert
> Sent: Friday, May 14, 2010 10:08 AM
> To: General discussion list for the 389 Directory server project.
> Subject: Re: [389-users] Skipped request ...
>
> Setting it to 5000 seconds makes me to get past those 257 searches, but only 
> to get to 367. At that time the client looses connection to the server, 
> although it is still running and replying to other requests
>
> -Reinhard
>
> BTW No change, when I set it to 1000
>  
>
> -----Original Message-----
> From: 389-users-boun...@lists.fedoraproject.org 
> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of Rich Megginson
> Sent: Friday, May 14, 2010 9:59 AM
> To: General discussion list for the 389 Directory server project.
> Subject: Re: [389-users] Skipped request ...
>
> Reinhard Nappert wrote:
>   
>> Where do you check/configure
>>
>> Ioblocktimeout
>>   
>>     
> nsslapd-ioblocktimeout in cn=config in the dse.ldif If it is not set in your 
> dse.ldif, that means the server is using the default value of 1800000 
> milliseconds, which is probably too much for your application.  Try setting a 
> much, much smaller value, like 5000 (5 seconds).
>   
>> -Reinhard
>>
>> -----Original Message-----
>> From: 389-users-boun...@lists.fedoraproject.org
>> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of Rich 
>> Megginson
>> Sent: Thursday, May 13, 2010 10:58 PM
>> To: General discussion list for the 389 Directory server project.
>> Subject: Re: [389-users] Skipped request ...
>>
>> Reinhard Nappert wrote:
>>   
>>     
>>>  
>>> Rich,
>>>
>>> Let's see if this give you a hint. Hopefully, it does ....
>>>   
>>>     
>>>       
>> Looks really strange, as if the stack trace is corrupted e.g.
>>
>> #7  0x00002b5577b32fbb in plugin_call_plugins (pb=0xa8ad30, 
>> whichfunction=410)
>>     at ../ldap/servers/slapd/plugin.c:326
>> #8  0x00002b5577b3e0fb in flush_ber (pb=0x1, conn=0x2b55789faa49, 
>> op=0xa8ad30, 
>>     ber=0x2, type=2) at ../ldap/servers/slapd/result.c:1500
>>
>> flush_ber() line 1500 is a call to PR_Lock().  It looks like many, many 
>> threads attempting to write to the client, but the client is simply not 
>> reading the reply, or there is some sort of network problem (are you using 
>> some sort of IP failover/load balancing device?).
>>
>> What is your ioblocktimeout?
>>   
>>     
>>> Thanks,
>>> -Reinhard
>>>
>>>
>>> -----Original Message-----
>>> From: 389-users-boun...@lists.fedoraproject.org
>>> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of Rich 
>>> Megginson
>>> Sent: Thursday, May 13, 2010 6:48 PM
>>> To: General discussion list for the 389 Directory server project.
>>> Subject: Re: [389-users] Skipped request ...
>>>
>>> Reinhard Nappert wrote:
>>>   
>>>     
>>>       
>>>> How can I make ns-slapd to produce a core.
>>>>   
>>>>     
>>>>       
>>>>         
>>> You first have to start ns-slapd in an environment that you have done 
>>> ulimit -c unlimited then kill -11 <ns-slapd pid>
>>>   
>>>     
>>>       
>>>> I got it in the same state again, and I did a gdb 
>>>> /opt/UMC/jdb/sbin/ns-slapd 16712
>>>>
>>>> 0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
>>>> (gdb) where
>>>> #0  0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
>>>> #1  0x00002b8e906b6a5f in PR_Poll () from 
>>>> /opt/UMC/jdb/lib/dirsrv/libnspr4.so
>>>> #2  0x0000000000415ae7 in slapd_daemon (ports=0x7fff1b44cf50)
>>>>     at ../ldap/servers/slapd/daemon.c:662
>>>> #3  0x000000000041c0b3 in main (argc=7, argv=0x7fff1b44d098)
>>>>     at ../ldap/servers/slapd/main.c:1162
>>>>
>>>> Does this help?
>>>>   
>>>>     
>>>>       
>>>>         
>>> Try this:
>>> (gdb) set logging file /tmp/slapd.txt
>>> (gdb) set logging on
>>> (gdb) thread apply all bt
>>> # hit return as many times as needed
>>> (gdb) quit
>>>
>>> Then send me /tmp/slapd.txt
>>>   
>>>     
>>>       
>>>> -Reinhard
>>>>
>>>> -----Original Message-----
>>>> From: 389-users-boun...@lists.fedoraproject.org
>>>> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of Rich 
>>>> Megginson
>>>> Sent: Thursday, May 13, 2010 6:04 PM
>>>> To: General discussion list for the 389 Directory server project.
>>>> Subject: Re: [389-users] Skipped request ...
>>>>
>>>> Reinhard Nappert wrote:
>>>>   
>>>>     
>>>>       
>>>>         
>>>>> Hi Rick,
>>>>>
>>>>> I attached access and error file with debug level 8. The server does not 
>>>>> respond to any requests anymore. If you kill the client, it responds 
>>>>> afterwards.
>>>>>
>>>>> Let me know, what you see.
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>> I don't see anything obvious.  One thing I do know is that this code has 
>>>> been improved since 1.1.2 (especially the debugging, which not very 
>>>> usefully prints the file descriptor addresses in int format :P)  I don't 
>>>> suppose you could try to reproduce this with 1.2.5?
>>>>   
>>>>     
>>>>       
>>>>         
>>>>> Thanks,
>>>>> -Reinhard
>>>>>
>>>>> -----Original Message-----
>>>>> From: 389-users-boun...@lists.fedoraproject.org
>>>>> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of 
>>>>> Rich Megginson
>>>>> Sent: Thursday, May 13, 2010 1:10 PM
>>>>> To: General discussion list for the 389 Directory server project.
>>>>> Subject: Re: [389-users] Skipped request ...
>>>>>
>>>>> Reinhard Nappert wrote:
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>>>> Rich, which debugging level do you suggest? Apparently, I tried to much, 
>>>>>> because it would crash the server constantly.
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> Debugging levels should not crash the server - can provide more 
>>>>> information about the crash?
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>>>> For now, I go just with 8 (Connection Management). Seeing the problem, 
>>>>>> what would you enable?
>>>>>>   
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> Yes, start with 8.
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>>>> Thanks,
>>>>>> -Reinhard
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: 389-users-boun...@lists.fedoraproject.org
>>>>>> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of 
>>>>>> Rich Megginson
>>>>>> Sent: Wednesday, May 12, 2010 6:50 PM
>>>>>> To: General discussion list for the 389 Directory server project.
>>>>>> Subject: Re: [389-users] Skipped request ...
>>>>>>
>>>>>> Reinhard Nappert wrote:
>>>>>>   
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>>>> Hi Rich,
>>>>>>>
>>>>>>> I ran some further tests. This entire thing looks kind of weird. I have 
>>>>>>> a kind of monitoring tool, I use to figure out if the server still 
>>>>>>> responds in a timely manner. This tool performs an anonymous bind and 
>>>>>>> reads a specific object, every 30 seconds.
>>>>>>>     
>>>>>>>       
>>>>>>>         
>>>>>>>           
>>>>>>>             
>>>>>>>               
>>>>>> Does it perform an unbind operation?  Does it disconnect the socket?
>>>>>>   
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>>>>  What I see is that the server responds to the incoming request and it 
>>>>>>> performs about 500 requests within those 30 seconds. Then, I see, when 
>>>>>>> the next monitoring connection request comes is, but I never see the 
>>>>>>> bind. Since this times out, the monitoring tool restarts the server 
>>>>>>> after a while (about 10 seconds).
>>>>>>>
>>>>>>> Here are the logs in access:
>>>>>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection 
>>>>>>> from
>>>>>>> 127.0.0.1 to 127.0.0.1
>>>>>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection 
>>>>>>> from
>>>>>>> 10.227.6.45 to 10.227.6.53
>>>>>>>
>>>>>>> So, you see the server does not respond to any requests after 
>>>>>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection 
>>>>>>> from
>>>>>>> 127.0.0.1 to 127.0.0.1
>>>>>>>
>>>>>>> And start responding, once it was restarted:
>>>>>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection 
>>>>>>> from
>>>>>>> 10.227.6.45 to 10.227.6.53
>>>>>>>
>>>>>>> I was wondering , if we could get somehow some debugging out of 
>>>>>>> ns-slapd, once it is in this state (truss or something else).
>>>>>>>   
>>>>>>>     
>>>>>>>       
>>>>>>>         
>>>>>>>           
>>>>>>>             
>>>>>>>               
>>>>>> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
>>>>>> If that produces too much error log output, or kills the 
>>>>>> performance, you can also try replacing the error log with a named
>>>>>> pipe+script -
>>>>>> http://directory.fedoraproject.org/wiki/Named_Pipe_Log_Script
>>>>>> man ds-logpipe.py
>>>>>>   
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>>>> Any help is appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Reinhard
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: 389-users-boun...@lists.fedoraproject.org
>>>>>>> [mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of 
>>>>>>> Rich Megginson
>>>>>>> Sent: Tuesday, May 11, 2010 5:21 PM
>>>>>>> To: General discussion list for the 389 Directory server project.
>>>>>>> Subject: Re: [389-users] Skipped request ...
>>>>>>>
>>>>>>> Reinhard Nappert wrote:
>>>>>>>   
>>>>>>>     
>>>>>>>       
>>>>>>>         
>>>>>>>           
>>>>>>>             
>>>>>>>               
>>>>>>>> Hi all,
>>>>>>>>  
>>>>>>>> I have seen a weird behavior of my DS (1.1.2). It has a very 
>>>>>>>> small database (only about 2300 objects). A client performed a 
>>>>>>>> one-level search retrieving the children. The server find 114 
>>>>>>>> objects, but the search was very slow:
>>>>>>>>  
>>>>>>>> [06/May/2010:12:23:11 +0000] conn=127 op=149 SRCH base=<base>
>>>>>>>> scope=1 
>>>>>>>> filter="(&(&(objectClass=<xyz>)(<att1>=value))(!(<att2>=TRUE)))"
>>>>>>>>  
>>>>>>>> yes, the filter is a bit complex, but both attribute types 
>>>>>>>> <att1> and <att2> are indexed. This search usually is fast. It 
>>>>>>>> looks to me that the server is already in a funny state.
>>>>>>>> ...
>>>>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=149 RESULT err=3
>>>>>>>> tag=101
>>>>>>>> nentries=114 etime=7
>>>>>>>>     
>>>>>>>>       
>>>>>>>>         
>>>>>>>>           
>>>>>>>>             
>>>>>>>>               
>>>>>>>>                 
>>>>>>> err=3 is TIMELIMIT_EXCEEDED - that's probably why you aren't getting 
>>>>>>> all of the results you expect, and could be why it's skipping the op.
>>>>>>>   
>>>>>>>     
>>>>>>>       
>>>>>>>         
>>>>>>>           
>>>>>>>             
>>>>>>>               
>>>>>>>>  
>>>>>>>> When the client gets the results, it iterates over those and 
>>>>>>>> gets its children, like:
>>>>>>>>  
>>>>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 SRCH base=<dn of 
>>>>>>>> result from previous SRCH> scope=1 
>>>>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" 
>>>>>>>> attrs=ALL.
>>>>>>>> Those searches are quick:
>>>>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 RESULT err=0
>>>>>>>> tag=101
>>>>>>>> nentries=1 etime=0
>>>>>>>>  
>>>>>>>> but somehow the server does not process on of the requests, when 
>>>>>>>> the client iterates over the results:
>>>>>>>>  
>>>>>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn of 
>>>>>>>> result from previous SRCH> scope=1 
>>>>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" 
>>>>>>>> attrs=ALL.
>>>>>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0
>>>>>>>> tag=101
>>>>>>>> nentries=1 etime=0
>>>>>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn of 
>>>>>>>> result from previous SRCH> scope=1 
>>>>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" 
>>>>>>>> attrs=ALL.
>>>>>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0
>>>>>>>> tag=101 nentries=0 etime=0 You can see that the server skipped 
>>>>>>>> op=264. It looks to me that the request came in, but somehow the 
>>>>>>>> server joked up, before it could log the request in access.
>>>>>>>>  
>>>>>>>> Has anybody seen such a behavior before?
>>>>>>>>  
>>>>>>>> Thanks,
>>>>>>>> -Reinhard
>>>>>>>>
>>>>>>>>  
>>>>>>>> ----------------------------------------------------------------
>>>>>>>> -
>>>>>>>> -
>>>>>>>> -
>>>>>>>> -
>>>>>>>> --
>>>>>>>> --
>>>>>>>>
>>>>>>>> --
>>>>>>>> 389 users mailing list
>>>>>>>> 389-us...@lists.fedoraproject.org 
>>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>>>>     
>>>>>>>>       
>>>>>>>>         
>>>>>>>>           
>>>>>>>>             
>>>>>>>>               
>>>>>>>>                 
>>>>>>> --
>>>>>>> 389 users mailing list
>>>>>>> 389-us...@lists.fedoraproject.org 
>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>>> --
>>>>>>> 389 users mailing list
>>>>>>> 389-us...@lists.fedoraproject.org 
>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>>>   
>>>>>>>     
>>>>>>>       
>>>>>>>         
>>>>>>>           
>>>>>>>             
>>>>>>>               
>>>>>> --
>>>>>> 389 users mailing list
>>>>>> 389-us...@lists.fedoraproject.org
>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>> --
>>>>>> 389 users mailing list
>>>>>> 389-us...@lists.fedoraproject.org
>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>>   
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> --
>>>>> 389 users mailing list
>>>>> 389-us...@lists.fedoraproject.org
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>   
>>>>> -------------------------------------------------------------------
>>>>> -
>>>>> -
>>>>> -
>>>>> --
>>>>>
>>>>> --
>>>>> 389 users mailing list
>>>>> 389-us...@lists.fedoraproject.org
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>> --
>>>> 389 users mailing list
>>>> 389-us...@lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>> --
>>>> 389 users mailing list
>>>> 389-us...@lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>   
>>>>     
>>>>       
>>>>         
>>> --
>>> 389 users mailing list
>>> 389-us...@lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>   
>>> ---------------------------------------------------------------------
>>> -
>>> --
>>>
>>> --
>>> 389 users mailing list
>>> 389-us...@lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>     
>>>       
>> --
>> 389 users mailing list
>> 389-us...@lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>> --
>> 389 users mailing list
>> 389-us...@lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>   
>>     
>
> --
> 389 users mailing list
> 389-us...@lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
> --
> 389 users mailing list
> 389-us...@lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
> --
> 389 users mailing list
> 389-us...@lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>


--
389 users mailing list
389-us...@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users

Re: [389-users] Skipped request ...

Reply via email to