Re: [389-users] 389 directory server crash

Rich Megginson Wed, 17 Jul 2013 08:38:05 -0700

On 07/17/2013 01:52 AM, Mitja Mihelič wrote:

On 07/16/2013 04:49 PM, Rich Megginson wrote:
On 07/16/2013 01:23 AM, Mitja Mihelič wrote:
On 07/15/2013 05:28 PM, Rich Megginson wrote:
On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS instances. They crash after receiving an update from the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed for the first time, the last thing replicated was a password change. Do you perhaps know where I could get a 389DS version for CentOS 6 that has the patch? The ticket says it was pushed to 1.2.11, but it would seem that our 1.2.11.15-14 is still an unpatched one and the repositories do not have any newer versions.
Is that the 389-ds-base that is included with CentOS6?
Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the official CentOS 6 updates repository.
389-ds-base-debuginfo is from http://debuginfo.centos.org/6/
The rest are from epel.
Looking at the stack trace you sent earlier - there is only 1 thread? You ran

gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1

? If so, I have no idea what's going on - I've never seen the server deadlock itself with only 1 thread . . .
I ran
gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof -o 49171 ns-slapd` > stacktrace.`date +%s`.txt 2>&1
The "-o 49171" is to exclude the pid of the config server instance, so only the problematic pid was looked at.
If you get any more information regarding this crash it would be very much appreciated.
It may be best if I removed all 389DS related data from both of the consumer servers and started fresh. If they crash again I will send the relevant stack traces.


Yes, that sounds good.
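
For reference, the gdb command quoted above attaches to whatever pid `pidof` prints, so on a host that runs more than one ns-slapd instance it helps to confirm that exactly one pid is selected before attaching. The following is a minimal sketch, assuming a POSIX shell; `select_single_pid` and `capture_stacktrace` are hypothetical helper names introduced only for illustration and are not part of 389-ds or gdb.

```shell
#!/bin/sh
# Sketch only: both function names below are illustrative assumptions.

# Print the pid if exactly one was passed in; otherwise complain and fail.
select_single_pid() {
    if [ "$#" -ne 1 ]; then
        echo "expected exactly one ns-slapd pid, got $#: $*" >&2
        return 1
    fi
    echo "$1"
}

# Run the gdb command from this thread against a single pid and
# print the name of the file the backtrace was written to.
capture_stacktrace() {
    pid="$1"
    out="stacktrace.$(date +%s).txt"
    gdb -ex 'set confirm off' -ex 'set pagination off' \
        -ex 'thread apply all bt full' -ex 'quit' \
        /usr/sbin/ns-slapd "$pid" > "$out" 2>&1
    echo "$out"
}
```

With these defined, something like pid=$(select_single_pid $(pidof -o 49171 ns-slapd)) && capture_stacktrace "$pid" would refuse to run gdb when pidof returns zero or several pids, where 49171 is the config server pid mentioned above.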

The crash happened twice after about a week of running without problems. The crashes happened on two consumer servers but not at the same time.
The servers are running CentOS 6x with the following 389DS packages installed:
389-ds-console-doc-1.2.6-1.el6.noarch
389-console-1.1.7-1.el6.noarch
389-adminutil-1.1.15-1.el6.x86_64
389-dsgw-1.1.10-1.el6.x86_64
389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64
389-admin-1.1.29-1.el6.x86_64
389-ds-console-1.2.6-1.el6.noarch
389-admin-console-doc-1.1.8-1.el6.noarch
389-ds-1.2.2-1.el6.noarch
389-ds-base-1.2.11.15-14.el6_4.x86_64
389-ds-base-libs-1.2.11.15-14.el6_4.x86_64
389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x based consumer+provider setup with a CentOS 6x based one. For the time being, the CentOS 6 machines are acting as consumers for the old server. They run for a while and then the replicated instances crash, though not at the same time.
One of the servers did not want to start after the crash,
Can you provide the error messages from the errors log?
I have attached error logs from the provider (2013-06-27-provider_error) and the consumer (2013-06-27-server_two_error) in question.
so I have run db2index on its database. It's been running for four days and it has still not finished.
Try exporting using db2ldif, then importing using ldif2db.
The export process hangs. After an hour strace still shows:
futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL
The error log for this is attached as 2013-07-10-server_two-ldif_import_hangs.
Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is the server running? If so, please try first shutting down the server and then running db2ldif.
If db2ldif still hangs, then please follow the instructions at http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it again and it hung. The LDIF file was created but its size was zero. The produced stack trace is attached as server_two-db2ldif_hang-stacktrace.1373877200.txt.
All I get from db2index now are these outputs:
[09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number revolves around 65400. It varies by perhaps 2 userdel/add operations a month and 20 attribute changes per week, if that.
The other instance did start up, but the replication process did not work anymore. I disabled the replication to this host and set it up again. I chose "Initialize consumer now" and the consumer crashed every time.
Can you provide a stack trace of the core when the server crashes? This may be different than the stack trace below.
The last provided stack trace was produced at the last server crash. I will provide another stack trace when CONSUMER_ONE crashes again. Currently it refuses to crash at initialization time and keeps running.
I have enabled full error logging and could find nothing.
I have read a few threads (not all, I admit) on this list and http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and tried to troubleshoot.
The crash produced the attached core dump and I could use your help with understanding it, as well as any help with the crash. If more info is needed I will gladly provide it.
Regards, Mitja



--
389 users mailing list
389-us...@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users

