I'm sorry, no. It's on Ubuntu 11.10... I was looking into grabbing a copy of the SUSE community DVD ISO the other night - would this come with all the necessary packages for setting up Pacemaker/Corosync + OCFS2? If nothing else I'd be happy to see if I could replicate the issue consistently, and on at least two distributions.
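For anyone trying to reproduce the log flood described in the quoted message below, a rough way to quantify it is to total the checkpoint errors in a syslog excerpt, including the "last message repeated N times" summaries. The following Python sketch is not part of the original thread. The function name count_checkpoint_errors is hypothetical, and the sketch assumes each syslog entry sits on one unwrapped line and that a repeat summary refers to the entry immediately before it.

```python
import re

# Message text taken from the quoted syslog excerpt.
CHECKPOINT_ERROR = "Unable to open checkpoint"
# syslogd's summary line for suppressed duplicates.
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_checkpoint_errors(lines):
    """Total the checkpoint errors in an iterable of syslog lines.

    A "repeated N times" summary is added to the total only when the
    entry just before it was a checkpoint error (an assumption about
    how the summaries should be attributed).
    """
    total = 0
    last_was_checkpoint = False
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            if last_was_checkpoint:
                total += int(repeat.group(1))
            continue  # a summary does not change which entry was "last"
        last_was_checkpoint = CHECKPOINT_ERROR in line
        if last_was_checkpoint:
            total += 1
    return total


if __name__ == "__main__":
    import sys

    # Usage: python count_errors.py < /var/log/syslog
    print(count_checkpoint_errors(sys.stdin))
```

Comparing the total against the time span of the excerpt gives a messages-per-second figure that can be checked across nodes or distributions.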

On 5/15/2012 8:34 PM, Andrew Beekhof wrote:
> Is this on SLES by any chance?
> SUSE are about the only ones with knowledge in this area I'm afraid.
>
> On Tue, May 15, 2012 at 6:01 AM, Matthew O'Connor <m...@ecsorl.com> wrote:
>> Hi!
>>
>> I ran into the issue of ocfs2_controld.pcmk consuming vast CPU again -
>> twice, actually. The most recent happenstance was after a multi-node
>> failure. One node stayed alive, two nodes had to be rebooted. After
>> the reboots, one of the two came back without issue, and was able to
>> mount the OCFS2 stores. The second node exhibited high-cpu usage on the
>> ocfs2_controld.pcmk process, and could not mount the OCFS2 stores. The
>> logs were being voraciously filled with the following message:
>>
>> ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
>>
>> This message was being output so frequently that syslogd was starting to
>> rate-limit it. I suspect this accounts for the high CPU usage. After
>> restarting the troubled node several times, I found the solution was to
>> order the OCFS2/DLM resource group to stop, cluster-wide, and then
>> restart it. Normal behavior followed. (In a prior post to the list, I
>> referenced hard-killing the ocfs2_controld.pcmk process. This was a
>> more graceful shutdown.)
>>
>> Attached are two strace outputs. I'm sorry I'm not very familiar with
>> strace, so the value of these files may be questionable. If there is
>> anything else I can provide the next time this happens, I'd be happy to
>> do so! The log-f.txt file was generated with the -f option, and the
>> log-fc.txt file was generated with -f -c.
>>
>> Here also is a snippet from the syslog, during the cluster-wide shutdown
>> of the OCFS2/DLM group:
>>
>> May 14 15:22:13 gw05 ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
>> May 14 15:22:14 ocfs2_controld: last message repeated 199 times
>> May 14 15:22:15 gw05 o2cb[4134]: INFO: Stopping ocfs2_controld.pcmk
>> May 14 15:22:16 gw05 dlm_controld.pcmk: [3411]: notice: terminate_ais_connection: Disconnecting from AIS
>> May 14 15:22:16 gw05 lrmd: [2993]: info: RA output: (p_dlm:2:stop:stderr) dlm_controld.pcmk: no process found
>> May 14 15:22:19 gw05 ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
>> May 14 15:22:20 ocfs2_controld: last message repeated 199 times
>> May 14 15:22:25 gw05 ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
>> May 14 15:22:26 ocfs2_controld: last message repeated 199 times
>> May 14 15:22:31 gw05 ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
>> May 14 15:22:32 ocfs2_controld: last message repeated 199 times
>> May 14 15:22:37 gw05 ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
>> May 14 15:22:38 ocfs2_controld: last message repeated 199 times
>>
>> One other interesting bit of log (well, to me), was this bit that
>> occurred when I tried to manually mount the OCFS2 store on the afflicted
>> server:
>>
>> mount.ocfs2: Unable to access cluster service while trying to join the group
>>
>> One other note - I discovered I had not specified a monitor for either
>> the pacemaker:o2cb or the pacemaker:controld RA. Could that have
>> possibly triggered this issue?
>>
>> --
>>
>> Sincerely,
>> Matthew O'Connor
>>
>> -----------------------------------------------------------------
>> Sr. Software Engineer
>> PGP/GPG Key: 0x55F981C4
>> Fingerprint: E5DC A0F8 5A40 E4DA 2CE6 B5A2 014C 2CBF 55F9 81C4
>>
>> Engineering and Computer Simulations, Inc.
>> 11825 High Tech Ave Suite 250
>> Orlando, FL 32817
>>
>> Tel: 407-823-9991 x315
>> Fax: 407-823-8299
>> Email: m...@ecsorl.com
>> Web: www.ecsorl.com
>> -----------------------------------------------------------------
>>
>> CONFIDENTIAL NOTICE: The information contained in this electronic
>> message is legally privileged, confidential and exempt from disclosure
>> under applicable law. It is intended only for the use of the individual
>> or entity named above. If the reader of this message is not the intended
>> recipient, you are hereby notified that any dissemination, distribution
>> or copying of this message is strictly prohibited. If you have received
>> this communication in error, please notify the sender immediately by
>> return e-mail and delete the original message and any copies of it from
>> your computer system. Thank you.
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>

--
Sincerely,
Matthew O'Connor