On Mon, Nov 16, 2009 at 4:31 PM, Colin <colin....@gmail.com> wrote:
> Hi Andrew,
>
> thanks for your response!
>
> On Mon, Nov 16, 2009 at 3:19 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>> On Thu, Nov 12, 2009 at 4:46 PM, Colin <colin....@gmail.com> wrote:
>>> On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>>
>>> 5) The log message "cib: [2941]: debug: cib_remote_listen: New
>>> clear-text connection" should include from where the connection came.
>>
>> why and how?
>
> Why: It's like "file not found" without the info which file wasn't
> found ... perhaps it's just me, but I would like to see the source IP
> and port of the connection.
>
> How: You're probably not asking me how to implement the feature, so
> I'm assuming that you misunderstood what exactly I was asking for(?).
No, I'm saying that I'm pretty sure we don't have access to the IP
information.

>
>>> 6) The log message "cib: [2941]: ERROR: cib_remote_listen: User is not
>>> a member of the required group" might mention which user and which
>>> group...
>>
>> it doesn't do so for security reasons
>
> Hm.
>
> Security? I see, that's when you use unencrypted remote syslogging --
> anybody already on the machine could just use ps(1).
>
> How about logging it in the ERROR messages, but only when
> debug-logging is enabled?

No, because then I'll get confused emails from people wondering why
there is a stream of ERRORs in the logs.

>
>>> 8) Just tried with crm_resource: The password prompt when not setting
>>> CIB_password is sent to stdout, rather than stderr [which makes it
>>> near impossible to send the output someplace].
>>
>> we can probably change that
>
> That'd be great, also because the new behaviour would be more in-line
> with what many other command line programs do...
>
>>> 9) I am getting completely bogus results via the remote connection,
>>> e.g. "crm_resource --list" shows only 2 of 8 resources, and shows them
>>> as stopped, whereas on the cluster nodes I see the -- correct -- list
>>> with 8 resources which are all started. With "cibadmin -Q" I get:
>>>
>>> # cibadmin -Q | wc # on a cluster node
>>> 379 1895 50474
>>>
>>> # cibadmin -Q | wc # via the remote connection
>>> cibadmin: Opened connection to 192.168.80.10:6900
>>> 66 193 4731
>>
>> someone else mentioned that, i've not been able to reproduce it yet.
>
> Weird. I'm using the precompiled Debian packages for Pacemaker 1.0.6
> with Corosync. Anything that might help debug the problem?

add more hours to the day?
:)

>
> r...@cluster1:~# tail -f /var/log/daemon.log
> Nov 16 15:53:33 cluster1 cib: [24749]: debug: cib_remote_listen: New
> clear-text connection
> Nov 16 15:53:34 cluster1 cib: [24749]: info: log_data_element:
> cib_remote_listen: Login: <cib_command op="authenticate"
> user="hacluster" password="*****" hidden="password" />
> Nov 16 15:53:34 cluster1 cib: [24749]: debug: cib_remote_listen: New
> clear-text connection
> Nov 16 15:53:35 cluster1 cib: [24749]: info: log_data_element:
> cib_remote_listen: Login: <cib_command op="authenticate"
> user="hacluster" password="*****" hidden="password" />
> Nov 16 15:53:35 cluster1 corosync[7426]: [TOTEM ] mcasted message
> added to pending queue
> [... more corosync messages ...]
> Nov 16 15:53:35 cluster1 corosync[7426]: [TOTEM ] releasing messages
> up to and including 48a
> Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_remote_msg: Empty reply
> Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_plaintext:
> Error receiving message: -1: Connection reset by peer (104)
> Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_remote_msg: Empty reply
> ^C
> r...@cluster1:~# cibadmin -Q | wc
> 382 1943 51825
> r...@cluster1:~#
>
> r...@admin:~# cibadmin -Q > cib.xml
> cibadmin: Opened connection to 192.168.80.10:6900
> r...@admin:~# wc cib.xml
> 86 255 6379 cib.xml
> r...@admin:~#
>
>>> 10) It's very easy to trash the cib process, e.g. by connecting via
>>> telnet and sending a few bytes of garbage; result is an endless loop
>>> of "cib: [7846]: ERROR: cib_recv_remote_msg: Empty reply" messages,
>>> one per second, and that I need to "killall -9 cib" in order to get
>>> everything working again.
>>
>> ok, thats not good.
>> I think this patch should fix it though:
>>
>> diff -r 828b3329a64c cib/remote.c
>> --- a/cib/remote.c Fri Nov 06 16:28:21 2009 +0100
>> +++ b/cib/remote.c Mon Nov 16 15:18:41 2009 +0100
>> @@ -220,7 +220,7 @@ cib_remote_listen(int ssock, gpointer da
>>      }
>>
>>      do {
>> -        crm_debug_2("Iter: %d", lpc++);
>> +        crm_debug_2("Iter: %d", lpc);
>>          if(ssock == remote_tls_fd) {
>>  #ifdef HAVE_GNUTLS_GNUTLS_H
>>              login = cib_recv_remote_msg(session, TRUE);
>> @@ -230,7 +230,7 @@ cib_remote_listen(int ssock, gpointer da
>>          }
>>          sleep(1);
>>
>> -    } while(login == NULL && lpc < 10);
>> +    } while(login == NULL && ++lpc < 10);
>>
>>      crm_log_xml_info(login, "Login: ");
>>      if(login == NULL) {
>
> Thanks, since we have been using precompiled packages I haven't
> actually gone through the exercise of compiling Pacemaker, so it might
> take some time before I get around to testing this patch...
>
> Regards, Colin
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
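
For anyone reading the patch above without the source at hand, here is a rough standalone sketch of the loop change. It rests on an assumption the thread does not confirm: that the logging macro skips evaluating its arguments when its log level is switched off, so the old "lpc++" inside the log call never ran and the loop never terminated. The names debug_log, debug_enabled, recv_login, attempts_patched and attempts_unpatched are made up for illustration and are not Pacemaker code, and the sleep(1) from the real loop is left out so the demo finishes quickly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a level-gated logging macro. This is NOT
 * Pacemaker's crm_debug_2; it only models the assumed behaviour that the
 * arguments are evaluated solely when debug logging is enabled. */
static int debug_enabled = 0;
#define debug_log(fmt, ...) \
    do { if (debug_enabled) fprintf(stderr, fmt "\n", __VA_ARGS__); } while (0)

/* Hypothetical receive function: always fails, like a peer sending garbage. */
static const char *recv_login(void)
{
    return NULL;
}

/* Patched shape: the counter is advanced in the loop condition, so the
 * loop is bounded regardless of the logging level. Returns the number of
 * iterations that ran. */
static int attempts_patched(int max_attempts)
{
    const char *login = NULL;
    int lpc = 0;
    int iterations = 0;

    assert(max_attempts > 0);
    do {
        debug_log("Iter: %d", lpc);
        login = recv_login();
        iterations++;
    } while (login == NULL && ++lpc < max_attempts);
    return iterations;
}

/* Original shape: the counter is advanced inside the log call. safety_cap
 * exists only to keep this demo finite; the real loop had no such cap. */
static int attempts_unpatched(int max_attempts, int safety_cap)
{
    const char *login = NULL;
    int lpc = 0;
    int iterations = 0;

    assert(max_attempts > 0 && safety_cap > 0);
    do {
        debug_log("Iter: %d", lpc++);   /* side effect lost when logging is off */
        login = recv_login();
        iterations++;
    } while (login == NULL && lpc < max_attempts && iterations < safety_cap);
    return iterations;
}
```

With debug_enabled left at 0, attempts_patched(10) stops after ten tries, while attempts_unpatched(10, 1000) only stops because of the artificial cap. Setting debug_enabled to 1 makes the unpatched shape stop after ten tries as well, which would be consistent with the hang only showing up when that debug level is off.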