----- Original Message -----
> From: "David Vossel" <dvos...@redhat.com>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Sent: Thursday, May 23, 2013 11:21:33 PM
> Subject: Re: [Pacemaker] pacemaker-remote tls handshaking
>
> ----- Original Message -----
> > From: "Lindsay Todd" <rltodd....@gmail.com>
> > To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> > Sent: Thursday, May 23, 2013 4:35:02 PM
> > Subject: Re: [Pacemaker] pacemaker-remote tls handshaking
> >
> > Working on this problem further...
> >
> > On Tue, May 21, 2013 at 5:14 PM, David Vossel <dvos...@redhat.com> wrote:
> > > I'd suggest this. Try running the pacemaker_remote regression test and
> > > see what happens. This will start up an instance of pacemaker_remote
> > > locally and issue client commands to it to test both the TLS connection
> > > and the ability to start/stop/monitor services.
> > >
> > > /usr/share/pacemaker/tests/lrmd/regression.py -R
> >
> > But sadly SL 6.4 doesn't have the systemctl commands this is trying to
>
> oops
>
> > use. (Also I am building RPMs and installing those, the lrmd
> > regression tests aren't included in pacemaker-cts.
>
> another oops
>
> > No problem, I ran directly from the build directory.) It doesn't seem to
> > make much progress. The stdout is:
> >
> > sh: systemctl: command not found
> > sh: /lib/systemd/system/lrmd_dummy_daemon.service: No such file or directory
> > sh: systemctl: command not found
> > Starting ...
> >
> > And the lrmd-regression.log has:
> > Set r/w permissions for uid=496, gid=494 on /tmp/lrmd-regression.log
> > May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: lrmd
> > May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted: notice: lrmd_init_remote_tls_server: Starting a tls listener on port 3121.
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: cib_ro
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: cib_rw
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: cib_shm
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: attrd
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: stonith-ng
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: crmd
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: main: Starting
> >
> > > By default, the connection should retry for 60 seconds after the vm
> > > resource starts. Like you've noticed, this can be extended to account
> > > for vms that take longer to boot.
> >
> > But maybe this should start after the monitor method for the VM first
> > indicates success? Or does it already?
>
> The policy engine has no way of expressing this right now. It would be
> difficult to make this happen. Likely your idea of additional start scripts
> to verify when the VM's network is actually available would be a better
> choice.
>
> > >> There have been a few segfaults of crmd during my testing of this, so
> > >> perhaps there is a memory smash somewhere. (A couple times the failure
> > >> was at remote_lrmd_ra.c:186,
> > >
> > > Please provide gdb backtrace. We need to get this resolved asap before
> > > the release of v.1.1.10 is complete.
> > > I believe there is a new rc in the works already.
> >
> > So I've attached results from a few core dumps. All were triggered
> > using "crm resource cleanup swbuildsl6" where swbuildsl6 is the host
> > name of the VM (that I can still telnet to port 3121).
>
> thanks :)
>
> > >> > I doubt this will make a difference, but here's the key I use during
> > >> > testing,
> > >> > lrmd:ce9db0bc3cec583d3b3bf38b0ac9ff91
> >
> > It makes no difference. I had wondered if the shorter key would matter.
> >
> > Also, I've attached some patches I made to 1.1.10rc3 to try to resolve
> > this problem. So far no success. Some of these add logging; the
> > others fix what looks to me like fishy code with cases that aren't
> > completely handled. With the additional logging, I see these results
> > being logged:
> >
> > May 23 17:06:51 swbuildsl6 pacemaker_remoted[2326]: notice: lrmd_remote_listen: LRMD client connection established. 0x995250 id: df04d8ee-7fcb-4025-8c8f-8a1555a4d097
> > May 23 17:06:53 cvmh02 crmd[18982]: warning: lrmd_tcp_connect_cb: Client tls handshake failed for server swbuildsl6:3121. Disconnecting
> > May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]: error: lrmd_remote_client_msg: Remote lrmd tls handshake failed: -9
> > May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: <unknown> id: df04d8ee-7fcb-4025-8c8f-8a1555a4d097
> >
> > Puzzling -- nothing being logged from
> > crm_initiate_client_tls_handshake -- is there something I need to add
> > to somehow activate the crm_err and crm_info calls?
>
> Well, you've definitely gotten my attention. I tried this on my rhel 6 box
> and sure enough, I'm seeing the exact same thing you're seeing. No worries.
> I'll track this down. I'm sure it has to do with the gnutls version being
> used.
I figured it out. It's a gnutls bug, I believe. The old gnutls library
version doesn't like the way I'm setting the psk credentials (which makes
the handshake fail). I have a workaround I'm implementing now. I'll have a
patch by Tuesday.

-- Vossel

>
> In the meantime, if you want to test this feature, it does work in Fedora
> 18. Thanks for all your work on testing this. Your feedback came just in
> time. We are about to release 1.1.10 soon :)
>
> -- Vossel
>
> > /rlt
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org