Mark,

I haven't yet managed to get verbose output from a client (I didn't want to flood all the clients with it, so I tried to capture it from just one), but I've got something interesting from the server:

    cf3 Loaded /var/cfengine/ppkeys/root-192.168.5.1.pub
    cf3 Loaded /var/cfengine/ppkeys/root-192.168.5.1.pub
    cf3 A public key was already known from foo/192.168.5.1 - no trust required
    cf3 A public key was already known from bar/192.168.5.1 - no trust required
    cf3 Adding IP 192.168.5.1 to SkipVerify - no need to check this if we have a key
    cf3 Adding IP 192.168.5.1 to SkipVerify - no need to check this if we have a key
    cf3 The new public key does not match the old one! Spoofing attempt!
    cf3 The public key identity was confirmed as r...@bar
    cf3 Auth dialogue error
    cf3 From (host=foo,user=root,ip=192.168.5.1)

In fact, 192.168.5.1 belongs to host bar. Host foo has a completely different address, 192.168.8.15. Yet the executor's report with the "BAD: keys did not match" error is sent by host foo, not by host bar! I believe there is some clash in the server's databases that results in the wrong key being loaded. The same hypothesis would also explain the missing IP address in the "Denying connection from non-authorized IP" message. I know this kind of thing is always the hardest to debug (years ago we spent a lot of time on a similar issue in ircd code :), but I hope it is solvable.

2010/5/31 Mark Burgess <mark.burg...@iu.hio.no>:
> Can you provide the output of an exact session that is failing, including the error message from both sides?
>
> ./cf-serverd -v
> ./cf-agent -v (cut the relevant parts)
>
> M
>
> Seva Gluschenko wrote:
>> Perhaps I forgot to mention it, but the obvious problems arose after the number of managed servers increased to 200+. There were no problems when there were only 20 of them.
>>
>> 2010/5/31 Nicolas Charles <charl...@gmail.com>:
>>> To add to that: there were loads of this error on my servers (it doesn't seem to be very common, since not many people have complained about it, and it was quite hard to debug).
>>>
>>> On 31/05/2010 17:24, Seva Gluschenko wrote:
>>>> Mark,
>>>>
>>>> There are no configuration errors.
>>>> cf-promises -v reports "Inputs are valid". In a private conversation, Nicolas told me that there were loads of "Denying connection from non-authorized IP" messages on 3.0.4 due to an issue in the cryptographic code.
>>>>
>>>> Actually, after installing 3.0.5b1 on the policy server, the bad key errors have diminished significantly but have not disappeared completely. I've scheduled all managed servers for a cfengine version upgrade, and we'll see if it helps. But, quite obviously, the empty ipaddr issue can still occur in 3.0.5b1, which can hardly be considered normal.
>>>>
>>>> 2010/5/31 Mark Burgess <mark.burg...@iu.hio.no>:
>>>>> Then look for a configuration error. I doubt there is a problem with Cfengine itself. Use -v on both sides to debug.
>>>>>
>>>>> M
>>>>>
>>>>> Seva Gluschenko wrote:
>>>>>> Mark,
>>>>>>
>>>>>> Both the server and the clients were running cfengine-community-3.0.4p2-1.centos5.i386.rpm.
>>>>>>
>>>>>> Now the server is running cfengine-community-3.0.5-b1.i386.rpm, built from source. I've configured one of the clients for the package update, so we'll see whether it helps. It would be really helpful to have a way to find out the moment when ipaddr becomes empty.
>>>>>>
>>>>>> 2010/5/31 Mark Burgess <mark.burg...@iu.hio.no>:
>>>>>>> Could be - depends what the clients were before. I am not seeing this kind of error. Usually it is a case of an error in the policy that is difficult to see.
>>>>>>>
>>>>>>> M
>>>>>>>
>>>>>>> Seva Gluschenko wrote:
>>>>>>>> Mark,
>>>>>>>>
>>>>>>>> I've just updated to rev. 1027. If you could be so kind as to add the EDEADLK handling to transaction.c (the patch I sent), that would help some Linux users, I believe.
>>>>>>>>
>>>>>>>> At first glance, the message is still there:
>>>>>>>>
>>>>>>>> May 31 16:56:20 xxx cf-serverd[14666]: Denying connection from non-authorized IP
>>>>>>>>
>>>>>>>> Does this mean all clients must be upgraded to 3.0.5b1 as well to get rid of it?
>>>>>>>>
>>>>>>>> 2010/5/31 Mark Burgess <mark.burg...@iu.hio.no>:
>>>>>>>>> Sorry, I patched the threading for something unrelated. Could be a conflict. Try the new svn.
>>>>>>>>>
>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>> Now it seems I like talking to myself ;>
>>>>>>>>>>
>>>>>>>>>> Anyway, I've looked further and found that error code 35 stands for EDEADLK, which the pthread_mutex_trylock manual page authors obviously didn't have in mind, but it does indicate that the mutex is locked. So I've patched transaction.c a bit to handle it. The patch is as follows:
>>>>>>>>>>
>>>>>>>>>> --- src/transaction.c.orig 2010-05-31 15:22:16.317657266 +0400
>>>>>>>>>> +++ src/transaction.c 2010-05-31 15:22:57.226673209 +0400
>>>>>>>>>> @@ -439,7 +439,7 @@
>>>>>>>>>>  
>>>>>>>>>>   status = pthread_mutex_trylock(mutex);
>>>>>>>>>>  
>>>>>>>>>> - if(status != EBUSY)
>>>>>>>>>> + if (status != EBUSY && status != EDEADLK)
>>>>>>>>>>   {
>>>>>>>>>>   CfOut(cf_error, "", "!! The mutex %d was not locked in %s() -- status=%d", name, fname, status);
>>>>>>>>>>   FatalError("Software assertion failure\n");
>>>>>>>>>>
>>>>>>>>>> The new caveat that has been discovered is best illustrated by the following line:
>>>>>>>>>>
>>>>>>>>>> May 31 15:31:25 xxx cf-serverd[19810]: Denying connection from non-authorized IP
>>>>>>>>>>
>>>>>>>>>> As far as I can tell from the sources, the IP address should be listed in this message, so in short we have an IP leak somewhere.
>>>>>>>>>>
>>>>>>>>>> 2010/5/31 Seva Gluschenko <seva.glusche...@gmail.com>:
>>>>>>>>>>> Well, after some research I've finally built the Cfengine RPM with bison. Unfortunately, it didn't work: Cfengine complains on startup:
>>>>>>>>>>>
>>>>>>>>>>> # /etc/init.d/cfengine3 start
>>>>>>>>>>> Starting cfengine3 ...
>>>>>>>>>>> !! The mutex 6 was not locked in PromiseIdExists() -- status=35
>>>>>>>>>>> Fatal cfengine error: Software assertion failure
>>>>>>>>>>> cf-agent was not able to get confirmation of promises from cf-promises, so going to failsafe
>>>>>>>>>>> cf-execd started. [OK]
>>>>>>>>>>> !! The mutex 6 was not locked in PromiseIdExists() -- status=35
>>>>>>>>>>> Fatal cfengine error: Software assertion failure
>>>>>>>>>>> cf-agent was not able to get confirmation of promises from cf-promises, so going to failsafe
>>>>>>>>>>> cf-serverd started. [OK]
>>>>>>>>>>>
>>>>>>>>>>> So I've been forced to roll back to 3.0.4p2, and I'm still getting multiple client connection errors. Any help would be much appreciated.
>>>>>>>>>>>
>>>>>>>>>>> 2010/5/28 Seva Gluschenko <seva.glusche...@gmail.com>:
>>>>>>>>>>>> Thank you for pointing this out. Indeed, bison wasn't installed.
>>>>>>>>>>>>
>>>>>>>>>>>> By the way, copying /usr/bin/libtool didn't help until I also copied ltmain.sh and missing from /usr/share/libtool. Now I've been able to compile cfengine, but the RPM packaging issues still need to be solved. Working on it.
>>>>>>>>>>>>
>>>>>>>>>>>> 2010/5/28 Mark Burgess <mark.burg...@iu.hio.no>:
>>>>>>>>>>>>> Install bison
>>>>>>>>>>>>>
>>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>>> Unfortunately, things went wrong much further than I estimated.
>>>>>>>>>>>>>> After installing from the home directory with a plain make install, I got the following error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cf3:/var/cfengine/inputs/groups.cf:441,12: yacc stack overflow, near token ','
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, my group definitions contain quite long "or" lists, because that was the only way I found to define server groups. Now I've rolled back to cfengine-community 3.0.4p2 from RPM, since it doesn't have yacc stack overflows. Is there any way to increase its stack at build time?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2010/5/28 Mark Burgess <mark.burg...@iu.hio.no>:
>>>>>>>>>>>>>>> Right - also copy libtool from your system into the directory
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cp /usr/bin/libtool .
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and try again (might need aclocal again)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>>>>> Mark,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you for your helpful advice. The following worked:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> aclocal
>>>>>>>>>>>>>>>> automake -a -c
>>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But when I wrote cfengine.spec to build an RPM, the build failed with the following output:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> if /bin/sh ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I/usr/include/db4 -I/usr/include -pthread -g -O2 -Wreturn-type -Wmissing-prototypes -Wuninitialized -pthread -g -O2 -I/usr/include/db4 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -pthread -g -O2 -I/usr/include/db4 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -MT libpromises_la-cf3parse.lo -MD -MP -MF ".deps/libpromises_la-cf3parse.Tpo" -c -o libpromises_la-cf3parse.lo `test -f 'cf3parse.c' || echo './'`cf3parse.c; \
>>>>>>>>>>>>>>>> then mv -f ".deps/libpromises_la-cf3parse.Tpo" ".deps/libpromises_la-cf3parse.Plo"; else rm -f ".deps/libpromises_la-cf3parse.Tpo"; exit 1; fi
>>>>>>>>>>>>>>>> ../libtool: line 466: CDPATH: command not found
>>>>>>>>>>>>>>>> ../libtool: line 1144: func_opt_split: command not found
>>>>>>>>>>>>>>>> libtool: Version mismatch error.  This is libtool 2.2.6, but the
>>>>>>>>>>>>>>>> libtool: definition of this LT_INIT comes from an older release.
>>>>>>>>>>>>>>>> libtool: You should recreate aclocal.m4 with macros from libtool 2.2.6
>>>>>>>>>>>>>>>> libtool: and run autoconf again.
>>>>>>>>>>>>>>>> make[2]: *** [libpromises_la-cf3parse.lo] Error 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any ideas how to get rid of this? It didn't happen with a plain build in the home directory.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2010/5/28 Mark Burgess <mark.burg...@iu.hio.no>:
>>>>>>>>>>>>>>>>> Ah, this is the perennial problem with these snapshots.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Run
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> aclocal
>>>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If that doesn't work, try
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> aclocal
>>>>>>>>>>>>>>>>> automake -a -c
>>>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>>>>>>> Mark,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm having problems building the latest svn on CentOS5. First of all, there's no automake 1.10 available as an RPM, so I've patched the configure script to downgrade the required version to 1.9. Even so, make fails with the following output:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> $ cd . && /bin/sh /tmp/cfengine-3.0.5/missing --run automake-1.9 --gnu
>>>>>>>>>>>>>>>>>> src/Makefile.am:8: Libtool library used but `LIBTOOL' is undefined
>>>>>>>>>>>>>>>>>> src/Makefile.am:8:
>>>>>>>>>>>>>>>>>> src/Makefile.am:8: The usual way to define `LIBTOOL' is to add `AC_PROG_LIBTOOL'
>>>>>>>>>>>>>>>>>> src/Makefile.am:8: to `configure.ac' and run `aclocal' and `autoconf' again.
>>>>>>>>>>>>>>>>>> src/Makefile.am: required file `./compile' not found
>>>>>>>>>>>>>>>>>> WARNING: `automake-1.9' is needed, and you do not seem to have it handy on your
>>>>>>>>>>>>>>>>>>          system.  You might have modified some files without having the
>>>>>>>>>>>>>>>>>>          proper tools for further handling them.  Check the `README' file,
>>>>>>>>>>>>>>>>>>          it often tells you about the needed prerequirements for installing
>>>>>>>>>>>>>>>>>>          this package.  You may also peek at any GNU archive site, in case
>>>>>>>>>>>>>>>>>>          some other package would contain this missing `automake-1.9' program.
>>>>>>>>>>>>>>>>>> make: *** [Makefile.in] Error 1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This happens even though LIBTOOL is defined in configure and present in the tree. I've tried switching to the system-wide libtool, without success. At this point I'm stuck. Is there any chance of getting an early RPM build for CentOS5? We've already had problems with servers that weren't managed until their keys were removed from the master server because of the bad key issue.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2010/5/28 Mark <m...@iu.hio.no>:
>>>>>>>>>>>>>>>>>>> Try the latest svn in case some recent changes could affect this. Just a suggestion.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 27 May 2010, at 13:06, Seva Gluschenko <seva.glusche...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hello folks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> There's an error report that has been appearing regularly since the number of managed servers grew to 100+:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> BAD: keys did not match
>>>>>>>>>>>>>>>>>>>> !! Authentication dialogue with X.X.X.X failed
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm virtually sure that there are no hijacking attempts on my network, so I suppose it happens because of some server limitation.
>>>>>>>>>>>>>>>>>>>> I raised the maxchildren setting in body server control from its initial 1000 to 5000, but it doesn't seem to have any effect. Any ideas?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> SY, Seva Gluschenko.
>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>> Help-cfengine mailing list
>>>>>>>>>>>>>>>>>>>> Help-cfengine@cfengine.org
>>>>>>>>>>>>>>>>>>>> https://cfengine.org/mailman/listinfo/help-cfengine
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Mark Burgess
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -------------------------------------------------
>>>>>>>>>>>>>>>>> Professor of Network and System Administration
>>>>>>>>>>>>>>>>> Oslo University College, Norway
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Personal Web: http://www.iu.hio.no/~mark
>>>>>>>>>>>>>>>>> Office Telf : +47 22453272
>>>>>>>>>>>>>>>>> -------------------------------------------------
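The server log at the top of this message shows the same key file, root-192.168.5.1.pub, being loaded for both foo and bar. This is consistent with the hypothesis that the server looks keys up by user and IP address rather than by host name. The C sketch below illustrates that naming pattern. It assumes the file name is built as <dir>/<user>-<ip>.pub, which is inferred only from the log line and not taken from the cfengine sources. The helpers key_path_for() and same_key_file() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: derive a public-key file name from the user and the
 * peer IP address, mimicking the root-192.168.5.1.pub pattern seen in the
 * server log. The host name (foo/bar) is NOT part of the name.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int key_path_for(char *buf, size_t len, const char *dir,
                        const char *user, const char *ip)
{
    int n = snprintf(buf, len, "%s/%s-%s.pub", dir, user, ip);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if two (user, ip) pairs map to the same
 * key file, 0 if they do not, -1 on error. Which host actually connected
 * plays no role, so a session from foo that is tagged with bar's address
 * would be compared against bar's key.
 */
static int same_key_file(const char *user_a, const char *ip_a,
                         const char *user_b, const char *ip_b)
{
    char a[256], b[256];

    if (key_path_for(a, sizeof a, "/var/cfengine/ppkeys", user_a, ip_a) != 0 ||
        key_path_for(b, sizeof b, "/var/cfengine/ppkeys", user_b, ip_b) != 0)
        return -1;
    return strcmp(a, b) == 0;
}
```

With this naming, foo's real address (192.168.8.15) and bar's address (192.168.5.1) yield different files. Any two sessions reported as 192.168.5.1 share one file, however, no matter which host sent them. That would fit a "keys did not match" error for foo if foo's session is somehow recorded under bar's address.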
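The thread asks for a way to pinpoint when ipaddr becomes empty, given that the "Denying connection from non-authorized IP" message is logged without an address. One simple approach is to place a check at several points along the connection-handling path and log the source location where the address first turns up empty. The C sketch below assumes the address is held as a NUL-terminated string. report_empty_ip() and CHECK_IPADDR are hypothetical names, not part of cfengine.

```c
#include <stdio.h>

/*
 * Hypothetical diagnostic: if ipaddr is NULL or empty, print the caller's
 * source location to `out` and return 1. Otherwise return 0. It does not
 * alter control flow, so it can be scattered along the code path under
 * suspicion.
 */
static int report_empty_ip(FILE *out, const char *ipaddr,
                           const char *file, int line, const char *func)
{
    if (ipaddr != NULL && ipaddr[0] != '\0')
        return 0;

    fprintf(out, "ipaddr is empty at %s:%d in %s()\n", file, line, func);
    return 1;
}

/* Captures the call site automatically (C99 __func__). */
#define CHECK_IPADDR(out, ip) \
    report_empty_ip((out), (ip), __FILE__, __LINE__, __func__)
```

For example, placing CHECK_IPADDR(stderr, conn_ip) just before the denial message is logged, and at earlier points where the address is copied, would narrow down the stage at which it is lost. Here conn_ip is a placeholder for whatever variable actually holds the peer address.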
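The transaction.c patch quoted above treats both EBUSY and EDEADLK from pthread_mutex_trylock() as meaning "the mutex is already locked". POSIX specifies EBUSY when the mutex is already locked by any thread, including the caller. The thread, however, reports status=35 (EDEADLK on Linux) on the CentOS build. Below is a minimal standalone sketch of that check. mutex_is_held() is a hypothetical name, and unlike the real code it does not call CfOut() or FatalError().

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the patched check in transaction.c.
 * Returns true if the mutex is already locked (by any thread, including
 * the caller). Returns false if it was free or if trylock failed for some
 * other reason.
 */
static bool mutex_is_held(pthread_mutex_t *mutex)
{
    int status = pthread_mutex_trylock(mutex);

    if (status == 0) {
        /* The probe acquired a free mutex: release it and report "not held". */
        pthread_mutex_unlock(mutex);
        return false;
    }

    /*
     * EBUSY is the POSIX result for "already locked". The CentOS build in
     * the thread returned 35 (EDEADLK on Linux) instead, so accept both.
     */
    return status == EBUSY || status == EDEADLK;
}
```

In cfengine's terms, a false result at a point where the caller expects to hold the lock is what leads to the "Software assertion failure". When the mutex is free, the probe briefly acquires and then releases it. It is therefore only suitable as a debugging assertion, not as a synchronisation primitive.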
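This note concerns the "yacc stack overflow" error on the long "or" lists in groups.cf. That wording appears to come from a parser generated by Berkeley yacc, which may be why the suggested fix was to install bison. In a Bison-generated C parser, the stack grows on demand up to YYMAXDEPTH (default 10000), starting from YYINITDEPTH (default 200). Both limits can be overridden by defining these macros in the grammar prologue. The fragment below assumes the grammar file that produces cf3parse.c can take such definitions. The values shown are illustrative, not tuned. The Bison manual also recommends left recursion for long sequences: a left-recursive list parses in bounded stack space, whereas a right-recursive one consumes stack in proportion to the number of elements.

```c
/*
 * Illustrative values only: put these in the %{ ... %} prologue of the
 * grammar (.y) file so they are defined before Bison's parser skeleton.
 * YYINITDEPTH must not exceed YYMAXDEPTH.
 */
#define YYINITDEPTH 1000    /* initial parser stack size (Bison default: 200) */
#define YYMAXDEPTH  100000  /* hard limit on parser stack depth (Bison default: 10000) */
```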