Can you provide the exact output of a failing session, including the error messages from both sides?

  ./cf-serverd -v
  ./cf-agent -v

(cut the relevant parts)

M

Seva Gluschenko wrote:
> Perhaps I forgot to mention it, but the obvious problems arose after
> increasing the number of managed servers to 200+. There were no problems
> when there were only 20 of them.
>
> 2010/5/31 Nicolas Charles <charl...@gmail.com>:
>> To add to that: there were loads of this error on my servers (this doesn't
>> seem so common, since not many people complained about it, and it was
>> quite hard to debug)
>>
>> On 31/05/2010 17:24, Seva Gluschenko wrote:
>>> Mark,
>>>
>>> there are no configuration errors. cf-promises -v reports "Inputs are
>>> valid". Nicolas told me in a private conversation that there were loads
>>> of "Denying connection from non-authorized IP" on 3.0.4 due to an
>>> issue with the cryptographic part.
>>>
>>> Actually, after installing 3.0.5b1 on the policy server, bad key errors
>>> have diminished significantly but have not disappeared completely. I've
>>> scheduled all managed servers for a cfengine version upgrade and will see
>>> if it helps. But, quite obviously, the issue with the empty ipaddr may
>>> still happen in 3.0.5b1, which can hardly be considered normal.
>>>
>>> 2010/5/31 Mark Burgess<mark.burg...@iu.hio.no>:
>>>
>>>> Then look for a configuration error. I doubt there is a problem with
>>>> Cfengine itself.
>>>> Use -v on both sides to debug
>>>>
>>>> M
>>>>
>>>> Seva Gluschenko wrote:
>>>>
>>>>> Mark,
>>>>>
>>>>> both server and clients were
>>>>> cfengine-community-3.0.4p2-1.centos5.i386.rpm.
>>>>>
>>>>> Now the server is running cfengine-community-3.0.5-b1.i386.rpm, built from
>>>>> source. I've configured one of the clients for the package update, so
>>>>> I will see whether it helps. It would be really helpful to have a way
>>>>> to know the moment when ipaddr becomes empty.
>>>>>
>>>>> 2010/5/31 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>
>>>>>> Could be - depends what the clients were before. I am not seeing this
>>>>>> kind of error.
>>>>>> Usually it is a case of an error in the policy that is difficult to see.
>>>>>>
>>>>>> M
>>>>>>
>>>>>> Seva Gluschenko wrote:
>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>> Just updated to rev. 1027. If you could be so kind as to add EDEADLK
>>>>>>> handling to transaction.c (the patch I've sent), that would help
>>>>>>> some Linux users, I believe.
>>>>>>>
>>>>>>> At first glance, the message is still there:
>>>>>>>
>>>>>>> May 31 16:56:20 xxx cf-serverd[14666]: Denying connection from
>>>>>>> non-authorized IP
>>>>>>>
>>>>>>> Does it mean all clients must be upgraded to 3.0.5b1 as well to get rid
>>>>>>> of this?
>>>>>>>
>>>>>>> 2010/5/31 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>>
>>>>>>>> Sorry, patched the threading for something unrelated. Could be a
>>>>>>>> conflict. Try new svn
>>>>>>>>
>>>>>>>>
>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>
>>>>>>>>> Now it seems like I like talking to myself ;>
>>>>>>>>>
>>>>>>>>> Anyway, I've looked further and found that error code 35 stands for
>>>>>>>>> EDEADLK, which evidently isn't something the pthread_mutex_trylock
>>>>>>>>> manual page authors had in mind, but it does mean that the mutex is
>>>>>>>>> locked. So I've patched transaction.c a bit to handle it. The patch
>>>>>>>>> is as follows:
>>>>>>>>>
>>>>>>>>> --- src/transaction.c.orig 2010-05-31 15:22:16.317657266 +0400
>>>>>>>>> +++ src/transaction.c 2010-05-31 15:22:57.226673209 +0400
>>>>>>>>> @@ -439,7 +439,7 @@
>>>>>>>>>
>>>>>>>>>  status = pthread_mutex_trylock(mutex);
>>>>>>>>>
>>>>>>>>> - if(status != EBUSY)
>>>>>>>>> + if (status != EBUSY && status != EDEADLK)
>>>>>>>>>     {
>>>>>>>>>     CfOut(cf_error, "", "!! The mutex %d was not locked in %s() -- status=%d", name, fname, status);
>>>>>>>>>     FatalError("Software assertion failure\n");
>>>>>>>>>
>>>>>>>>> The new problem I've discovered is best illustrated by the
>>>>>>>>> following line:
>>>>>>>>>
>>>>>>>>> May 31 15:31:25 xxx cf-serverd[19810]: Denying connection from
>>>>>>>>> non-authorized IP
>>>>>>>>>
>>>>>>>>> As far as I can tell from the sources, the IP address should be listed
>>>>>>>>> in this message, so in short the IP address is getting lost somewhere.
>>>>>>>>>
>>>>>>>>> 2010/5/31 Seva Gluschenko<seva.glusche...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Well, after some research I've finally built the Cfengine RPM with
>>>>>>>>>> bison. Unfortunately I've had no success, because Cfengine
>>>>>>>>>> complains on startup:
>>>>>>>>>>
>>>>>>>>>> # /etc/init.d/cfengine3 start
>>>>>>>>>> Starting cfengine3 ...
>>>>>>>>>> !! The mutex 6 was not locked in PromiseIdExists() -- status=35
>>>>>>>>>> Fatal cfengine error: Software assertion failure
>>>>>>>>>> cf-agent was not able to get confirmation of promises from
>>>>>>>>>> cf-promises, so going to failsafe
>>>>>>>>>> cf-execd started. [OK]
>>>>>>>>>> !! The mutex 6 was not locked in PromiseIdExists() -- status=35
>>>>>>>>>> Fatal cfengine error: Software assertion failure
>>>>>>>>>> cf-agent was not able to get confirmation of promises from
>>>>>>>>>> cf-promises, so going to failsafe
>>>>>>>>>> cf-serverd started. [OK]
>>>>>>>>>>
>>>>>>>>>> So I've been forced to roll back to 3.0.4p2 and am still getting
>>>>>>>>>> multiple client connection errors. Any help would be much
>>>>>>>>>> appreciated.
>>>>>>>>>>
>>>>>>>>>> 2010/5/28 Seva Gluschenko<seva.glusche...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Thank you for pointing this out. Bison wasn't installed indeed.
>>>>>>>>>>>
>>>>>>>>>>> By the way, copying /usr/bin/libtool didn't help until I copied
>>>>>>>>>>> ltmain.sh and missing from /usr/share/libtool as well.
>>>>>>>>>>> Now I've been able to compile cfengine, but the RPM packaging
>>>>>>>>>>> issues still need to be solved. Working on it.
>>>>>>>>>>>
>>>>>>>>>>> 2010/5/28 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>>>>>>
>>>>>>>>>>>> Install bison
>>>>>>>>>>>>
>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, things went wrong much further along than I
>>>>>>>>>>>>> expected. After installing from the home directory with a plain
>>>>>>>>>>>>> make install, I got the following error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> cf3:/var/cfengine/inputs/groups.cf:441,12: yacc stack overflow,
>>>>>>>>>>>>> near token ','
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, my group definitions contain quite long "or" lists because
>>>>>>>>>>>>> that was the only way I've found to define server groups.
>>>>>>>>>>>>> I've now rolled back to cfengine-community 3.0.4p2 from RPM since
>>>>>>>>>>>>> it doesn't have yacc stack overflows. Is there any way to
>>>>>>>>>>>>> increase its stack at build time?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010/5/28 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right - copy libtool from your system into the directory also
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cp /usr/bin/libtool .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and try again (might need aclocal again)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mark,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for your helpful advice, the following worked:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> aclocal
>>>>>>>>>>>>>>> automake -a -c
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But when I wrote cfengine.spec to build an RPM, the build failed
>>>>>>>>>>>>>>> with the following output:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if /bin/sh ../libtool --tag=CC --mode=compile gcc
>>>>>>>>>>>>>>> -DHAVE_CONFIG_H -I. -I. -I.
>>>>>>>>>>>>>>> -I/usr/include/db4 -I/usr/include -pthread -g -O2
>>>>>>>>>>>>>>> -Wreturn-type -Wmissing-prototypes -Wuninitialized -pthread -g -O2
>>>>>>>>>>>>>>> -I/usr/include/db4 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
>>>>>>>>>>>>>>> -pthread -g -O2 -I/usr/include/db4 -D_LARGEFILE_SOURCE
>>>>>>>>>>>>>>> -D_FILE_OFFSET_BITS=64 -MT libpromises_la-cf3parse.lo -MD -MP -MF
>>>>>>>>>>>>>>> ".deps/libpromises_la-cf3parse.Tpo" -c -o libpromises_la-cf3parse.lo
>>>>>>>>>>>>>>> `test -f 'cf3parse.c' || echo './'`cf3parse.c; \
>>>>>>>>>>>>>>> then mv -f ".deps/libpromises_la-cf3parse.Tpo"
>>>>>>>>>>>>>>> ".deps/libpromises_la-cf3parse.Plo"; else rm -f
>>>>>>>>>>>>>>> ".deps/libpromises_la-cf3parse.Tpo"; exit 1; fi
>>>>>>>>>>>>>>> ../libtool: line 466: CDPATH: command not found
>>>>>>>>>>>>>>> ../libtool: line 1144: func_opt_split: command not found
>>>>>>>>>>>>>>> libtool: Version mismatch error. This is libtool 2.2.6, but the
>>>>>>>>>>>>>>> libtool: definition of this LT_INIT comes from an older release.
>>>>>>>>>>>>>>> libtool: You should recreate aclocal.m4 with macros from libtool 2.2.6
>>>>>>>>>>>>>>> libtool: and run autoconf again.
>>>>>>>>>>>>>>> make[2]: *** [libpromises_la-cf3parse.lo] Error 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas how to get rid of this? It didn't happen with a plain
>>>>>>>>>>>>>>> build in the home directory.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2010/5/28 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ah this is the perennial problem with these snapshots
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Run
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ./aclocal
>>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If that doesn't work, try
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ./aclocal
>>>>>>>>>>>>>>>> automake -a -c
>>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Mark,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm experiencing problems trying to build the latest svn on
>>>>>>>>>>>>>>>>> CentOS5. First of all, there's no automake 1.10 available as an
>>>>>>>>>>>>>>>>> RPM, so I've patched the configure script, downgrading the
>>>>>>>>>>>>>>>>> version to 1.9. Even so, make fails with the following output:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> $ cd . && /bin/sh /tmp/cfengine-3.0.5/missing --run automake-1.9 --gnu
>>>>>>>>>>>>>>>>> src/Makefile.am:8: Libtool library used but `LIBTOOL' is undefined
>>>>>>>>>>>>>>>>> src/Makefile.am:8:
>>>>>>>>>>>>>>>>> src/Makefile.am:8: The usual way to define `LIBTOOL' is to add `AC_PROG_LIBTOOL'
>>>>>>>>>>>>>>>>> src/Makefile.am:8: to `configure.ac' and run `aclocal' and `autoconf' again.
>>>>>>>>>>>>>>>>> src/Makefile.am: required file `./compile' not found
>>>>>>>>>>>>>>>>> WARNING: `automake-1.9' is needed, and you do not seem to have it handy on your
>>>>>>>>>>>>>>>>> system. You might have modified some files without having the
>>>>>>>>>>>>>>>>> proper tools for further handling them. Check the `README' file,
>>>>>>>>>>>>>>>>> it often tells you about the needed prerequirements for installing
>>>>>>>>>>>>>>>>> this package.
>>>>>>>>>>>>>>>>> You may also peek at any GNU archive site, in case
>>>>>>>>>>>>>>>>> some other package would contain this missing `automake-1.9' program.
>>>>>>>>>>>>>>>>> make: *** [Makefile.in] Error 1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is despite LIBTOOL being defined in configure and present in
>>>>>>>>>>>>>>>>> the tree. I've tried switching to the system-wide libtool but had
>>>>>>>>>>>>>>>>> no success. At this point I'm stuck. Is there any chance of getting
>>>>>>>>>>>>>>>>> an early RPM build for CentOS5? We've already faced problems with
>>>>>>>>>>>>>>>>> servers which weren't managed until their keys were removed from
>>>>>>>>>>>>>>>>> the master server because of the bad key issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2010/5/28 Mark<m...@iu.hio.no>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Try the latest svn in case some recent changes could affect
>>>>>>>>>>>>>>>>>> this. Just a suggestion.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 27 May 2010, at 13:06, Seva
>>>>>>>>>>>>>>>>>> Gluschenko<seva.glusche...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello folks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> There's an error report which has been appearing on a regular
>>>>>>>>>>>>>>>>>>> basis since the number of managed servers grew to 100+:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> BAD: keys did not match
>>>>>>>>>>>>>>>>>>> !! Authentication dialogue with X.X.X.X failed
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm virtually sure that there are no hijacking attempts in
>>>>>>>>>>>>>>>>>>> my network, so I suppose that it happens because of some server
>>>>>>>>>>>>>>>>>>> limitations.
>>>>>>>>>>>>>>>>>>> I raised the initial maxchildren setting from 1000 to 5000 in body
>>>>>>>>>>>>>>>>>>> server control, but it doesn't seem to have any effect. Any ideas?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> SY, Seva Gluschenko.
>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>> Help-cfengine mailing list
>>>>>>>>>>>>>>>>>>> Help-cfengine@cfengine.org
>>>>>>>>>>>>>>>>>>> https://cfengine.org/mailman/listinfo/help-cfengine
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Mark Burgess
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -------------------------------------------------
>>>>>>>>>>>>>>>> Professor of Network and System Administration
>>>>>>>>>>>>>>>> Oslo University College, Norway
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Personal Web: http://www.iu.hio.no/~mark
>>>>>>>>>>>>>>>> Office Telf : +47 22453272
>>>>>>>>>>>>>>>> -------------------------------------------------
--
Mark Burgess
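
For readers following the EDEADLK discussion quoted in this thread: below is a minimal C sketch of the idea behind the transaction.c patch, namely treating both EBUSY and EDEADLK from pthread_mutex_trylock() as "the mutex is already locked". The helper names status_means_locked() and mutex_appears_locked() are hypothetical and are not CFEngine functions; this only illustrates the check and is not the actual CFEngine code. Note that POSIX documents only EBUSY for a locked mutex here, so accepting EDEADLK is an assumption based on the status=35 reported above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper (not part of CFEngine): interprets the return value of
 * pthread_mutex_trylock() when the caller expects the mutex to be held.
 * EBUSY is the documented "already locked" result; EDEADLK is accepted as
 * well, following the patch quoted in this thread. */
static bool status_means_locked(int status)
{
    return status == EBUSY || status == EDEADLK;
}

/* Hypothetical helper: probes a mutex without blocking. If the probe itself
 * acquires the mutex (status 0), it is released again and false is returned.
 * Any other error (e.g. EINVAL) is also reported as "not locked". */
static bool mutex_appears_locked(pthread_mutex_t *mutex)
{
    assert(mutex != NULL);

    int status = pthread_mutex_trylock(mutex);
    if (status == 0) {
        /* Nobody held it; undo the lock we just took. */
        pthread_mutex_unlock(mutex);
        return false;
    }
    return status_means_locked(status);
}
```

An assertion in the style of the one in the patch would then call FatalError() only when mutex_appears_locked() returns false. The result is only a snapshot: another thread may lock or unlock the mutex immediately after the probe, so this is suitable for sanity checks, not for synchronization.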