Perhaps I forgot to mention it, but the problems only became obvious after the number of managed servers grew to 200+. There were no problems when there were only 20 of them.
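For anyone following the thread, here is a minimal sketch of what the transaction.c patch quoted below is after: treating both EBUSY and EDEADLK from pthread_mutex_trylock() as "the mutex is held". The helper name mutex_is_held() is hypothetical and is not part of the Cfengine source. It assumes only POSIX threads.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper, not Cfengine code: reports whether `mutex`
 * is currently held by some thread. */
static bool mutex_is_held(pthread_mutex_t *mutex)
{
    assert(mutex != NULL);

    int status = pthread_mutex_trylock(mutex);

    if (status == 0) {
        /* trylock succeeded, so the mutex was free; release it again. */
        pthread_mutex_unlock(mutex);
        return false;
    }

    /* EBUSY is the documented "already locked" result. EDEADLK
     * (status=35 in the log quoted below) is accepted as well,
     * mirroring the patch. */
    return status == EBUSY || status == EDEADLK;
}
```

Unlike the original assertion, this variant unlocks again when trylock unexpectedly succeeds, so it can be called without side effects.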
2010/5/31 Nicolas Charles <charl...@gmail.com>:
> I complete : there were load of this error on my servers (this doesn't
> seem so common since not so many people complained about it, and it was
> quite hard to debug)
>
> On 31/05/2010 17:24, Seva Gluschenko wrote:
>> Mark,
>>
>> there are no configuration errors. cf-promises -v reports "Inputs are
>> valid". Nicolas in private conversation told me that there were loads
>> of "Denying connection from non-authorized IP" on the 3.0.4 due to an
>> issue with the cryptographic part.
>>
>> Actually, after installing 3.0.5b1 on the policy server bad key errors
>> are diminished significantly but not disappeared completely. I've
>> scheduled all managed servers for cfengine version upgrade, will see
>> if it helps. But, quite obviously, the issue with empty ipaddr still
>> may happen in 3.0.5b1 which is hardly to be considered normal.
>>
>> 2010/5/31 Mark Burgess<mark.burg...@iu.hio.no>:
>>
>>> Then look for a configuration error. I doubt there is a problem with
>>> Cfengine itself.
>>> Use -v on both sides to debug
>>>
>>> M
>>>
>>> Seva Gluschenko wrote:
>>>
>>>> Mark,
>>>>
>>>> both server and clients were cfengine-community-3.0.4p2-1.centos5.i386.rpm.
>>>>
>>>> Now server is running cfengine-community-3.0.5-b1.i386.rpm, built from
>>>> the source. I've configured one of clients for the package update, so
>>>> will see whether it helps. It would be really helpful to have a chance
>>>> to know the moment when ipaddr becomes empty.
>>>>
>>>> 2010/5/31 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>
>>>>> Could be - depends what the clients were before. I am not seeing this
>>>>> kind of error.
>>>>> Usually it is a case of error in the policy that is difficult to see.
>>>>>
>>>>> M
>>>>>
>>>>> Seva Gluschenko wrote:
>>>>>
>>>>>> Mark,
>>>>>>
>>>>>> Just updated to rev. 1027.
>>>>>> If you could be so kind and add EDEADLK
>>>>>> acknowledgment to transaction.c (the patch I've sent), that would help
>>>>>> some Linux systems users, I believe.
>>>>>>
>>>>>> At the first glance, the message is still there:
>>>>>>
>>>>>> May 31 16:56:20 xxx cf-serverd[14666]: Denying connection from non-authorized IP
>>>>>>
>>>>>> Does it mean all clients must be upgraded to 3.0.5b1 as well to get rid
>>>>>> of this?
>>>>>>
>>>>>> 2010/5/31 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>
>>>>>>> Sorry, patched the threading for something unrelated. Could be a
>>>>>>> conflict. Try new svn
>>>>>>>
>>>>>>>
>>>>>>> Seva Gluschenko wrote:
>>>>>>>
>>>>>>>> Now it seems like I like talking to myself ;>
>>>>>>>>
>>>>>>>> Anyway, I've looked further and found that error code 35 stands for
>>>>>>>> EDEADLK which wasn't obviously meant by pthread_mutex_trylock manual
>>>>>>>> page authors, but it states for sure that the mutex is locked. So I've
>>>>>>>> patched transaction.c a bit to handle it. The patch is as follows:
>>>>>>>>
>>>>>>>> --- src/transaction.c.orig 2010-05-31 15:22:16.317657266 +0400
>>>>>>>> +++ src/transaction.c 2010-05-31 15:22:57.226673209 +0400
>>>>>>>> @@ -439,7 +439,7 @@
>>>>>>>>
>>>>>>>> status = pthread_mutex_trylock(mutex);
>>>>>>>>
>>>>>>>> - if(status != EBUSY)
>>>>>>>> + if (status != EBUSY && status != EDEADLK)
>>>>>>>> {
>>>>>>>> CfOut(cf_error, "", "!! The mutex %d was not locked in %s() -- status=%d", name, fname, status);
>>>>>>>> FatalError("Software assertion failure\n");
>>>>>>>>
>>>>>>>> The new caveat which have been discovered can be best illustrated by
>>>>>>>> the following string:
>>>>>>>>
>>>>>>>> May 31 15:31:25 xxx cf-serverd[19810]: Denying connection from non-authorized IP
>>>>>>>>
>>>>>>>> as far as I've got from sources, the IP address must be listed in this
>>>>>>>> message, so in short we have an IP leak somewhere.
>>>>>>>>
>>>>>>>> 2010/5/31 Seva Gluschenko<seva.glusche...@gmail.com>:
>>>>>>>>
>>>>>>>>> Well, after certain research I've finally built the Cfengine RPM with
>>>>>>>>> bison. Unfortunately I've got no success, because Cfengine is
>>>>>>>>> complaining upon startup:
>>>>>>>>>
>>>>>>>>> # /etc/init.d/cfengine3 start
>>>>>>>>> Starting cfengine3 ...
>>>>>>>>> !! The mutex 6 was not locked in PromiseIdExists() -- status=35
>>>>>>>>> Fatal cfengine error: Software assertion failure
>>>>>>>>> cf-agent was not able to get confirmation of promises from cf-promises, so going to failsafe
>>>>>>>>> cf-execd started. [OK]
>>>>>>>>> !! The mutex 6 was not locked in PromiseIdExists() -- status=35
>>>>>>>>> Fatal cfengine error: Software assertion failure
>>>>>>>>> cf-agent was not able to get confirmation of promises from cf-promises, so going to failsafe
>>>>>>>>> cf-serverd started. [OK]
>>>>>>>>>
>>>>>>>>> So that I've been forced to roll back to 3.0.4p2 and still getting
>>>>>>>>> multiple client connection errors now. Any help would be appreciated
>>>>>>>>> much.
>>>>>>>>>
>>>>>>>>> 2010/5/28 Seva Gluschenko<seva.glusche...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Thank you for pointing this out. Bison wasn't installed indeed.
>>>>>>>>>>
>>>>>>>>>> By the way, copying /usr/bin/libtool didn't help until I copied
>>>>>>>>>> ltmain.sh and missing from /usr/share/libtool as well. Now I've been
>>>>>>>>>> able to compile cfengine, but RPM packaging issues are still demanding
>>>>>>>>>> to be solved. Working on it.
>>>>>>>>>>
>>>>>>>>>> 2010/5/28 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>>>>>
>>>>>>>>>>> Install bison
>>>>>>>>>>>
>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, things went wrong much further than I estimated. After
>>>>>>>>>>>> installation from the home directory with plain make install I've got
>>>>>>>>>>>> the following error:
>>>>>>>>>>>>
>>>>>>>>>>>> cf3:/var/cfengine/inputs/groups.cf:441,12: yacc stack overflow, near token ','
>>>>>>>>>>>>
>>>>>>>>>>>> well, my groups definition contain quite long "or" lists because it
>>>>>>>>>>>> was the only way I've found to have a chance to define server groups.
>>>>>>>>>>>> Now I rolled back to cfengine-community 3.0.4p2 from RPM since it
>>>>>>>>>>>> doesn't have yacc stack overflows. Is there any method to increase its
>>>>>>>>>>>> stack at the build time?
>>>>>>>>>>>>
>>>>>>>>>>>> 2010/5/28 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Right - copy libtool from your system into the directory also
>>>>>>>>>>>>>
>>>>>>>>>>>>> cp /usr/bin/libtool .
>>>>>>>>>>>>>
>>>>>>>>>>>>> and try again (might need aclocal again)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mark,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your helpful advice, the following worked:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> aclocal
>>>>>>>>>>>>>> automake -a -c
>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But when I wrote cfengine.spec to build an RPM, build failed with the
>>>>>>>>>>>>>> following output:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if /bin/sh ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I.
>>>>>>>>>>>>>> -I. -I.
>>>>>>>>>>>>>> -I/usr/include/db4 -I/usr/include -pthread -g -O2
>>>>>>>>>>>>>> -Wreturn-type -Wmissing-prototypes -Wuninitialized -pthread -g -O2
>>>>>>>>>>>>>> -I/usr/include/db4 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -pthread
>>>>>>>>>>>>>> -g -O2 -I/usr/include/db4 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
>>>>>>>>>>>>>> -MT libpromises_la-cf3parse.lo -MD -MP -MF
>>>>>>>>>>>>>> ".deps/libpromises_la-cf3parse.Tpo" -c -o libpromises_la-cf3parse.lo
>>>>>>>>>>>>>> `test -f 'cf3parse.c' || echo './'`cf3parse.c; \
>>>>>>>>>>>>>> then mv -f ".deps/libpromises_la-cf3parse.Tpo"
>>>>>>>>>>>>>> ".deps/libpromises_la-cf3parse.Plo"; else rm -f
>>>>>>>>>>>>>> ".deps/libpromises_la-cf3parse.Tpo"; exit 1; fi
>>>>>>>>>>>>>> ../libtool: line 466: CDPATH: command not found
>>>>>>>>>>>>>> ../libtool: line 1144: func_opt_split: command not found
>>>>>>>>>>>>>> libtool: Version mismatch error. This is libtool 2.2.6, but the
>>>>>>>>>>>>>> libtool: definition of this LT_INIT comes from an older release.
>>>>>>>>>>>>>> libtool: You should recreate aclocal.m4 with macros from libtool 2.2.6
>>>>>>>>>>>>>> libtool: and run autoconf again.
>>>>>>>>>>>>>> make[2]: *** [libpromises_la-cf3parse.lo] Error 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> any ideas how to get rid of this? It hadn't happened upon plain build
>>>>>>>>>>>>>> in the home directory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2010/5/28 Mark Burgess<mark.burg...@iu.hio.no>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ah this is the perennial problem with these snapshots
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Run
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ./aclocal
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If that doesn't work, try
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ./aclocal
>>>>>>>>>>>>>>> automake -a -c
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Seva Gluschenko wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mark,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm experiencing problems trying to build the latest svn on CentOS5.
>>>>>>>>>>>>>>>> First of all, there's no automake 1.10 in RPM available, so I've
>>>>>>>>>>>>>>>> patched configure script downgrading version to 1.9. Even though, make
>>>>>>>>>>>>>>>> fails with the following output:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ cd . && /bin/sh /tmp/cfengine-3.0.5/missing --run automake-1.9 --gnu
>>>>>>>>>>>>>>>> src/Makefile.am:8: Libtool library used but `LIBTOOL' is undefined
>>>>>>>>>>>>>>>> src/Makefile.am:8:
>>>>>>>>>>>>>>>> src/Makefile.am:8: The usual way to define `LIBTOOL' is to add `AC_PROG_LIBTOOL'
>>>>>>>>>>>>>>>> src/Makefile.am:8: to `configure.ac' and run `aclocal' and `autoconf' again.
>>>>>>>>>>>>>>>> src/Makefile.am: required file `./compile' not found
>>>>>>>>>>>>>>>> WARNING: `automake-1.9' is needed, and you do not seem to have it handy on your
>>>>>>>>>>>>>>>> system. You might have modified some files without having the
>>>>>>>>>>>>>>>> proper tools for further handling them. Check the `README' file,
>>>>>>>>>>>>>>>> it often tells you about the needed prerequirements for installing
>>>>>>>>>>>>>>>> this package.
>>>>>>>>>>>>>>>> You may also peek at any GNU archive site, in case
>>>>>>>>>>>>>>>> some other package would contain this missing `automake-1.9' program.
>>>>>>>>>>>>>>>> make: *** [Makefile.in] Error 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Despite LIBTOOL is defined in configure and present in the tree. I've
>>>>>>>>>>>>>>>> tried to switch to the system-wide libtool but got no success. At this
>>>>>>>>>>>>>>>> point I'm stuck. Is there any chance to get some early RPM build for
>>>>>>>>>>>>>>>> CentOS5? We've already faced problems with servers which weren't
>>>>>>>>>>>>>>>> managed until their keys were removed from the master server because of
>>>>>>>>>>>>>>>> bad key issue.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2010/5/28 Mark<m...@iu.hio.no>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Try the latest svn in case some recent changes could affect this. Just a
>>>>>>>>>>>>>>>>> suggestion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 27 May 2010, at 13:06, Seva Gluschenko<seva.glusche...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello folks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There's an error report which happens on regular basis since a number
>>>>>>>>>>>>>>>>>> of managed servers grew to 100+:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> BAD: keys did not match
>>>>>>>>>>>>>>>>>> !! Authentication dialogue with X.X.X.X failed
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm virtually sure that there're no hijacking attempts in my network,
>>>>>>>>>>>>>>>>>> so I suppose that happens because of some server limitations. I rose
>>>>>>>>>>>>>>>>>> initial maxchildren setting from 1000 to 5000 in body server control,
>>>>>>>>>>>>>>>>>> but it doesn't seem to have effect. Any ideas?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> SY, Seva Gluschenko.
>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>> Help-cfengine mailing list
>>>>>>>>>>>>>>>>>> Help-cfengine@cfengine.org
>>>>>>>>>>>>>>>>>> https://cfengine.org/mailman/listinfo/help-cfengine
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Mark Burgess
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------------------------------------------------
>>>>>>>>>>>>>>> Professor of Network and System Administration
>>>>>>>>>>>>>>> Oslo University College, Norway
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Personal Web: http://www.iu.hio.no/~mark
>>>>>>>>>>>>>>> Office Telf : +47 22453272
>>>>>>>>>>>>>>> -------------------------------------------------

--
SY, Seva Gluschenko.