Hello John,

Your assumption is correct. I cannot run the facter loop because we are in a production environment. Every time I run puppet on these machines I make sure I can reach the IPMI interface, so that I can reboot the machine within a few minutes.

Thanks for your help.

Regards.

2012/11/28 jcbollinger <john.bollin...@stjude.org>

>
> On Wednesday, November 28, 2012 4:49:13 AM UTC-6, Mon wrote:
>>
>> Hello John,
>> Thanks for your answer. I have opened an issue with my hardware
>> manufacturer and will do the same with my OS vendor.
>> Anyway, I am pasting the strace listings in case someone can shed light on them:
>>
>> server1:
>>
>> BIOS: American Megatrends Inc. 1.2
>> SYS: Supermicro X8SIE
>> CPU: Intel(R) Core(TM) i3 CPU 550 @ 3.20GHz [4 cores]
>> MEM:
>> SLOT0 2048 MB
>> SLOT1 2048 MB
>>
>>
>> open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
>> close(3) = 0
>> open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
>> fstat64(3, {st_mode=S_IFREG|0644, st_size=800, ...}) = 0
>> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7297000
>> read(3, "# Fact: osfamily\n#\n# Purpose: Re"..., 4096) = 800
>> ......CRASH
>>
>>
>> server2:
>>
>> BIOS: American Megatrends Inc. 1.2
>> SYS: Supermicro X8SIE
>> CPU: Intel(R) Core(TM) i3 CPU 560 @ 3.33GHz [4 cores]
>> MEM:
>> SLOT0 2048 MB
>> SLOT1 2048 MB
>>
>>
>>
>> stat64("/usr/sbin/dmidecode", {st_mode=S_IFREG|0755, st_size=48408, ...}) = 0
>> pipe([3, 4]) = 0
>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb74e5ba8) = 8709
>> close(4) = 0
>> fcntl64(3, F_GETFL) = 0 (flags O_RDONLY)
>> fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
>> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb725e000
>> _llseek(3, 0, 0xbf900930, SEEK_CUR) = -1 ESPIPE(Illegal seek)
>> fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
>> read(3, "# dmidecode 2.9\nSMBIOS 2.6 prese"..., 1024) = 1024
>> read(3, "oot is supported\n\t\tBIOS boot spe"..., 1024) = 1024
>> read(3, "tate: Safe\n\tThermal State: Safe\n"..., 1024) = 1024
>> read(3, "Maximum Size: 128 KB\n\tSupported "..., 1024) = 1024
>> read(3, "e 5, 28 bytes\nMemory Controller "..., 1024) = 1024
>> read(3, " Installed\n\tError Status: OK\n\nHa"..., 1024) = 1024
>> read(3, " type 8, 9 bytes\nPort Connector "..., 1024) = 1024
>> read(3, "ternal Reference Designator: LPT"..., 1024) = 1024
>> read(3, "nal Reference Designator: Not Sp"..., 1024) = 1024
>> read(3, "nator: Not Specified\n\tExternal C"..., 1024) = 1024
>> read(3, "or Type: None\n\tPort Type: Other\n"..., 1024) = 1024
>> read(3, "ector Information\n\tInternal Refe"..., 1024) = 1024
>> read(3, "\tLength: Short\n\tID: 1\n\tCharacter"..., 1024) = 1024
>> read(3, "escriptor 5: POST error\n\tData Fo"..., 1024) = 1024
>> read(3, "ype 19, 15 bytes\nMemory Array Ma"..., 1024) = 1024
>> read(3, " Width: Unknown\n\tSize: No Module"..., 1024) = 1024
>> read(3, "ry Device Mapped Address\n\tStarti"..., 1024) = 1024
>> read(3, "on Handle: Not Provided\n\tTotal W"..., 1024) = 1024
>> --- SIGCHLD (Child exited) @ 0 (0) ---
>> read(3, "\n\nHandle 0x0039, DMI type 20, 19"..., 1024) = 1024
>> read(3, "on-recoverable Threshold: 6\n\nHan"..., 1024) = 1024
>> read(3, "UT OF SPEC>\n\tCooling Unit Group:"..., 1024) = 1024
>> read(3, "ed: Yes\n\tHot Replaceable: No\n\tCo"..., 1024) = 669
>> read(3, "", 1024) = 0
>> close(3) = 0
>> munmap(0xb725e000, 4096) = 0
>> rt_sigaction(SIGHUP, {SIG_IGN}, {0xb77388f0, [HUP], SA_RESTART}, 8) = 0
>> rt_sigaction(SIGQUIT, {SIG_IGN}, {0xb77388f0, [QUIT], SA_RESTART}, 8) = 0
>> rt_sigaction(SIGINT, {SIG_IGN}, {0xb77388f0, [INT], SA_RESTART}, 8) = 0
>> waitpid(8709, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 8709
>> rt_sigaction(SIGHUP, {0xb77388f0, [HUP], SA_RESTART}, {SIG_IGN}, 8) = 0
>> rt_sigaction(SIGQUIT, {0xb77388f0, [QUIT], SA_RESTART}, {SIG_IGN}, 8) = 0
>> rt_sigaction(SIGINT, {0xb77388f0, [INT], SA_RESTART}, {SIG_IGN}, 8) = 0
>> ............
>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> .............
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> .........
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> .......CRASH
>>
>>
> I'm supposing that ".......CRASH" means "more of the same syscall, with
> similar results, until the trace ends on account of a system crash."
>
> The second trace says nothing useful, as far as I can tell. The last
> thing it shows before all the signal mask handling is the successful
> completion of a fact evaluation.
>
> The first trace is not much more helpful. The last thing it shows is
> Facter reading the Ruby code for the 'osfamily' fact. That might indicate
> that it is during evaluation of that fact that the system crashed, but it's
> too far removed from fact evaluation for me to have any confidence in that.
>
> My bet would be that the crash cuts off communication before its cause is
> reported in the trace, as I warned might be the case.
>
> Here's another thing you could try: since facter doesn't always crash the
> system (if I understand correctly), you should be able to get a list of all
> the facts it is evaluating (and their values) by running "facter -p" from
> the command line. Take that list, and use it to stress test facter on each
> fact individually (i.e.
> run facter -p <factname> many times in a loop), in
> a way that lets you be sure you always know which fact is currently under
> test. In this way you may be able to identify one or more facts whose
> evaluation sometimes crashes the machine.
>
> Note: don't neglect the "or more" above. It is conceivable that your
> problem is deeper than just one fact.
>
> Once you know the facts with which the problem is associated, we can
> investigate the commands facter is running, and thereby narrow down the
> cause of the crash.
>
>
> John
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Users" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/puppet-users/-/B7AKDJ-7U40J.
>
> To post to this group, send email to puppet-users@googlegroups.com.
> To unsubscribe from this group, send email to
> puppet-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/puppet-users?hl=en.
>
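
For anyone who can spare a test machine, the per-fact stress test John describes might be sketched roughly as follows. This is only a sketch under stated assumptions, not something posted in the thread: the helper names parse_fact_names and stress_fact, the log file name facter-stress.log, and the default of 100 runs per fact are all hypothetical. It assumes "facter -p" prints one "name => value" line per fact. Before every run it writes a log line and forces it to disk, so that after a crash and reboot the last line of the log names the fact that was under test.

```python
import os
import subprocess
import sys


def parse_fact_names(facter_output):
    """Extract fact names from 'name => value' lines, as printed by 'facter -p'."""
    names = []
    for line in facter_output.splitlines():
        name, sep, _value = line.partition(" => ")
        # Lines without the separator (e.g. continuation lines) are skipped.
        if sep and name.strip():
            names.append(name.strip())
    return names


def stress_fact(name, runs, log, run=subprocess.call):
    """Run 'facter -p <name>' repeatedly, recording each attempt before it starts."""
    for i in range(1, runs + 1):
        log.write("testing %s run %d\n" % (name, i))
        log.flush()
        # Force the line to disk so it survives a hard crash of the machine.
        os.fsync(log.fileno())
        run(["facter", "-p", name])


if __name__ == "__main__":
    # Hypothetical default: 100 runs per fact unless a count is given.
    runs = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    output = subprocess.check_output(["facter", "-p"]).decode("utf-8", "replace")
    with open("facter-stress.log", "a") as log:
        for fact in parse_fact_names(output):
            stress_fact(fact, runs, log)
```

Because the loop is meant to provoke the crash, it only makes sense on a host that can be rebooted freely (for example via IPMI), not on the production machines. As John notes, more than one fact may be involved, so it is worth restarting the loop after each crash and skipping facts that have already been implicated.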