Hello John,
Your assumption is ok.
I cannot run the facter loop because these machines are in a production environment.
Every time I run puppet on one of these machines, I first make sure I can reach its IPMI
interface so that I can reboot the machine within a few minutes.
Thanks for your help.
Regards.


2012/11/28 jcbollinger <john.bollin...@stjude.org>

>
>
> On Wednesday, November 28, 2012 4:49:13 AM UTC-6, Mon wrote:
>>
>> Hello John,
>> Thanks for your answer. I have opened an issue with my hardware
>> manufacturer and I will do the same with my OS vendor.
>> Anyway, I am pasting the strace listings so maybe someone can shed light on it:
>>
>> server1:
>>
>> BIOS: American Megatrends Inc. 1.2
>> SYS: Supermicro X8SIE
>> CPU: Intel(R) Core(TM) i3 CPU 550 @ 3.20GHz [4 cores]
>> MEM:
>>   SLOT0  2048 MB
>>   SLOT1  2048 MB
>>
>>
>> open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
>> close(3) = 0
>> open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
>> fstat64(3, {st_mode=S_IFREG|0644, st_size=800, ...}) = 0
>> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
>> = 0xb7297000
>> read(3, "# Fact: osfamily\n#\n# Purpose: Re"..., 4096) = 800
>> ......CRASH
>>
>>
>> server2:
>>
>> BIOS: American Megatrends Inc. 1.2
>> SYS: Supermicro X8SIE
>> CPU: Intel(R) Core(TM) i3 CPU 560 @ 3.33GHz [4 cores]
>> MEM:
>>   SLOT0  2048 MB
>>   SLOT1  2048 MB
>>
>>
>>
>> stat64("/usr/sbin/dmidecode", {st_mode=S_IFREG|0755, st_size=48408, ...})
>> = 0
>> pipe([3, 4]) = 0
>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
>> child_tidptr=0xb74e5ba8) = 8709
>> close(4) = 0
>> fcntl64(3, F_GETFL) = 0 (flags O_RDONLY)
>> fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
>> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
>> = 0xb725e000
>> _llseek(3, 0, 0xbf900930, SEEK_CUR) = -1 ESPIPE (Illegal seek)
>> fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
>> read(3, "# dmidecode 2.9\nSMBIOS 2.6 prese"..., 1024) = 1024
>> read(3, "oot is supported\n\t\tBIOS boot spe"..., 1024) = 1024
>> read(3, "tate: Safe\n\tThermal State: Safe\n"..., 1024) = 1024
>> read(3, "Maximum Size: 128 KB\n\tSupported "..., 1024) = 1024
>> read(3, "e 5, 28 bytes\nMemory Controller "..., 1024) = 1024
>> read(3, " Installed\n\tError Status: OK\n\nHa"..., 1024) = 1024
>> read(3, " type 8, 9 bytes\nPort Connector "..., 1024) = 1024
>> read(3, "ternal Reference Designator: LPT"..., 1024) = 1024
>> read(3, "nal Reference Designator: Not Sp"..., 1024) = 1024
>> read(3, "nator: Not Specified\n\tExternal C"..., 1024) = 1024
>> read(3, "or Type: None\n\tPort Type: Other\n"..., 1024) = 1024
>> read(3, "ector Information\n\tInternal Refe"..., 1024) = 1024
>> read(3, "\tLength: Short\n\tID: 1\n\tCharacter"..., 1024) = 1024
>> read(3, "escriptor 5: POST error\n\tData Fo"..., 1024) = 1024
>> read(3, "ype 19, 15 bytes\nMemory Array Ma"..., 1024) = 1024
>> read(3, " Width: Unknown\n\tSize: No Module"..., 1024) = 1024
>> read(3, "ry Device Mapped Address\n\tStarti"..., 1024) = 1024
>> read(3, "on Handle: Not Provided\n\tTotal W"..., 1024) = 1024
>> --- SIGCHLD (Child exited) @ 0 (0) ---
>> read(3, "\n\nHandle 0x0039, DMI type 20, 19"..., 1024) = 1024
>> read(3, "on-recoverable Threshold: 6\n\nHan"..., 1024) = 1024
>> read(3, "UT OF SPEC>\n\tCooling Unit Group:"..., 1024) = 1024
>> read(3, "ed: Yes\n\tHot Replaceable: No\n\tCo"..., 1024) = 669
>> read(3, "", 1024) = 0
>> close(3) = 0
>> munmap(0xb725e000, 4096) = 0
>> rt_sigaction(SIGHUP, {SIG_IGN}, {0xb77388f0, [HUP], SA_RESTART}, 8) = 0
>> rt_sigaction(SIGQUIT, {SIG_IGN}, {0xb77388f0, [QUIT], SA_RESTART}, 8) = 0
>> rt_sigaction(SIGINT, {SIG_IGN}, {0xb77388f0, [INT], SA_RESTART}, 8) = 0
>> waitpid(8709, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 8709
>> rt_sigaction(SIGHUP, {0xb77388f0, [HUP], SA_RESTART}, {SIG_IGN}, 8) = 0
>> rt_sigaction(SIGQUIT, {0xb77388f0, [QUIT], SA_RESTART}, {SIG_IGN}, 8) = 0
>> rt_sigaction(SIGINT, {0xb77388f0, [INT], SA_RESTART}, {SIG_IGN}, 8) = 0
>> ............
>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> .............
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> .........
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>> .......CRASH
>>
>>
> I'm supposing that ".......CRASH" means "more of the same syscall, with
> similar results, until the trace ends on account of a system crash."
>
> The second trace says nothing useful, as far as I can tell.  The last
> thing it shows before all the signal mask handling is the successful
> completion of a fact evaluation.
>
> The first trace is not much more helpful.  The last thing it shows is
> Facter reading the Ruby code for the 'osfamily' fact.  That might indicate
> that it is during evaluation of that fact that the system crashed, but it's
> too far removed from fact evaluation for me to have any confidence in that.
>
> My bet would be that the crash cuts off communication before its cause is
> reported in the trace, as I warned might be the case.
>
> Here's another thing you could try: since facter doesn't always crash the
> system (if I understand correctly), you should be able to get a list of all
> the facts it is evaluating (and their values) by running "facter -p" from
> the command line.  Take that list, and use it to stress test facter on each
> fact individually (i.e. run facter -p <factname> many times in a loop), in
> a way that lets you be sure you always know which fact is currently under
> test.  In this way you may be able to identify one or more facts whose
> evaluation sometimes crashes the machine.
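
The per-fact stress test described above could be sketched roughly as
follows in Python. This is only an illustration under stated assumptions:
it assumes "facter" is on the PATH and that fact names can be read from the
"name => value" lines printed by "facter -p". The log path, the run count,
and all function names are hypothetical. The idea is to write the fact name
to a log file and force it to disk before each run, so that after a crash
the last "START" line names the fact that was under test.

```python
import os
import shutil
import subprocess

LOG_PATH = "/var/tmp/facter-stress.log"  # hypothetical path; use a local disk
RUNS_PER_FACT = 100                       # hypothetical; raise if crashes are rare


def parse_fact_names(facter_output):
    """Extract fact names from 'name => value' lines."""
    names = []
    for line in facter_output.splitlines():
        name, sep, _value = line.partition(" => ")
        # Skip lines without a separator and indented continuation lines.
        if sep and name and not name[0].isspace():
            names.append(name.strip())
    return names


def log_durably(log, message):
    """Write one line and force it to disk so it survives a sudden crash."""
    log.write(message + "\n")
    log.flush()
    os.fsync(log.fileno())


def stress_facts(names, runs, log, run_fact):
    """Evaluate each fact `runs` times, logging its name before every run."""
    for name in names:
        for i in range(1, runs + 1):
            log_durably(log, "START %s run %d" % (name, i))
            run_fact(name)
        log_durably(log, "DONE %s" % name)


def run_facter(name):
    """Evaluate a single fact, as in 'facter -p <factname>'."""
    subprocess.call(["facter", "-p", name], stdout=subprocess.DEVNULL)


def main():
    output = subprocess.check_output(["facter", "-p"]).decode("utf-8", "replace")
    with open(LOG_PATH, "a") as log:
        stress_facts(parse_fact_names(output), RUNS_PER_FACT, log, run_facter)


if __name__ == "__main__":
    if shutil.which("facter"):
        main()
    else:
        print("facter not found on PATH; nothing to test")
```

After a crash and reboot, the last "START" line in the log without a later
entry identifies the fact and run number that were in progress. Since the
test is expected to crash the machine, it is best run where a reboot is
acceptable.
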
>
> Note: don't neglect the "or more" above.  It is conceivable that your
> problem is deeper than just one fact.
>
> Once you know the facts with which the problem is associated, we can
> investigate the commands facter is running, and thereby narrow down the
> cause of the crash.
>
>
> John
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Puppet Users" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/puppet-users/-/B7AKDJ-7U40J.
>
> To post to this group, send email to puppet-users@googlegroups.com.
> To unsubscribe from this group, send email to
> puppet-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/puppet-users?hl=en.
>
