Yeah bummers, but something tells me it might not be OpenMPI's fault. Here's 
why:

1- The tech who takes care of these machines told me that he gets RTC errors 
on bootup (the CPU boards are apparently "out of sync" since the clocks aren't 
set correctly).
2- There is also a possibility that the prior admin did not put in a "stable" 
firmware version.

So if any Sun guru can help out by telling me which command to use, or by 
pointing me to a quick HOWTO for resolving these clock issues, it would be 
greatly appreciated (our analyst is overloaded and he would not be able to 
justify 3 days of reading up on docs just to fix my parallel code problems ;P)

3- I realised that the OS is not booted in 64-bit mode O_o!! (not that this has 
anything to do with OpenMPI bombing):

Jun 21 07:45:15 unknown genunix: [ID 540533 kern.notice] ^MSunOS Release 5.8 
Version Generic_108528-29 32-bit
Jun 21 07:45:15 unknown NOTICE: 64-bit OS installed, but the 32-bit OS is the 
default
Jun 21 07:45:15 unknown Booting the 32-bit OS ...

4- LAM-MPI 7.1.1 also bombs, but it does so at a much higher processor count 
(OpenMPI bombs at 5, LAM-MPI bombs around 10, but it varies).

As for the questions regarding the OpenMPI build, I just recently built 1.1 
with the same basic configure options and got the exact same results (with a 
clean cache).

So, I guess this one is on pause until I have confirmation that the clocks 
on the processor boards are set correctly. There is one thing that bothers me 
though: one of the machines has only 1 processor board (4 procs) and I still 
get the error on that machine if I go over 4 processes...how can a board be 
out of sync with itself??

Eric
PS: I am at liberty to provide the source code if anyone wants it.

On Wednesday 28 June 2006 08:56, Jeff Squyres (jsquyres) wrote:
> Bummer!  :-(
>  
> Just to be sure -- you had a clean config.cache file before you ran 
> configure, right?  (e.g., the file didn't exist -- just to be sure it didn't 
> get potentially erroneous values from a previous run of configure)  Also, 
> FWIW, it's not necessary to specify --enable-ltdl-convenience; that should be 
> automatic.
>  
> If you had a clean configure, we *suspect* that this might be due to 
> alignment issues on Solaris 64 bit platforms, but thought that we might have 
> had a pretty good handle on it in 1.1.  Obviously we didn't solve everything. 
>  Bonk.
>  
> Did you get a corefile, perchance?  If you could send a stack trace, that 
> would be most helpful.
> 
> 
[...snip...]
