Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-26 Thread Dave Love
"Kevin H. Hobbs" writes:
> The program links to fedora's copies of the libraries of interest :
>
> mpirun -n 1 ldd mpi_simple | grep hwloc
> libhwloc.so.5 => /lib64/libhwloc.so.5 (0x003c5760)
[I'm surprised it's in /lib64.]
> mpirun -n 1 ldd mpi_simple | grep mpi
> libmpi.so.1 => /u

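The ldd check quoted above can be narrowed to just the libraries in question. A minimal sketch, assuming ldd's usual `name => path (address)` output; the function name `resolved_libs` is made up for illustration and is not part of Open MPI or hwloc:

```shell
# Read ldd output on stdin and print "library path" for each MPI- or
# hwloc-related library, so it is easy to see where each one resolves.
# (resolved_libs is a hypothetical helper name.)
resolved_libs() {
    awk '$2 == "=>" && $1 ~ /^lib(hwloc|mpi|open-(rte|pal))/ { print $1, $3 }'
}
```

It would be used as `ldd ./mpi_simple | resolved_libs`. A library that ldd reports as "not found" shows up with the word "not" in the path column.
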
Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Kevin H. Hobbs
On 07/23/2013 02:22 PM, Ralph Castain wrote:
> Yeah, it's failing when trying to unpack the topology obtained from
> hwloc. My guess is that one of the following calls changed in
> hwloc-1.4.3:
>
It appears to be this one: hwloc_topology_set_xmlbuffer. I'll return what I've gathered so far to th

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Ralph Castain
That's understandable - if you don't disable xml2, then hwloc uses the xml2 library to do the topology encoding. We rely on their internal "quasi-xml" encoding method, which I believe provides some different data (and definitely different format). I suspect this is causing the confusion, though
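
One way to tell which case applies is to check whether the installed hwloc library is linked against libxml2. A rough sketch, assuming libxml2 is linked directly into libhwloc rather than loaded as a plugin; the function name `links_libxml2` is hypothetical:

```shell
# Read ldd output on stdin; succeed (exit 0) if libxml2 appears among
# the linked libraries, fail otherwise.
# (links_libxml2 is a hypothetical helper name.)
links_libxml2() {
    grep -q 'libxml2\.so'
}
```

For example, `ldd /lib64/libhwloc.so.5 | links_libxml2 && echo "hwloc built with libxml2"`, using the library path reported elsewhere in this thread.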

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Kevin H. Hobbs
On 07/23/2013 02:22 PM, Ralph Castain wrote:
> Yeah, it's failing when trying to unpack the topology obtained from hwloc.
What I find very interesting is that the hwloc configure options --disable-cairo --disable-libxml2 turn the bug off. I'll keep walking through the execution in gdb; maybe I'll

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Ralph Castain
Yeah, it's failing when trying to unpack the topology obtained from hwloc. My guess is that one of the following calls changed in hwloc-1.4.3:

if (0 != hwloc_topology_set_xmlbuffer(t, xmlbuffer, strlen(xmlbuffer))) {
    rc = OPAL_ERROR;
    free(xmlbuffer);
    h

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Kevin H. Hobbs
On 07/23/2013 09:54 AM, Jeff Squyres (jsquyres) wrote: > > I don't know if Fedora RPMs include -g in their builds, or if Fedora > includes a debuginfo RPM that you could install such that you can attach > a debugger and be able to dig into OMPI's internals yourself. > There is a debuginfo packag

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Kevin H. Hobbs
On 07/23/2013 06:56 AM, Jeff Squyres (jsquyres) wrote:
> With this embedded mechanism, we're calling hwloc's configury with
> the moral equivalent of:
>
> ./configure --disable-cairo --disable-libxml2 --enable-xml
> --with-hwloc-symbol-prefix=opal_hwloc152_ --enable-embedded-mode
I configured hwl

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Jeff Squyres (jsquyres)
Kevin -- I don't know if Fedora RPMs include -g in their builds, or if Fedora includes a debuginfo RPM that you could install such that you can attach a debugger and be able to dig into OMPI's internals yourself. If that doesn't work, you might need to build from source yourself, link against

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Kevin H. Hobbs
On 07/23/2013 09:36 AM, Ralph Castain wrote:
> The Fedora package is built optimized, so no OMPI debugging output is
> available and a debugger won't tell us a lot.
The Fedora package comes with a debuginfo package that has everything gdb needs to let me step through the openmpi functions. I also

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Ralph Castain
I see - I didn't look at the redhat bug list. Sadly, I have no idea how to debug it. The Fedora package is built optimized, so no OMPI debugging output is available and a debugger won't tell us a lot. Best guess is that there is something in the build that doesn't match the user's system. The n

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Jeff Squyres (jsquyres)
On Jul 23, 2013, at 8:54 AM, Ralph Castain wrote: >> Yes, it's curious that they can't reproduce your issue, > > Guess I missed this - where does it say that they can't reproduce the issue?? > I'm suspicious because build-from-source produced a working result. Orion mentioned it in https://bug

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Ralph Castain
On Jul 23, 2013, at 3:56 AM, Jeff Squyres (jsquyres) wrote: > On Jul 21, 2013, at 8:50 AM, Kevin H. Hobbs wrote: > >>> Ah! That would indicate an issue with the external hwloc >>> package they provided, which is the big reason we don't >>> recommend installing from packages. >> >> I'll happil

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-23 Thread Jeff Squyres (jsquyres)
On Jul 21, 2013, at 8:50 AM, Kevin H. Hobbs wrote: >> Ah! That would indicate an issue with the external hwloc >> package they provided, which is the big reason we don't >> recommend installing from packages. > > I'll happily report the bug to the hwloc developers. I don't think that this is ne

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-21 Thread Kevin H. Hobbs
On 07/20/2013 04:14 PM, Ralph Castain wrote: > Ah! That would indicate an issue with the external hwloc > package they provided, which is the big reason we don't > recommend installing from packages. I'll happily report the bug to the hwloc developers. I'll also add what we've found here to the b

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-20 Thread Ralph Castain
Ah! That would indicate an issue with the external hwloc package they provided, which is the big reason we don't recommend installing from packages. We have internal copies of hwloc and libevent that ensure (a) they are at the proper level, and (b) they are configured properly for OMPI's use. W

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-20 Thread Kevin H. Hobbs
On 07/20/2013 10:28 AM, Ralph Castain wrote:
> avoid the packages as you have no idea how they were built
So I built openmpi-1.6.5 from the tarball and, of course, everything works fine. Well, my simple program got through MPI_Init and printed its rank. I configured it very simply: ./configu

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-20 Thread Kevin H. Hobbs
On 07/20/2013 10:28 AM, Ralph Castain wrote: > Afraid I have no earthly idea of the problem - you might try > taking it up with the package provider. This is a link to the bug report filed in the Fedora bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=986409 The advice I got there was to com

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-20 Thread Ralph Castain
Afraid I have no earthly idea of the problem - you might try taking it up with the package provider. I usually advise people to avoid the packages as you have no idea how they were built and thus might find they don't fully support your configuration. Not that hard to just download and build the

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-20 Thread Kevin H. Hobbs
On 07/19/2013 08:27 PM, Jeff Squyres (jsquyres) wrote:
> Not offhand. The error you're seeing *typically* indicates
> that you've got a mismatch of OMPI version somewhere.
So now the fun part for me is to try to find it or, failing that, to eliminate the multiple-versions theory.
> Are you

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Jeff Squyres (jsquyres)
Not offhand. The error you're seeing *typically* indicates that you've got a mismatch of OMPI version somewhere. Are you running on multiple machines with different Open MPI versions, perchance? If you're running only on a single machine, try completely uninstalling the Open MPI package, re-i

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Kevin H. Hobbs
On 07/19/2013 05:11 PM, Ralph Castain wrote: > Are you sure you're using the same version of OMPI on this new OS? No, I'm sure I'm using a different version of Open MPI in Fedora 18 from the one I was using in Fedora 17. I have only the Open MPI provided by the Fedora distribution. > They typica

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Ralph Castain
Are you sure you're using the same version of OMPI on this new OS? They typically distribute one in your default path, so I'd check to ensure that you really are using the version you think. On Jul 19, 2013, at 12:49 PM, "Kevin H. Hobbs" wrote: > I just upgraded the OS on one of my workstatio
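
A quick way to act on this is to list every mpirun reachable through the search path, not only the first one the shell picks. A minimal POSIX sh sketch; the function name `find_all_on_path` is made up for illustration:

```shell
# Print every executable file named $1 found in the colon-separated
# directory list $2 (defaults to $PATH), in search order.
# The body runs in a subshell so the IFS and globbing changes do not leak.
# (find_all_on_path is a hypothetical helper name.)
find_all_on_path() (
    name=$1
    set -f          # no filename globbing while splitting the list
    IFS=:
    for dir in ${2:-$PATH}; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)
```

Running `find_all_on_path mpirun` and seeing more than one line of output would suggest that several installations are visible. In bash, `type -a mpirun` gives similar information.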

[OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Kevin H. Hobbs
I just upgraded the OS on one of my workstations from Fedora 17 to 18 and now I can't run even the simplest MPI programs. I filed a bug report with Fedora's bug tracker : https://bugzilla.redhat.com/show_bug.cgi?id=986409 My simple program is attached as mpi_simple.c mpicc works : mpicc -g -