Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-24 Thread Dave Love
Brock Palen writes: > Well I have a new wrench into this situation. > We had a power failure at our datacenter that took down our entire system: > nodes, switch, SM. > Now I am unable to reproduce the error with oob default ibflags etc. As far as I know, we could still reproduce it. Mail me if you ne

Re: [OMPI users] Openib with > 32 cores per node

2011-05-24 Thread Dave Love
Jeff Squyres writes: > Assuming you built OMPI with PSM support: > > mpirun --mca pml cm --mca mtl psm > > (although probably just the pml/cm setting is sufficient -- the mtl/psm > option will probably happen automatically) For what it's worth, you needn't specify anything to get psm u

Re: [OMPI users] data types and alignment to word boundary

2011-06-29 Thread Dave Goodell
of" macro for this calculation. Using it instead of the pointer math above greatly improves readability: http://en.wikipedia.org/wiki/Offsetof So the second line becomes: 8< aiDispsT5[1] = offsetof(tVStruct, sCapacityFile); 8< -Dave
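
For context, a minimal sketch of the technique being recommended, using offsetof() to fill the displacement array for MPI_Type_create_struct; the struct layout and all field names other than sCapacityFile are hypothetical, not taken from the original post:

    #include <mpi.h>
    #include <stddef.h>   /* offsetof */

    typedef struct {
        int    nRecords;            /* illustrative fields standing in for tVStruct */
        double dThreshold;
        char   sCapacityFile[256];
    } tVStruct;

    /* Build an MPI datatype matching tVStruct; offsetof() replaces
       error-prone pointer arithmetic for the displacements. */
    static MPI_Datatype make_tvstruct_type(void)
    {
        int          blocklens[3] = { 1, 1, 256 };
        MPI_Aint     disps[3]     = { offsetof(tVStruct, nRecords),
                                      offsetof(tVStruct, dThreshold),
                                      offsetof(tVStruct, sCapacityFile) };
        MPI_Datatype types[3]     = { MPI_INT, MPI_DOUBLE, MPI_CHAR };
        MPI_Datatype newtype;

        MPI_Type_create_struct(3, blocklens, disps, types, &newtype);
        MPI_Type_commit(&newtype);
        return newtype;
    }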

[OMPI users] MPI_Isend delay

2011-07-14 Thread dave fournier
I have a master-slave setup. I have noticed that when I send a message from the master to the slave processes using MPI_Isend it never gets sent until I encounter an MPI_recv in the master process. As a result the slave processes are wasting time waiting for the message. If I use MPI_Send inst

Re: [OMPI users] MPI_Isend delay

2011-07-14 Thread dave fournier
2.) I have attached to it with gdb previously to monitor the behaviour. On Jul 14, 2011, at 5:50 PM, dave fournier wrote: I have a master-slave setup. I have noticed that when I send a message from the master to the slave processes using MPI_Isend it never gets sent until I encounter an MPI_recv in

Re: [OMPI users] MPI_Isend delay

2011-07-14 Thread dave fournier
On 11-07-14 06:37 PM, Jeff Squyres wrote: OK, Thanks, that is exactly what I needed to know. Dave On Jul 14, 2011, at 8:33 PM, dave fournier wrote: Sorry I should have said it doesn't get sent until the *master* encounters an MPI_recv. Then suddenly the slave finally gets the me
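
The behaviour in this thread comes down to MPI's progress rule: a message posted with MPI_Isend may not actually move until the sender re-enters the MPI library. A hedged sketch, with hypothetical names and workload, of how the master can interleave MPI_Test with its own work so the nonblocking send completes without waiting for a later MPI_Recv:

    #include <mpi.h>

    /* Master side: post a nonblocking send, then poke the library with
       MPI_Test between slices of local work so the message can progress. */
    void master_send(double *buf, int count, int slave_rank)
    {
        MPI_Request req;
        int done = 0;

        MPI_Isend(buf, count, MPI_DOUBLE, slave_rank, 0 /* tag */,
                  MPI_COMM_WORLD, &req);

        while (!done) {
            /* ... do a slice of master-side work here ... */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* gives MPI a chance to progress the send */
        }
    }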

Re: [OMPI users] MPI defined macro

2011-08-23 Thread Dave Goodell
This has been discussed previously in the MPI Forum: http://lists.mpi-forum.org/mpi-forum/2010/11/0838.php I think it resulted in this proposal, but AFAIK it was never pushed forward by a regular attendee of the Forum: https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ReqPPMacro -Dave On Aug

[OMPI users] ompi-checkpoint problem on shared storage

2011-09-23 Thread Dave Schulz
pecially where the mpirun doesn't quit after a checkpoint with --term but the worker processes do? Also, is there some sort of somewhat unusual filesystem semantic that our shared filesystem may not support that ompi/ompi-checkpoint is needing? Thanks for any insights you may have. -Dave

Re: [OMPI users] ompi-checkpoint problem on shared storage

2011-09-27 Thread Dave Schulz
85 prw-r- 1 root root 0 Sep 27 15:21 opal_cr_prog_write.9584 prw-r- 1 root root 0 Sep 27 15:21 opal_cr_prog_write.9585 I believe those are pipes. But why they aren't cleaned up after the checkpoint completes, I don't understand as the job may be restarted on a different batch

[OMPI users] remote spawned process hangs at MPI_Init

2011-10-15 Thread dave fournier
otely spawn it then junk11 contains only the line before the call to MPI_Init calling MPI_Init and the spawned process appears to have crashed. The master process hangs at the spawn command. The code to spawn the remote process is MPI_Info infotest; int ierr2=MPI_Info_create(&info

[OMPI users] remote spawned process hangs at MPI_Init

2011-10-15 Thread dave fournier
OK, I found that if I invoke the master process with mpirun, as in mpirun ./orange -master, then the remote process is successful in the MPI_Init call. I would like to avoid using mpirun if possible. It seems to be responsible for setting up communication between the two machines in so
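
For readers following the fragment above, a hedged sketch of the spawn pattern being described; the worker binary name, the "host" info value, and the single-process count are illustrative, not taken from the post. As the thread notes, the problem case is starting the master as a singleton (without mpirun), which leaves Open MPI to set up the inter-machine wiring that mpirun would otherwise provide:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", "remotehost");   /* hypothetical target host */

        MPI_Comm intercomm;
        int errcode;
        MPI_Comm_spawn("./orange_worker",           /* hypothetical worker binary */
                       MPI_ARGV_NULL, 1, info, 0,
                       MPI_COMM_SELF, &intercomm, &errcode);

        printf("spawn returned, errcode=%d\n", errcode);

        MPI_Info_free(&info);
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }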

[OMPI users] checkpointing on other transports

2012-01-12 Thread Dave Love
What would be involved in adding checkpointing to other transports, specifically the PSM MTL? Are there (likely to be?) technical obstacles, and would it be a lot of work if not? I'm asking in case it would be easy, and we don't have to exclude QLogic from a procurement, given they won't respond

Re: [OMPI users] ompi + bash + GE + modules

2012-01-12 Thread Dave Love
Surely this should be on the gridengine list -- and it's in recent archives -- but there's some ob-openmpi below. Can Notre Dame not get the support they've paid Univa for? Reuti writes: > SGE 6.2u5 can't handle multi line environment variables or functions, > it was fixed in 6.2u6 which isn't

[OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Dave Love
This is to help anyone else having this problem, as it doesn't seem to be mentioned anywhere I can find, rather surprisingly. Core binding is broken on Interlagos with open-mpi 1.5.4. I guess it also bites on Magny-Cours, but all our systems are currently busy and I can't check. It does work, at

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Dave Love
Brice Goglin writes: > Note that magny-Cours processors are OK, cores are "normal" there. Apologies for the bad guess about the architecture, and thanks for the info. > FWIW, the Linux kernel (at least up to 3.2) still reports wrong L2 and > L1i cache information on AMD Bulldozer. Kernel bug re

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Dave Love
Reuti writes: > Maybe it's a side effect of a tight integration that it would start on > the correct nodes (but I face an incorrect allocation of slots and an > error message at the end if started without mpiexec), as in this case > it has no command line option for the hostfile. How to get the >

Re: [OMPI users] Problems with gridengine integration on RHEL 6

2012-02-15 Thread Dave Love
Brian McNally writes: > Hello Open MPI community, > > I'm running the openmpi 1.5.3 package as provided by Redhat Enterprise > Linux 6, along with SGE 6.2u3. I've discovered that under RHEL 5 orted > gets spawned via qrsh and under RHEL 6 orted gets spawned via > SSH. This is happening in the sam

Re: [OMPI users] Problems with gridengine integration on RHEL 6

2012-02-16 Thread Dave Love
Brian McNally writes: > Hi Dave, > > I looked through the INSTALL, VERSION, NEWS, and README files in the > 1.5.4 openmpi tarball but didn't see what you were referring to. I can't access the web site, but there's an item in the notes on the download page about the b

[OMPI users] core binding confusion

2012-03-06 Thread Dave Love
Could someone confirm whether this is a bug or misunderstanding the doc (in which case it's not just me, and it needs clarifying!)? I haven't looked at the current code in the hope of a quick authoritative answer. This is with 1.5.5rc3, originally on Interlagos, but also checked on Magny Cours.

Re: [OMPI users] core binding confusion

2012-03-06 Thread Dave Love
Ralph Castain writes: > Well, no - it shouldn't do that, so I would expect it is a bug. Will try to > look at it, but probably won't happen until next week due to travel. OK, thanks. I'll raise an issue and take a look, as we need it working soon.

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Thread Dave Goodell
o scatter it to the remote group. > > ...right? > right. -Dave

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Thread Dave Goodell
, the standard is really confusing in this point... Don't think of it like an intercommunicator-scatter, think of it more like an intercommunicator-allreduce. The allreduce is also bidirectional. The only difference is that instead of an allreduce (logically reduce+bcast), you instead have a reduce_scatter (logically reduce+scatterv). -Dave

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Thread Dave Goodell
ents in the recvcounts array must be equal to the size of the LOCAL group. The text certainly could use a bit of clarification. I'll bring it up at the meeting next week. -Dave

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Thread Dave Goodell
On May 24, 2012, at 10:34 PM CDT, George Bosilca wrote: > On May 24, 2012, at 23:18, Dave Goodell wrote: > >> So I take back my prior "right". Upon further inspection of the text and >> the MPICH2 code I believe it to be true that the number of the elements in >
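
To make those semantics concrete, a hedged sketch, assuming the conclusion above that recvcounts has one entry per LOCAL-group process; the setup is illustrative rather than the mpi4py test case. Two equal halves of MPI_COMM_WORLD are joined into an intercommunicator, and each rank receives one element of the reduction of the remote group's contributions:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int wrank, wsize;
        MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
        MPI_Comm_size(MPI_COMM_WORLD, &wsize);      /* assumed even */

        /* Split into two halves and join them with an intercommunicator. */
        int color = (wrank < wsize / 2) ? 0 : 1;
        MPI_Comm intra, inter;
        MPI_Comm_split(MPI_COMM_WORLD, color, wrank, &intra);
        int remote_leader = color ? 0 : wsize / 2;  /* world rank of the other half's leader */
        MPI_Intercomm_create(intra, 0, MPI_COMM_WORLD, remote_leader, 99, &inter);

        int local_n;
        MPI_Comm_size(inter, &local_n);             /* local group size */

        /* One entry per LOCAL-group member, as discussed above. */
        int *recvcounts = malloc(local_n * sizeof *recvcounts);
        int *sendbuf    = malloc(local_n * sizeof *sendbuf);
        for (int i = 0; i < local_n; ++i) {
            recvcounts[i] = 1;
            sendbuf[i]    = wrank;                  /* arbitrary payload */
        }

        int recvval = 0;
        MPI_Reduce_scatter(sendbuf, &recvval, recvcounts, MPI_INT, MPI_SUM, inter);
        printf("world rank %d received %d\n", wrank, recvval);

        free(sendbuf);
        free(recvcounts);
        MPI_Comm_free(&inter);
        MPI_Comm_free(&intra);
        MPI_Finalize();
        return 0;
    }

Each rank ends up with the sum of the remote half's world ranks, which makes the bidirectional reduce-plus-scatterv behaviour easy to see.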

Re: [OMPI users] Mellanox MLX4_EVENT_TYPE_SRQ_LIMIT kernel messages

2012-10-04 Thread Dave Love
Meanwhile, much later -- you'll sympathize: Did you have any joy with this? You wrote: > These messages appeared when running IMB compiled with openmpi 1.6.1 > across 256 cores (16 nodes, 16 cores per node). The job ran from > 09:56:54 to 10:08:46 and failed with no obvious error messages. I d

Re: [OMPI users] Mellanox MLX4_EVENT_TYPE_SRQ_LIMIT kernel messages

2012-10-05 Thread Dave Love
I wrote: > You wrote: > "Reply-to considered harmful" again, sigh.

Re: [OMPI users] MPI_IN_PLACE not working for Fortran-compiled code linked with mpicc on Mac OS X

2013-01-04 Thread Dave Goodell
g else that any investigation should probably check. For reference on the later MPICH discoveries about dynamically linking common symbols on Darwin: http://trac.mpich.org/projects/mpich/ticket/1590 -Dave

Re: [OMPI users] QLogic HCA random crash after prolonged use

2013-04-24 Thread Dave Love
"Elken, Tom" writes: >> I have seen it recommended to use psm instead of openib for QLogic cards. > [Tom] > Yes. PSM will perform better and be more stable when running OpenMPI > than using verbs. But unfortunately you won't be able to checkpoint. > Intel has acquired the InfiniBand assets of

Re: [OMPI users] QLogic HCA random crash after prolonged use

2013-04-25 Thread Dave Love
Ralph Castain writes: > On Apr 24, 2013, at 8:58 AM, Dave Love wrote: > >> "Elken, Tom" writes: >> >>>> I have seen it recommended to use psm instead of openib for QLogic cards. >>> [Tom] >>> Yes. PSM will perform better an

Re: [OMPI users] QLogic HCA random crash after prolonged use

2013-04-25 Thread Dave Love
"Elken, Tom" writes: >> > Intel has acquired the InfiniBand assets of QLogic >> > about a year ago. These SDR HCAs are no longer supported, but should >> > still work. > [Tom] > I guess the more important part of what I wrote is that " These SDR HCAs are > no longer supported" :) Sure, though

Re: [OMPI users] QLogic HCA random crash after prolonged use

2013-04-30 Thread Dave Love
Ralph Castain writes: >> Dropped CR is definitely reason not to use OMPI past 1.6. [By the way, >> the release notes are confusing, saying that DMTCP is supported, but CR >> is dropped.] I'd have hoped a vendor who needs to support CR would >> contribute, but I suppose changes just become propr

Re: [OMPI users] Problem with Openmpi-1.4.0 and qlogic-ofed-1.5.4.1

2013-04-30 Thread Dave Love
Padma Pavani writes: > Hi Team, > > I am facing some problem while running HPL benchmark. > > > > I am using Intel mpi -4.0.1 with Qlogic-OFED-1.5.4.1 to run benchmark and > also tried with openmpi-1.4.0 but getting same error. > > > Error File : > > [compute-0-1.local:06936] [[14544,1],25] ORTE

Re: [OMPI users] multithreaded jobs

2013-04-30 Thread Dave Love
Ralph Castain writes: > On Apr 25, 2013, at 5:33 PM, Vladimir Yamshchikov wrote: > >> $NSLOTS is what requested by -pe openmpi in the script, my >> understanding that by default it is threads. Is there something in the documentation that suggest

Re: [OMPI users] knem/openmpi performance?

2013-07-18 Thread Dave Love
Paul Kapinos writes: > Jeff, I would turn the question the other way around: > > - are there any penalties when using KNEM? Bull should be able to comment on that -- they turn it on by default in their proprietary OMPI derivative -- but I doubt I can get much of a story on it. Mellanox ship it

Re: [OMPI users] knem/openmpi performance?

2013-07-18 Thread Dave Love
"Elken, Tom" writes: >> I was hoping that someone might have some examples of real application >> behaviour rather than micro benchmarks. It can be crazy hard to get that >> information from users. > [Tom] > I don't have direct performance information on knem, but with Intel's > (formerly QLogic

Re: [OMPI users] knem/openmpi performance?

2013-07-18 Thread Dave Love
Mark Dixon writes: > On Mon, 15 Jul 2013, Elken, Tom wrote: > ... >> Hope these anecdotes are relevant to Open MPI users considering knem. > ... > > Brilliantly useful, thanks! It certainly looks like it may be greatly > significant for some applications. Worth looking into. > > All the best, > >

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-26 Thread Dave Love
"Kevin H. Hobbs" writes: > The program links to fedora's copies of the libraries of interest : > > mpirun -n 1 ldd mpi_simple | grep hwloc > libhwloc.so.5 => /lib64/libhwloc.so.5 (0x003c5760) [I'm surprised it's in /lib64.] > mpirun -n 1 ldd mpi_simple | grep mpi > libmpi.so.1 => /u

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-21 Thread Dave Love
John Hearns writes: > You really should install a job scheduler. Indeed (although it's the resource management component that does the job). > There are free versions. > > I'm not sure about cpuset support in Gridengine. Anyone? Yes, but I've had reports of problems (races?) that I haven't sor

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-23 Thread Dave Love
John Hearns writes: > Agree with what you say Dave. > > Regarding not wanting jobs to use certain cores, i.e. reserving low-numbered > cores for OS processes, then surely a good way forward is to use a 'boot > cpuset' of one or two cores and let your jobs run on the rest

Re: [OMPI users] Need help running jobs across different IB vendors

2013-10-15 Thread Dave Love
"Kevin M. Hildebrand" writes: > Hi, I'm trying to run an OpenMPI 1.6.5 job across a set of nodes, some > with Mellanox cards and some with Qlogic cards. Maybe you shouldn't... (I'm blessed in one cluster with three somewhat incompatible types of QLogic card and a set of Mellanox ones, but they'

Re: [OMPI users] knem/openmpi performance?

2013-10-15 Thread Dave Love
[Meanwhile, much later...] Mark Dixon writes: > Hi, > > I'm taking a look at knem, to see if it improves the performance of > any applications on our QDR InfiniBand cluster, so I'm eager to hear > about other people's experiences. This doesn't appear to have been > discussed on this list before.

[OMPI users] debugging performance regressions between versions

2013-10-18 Thread Dave Love
I've been testing an application that turns out to be ~30% slower with OMPI 1.6.5 than (the Red Hat packaged version of) 1.5.4, with the same mca-params and the same binary, just flipping the runtime. It's running over openib, and the profile it prints says that alltoall is a factor of four slower

Re: [OMPI users] Need help running jobs across different IB vendors

2013-10-18 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > Short version: > -- > > What you really want is: > > mpirun --mca pml ob1 ... > > The "--mca mtl ^psm" way will get the same result, but forcing pml=ob1 is > really a slightly Better solution (from a semantic perspective) I'm afraid ^psm is r

Re: [OMPI users] debugging performance regressions between versions

2013-10-23 Thread Dave Love
"Iliev, Hristo" writes: > Hi Dave, > > Is it MPI_ALLTOALL or MPI_ALLTOALLV that runs slower? Well, the output says MPI_ALLTOALL, but this prompted me to check, and it turns out that it's lumping both together. > If it is the latter, > the reason could be that

Re: [OMPI users] Prototypes for Fortran MPI_ commands using 64-bit indexing

2013-12-10 Thread Dave Love
George Bosilca writes: >> No. The Fortran status must __always__ be 6, because we need enough room to >> correctly convert the 3 useful variables to Fortran, plus copy the rest of >> the hidden things. These 6 types will be INTEGER (which will then be different >> than the C int). The C<->F stuf

Re: [OMPI users] Prototypes for Fortran MPI_ commands using 64-bit indexing

2013-12-12 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > On Dec 10, 2013, at 10:42 AM, Dave Love wrote: > >> This doesn't seem to have been fixed, and I think it's going to bite >> here. Is this the right change? > > Thanks for reminding us. > >> --- openmpi-1.6

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Dave Love
Noam Bernstein writes: > We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some > collective communication), but now I'm wondering whether I should just test > 1.6.5. What bug, exactly? As you mentioned vasp, is it specifically affecting that? We have seen apparent deadl

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Dave Love
John Hearns writes: > 'Htop' is a very good tool for looking at where processes are running. I'd have thought hwloc-ps is the tool for that.

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Dave Love
Noam Bernstein writes: > On Dec 18, 2013, at 10:32 AM, Dave Love wrote: > >> Noam Bernstein writes: >> >>> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in >>> some >>> collective communication), but now I'm wondering

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Dave Love
Brice Goglin writes: > hwloc-ps (and lstopo --top) are better at showing process binding but > they lack a nice pseudographical interface with dynamic refresh. That seems like an advantage when you want to check on a cluster! > htop uses hwloc internally iirc, so there's hope we'll have everyth

Re: [OMPI users] OpenMPI 1.7.2-2.1.3 does not work with OpenFOAM 2.2.2 on OpenSUSE 13.1

2014-01-28 Thread Dave Love
Elisabeth Beer writes: > Hi, > > I've done an operating system upgrade to OpenSUSE 13.1 and I've upgraded > OpenFOAM from 2.2.1 to 2.2.2. > > Before, OpenMPI worked well. > Now, it does not work at all. > > First Step > -- > > After decomposing the domain, I've tried to start pa

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-02-27 Thread Dave Love
Edgar Gabriel writes: > so we had ROMIO working with PVFS2 (not OrangeFS, which is however > registered as PVFS2 internally). We have one cluster which uses > OrangeFS, on that machine however we used OMPIO, not ROMIO. [What's OMPIO, and should we want it?] This is another vote for working 1.6.

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-27 Thread Dave Love
[I don't know what thread this is without References: or citation.] Bernd Dammann writes: > Hi, > > I found this thread from before Christmas, and I wondered what the > status of this problem is. We experience the same problems since our > upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-03-04 Thread Dave Love
Edgar Gabriel writes: >> [What's OMPIO, and should we want it?] > > OMPIO is the 'native' implementation of MPI I/O in Open MPI, its however > only available from the 1.7 series onwards. Thanks, but I wonder how I'd know that? NEWS mentions "Various OMPIO updates and fixes.", but that's all I c

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Dave Love
Bernd Dammann writes: > We use Moab/Torque, so we could use cpusets (but that has had some > other side effects earlier, so we did not implement it in our setup). I don't remember what Torque does, but core binding and (Linux) cpusets are somewhat orthogonal. While a cpuset will obviously restr

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Dave Love
Tru Huynh writes: > afaik, 2.6.32-431 series is from RHEL(and clones) version >=6.5 [Right.] > otoh, it might be related to http://bugs.centos.org/view.php?id=6949 That looks likely. As we bind to cores, we wouldn't see it for MPI processes, at least, and will see higher performance generally

[OMPI users] More on AlltoAll

2008-03-20 Thread Dave Grote
ta to exchange, is the OpenMPI AlltoAll written in such a way that they don't do any communication? Will the AlltoAll be as efficient (or at least nearly as efficient) as direct send/recv among neighbors? Thanks! Dave

Re: [OMPI users] More on AlltoAll

2008-03-20 Thread Dave Grote
Sorry - my mistake - I meant AlltoAllV, which is what I use in my code. Ashley Pittman wrote: On Thu, 2008-03-20 at 10:27 -0700, Dave Grote wrote: After reading the previous discussion on AllReduce and AlltoAll, I thought I would ask my question. I have a case where I have data

Re: [OMPI users] Problem with X forwarding

2008-06-09 Thread Dave Grote
ut using the -d option does work well and doesn't require any extra fiddling. Dave Allen Barnett wrote: If you are using a recent version of Linux (as machine A), the X server is probably started with its TCP network connection turned off. For example, if you do: $ ps auxw | grep X

[OMPI users] memory leak in alltoallw

2008-08-06 Thread Dave Grote
e is no memory leak. If it helps, I am using OpenMPI on an AMD system running Chaos linux. I tried the latest nightly build of version 1.3 from Aug 5. I run four processors on one quad core node so it should be using shared memory communication. Thanks! Dave program testalltoa

Re: [OMPI users] memory leak in alltoallw

2008-08-18 Thread Dave Grote
Great! Thanks for the fix.    Dave Tim Mattox wrote: The fix for this bug is in the 1.2 branch as of r19360, and will be in the upcoming 1.2.7 release. On Sun, Aug 17, 2008 at 6:10 PM, George Bosilca wrote: Dave, Thanks for your report. As you discovered we had a memory leak

Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released

2009-07-20 Thread Dave Love
Ralph Castain writes: > Hmmm...there should be messages on both the user and devel lists > regarding binary compatibility at the MPI level being promised for > 1.3.2 and beyond. This is confusing. As I read the quotes below, recompilation is necessary, and the announcement has items which sugge

Re: [OMPI users] ifort and gfortran module

2009-07-20 Thread Dave Love
rahmani writes: > Hi, > you should compile openmpi with each of intel and gfortran separately > and install each of them in a separate location, and use mpi-selector > to select one. What, precisely, requires that, at least if you can recompile the MPI program with appropriate options? (Presumab

Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released

2009-07-23 Thread Dave Love
Jeff Squyres writes: > The MPI ABI has not changed since 1.3.2. Good, thanks. I hadn't had time to investigate the items in the release notes that looked suspicious. Are there actually any known ABI incompatibilities between 1.3.0 and 1.3.2? We haven't noticed any as far as I know. > Note th

Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Dave Love
Jeff Squyres writes: > See https://svn.open-mpi.org/source/xref/ompi_1.3/README#257. Ah, neat. I'd never thought of that, possibly due to ELF not being relevant when I first started worrying about that sort of thing. > Indeed. In OMPI, we tried to make this as simple as possible. But > unles

Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Dave Love
Jeff Squyres writes: > I *think* that there are compiler flags that you can use with ifort to > make it behave similarly to gfortran in terms of sizes and constant > values, etc. At a slight tangent, if there are flags that might be helpful to add to gfortran for compatibility (e.g. logical cons

Re: [OMPI users] x4100 with IB

2009-08-10 Thread Dave Love
Michael Di Domenico writes: > It's a freshly reformatted cluster > converting from solaris to linux. We also reset the bios settings > with "load optimal defaults". [Why?] > Does anyone know which bios setting I > changed to dump the BW? Off-topic, and surely better on the Beowulf list, it's an

[OMPI users] switch and NIC performance (was: very bad parallel scaling of vasp using openmpi)

2009-09-23 Thread Dave Love
Rahul Nabar writes: > So, how does one go about selecting a good switch? "The most expensive > the better" is somewhat an unsatisfying option! Also it's apparently not always right, if I recall correctly, according to the figures on MPI switch performance in the reports somewhere under http://www

[OMPI users] building the 1.4.1 rpm under RHEL 5

2010-01-20 Thread Dave Love
Before I take time investigating, is anyone aware of problems/solutions to building the rpm with gcc on current RedHat 5? Just an `rpmbuild --rebuild' fails as follows, and I'm surprised if it's specific to here. (I looked for a previous problem report, of course.) make[5]: Entering directory

Re: [OMPI users] building the 1.4.1 rpm under RHEL 5

2010-01-20 Thread Dave Love
Matthias Jurenz writes: > Hello Dave, > > unfortunately, we have no such platform for trying to reproduce this error, > but we would be pleased, if you could help us to identify the problem. > > We guess that the following will fix the problem: > > Could yo

Re: [OMPI users] building the 1.4.1 rpm under RHEL 5

2010-01-20 Thread Dave Love
Jeff Squyres writes: > Is it related to FORTIFY_SOURCE? Apparently so. Thanks. The problem seems to be only in that routine. It's obviously somewhat subtle as it doesn't fail on Ubuntu, which defaults to FORTIFY_SOURCE. Perhaps defaulting to on is the wrong choice, as I'd guess RH5-ish is the

[OMPI users] ABI stabilization/versioning

2010-01-25 Thread Dave Love
What's the status of (stabilizing and?) versioning libraries? If I recall correctly, it was supposed to be defined as fixed for some release period as of 1.3.something. I assumed that the libraries would then be versioned (at least for ELF -- I don't know about other formats) and we could remove

Re: [OMPI users] ABI stabilization/versioning

2010-01-26 Thread Dave Love
Manuel Prinz writes: > The ABI should be stable since 1.3.2. OMPI 1.4.x does set the libtool > version info; Oh, sorry. I grepped the code for the relevant libtool args and couldn't see any evidence it was done. I wonder how I missed it. > Versions were bumped to 0.0.1 for libmpi which has n

Re: [OMPI users] ABI stabilization/versioning

2010-01-26 Thread Dave Love
Jeff Squyres writes: > To be absolutely crystal clear: OMPI's MPI shared libraries now have > .so versioning enabled, but you still can't install two copies of Open > MPI into the same $prefix (without overriding a bunch of other > directory names, that is, like $pkglibdir, etc.). This is becaus

Re: [OMPI users] Progress in MPI_Win_unlock

2010-02-04 Thread Dave Goodell
ions about their implementation. -Dave

Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Dave Goodell
SERIALIZED. -Dave

Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-04 Thread Dave Goodell
n-standard substitute for malloc called st_malloc (single thread malloc) that does not do locking. [...snip...] Dick's example is a great illustration of why FUNNELED might be necessary. The moral of the story is "don't lie to the MPI implementation" :) -Dave

Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-04 Thread Dave Goodell
identical (true for stock MPICH2, for example). However Dick's example of thread-safe versus non-thread-safe malloc options clearly shows why programs need to request (and check "provided" for) >=FUNNELED in this scenario if they wish to be truly portable. -Dave

Re: [OMPI users] Option to use only 7 cores out of 8 on each node

2010-03-04 Thread Dave Love
"Addepalli, Srirangam V" writes: > It works after creating a new pe and even from the command prompt with > out using SGE. You shouldn't need anything special -- I don't. (It's common to run, say, one process per core for benchmarking.) Running mpirun -tag-output -np 14 -npernode 7 hostname

Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-07 Thread Dave Love
Serge writes: > However, there are cases when being able to specify the hostfile is > important (hybrid jobs, users with MPICH jobs, etc.). [I don't understand what MPICH has to do with it.] > For example, > with Grid Engine I can request four 4-core nodes, that is total of 16 > slots. But I al

Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-08 Thread Dave Love
Serge writes: > This is exactly what I am doing -- controlling distribution of > processes with mpirun on the SGE-allocated nodes, by supplying the > hostfile. Grid Engine allocates nodes and generates a hostfile, which > I then can modify however I want to, before running the mpirun > command. M

Re: [OMPI users] How to "guess" the incoming data type ?

2010-04-26 Thread Dave Love
Sylvestre Ledru writes: > I am currently extending an application with MPI capabilities. > This high-level application allows users to use dynamic types. Therefore, > on the slaves, I have no way to know what the master will send me. Have you looked at existing MPI work with dynamic languages li

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Dave Love
Sylvestre Ledru writes: > This code will set the precision to double: > > #include <fpu_control.h> > fpu_control_t _cw; > _FPU_GETCW(_cw); > _cw = (_cw & ~_FPU_DOUBLE) | _FPU_EXTENDED; > _FPU_SETCW(_cw); > > You should get the same result on 32 & 64 bits CPU then. Quite off-topic, but as far as I remember f
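
For reference, a self-contained sketch, assuming glibc's <fpu_control.h> on x86 and not part of the original exchange, of pinning the x87 precision-control field to double precision, which is what the quoted snippet describes; note this touches only the legacy x87 unit, so SSE arithmetic (the default for x86-64) is unaffected, which is one reason 32- and 64-bit builds can still differ:

    #include <fpu_control.h>   /* glibc-specific: fpu_control_t, _FPU_GETCW, _FPU_SETCW */
    #include <stdio.h>

    int main(void)
    {
        fpu_control_t cw;

        _FPU_GETCW(cw);
        /* Clear the two precision-control bits, then select 53-bit (double)
           significands for the x87 unit. */
        cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
        _FPU_SETCW(cw);

        printf("x87 control word now 0x%x\n", (unsigned) cw);
        return 0;
    }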

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Dave Love
Asad Ali writes: >>From run to run the results can only be different if you either use > different input/output or use different random number seeds. Here in my case > the random number seeds are the same as well. Sorry, but that's naïve, even if you can prove your code is well-defined according

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Dave Love
Matthew MacManes writes: > I am using SGE to submit jobs to one of the TeraGrid sites, > specifically TACC-RANGER. It's more on-topic here than the SGE list, but you should still ask the Ranger support people. People who don't know Ranger can't say if you actually can use the TCP BTL on it, bu

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-27 Thread Dave Love
Gus Correa writes: > Or run a serial version on the same set of machines, > compiled in similar ways (compiler version, opt flags, etc) > to the parallel versions, and compare results. > If the results don't differ, then you can start blaming MPI. That wouldn't show that there's actually any Ope

Re: [OMPI users] OpenMPI & SGE: bash errors at mpirun

2010-04-27 Thread Dave Love
Frederik Himpe writes: > bash: module: line 1: syntax error: unexpected end of file > bash: error importing function definition for `module' It's nothing to do with open-mpi -- the job hasn't even started executing at that point. Consult the archives of the SGE users list and the issue tracker.

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-28 Thread Dave Love
Fabian Hänsel writes: > You could try to set optimizations more fine-grained. Every > -Osomething stands for a certain set of optimizations. Start with > e.g. "gcc -Q -O2 --help=optimizers" to see all available optimizations > and which are enabled at -O2. Read about them on the gcc > manpage. Di

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-05-02 Thread Dave Love
Asad Ali writes: > I took your earlier advice regarding the optimization flags causing errors > in your case. I reiterate that any `errors' are much more likely to be in the code than GCC's -On optimizations if it's unstable with respect to them. > You wrote i

[OMPI users] PGI problems

2010-05-10 Thread Dave Love
NEWS says that problems with PGI 10 were fixed in 1.4.1, but PGI 10 won't configure 1.4.2 for me: configure: WARNING: Your compiler does not support offsetof macro configure: error: Configure: Cannot continue # pgcc -V pgcc 10.1-0 64-bit target on x86-64 Linux -tp k8-64e Copyright 1989
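
For anyone unsure what that warning refers to, offsetof usage of roughly this shape is what the compiler is being probed for; this is a generic illustration, not Open MPI's actual configure test:

    #include <stddef.h>   /* offsetof */

    struct probe {
        char   pad;
        double field;
    };

    int main(void)
    {
        /* A compiler with a working offsetof yields a nonzero,
           compile-time constant offset here. */
        return offsetof(struct probe, field) > 0 ? 0 : 1;
    }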

[OMPI users] sunstudio patch needed

2010-05-10 Thread Dave Love
For the benefit of anyone else trying to build with it on x86_64 GNU/Linux (or, presumably, i386): Sunstudio 12, update 1 loops while compiling 1.4.2's btl_sm.c with configure's default flags. Sun patch 141859-04 fixes that.

Re: [OMPI users] PGI problems

2010-05-11 Thread Dave Love
Prentice Bisbal writes: > Since I was successful compiled 1.4.1 with PGI 9 and 1.4.2 with PGI > 10.4, Thanks. The difference appears to be the compiler versions. > I suspect the problem is local to you. Can you go through your > environment and make sure you don't have any settings that are in

Re: [OMPI users] PGI problems

2010-05-11 Thread Dave Love
I wrote: > I'll see if we can get a compiler update and report back. Installing PGI 10.5 has fixed the configure problem for me. So, for the archives (or FAQ?) it was a PGI bug, fixed sometime between 10.1 and 10.4, and apparently not present in 9.0-3. It's also present in 8.0-3. Thanks to Pre

Re: [OMPI users] Looking for LAM-MPI sources to create a mirror

2015-06-11 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > Sadly, I have minimal experience with .debs... if someone would contribute > the necessary packaging, we could talk about hosting a source deb on the main > Open MPI site. What's wrong with the Debian packages (if you really want LAM)? $ apt-cache show la

Re: [OMPI users] Missing file "openmpi/ompi/mpi/f77/constants.h"

2015-06-11 Thread Dave Love
Filippo Spiga writes: > Dear OpenMPI experts, > > I am rebuilding IPM (https://github.com/nerscadmin/ipm) based on OpenMPI > 1.8.5. However, despite OMPI is compiled with the "--with-devel-headers" > option, IPM build fails because the file "openmpi/ompi/mpi/f77/constants.h" > is missing Whic

Re: [OMPI users] Looking for LAM-MPI sources to create a mirror

2015-06-15 Thread Dave Love
Cian Davis writes: > My intention with regard to requesting sources was to create a mirror so > that people who have to use LAM-MPI (e.g. because their applications were > statically compiled against them) would still have some way to get LAM-MPI > instead of scouring the recesses of the web. Hav

Re: [OMPI users] Missing file "openmpi/ompi/mpi/f77/constants.h"

2015-06-15 Thread Dave Love
Gilles Gouaillardet writes: > Dave, > > commit > https://github.com/nerscadmin/IPM/commit/8f628dadc502b3e0113d6ab3075bf66b107f07e5 > broke Open MPI support for Open MPI 1.8 and above Actually it won't build with 1.6 either. It seems to be trying to use internal headers

[OMPI users] vader/sm not being picked up

2015-06-24 Thread Dave Turner
g that has been fixed. Thanks. Dave Turner -- Work: davetur...@ksu.edu (785) 532-7791 118 Nichols Hall, Manhattan KS 66502 Home:drdavetur...@gmail.com cell: (785) 770-5929 ompi_info.out Description: Binary data

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Dave Love
"Schlottke-Lakemper, Michael" writes: > Hi folks, > > We are currently encountering a weird file coherence issue when > running parallel jobs with OpenMPI (1.8.7) and writing files in > parallel to an NFS-mounted file system using Parallel netCDF 1.6.1 > (which internally uses MPI-I/O). Sometimes

[OMPI users] DMTCP checkpointing with openib

2015-07-23 Thread Dave Love
Does anyone have experience of checkpointing (and restarting!) OMPI 1.8 over openib using DMTCP, or just know whether it can work? A while ago I saw some note saying it wouldn't work because of some OMPI mechanism that couldn't be configured (unreliable connexions?) but now that Sourceforge has
