Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-03 Thread Michael Di Domenico via users
perhaps there is > different initialization that happens such that the offending device search > problem doesn't occur? > > > Thanks, > > David > > > > > From: Shrader, David Lee > Sent: Tuesday, November 2, 2021 2:09 P

[OMPI users] strange pml error

2021-11-02 Thread Michael Di Domenico via users
fairly frequently, but not every time, when trying to run xhpl on a new machine i'm bumping into this. it happens with a single node or multiple nodes: node1 selected pml ob1, but peer on node1 selected pml ucx. if i rerun the exact same command a few minutes later, it works fine. the machine is new

Re: [OMPI users] [EXTERNAL] building openshem on opa

2021-03-22 Thread Michael Di Domenico via users
On Mon, Mar 22, 2021 at 11:13 AM Pritchard Jr., Howard wrote: > https://github.com/Sandia-OpenSHMEM/SOS > if you want to use OpenSHMEM over OPA. > If you have lots of cycles for development work, you could write an OFI SPML > for the OSHMEM component of Open MPI. thanks, i am aware of the sandi

[OMPI users] building openshem on opa

2021-03-22 Thread Michael Di Domenico via users
i can build and run openmpi on an opa network just fine, but it turns out building openshmem fails. the message is (no spml) found. looking at the config log, it looks like it tries to build spml ikrit and ucx, which fail. i turn ucx off because it doesn't support opa and isn't needed. so this mes

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
sm_lid: 1 > > port_lid: 99 > > port_lmc: 0x00 > > link_layer: InfiniBand > > > > using gcc/gfortran 9.3.0 > > > > Built Open MPI 4.0.

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
for whatever it's worth running the test program on my OPA cluster seems to work. well it keeps spitting out [INFO MEMORY] lines, not sure if it's supposed to stop at some point i'm running rhel7, gcc 10.1, openmpi 4.0.5rc2, with-ofi, without-{psm,ucx,verbs} On Tue, Jan 26, 2021 at 3:44 PM Patri

Re: [OMPI users] openmpi/pmix/ucx

2020-02-07 Thread Michael Di Domenico via users
d to be what Mellanox used to configure OpenMPI in HPC-X > 2.5. > > I have users using GCC, PGI, Intel and AOCC compilers with this config. PGI > was the only one that > was a challenge to build due to conflicts with HCOLL. > > -Ray Muno > > On 2/7/20 10:04 AM, Michael Di

[OMPI users] openmpi/pmix/ucx

2020-02-07 Thread Michael Di Domenico via users
i haven't compiled openmpi in a while, but i'm in the process of upgrading our cluster. the last time i did this there were specific versions of mpi/pmix/ucx that were all tested and supposed to work together. my understanding of this was because pmi/ucx was under rapid development and the api's

Re: [OMPI users] local rank to rank comms

2019-03-20 Thread Michael Di Domenico
h ? could > btl/ofi also be used for intra node communications ?) > > > mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca > mtl_base_verbose 10 ... > > should tell you what is used (feel free to compress and post the full > output if you have some hard time unders

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet wrote: > You can force > mpirun --mca pml ob1 ... > And btl/vader (shared memory) will be used for intra node communications ... > unless MPI tasks are from different jobs (read MPI_Comm_spawn()) if i run mpirun -n 16 IMB-MPI1 alltoallv thing
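The forcing option Gilles describes here, and the verbosity flags quoted elsewhere in this thread, can be combined on one command line. Below is a minimal sketch that only assembles the argument list; the helper name `build_mpirun_cmd` is hypothetical, while the flags (`--mca pml ob1`, `--mca pml_base_verbose 10`, `--mca btl_base_verbose 10`, `--mca mtl_base_verbose 10`) and the `IMB-MPI1 alltoallv` invocation are the ones quoted in the thread.

```python
import shlex


def build_mpirun_cmd(ranks, program_args, pml=None, verbose=None):
    """Assemble an mpirun argument list (hypothetical helper, not part of Open MPI)."""
    cmd = ["mpirun", "-n", str(ranks)]
    if pml is not None:
        # Force a specific PML, as in "mpirun --mca pml ob1 ..."
        cmd += ["--mca", "pml", pml]
    if verbose is not None:
        # Ask the pml, btl and mtl frameworks to report what they select
        for framework in ("pml", "btl", "mtl"):
            cmd += ["--mca", f"{framework}_base_verbose", str(verbose)]
    return cmd + list(program_args)


if __name__ == "__main__":
    cmd = build_mpirun_cmd(16, ["IMB-MPI1", "alltoallv"], pml="ob1", verbose=10)
    print(shlex.join(cmd))
```

Building the list first and printing it makes it easy to compare a forced run against a default run before launching anything.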

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote: > OFI uses libpsm2 underneath it when omnipath detected > > > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet > > wrote: > > It might show that pml/cm and mtl/psm2 are used. In that case, then yes, > > the OmniPath library is used even fo

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote: > You are probably using the ofi mtl - could be psm2 uses loopback method? according to ompi_info i do in fact have mtl's ofi,psm,psm2. i haven't changed any of the defaults, so are you saying order to change the behaviour i have to run mpi

[OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
i have a user that's claiming when two ranks on the same node want to talk with each other, they're using the NIC to talk rather than just talking directly. i've never had to test such a scenario. is there a way for me to prove one way or another whether two ranks are talking through say the kern

Re: [OMPI users] pmix and srun

2019-01-18 Thread Michael Di Domenico
s a typo in the v2.2.1 release. Sadly, our Slurm > > plugin folks seem to be off somewhere for awhile and haven’t been testing > > it. Sigh. > > > > I’ll patch the branch and let you know - we’d appreciate the feedback. > > Ralph > > > > > >> On

Re: [OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
adding > > PMIX_MCA_pmix_client_event_verbose=5 > PMIX_MCA_pmix_server_event_verbose=5 > OMPI_MCA_pmix_base_verbose=10 > > to your environment and see if that provides anything useful. > > > On Jan 18, 2019, at 12:09 PM, Michael Di Domenico > > wrote: > > > > i compilie

[OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
i compiled pmix, slurm, and openmpi ---pmix ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug ---slurm ./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 --with-pmix=/hpc/pmix/2.2 ---openmpi ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external --wit

Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Michael Di Domenico
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu wrote: > > Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following > warnings: > > -- > WARNING: There is at least non-excluded one OpenFabrics device foun

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-14 Thread Michael Di Domenico
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote: > > You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a > switch), and install that > on your system, or else install xpmem (https://github.com/hjelmn/xpmem). > Note there is a bug right now > in UCX that you may hit if

[OMPI users] shmem

2018-05-09 Thread Michael Di Domenico
before i debug ucx further (cause it's totally not working for me), i figured i'd check to see if it's *really* required to use shmem inside of openmpi. i'm pretty sure the answer is yes, but i wanted to double check.

Re: [OMPI users] openmpi/slurm/pmix

2018-04-25 Thread Michael Di Domenico
On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote: > Looks like the problem is that you didn’t wind up with the external PMIx. The > component listed in your error is the internal PMIx one which shouldn’t have > built given that configure line. > > Check your config.out and see what happe

[OMPI users] openmpi/slurm/pmix

2018-04-23 Thread Michael Di Domenico
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix. everything compiled, but when i run something i get: symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads i'm more than sure i did something wrong, but i'm not sure what, h

Re: [OMPI users] disabling libraries?

2018-04-10 Thread Michael Di Domenico
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres) wrote: > On Apr 6, 2018, at 8:12 AM, Michael Di Domenico > wrote: >> it would be nice if openmpi had (or may already have) a simple switch >> that lets me disable entire portions of the library chain, ie this >

Re: [OMPI users] disabling libraries?

2018-04-06 Thread Michael Di Domenico
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet wrote: > That being said, the error suggest mca_oob_ud.so is a module from a > previous install, > Open MPI was not built on the system it is running, or libibverbs.so.1 > has been removed after > Open MPI was built. yes, understood, i compiled

[OMPI users] disabling libraries?

2018-04-05 Thread Michael Di Domenico
i'm trying to compile openmpi to support all of our interconnects, psm/openib/mxm/etc. this works fine: openmpi finds all the libs, compiles and runs on each of the respective machines. however, we don't install the libraries for everything everywhere, so when i run things like ompi_info and mpirun

[OMPI users] openmpi hang on IB disconnect

2018-01-17 Thread Michael Di Domenico
openmpi-2.0.2 running on rhel 7.4 with qlogic QDR infiniband switches/adapters, also using slurm i have a user that's running a job over multiple days. unfortunately after a few days at random the job will seemingly hang. the latest instance was caused by an infiniband adapter that went offline

[OMPI users] openmpi mgmt traffic

2017-10-11 Thread Michael Di Domenico
my cluster nodes are connected on 1g ethernet eth0/eth1 and via infiniband rdma and ib0. my understanding is that openmpi will detect all these interfaces, using eth0/eth1 for connection setup and rdma for msg passing. what would be the appropriate command line parameters to tell openmpi to i

[OMPI users] alltoallv

2017-10-10 Thread Michael Di Domenico
i'm getting stuck trying to run some fairly large IMB-MPI alltoall tests under openmpi 2.0.2 on rhel 7.4 i have two different clusters, one running mellanox fdr10 and one running qlogic qdr if i issue mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv the job just stalls after t

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-23 Thread Michael Di Domenico
On Thu, Jun 22, 2017 at 12:41 PM, r...@open-mpi.org wrote: > I gather you are using OMPI 2.x, yes? And you configured it > --with-pmi=, then moved the executables/libs to your > workstation? correct > I suppose I could state the obvious and say “don’t do that - just rebuild it” correct... bu

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
On Thu, Jun 22, 2017 at 10:43 AM, John Hearns via users wrote: > Having had some problems with ssh launching (a few minutes ago) I can > confirm that this works: > > --mca plm_rsh_agent "ssh -v" this doesn't do anything for me if i set OMPI_MCA_sec=^munge i can clear the mca_sec_munge error bu

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
ently, in the contect of a PBS cluster > > On 22 June 2017 at 16:16, Michael Di Domenico > wrote: >> >> is it possible to disable slurm/munge/psm/pmi(x) from the mpirun >> command line or (better) using environment variables? >> >> i'd like to use the ins

[OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
is it possible to disable slurm/munge/psm/pmi(x) from the mpirun command line or (better) using environment variables? i'd like to use the installed version of openmpi i have on a workstation, but it's linked with slurm from one of my clusters. mpi/slurm work just fine on the cluster, but when i

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-25 Thread Michael Di Domenico
On Mon, Jul 25, 2016 at 4:53 AM, Gilles Gouaillardet wrote: > > as a workaround, you can configure without -noswitcherror. > > after you ran configure, you have to manually patch the generated 'libtool' > file and add the line with pgcc*) and the next line like this : > > /* if pgcc is used, libto

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-22 Thread Michael Di Domenico
pthread" from libslurm.la and libpmi.la >> >> On 07/11/2016 02:54 PM, Michael Di Domenico wrote: >>> >>> I'm trying to get openmpi compiled using the PGI compiler. >>> >>> the configure goes through and the code starts to compile, but the

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
On Mon, Jul 11, 2016 at 9:52 AM, Åke Sandgren wrote: > Looks like you are compiling with slurm support. > > If so, you need to remove the "-pthread" from libslurm.la and libpmi.la i don't see a configure option in slurm to disable pthreads, so i'm not sure this is possible.

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
On Thu, Jul 14, 2016 at 9:47 AM, Michael Di Domenico wrote: > Have 1.10.3 unpacked, ran through the configure using the same command > line options as 1.10.2 > > but it fails even earlier in the make process at > > Entering openmpi-1.10.3/opal/asm > CPPAS atomic-asm.lo >

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
cense for the pgCC C++ compiler ? > fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not work > together out of the box, hopefully I will have a fix ready sometimes this > week > > Cheers, > > Gilles > > > On Monday, July 11, 2016, Michael Di Domenico

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-11 Thread Michael Di Domenico
On Mon, Jul 11, 2016 at 9:11 AM, Gilles Gouaillardet wrote: > Can you try the latest 1.10.3 instead ? i can but it'll take a few days to pull the software inside. > btw, do you have a license for the pgCC C++ compiler ? > fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not wor

[OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-11 Thread Michael Di Domenico
I'm trying to get openmpi compiled using the PGI compiler. the configure goes through and the code starts to compile, but then gets hung up with entering: openmpi-1.10.2/opal/mca/common/pmi CC common_pmi.lo CCLD libmca_common_pmi.la pgcc-Error-Unknown switch: - pthread

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like "[nodexyz:17085] selected cm best priority 40" > and "[nodexyz:17099] select: component psm selected" this may have turned up more than i expected. i recompiled openmpi v1.8.4 as a test and reran the test

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres) wrote: > Can you send all the information listed here? > > https://www.open-mpi.org/community/help/ > > (including the full output from the run with the PML/BTL/MTL/etc. verbosity) > > This will allow Matias to look through all the rele

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like "[nodexyz:17085] selected cm best priority 40" > and "[nodexyz:17099] select: component psm selected" i see cm best priority 20, which seems to relate to ob1 being selected. i don't see a mention of psm a
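Scanning a long verbose log for the two kinds of lines Matias mentions can be automated. The sketch below is only an illustration: the regular expressions are derived from the two sample lines quoted in this message, the function name `summarize_selection` is hypothetical, and real output from other Open MPI versions may be worded differently.

```python
import re

# Patterns modelled on the two sample lines quoted in the thread (an assumption,
# not an exhaustive description of Open MPI's verbose output).
SELECTED_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+selected\s+(?P<name>\S+)"
    r"\s+best priority\s+(?P<prio>\d+)"
)
COMPONENT_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+select: component\s+(?P<name>\S+)\s+selected"
)


def summarize_selection(log_text):
    """Return (host, component, priority-or-None) tuples in order of appearance."""
    results = []
    for line in log_text.splitlines():
        match = SELECTED_RE.search(line)
        if match:
            results.append((match["host"], match["name"], int(match["prio"])))
            continue
        match = COMPONENT_RE.search(line)
        if match:
            results.append((match["host"], match["name"], None))
    return results


if __name__ == "__main__":
    sample = (
        "[nodexyz:17085] selected cm best priority 40\n"
        "[nodexyz:17099] select: component psm selected\n"
    )
    for entry in summarize_selection(sample):
        print(entry)
```

Comparing the tuples across nodes shows quickly whether every rank ended up with the same component.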

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A wrote: > I didn't go into the code to see who is actually calling this error message, > but I suspect this may be a generic error for "out of memory" kind of thing > and not specific to the queue pair. To confirm please add -mca > pml_base_verbos

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A wrote: > Hi Michael, > > I may be missing some context, if you are using the qlogic cards you will > always want to use the psm mtl (-mca pml cm -mca mtl psm) and not openib btl. > As Tom suggests, confirm the limits are setup on every node: could

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote: > Hi Mike, > > In this file, > $ cat /etc/security/limits.conf > ... > < do you see at the end ... > > > * hard memlock unlimited > * soft memlock unlimited > # -- All InfiniBand Settings End here -- > ? Yes. I double checked that it's set on a

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico wrote: > when i try to run an openmpi job with >128 ranks (16 ranks per node) > using alltoall or alltoallv, i'm getting an error that the process was > unable to get a queue pair. > > i've checked the max lock

[OMPI users] locked memory and queue pairs

2016-03-10 Thread Michael Di Domenico
when i try to run an openmpi job with >128 ranks (16 ranks per node) using alltoall or alltoallv, i'm getting an error that the process was unable to get a queue pair. i've checked the max locked memory settings across my machines; using ulimit -l in and outside of mpirun and they're all set to u
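The `ulimit -l` check described above can also be made from inside a process, which helps when the limit seen under a launcher differs from the one in an interactive shell. A minimal sketch using Python's standard `resource` module; the function name `memlock_limits` is hypothetical. Note that `getrlimit` reports bytes, whereas `ulimit -l` usually prints KiB.

```python
import resource


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process.

    Each value is either the string "unlimited" or a byte count.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def describe(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return describe(soft), describe(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft={soft} hard={hard}")
```

Running the script once directly and once as the program started by `mpirun` or `srun` would show whether the launcher path inherits the same limit.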

[OMPI users] slurm openmpi 1.8.3 core bindings

2015-01-30 Thread Michael Di Domenico
I'm trying to get slurm and openmpi to cooperate when running multi thread jobs. i'm sure i'm doing something wrong, but i can't figure out what my node configuration is 2 nodes 2 sockets 6 cores per socket i want to run sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2 prog
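The request above fills each node exactly, which is easy to verify with a little arithmetic. The sketch below only checks the numbers from the message (2 nodes, 2 sockets, 6 cores per socket, 4 tasks per node, 3 CPUs per task); the function names are hypothetical and nothing here talks to Slurm.

```python
def cores_needed_per_node(tasks_per_node, cpus_per_task):
    """Cores one node must provide for the requested layout."""
    return tasks_per_node * cpus_per_task


def cores_available_per_node(sockets, cores_per_socket):
    """Physical cores on one node."""
    return sockets * cores_per_socket


if __name__ == "__main__":
    needed = cores_needed_per_node(tasks_per_node=4, cpus_per_task=3)
    available = cores_available_per_node(sockets=2, cores_per_socket=6)
    total_tasks = 2 * 4  # nodes * tasks per node, should match "-n 8"
    print(f"needed={needed} available={available} total_tasks={total_tasks}")
```

Because 4 x 3 equals 2 x 6, every core is spoken for, so any binding that places a task's three cores across a socket boundary would show up as two tasks sharing a socket unevenly.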

Re: [OMPI users] ipath_userinit errors

2014-11-06 Thread Michael Di Domenico
->PSM API versions 11 and 12, so the message is harmless. I > presume you're using the RHEL sourced package for a reason, but using an IFS > release would fix the problem until RHEL 6.7 is ready. > > Andrew > >> -Original Message- >> From: users

[OMPI users] ipath_userinit errors

2014-11-04 Thread Michael Di Domenico
I'm getting the below message on my cluster(s). It seems to only happen when I try to use more than 64 nodes (16-cores each). The clusters are running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM. I'm using the OFED versions included with RHEL for infiniband support. ipath_userinit: Mismatched

Re: [OMPI users] debugs for jobs not starting

2012-10-12 Thread Michael Di Domenico
: /tmp [...above lines only come out once...] On Fri, Oct 12, 2012 at 9:27 AM, Michael Di Domenico wrote: > what isn't working is when i fire off an MPI job with over 800 ranks, > they don't all actually start up a process > > fe, if i do srun -n 1024 --ntasks-per-node 12

Re: [OMPI users] debugs for jobs not starting

2012-10-12 Thread Michael Di Domenico
esting to see whether it's a psm related problem now, i'll check back if i can narrow the scope a little more On Thu, Oct 11, 2012 at 10:21 PM, Ralph Castain wrote: > I'm afraid I'm confused - I don't understand what is and isn't working. What > "next process"

Re: [OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
pl, i do see the orte process, but nothing in the > logs about why it failed to launch xhpl > > > > On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico > wrote: >> I'm trying to diagnose an MPI job (in this case xhpl), that fails to >> start when the rank count ge

Re: [OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
aunch xhpl On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico wrote: > I'm trying to diagnose an MPI job (in this case xhpl), that fails to > start when the rank count gets fairly high into the thousands. > > My symptom is the jobs fires up via slurm, and I can see all the

[OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
I'm trying to diagnose an MPI job (in this case xhpl), that fails to start when the rank count gets fairly high into the thousands. My symptom is the jobs fires up via slurm, and I can see all the xhpl processes on the nodes, but it never kicks over to the next process. My question is, what debug

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Michael Di Domenico
Certainly, i reached out to several contacts I have inside qlogic (i used to work there)... On Fri, Apr 29, 2011 at 10:30 AM, Ralph Castain wrote: > Hi Michael > > I'm told that the Qlogic contacts we used to have are no longer there. Since > you obviously are a customer, can you ping them and a

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Michael Di Domenico
On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico wrote: > On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote: >> Hi Michael >> >> Please see the attached updated patch to try for 1.5.3. I mistakenly free'd >> the envar after adding it to the environ :-/

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Michael Di Domenico
On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote: > Hi Michael > > Please see the attached updated patch to try for 1.5.3. I mistakenly free'd > the envar after adding it to the environ :-/ The patch works great, i can now see the precondition environment variable if i do mpirun -n 2 -host

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Michael Di Domenico
On Thu, Apr 28, 2011 at 9:03 AM, Ralph Castain wrote: > > On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote: >>> >>> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: >>> >>

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote: > > On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote: >>> >>> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: >>> >>

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote: > > On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote: >>> >>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: >>> >

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote: > > On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: > >> Was this ever committed to the OMPI src as something not having to be >> run outside of OpenMPI, but as part of the PSM setup that OpenMPI >> does?

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
ery rank. >> >> You can reuse the value as many times as you like - it doesn't have to be >> unique for each job. There is nothing magic about the value itself. >> >> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote: >> >>> How early does this ne

[OMPI users] Ofed v1.5.3?

2011-04-16 Thread Michael Di Domenico
Does OpenMPI v1.5.3 support Ofed v.1.5.3.1 ?

Re: [OMPI users] alltoall messages > 2^26

2011-04-11 Thread Michael Di Domenico
so even though you're > sending array's over 2^26 in size, it may require more than that for MPI to > actually send it. > > On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico < > mdidomeni...@gmail.com> wrote: > >> Has anyone seen an issue where OpenMPI/Infiniban

Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Michael Di Domenico
2^26 in size, it may require more than that for MPI to > actually send it. > > On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico > wrote: >> >> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending >> messages over 2^26 in size? >> >> For a r

[OMPI users] alltoall messages > 2^26

2011-04-04 Thread Michael Di Domenico
Has anyone seen an issue where OpenMPI/Infiniband hangs when sending messages over 2^26 in size? For a reason i have not determined just yet machines on my cluster (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) are failing to send arrays over 2^26 in size via the AllToAll collective. (user code)
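To put the 2^26 threshold in concrete terms, the sketch below converts an element count into bytes. The message does not say whether "2^26 in size" means bytes or elements, nor which datatype or rank count is involved, so the element sizes and the 16-rank figure used here are assumptions for illustration only; the function names are hypothetical.

```python
def message_bytes(count, element_size):
    """Bytes occupied by `count` elements of `element_size` bytes each."""
    return count * element_size


def alltoall_buffer_bytes(count_per_peer, element_size, ranks):
    """Send-buffer size for an all-to-all: one block of count_per_peer elements per rank."""
    return count_per_peer * element_size * ranks


if __name__ == "__main__":
    count = 2 ** 26
    mib = 2 ** 20
    for size in (1, 4, 8):  # assumed element sizes in bytes
        print(f"{count} x {size} B = {message_bytes(count, size) // mib} MiB")
    # Assumed example: 16 ranks exchanging 2^26 eight-byte elements with each peer
    print(alltoall_buffer_bytes(count, 8, 16) // mib, "MiB per send buffer")
```

The second function illustrates the point made in the replies: the memory MPI has to handle in an all-to-all can be much larger than the per-peer array size.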

Re: [OMPI users] srun and openmpi

2011-01-25 Thread Michael Di Domenico
gh >> knowledge to dive into the code to help fix, but i can certainly help >> test >> >> On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm wrote: >>> >>> I am seeing similar issues on our slurm clusters. We are looking into the >>> issue. >>>

Re: [OMPI users] srun and openmpi

2011-01-25 Thread Michael Di Domenico
On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm wrote: > I am seeing similar issues on our slurm clusters. We are looking into the > issue. > > -Nathan > HPC-3, LANL > > On Tue, 11 Jan 2011, Michael Di Domenico wrote: > >> Any ideas on what might be causing this one?  Or

Re: [OMPI users] srun and openmpi

2011-01-11 Thread Michael Di Domenico
Any ideas on what might be causing this one? Or at least what additional debug information someone might need? On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico wrote: > I'm still testing the slurm integration, which seems to work fine so > far.  However, i just upgraded another

Re: [OMPI users] CQ errors

2011-01-10 Thread Michael Di Domenico
2011/1/10 Peter Kjellström : > On Monday, January 10, 2011 03:06:06 pm Michael Di Domenico wrote: >> I'm not sure if these are being reported from OpenMPI or through >> OpenMPI from OpenFabrics, but i figured this would be a good place to >> start >> >> On

[OMPI users] CQ errors

2011-01-10 Thread Michael Di Domenico
I'm not sure if these are being reported from OpenMPI or through OpenMPI from OpenFabrics, but i figured this would be a good place to start. On one node we received the below errors, i'm not sure i understand the error sequence, hopefully someone can shed some light on what happened. [[5691,1],49][btl

Re: [OMPI users] srun and openmpi

2011-01-07 Thread Michael Di Domenico
ain wrote: > >> Run the program only once - it can be in the prolog of the job if you like. >> The output value needs to be in the env of every rank. >> >> You can reuse the value as many times as you like - it doesn't have to be >> unique for each

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
ironment, you should be okay. Looks like > this: > > $ ./psm_keygen > OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954 > $ > > You compile the program with the usual mpicc. > > Let me know if this solves the problem (or not). > Ralph > > >
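The value printed above has a visible shape: two groups of 16 hexadecimal digits joined by a hyphen. The sketch below produces a string of that shape; it is not the `psm_keygen` program from the thread (whose source is not shown here), and the function name `make_transport_key` is hypothetical. It relies on the statement elsewhere in this thread that there is nothing magic about the value itself.

```python
import re
import secrets

# Shape taken from the sample in the message: 16 hex digits, hyphen, 16 hex digits
KEY_RE = re.compile(r"^[0-9a-f]{16}-[0-9a-f]{16}$")


def make_transport_key():
    """Return a random key shaped like the psm_keygen sample output."""
    return f"{secrets.token_hex(8)}-{secrets.token_hex(8)}"


if __name__ == "__main__":
    key = make_transport_key()
    # The variable name is the one shown in the quoted psm_keygen output
    print(f"OMPI_MCA_orte_precondition_transports={key}")
```

As the thread notes, the same value has to be present in the environment of every rank, so it would be exported once (for example in a job prolog) rather than generated per process.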

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
r the srun direct-launch scenario, > if you want to try it. Would be later today, though. > > > On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote: > >> Well maybe not horray, yet.  I might have jumped the gun a bit, it's >> looking like srun works in general, but per

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
n the environment) PML add procs failed --> Returned "Error" (-1) instead of "Success" (0) Turn off PSM and srun works fine On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain wrote: > Hooray! > > On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote: > >> I think

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
I think i take it all back. I just tried it again and it seems to work now. I'm not sure what I changed (between my first and this msg), but it does appear to work now. On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico wrote: > Yes that's true, error messages help.  I was hop

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
best guess is that the port reservation didn't get passed down to the MPI > procs properly - but that's just a guess. > > > On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote: > >> Can anyone point me towards the most recent documentation for using >> s

[OMPI users] srun and openmpi

2010-12-23 Thread Michael Di Domenico
Can anyone point me towards the most recent documentation for using srun and openmpi? I followed what i found on the web with enabling the MpiPorts config in slurm and using the --resv-ports switch, but I'm getting an error from openmpi during setup. I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM

[OMPI users] openmpi v1.5?

2010-07-19 Thread Michael Di Domenico
Since I am a SVN neophyte can anyone tell me when openmpi 1.5 is scheduled for release? And whether the Slurm srun changes are going to make in? thanks

[OMPI users] flex.exe

2010-01-21 Thread Michael Di Domenico
openmpi-1.4.1/contrib/platform/win32/bin/flex.exe I understand this file might be required for building on windows, since I'm not I can just delete the file without issue. However, for those of us under import restrictions, where binaries are not allowed in, this file causes me to open the tarbal

Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
Hmm, i don't recall seeing that... On Thu, Oct 1, 2009 at 1:51 PM, Jeff Squyres wrote: > FWIW, I saw this bug to have race-condition-like behavior. I could run a > few times and then it would work. > > On Oct 1, 2009, at 1:42 PM, Michael Di Domenico wrote: > >> On T

Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres wrote: > On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote: > >> I just upgraded to the devel snapshot of 1.4a1r22031 >> >> when i run a simple hello world with a barrier i get >> >> btl_tcp_endpoint.c:484:m

[OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
I just upgraded to the devel snapshot of 1.4a1r22031 when i run a simple hello world with a barrier i get btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier if i pull the barrier out the hello world runs fine interestingly enough, i can run IMB

Re: [OMPI users] strange IMB runs

2009-08-14 Thread Michael Di Domenico
> One of the differences among MPI implementations is the default placement of > processes within the node. E.g., should processes by default be collocated > on cores of the same socket or on cores of different sockets? I don't know > if that issue is applicable here (that is, HP MPI vs Open MPI

Re: [OMPI users] strange IMB runs

2009-08-14 Thread Michael Di Domenico
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote: > Also, I'm puzzled why you should see better results by changing > btl_sm_eager_limit. That shouldn't change long-message bandwidth, but only > the message size at which one transitions from short to long messages. If > anything, tweaking btl_sm

Re: [OMPI users] strange IMB runs

2009-08-13 Thread Michael Di Domenico
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote: >>Is this behavior expected? Are there any tunables to get the OpenMPI >>sockets up near HP-MPI? > > First, I want to understand the configuration. It's just a single node. No > interconnect (InfiniBand or Ethernet or anything). Right? Yes, th

Re: [OMPI users] strange IMB runs

2009-08-12 Thread Michael Di Domenico
On Thu, Aug 6, 2009 at 9:30 AM, Michael Di Domenico wrote: > Here's an interesting data point. I installed the RHEL rpm version of > OpenMPI 1.2.7-6 for ia64 > > mpirun -np 2 -mca btl self,sm -mca mpi_paffinity_alone 1 -mca > mpi_leave_pinned 1 $PWD/IMB-MPI1 pingpong >

[OMPI users] x4100 with IB

2009-08-07 Thread Michael Di Domenico
I have several Sun x4100 with Infiniband which appear to be running at 400MB/sec instead of 800MB/sec. It's a freshly reformatted cluster converting from solaris to linux. We also reset the bios settings with "load optimal defaults". Does anyone know which bios setting i changed to dump the BW? x4

Re: [OMPI users] strange IMB runs

2009-08-06 Thread Michael Di Domenico
B/sec With v1.2.7-6 and -mca btl self,sm i get ~225MB/sec With v1.2.7-6 and -mca btl self,tcp i get ~650MB/sec On Fri, Jul 31, 2009 at 10:42 AM, Edgar Gabriel wrote: > Michael Di Domenico wrote: >> >> mpi_leave_pinned didn't help still at ~145MB/sec >> btl_sm_eager_li

Re: [OMPI users] strange IMB runs

2009-07-31 Thread Michael Di Domenico
Outside of me just writing an ugly looping script... On Wed, Jul 29, 2009 at 1:55 PM, Dorian Krause wrote: > Hi, > > --mca mpi_leave_pinned 1 > > might help. Take a look at the FAQ for various tuning parameters. > > > Michael Di Domenico wrote: >> >> I'm

Re: [OMPI users] strange IMB runs

2009-07-30 Thread Michael Di Domenico
On Thu, Jul 30, 2009 at 10:08 AM, George Bosilca wrote: > The leave pinned will not help in this context. It can only help for devices > capable of real RMA operations and that require pinned memory, which > unfortunately is not the case for TCP. What is [really] strange about your > results is tha

[OMPI users] strange IMB runs

2009-07-29 Thread Michael Di Domenico
I'm not sure I understand what's actually happened here. I'm running IMB on an HP superdome, just comparing the PingPong benchmark HP-MPI v2.3 Max ~ 700-800MB/sec OpenMPI v1.3 -mca btl self,sm - Max ~ 125-150MB/sec -mca btl self,tcp - Max ~ 500-550MB/sec Is this behavior expected? Are there an

Re: [OMPI users] quadrics support?

2009-07-08 Thread Michael Di Domenico
On Wed, Jul 8, 2009 at 3:33 PM, Ashley Pittman wrote: >> When i run tping i get: >> ELAN_EXCEOPTIOn @ --: 6 (Initialization error) >> elan_init: Can't get capability from environment >> >> I am not using slurm or RMS at all, just trying to get openmpi to run >> between two nodes. > > To attach to t

Re: [OMPI users] quadrics support?

2009-07-08 Thread Michael Di Domenico
On Wed, Jul 8, 2009 at 12:33 PM, Ashley Pittman wrote: > Is the machine configured correctly to allow non OpenMPI QsNet programs > to run, for example tping? > > Which resource manager are you running, I think slurm compiled for RMS > is essential. I can ping via TCP/IP using the eip0 ports. When

Re: [OMPI users] quadrics support?

2009-07-07 Thread Michael Di Domenico
t the processes to go away I'm not sure if this is a quadrics or openmpi issue at this point, but i figured since there are quadrics people on the list its a good place to start On Tue, Jul 7, 2009 at 3:30 PM, Michael Di Domenico wrote: > Does OpenMPI/Quadrics require the Quadrics Kernel patche

Re: [OMPI users] quadrics support?

2009-07-07 Thread Michael Di Domenico
Does OpenMPI/Quadrics require the Quadrics Kernel patches in order to operate? Or operate at full speed or are the Quadrics modules sufficient? On Thu, Jul 2, 2009 at 1:52 PM, Ashley Pittman wrote: > On Thu, 2009-07-02 at 09:34 -0400, Michael Di Domenico wrote: >> Jeff, >> >>

Re: [OMPI users] quadrics support?

2009-07-02 Thread Michael Di Domenico
are not likely to bring it internally. I was hoping that quadrics >> support was mainline, but the documentation was out of date. >> >> On Thu, Jul 2, 2009 at 8:08 AM, Jeff Squyres wrote: >> > George -- >> > >> > I know that U. Tennessee did some work in th

Re: [OMPI users] quadrics support?

2009-07-02 Thread Michael Di Domenico
I know that U. Tennessee did some work in this area; did it ever > materialize? > > > On Jul 1, 2009, at 4:49 PM, Michael Di Domenico wrote: > >> Did the quadrics support for OpenMPI ever materialize? I can't find >> any documentation on the web about it and the few mail

[OMPI users] quadrics support?

2009-07-01 Thread Michael Di Domenico
Did the quadrics support for OpenMPI ever materialize? I can't find any documentation on the web about it and the few mailing list messages I came across showed some hints that it might be in progress but that was way back in 2007 Thanks