perhaps there is
> different initialization that happens such that the offending device search
> problem doesn't occur?
>
>
> Thanks,
>
> David
>
>
>
>
> From: Shrader, David Lee
> Sent: Tuesday, November 2, 2021 2:09 P
fairly frequently, but not every time, when trying to run xhpl on a new
machine i'm bumping into this. it happens with a single node or
multiple nodes
node1 selected pml ob1, but peer on node1 selected pml ucx
if i rerun the exact same command a few minutes later, it works fine.
the machine is new
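a workaround sketch, assuming one of the two pmls is actually the one wanted on this fabric (the component choice and rank count below are just examples): pin the pml explicitly so every rank makes the same selection instead of racing device discovery:

```shell
# force the same pml on every rank; pick ob1 or ucx, but pick it everywhere
mpirun --mca pml ucx -n 16 ./xhpl
# or rule ucx out entirely:
mpirun --mca pml ob1 -n 16 ./xhpl
```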
On Mon, Mar 22, 2021 at 11:13 AM Pritchard Jr., Howard wrote:
> https://github.com/Sandia-OpenSHMEM/SOS
> if you want to use OpenSHMEM over OPA.
> If you have lots of cycles for development work, you could write an OFI SPML
> for the OSHMEM component of Open MPI.
thanks, i am aware of the sandi
i can build and run openmpi on an opa network just fine, but it turns
out building openshmem fails. the message is (no spml) found
looking at the config log it looks like it tries to build spml ikrit
and ucx which fail. i turn ucx off because it doesn't support opa and
isn't needed.
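a possible workaround sketch, assuming OpenSHMEM isn't actually needed on this cluster (the prefix is a placeholder, and --disable-oshmem is assumed to be available in this Open MPI release): skip building the OSHMEM layer entirely so the missing spml never matters:

```shell
# build Open MPI for OPA without ucx and without the OpenSHMEM layer
./configure --prefix=/opt/openmpi --without-ucx --disable-oshmem
make -j && make install
```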
so this mes
sm_lid: 1
> > port_lid: 99
> > port_lmc: 0x00
> > link_layer: InfiniBand
> >
> > using gcc/gfortran 9.3.0
> >
> > Built Open MPI 4.0.
for whatever it's worth running the test program on my OPA cluster
seems to work. well, it keeps spitting out [INFO MEMORY] lines; not
sure if it's supposed to stop at some point
i'm running rhel7, gcc 10.1, openmpi 4.0.5rc2, with-ofi, without-{psm,ucx,verbs}
On Tue, Jan 26, 2021 at 3:44 PM Patri
d to be what Mellanox used to configure OpenMPI in HPC-X
> 2.5.
>
> I have users using GCC, PGI, Intel and AOCC compilers with this config. PGI
> was the only one that
> was a challenge to build due to conflicts with HCOLL.
>
> -Ray Muno
>
> On 2/7/20 10:04 AM, Michael Di
i haven't compiled openmpi in a while, but i'm in the process of
upgrading our cluster.
the last time i did this there were specific versions of mpi/pmix/ucx
that were all tested and supposed to work together. my understanding
of this was because pmix/ucx was under rapid development and the APIs
h ? could
> btl/ofi also be used for intra node communications ?)
>
>
> mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca
> mtl_base_verbose 10 ...
>
> should tell you what is used (feel free to compress and post the full
> output if you have some hard time unders
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet
wrote:
> You can force
> mpirun --mca pml ob1 ...
> And btl/vader (shared memory) will be used for intra node communications ...
> unless MPI tasks are from different jobs (read MPI_Comm_spawn())
if i run
mpirun -n 16 IMB-MPI1 alltoallv
thing
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote:
> OFI uses libpsm2 underneath it when omnipath detected
>
> > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet
> > wrote:
> > It might show that pml/cm and mtl/psm2 are used. In that case, then yes,
> > the OmniPath library is used even fo
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote:
> You are probably using the ofi mtl - could be psm2 uses loopback method?
according to ompi_info i do in fact have mtls ofi, psm, and psm2. i
haven't changed any of the defaults, so are you saying in order to change
the behaviour i have to run mpi
i have a user that's claiming when two ranks on the same node want to
talk with each other, they're using the NIC to talk rather than just
talking directly.
i've never had to test such a scenario. is there a way for me to
prove one way or another whether two ranks are talking through say the
kern
s a typo in the v2.2.1 release. Sadly, our Slurm
> > plugin folks seem to be off somewhere for awhile and haven’t been testing
> > it. Sigh.
> >
> > I’ll patch the branch and let you know - we’d appreciate the feedback.
> > Ralph
> >
> >
> >> On
adding
>
> PMIX_MCA_pmix_client_event_verbose=5
> PMIX_MCA_pmix_server_event_verbose=5
> OMPI_MCA_pmix_base_verbose=10
>
> to your environment and see if that provides anything useful.
>
> > On Jan 18, 2019, at 12:09 PM, Michael Di Domenico
> > wrote:
> >
> > i compilie
i compiled pmix, slurm, and openmpi
---pmix
./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13
--disable-debug
---slurm
./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13
--with-pmix=/hpc/pmix/2.2
---openmpi
./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external
--wit
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu
wrote:
>
> Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following
> warnings:
>
> --
> WARNING: There is at least non-excluded one OpenFabrics device foun
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote:
>
> You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a
> switch), and install that
> on your system, or else install xpmem (https://github.com/hjelmn/xpmem).
> Note there is a bug right now
> in UCX that you may hit if
before i debug ucx further (cause it's totally not working for me), i
figured i'd check to see if it's *really* required to use shmem inside
of openmpi. i'm pretty sure the answer is yes, but i wanted to double
check.
On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote:
> Looks like the problem is that you didn’t wind up with the external PMIx. The
> component listed in your error is the internal PMIx one which shouldn’t have
> built given that configure line.
>
> Check your config.out and see what happe
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix.
everything compiled, but when i run something i get:
symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol:
opal_libevent2022_evthread_use_pthreads
i'm more than sure i did something wrong, but i'm not sure what, h
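a quick diagnostic sketch for this class of failure (the library paths below are placeholders based on the truncated ones above): check which libevent the failing pmix component resolves at runtime, and whether the symbol the loader reports missing is exported by the opal library:

```shell
# which libevent does the failing component pull in at runtime?
ldd /openmpi/mca_pmix_pmix2x.so | grep -i event
# is the embedded-libevent symbol exported anywhere? (path assumed)
nm -D /openmpi/lib/libopen-pal.so | grep evthread_use_pthreads
```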
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres)
wrote:
> On Apr 6, 2018, at 8:12 AM, Michael Di Domenico
> wrote:
>> it would be nice if openmpi had (or may already have) a simple switch
>> that lets me disable entire portions of the library chain, ie this
>
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet
wrote:
> That being said, the error suggest mca_oob_ud.so is a module from a
> previous install,
> Open MPI was not built on the system it is running, or libibverbs.so.1
> has been removed after
> Open MPI was built.
yes, understood, i compiled
i'm trying to compile openmpi to support all of our interconnects,
psm/openib/mxm/etc
this works fine, openmpi finds all the libs, compiles and runs on each
of the respective machines
however, we don't install the libraries for everything everywhere
so when i run things like ompi_info and mpirun
openmpi-2.0.2 running on rhel 7.4 with qlogic QDR infiniband
switches/adapters, also using slurm
i have a user that's running a job over multiple days. unfortunately
after a few days at random the job will seemingly hang. the latest
instance was caused by an infiniband adapter that went offline
my cluster nodes are connected on 1g ethernet eth0/eth1 and via
infiniband rdma and ib0
my understanding is that openmpi will detect all these interfaces.
using eth0/eth1 for connection setup and rdma for msg passing
what would be the appropriate command line parameters to tell
openmpi to i
i'm getting stuck trying to run some fairly large IMB-MPI alltoall
tests under openmpi 2.0.2 on rhel 7.4
i have two different clusters, one running mellanox fdr10 and one
running qlogic qdr
if i issue
mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv
the job just stalls after t
On Thu, Jun 22, 2017 at 12:41 PM, r...@open-mpi.org wrote:
> I gather you are using OMPI 2.x, yes? And you configured it
> --with-pmi=, then moved the executables/libs to your
> workstation?
correct
> I suppose I could state the obvious and say “don’t do that - just rebuild it”
correct... bu
On Thu, Jun 22, 2017 at 10:43 AM, John Hearns via users
wrote:
> Having had some problems with ssh launching (a few minutes ago) I can
> confirm that this works:
>
> --mca plm_rsh_agent "ssh -v"
this doesn't do anything for me
if i set OMPI_MCA_sec=^munge
i can clear the mca_sec_munge error
bu
ently, in the context of a PBS cluster
>
> On 22 June 2017 at 16:16, Michael Di Domenico
> wrote:
>>
>> is it possible to disable slurm/munge/psm/pmi(x) from the mpirun
>> command line or (better) using environment variables?
>>
>> i'd like to use the ins
is it possible to disable slurm/munge/psm/pmi(x) from the mpirun
command line or (better) using environment variables?
i'd like to use the installed version of openmpi i have on a
workstation, but it's linked with slurm from one of my clusters.
mpi/slurm work just fine on the cluster, but when i
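for the environment-variable form, a sketch (any --mca key can be spelled as an OMPI_MCA_<key> variable, and "^name" excludes a component by name; the specific component names below are illustrative guesses, not verified against this build):

```shell
# exclude the slurm launcher and the munge security component by name;
# any mpirun started from this shell inherits these settings
export OMPI_MCA_plm=^slurm
export OMPI_MCA_sec=^munge
env | grep '^OMPI_MCA_'
```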
On Mon, Jul 25, 2016 at 4:53 AM, Gilles Gouaillardet wrote:
>
> as a workaround, you can configure without -noswitcherror.
>
> after you ran configure, you have to manually patch the generated 'libtool'
> file and add the line with pgcc*) and the next line like this :
>
> /* if pgcc is used, libto
pthread" from libslurm.la and libpmi.la
>>
>> On 07/11/2016 02:54 PM, Michael Di Domenico wrote:
>>>
>>> I'm trying to get openmpi compiled using the PGI compiler.
>>>
>>> the configure goes through and the code starts to compile, but the
On Mon, Jul 11, 2016 at 9:52 AM, Åke Sandgren wrote:
> Looks like you are compiling with slurm support.
>
> If so, you need to remove the "-pthread" from libslurm.la and libpmi.la
i don't see a configure option in slurm to disable pthreads, so i'm
not sure this is possible.
On Thu, Jul 14, 2016 at 9:47 AM, Michael Di Domenico
wrote:
> Have 1.10.3 unpacked, ran through the configure using the same command
> line options as 1.10.2
>
> but it fails even earlier in the make process at
>
> Entering openmpi-1.10.3/opal/asm
> CPPAS atomic-asm.lo
>
cense for the pgCC C++ compiler ?
> fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not work
> together out of the box, hopefully I will have a fix ready sometimes this
> week
>
> Cheers,
>
> Gilles
>
>
> On Monday, July 11, 2016, Michael Di Domenico
&
On Mon, Jul 11, 2016 at 9:11 AM, Gilles Gouaillardet
wrote:
> Can you try the latest 1.10.3 instead ?
i can but it'll take a few days to pull the software inside.
> btw, do you have a license for the pgCC C++ compiler ?
> fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not wor
I'm trying to get openmpi compiled using the PGI compiler.
the configure goes through and the code starts to compile, but then
gets hung up with
entering: openmpi-1.10.2/opal/mca/common/pmi
CC common_pmi.lo
CCLD libmca_common_pmi.la
pgcc-Error-Unknown switch: - pthread
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A
wrote:
> I was looking for lines like" [nodexyz:17085] selected cm best priority 40"
> and " [nodexyz:17099] select: component psm selected"
this may have turned up more than i expected. i recompiled openmpi
v1.8.4 as a test and reran the test
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres)
wrote:
> Can you send all the information listed here?
>
> https://www.open-mpi.org/community/help/
>
> (including the full output from the run with the PML/BTL/MTL/etc. verbosity)
>
> This will allow Matias to look through all the rele
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A
wrote:
> I was looking for lines like" [nodexyz:17085] selected cm best priority 40"
> and " [nodexyz:17099] select: component psm selected"
i see cm best priority 20, which seems to relate to ob1 being
selected. i don't see a mention of psm a
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A
wrote:
> I didn't go into the code to see who is actually calling this error message,
> but I suspect this may be a generic error for "out of memory" kind of thing
> and not specific to the que pair. To confirm please add -mca
> pml_base_verbos
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A
wrote:
> Hi Michael,
>
> I may be missing some context, if you are using the qlogic cards you will
> always want to use the psm mtl (-mca pml cm -mca mtl psm) and not openib btl.
> As Tom suggest, confirm the limits are setup on every node: could
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote:
> Hi Mike,
>
> In this file,
> $ cat /etc/security/limits.conf
> ...
> < do you see at the end ... >
>
> * hard memlock unlimited
> * soft memlock unlimited
> # -- All InfiniBand Settings End here --
> ?
Yes. I double checked that it's set on a
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico
wrote:
> when i try to run an openmpi job with >128 ranks (16 ranks per node)
> using alltoall or alltoallv, i'm getting an error that the process was
> unable to get a queue pair.
>
> i've checked the max lock
when i try to run an openmpi job with >128 ranks (16 ranks per node)
using alltoall or alltoallv, i'm getting an error that the process was
unable to get a queue pair.
i've checked the max locked memory settings across my machines;
using ulimit -l in and outside of mpirun and they're all set to u
I'm trying to get slurm and openmpi to cooperate when running multi
thread jobs. i'm sure i'm doing something wrong, but i can't figure
out what
my node configuration is
2 nodes
2 sockets
6 cores per socket
i want to run
sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2
prog
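for what it's worth, the arithmetic on that request fills both nodes exactly, which suggests the failure is a slurm/openmpi binding disagreement rather than oversubscription:

```shell
# per-node demand: 4 tasks per node * 3 cpus per task
echo $((4 * 3))   # cores requested per node
# per-node supply: 2 sockets * 6 cores per socket
echo $((2 * 6))   # cores available per node
```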
->PSM API versions 11 and 12, so the message is harmless. I
> presume you're using the RHEL sourced package for a reason, but using an IFS
> release would fix the problem until RHEL 6.7 is ready.
>
> Andrew
>
>> -Original Message-
>> From: users
I'm getting the below message on my cluster(s). It seems to only
happen when I try to use more than 64 nodes (16-cores each). The
clusters are running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM.
I'm using the OFED versions included with RHEL for infiniband support.
ipath_userinit: Mismatched
: /tmp
[...above lines only come out once...]
On Fri, Oct 12, 2012 at 9:27 AM, Michael Di Domenico
wrote:
> what isn't working is when i fire off an MPI job with over 800 ranks,
> they don't all actually start up a process
>
> fe, if i do srun -n 1024 --ntasks-per-node 12
esting to see whether it's a psm related problem now, i'll check
back if i can narrow the scope a little more
On Thu, Oct 11, 2012 at 10:21 PM, Ralph Castain wrote:
> I'm afraid I'm confused - I don't understand what is and isn't working. What
> "next process&quo
pl, i do see the orte process, but nothing in the
> logs about why it failed to launch xhpl
>
>
>
> On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico
> wrote:
>> I'm trying to diagnose an MPI job (in this case xhpl), that fails to
>> start when the rank count ge
aunch xhpl
On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico
wrote:
> I'm trying to diagnose an MPI job (in this case xhpl), that fails to
> start when the rank count gets fairly high into the thousands.
>
> My symptom is the jobs fires up via slurm, and I can see all the
I'm trying to diagnose an MPI job (in this case xhpl) that fails to
start when the rank count gets fairly high into the thousands.
My symptom is the job fires up via slurm, and I can see all the xhpl
processes on the nodes, but it never kicks over to the next process.
My question is, what debug
Certainly, i reached out to several contacts I have inside qlogic (i
used to work there)...
On Fri, Apr 29, 2011 at 10:30 AM, Ralph Castain wrote:
> Hi Michael
>
> I'm told that the Qlogic contacts we used to have are no longer there. Since
> you obviously are a customer, can you ping them and a
On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico
wrote:
> On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote:
>> Hi Michael
>>
>> Please see the attached updated patch to try for 1.5.3. I mistakenly free'd
>> the envar after adding it to the environ :-/
On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote:
> Hi Michael
>
> Please see the attached updated patch to try for 1.5.3. I mistakenly free'd
> the envar after adding it to the environ :-/
The patch works great, i can now see the precondition environment
variable if i do
mpirun -n 2 -host
On Thu, Apr 28, 2011 at 9:03 AM, Ralph Castain wrote:
>
> On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote:
>
>> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote:
>>>
>>> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote:
>>>
>>&
On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote:
>
> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote:
>
>> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote:
>>>
>>> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
>>>
>>&
On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote:
>
> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
>
>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>>
>>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>>>
>&
On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>
> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>
>> Was this ever committed to the OMPI src as something not having to be
>> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
>> does?
ery rank.
>>
>> You can reuse the value as many times as you like - it doesn't have to be
>> unique for each job. There is nothing magic about the value itself.
>>
>> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
>>
>>> How early does this ne
Does OpenMPI v1.5.3 support Ofed v.1.5.3.1 ?
so even though you're
> sending array's over 2^26 in size, it may require more than that for MPI to
> actually send it.
>
> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico <
> mdidomeni...@gmail.com> wrote:
>
>> Has anyone seen an issue where OpenMPI/Infiniban
2^26 in size, it may require more than that for MPI to
> actually send it.
>
> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico
> wrote:
>>
>> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
>> messages over 2^26 in size?
>>
>> For a r
Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
messages over 2^26 in size?
For a reason i have not determined just yet, machines on my cluster
(OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) are failing to send
arrays over 2^26 in size via the AllToAll collective. (user code)
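for scale (assuming the 2^26 figure counts bytes), the threshold works out to 64 MiB:

```shell
echo $((1 << 26))                  # bytes at the 2^26 threshold
echo $(( (1 << 26) / (1 << 20) ))  # the same figure in MiB
```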
gh
>> knowledge to dive into the code to help fix, but i can certainly help
>> test
>>
>> On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm wrote:
>>>
>>> I am seeing similar issues on our slurm clusters. We are looking into the
>>> issue.
>>>
&
On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm wrote:
> I am seeing similar issues on our slurm clusters. We are looking into the
> issue.
>
> -Nathan
> HPC-3, LANL
>
> On Tue, 11 Jan 2011, Michael Di Domenico wrote:
>
>> Any ideas on what might be causing this one? Or
Any ideas on what might be causing this one? Or at least what
additional debug information someone might need?
On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico
wrote:
> I'm still testing the slurm integration, which seems to work fine so
> far. However, i just upgraded another
2011/1/10 Peter Kjellström :
> On Monday, January 10, 2011 03:06:06 pm Michael Di Domenico wrote:
>> I'm not sure if these are being reported from OpenMPI or through
>> OpenMPI from OpenFabrics, but i figured this would be a good place to
>> start
>>
>> On
I'm not sure if these are being reported from OpenMPI or through
OpenMPI from OpenFabrics, but i figured this would be a good place to
start
On one node we received the below errors, i'm not sure i understand the
error sequence, hopefully someone can shed some light on what
happened.
[[5691,1],49][btl
ain wrote:
>
>> Run the program only once - it can be in the prolog of the job if you like.
>> The output value needs to be in the env of every rank.
>>
>> You can reuse the value as many times as you like - it doesn't have to be
>> unique for each
ironment, you should be okay. Looks like
> this:
>
> $ ./psm_keygen
> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
> $
>
> You compile the program with the usual mpicc.
>
> Let me know if this solves the problem (or not).
> Ralph
>
>
&g
r the srun direct-launch scenario,
> if you want to try it. Would be later today, though.
>
>
> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote:
>
>> Well maybe not horray, yet. I might have jumped the gun a bit, it's
>> looking like srun works in general, but per
n the environment)
PML add procs failed
--> Returned "Error" (-1) instead of "Success" (0)
Turn off PSM and srun works fine
On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain wrote:
> Hooray!
>
> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:
>
>> I think
I think i take it all back. I just tried it again and it seems to
work now. I'm not sure what I changed (between my first and this
msg), but it does appear to work now.
On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico
wrote:
> Yes that's true, error messages help. I was hop
best guess is that the port reservation didn't get passed down to the MPI
> procs properly - but that's just a guess.
>
>
> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>
>> Can anyone point me towards the most recent documentation for using
>> s
Can anyone point me towards the most recent documentation for using
srun and openmpi?
I followed what i found on the web with enabling the MpiPorts config
in slurm and using the --resv-ports switch, but I'm getting an error
from openmpi during setup.
I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM
Since I am an SVN neophyte, can anyone tell me when openmpi 1.5 is
scheduled for release? And whether the Slurm srun changes are going
to make it in?
thanks
openmpi-1.4.1/contrib/platform/win32/bin/flex.exe
I understand this file might be required for building on Windows;
since I'm not, I can just delete the file without issue.
However, for those of us under import restrictions, where binaries are
not allowed in, this file causes me to open the tarbal
Hmm, i don't recall seeing that...
On Thu, Oct 1, 2009 at 1:51 PM, Jeff Squyres wrote:
> FWIW, I saw this bug to have race-condition-like behavior. I could run a
> few times and then it would work.
>
> On Oct 1, 2009, at 1:42 PM, Michael Di Domenico wrote:
>
>> On T
On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres wrote:
> On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote:
>
>> I just upgraded to the devel snapshot of 1.4a1r22031
>>
>> when i run a simple hello world with a barrier i get
>>
>> btl_tcp_endpoint.c:484:m
I just upgraded to the devel snapshot of 1.4a1r22031
when i run a simple hello world with a barrier i get
btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received
unexpected process identifier
if i pull the barrier out the hello world runs fine
interestingly enough, i can run IMB
> One of the differences among MPI implementations is the default placement of
> processes within the node. E.g., should processes by default be collocated
> on cores of the same socket or on cores of different sockets? I don't know
> if that issue is applicable here (that is, HP MPI vs Open MPI
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote:
> Also, I'm puzzled why you should see better results by changing
> btl_sm_eager_limit. That shouldn't change long-message bandwidth, but only
> the message size at which one transitions from short to long messages. If
> anything, tweaking btl_sm
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote:
>>Is this behavior expected? Are there any tunables to get the OpenMPI
>>sockets up near HP-MPI?
>
> First, I want to understand the configuration. It's just a single node. No
> interconnect (InfiniBand or Ethernet or anything). Right?
Yes, th
On Thu, Aug 6, 2009 at 9:30 AM, Michael Di
Domenico wrote:
> Here's an interesting data point. I installed the RHEL rpm version of
> OpenMPI 1.2.7-6 for ia64
>
> mpirun -np 2 -mca btl self,sm -mca mpi_paffinity_alone 1 -mca
> mpi_leave_pinned 1 $PWD/IMB-MPI1 pingpong
>
&
I have several Sun x4100 with Infiniband which appear to be running at
400MB/sec instead of 800MB/sec. It's a freshly reformatted cluster
converting from solaris to linux. We also reset the bios settings
with "load optimal defaults". Does anyone know which bios setting i
changed to dump the BW?
x4
B/sec
With v1.2.7-6 and -mca btl self,sm i get ~225MB/sec
With v1.2.7-6 and -mca btl self,tcp i get ~650MB/sec
On Fri, Jul 31, 2009 at 10:42 AM, Edgar Gabriel wrote:
> Michael Di Domenico wrote:
>>
>> mpi_leave_pinned didn't help still at ~145MB/sec
>> btl_sm_eager_li
Outside of me
just writing an ugly looping script...
On Wed, Jul 29, 2009 at 1:55 PM, Dorian Krause wrote:
> Hi,
>
> --mca mpi_leave_pinned 1
>
> might help. Take a look at the FAQ for various tuning parameters.
>
>
> Michael Di Domenico wrote:
>>
>> I'm
On Thu, Jul 30, 2009 at 10:08 AM, George Bosilca wrote:
> The leave pinned will not help in this context. It can only help for devices
> capable of real RMA operations and that require pinned memory, which
> unfortunately is not the case for TCP. What is [really] strange about your
> results is tha
I'm not sure I understand what's actually happened here. I'm running
IMB on an HP superdome, just comparing the PingPong benchmark
HP-MPI v2.3
Max ~ 700-800MB/sec
OpenMPI v1.3
-mca btl self,sm - Max ~ 125-150MB/sec
-mca btl self,tcp - Max ~ 500-550MB/sec
Is this behavior expected? Are there an
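one tunable that comes up elsewhere in this thread is btl_sm_eager_limit; a sketch of setting it (the 8192-byte value is an arbitrary example, and the later discussion notes it should only move the short/long message crossover, not peak bandwidth):

```shell
# rerun the shared-memory pingpong with a different eager limit (example value)
mpirun -np 2 --mca btl self,sm --mca btl_sm_eager_limit 8192 ./IMB-MPI1 pingpong
```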
On Wed, Jul 8, 2009 at 3:33 PM, Ashley Pittman wrote:
>> When i run tping i get:
>> ELAN_EXCEOPTIOn @ --: 6 (Initialization error)
>> elan_init: Can't get capability from environment
>>
>> I am not using slurm or RMS at all, just trying to get openmpi to run
>> between two nodes.
>
> To attach to t
On Wed, Jul 8, 2009 at 12:33 PM, Ashley Pittman wrote:
> Is the machine configured correctly to allow non OpenMPI QsNet programs
> to run, for example tping?
>
> Which resource manager are you running, I think slurm compiled for RMS
> is essential.
I can ping via TCP/IP using the eip0 ports.
When
t the processes to go away
I'm not sure if this is a quadrics or openmpi issue at this point, but
i figured since there are quadrics people on the list it's a good place
to start
On Tue, Jul 7, 2009 at 3:30 PM, Michael Di
Domenico wrote:
> Does OpenMPI/Quadrics require the Quadrics Kernel patche
Does OpenMPI/Quadrics require the Quadrics Kernel patches in order to
operate? Or to operate at full speed? Or are the Quadrics modules
sufficient?
On Thu, Jul 2, 2009 at 1:52 PM, Ashley Pittman wrote:
> On Thu, 2009-07-02 at 09:34 -0400, Michael Di Domenico wrote:
>> Jeff,
>>
>>
are not likely to bring it internally. I was hoping that quadrics
>> support was mainline, but the documentation was out of date.
>>
>> On Thu, Jul 2, 2009 at 8:08 AM, Jeff Squyres wrote:
>> > George --
>> >
>> > I know that U. Tennessee did some work in th
I know that U. Tennessee did some work in this area; did it ever
> materialize?
>
>
> On Jul 1, 2009, at 4:49 PM, Michael Di Domenico wrote:
>
>> Did the quadrics support for OpenMPI ever materialize? I can't find
>> any documentation on the web about it and the few mail
Did the quadrics support for OpenMPI ever materialize? I can't find
any documentation on the web about it and the few mailing list
messages I came across showed some hints that it might be in progress
but that was way back in 2007
Thanks