anges to the MPSS stack, and this Phi stuff is very much in its infancy at the
moment, so there is minimal (decent) documentation. Does anyone know which
current package provides PMI for the Xeon Phi?
Thanks!
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On May 23, 2014, at 9:19 AM, Albert Solernou
wrote:
> Hi,
> after compiling and installing OpenMPI 1.8.1, I find that OpenMPI is pinning
> processes onto cores. Although th
mpiifort and mpiicc are Intel MPI library commands; in OpenMPI and others the
analogous wrappers are mpifort and mpicc.
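For example (a quick sketch; the "hello" source names are placeholders):
  # Intel MPI wrappers:
  mpiicc hello.c -o hello
  mpiifort hello.f90 -o hello
  # OpenMPI equivalents:
  mpicc hello.c -o hello
  mpifort hello.f90 -o hello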
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On May 27, 2014, at 2:11 PM, Lorenzo Donà wrote:
> Dear all
OK, I have dug into this more. Is this PMI the Slurm process manager?
To use OpenMPI on a Phi, do I just build OpenMPI for it? Does that mean I need
to add -mmic to CFLAGS and FCFLAGS?
How does one go about running multi-Phi MPI code?
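For reference, a hedged configure sketch for a native MIC build (assuming the
Intel compilers and the MPSS cross environment are installed; the k1om host
triplet and install prefix here are illustrative, not verified):
  ./configure CC=icc CXX=icpc FC=ifort \
      CFLAGS=-mmic CXXFLAGS=-mmic FCFLAGS=-mmic \
      --host=x86_64-k1om-linux \
      --prefix=/opt/openmpi-mic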
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus
0x0f00
8,9,10,11
Which is exactly what I would expect.
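(hwloc can do the mask-to-core translation as a sanity check; a sketch,
assuming hwloc is installed and logical PU numbering matches the physical
cores here:)
  hwloc-calc --intersect PU 0x0f00
  # expected: 8,9,10,11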
So, umm, I'm lost as to why this might happen. What else should I check? Like I
said, not all jobs show this behavior.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
te on nodes.
That is good to know. I think we would want to change our default to 'bind to
core', except for our few users who run hybrid mode.
Our cpuset tells you which cores the job is assigned. So in the problem case
provided, the cpuset/cgroup shows that only cores 8-11 are available.
In this case they are on a single socket, but as you can see, they could be
either/or depending on the job.
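(For reference, how we read that off a node; the cpuset mount point and Torque
path below are site-specific assumptions:)
  cat /dev/cpuset/torque/$PBS_JOBID/cpus
  # -> 8-11 in the problem case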
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Jun 19, 2014, at 2:44 PM, Ralph Castain wrote:
> Sorry, I should have b
Got it,
I have the input from the user and am testing it out.
It probably has less to do with Torque and more with cpusets.
I'm working on reproducing it myself as well.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Jun 20
:68072] MCW rank 28 is not bound (or bound to all
available processors)
[nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all
available processors)
[nyx5552.engin.umich.edu:30482] MCW rank 13 is not bound (or bound to all
available processors)
Brock Palen
www.umich.edu
as on each host?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Jun 20, 2014, at 12:38 PM, Brock Palen wrote:
> I was able to produce it in my test.
>
> orted affinity set by cpuset:
> [root@nyx5874 ~]# hwloc-b
Perfection! That appears to do it for our standard case.
Now I know how to set MCA options by environment variable or config file. How
can I make this the default, such that a user can then override it?
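Something like this in $prefix/etc/openmpi-mca-params.conf is my understanding
of the usual pattern (a sketch; hwloc_base_binding_policy is the 1.8-series
binding knob, and per-user settings or the command line take precedence):
  # site-wide default, lowest precedence:
  hwloc_base_binding_policy = core
A user can then still override it, e.g. with mpirun --bind-to none, or via the
OMPI_MCA_hwloc_base_binding_policy environment variable.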
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
"eth0,192.168.0.0/16"). If set to a non-default
value, it is mutually exclusive with
btl_tcp_if_include.
This is normally much longer. And yes, we don't have the Phi stuff installed on
all nodes; strange that 'all all' is
set to a non-default
value, it is mutually exclusive with
btl_tcp_if_include.
[brockp@flux-login1 34241]$
ompi_info --param all all --level 9
(gives me what I expect).
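(For just the TCP interface parameters, filtering is quicker; a sketch:)
  ompi_info --param btl tcp --level 9 | grep tcp_if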
Thanks,
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campu
nodes can provide).
One thought is to have the data-collector processes be threads inside the MPI
job running across all nodes, but I was curious if there is a way to pass data
that is still in memory (too much to hit disk) to the running MPI filter job.
Thanks!
Brock Palen
www.umich.edu/~brockp
CAE
But this is within the same MPI "universe", right?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Jun 27, 2014, at 10:19 AM, George Bosilca wrote:
> The One-Sided Communications from the Chapter 11 of the MPI st
performance, but I'm
not sure what to do about it. There is nothing on the list, but there was one
reference to another MPI library. Any idea what would cause this?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936
it a little. Should we investigate adding it to our systems?
Is there a way to suppress this warning?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
trying to understand why that is the default.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Aug 27, 2014, at 10:15 AM, Alina Sklarevich
wrote:
> Hi,
>
> KNEM can improve the performance significantly for i
Brice, et al.
Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with
KNEM and MXM 3.0.
If we have questions, we will let you know.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Aug 27, 2014, at 12:44 PM
Interesting; we are using the 3.0 that is in MOFED, which is also what is on the
MXM download site. Kinda confusing.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Aug 28, 2014, at 2:12 AM, Mike Dubman wrote:
> btw, you
a \
--with-cuda=$CUDA \
--with-hwloc=internal \
--with-verbs \
--with-psm \
--with-tm=/usr/local/torque \
--with-fca=$FCA \
--with-mxm=$MXM \
--with-knem=$KNEM \
--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
$COMPILERS
Brock Palen
www.umich.edu/~brockp
CAEN
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Brock Palen
>> Sent: Friday, September 05, 2014 5:22 PM
>> To: Open MPI Users
>> Subject: [OMPI users] enable-cuda with disable-dlopen
>>
>> We foun
I do wrong? I'm stumped why one works and one doesn't, but the one that
doesn't appears correct if you force it.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
Yes, the request to Torque was procs=64.
We are using cpusets.
mpirun without -np 64 creates 64 spawned hostnames.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Sep 23, 2014, at 3:02 PM, Ralph Castain wrote:
> F
where the only 2 processes
come from.
I checked some of the other jobs, and the cpusets and the PBS server CPU list
are the same.
More investigation required. Still, it's strange: why would it give that
message at all? Why would OpenMPI care, and why only when -np ## is given?
Brock Palen
www.umi
Doing special files on NFS can be weird; try the other /tmp/-style locations:
/var/tmp/
/dev/shm (a RAM disk; careful!)
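You can also point OpenMPI's session directory somewhere else explicitly
(orte_tmpdir_base is the MCA parameter; the path is just an example):
  mpirun --mca orte_tmpdir_base /var/tmp ./a.out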
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Oct 21, 2014, at 10:18 PM, Vinson Leung wrote:
>
> B
http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html
Which appears to imply that extra setup is required. Is this documented
anywhere for OpenMPI?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
I think a lot of the information on this page:
http://www.open-mpi.org/faq/?category=java
is out of date as of the 1.8 release.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
Thanks, this is good feedback.
I was worried that, with the dynamic nature of YARN containers, it would be
hard to coordinate wire-up, and you have confirmed that.
Thanks
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Oct
same source
don't they?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
B fabric,
this can happen. Multiple times now, just plugging line cards into switches on
a live system has caused large swaths of jobs to die with this error.
Does anyone else have this problem? We are a Mellanox-based fabric.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE
Thanks!
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Oct 31, 2014, at 2:22 PM, Ralph Castain wrote:
>
>
>> On Oct 30, 2014, at 3:15 PM, Brock Palen wrote:
>>
>> If i'm on the node h
PI figure out that it can also talk over the others? How does it
choose to load balance?
BTW, that is fine, but we will use if_exclude on one of the IB ones, as ib0 and
eoib0 are the same physical device and may screw with load balancing if anyone
ever falls back to TCP.
Brock Palen
www.umich.ed
Right, I understand those are TCP interfaces; I was just showing that I have two
TCP interfaces over one physical interface, which is why I was asking how TCP
interfaces are selected. It will rarely, if ever, matter to us.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
match the 1gig interfaces, yet
data is being sent out the 10gig eoib0/ib0 interfaces.
I'll go do some measurements and see.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (j
Are you doing this just for debugging? Or do you really want to do it within the
MPI program?
orte-ps
gives you the pid/host for each rank, but I don't think there is any standard
way to do this via an API.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champio
http://citutor.org/login.php
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Dec 19, 2014, at 5:56 PM, Diego Avesani wrote:
>
> dear all users,
> I am new in MPI world.
> I would like to know what is the bes
I'll hit you off list with my Abinit OpenMPI build notes,
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Feb 3, 2015, at 2:26 PM, Nick Papior Andersen wrote:
>
> I also concur with Jeff about asking sof
there being a way to give OMPI a stack of work to do from the talk
at SC this year, but I can't figure out if it does what I think it
should do.
Thanks,
Brock Palen
www.umich.edu/~brockp
Director Advanced Research Computing - TS
XSEDE Campus C
In lib/openmpi/ after install.
Also, PSM does work; it just does not appear in ompi_info.
Sorry if this has already been filed.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
1ad6]
[nyx5049:07901] [16] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b5dcbe9b994]
[nyx5049:07901] [17] orted [0x401999]
[nyx5049:07901] *** End of error message ***
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
process is stuck
and to drill into.
Thanks
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Inner stdout (connecting)
Unexpected EOF from Inner stderr (connecting)
Unexpected exit from parallel command (state=connecting)
Bad exit code from parallel command (exit_code=131)
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Sep 1, 2010, at
Ah, OK. I put it there just because the user couldn't read it from my home
space, and I never even thought of that. Gahhh.
Thanks,
BTW I tried joining the padb mailing list.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Sep 1, 2010,
peer
No active jobs could be found for user 'dianawon'
The job is running; I get this error running just orte-ps.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Sep 2, 2010, at 9:47 AM, Brock Palen wrote:
> Ah ok, I put it there
You can build OpenMPI without tm support, which will disable it, or you can
test first with a nasty option like:
# "^tm" excludes the Torque (tm) launcher and allocator, forcing ssh/rsh launch
mpirun \
--mca plm ^tm \
--mca ras ^tm \
--hostfile $PBS_NODEFILE \
./testmom
Brock Palen
www.umich.edu/~brockp
Center for
?
Thanks, just trying to get some clarification.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
thanks guys.
Brock
>
>> David
>>
>> On 01/04/2011 03:47 AM, Brock Palen wrote:
>>> I have a user who reports that sending a broadcast of
>>>
>>> 540*1080 of reals (just over 2GB) fails with this:
>>>
>>>
>>> *** An er
bugs? I found one, but only on shared memory, and our version
should be new enough (from what I could tell) to avoid it.
Thanks. What should I look for to diagnose the issue?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Mar 21, 2011, at 1:59 PM, Jeff Squyres wrote:
> I no longer run Torque on my cluster, so my Torqueology is pretty rusty --
> but I think there's a Torque command to launch on remote nodes. tmrsh or
> pbsrsh or something like that...?
pbsdsh
If TM is working, pbsdsh should work fine.
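A quick TM sanity check (pbsdsh runs the command once per allocated slot; note
that it wants an absolute path):
  pbsdsh /bin/hostname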
Torque+
Running with rdmacm, the problem does seem to resolve itself.
The code is large and complicated, but the problem does appear to arise
regularly when run.
Just FYI: can I collect extra information to help find a fix?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro
issues we should be aware
of?
Is there a reason we should not use rdmacm?
Thanks!
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Given that part of our cluster is TCP-only, openib wouldn't even start up on
those hosts, and this would be ignored on hosts with IB adapters?
Just checking, thanks!
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Apr 21, 2011, at 6:
On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote:
>
> On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:
>
>> Given that part of our cluster is TCP only, openib wouldn't even startup on
>> those hosts
>
> That is correct - it would have no impact on those hosts
Local port: 1
CPCs attempted: rdmacm
--
Again I think this is expected on this older hardware.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Apr 22, 2011, a
ges
2 total processes killed (some possibly by mpirun during cleanup)
We were being bitten by a number of codes hanging in collectives, which was
resolved by using rdmacm. As a workaround, we tried setting this as the default
until the two bugs in Bugzilla are resolved. Then we hit this problem on our
-1
librdmacm-devel-1.0.11-1
librdmacm-devel-1.0.11-1
librdmacm-utils-1.0.11-1
So all the libraries are installed (I think); is there a way to verify this? Or
to have OpenMPI be more verbose about what caused rdmacm to fail as an OOB
option?
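(For more detail on why a CPC such as rdmacm is excluded, bumping the BTL
verbosity usually helps; a sketch, since the output varies by release:)
  mpirun --mca btl openib,sm,self --mca btl_base_verbose 100 ./a.out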
Brock Palen
www.umich.edu/~brockp
Center for Advanced
, we set all our ib0 interfaces to have IPs on a 172. network. This
allowed rdmacm to work and to get the latencies we would expect. That
said, we are still getting hangs. I can very reliably reproduce it using IMB
with a specific core count on a specific test case.
Jus
On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:
> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>
>> We can reproduce it with IMB. We could provide access, but we'd have to
>> negotiate with the owners of the relevant nodes to give you interactive
>> access to them. Maybe Brock's would be mor
I am pretty sure MTLs and BTLs are very different, but just as a note:
this user's code (Crash) hangs at MPI_Allreduce() on
openib
but runs on:
tcp
psm (an MTL, different hardware)
Putting it out there in case it has any bearing; otherwise ignore.
Brock Palen
www.umich.edu/~bro
relevant-looking one.
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>
> Thanks. In case it's useful info, it hangs for me with 1.5.3 & np=32 on
> connectx with more than one collective I can't recall.
Extra data point: that ticket said it ran wi
rogress past their lockup points.
I will have a user test this.
Is this an OK option to put in our environment? What does 305 mean?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
>
> Thanks,
>
> Samuel Gutierrez
> Los Alamos Nationa
code.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On May 16, 2011, at 11:49 AM, George Bosilca wrote:
> Here is the output of the "ompi_info --param btl openib":
>
> MCA btl: parameter "btl_openib_fla
Sorry, typo: 314, not 313.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On May 17, 2011, at 2:02 PM, Brock Palen wrote:
> Thanks, I thought of looking at ompi_info after I sent that note, sigh.
>
> SEND_INPLACE appears to help perfo
being able to reproduce it.
Any thoughts? Am I overlooking something?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On May 17, 2011, at 2:18 PM, Brock Palen wrote:
> Sorry typo 314 not 313,
>
> Brock Palen
> www.umich.edu/~bro
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/tmp/openmpi-1.4.3/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/openmpi-1.4.3/ompi'
make: *** [all-recursive] Error 1
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
enabling different openib_flags
of 313 fixes the issue, with a bit lower bandwidth for some message sizes.
Has there been any progress on this issue?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On May 18, 2011, at 10:25 AM, Brock Palen wrote:
> Well I hav
Question:
if we are using Torque with TM and with cpusets enabled for pinning, should we
not enable numactl? Would they conflict with each other?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
For those interested in MPI-IO and ROMIO, Jeff and I did an interview with
Rajeev and Rob:
http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
This should be fixed: there was a bad upload, and the server had a different
copy than my machine. The fixed version is in place. Feel free to grab it again.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Feb 20, 2012, at 4:43 PM, Jeffrey Squyres
get a readable list
of every rank's posted sends? And then query a waiting MPI_Waitall() of a
running job to see which sends/recvs it is waiting on?
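(padb can do roughly this, assuming it supports your resource manager; a
sketch, with <jobid> as a placeholder:)
  padb --mpi-queue <jobid>            # message queues, including posted sends
  padb --stack-trace --tree <jobid>   # where each rank is waiting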
Thanks!
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
PMPI_Waitall() at ?:?
ompi_request_default_wait_all() at ?:?
opal_progress() at ?:?
Stack trace(s) for thread: 2
-
[0-63] (64 processes)
-
start_thread() at ?:?
ips_ptl_pollintr() at ptl_rcvthread.c:324
tcp with this code?
Can we disable the PSM MTL and use the verbs emulation on QLogic? While
QLogic's verbs isn't that great, it is still much faster in my tests than TCP.
Is there a particular reason to pick TCP?
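(A sketch of the knob: disable the MTL and let ob1/openib take over:)
  mpirun --mca mtl ^psm --mca pml ob1 --mca btl openib,sm,self ./a.out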
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
Will do.
Right now I have asked the user to try rebuilding with the newest OpenMPI, just
to be safe.
Interesting behavior: on rank 0, the IB counters (using collectl) never show a
packet in, only packets out.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Apr 24, 2012, at 3:09 PM, Jeffrey Squyres wrote:
> Could you repeat your tests with 1.4.5 and/or 1.5.5?
>
>
> On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote:
>
>> Hi,
&g
om [[48116,2],39] to [[48116,1],0]:16, can't find route
[0]
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
[0x2ae2ad17d0df]
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
Ralph,
Rmpi wraps everything up, so I tried setting them with
export OMPI_plm_base_verbose=5
export OMPI_dpm_base_verbose=5
and I get no extra messages, even on a hello-world example of simple MPI-1.0
code.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
OK, will see. We had Rmpi working with 1.4, and it has not been updated since
2010, so this kinda stinks.
I will keep digging into it; thanks for the help.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Jul 26, 2012, at 7:16 PM, Ralph Castain wrote
I think so; sorry if I gave you the impression that Rmpi changed.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Jul 26, 2012, at 7:30 PM, Ralph Castain wrote:
> Guess I'm confused - your original note indicated that something had chang
MPI to blow up saying "can't allocate
registered memory, fatal, contact your admin", rather than falling back to
send/receive and just being slower.
Am I reading the release notes correctly? Is there a tunable setting to blow
up rather than fall back?
Brock Palen
www.umich.edu/~brock
On Aug 24, 2012, at 10:38 AM, Jeff Squyres wrote:
> On Aug 24, 2012, at 10:28 AM, Brock Palen wrote:
>
>> I grabbed the new OMPI 1.6.1 and ran my test that would cause a hang with
>> 1.6.0 with low registered memory. From reading the release notes rather
>>
Thanks and super cool.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Aug 25, 2012, at 7:06 AM, Jeff Squyres wrote:
> On Aug 24, 2012, at 10:45 AM, Brock Palen wrote:
>
>>> Right now we should be just warning if we can't
mmend?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
Our case is a single user expressing interest, and maybe long-term mixing as we
explore Hadoop options that would mix MPI+Hadoop.
I would not go to the effort if it is non-trivial to add it to 1.6.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
built with when I built
it?
Can I make ROMIO go into 'verbose' mode and have it print what it is setting
all its values to?
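(One thing that does help: the flags ROMIO was configured with are recorded as
informational MCA parameters; a sketch:)
  ompi_info --param io romio | grep -i configure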
Thanks!
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
We have Lustre, local filesystems (ufs), and NFSv3 and NFSv4 clients, so that
list should be good for our site.
Would this be a good recommendation for us to include in all our MPI builds?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Dec 3, 20
Thanks!
So it looks like most OpenMPI builds out there are running with ROMIOs that
are oblivious to any optimizations for what they are running on.
I have added this to our build notes so we get it in next time. Thanks!
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computin
to M processors, where N >
M). Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.
Your job will now abort.
--
Brock Palen
www.umich.edu/~brockp
CAEN Advanced C
cores, even if fake; I was looking for a node that had a bad
socket or a wrong part.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Dec 19, 2012, at 9:08 PM, Ralph Castain wrote:
> I'm afraid these are both known problems in the 1.6.2 releas
w00t :-)
Thanks
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Dec 20, 2012, at 10:46 AM, Ralph Castain wrote:
> Hmmm... I'll see what I can do about the error message. I don't think there
> is much in 1.6 I can do, but in 1.7 I c
ow, "hey you are
always running out of your queue of size X" Or " your queue of size Y is
never used"
We are kinda blind for a lot of our applications :-)
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
ConnectX cards?
Lastly, looking at the FAQ, it states that MXM is used by default, if
available, over openib.
Should I take that to mean "use MXM if available and supported"? As in, only
use openib if that is the only thing you have?
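(For testing, the selection can be forced either way; a sketch:)
  mpirun --mca pml cm --mca mtl mxm ./a.out                # force MXM
  mpirun --mca pml ob1 --mca btl openib,sm,self ./a.out    # force openib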
Thanks!
Brock Palen
www.umich.edu/~brockp
CAEN Advanced
FAQ page it states
that MXM was used in the past only for >128 ranks, but in 1.6 it is used for
rank counts of any size.
I think we will do some testing; we had never even heard of MXM before.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Jan 22, 2013,
On Jan 22, 2013, at 2:53 PM, Shamis, Pavel wrote:
>>
>> Switching to SRQ and some guess of queue values selected appears to let the
>> code run.
>> S,4096,128:S,12288,128:S,65536,12
>>
>> Two questions,
>>
>> This is a ConnectX fabric, should I switch them to XRC queues? And should I
>> use
, unless "everything" is a
workload.
We will do some testing, we are setting up a time to talk to our Mellonox SA to
try to understand these components better.
Note most of our users run just fine with the standard Peer-Peer queues,
default out the box OpenMPI.
>
> -Pasha
>
>
>> You sound like our vendors, "what is your app"
>
> ;-) I used to be one.
>
> Ideally OMPI should do the switch between MXM/RC/XRC internally in the
> transport layer. Unfortunately,
> we don't have such smart selection logic. Hopefully IB vendors will fix some
> day.
I actually looked i
problems within the fabric;
please contact your system administrator.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:
> or do i just need to compile two versions, one with IB and one without?
You should not need to; we have OMPI compiled for openib/psm and run that same
install on psm/tcp and verbs (openib) based gear.
All the nodes assigned to your job have
" (current value:
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_alltoallv" (current value:
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_alltoallw" (current value:
<0>, data source: default value)
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
That would do it.
Thanks!
Now to make even the normal ones work.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Apr 3, 2013, at 10:31 AM, Ralph Castain wrote:
> Looking at the source code, it is because those other collectives are