[OMPI users] Enable PMI build

2014-05-16 Thread Brock Palen
anges to the MPSS stack, and this Phi stuff is still in its infancy, so there is minimal (decent) documentation. Does anyone know what current package provides PMI for the Xeon Phi? Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936

Re: [OMPI users] pinning processes by default

2014-05-23 Thread Brock Palen
On May 23, 2014, at 9:19 AM, Albert Solernou wrote: > Hi, > after compiling and installing OpenMPI 1.8.1, I find that OpenMPI is pinning > processes onto cores. Although th

Re: [OMPI users] mpiifort mpiicc not found

2014-05-27 Thread Brock Palen
mpiifort and mpiicc are Intel MPI library commands; in OpenMPI and other implementations the equivalents are mpifort and mpicc. On May 27, 2014, at 2:11 PM, Lorenzo Donà wrote: > Dear all

Re: [OMPI users] Enable PMI build

2014-05-29 Thread Brock Palen
OK, I have dug into this more. Is this PMI the Slurm process manager? To use OpenMPI on the Phi, do I just build OpenMPI for it? Does that mean I need to add -mmic to CFLAGS and FCFLAGS? How does one go about writing multi-Phi MPI code?

[OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-18 Thread Brock Palen
0x0f00 8,9,10,11 which is exactly what I would expect. So, um, I'm lost as to why this might happen. What else should I check? Like I said, not all jobs show this behavior.
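
The first column above is a CPU bitmask: each set bit is one logical CPU, so 0x0f00 (bits 8 through 11) is exactly the core list 8,9,10,11. A tiny Python sketch for decoding such masks by hand; the function name is hypothetical, and it assumes a single hexadecimal word (machines with many CPUs may print several comma-separated words, which this does not handle):

```python
def mask_to_cpus(mask):
    """Return the logical CPU indices whose bits are set in a hex mask such as '0x0f00'."""
    value = int(mask, 16)  # base-16 parsing accepts an optional '0x' prefix
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(mask_to_cpus("0x0f00"))  # [8, 9, 10, 11]
```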

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-19 Thread Brock Palen
te on nodes. That is good to know; I think we would want to change our default to 'bind to core', except for our few users who run in hybrid mode. Our cpuset tells you which cores the job is assigned. So in the problem case provided, the cpuset/cgroup shows that only cores 8-11 are available

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
In this case they are on a single socket, but as you can see, it could be either/or depending on the job. On Jun 19, 2014, at 2:44 PM, Ralph Castain wrote: > Sorry, I should have b

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Got it. I have the input from the user and am testing it out. It probably has less to do with Torque and more to do with cpusets; I'm working on reproducing it myself as well. On Jun 20

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
:68072] MCW rank 28 is not bound (or bound to all available processors) [nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all available processors) [nyx5552.engin.umich.edu:30482] MCW rank 13 is not bound (or bound to all available processors)
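
When scanning binding reports like these (presumably from mpirun's --report-bindings output) across many nodes, a short Python sketch can collect the ranks that were left unbound. The regular expression is an assumption based only on the lines quoted above, and the function name is hypothetical:

```python
import re

# Pattern for lines such as:
# [nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all available processors)
UNBOUND = re.compile(r"\[([^:\]]+):(\d+)\] MCW rank (\d+) is not bound")


def unbound_ranks(log_text):
    """Return sorted (rank, host) pairs for every rank reported as not bound."""
    return sorted((int(m.group(3)), m.group(1)) for m in UNBOUND.finditer(log_text))


sample = (
    "[nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all available processors)\n"
    "[nyx5552.engin.umich.edu:30482] MCW rank 13 is not bound (or bound to all available processors)\n"
)
print(unbound_ranks(sample))  # [(12, 'nyx5552.engin.umich.edu'), (13, 'nyx5552.engin.umich.edu')]
```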

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
as on each host? On Jun 20, 2014, at 12:38 PM, Brock Palen wrote: > I was able to produce it in my test. > > orted affinity set by cpuset: > [root@nyx5874 ~]# hwloc-b

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Perfection! That appears to do it for our standard case. Now I know how to set MCA options by environment variable or config file. How can I make this the default, so that a user can then override it?
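
On the "default a user can override" question: Open MPI looks for MCA parameters in several places, and a value in the system-wide `$prefix/etc/openmpi-mca-params.conf` is overridden by the per-user `$HOME/.openmpi/mca-params.conf`, then by `OMPI_MCA_<name>` environment variables, then by `--mca` on the mpirun command line. The Python sketch below only models that lookup order; the helper names are hypothetical, and the parameter `hwloc_base_binding_policy` is an illustrative assumption (check `ompi_info --param all all --level 9` for the name in your release):

```python
def parse_conf(text):
    """Parse 'name = value' lines from an mca-params.conf file, ignoring # comments."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
    return params


def resolve_mca(name, system_conf, user_conf, environ, cli):
    """Return the effective value of an MCA parameter, or None if it is unset.

    Sources are checked from highest to lowest precedence: mpirun --mca,
    OMPI_MCA_<name> environment variables, the per-user file, the system-wide file.
    """
    env_params = {key[len("OMPI_MCA_"):]: value
                  for key, value in environ.items() if key.startswith("OMPI_MCA_")}
    for source in (cli, env_params, user_conf, system_conf):
        if name in source:
            return source[name]
    return None


site_defaults = parse_conf("hwloc_base_binding_policy = core  # site default\n")
print(resolve_mca("hwloc_base_binding_policy", site_defaults, {}, {}, {}))  # core
print(resolve_mca("hwloc_base_binding_policy", site_defaults, {},
                  {"OMPI_MCA_hwloc_base_binding_policy": "none"}, {}))  # none
```

In this model a hybrid-mode user overrides the site default simply by exporting the environment variable or passing `--mca` at launch.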

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-23 Thread Brock Palen
"eth0,192.168.0.0/16"). If set to a non-default value, it is mutually exclusive with btl_tcp_if_include. This is normally much longer. And yes we don't have the PHI stuff installed on all nodes, strange that 'all all' is

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-25 Thread Brock Palen
set to a non-default value, it is mutually exclusive with btl_tcp_if_include. [brockp@flux-login1 34241]$ ompi_info --param all all --level 9 (gives me what I expect). Thanks,

[OMPI users] importing to MPI data already in memory from another process

2014-06-27 Thread Brock Palen
nodes can provide). One thought is to have the data collector processes be threads inside the MPI job running across all nodes, but I was curious if there is a way to pass data still in memory (too much to hit disk) to the running MPI filter job. Thanks!

Re: [OMPI users] importing to MPI data already in memory from another process

2014-06-27 Thread Brock Palen
But this is within the same MPI "universe", right? On Jun 27, 2014, at 10:19 AM, George Bosilca wrote: > The One-Sided Communications from the Chapter 11 of the MPI st

[OMPI users] HugeTLB messages from mpi code

2014-07-01 Thread Brock Palen
performance, but I'm not sure what to do about it. There is nothing on the list, but there was one reference to another MPI library. Does anyone have an idea what would cause this?

[OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
it a little. Should we investigate adding it to our systems? Is there a way to suppress this warning?

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
trying to understand why that is the default. On Aug 27, 2014, at 10:15 AM, Alina Sklarevich wrote: > Hi, > > KNEM can improve the performance significantly for i

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
Brice et al., thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with knem and mxm 3.0. If we have questions, we will let you know. On Aug 27, 2014, at 12:44 PM

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-28 Thread Brock Palen
Interesting; we are using the 3.0 that is in MOFED, and that is also what is on the MXM download site. Kind of confusing. On Aug 28, 2014, at 2:12 AM, Mike Dubman wrote: > btw, you

[OMPI users] enable-cuda with disable-dlopen

2014-09-05 Thread Brock Palen
a \
  --with-cuda=$CUDA \
  --with-hwloc=internal \
  --with-verbs \
  --with-psm \
  --with-tm=/usr/local/torque \
  --with-fca=$FCA \
  --with-mxm=$MXM \
  --with-knem=$KNEM \
  --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
  $COMPILERS

Re: [OMPI users] enable-cuda with disable-dlopen

2014-09-05 Thread Brock Palen
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Brock Palen >> Sent: Friday, September 05, 2014 5:22 PM >> To: Open MPI Users >> Subject: [OMPI users] enable-cuda with disable-dlopen >> >> We foun

[OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
I do wrong? I'm stumped why one works and one doesn't, but the one that doesn't appears correct if you force it.

Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
Yes, the request to Torque was procs=64. We are using cpusets. The mpirun without -np 64 creates 64 spawned hostnames. On Sep 23, 2014, at 3:02 PM, Ralph Castain wrote: > F

Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-24 Thread Brock Palen
where the only 2 processes come from. I checked some of the other jobs, and the cpusets and the pbs server CPU list are the same. More investigation required. Still, it's strange: why would it give that message at all? Why would OpenMPI care, and why only when -np ## is given?

Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-21 Thread Brock Palen
Special files on NFS can behave oddly; try the other temporary locations: /var/tmp/ or /dev/shm (a RAM disk, careful!) > On Oct 21, 2014, at 10:18 PM, Vinson Leung wrote: > > B

[OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html which appears to imply that extra setup is required. Is this documented anywhere for OpenMPI?

[OMPI users] Java FAQ Page out of date

2014-10-27 Thread Brock Palen
I think a lot of the information on this page: http://www.open-mpi.org/faq/?category=java is out of date as of the 1.8 release.

Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
Thanks, this is good feedback. I was worried that, with the dynamic nature of Yarn containers, it would be hard to coordinate the wire-up, and you have confirmed that. Thanks > On Oct

[OMPI users] orte-ps and orte-top behavior

2014-10-30 Thread Brock Palen
same source, don't they?

[OMPI users] IB Retry Limit Errors when fabric changes

2014-10-31 Thread Brock Palen
B fabric, this can happen. Multiple times now, just plugging line cards into switches on a live system has caused large swaths of jobs to die with this error. Does anyone else have this problem? We are a Mellanox-based fabric.

Re: [OMPI users] orte-ps and orte-top behavior

2014-10-31 Thread Brock Palen
Thanks! > On Oct 31, 2014, at 2:22 PM, Ralph Castain wrote: > > >> On Oct 30, 2014, at 3:15 PM, Brock Palen wrote: >> >> If i'm on the node h

[OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Brock Palen
PI figure out that it can also talk over the others? How does it choose to load balance? BTW, that is fine, but we will use if_exclude on one of the IB ones, as ib0 and eoib0 are the same physical device and may screw with load balancing if anyone ever falls back to TCP.
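
The filtering rules discussed here (a comma-separated list of interface names and/or CIDR subnets, with btl_tcp_if_include and btl_tcp_if_exclude mutually exclusive, as the ompi_info help text quoted in this thread says) can be modeled in a few lines of Python. This is a hypothetical helper that only illustrates which interfaces survive the filter; it does not model how Open MPI spreads traffic across them, and the addresses are made up:

```python
import ipaddress


def _matches(name, addr, spec):
    """True if the interface matches any comma-separated name or CIDR subnet in spec."""
    for item in (part.strip() for part in spec.split(",")):
        if "/" in item:
            if ipaddress.ip_address(addr) in ipaddress.ip_network(item, strict=False):
                return True
        elif item == name:
            return True
    return False


def select_interfaces(interfaces, include=None, exclude=None):
    """interfaces maps interface name -> IPv4 address; returns the names kept."""
    if include and exclude:
        raise ValueError("include and exclude lists are mutually exclusive")
    if include:
        return [n for n, a in interfaces.items() if _matches(n, a, include)]
    if exclude:
        return [n for n, a in interfaces.items() if not _matches(n, a, exclude)]
    return list(interfaces)


nics = {"lo": "127.0.0.1", "eth0": "10.1.2.3", "ib0": "172.16.0.5", "eoib0": "192.168.1.7"}
print(select_interfaces(nics, exclude="lo,eoib0"))  # ['eth0', 'ib0']
```

Excluding eoib0 by name, as planned above, leaves ib0 as the only TCP path over that physical IB device.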

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Brock Palen
Right, I understand those are TCP interfaces; I was just showing that I have two TCP interfaces over one physical interface, which is why I was asking how TCP interfaces are selected. It will rarely, if ever, matter to us.

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Brock Palen
match the 1 Gb interfaces, yet data is being sent out the 10 Gb eoib0/ib0 interfaces. I'll go do some measurements and see. > On Nov 8, 2014, at 8:30 AM, Jeff Squyres (j

Re: [OMPI users] How to find MPI ranks located in remote nodes?

2014-11-25 Thread Brock Palen
Are you doing this just for debugging? Or do you really want to do it within the MPI program? orte-ps gives you the PID/host for each rank, but I don't think there is any standard way to do this via the API.
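
Inside the program, one portable approach (an assumption, not something proposed in this thread) is to have every rank call `MPI_Get_processor_name` and share the results with `MPI_Allgather`; the name is implementation-defined, but with Open MPI it is normally the host name. MPI-3's `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED` is another way to find node-local ranks. Given the gathered names, the bookkeeping is simple; a Python sketch with hypothetical function names and host names:

```python
from collections import defaultdict


def ranks_by_host(names):
    """Group ranks by the processor name each one reported (names[r] belongs to rank r)."""
    groups = defaultdict(list)
    for rank, host in enumerate(names):
        groups[host].append(rank)
    return dict(groups)


def remote_ranks(my_rank, names):
    """Ranks whose processor name differs from my_rank's, i.e. ranks on other nodes."""
    return [rank for rank, host in enumerate(names) if host != names[my_rank]]


names = ["nyx1", "nyx1", "nyx2", "nyx2"]  # e.g. one entry gathered from every rank
print(ranks_by_host(names))    # {'nyx1': [0, 1], 'nyx2': [2, 3]}
print(remote_ranks(0, names))  # [2, 3]
```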

Re: [OMPI users] best function to send data

2014-12-22 Thread Brock Palen
http://citutor.org/login.php > On Dec 19, 2014, at 5:56 PM, Diego Avesani wrote: > > dear all users, > I am new in MPI world. > I would like to know what is the bes

Re: [OMPI users] configuring a code with MPI/OPENMPI

2015-02-03 Thread Brock Palen
I'll contact you off-list with my Abinit OpenMPI build notes. > On Feb 3, 2015, at 2:26 PM, Nick Papior Andersen wrote: > > I also concur with Jeff about asking sof

[OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread Brock Palen
there being a way to give OMPI a stack of work to do, from the talk at SC this year, but I can't figure out if it does what I think it should do. Thanks, Brock Palen www.umich.edu/~brockp Director Advanced Research Computing - TS XSEDE Campus C

[OMPI users] psm mtl not appearing in ompi_info in 1.4.2

2010-07-21 Thread Brock Palen
In lib/openmpi/ after install. Also, psm does work; it just does not appear in ompi_info. Sorry if this has already been filed. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985

[OMPI users] random IB failures when running medium core counts

2010-08-30 Thread Brock Palen
1ad6] [nyx5049:07901] [16] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b5dcbe9b994] [nyx5049:07901] [17] orted [0x401999] [nyx5049:07901] *** End of error message ***

[OMPI users] simplest way to check message queues

2010-09-01 Thread Brock Palen
process is stuck and to drill into. Thanks

Re: [OMPI users] simplest way to check message queues

2010-09-01 Thread Brock Palen
Inner stdout (connecting) Unexpected EOF from Inner stderr (connecting) Unexpected exit from parallel command (state=connecting) Bad exit code from parallel command (exit_code=131) On Sep 1, 2010, at

Re: [OMPI users] simplest way to check message queues

2010-09-02 Thread Brock Palen
Ah, OK. I put it there just because the user couldn't read it from my home space, and never even thought of that. Gahhh. Thanks. BTW, I tried joining the padb mailing list. On Sep 1, 2010,

Re: [OMPI users] simplest way to check message queues

2010-09-02 Thread Brock Palen
peer No active jobs could be found for user 'dianawon' The job is running; I get this error running just orte-ps. On Sep 2, 2010, at 9:47 AM, Brock Palen wrote: > Ah ok, I put it there

Re: [OMPI users] Using (or not using) Torque/Moab under PBS Pro as the OpenMPI launcher

2010-12-17 Thread Brock Palen
You can build OpenMPI without tm, which will disable it, or you can test first with a nasty option like:

mpirun \
  --mca plm ^tm \
  --mca ras ^tm \
  --hostfile $PBS_NODEFILE \
  ./testmom

[OMPI users] Sending large broadcasts

2011-01-03 Thread Brock Palen
? Thanks, just trying to get some clarification.

Re: [OMPI users] Sending large broadcasts

2011-01-04 Thread Brock Palen
thanks guys. Brock > >> David >> >> On 01/04/2011 03:47 AM, Brock Palen wrote: >>> I have a user who reports that sending a broadcast of >>> >>> 540*1080 of reals (just over 2GB) fails with this: >>> >>> >>> *** An er
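
If the failure comes from the size of a single message, one common workaround (an assumption here, not the resolution reached in this thread) is to broadcast the buffer in several pieces, each kept below 2^31 bytes, since MPI count arguments are C ints and byte counts inside an implementation can overflow near that boundary. The Python sketch below only computes the pieces; the function name and the 2^31 - 1 byte cap are illustrative choices:

```python
INT_MAX = 2**31 - 1  # largest value a C int count argument can hold


def bcast_chunks(n_elems, elem_size, max_bytes=INT_MAX):
    """Split n_elems elements of elem_size bytes into (offset, count) pieces,
    each at most max_bytes bytes long."""
    per_chunk = max_bytes // elem_size
    if per_chunk == 0:
        raise ValueError("a single element is larger than max_bytes")
    return [(start, min(per_chunk, n_elems - start))
            for start in range(0, n_elems, per_chunk)]


# Each (offset, count) pair would become one MPI_Bcast(buf + offset, count, ...) call.
print(bcast_chunks(10, 4, max_bytes=16))  # [(0, 4), (4, 4), (8, 2)]
```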

[OMPI users] MPI_AllReduce() deadlock on IB

2011-03-16 Thread Brock Palen
bugs? I found one, but only for shared memory, and our version should be new enough (from what I could tell) to avoid it. Thanks. What should I look for to diagnose the issue?

Re: [OMPI users] OpenMPI and Torque

2011-03-21 Thread Brock Palen
On Mar 21, 2011, at 1:59 PM, Jeff Squyres wrote: > I no longer run Torque on my cluster, so my Torqueology is pretty rusty -- > but I think there's a Torque command to launch on remote nodes. tmrsh or > pbsrsh or something like that...? pbsdsh. If TM is working, pbsdsh should work fine. Torque+

Re: [OMPI users] MPI_AllReduce() deadlock on IB

2011-03-25 Thread Brock Palen
Running with rdmacm, the problem does seem to resolve itself. The code is large and complicated, but the problem does appear to arise regularly when run. Just FYI: can I collect extra information to help find a fix?

[OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-20 Thread Brock Palen
issues we should be aware of? Is there a reason we should not use rdmacm? Thanks!

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-21 Thread Brock Palen
Given that part of our cluster is TCP-only, openib wouldn't even start up on those hosts, and this would be ignored on hosts with IB adapters? Just checking, thanks! On Apr 21, 2011, at 6:

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-22 Thread Brock Palen
On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote: > > On Apr 21, 2011, at 4:41 PM, Brock Palen wrote: > >> Given that part of our cluster is TCP only, openib wouldn't even startup on >> those hosts > > That is correct - it would have no impact on those hosts

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-27 Thread Brock Palen
Local port: 1 CPCs attempted: rdmacm -- Again, I think this is expected on this older hardware. On Apr 22, 2011, a

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-28 Thread Brock Palen
ges 2 total processes killed (some possibly by mpirun during cleanup) We were being bitten by a number of codes hanging in collectives, which was resolved by using rdmacm. We tried setting this as the default, as a workaround, until the two bugs in Bugzilla are resolved. Then we hit this problem on our

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-05 Thread Brock Palen
-1 librdmacm-devel-1.0.11-1 librdmacm-devel-1.0.11-1 librdmacm-utils-1.0.11-1 So all the libraries are installed (I think). Is there a way to verify this? Or to have OpenMPI be more verbose about what caused rdmacm to fail as an OOB option?

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-11 Thread Brock Palen
, we set all our ib0 interfaces to have IPs on a 172. network. This allowed rdmacm to work and gave the latencies we would expect. That said, we are still getting hangs. I can very reliably reproduce it using IMB with a specific core count on a specific test case. Jus

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-12 Thread Brock Palen
On May 12, 2011, at 10:13 AM, Jeff Squyres wrote: > On May 11, 2011, at 3:21 PM, Dave Love wrote: > >> We can reproduce it with IMB. We could provide access, but we'd have to >> negotiate with the owners of the relevant nodes to give you interactive >> access to them. Maybe Brock's would be mor

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-12 Thread Brock Palen
I am pretty sure MTLs and BTLs are very different, but just as a note, this user's code (Crash) hangs at MPI_Allreduce() in openib but runs on: tcp, psm (an MTL, different hardware). Putting it out there in case it has any bearing; otherwise ignore.

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-13 Thread Brock Palen
relevant-looking one. >> >> https://svn.open-mpi.org/trac/ompi/ticket/2714 > > Thanks. In case it's useful info, it hangs for me with 1.5.3 & np=32 on > connectx with more than one collective I can't recall. Extra data point: that ticket said it ran wi

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-16 Thread Brock Palen
rogress past their lockup points. I will have a user test this. Is this an OK option to put in our environment? What does 305 mean? > > Thanks, > > Samuel Gutierrez > Los Alamos Nationa
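
The value is a bitmask. Assuming the bit assignments printed by `ompi_info --param btl openib` in the 1.4/1.5 series (SEND=1, PUT=2, GET=4, SEND_INPLACE=8, ACK=16, CHECKSUM=32, RDMA_MATCHED=64, RDMA_COMPLETION=128, HETEROGENEOUS_RDMA=256; recalled rather than taken from this thread, so verify on your own build), 305 = 256 + 32 + 16 + 1, i.e. SEND without the PUT or GET RDMA bits. A small Python decoder with a hypothetical name:

```python
# Bit values as recalled from the btl_openib_flags help text of Open MPI 1.4/1.5
# (an assumption: check `ompi_info --param btl openib` on your own build).
BTL_FLAGS = {
    1: "SEND",
    2: "PUT",
    4: "GET",
    8: "SEND_INPLACE",
    16: "ACK",
    32: "CHECKSUM",
    64: "RDMA_MATCHED",
    128: "RDMA_COMPLETION",
    256: "HETEROGENEOUS_RDMA",
}


def decode_flags(value):
    """List the flag names set in an integer btl_openib_flags value."""
    names = [name for bit, name in sorted(BTL_FLAGS.items()) if value & bit]
    unknown = value & ~sum(BTL_FLAGS)  # bits not covered by the table above
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


for value in (305, 313, 314):
    print(value, decode_flags(value))
```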

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
code. On May 16, 2011, at 11:49 AM, George Bosilca wrote: > Here is the output of the "ompi_info --param btl openib": > > MCA btl: parameter "btl_openib_fla

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
Sorry, typo: 314, not 313. On May 17, 2011, at 2:02 PM, Brock Palen wrote: > Thanks, I thought of looking at ompi_info after I sent that note, sigh. > > SEND_INPLACE appears to help perfo

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-18 Thread Brock Palen
being able to reproduce it. Any thoughts? Am I overlooking something? On May 17, 2011, at 2:18 PM, Brock Palen wrote: > Sorry typo 314 not 313,

[OMPI users] openmpi-1.4.3 and pgi-11.6 segfault

2011-06-21 Thread Brock Palen
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/tmp/openmpi-1.4.3/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/openmpi-1.4.3/ompi'
make: *** [all-recursive] Error 1

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-07-27 Thread Brock Palen
enabling different openib_flags of 313 fixes the issue, at a bit lower bandwidth for some message sizes. Has there been any progress on this issue? On May 18, 2011, at 10:25 AM, Brock Palen wrote: > Well I hav

[OMPI users] numactl with torque cpusets

2011-11-09 Thread Brock Palen
Question: if we are using Torque with TM and with cpusets enabled for pinning, should we not enable numactl? Would they conflict with each other?

[OMPI users] ROMIO Podcast

2012-02-20 Thread Brock Palen
For those interested in MPI-IO and ROMIO, Jeff and I did an interview with Rajeev and Rob: http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Brock Palen
This should be fixed; there was a bad upload, and the server had a different copy than my machine. The fixed version is in place. Feel free to grab it again. On Feb 20, 2012, at 4:43 PM, Jeffrey Squyres

[OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
get a readable list of every rank's posted sends? And then query a waiting MPI_Waitall() of a running job to get which sends/recvs it is waiting on? Thanks!

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
PMPI_Waitall() at ?:? ompi_request_default_wait_all() at ?:? opal_progress() at ?:? Stack trace(s) for thread: 2 - [0-63] (64 processes) - start_thread() at ?:? ips_ptl_pollintr() at ptl_rcvthread.c:324

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
tcp with this code? Can we disable the psm MTL and use the verbs emulation on QLogic? While the QLogic verbs isn't that great, it is still much faster than tcp in my tests. Is there a particular reason to pick tcp?

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
Will do. Right now I have asked the user to try rebuilding with the newest OpenMPI just to be safe. Interesting behavior: on rank 0 the IB counters (using collectl) never show a packet in, only packets out.

Re: [OMPI users] MPI_Allreduce hangs

2012-04-24 Thread Brock Palen
. On Apr 24, 2012, at 3:09 PM, Jeffrey Squyres wrote: > Could you repeat your tests with 1.4.5 and/or 1.5.5? > > > On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote: > >> Hi,

[OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
om [[48116,2],39] to [[48116,1],0]:16, can't find route [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2ae2ad17d0df]

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
Ralph, Rmpi wraps everything up, so I tried setting them with

export OMPI_plm_base_verbose=5
export OMPI_dpm_base_verbose=5

and I get no extra messages, even on the simple MPI-1.0 hello world example.
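
Open MPI reads MCA parameters from the environment only when the variable names carry the `OMPI_MCA_` prefix (for example `export OMPI_MCA_plm_base_verbose=5`); the names above lack the `MCA_` part, which could explain why no extra output appeared. When the MPI program is started from a wrapper such as an R session, a Python sketch like this (the helper name and `my_script.R` are hypothetical) builds the environment to hand to the child process:

```python
import os


def with_mca(params, base=None):
    """Copy of an environment with an OMPI_MCA_<name>=<value> entry per parameter."""
    env = dict(os.environ if base is None else base)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = str(value)
    return env


env = with_mca({"plm_base_verbose": 5, "dpm_base_verbose": 5})
# A child process launched with this environment inherits the settings, e.g.
# subprocess.run(["Rscript", "my_script.R"], env=env, check=True)
```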

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
OK, will see. We had Rmpi working with 1.4, and it has not been updated since 2010, so this kinda stinks. I will keep digging into it; thanks for the help. On Jul 26, 2012, at 7:16 PM, Ralph Castain wrote

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
I think so; sorry if I gave you the impression that Rmpi changed. On Jul 26, 2012, at 7:30 PM, Ralph Castain wrote: > Guess I'm confused - your original note indicated that something had chang

[OMPI users] openmpi 1.6.1 Questions

2012-08-24 Thread Brock Palen
MPI to blow up saying "can't allocate registered memory, fatal, contact your admin", rather than fall back to send/receive and just be slower. Am I reading the release notes correctly? Is there a tunable setting to blow up rather than fall back?

Re: [OMPI users] openmpi 1.6.1 Questions

2012-08-24 Thread Brock Palen
On Aug 24, 2012, at 10:38 AM, Jeff Squyres wrote: > On Aug 24, 2012, at 10:28 AM, Brock Palen wrote: > >> I grabbed the new OMPI 1.6.1 and ran my test that would cause a hang with >> 1.6.0 with low registered memory. From reading the release notes rather >>

Re: [OMPI users] openmpi 1.6.1 Questions

2012-08-26 Thread Brock Palen
Thanks and super cool. On Aug 25, 2012, at 7:06 AM, Jeff Squyres wrote: > On Aug 24, 2012, at 10:45 AM, Brock Palen wrote: > >>> Right now we should be just warning if we can't

[OMPI users] Java MPI Bindings in 1.6.x

2012-11-28 Thread Brock Palen
mmend?

Re: [OMPI users] Java MPI Bindings in 1.6.x

2012-11-28 Thread Brock Palen
Our case is a single user expressing interest, and maybe, longer term, mixing MPI+Hadoop as we explore Hadoop options. I would not go to the effort if it is non-trivial to add it to 1.6.

[OMPI users] Romio and OpenMPI builds

2012-12-03 Thread Brock Palen
built with when I built it? Can I make ROMIO go into 'verbose' mode and have it print what it is setting all its values to? Thanks!

Re: [OMPI users] Romio and OpenMPI builds

2012-12-06 Thread Brock Palen
We have Lustre, local filesystems (ufs), and NFSv3 and NFSv4 clients. So that list should be good for our site. Would this be a good recommendation for us to include in all our MPI builds? On Dec 3, 20

Re: [OMPI users] Romio and OpenMPI builds

2012-12-07 Thread Brock Palen
Thanks! So it looks like most OpenMPI builds out there are running with ROMIO builds that are oblivious to any optimizations for the file systems they are running on. I have added this to our build notes so we get it in next time. Thanks!

[OMPI users] 1.6.2 affinity failures

2012-12-19 Thread Brock Palen
to M processors, where N > M). Double check that you have enough unique processors for all the MPI processes that you are launching on this host. You job will now abort. --

Re: [OMPI users] 1.6.2 affinity failures

2012-12-20 Thread Brock Palen
cores, even if fake, I was looking for a node that had a bad socket or wrong part. On Dec 19, 2012, at 9:08 PM, Ralph Castain wrote: > I'm afraid these are both known problems in the 1.6.2 releas

Re: [OMPI users] 1.6.2 affinity failures

2012-12-20 Thread Brock Palen
w00t :-) Thanks On Dec 20, 2012, at 10:46 AM, Ralph Castain wrote: > Hmmm... I'll see what I can do about the error message. I don't think there > is much in 1.6 I can do, but in 1.7 I c

[OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Brock Palen
ow, "hey, you are always running out of your queue of size X" or "your queue of size Y is never used". We are kinda blind for a lot of our applications :-)

[OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
ConnectX cards? Lastly, looking at the FAQ, it looks like MXM is used by default over openib if available. Should I take that to mean "use MXM if available and supported"? As in, only use openib if that is the only thing you have? Thanks!

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
FAQ page it states that MXM was used in the past only for >128 ranks, but in 1.6 is used for rank counts of any size. I think we will do some testing; we had never even heard of MXM before. On Jan 22, 2013,

Re: [OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Brock Palen
On Jan 22, 2013, at 2:53 PM, Shamis, Pavel wrote: >> >> Switching to SRQ and some guess of queue values selected appears to let the >> code run. >> S,4096,128:S,12288,128:S,65536,12 >> >> Two questions, >> >> This is a ConnectX fabric, should I switch them to XRC queues? And should I >> use
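
The queue specification quoted above is a value for the `btl_openib_receive_queues` parameter: a colon-separated list of queues, each starting with `P` (per-peer), `S` (shared receive queue) or `X` (XRC), followed by the buffer size in bytes and the number of buffers, then optional tuning fields. A Python sketch (hypothetical helper) that splits such a string for inspection:

```python
QUEUE_TYPES = {"P": "per-peer", "S": "shared (SRQ)", "X": "XRC"}


def parse_receive_queues(spec):
    """Split a btl_openib_receive_queues string into one dict per queue."""
    queues = []
    for entry in spec.split(":"):
        kind, size, count, *extra = entry.split(",")
        if kind not in QUEUE_TYPES:
            raise ValueError(f"unknown queue type {kind!r} in {entry!r}")
        queues.append({
            "type": QUEUE_TYPES[kind],
            "size": int(size),    # bytes per receive buffer
            "count": int(count),  # number of buffers
            "extra": [int(x) for x in extra],  # optional tuning fields, left uninterpreted
        })
    return queues


for q in parse_receive_queues("S,4096,128:S,12288,128:S,65536,12"):
    print(q["type"], q["size"], q["count"], q["size"] * q["count"], "bytes")
```

The size times count product is only a rough guide to the buffer memory a queue posts: shared across peers for an `S` queue, per connection for a `P` queue.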

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
, unless "everything" is a workload. We will do some testing; we are setting up a time to talk to our Mellanox SA to try to understand these components better. Note that most of our users run just fine with the standard per-peer queues and default, out-of-the-box OpenMPI. > > -Pasha > >

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
>> You sound like our vendors, "what is your app" > > ;-) I used to be one. > > Ideally OMPI should do the switch between MXM/RC/XRC internally in the > transport layer. Unfortunately, > we don't have such smart selection logic. Hopefully IB vendors will fix some > day. I actually looked i

[OMPI users] IBV_EVENT_QP_ACCESS_ERR

2013-01-23 Thread Brock Palen
problems within the fabric; please contact your system administrator.

Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified

2013-01-24 Thread Brock Palen
On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote: > or do i just need to compile two versions, one with IB and one without? You should not need to; we have OMPI compiled for openib/psm and run that same install on psm, tcp, and verbs (openib) based gear. All the nodes assigned to your job have

[OMPI users] FCA collectives disabled by default

2013-04-02 Thread Brock Palen
" (current value: <0>, data source: default value) MCA coll: parameter "coll_fca_enable_alltoallv" (current value: <0>, data source: default value) MCA coll: parameter "coll_fca_enable_alltoallw" (current value: <0>, data source: default value) Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] FCA collectives disabled by default

2013-04-03 Thread Brock Palen
That would do it. Thanks! Now to make even the normal ones work. On Apr 3, 2013, at 10:31 AM, Ralph Castain wrote: > Looking at the source code, it is because those other collectives are
