Re: [OMPI users] CUDA mpi question

2019-11-28 Thread Justin Luitjens via users
That is not guaranteed to work. There is no streaming concept in the MPI standard. The fundamental issue here is that MPI is only asynchronous on the completion, not the initiation, of a send/recv. It would be nice if the next version of MPI would look to add something like a triggered send or
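A minimal sketch of the workaround this implies, assuming a CUDA-aware Open MPI build; the kernel launch and buffer names are illustrative. Because send initiation is not stream-aware, the stream that produces the buffer must be synchronized before the send is posted:

    #include <mpi.h>
    #include <cuda_runtime.h>

    // Quiesce the producing stream before handing the device buffer to MPI,
    // since MPI has no notion of CUDA streams at initiation time.
    void post_send_when_ready(double *d_buf, int count, int dest,
                              cudaStream_t stream, MPI_Comm comm, MPI_Request *req)
    {
        // produce_data<<<grid, block, 0, stream>>>(d_buf, count);  // illustrative kernel
        cudaStreamSynchronize(stream);   // make sure the data is ready before MPI touches the buffer
        MPI_Isend(d_buf, count, MPI_DOUBLE, dest, /*tag=*/0, comm, req);
        // Completion is still asynchronous: call MPI_Wait(req, MPI_STATUS_IGNORE) later.
    }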

[OMPI users] OpenMPI 3.0.0 Failing To Compile

2018-02-28 Thread Justin Luitjens
I'm trying to build OpenMPI on Ubuntu 16.04.3 and I'm getting an error. Here is how I configure and build: ./configure --with-cuda=$CUDA_HOME --prefix=$MPI_HOME && make clean && make -j && make install Here is the error I see: make[2]: Entering directory '/tmpnfs/jluitjens/libs/src/openmpi

[OMPI users] Crash in libopen-pal.so

2017-06-19 Thread Justin Luitjens
ree. Does anyone have any ideas of what I could try to work around this issue? Thanks, Justin --- This email message is for the sole use of the intended recipient(s) and may contain confidential informa

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
I'd suggest updating the configure/make scripts to look for nvml there and link in the stubs. This way the build is not dependent on the driver being installed and only the toolkit. Thanks, Justin From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin Luitjens Sent: Tu

[OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
a what I might need to change to get around this error? Thanks, Justin --- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, discl

Re: [OMPI users] Strange errors when running mpirun

2016-09-30 Thread Justin Chang
Thank you, using the default $TMPDIR works now. On Fri, Sep 30, 2016 at 7:32 AM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Justin and all, > > the root cause is indeed a bug I fixed in > https://github.com/open-mpi/ompi/pull/2135 > I also had this patch

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
Oh, so setting this in my ~/.profile export TMPDIR=/tmp in fact solves my problem completely! Not sure why this is the case, but thanks! Justin On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet wrote: > Justin, > > i do not see this error on my laptop > > which version

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
I tried that and also deleted everything inside $TMPDIR. The error still persists On Thu, Sep 22, 2016 at 4:21 AM, r...@open-mpi.org wrote: > Try removing the “pmix” entries as well > >> On Sep 22, 2016, at 2:19 AM, Justin Chang wrote: >> >> "mpirun -n 1"

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
at error indicates that you have some cruft > sitting in your tmpdir. You just need to clean it out - look for something > that starts with “openmpi” > > >> On Sep 22, 2016, at 1:45 AM, Justin Chang wrote: >> >> Dear all, >> >> So I upgraded/updated my H

[OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
64-apple-darwin15.6.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin I tested Hello World with both mpicc and mpif90, and they still work despite showing those two error/warning messages. Thanks, Justin _

Re: [OMPI users] Ssh launch code

2016-07-14 Thread Justin Cinkelj
Fork call location: https://github.com/open-mpi/ompi-release/blob/v2.x/orte/mca/plm/rsh/plm_rsh_module.c#L911-921 BR Justin On 07/14/2016 03:12 PM, larkym wrote: Where in the code does the tree based launch via ssh occur in open-mpi? I have read a few articles, but would like to understand it

Re: [OMPI users] CUDA IPC/RDMA Not Working

2016-03-30 Thread Justin Luitjens
We have figured this out. It turns out that the first call to each MPI_Isend/Irecv is staged through the host but subsequent calls are not. Thanks, Justin From: Justin Luitjens Sent: Wednesday, March 30, 2016 9:37 AM To: us...@open-mpi.org Subject: CUDA IPC/RDMA Not Working Hello, I have

[OMPI users] CUDA IPC/RDMA Not Working

2016-03-30 Thread Justin Luitjens
v1.10.2) MCA topo: basic (MCA v2.0.0, API v2.1.0, Component v1.10.2) MCA vprotocol: pessimist (MCA v2.0.0, API v2.0.0, Component v1.10.2) Thanks, Justin

[OMPI users] [PATCH] hooks: disable malloc override inside of Gentoo sandbox

2013-07-01 Thread Justin Bronder
ng for SANDBOX_PID. +*/ + if (getenv("FAKEROOTKEY") != NULL || -getenv("FAKED_MODE") != NULL) { +getenv("FAKED_MODE") != NULL || +getenv("SANDBOX_PID") != NULL ) { return; } -- 1.8.1.5 -- Justin Bronder signature.asc Description: Digital signature

[OMPI users] Cluster hangs/shows error while executing simple MPI program in C

2013-03-05 Thread Justin Joseph
Cluster hangs/shows error while executing simple MPI program in C I am trying to run a simple MPI program (multiple array addition); it runs perfectly on my PC but simply hangs or shows the following error on the cluster. I am using Open MPI and the following command to execute: mpirun -machinefi

Re: [OMPI users] Stream interactions in CUDA

2012-12-13 Thread Justin Luitjens
for me was to avoid self sends/receives at the application level. Thanks, Justin From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart [rvandeva...@nvidia.com] Sent: Thursday, December 13, 2012 6:18 AM To: Open M

[OMPI users] Stream interactions in CUDA

2012-12-12 Thread Justin Luitjens
plain what all of the traffic ping-ponging back and forth between the host and device is? Is this traffic necessary? Thanks, Justin uint64_t scatter_gather( uint128 * input_buffer, uint128 *output_buffer, uint128 *recv_buckets, int* send_sizes, int MAX_RECV_SIZE_PER_PE) { std::vector sre

Re: [OMPI users] Seg fault with PBS Pro 10.4

2011-07-27 Thread Justin Wood
. I don't get any segfaults. -Justin. On 07/26/2011 05:49 PM, Ralph Castain wrote: I don't believe we ever got anywhere with this due to lack of response. If you get some info on what happened to tm_init, please pass it along. Best guess: something changed in a recent PBS Pro release. S

[OMPI users] Seg fault with PBS Pro 10.4

2011-07-26 Thread Wood, Justin Contractor, SAIC
Altair and have them look at why it was failing to do the tm_init. Does anyone have an update to this, and has anyone been able to run successfully using recent versions of PBSPro? I've also contacted our rep at Altair, but he hasn't responded yet. Thanks, Justin. Justin Wood System

[OMPI users] Problem with private variables in modules

2010-03-10 Thread Justin Watson
work within the context of a module as well? I have been getting different results using different compilers. I have tried Lahey and Intel and they both show signs of not handling this properly. I have attached a small test problem that mimics what I am doing in the large code. Jus

[OMPI users] building OpenMPI on Windows XP 64 using Visual Studio 6 and Compaq Visual Fortran

2010-01-28 Thread Justin Watson
t): contrib/platform/win32/CMakeModules/setup_f77.cmake:26 (OMPI_F77_FIND_EXT_SYMBOL_CONVENTION) contrib/platform/win32/CMakeModules/ompi_configure.cmake:1113 (INCLUDE) CMakeLists.txt:87 (INCLUDE) Configuring incomplete, errors occurred! Has anyone had success in building with a similar configuration

Re: [OMPI users] Wrappers should put include path *after* user args

2010-01-19 Thread Justin Bronder
sr/lib64/mpi/mpi-mpich2/usr/include OpenMPI: jbronder@mejis ~ $ which mpicc /usr/lib64/mpi/mpi-openmpi/usr/bin/mpicc jbronder@mejis ~ $ mpicc -showme:compile -I/bleh -I/usr/lib64/mpi/mpi-openmpi/usr/include/openmpi -pthread -I/bleh Thanks, -- Justin Bronder pgpUpu5h4BdhJ.pgp Description: PGP signature

Re: [OMPI users] MPI-Send for entire entire matrix when allocating memory dynamically

2009-10-31 Thread Justin Luitjens
original way to create the matrices, one can use >> MPI_Create_type_struct to create an MPI datatype ( >> http://web.mit.edu/course/13/13.715/OldFiles/build/mpich2-1.0.6p1/www/www3/MPI_Type_create_struct.html >> ) >> using MPI_BOTTOM as the original displacement. >> >
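A sketch of the datatype approach described in the quoted text, assuming an N-row matrix of doubles with M elements per separately allocated row; absolute addresses from MPI_Get_address combined with MPI_BOTTOM let one send cover all rows (names are illustrative):

    #include <mpi.h>
    #include <vector>

    // Build a struct datatype whose displacements are the absolute addresses
    // of the row pointers, so the whole matrix travels as one message.
    MPI_Datatype make_matrix_type(double **A, int N, int M)
    {
        std::vector<int>          blocklens(N, M);
        std::vector<MPI_Aint>     displs(N);
        std::vector<MPI_Datatype> types(N, MPI_DOUBLE);
        for (int i = 0; i < N; ++i)
            MPI_Get_address(A[i], &displs[i]);   // absolute address of row i
        MPI_Datatype mat;
        MPI_Type_create_struct(N, blocklens.data(), displs.data(), types.data(), &mat);
        MPI_Type_commit(&mat);
        return mat;
    }

    // Usage: MPI_Send(MPI_BOTTOM, 1, mat, dest, tag, comm);
    // The receiver builds a matching type from its own row addresses.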

Re: [OMPI users] MPI-Send for entire entire matrix when allocating memory dynamically

2009-10-29 Thread Justin Luitjens
Why not do something like this: double **A=new double*[N]; double *A_data = new double[N*N]; for(int i=0;i<N;i++) ... wrote: > Hi >thanks for the quick response. Yes, that is what I meant. I thought > there was no other way around what I am doing but it is always good to ask an > expert rather than assum
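The truncated snippet is the contiguous-allocation idiom; a sketch of the full pattern, with the row pointers aliasing one N*N block so a single send covers the whole matrix (illustrative names):

    #include <mpi.h>

    // One contiguous block plus an array of row pointers into it.
    double **alloc_matrix(int N, double **storage_out)
    {
        double *data = new double[N * N];
        double **A   = new double*[N];
        for (int i = 0; i < N; ++i)
            A[i] = data + i * N;
        *storage_out = data;   // keep the block pointer for MPI and for delete[]
        return A;
    }

    // Because the storage is contiguous, the whole matrix is one call:
    //   MPI_Send(data, N * N, MPI_DOUBLE, dest, tag, comm);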

Re: [OMPI users] Segfault when using valgrind

2009-07-09 Thread Justin Luitjens
. Thanks, Justin On Thu, Jul 9, 2009 at 5:16 AM, Jeff Squyres wrote: > On Jul 7, 2009, at 11:47 AM, Justin wrote: > > (Sorry if this is posted twice, I sent the same email yesterday but it >> never appeared on the list). >> >> > Sorry for the delay in replying. FWI

[OMPI users] Segfault when using valgrind

2009-07-07 Thread Justin
==by 0x834F418: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:117) ==22736==by 0x4089AE: main (sus.cc:629) Are these problems with openmpi and are there any known workarounds? Thanks, Justin

[OMPI users] Segfault when using valgrind

2009-07-06 Thread Justin Luitjens
ontroller.cc:243) ==22736==by 0x834F418: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:117) ==22736==by 0x4089AE: main (sus.cc:629) Are these problems with openmpi and are there any known workarounds? Thanks, Justin

Re: [OMPI users] MPI_Test without deallocation

2009-03-25 Thread Justin
obe return that there is no message waiting to be received? The message has already been received by the MPI_Irecv. It's the MPI_Request object of the MPI_Irecv call that needs to be probed, but MPI_Test has the side effect of also deallocating the MPI_Request object. Cheers, Shaun Justin w
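One standard alternative, not visible in the truncated exchange, is MPI_Request_get_status: it tests a request for completion without freeing it, so the tag can be inspected and the request handed off and waited on later. A sketch:

    #include <mpi.h>

    // Returns true if the request has completed. Unlike MPI_Test, the request
    // object is left intact and can still be passed to MPI_Wait/MPI_Test later.
    bool peek_request(MPI_Request req, int *tag_out)
    {
        int flag = 0;
        MPI_Status status;
        MPI_Request_get_status(req, &flag, &status);
        if (flag)
            *tag_out = status.MPI_TAG;
        return flag != 0;
    }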

Re: [OMPI users] MPI_Test without deallocation

2009-03-25 Thread Justin
Have you tried MPI_Probe? Justin Shaun Jackman wrote: Is there a function similar to MPI_Test that doesn't deallocate the MPI_Request object? I would like to test if a message has been received (MPI_Irecv), check its tag, and dispatch the MPI_Request to another function based on tha

Re: [OMPI users] Run-time problem

2009-03-16 Thread justin oppenheim
e delay in replying; this week unexpectedly turned exceptionally hectic for several of us... On Mar 9, 2009, at 2:53 PM, justin oppenheim wrote: > Yes. As I indicated earlier, I did use these options to compile my program > > MPI_CXX=/programs/openmpi/bin/mpicxx > MPI_CC=/programs/openmpi

Re: [OMPI users] Run-time problem

2009-03-09 Thread justin oppenheim
program with the provided mpicc (or mpiCC, mpif90, etc. - as appropriate) wrapper compiler? The wrapper compilers contain all the required library definitions to make the application work. Compiling without the wrapper compilers is a very bad idea... Ralph On Mar 6, 2009, at 11:02 AM, justin

Re: [OMPI users] Run-time problem

2009-03-06 Thread justin oppenheim
/bin/mpicc MPI_INCLUDE=/programs/openmpi/include/ MPI_LIB=mpi MPI_LIBDIR=/programs/openmpi/lib/ MPI_LINKERFORPROGRAMS=/programs/openmpi/bin/mpicxx Any clue? The directory /programs is NFS mounted on the nodes. Many thanks again, JO --- On Thu, 3/5/09, justin oppenheim wrote: From

[OMPI users] Run-time problem

2009-03-05 Thread justin oppenheim
Hi: When I execute something like mpirun -machinefile machinefile my_mpi_executable I get something like this: my_mpi_executable: symbol lookup error: remote_openmpi/lib/libmpi_cxx.so.0: undefined symbol: ompi_registered_datareps where both my_mpi_executable and remote_openmpi are installed o

Re: [OMPI users] valgrind problems

2009-02-26 Thread Justin
Also the stable version of openmpi on Debian is 1.2.7rc2. Are there any known issues with this version and valgrind? Thanks, Justin Justin wrote: Are there any tricks to getting it to work? When we run with valgrind we get segfaults, valgrind reports errors in different MPI functions for

Re: [OMPI users] valgrind problems

2009-02-26 Thread Justin
double&) (SimulationController.cc:352) ==3629==by 0x89A8568: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:126) ==3629==by 0x408B9F: main (sus.cc:622) This is then followed by a segfault. Justin Jeff Squyres wrote: On Feb 26, 2009, at 7:03 PM, Justin wrote: I'm trying t

[OMPI users] valgrind problems

2009-02-26 Thread Justin
e=valgrind.%p executable. Are valgrind and openmpi compatible? Are there any special tricks to getting them to work together? Thanks, Justin

Re: [OMPI users] MPI_Send over 2 GB

2009-02-18 Thread Justin
My guess would be that your count argument is overflowing. Is the count a signed 32-bit integer? If so it will overflow around 2GB. Try outputting the size that you are sending and see if you get a large negative number. Justin Vittorio wrote: Hi! I'm doing a test to measure the tra
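A sketch of one common workaround for the int count limit, splitting the transfer into chunks whose counts stay far below INT_MAX (illustrative; a large contiguous derived datatype is another option):

    #include <mpi.h>
    #include <cstddef>
    #include <algorithm>

    // Send 'total' doubles even when total exceeds INT_MAX by chunking the count.
    void send_large(const double *buf, std::size_t total, int dest, int tag, MPI_Comm comm)
    {
        const std::size_t chunk = std::size_t(1) << 27;   // ~1 GB of doubles per message
        for (std::size_t off = 0; off < total; ) {
            std::size_t n = std::min(chunk, total - off);
            MPI_Send(buf + off, static_cast<int>(n), MPI_DOUBLE, dest, tag, comm);
            off += n;
        }
    }

    // The receiver posts matching MPI_Recv calls with the same chunking.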

Re: [OMPI users] Deadlock on large numbers of processors

2009-01-12 Thread Justin
t them to update it but it would be a lot easier to request an actual release. What is the current schedule for the 1.3 release? Justin Jeff Squyres wrote: Justin -- Could you actually give your code a whirl with 1.3rc3 to ensure that it fixes the problem for you? http://www.op

Re: [OMPI users] Deadlock on large numbers of processors

2009-01-12 Thread Justin
Hi, has this deadlock been fixed in the 1.3 source yet? Thanks, Justin Jeff Squyres wrote: On Dec 11, 2008, at 5:30 PM, Justin wrote: The more I look at this bug the more I'm convinced it is with openMPI and not our code. Here is why: Our code generates a communication/exec

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-11 Thread Justin
dlock? Thanks, Justin Jeff Squyres wrote: George -- Is this the same issue that you're working on? (we have a "blocker" bug for v1.3 about deadlock at heavy messaging volume -- on Tuesday, it looked like a bug in our freelist...) On Dec 9, 2008, at 10:28 AM, Justin wrote: I have

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-09 Thread Justin
might alleviate these deadlocks I would be grateful. Thanks, Justin Rolf Vandevaart wrote: The current version of Open MPI installed on ranger is 1.3a1r19685 which is from early October. This version has a fix for ticket #1378. Ticket #1449 is not an issue in this case because each node

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
dlock reproducible. In addition we might be able to lower the number of processors. Right now determining which processor deadlocks when we are using 8K cores and each processor has hundreds of messages sent out would be quite difficult. Thanks for your suggestions, Justin Brock Palen

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
will turn off buffering? Thanks, Justin Brock Palen wrote: Whenever this happens we found the code to have a deadlock. Users never saw it until they crossed the eager->rendezvous threshold. Yes you can disable shared memory with: mpirun --mca btl ^sm Or you can try increasing the eager li
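A sketch of the kind of latent deadlock being described: each rank calls a blocking send before its receive, which appears to work while the message fits under the eager limit (the send is buffered) but blocks once the rendezvous protocol requires a matching receive first (names and counts are illustrative):

    #include <mpi.h>
    #include <vector>

    // Both ranks send to their partner before receiving. Small messages may be
    // buffered eagerly and complete; large ones wait for a receive that is never
    // posted first, so both ranks block inside MPI_Send.
    void exchange_unsafe(int rank, int partner, int count, MPI_Comm comm)
    {
        std::vector<double> sendbuf(count, rank), recvbuf(count);
        MPI_Send(sendbuf.data(), count, MPI_DOUBLE, partner, 0, comm);   // may deadlock
        MPI_Recv(recvbuf.data(), count, MPI_DOUBLE, partner, 0, comm, MPI_STATUS_IGNORE);
    }

    // Pairing the operations avoids this, e.g. MPI_Sendrecv or
    // MPI_Isend/MPI_Irecv followed by MPI_Waitall.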

[OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
ompi_request_default_wait_some () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0 #4 0x2b2ded109e34 in PMPI_Waitsome () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0 Thanks, Justin

Re: [OMPI users] Open-MPI 1.2 and GM

2007-03-27 Thread Justin Bronder
Thanks for the response, I was hoping I'd just messed up something simple. Your advice took care of my issues. On 27/03/07 14:15 -0400, George Bosilca wrote: > Justin, > > There is no GM MTL. Therefore, the first mpirun allow the use of > every available BTL, while the secon

[OMPI users] Open-MPI 1.2 and GM

2007-03-27 Thread Justin Bronder
owing fails: /usr/local/ompi-gnu/bin/mpirun -np 4 -mca btl gm --host node84,node83 ./xhpl I've attached gziped files as suggested on the "Getting Help" section of the website and the output from the failed mpirun. Both nodes are known good Myrinet nodes, using FMA to map. T

Re: [OMPI users] how do i link to .la library files?

2006-10-27 Thread Justin Bronder
If you just add this to your .bashrc you should be fine. The other option, assuming root access, is to just add the lib directory to /etc/ld.so.conf and rerun ldconfig on all machines. This will have the same effect, albeit for all users. -Justin. On 10/27/06, shane kennedy wrote: thank you for

Re: [OMPI users] problem abut openmpi running

2006-10-19 Thread Justin Bronder
On a number of my Linux machines, /usr/local/lib is not searched by ldconfig, and hence, is not going to be found by gcc. You can fix this by adding /usr/local/lib to /etc/ld.so.conf and running ldconfig (add the -v flag if you want to see the output). -Justin. On 10/19/06, Durga Choudhury

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-08 Thread Justin Bronder
'll begin the build with the standard gcc compilers that are included with OS X. This is powerpc-apple-darwin8-gcc-4.0.1. Thanks, Justin. Jeff Squyres (jsquyres) wrote: > Justin -- > > Can we eliminate some variables so that we can figure out where the > error is originating?

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
yrinet (GM)? If so, I'd love to hear the configure arguments and various versions you are using. Bonus points if you are using the IBM XL compilers. Thanks, Justin. On 7/6/06, Justin Bronder wrote: Yes, that output was actually cut and pasted from an OS X run. I'm about to test a

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
Yes, that output was actually cut and pasted from an OS X run. I'm about to test against 1.0.3a1r10670. Justin. On 7/6/06, Galen M. Shipman wrote: Justin, Is the OS X run showing the same residual failure? - Galen On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote: Disregard the fa

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
Disregard the failure on Linux, a rebuild from scratch of HPL and OpenMPI seems to have resolved the issue. At least I'm not getting the errors during the residual checks. However, this is persisting under OS X. Thanks, Justin. On 7/6/06, Justin Bronder wrote: For OS X: /usr/local/om

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
For OS X: /usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl For Linux: ARCH=ompi-gnu-1.1.1a /usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path /usr/local/$ARCH/bin ./xhpl Thanks for the speedy response, Justin. On 7/6/06, Galen M. Shipman wrote: Hey Justin, Please provide us your mca

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
As far as the nightly builds go, I'm still seeing what I believe to be this problem in both r10670 and r10652. This is happening with both Linux and OS X. Below are the systems and ompi_info for the newest revision 10670. As an example of the error, when running HPL with Myrinet I get the follo

Re: [OMPI users] OpenMpi 1.1 and Torque 2.1.1

2006-06-30 Thread Justin Bronder
know. Thanks, Justin Bronder. On 6/30/06, Jeff Squyres (jsquyres) wrote: There was a bug in early Torque 2.1.x versions (I'm afraid I don't remember which one) that -- I think -- had something to do with a faulty poll() implementation. Whatever the problem was, it caused all TM la

[OMPI users] OpenMpi 1.1 and Torque 2.1.1

2006-06-29 Thread Justin Bronder
o=-13 node96:/usr/src/openmpi-1.1 jbronder$ My thanks for any help in advance, Justin Bronder. ompi_info.log.gz Description: GNU Zip compressed data

Re: [OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-31 Thread Justin Bronder
./../../opal/.libs/libopal.so -ldl -lm -lutil -lnsl --rpath /usr/local/ompi-xl/lib -lpthread ld: warning: cannot find entry symbol _start; defaulting to 10013ed8 Of course, I've been told that directly linking with ld isn't such a great idea in the first place. Ideas? Thanks, Justin.

Re: [OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-31 Thread Justin Bronder
On 5/30/06, Brian Barrett wrote: On May 28, 2006, at 8:48 AM, Justin Bronder wrote: > Brian Barrett wrote: >> On May 27, 2006, at 10:01 AM, Justin Bronder wrote: >> >> >>> I've attached the required logs. Essentially the problem seems to >>>

[OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-28 Thread Justin Bronder
Brian Barrett wrote: > On May 27, 2006, at 10:01 AM, Justin Bronder wrote: > > >> I've attached the required logs. Essentially the problem seems to >> be that the XL Compilers fail to recognize "__asm__ __volatile__" in >> opal/include/sys/powerpc

[OMPI users] OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-27 Thread Justin Bronder
o 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9) Thanks, -- Justin Bronder University of Maine, Orono Advanced Computing Research Lab 20 Godfrey Dr Orono, ME 04473 www.clusters.umaine.edu Mathematics Department 425 Neville Hall Orono, ME 04469 ompi-build.tar.gz Description: application/gzip