[OMPI users] bug in CUDA support for dual-processor systems?
Hi, I wrote a simple program to check whether Open MPI can really handle CUDA device pointers as promised in the FAQ, and how efficiently it does so. The program (see below) breaks if MPI communication is performed between two devices that are on the same node but attached to different IOHs in a dual-processor Intel machine. Note that cudaMemcpy works for such device pairs, although not as efficiently as for devices on the same IOH with GPUDirect enabled.

Here's the output from my program:

===
> mpirun -n 6 ./a.out
Init
Init
Init
Init
Init
Init
rank: 1, size: 6
rank: 2, size: 6
rank: 3, size: 6
rank: 4, size: 6
rank: 5, size: 6
rank: 0, size: 6
device 3 is set
Process 3 is on typhoon1
Using regular memory
device 0 is set
Process 0 is on typhoon1
Using regular memory
device 4 is set
Process 4 is on typhoon1
Using regular memory
device 1 is set
Process 1 is on typhoon1
Using regular memory
device 5 is set
Process 5 is on typhoon1
Using regular memory
device 2 is set
Process 2 is on typhoon1
Using regular memory
^C^[[A^C
zkoza@typhoon1:~/multigpu$
zkoza@typhoon1:~/multigpu$ vim cudamussings.c
zkoza@typhoon1:~/multigpu$ mpicc cudamussings.c -lcuda -lcudart -L/usr/local/cuda/lib64 -I/usr/local/cuda/include
zkoza@typhoon1:~/multigpu$ vim cudamussings.c
zkoza@typhoon1:~/multigpu$ mpicc cudamussings.c -lcuda -lcudart -L/usr/local/cuda/lib64 -I/usr/local/cuda/include
zkoza@typhoon1:~/multigpu$ mpirun -n 6 ./a.out
Process 1 of 6 is on typhoon1
Process 2 of 6 is on typhoon1
Process 0 of 6 is on typhoon1
Process 4 of 6 is on typhoon1
Process 5 of 6 is on typhoon1
Process 3 of 6 is on typhoon1
device 2 is set
device 1 is set
device 0 is set
Using regular memory
device 5 is set
device 3 is set
device 4 is set
Host->device bandwidth for processor 1: 1587.993499 MB/sec
Host->device bandwidth for processor 2: 1570.275316 MB/sec
Host->device bandwidth for processor 3: 1569.890751 MB/sec
Host->device bandwidth for processor 5: 1483.637702 MB/sec
Host->device bandwidth for processor 0: 1480.888029 MB/sec
Host->device bandwidth for processor 4: 1476.241371 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Host [1] bandwidth: 3338.57 MB/sec
MPI_Send/MPI_Receive, Device[0] -> Host [1] bandwidth: 420.85 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Device[1] bandwidth: 362.13 MB/sec
MPI_Send/MPI_Receive, Device[0] -> Device[1] bandwidth: 6552.35 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Host [2] bandwidth: 3238.88 MB/sec
MPI_Send/MPI_Receive, Device[0] -> Host [2] bandwidth: 418.18 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Device[2] bandwidth: 362.06 MB/sec
MPI_Send/MPI_Receive, Device[0] -> Device[2] bandwidth: 5022.82 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Host [3] bandwidth: 3295.32 MB/sec
MPI_Send/MPI_Receive, Device[0] -> Host [3] bandwidth: 418.90 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Device[3] bandwidth: 359.16 MB/sec
MPI_Send/MPI_Receive, Device[0] -> Device[3] bandwidth: 5019.89 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Host [4] bandwidth: 4619.55 MB/sec
MPI_Send/MPI_Receive, Device[0] -> Host [4] bandwidth: 419.24 MB/sec
MPI_Send/MPI_Receive, Host [0] -> Device[4] bandwidth: 364.52 MB/sec
--
The call to cuIpcOpenMemHandle failed. This is an unrecoverable error
and will cause the program to abort.
  cuIpcOpenMemHandle return value:   205
  address: 0x20020
Check the cuda.h file for what the return value means. Perhaps a reboot
of the node will clear the problem.
--
[typhoon1:06098] Failed to register remote memory, rc=-1
[typhoon1:06098] [[33788,1],4] ORTE_ERROR_LOG: Error in file pml_ob1_recvreq.c at line 465

Comment: my machine has two six-core Intel processors with HT on, yielding 24 virtual processors, and 6 Tesla C2070s. The devices are grouped in two groups, one with 4 and the other with 2 devices. Devices in the same group can talk to each other via GPUDirect at approx. 6 GB/s; devices in different groups can use cudaMemcpy and UVA at somewhat lower transfer rates.

My Open MPI is openmpi-1.9a1r26904, compiled from sources with:

./configure -prefix=/home/zkoza/openmpi.1.9.cuda --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/lib

> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Thu_Apr__5_00:24:31_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221

gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
Ubuntu 12.04 64-bit
Nvidia Driver Version: 295.41

The program was compiled with:

> mpicc prog.c -lcuda -lcudart -L/usr/local/cuda/lib64 -I/usr/local/cuda/include

SOURCE CODE:

#include
#include
#include
#include
#include
#include

#define NREPEAT 20
#define NBYTES 1
#define CALL(x)\
{
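Since the listing above is cut off, here is a minimal sketch (not the original program) of the kind of CUDA-aware MPI_Send/MPI_Recv device-to-device bandwidth test described in this post. It assumes a CUDA-aware Open MPI build, one GPU per rank, and illustrative values for the message size and repeat count:

/* Hypothetical sketch: times NREPEAT sends of a device buffer from rank 0
 * to rank 1, passing the CUDA device pointer directly to MPI_Send/MPI_Recv
 * (requires a CUDA-aware Open MPI).  Run with at least 2 ranks. */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

#define NBYTES  (1 << 22)   /* 4 MiB per message (assumed value) */
#define NREPEAT 20

int main(int argc, char **argv)
{
    int rank, size;
    void *dbuf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    cudaSetDevice(rank);                 /* assumes one GPU per rank */
    cudaMalloc(&dbuf, NBYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < NREPEAT; i++) {
        if (rank == 0)
            MPI_Send(dbuf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(dbuf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 1)
        printf("Device[0] -> Device[1] bandwidth: %.2f MB/sec\n",
               (double)NBYTES * NREPEAT / (t1 - t0) / 1.0e6);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}

Such a sketch would be compiled the same way as the program above, e.g. mpicc sketch.c -lcudart -L/usr/local/cuda/lib64 -I/usr/local/cuda/include, and run with mpirun -n 2 ./a.out.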
Re: [OMPI users] bug in CUDA support for dual-processor systems?
Thanks for the quick reply.

I do not know much about low-level CUDA and IPC, but there is no problem using the high-level CUDA API to determine whether device A can talk to device B via GPUDirect (cudaDeviceCanAccessPeer). Then, for such connections, one only needs to call cudaDeviceEnablePeerAccess and essentially "sit back and laugh": given the correct current device and stream, functions like cudaMemcpyPeer work irrespective of whether GPUDirect is on or off for a given pair of devices, the only difference being the speed. So I hope it should be possible to implement device-IOH-IOH-device communication using low-level CUDA. Such functionality would be an important step in the "CPU-GPU high-performance war" :-), as 8-GPU systems with fast MPI links bring a new meaning to a "GPU node" in GPU clusters... (A minimal sketch of this peer-access pattern is appended after this message.)

Here is the output of my test program, which was aimed at determining (a) the aggregate, best-case transfer rate between 6 GPUs running in parallel and (b) whether devices on different IOHs can talk to each other:

3 [GB] in 78.6952 [ms] = 38.1218 GB/s (aggregate)
sending 6 bytes from device 0:
0 -> 0: 11.3454 [ms] 52.8848 GB/s
0 -> 1: 90.3628 [ms] 6.6399 GB/s
0 -> 2: 113.396 [ms] 5.29117 GB/s
0 -> 3: 113.415 [ms] 5.29032 GB/s
0 -> 4: 170.307 [ms] 3.52305 GB/s
0 -> 5: 169.613 [ms] 3.53747 GB/s

This shows that even devices on different IOHs, like 0 and 4, can talk to each other at the fantastic speed of 3.5 GB/s, and it would be a pity if Open MPI did not use this opportunity.

I also have 2 questions:

a) I noticed that on my 6-GPU, 2-CPU platform the initialization of CUDA 4.2 takes a long time, approx. 10 seconds. Do you think I should report this as a bug to NVIDIA?

b) Is there any info on running Open MPI + CUDA? For example, what are the dependencies of transfer rates and latencies on transfer size? A dedicated www page, blog or whatever? How can I know when the current problem has been solved?

Many thanks for making CUDA available in Open MPI.

Regards
Z Koza

On 31.07.2012 19:39, Rolf vandeVaart wrote:
> The current implementation does assume that the GPUs are on the same IOH and therefore can use the IPC features of the CUDA library for communication. One of the initial motivations for this was that to be able to detect whether GPUs can talk to one another, the CUDA library has to be initialized and the GPUs have to be selected by each rank. It is at that point that we can determine whether the IPC will work between the GPUs. However, this means that the GPUs need to be selected by each rank prior to the call to MPI_Init, as that is where we determine whether IPC is possible, and we were trying to avoid that requirement.
> I will submit a ticket against this and see if we can improve this.
> Rolf
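For reference, a hedged sketch of the high-level peer-access pattern mentioned above: query cudaDeviceCanAccessPeer for every pair of devices, enable peer access where it is supported, and copy with cudaMemcpyPeer, which also works (more slowly, staged through the host) when the two devices hang off different IOHs. The buffer size and the 16-device cap are arbitrary assumptions:

/* Sketch of the high-level peer-access check/enable/copy pattern. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 16) ndev = 16;                /* keep the static table small */

    const size_t nbytes = 1 << 20;           /* 1 MiB test buffer (assumed) */
    void *buf[16];

    for (int d = 0; d < ndev; d++) {
        cudaSetDevice(d);
        cudaMalloc(&buf[d], nbytes);
    }

    for (int src = 0; src < ndev; src++) {
        for (int dst = 0; dst < ndev; dst++) {
            if (src == dst) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, src, dst);
            if (can) {                       /* same IOH: GPUDirect P2P */
                cudaSetDevice(src);
                cudaDeviceEnablePeerAccess(dst, 0);
            }
            /* Works in both cases; falls back to staging through the host
             * when peer access is not available. */
            cudaMemcpyPeer(buf[dst], dst, buf[src], src, nbytes);
            printf("%d -> %d: peer access %s\n", src, dst,
                   can ? "enabled" : "not available");
        }
    }

    for (int d = 0; d < ndev; d++) {
        cudaSetDevice(d);
        cudaFree(buf[d]);
    }
    return 0;
}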
Re: [OMPI users] 1D and 2D arrays, allocating memory with malloc(), and an MPI_Send/MPI_Recv problem
Look at this declaration:

int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

Here "count" is the number of elements (not bytes!) in the send buffer (a nonnegative integer). Your "count" was defined as

count = rows*matrix_size*sizeof(double);

and seems to be erroneous; the variable "count" must not depend on the size of the matrix element!

Z Koza

On Aug 7, 2012, at 10:33, Paweł Jaromin wrote:
> Hello all
>
> Sorry, maybe this is a stupid question, but I have a big problem with malloc() and matrix arrays. I want to make a program that does a very simple thing like matrixA * matrixB = matrixC. Because I need matrices larger than 100x100 (5000x5000), I have to use malloc() for the memory allocation.
>
> First I tried the typical form for dynamically allocating an NxM array of type T:
>
> T **a = malloc(sizeof *a * N);
> if (a) {
>     for (i = 0; i < N; i++) {
>         a[i] = malloc(sizeof *a[i] * M);
>     }
> }
> // the arrays are created before the split to nodes
>
> There was no problem creating and filling the arrays, but the problems started when I sent and received them. Of course, before sending I calculated "count" for MPI_Send. To be sure that the count for MPI_Send and MPI_Recv is the same, I also send "count".
>
> count = rows*matrix_size*sizeof (double); // part of matrix
> MPI_Send(&count, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
> MPI_Send(&matrixA[offset][0], count, MPI_DOUBLE, dest, mtype, MPI_COMM_WORLD);
>
> On the worker side the code looks like:
>
> MPI_Recv(&countA, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
> MPI_Recv(&matrixA[0][0], countA, MPI_DOUBLE, source, mtype, MPI_COMM_WORLD, &status);
>
> The error looks like:
>
> [pawcioj-VirtualBox:01700] *** Process received signal ***
> [pawcioj-VirtualBox:01700] Signal: Segmentation fault (11)
> [pawcioj-VirtualBox:01700] Signal code: Address not mapped (1)
> [pawcioj-VirtualBox:01700] Failing at address: 0x88fa000
> [pawcioj-VirtualBox:01700] [ 0] [0xc2740c]
> [pawcioj-VirtualBox:01700] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x906c) [0x17606c]
> [pawcioj-VirtualBox:01700] [ 2] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x6a1b) [0x173a1b]
> [pawcioj-VirtualBox:01700] [ 3] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(+0x3ae6) [0x7b7ae6]
> [pawcioj-VirtualBox:01700] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x81) [0x406fa1]
> [pawcioj-VirtualBox:01700] [ 5] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x48e5) [0x1718e5]
> [pawcioj-VirtualBox:01700] [ 6] /usr/lib/libmpi.so.0(MPI_Recv+0x165) [0x1ef9d5]
> [pawcioj-VirtualBox:01700] [ 7] macierz_V.02(main+0x927) [0x8049870]
> [pawcioj-VirtualBox:01700] [ 8] /lib/libc.so.6(__libc_start_main+0xe7) [0xddfce7]
> [pawcioj-VirtualBox:01700] [ 9] macierz_V.02() [0x8048b71]
> [pawcioj-VirtualBox:01700] *** End of error message ***
> --
> mpirun noticed that process rank 1 with PID 1700 on node pawcioj-VirtualBox exited on signal 11 (Segmentation fault).
>
> Because I got no result, I also tried to do it with a 1D array, but the problem seems similar. Probably I am doing something wrong, so I would like to ask you for advice on how to do this properly, or maybe a link to a useful tutorial. I have spent two weeks trying to find out how to do it, but unfortunately without result :(.
>
> --
> regards
> Paweł Jaromin
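A related pitfall, not mentioned in the reply above: with a separate malloc() per row the rows are generally not contiguous in memory, so a single MPI_Send of several rows through &matrixA[offset][0] can read past the end of that row's buffer even when the count is correct. A minimal sketch (hypothetical names, not the poster's program) of the usual fix: allocate each matrix as one contiguous block and pass the count in elements:

/* Minimal sketch (hypothetical names): one contiguous allocation per matrix,
 * with a row-pointer table on top, so that a run of consecutive rows can be
 * sent with a single MPI_Send whose count is given in ELEMENTS, not bytes. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

double **alloc_matrix(int n)                     /* n x n matrix of double */
{
    double *data = malloc((size_t)n * n * sizeof *data);  /* one block */
    double **rows = malloc((size_t)n * sizeof *rows);
    for (int i = 0; i < n; i++)
        rows[i] = data + (size_t)i * n;
    return rows;
}

void send_rows(double **A, int offset, int rows, int n, int dest, int tag)
{
    int count = rows * n;                        /* number of doubles */
    MPI_Send(&count, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
    MPI_Send(&A[offset][0], count, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
}

void recv_rows(double **A, int source, int tag)
{
    int count;
    MPI_Recv(&count, 1, MPI_INT, source, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* works only because the receiving matrix is contiguous as well */
    MPI_Recv(&A[0][0], count, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

int main(int argc, char **argv)
{
    int rank, n = 8;                             /* tiny size, demo only */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double **A = alloc_matrix(n);
    if (rank == 0) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                A[i][j] = i + 0.01 * j;
        send_rows(A, 2, 3, n, 1, 0);             /* rows 2..4 go to rank 1 */
    } else if (rank == 1) {
        recv_rows(A, 0, 0);
        printf("rank 1 received A[0][0] = %g\n", A[0][0]);  /* prints 2 */
    }
    MPI_Finalize();
    return 0;
}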
[OMPI users] RDMA GPUDirect CUDA...
Hi,

I've just found this information on NVIDIA's plans regarding enhanced support for MPI in their CUDA toolkit:

http://developer.nvidia.com/cuda/nvidia-gpudirect

The idea that two GPUs can talk to each other via network cards without the CPU as a middleman looks very promising. This technology is supposed to be revealed and released in September.

My questions:

1. Will Open MPI include RDMA support in its CUDA interface?
2. Any idea how much this technology can reduce the CUDA Send/Recv latency?
3. Any idea whether this technology will be available for Fermi-class Tesla devices or only for Keplers?

Regards,
Z Koza
[OMPI users] what is a "node"?
Hi,

consider this specification:

"Curie fat consists in 360 nodes which contains 4 eight cores CPU Nehalem-EX clocked at 2.27 GHz, let 32 cores / node and 11520 cores for the full fat configuration"

Suppose I would like to run some performance tests on just a single processor rather than all 4 of them. Is there a way to do this? I'm afraid that specifying that I need 1 cluster node with 8 MPI processes will result in the OS distributing these 8 processes among the 4 processors forming the node, and this is not what I'm after.

Z Koza
Re: [OMPI users] what is a "node"?
Thanks a lot!

Z Koza

2012/8/30 Gus Correa
> Hi Zbigniew
>
> Besides the OpenMPI processor affinity capability that Jeff mentioned:
>
> If your Curie cluster has a resource manager [Torque, SGE, etc.], your job submission script to the resource manager / queue system should specifically request a single node for the test that you have in mind.
>
> For instance, on Torque/PBS, this would be done by adding this directive to the top of the job script:
>
> #PBS -l nodes=1:ppn=8
> ...
> mpiexec -np 8 ...
>
> meaning that you want the 8 processors [i.e. cores] to be in a single node.
>
> On top of this, you need to add the appropriate process binding keywords to the mpiexec command line, as Jeff suggested. 'man mpiexec' will tell you a lot about the OpenMPI process binding capability, especially in the 1.6 and 1.4 series.
>
> In the best of worlds, your resource manager also has the ability to assign a group of cores exclusively to each of the jobs that may be sharing the node. Say, job1 requests 4 cores and gets cores 0-3 and cannot use any other cores, job2 requests 8 cores and gets cores 4-11 and cannot use any other cores, and so on.
>
> However, not all resource managers / queue systems are built this way [particularly the older versions], and they may let the various job processes drift across all cores in the node.
>
> If the resource manager is old and doesn't have that hardware locality capability, and if you don't want your performance test to risk being polluted by other jobs running on the same node, perhaps sharing the same cores with your job, then you can request all 32 cores in the node for your job but use only 8 of them to run your MPI program. It is wasteful, but it may be the only way to go. For instance, on Torque:
>
> #PBS -l nodes=1:ppn=32
> ...
> mpiexec -np 8 ...
>
> Again, add the OpenMPI process binding keywords to the mpiexec command line to ensure the use of a fixed group of 8 cores.
>
> With SGE and Slurm the syntax is different from the above, but I would guess that there is an equivalent setup.
>
> I hope this helps,
> Gus Correa
>
>
> On 08/30/2012 08:07 AM, Jeff Squyres wrote:
>> In the OMPI v1.6 series, you can use the processor affinity options. And you can use --report-bindings to show exactly where processes were bound. For example:
>>
>> % mpirun -np 4 --bind-to-core --report-bindings -bycore uptime
>> [svbu-mpi056:18904] MCW rank 0 bound to socket 0[core 0]: [B . . .][. . . .]
>> [svbu-mpi056:18904] MCW rank 1 bound to socket 0[core 1]: [. B . .][. . . .]
>> [svbu-mpi056:18904] MCW rank 2 bound to socket 0[core 2]: [. . B .][. . . .]
>> [svbu-mpi056:18904] MCW rank 3 bound to socket 0[core 3]: [. . . B][. . . .]
>> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
>> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
>> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
>> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
>> %
>>
>> I bound each process to a single core, and mapped them on a round-robin basis by core. Hence, all 4 processes ended up on their own cores on a single processor socket.
>>
>> The --report-bindings output shows that this particular machine has 2 sockets, each with 4 cores.
Re: [OMPI users] what is a "node"?
Hi,

I have one more question. I wanted to experiment with the processor affinity command-line options on my Ubuntu PC. When I use an Open MPI compiled from sources a few weeks ago, mpirun returns error messages. However, the "official" Open MPI installation on the same machine causes no problems. Does this mean there is a bug in the current Open MPI and I should report it?

=== 1. OpenMPI version:

mpirun -V
mpirun (Open MPI) 1.9a1r26880

Report bugs to http://www.open-mpi.org/community/help/

=== 2. mpirun "offending" command and error report:

zkoza@zbyszek:~$ mpirun -np 2 --bind-to-core -bycore --report-bindings uptime
--
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

Node: zbyszek

This is a warning only; your job will continue, though performance may
be degraded.
--
--
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: zbyszek
Executable: -bycore
--
2 total processes failed to start

=== 3. the same mpirun command using the standard MPI installation:

LD_LIBRARY_PATH=/usr/lib/openmpi /usr/bin/mpirun --path /usr/lib/openmpi -np 2 --bind-to-core -bycore --report-bindings uptime
[zbyszek:03104] [[7637,0],0] odls:default:fork binding child [[7637,1],0] to cpus 0001
[zbyszek:03104] [[7637,0],0] odls:default:fork binding child [[7637,1],1] to cpus 0002
12:25:51 up 21:27, 1 user, load average: 0.00, 0.01, 0.05
12:25:51 up 21:27, 1 user, load average: 0.00, 0.01, 0.05

=== 4. version of the standard OpenMPI:

zkoza@zbyszek:~$ LD_LIBRARY_PATH=/usr/lib/openmpi /usr/bin/mpirun --path /usr/lib/openmpi --version
mpirun (Open MPI) 1.4.3

Z Koza
Re: [OMPI users] what is a "node"?
Thanks, Ralph, the new syntax works well (I used "man mpirun", which displayed the old syntax). Also, the report displayed by --report-bindings is far more human-readable than in previous versions of Open MPI.

Out of curiosity, and also to suppress the warning, I installed the libnuma-dev package with the libnuma.so and libnuma.a libraries, but the warning remains. Does this mean I should recompile Open MPI to get rid of the warning?

Z Koza

2012/9/1 Ralph Castain
> You are using cmd line options that no longer exist in the 1.9 release - look at "mpirun -h" for the current list of options.
>
> FWIW: in your example, the correct cmd line would be:
>
> mpirun -np 2 --bind-to core -map-by core --report-bindings uptime
>
> Note the space in "--bind-to core" and the "-map-by" option syntax. The warning means that we didn't find libnuma installed on your machine, so we cannot bind memory allocations (but we can bind processes).
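For completeness, a small sanity check one can run alongside --report-bindings: a hypothetical (Linux-only) MPI program, not part of Open MPI, in which each rank prints the set of cores it is allowed to run on via sched_getaffinity, so the effect of --bind-to / --map-by can be verified from inside the application:

/* Hypothetical helper: each rank prints the cores it may run on (Linux). */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;
    char buf[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);   /* 0 = calling process */

    int pos = 0;
    for (int c = 0; c < CPU_SETSIZE && pos < (int)sizeof(buf) - 8; c++)
        if (CPU_ISSET(c, &mask))
            pos += snprintf(buf + pos, sizeof(buf) - pos, "%d ", c);

    printf("rank %d allowed on cores: %s\n", rank, buf);
    MPI_Finalize();
    return 0;
}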
Re: [OMPI users] What is the default install library for PATH and LD_LIBRARY_PATH
./configure does not compile anything; it only generates the Makefile. Did you run

make
make install

after running ./configure?

Notice also that Open MPI is very likely already installed on your system from the Ubuntu packages; anyway, I suggest you use the Ubuntu packages rather than compiling from sources unless you have a very good reason not to use the packaged version.

You can also quite safely re-run "make install" to see where the libraries are going. If you're unsure which version of Open MPI you have, you can start with

which mpicc
mpicc --showme

Z Koza

2012/11/13 huaibao zhang:
> Hi Reuti,
>
> Thanks for your answer. I really appreciate it.
> I am using an old version, 1.4.3, for my code. If I only type $ ./configure, it will compile, but I have no idea where it is installed. I typed $ find /lib -name "libopen-pal.so.0", but it shows nothing. Do you think this is because I am not a root user, or because of the old version?
>
> Thanks,
> Paul
>
> --
> Huaibao (Paul) Zhang
> Gas Surface Interactions Lab
> Department of Mechanical Engineering
> University of Kentucky,
> Lexington, KY, 40506-0503
> Office: 216 Ralph G. Anderson Building
> Web: gsil.engineering.uky.edu
>
> On Nov 13, 2012, at 12:24 PM, Reuti wrote:
>
> On 13.11.2012 at 15:44, huaibao zhang wrote:
>
> I installed OpenMPI on my Ubuntu 64 bit desktop. At first, I did not specify "prefix", so even though I've installed it, I could not find where it is. Since the "PATH" and "LD" have to be given, the mpicc can find the "libopen-pal.so.0" file.
>
> You mean "...can't find..."? If you use the default location, it should have the correct settings already, even without adding any path to PATH or LD_LIBRARY_PATH.
>
> You can use:
>
> $ find /lib -name "libopen-pal.so.0"
>
> to spot the location. But I wonder about the version. The actual one seems to be libopen-pal.so.4 -> libopen-pal.so.4.0.3 - which version are you using?
>
> -- Reuti