Hi,
On 19.08.2014 at 19:06, Oscar Mojica wrote:
> I discovered what the error was. I forgot to include '-fopenmp' when I
> compiled the objects in the Makefile, so the program worked but it didn't
> divide the job into threads. Now the program is working and I can use up to 15
> cores per machine in the queue one.q.
Reuti
I discovered what the error was. I forgot to include '-fopenmp' when I
compiled the objects in the Makefile, so the program worked but it didn't
divide the job into threads. Now the program is working and I can use up to 15
cores per machine in the queue one.q.
Anyway I would like to try im…
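The fix described above, in Makefile form: a hypothetical sketch (file and target names are invented, not from the thread) showing that -fopenmp must appear in both the compile rule and the link rule, otherwise the program builds but the OpenMP directives are silently ignored and everything runs in one thread.

```make
# Hypothetical Makefile sketch -- -fopenmp in both FFLAGS (compile) and
# LDFLAGS (link); omitting it from the compile step reproduces the symptom
# described above: a working but single-threaded program.
FC      = gfortran
FFLAGS  = -O2 -fopenmp
LDFLAGS = -fopenmp

prog: main.o solver.o
	$(FC) $(LDFLAGS) -o $@ $^

%.o: %.f90
	$(FC) $(FFLAGS) -c $<
```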
I am also filing a bug at Adaptive Computing since, while I do set
CUDA_VISIBLE_DEVICES myself, the default value set by Torque in that
case is also wrong.
Maxime
On 2014-08-19 10:47, Rolf vandeVaart wrote:
Glad it was solved. I will submit a bug at NVIDIA as that does not seem like a
very friendly way to handle that error.
>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
>Boissonneault
>Sent: Tuesday, August 19, 2014 10:39 AM
>To: Open MPI Users
>Sub
Hi,
I believe I found what the problem was. My script set the
CUDA_VISIBLE_DEVICES based on the content of $PBS_GPUFILE. Since the
GPUs were listed twice in the $PBS_GPUFILE because of the two nodes, I had
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
instead of
CUDA_VISIBLE_DEVICES=0,1
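The deduplication Maxime describes can be sketched in shell. This assumes (as the message implies) that $PBS_GPUFILE lists one line per granted GPU in the form "<hostname>-gpu<N>" and repeats each GPU once per node in the job; the sketch keeps only the current node's entries and deduplicates the indices.

```shell
# Sketch: build CUDA_VISIBLE_DEVICES from this node's entries in $PBS_GPUFILE
# only, deduplicating indices that appear once per node in the job.
gpus=$(grep "^$(hostname)-gpu" "$PBS_GPUFILE" | sed 's/.*-gpu//' | sort -un | paste -sd, -)
export CUDA_VISIBLE_DEVICES="$gpus"
```

With the two-node, eight-GPU allocation from the message, this yields 0,...,7 once instead of twice.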
Hi:
This problem does not appear to have anything to do with MPI. We are getting a
SEGV during the initial call into the CUDA driver. Can you log on to
gpu-k20-08, compile your simple program without MPI, and run it there? Also,
maybe run dmesg on gpu-k20-08 and see if there is anything in the…
Hi,
I recompiled OMPI 1.8.1 without CUDA and with debug, but it did not give
me much more information.
[mboisson@gpu-k20-07 simple_cuda_mpi]$ ompi_info | grep debug
Prefix:
/software-gpu/mpi/openmpi/1.8.1-debug_gcc4.8_nocuda
Internal debug support: yes
Memory debugging supp
Indeed, there were those two problems. I took the code from here and
simplified it.
http://cudamusing.blogspot.ca/2011/08/cuda-mpi-and-infiniband.html
However, even with the modified code here http://pastebin.com/ax6g10GZ
The symptoms are still the same.
Maxime
On 2014-08-19 07:59, Alex A. Granovsky wrote:
Also, you need to check the return code from cudaMalloc before calling cudaFree -
the pointer may be invalid, as you did not initialize CUDA properly.
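Both suggestions from Alex combined (explicit cudaSetDevice, then a return-code check on every runtime call) could look like this minimal standalone sketch without MPI; it is an illustration of the pattern, not code from the thread.

```cuda
// Minimal sketch: initialize the device explicitly, then check every CUDA
// runtime call's return code before using its result.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaError_t err = cudaSetDevice(0);   // create the context explicitly
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }
    float *d_buf = nullptr;
    err = cudaMalloc(&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {             // d_buf is not valid on failure
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaFree(d_buf);                // only free what was allocated
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFree: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```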
Alex
-----Original Message-----
From: Maxime Boissonneault
Sent: Tuesday, August 19, 2014 2:19 AM
To: Open MPI Users
Subject: Re: [OMPI users] Segfa
Hello,
I think your CUDA program may be incorrect. Add a proper cudaSetDevice call at
the beginning and check again.
Kind regards,
Alex Granovsky
-----Original Message-----
From: Maxime Boissonneault
Sent: Tuesday, August 19, 2014 2:19 AM
To: Open MPI Users
Subject: Re: [OMPI users] Segfau
So, it seems you have an old OFED without this parameter.
Can you install the latest Mellanox OFED, or check which community OFED has it?
On Tue, Aug 19, 2014 at 9:34 AM, Rio Yokota wrote:
> Here is what "modinfo mlx4_core" gives
>
> filename:
>
> /lib/modules/3.13.0-34-generic/kernel/drivers/net/ether
Here is what "modinfo mlx4_core" gives
filename:
/lib/modules/3.13.0-34-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
version:        2.2-1
license:        Dual BSD/GPL
description:    Mellanox ConnectX HCA low-level driver
author:         Roland Dreier
srcversion:     3AE2
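To check whether the parameter in question is supported by this driver build at all, modinfo can print just the parameter list instead of the full module header shown above:

```shell
# List only the module parameters (name: description) this mlx4_core
# build supports; if the parameter is absent here, a newer OFED is needed.
modinfo -p mlx4_core
```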