Re: [OMPI users] CuEventCreate Failed...

2014-10-20 Thread Steven Eliuk
Thanks for your quick response,

1) mpiexec --allow-run-as-root --mca btl_openib_want_cuda_gdr 1 --mca btl_openib_cuda_rdma_limit 6 --mca mpi_common_cuda_event_max 1000 -n 5 test/RunTests
2) Yes, CUDA-aware support using Mellanox IB.
3) Yes, we have the ability to use several versions of Open MPI, MVAPICH2, etc.

Also, our defaults for openmpi-mca-params.conf are:

mtl=^mxm
btl=^usnic,tcp
btl_openib_flags=1

service nv_peer_mem status
nv_peer_mem module is loaded.

Kindest Regards,
—
Steven Eliuk,


From: Rolf vandeVaart <rvandeva...@nvidia.com>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Sunday, October 19, 2014 at 7:33 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] CuEventCreate Failed...

The error 304 corresponds to CUDA_ERROR_OPERATING_SYSTEM, which means an OS
call failed.  However, I am not sure how that relates to the call that is
getting the error.
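
As an aside, with CUDA 6.0 and later you can translate the numeric code
programmatically instead of digging through cuda.h; here is a minimal,
standalone sketch (my own example, not part of Open MPI), assuming the driver
API's cuGetErrorName()/cuGetErrorString() are available:

/* decode_curesult.c - minimal sketch; assumes CUDA 6.0+ where the driver
 * API provides cuGetErrorName() and cuGetErrorString().
 * Build (assumed): nvcc decode_curesult.c -o decode_curesult -lcuda */
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUresult code = (CUresult)304;   /* the value reported in the help message */
    const char *name = NULL, *desc = NULL;

    cuGetErrorName(code, &name);     /* -> "CUDA_ERROR_OPERATING_SYSTEM" */
    cuGetErrorString(code, &desc);   /* human-readable description */
    printf("CUresult %d = %s: %s\n", (int)code,
           name ? name : "unknown", desc ? desc : "unknown");
    return 0;
}
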
Also, the last error you report is from MVAPICH2-GDR, not from Open MPI.  I 
guess then I have a few questions.


1.   Can you supply your configure line for Open MPI?

2.   Are you making use of CUDA-aware support?

3.   Are you set up so that users can use both Open MPI and MVAPICH2?

Thanks,
Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Steven Eliuk
Sent: Friday, October 17, 2014 6:48 PM
To: us...@open-mpi.org
Subject: [OMPI users] CuEventCreate Failed...

Hi All,

We have run into issues that don't seem to materialize into incorrect results;
nonetheless, we hope to figure out why we are getting them.

We have several test environments, ranging from one machine with, say, 1-16
processes per node to several machines each running 1-16 processes. All systems
are NVIDIA-certified and use NVIDIA Tesla K40 GPUs.

We frequently see situations like the following:

--------------------------------------------------------------------------
The call to cuEventCreate failed. This is a unrecoverable error and will
cause the program to abort.
  Hostname: aHost
  cuEventCreate return value:   304
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------

--------------------------------------------------------------------------
The call to cuIpcGetEventHandle failed. This is a unrecoverable error and will
cause the program to abort.
  cuIpcGetEventHandle return value:   304
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------

--------------------------------------------------------------------------
The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol
cannot be used.
  cuIpcGetMemHandle return value:   304
  address: 0x700fd0400
Check the cuda.h file for what the return value means. Perhaps a reboot
of the node will clear the problem.
--------------------------------------------------------------------------

Our test suite still verifies results correctly, but when this happens it also
causes the following:

The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   400
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------

-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------

--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[37290,1],2]
  Exit code:    1


We have traced the error back to the following function:
- ompi/mca/common/cuda/common_cuda.c :: mca_common_cuda_construct_event_and_handle()

We also know the following:
- it happens on every machine, on the very first entry to the function mentioned above,
- it does not happen if the buffer size is under 128 bytes… likely a different mechanism than IPC is used for small messages,
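
For clarity, my reading of that function is that it boils down to roughly the
sequence below (a sketch of the driver calls involved, not the exact Open MPI
source), so the 304 is coming out of one of these two calls. Running something
like this standalone on an affected node might show whether the failure is
reproducible outside of Open MPI:

/* ipc_event_sketch.c - rough sketch of the sequence I believe
 * mca_common_cuda_construct_event_and_handle() performs: create an
 * interprocess event, then fetch its IPC handle.  Not the Open MPI source.
 * Build (assumed): nvcc ipc_event_sketch.c -o ipc_event_sketch -lcuda */
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUevent event;
    CUipcEventHandle handle;
    CUresult res;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* IPC events must be interprocess, and timing must be disabled for them */
    res = cuEventCreate(&event, CU_EVENT_INTERPROCESS | CU_EVENT_DISABLE_TIMING);
    if (res != CUDA_SUCCESS) {
        fprintf(stderr, "cuEventCreate failed: %d\n", (int)res);   /* our 304 */
        return 1;
    }

    res = cuIpcGetEventHandle(&handle, event);
    if (res != CUDA_SUCCESS) {
        fprintf(stderr, "cuIpcGetEventHandle failed: %d\n", (int)res);
        return 1;
    }

    printf("event and IPC handle created OK\n");
    cuEventDestroy(event);
    cuCtxDestroy(ctx);
    return 0;
}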

Last, here is an intermittent one that produces a lot of failed tests in our
suite… when in fact the tests are solid apart from this error. It causes
notifications and annoyance, and it would be nice to clean it up.

mpi_rank_3][cudaipc_allocate_ipc_region] 
[src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_ipc.c:487] cuda failed with 
mapping of buffer object failed


We have not been able to duplicate these errors with other MPI libraries.

Thank you for your time & looking forward to your response,


Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-578

Re: [OMPI users] CuEventCreate Failed...

2014-10-20 Thread Rolf vandeVaart
Hi:
I just tried running a program similar to yours with CUDA 6.5 and Open MPI and
I could not reproduce it.  Just to make sure I am doing things correctly, the
example in your previous message is running with np=5 and on a single node?
Which version of CUDA are you using?  Can you also send the output from
nvidia-smi?  Also, based on the usage of --allow-run-as-root I assume you are
running the program as root?


Re: [OMPI users] CuEventCreate Failed...

2014-10-20 Thread Steven Eliuk
Hi Sir,


We are using the CUDA 6.0 release and the 331.89 driver.


A little background: the master process does not initialize CUDA. We have also
tried having all five processes initialize CUDA, but that seems to trigger the
problem more easily.
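
For reference, a stripped-down, hypothetical version of that setup looks
roughly like the program below (a sketch, not our actual test suite; the
buffer size, rank roles, and single-GPU assumption are all illustrative).
With all ranks on one node, the rank 1 to rank 2 transfer should exercise the
CUDA IPC path, while rank 0 stays CUDA-free:

/* master_no_cuda.c - hypothetical reduced test, not our real suite.
 * Rank 0 ("master") never initializes CUDA; the other ranks select the GPU,
 * allocate a device buffer larger than 128 bytes, and two of them exchange
 * it through CUDA-aware MPI.
 * Build (assumed): mpicc master_no_cuda.c -o master_no_cuda -lcudart */
#include <mpi.h>
#include <stdio.h>
#include <cuda_runtime.h>

#define N (256 * 1024)   /* floats; well above the 128-byte threshold */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* master: never touches CUDA, as in our runs */
        printf("master (rank 0) is CUDA-free; %d ranks total\n", size);
    } else if (size >= 3) {
        float *dev;
        cudaSetDevice(0);                       /* single-GPU node assumed */
        cudaMalloc((void **)&dev, N * sizeof(float));
        cudaMemset(dev, 0, N * sizeof(float));
        if (rank == 1) {
            MPI_Send(dev, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD);  /* device buffer */
        } else if (rank == 2) {
            MPI_Recv(dev, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        cudaFree(dev);
    }

    MPI_Finalize();
    return 0;
}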


Yes, the mpiexec example I sent earlier was run on one machine, but we have
seen this on multiple machines as well… except there it is typically not the
IPC event error, just a failed event creation. Let's stick to the
single-machine issue first.


We were thinking of trying NVIDIA's Multi-Process Service (MPS), though it
likely won't help.


Please let me know if you need anything else, and thank you!


# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Thu_Mar_13_11:58:58_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1



nvidia-smi
Mon Oct 20 12:53:12 2014
+------------------------------------------------------+
| NVIDIA-SMI 331.89     Driver Version: 331.89         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          On   |     :42:00.0     Off |                    0 |
| N/A   37C    P8    59W / 235W |     54MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


Re: [OMPI users] large memory usage and hangs when preconnecting beyond 1000 cpus

2014-10-20 Thread Marshall Ward
Thanks, it's at least good to know that the behaviour isn't normal!

Could it be some sort of memory leak in the call? The code in

ompi/runtime/ompi_mpi_preconnect.c

looks reasonably safe, though maybe doing thousands of isend/irecv pairs is
causing problems with the buffers used in point-to-point messages?
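
For what it's worth, my mental model of preconnection is roughly the sketch
below (my own reconstruction, not the actual code in
ompi/runtime/ompi_mpi_preconnect.c): every rank exchanges a tiny message with
every other rank inside MPI_Init, so each rank holds O(N) outstanding requests
and the job as a whole opens O(N^2) connections, which is where I would expect
the memory to go at several thousand ranks.

/* preconnect_sketch.c - my reconstruction of the preconnect idea, not the
 * actual Open MPI implementation.  Each rank posts a 1-byte irecv/isend pair
 * toward every other rank and then waits for all of them. */
#include <mpi.h>
#include <stdlib.h>

static void preconnect_all(MPI_Comm comm)
{
    int rank, size, i, n = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));
    char sbuf = 0;
    char *rbuf = calloc((size_t)size, 1);   /* one receive byte per peer */

    for (i = 1; i < size; i++) {
        int next = (rank + i) % size;            /* peer we send to      */
        int prev = (rank - i + size) % size;     /* peer we receive from */
        MPI_Irecv(&rbuf[i - 1], 1, MPI_CHAR, prev, 1, comm, &reqs[n++]);
        MPI_Isend(&sbuf, 1, MPI_CHAR, next, 1, comm, &reqs[n++]);
    }
    MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);   /* 2*(size-1) requests per rank */

    free(rbuf);
    free(reqs);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    preconnect_all(MPI_COMM_WORLD);   /* stand-in for -mca mpi_preconnect_mpi 1 */
    MPI_Finalize();
    return 0;
}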

I'm trying to see if valgrind can see anything, but nothing from
ompi_init_preconnect_mpi is coming up (although there are some other
warnings).


On Sun, Oct 19, 2014 at 2:37 AM, Ralph Castain  wrote:
>
>> On Oct 17, 2014, at 3:37 AM, Marshall Ward  wrote:
>>
>> I currently have a numerical model that, for reasons unknown, requires
>> preconnection to avoid hanging on an initial MPI_Allreduce call.
>
> That is indeed odd - it might take a while for all the connections to form, 
> but it shouldn’t hang
>
>> But
>> when we try to scale out beyond around 1000 cores, we are unable to
>> get past MPI_Init's preconnection phase.
>>
>> To test this, I have a basic C program containing only MPI_Init() and
>> MPI_Finalize() named `mpi_init`, which I compile and run using `mpirun
>> -mca mpi_preconnect_mpi 1 mpi_init`.
>
> I doubt preconnect has been tested in a rather long time as I’m unaware of 
> anyone still using it (we originally provided it for some legacy code that 
> otherwise took a long time to initialize). However, I could give it a try and 
> see what happens. FWIW: because it was so targeted and hasn’t been used in a 
> long time, the preconnect algo is really not very efficient. Still, shouldn’t 
> have anything to do with memory footprint.
>
>>
>> This preconnection seems to consume a large amount of memory, and is
>> exceeding the available memory on our nodes (~2GiB/core) as the number
>> gets into the thousands (~4000 or so). If we try to preconnect to
>> around ~6000, we start to see hangs and crashes.
>>
>> A failed 5600 core preconnection gave this warning (~10k times) while
>> hanging for 30 minutes:
>>
>>[warn] opal_libevent2021_event_base_loop: reentrant invocation.
>> Only one event_base_loop can run on each event_base at once.
>>
>> A failed 6000-core preconnection job crashed almost immediately with
>> the following error.
>>
>>[r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>> file ras_tm_module.c at line 159
>>[r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>> file ras_tm_module.c at line 85
>>[r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>> file base/ras_base_allocate.c at line 187
>
> This doesn’t have anything to do with preconnect - it indicates that mpirun 
> was unable to open the Torque allocation file. However, it shouldn’t have 
> “crashed”, but instead simply exited with an error message.
>
>>
>> Should we expect to use very large amounts of memory for
>> preconnections of thousands of CPUs? And can these
>>
>> I am using Open MPI 1.8.2 on Linux 2.6.32 (centOS) and FDR infiniband
>> network. This is probably not enough information, but I'll try to
>> provide more if necessary. My knowledge of implementation is
>> unfortunately very limited.