Sorry for not providing an update earlier. The bug has been fixed and the messages should disappear in a future version of the driver (hopefully the next one, if the fix was picked up in time).

On 05/04/2017 10:23 PM, Ben Menadue wrote:
Hi,

Sorry to reply to an old thread, but we’re seeing this message with 2.1.0 built against CUDA 8.0. We're using libcuda.so.375.39. Has anyone had any luck suppressing these messages?

Thanks,
Ben


On 27 Mar 2017, at 7:13 pm, Roland Fehrenbacher <r...@q-leap.de> wrote:

"SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:

Hi Sylvain,

thanks for looking into this further.

   SJ> I'm still working to get a clear confirmation of what is
   SJ> printing this error message and since when.

   SJ> However, running strings, I could only find this string in
   SJ> /usr/lib/libnvidia-ml.so, which comes with the CUDA driver, so
   SJ> it should not be related to the CUDA runtime version ... but
   SJ> again, until I find the code responsible for that, I can't say
   SJ> for sure.

libcuda (in my case libcuda.so.367.57) also contains the string, and I'm
pretty sure that's where it's coming from. libcudart (linked to orted
and libmpi.so.x) seems to dlopen libcuda.so.1 (at least "strings libcudart"
suggests that) ...
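
For anyone who wants to repeat the check described above, here is a minimal Python sketch of the same idea as running strings over the libraries and looking for the message. The search patterns in candidate_libraries() are assumptions about a typical Linux install and are not taken from this thread, so adjust them for your system.

```python
import glob
from pathlib import Path

# Assumed locations of the driver and runtime libraries; adjust as needed.
ASSUMED_PATTERNS = [
    "/usr/lib*/**/libcuda*.so*",
    "/usr/lib*/**/libnvidia-ml*.so*",
    "/usr/local/cuda*/**/libcudart*.so*",
]


def candidate_libraries(patterns=ASSUMED_PATTERNS):
    """Expand the glob patterns into a sorted list of unique paths."""
    found = set()
    for pattern in patterns:
        found.update(glob.glob(pattern, recursive=True))
    return sorted(found)


def files_containing(needle, paths):
    """Return the paths whose raw bytes contain `needle` (a bytes object)."""
    hits = []
    for path in map(Path, paths):
        try:
            data = path.read_bytes()
        except OSError:
            continue  # missing, unreadable, or a directory: skip it
        if needle in data:
            hits.append(str(path))
    return hits
```

Calling files_containing(b"no NVIDIA devices found", candidate_libraries()) returns the list of libraries that embed the message. A match only shows that a library contains the string, not that it is the one printing it.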

Best,

Roland

-------
http://www.q-leap.com / http://qlustar.com
         --- HPC / Storage / Cloud Linux Cluster OS ---

   SJ> I'm sorry it's taking so long -- I'm on it though.

   SJ> On 03/24/2017 01:56 PM, Roland Fehrenbacher wrote:
"SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:
Hi Sylvain,

   SJ> Hi Roland, I can't find this message in the Open MPI source
   SJ> code. Could it be hwloc? Some other library you are using?

after a lengthy detour chasing the suspicion that it might have
something to do with the nvml support of hwloc, I have now found that
a change in libcudart between 7.5 and 8.0 is the cause of the
messages appearing now. Our earlier Open MPI 1.8 build was made
against CUDA 7.5 and didn't show the problem, but a 1.8 build made
against CUDA 8 shows the same problem as 2.0.2 built against CUDA 8.
Do you think you could ask your team members at Nvidia how this new
behaviour in libcudart can be suppressed?

BTW: Disabling nvml support for the internal hwloc has the effect
that OpenMPI doesn't link in libnvidia-ml.so.x anymore, but has
no effect on the messages.

Thanks,

Roland

   SJ> On 03/16/2017 04:23 AM, Roland Fehrenbacher wrote:
Hi,

OpenMPI 2.0.2 built with CUDA support prints lots of
warnings like

NVIDIA: no NVIDIA devices found

when running on hardware without Nvidia devices. Is there a way
to suppress these warnings? It would be quite a hassle to
maintain different OpenMPI builds on clusters where only some
of the machines have GPUs.
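
Until a fixed driver is installed, one stopgap is to filter the message out of the job output instead of maintaining separate builds. The following is only a sketch in Python under some assumptions: the helper names are made up for illustration, the message is matched as a whole line exactly as quoted above, and both stdout and stderr are filtered because the thread does not say which stream the message goes to. Output is captured and returned after the command finishes, so it is not suited to jobs whose output must be streamed live.

```python
import subprocess

# The exact line reported in this thread.
NOISE = "NVIDIA: no NVIDIA devices found"


def drop_noise(lines, noise=NOISE):
    """Return the lines that are not exactly the noise message."""
    return [line for line in lines if line.strip() != noise]


def run_filtered(cmd, noise=NOISE):
    """Run `cmd` (a list of arguments) and strip the noise line from its output.

    Returns (returncode, filtered_stdout, filtered_stderr).
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    out = "".join(drop_noise(proc.stdout.splitlines(keepends=True), noise))
    err = "".join(drop_noise(proc.stderr.splitlines(keepends=True), noise))
    return proc.returncode, out, err
```

For example, run_filtered(["mpirun", "-np", "4", "./app"]) would return the exit code and the two output streams without the warning lines, where ./app stands for a hypothetical MPI program. This hides the symptom only; the library still prints the message.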
_______________________________________________ users mailing
list users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


SJ> -----------------------------------------------------------------------------------
   SJ> This email message is for the sole use of the intended
   SJ> recipient(s) and may contain confidential information.  Any
   SJ> unauthorized review, use, disclosure or distribution is
   SJ> prohibited.  If you are not the intended recipient, please
   SJ> contact the sender by reply email and destroy all copies of the
   SJ> original message.
SJ> -----------------------------------------------------------------------------------





