Hi,

Sorry to reply to an old thread, but we're seeing the "NVIDIA: no NVIDIA
devices found" message with Open MPI 2.1.0 built against CUDA 8.0. We're using
libcuda.so.375.39. Has anyone had any luck suppressing these messages?

Thanks,
Ben


> On 27 Mar 2017, at 7:13 pm, Roland Fehrenbacher <r...@q-leap.de> wrote:
> 
>>>>>> "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:
> 
> Hi Sylvain,
> 
> thanks for looking into this further.
> 
>    SJ> I'm still working to get a clear confirmation of what is
>    SJ> printing this error message and since when.
> 
>    SJ> However, running strings, I could only find this string in
>    SJ> /usr/lib/libnvidia-ml.so, which comes with the CUDA driver, so
>    SJ> it should not be related to the CUDA runtime version ... but
>    SJ> again, until I find the code responsible for that, I can't say
>    SJ> for sure.
> 
> libcuda (in my case libcuda.so.367.57) also contains the string, and I'm
> pretty sure that's where it's coming from. libcudart (linked to orted
> and libmpi.so.x) seems to dlopen libcuda.so.1 (at least "strings libcudart"
> suggests that) ...
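
The "strings" check described above can be approximated with a few lines of
Python. This is only a sketch: the function name libraries_containing is made
up for illustration, the library paths are whatever you pass on the command
line, and a hit only shows that a library contains the text, not that it is
the one printing it at run time.

```python
from pathlib import Path

# The warning text reported in this thread.
MESSAGE = b"NVIDIA: no NVIDIA devices found"


def libraries_containing(message, paths):
    """Return the given paths whose raw bytes contain `message`.

    Hypothetical helper: it does a plain byte search, similar in spirit to
    running `strings` on each file and grepping the output.
    """
    hits = []
    for path in paths:
        p = Path(path)
        try:
            data = p.read_bytes()
        except OSError:
            # Missing or unreadable file: skip it instead of failing.
            continue
        if message in data:
            hits.append(str(p))
    return hits


if __name__ == "__main__":
    import sys

    # Print every library given on the command line that embeds the message.
    for hit in libraries_containing(MESSAGE, sys.argv[1:]):
        print(hit)
```

Saved under a name of your choice (say find_msg.py), it could be run against
the candidate libraries, for example /usr/lib/libnvidia-ml.so and wherever
libcuda.so.367.57 is installed on your system.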
> 
> Best,
> 
> Roland
> 
> -------
> http://www.q-leap.com / http://qlustar.com
>          --- HPC / Storage / Cloud Linux Cluster OS ---
> 
>    SJ> I'm sorry it's taking so long -- I'm on it though.
> 
>    SJ> On 03/24/2017 01:56 PM, Roland Fehrenbacher wrote:
>>>>>>>> "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:
>>> Hi Sylvain,
>>> 
>    SJ> Hi Roland, I can't find this message in the Open MPI source
>    SJ> code. Could it be hwloc? Some other library you are using?
>>> 
>>> After a longer detour chasing the suspicion that this might be
>>> related to hwloc's NVML support, I have now found that a change
>>> in libcudart between CUDA 7.5 and 8.0 causes the messages to
>>> appear. Our earlier Open MPI 1.8 build against CUDA 7.5 didn't
>>> show the problem, but a 1.8 build against CUDA 8 shows the same
>>> problem as 2.0.2 built against CUDA 8. Do you think you could
>>> ask your team members at NVIDIA how this new behaviour in
>>> libcudart can be suppressed?
>>> 
>>> BTW: Disabling nvml support for the internal hwloc has the effect
>>> that OpenMPI doesn't link in libnvidia-ml.so.x anymore, but has
>>> no effect on the messages.
>>> 
>>> Thanks,
>>> 
>>> Roland
>>> 
>    SJ> On 03/16/2017 04:23 AM, Roland Fehrenbacher wrote:
>>>>> Hi,
>>>>> 
>>>>> OpenMPI 2.0.2 built with CUDA support prints lots of
>>>>> warnings like
>>>>> 
>>>>> NVIDIA: no NVIDIA devices found
>>>>> 
>>>>> when running on hardware without NVIDIA devices. Is there a
>>>>> way to suppress these warnings? It would be quite a hassle to
>>>>> maintain different OpenMPI builds on clusters where only some
>>>>> machines have GPUs.
>>> _______________________________________________ users mailing
>>> list users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> 
