Hi Tom!

On 2022-04-06T11:57:57+0200, Tom de Vries <tdevr...@suse.de> wrote:
> On 4/5/22 17:14, Thomas Schwinge wrote:
>> Regarding the following:
>>
>> On 2022-03-30T14:27:41+0200, Tom de Vries <tdevr...@suse.de> wrote:
>>> <li>The <code>-mptx</code> flag has been added to specify the PTX ISA version
>>> for the generated code; permitted values are <code>3.1</code>
>>> - (default, matches previous GCC versions) and <code>6.3</code>.
>>> + (matches previous GCC versions), <code>6.0</code>, <code>6.3</code>,
>>> + and <code>7.0</code>. If not specified, the used version is the minimal
>>> + version required for <code>-march</code> but at least <code>6.0</code>.
>>> </li>
>>
>> For "the PTX ISA version [used is] at least '6.0'", per
>> <https://docs.nvidia.com/cuda/parallel-thread-execution/#release-notes>,
>> this means we now require "CUDA 9.0, driver r384" (or more recent).
>
> Well, that would be the case if there was no -mptx=3.1.
When considering *using* GCC/nvptx, the '-mptx-3.1' multilib may be used
with "old" hardware/CUDA Driver versions, correct.  When considering
*building* GCC/nvptx, we do require CUDA 9.0, as otherwise the default
multilib can't be built (unless you disable 'ptxas' verification).

>> Per <https://developer.nvidia.com/cuda-toolkit-archive>:
>> "CUDA Toolkit 9.0 (Sept 2017)", so ~4.5 years old.
>> Per <https://download.nvidia.com/XFree86/Linux-x86_64/>, I'm guessing a
>
> I just see a list with version numbers there, I'm not sure what
> information you're referring to.

I'd assumed that from that URL, as well as the structure of these version
numbers, you could tell that these are Nvidia Driver releases (including
bundled CUDA Driver libraries).

>> similar timeframe for the imprecise "r384" Driver version stated in that
>> table.  That should all be fine (re not mandating use of all-too-recent
>> versions).
>
> I don't know what an imprecise driver is.

Not an "imprecise driver", but an "imprecise [...] version".  For example,
<https://docs.nvidia.com/cuda/parallel-thread-execution/#release-notes>
talks about "driver r384", but such a version doesn't exist; it's rather
384.130, or 384.111, or 384.98, etc.

>> Now, consider doing a GCC/nvptx offloading build with
>> '--with-cuda-driver' pointing to CUDA 9.0 (or more recent).  This means
>> that the libgomp nvptx plugin may now use CUDA Driver features of the
>> CUDA 9.0 distribution ("driver r384", etc.) -- because that's what it is
>> being 'configure'd and linked against.  (I say "may now use", because
>> we're currently not making a lot of effort to use "modern" CUDA Driver
>> features -- but we could, and probably should.  That's a separate
>> discussion, of course.)  It then follows that the libgomp nvptx plugin
>> has a hard dependency on CUDA Driver features of the CUDA 9.0
>> distribution ("driver r384", etc.).  That's a dependency as in ABI: via
>> '*.so' symbol versions as well as internal CUDA interface configuration
>> (see <cuda.h> doing different '#define's for different
>> '__CUDA_API_VERSION' etc.).
>>
>> Now assume one such dependency on "modern" CUDA Driver were not
>> implemented by:
>
> Thanks for reminding me, I forgot about this configure option.

OK, good.  ;-)

>>> + <li>An <code>mptx-3.1</code> multilib was added. This allows using older
>>> + drivers which do not support PTX ISA version 6.0.</li>
>>
>> ... this "old" CUDA Driver.  Then you do have the '-mptx-3.1' multilib to
>> use with "old" CUDA Driver -- but you cannot actually use the libgomp
>> nvptx plugin, because that's been built against "modern" CUDA Driver.
>
> I remember the following problem: using -with-cuda-driver to specify
> what cuda driver interface (version) you want to link the libgomp plugin
> against, and then using an older driver in combination with that libgomp
> plugin.  We may run into trouble, typically at libgomp plugin load
> time, with an error mentioning an unresolved symbol or some abi symbol
> version being not sufficient.

Right.

> So, do I understand it correctly that your point is that using -mptx=3.1
> doesn't fix that problem?

Right.

>> Same problem, generally, for 'nvptx-run' of the nvptx-tools, which has
>> similar CUDA Driver dependencies.
>>
>> Now, that may currently be a latent problem only, because we're not
>> actually making use of "modern" CUDA Driver features.  But I'd like to
>> resolve this "impedance mismatch" before we actually run into such
>> problems.
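(To make the "dependency as in ABI" remark quoted further above a bit more
tangible, here's a grossly simplified, compile-only sketch of mine -- made-up
names, 'HYPOTHETICAL_CUDA_API_VERSION', 'cuSomeFunction', 'cuSomeFunction_v2';
not actual <cuda.h> or libgomp plugin code -- of the kind of '#define'-based
remapping such a header does.  The same plugin source, 'configure'd and linked
against a newer CUDA distribution, ends up with undefined references to the
newer symbols in its dynamic symbol table, which an "old" 'libcuda.so.1' then
can't satisfy at plugin load time.)

#include <stddef.h>

/* Made-up illustration (not actual CUDA or libgomp plugin code) of how a
   <cuda.h>-style header bakes the configured API version into the
   resulting binary's ABI.  */

#if HYPOTHETICAL_CUDA_API_VERSION >= 2000
/* Newer headers remap the "nice" name onto a revised entry point...  */
# define cuSomeFunction cuSomeFunction_v2
#endif

/* ...so this prototype declares either 'cuSomeFunction' or
   'cuSomeFunction_v2', and the call below leaves an undefined reference
   to whichever one was selected at build time.  */
extern int cuSomeFunction (void *ptr, size_t size);

int
use_some_function (void *ptr, size_t size)
{
  /* Built against the newer header and linked with '-lcuda', loading
     this object requires the installed CUDA Driver library to provide
     'cuSomeFunction_v2' -- independently of any '-mptx' setting.  */
  return cuSomeFunction (ptr, size);
}

That's the hard, load-time kind of dependency that '-mptx=3.1' alone cannot
address.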
>
> It would be helpful for me if you would come up with an example of a
> modification to the libgomp plugin that would cause trouble in
> combination with mptx=3.1.

For example, something like the following scenario -- made up, so details
may be wrong.  Consider GCC's libgomp nvptx plugin being built with
'--with-cuda-driver' pointing to a modern CUDA release.  The 'configure'
script finds the 'cuMemPrefetchAsync' function available (CUDA 8.0+?),
enables the corresponding (hypothetical) code in the libgomp nvptx plugin,
and thus 'libgomp-plugin-nvptx.so' now has a load-time dependency on
'libcuda.so' providing 'cuMemPrefetchAsync': the plugin will fail to load
if that's not available.  If you now use this plugin on an "old" system
(old CUDA Driver version), the '-mptx-3.1' multilib that is meant to keep
things working for such "old" configurations doesn't help, because the
plugin won't load.

(Orthogonal aspect.)  I've since discovered that we use
'libgomp/plugin/cuda-lib.def' not just for 'PLUGIN_NVPTX_DYNAMIC' (that
is, '--without-cuda-driver') configurations, but also for
'!PLUGIN_NVPTX_DYNAMIC' (that is, '--with-cuda-driver') configurations.
So, instead of 'configure'-time detection of 'cuMemPrefetchAsync'
availability and using 'CUDA_CALL (cuMemPrefetchAsync)',
'libgomp/plugin/cuda-lib.def' should specify
'CUDA_ONE_CALL_MAYBE_NULL (cuMemPrefetchAsync)', which for
'!PLUGIN_NVPTX_DYNAMIC' configurations would make sure to provide a 'weak'
prototype for 'cuMemPrefetchAsync', and then ought to check
'CUDA_CALL_EXISTS (cuMemPrefetchAsync)' before doing
'CUDA_CALL (cuMemPrefetchAsync)'.

So: all is good, my scenario can be made to work correctly even for the
'--with-cuda-driver' case: at run time we see whether 'cuMemPrefetchAsync'
is available, and only call it if it is -- no matter how the CUDA Driver
library gets linked ('-lcuda') or loaded ('dlopen').
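To illustrate (a sketch only, not the actual libgomp nvptx plugin code;
'maybe_prefetch' is a made-up helper, and I'm assuming the usual CUDA
Driver API prototype for 'cuMemPrefetchAsync'), for the '-lcuda'-linked
('!PLUGIN_NVPTX_DYNAMIC') case the 'weak' prototype plus a NULL check
gives exactly that run-time behavior:

#include <stdio.h>
#include <cuda.h>  /* CUDA Driver API types: CUresult, CUdeviceptr, ...  */

/* Weak reference: if the CUDA Driver library in use doesn't provide
   'cuMemPrefetchAsync', the plugin still loads, and the function's
   address is simply NULL.  */
extern CUresult cuMemPrefetchAsync (CUdeviceptr, size_t, CUdevice, CUstream)
  __attribute__ ((weak));

/* Made-up helper: prefetch if we can, skip the optimization otherwise.  */
static void
maybe_prefetch (CUdeviceptr devptr, size_t size, CUdevice dev, CUstream stream)
{
  /* Akin to 'CUDA_CALL_EXISTS (cuMemPrefetchAsync)'.  */
  if (cuMemPrefetchAsync)
    {
      if (cuMemPrefetchAsync (devptr, size, dev, stream) != CUDA_SUCCESS)
        fprintf (stderr, "cuMemPrefetchAsync failed\n");
    }
  /* Else: "old" CUDA Driver; nothing breaks at plugin load time.  */
}

For the 'dlopen' ('PLUGIN_NVPTX_DYNAMIC') configuration the check is
conceptually the same, just on the 'dlsym' result instead of a weak
symbol's address.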
Anyway, despite that, we seem to agree that '--with-cuda-driver' is not
very useful, and may be removed:

>> Already long ago Jakub put in changes to use '--without-cuda-driver' to
>> "Allow building GCC with PTX offloading even without CUDA being installed
>> (gcc and nvptx-tools patches)": "Especially for distributions it is
>> undesirable to need to have proprietary CUDA libraries and headers
>> installed when building GCC.", and I understand GNU/Linux distributions
>> all use that.  That configuration uses the GCC-provided
>> 'libgomp/plugin/cuda/cuda.h', 'libgomp/plugin/cuda-lib.def' to manually
>> define the CUDA Driver ABI to use, and then 'dlopen("libcuda.so.1")'.
>> (Similar to what the libgomp GCN (and before: HSA) plugin is doing, for
>> example.)  Quite likely that our group (at work) are the only ones to
>> actually use '--with-cuda-driver'?
>
> Right, I see in my scripts that I don't use --with-cuda-driver, possibly
> because of years-ago running into issues when changing drivers forth and
> back.
>
>> My proposal now is: we remove '--with-cuda-driver' (make its use a no-op,
>> per standard GNU Autoconf behavior), and offer '--without-cuda-driver'
>> only.  This shouldn't cause any user-visible change in behavior, so safe
>> without a prior deprecation phase.
>
> I think the dlopen use-case is the most flexible, and I don't see any
> user benefit from using --with-cuda-driver, so I don't see a problem
> with removing --with-cuda-driver for the user.

ACK, thanks.

> I did wonder about keeping it available in some form, say rename to
> --maintainer-mode-with-cuda-driver.  This could be useful for debugging
> / comparison purposes.  But it would mean having to test it when making
> relevant changes, which is maintenance burden for a feature not visible
> to the user, so I guess that's not worth it.
>
> So, I'm fine with removing.

Based on the point you made above, I realized that it may be beneficial
to "keep the underlying functionality available for the developers": "if
you develop CUDA API-level changes in the libgomp nvptx plugin, it's
likely to be easier to just use the full CUDA toolkit 'cuda.h' and
directly link against libcuda (so that you've got all symbols etc.
available), and only once you know what exactly you need, update GCC's
'include/cuda/cuda.h' and 'libgomp/plugin/cuda-lib.def'".  (See the
thread "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'".)

Do we agree that it's OK to remove the user-visible '--with-cuda-driver'
etc. options, and not introduce any new
'--enable-maintainer-mode-with-cuda-driver' (or similar) option, and
instead let this functionality be available to developers only, via
manually editing 'libgomp/plugin/Makefrag.am'?  Happy to submit an
illustrative patch, if that helps.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955