yaxunl added a comment.

In D79526#2027552 <https://reviews.llvm.org/D79526#2027552>, @tra wrote:

> In D79526#2027470 <https://reviews.llvm.org/D79526#2027470>, @yaxunl wrote:
>
> > For implicit host device functions, since they are not guaranteed to work 
> > in device compilation, we can only resolve them as if they are host 
> > functions. This causes asymmetry but implicit host device functions are 
> > originally host functions so it is biased toward host compilation in the 
> > beginning.
>
>
> I don't think that the assertion that `implicit host device functions are 
> originally host functions` is always true. While in practice most such 
> functions may indeed come from the existing host code (e.g. the standard 
> library), I don't see any inherent reason why they can't come from the code 
> written for GPU. E.g. thrust is likely to have some implicitly HD functions 
> in the code that was not intended for CPUs and your assumption will be wrong. 
> Even if such case may not exist now, it would not be unreasonable for users 
> to have such code on device. 
>  This overload resolution difference is observable and it will likely create 
> new corner cases in convoluted enough C++ code.


I agree currently it is possible to force a device function to be implicitly 
host device by pragma. However it is arguable whether we should have special 
handling of overload resolution in this case. We do special handling of 
overload resolution because we can not modify some system headers which are 
intended for host originally. If a function was originally device function, it 
is CUDA/HIP code and it should follow normal overloading resolution rule and 
should be fixed if issues occur when it is marked as a host device function.

> I think we need something more principled than "happens to work for existing 
> code".
> 
>> Only the original resolution guarantees no other issues.  For example, in 
>> the failed compilation in TF, some ctor of std::atomic becomes implicit host 
>> device function because it is constexpr. We should treated as wrong-sided in 
>> device compilation, but we should treated as same-sided in host compilation, 
>> otherwise it changes the resolution in host compilation and causes other 
>> issues.
> 
> It may be true for atomic, where we do need to have GPU-specific 
> implementation. However, I can also see classes with constexpr constructors 
> that are prefectly usable on both sides and do not have to be treated as the 
> wrong-side.

Before this patch (together with the reverted commit), the device host 
candidates are always treated with the same preference as wrong-sided 
candidates in device compilation, so a wrong-sided candidate may hide a viable 
host device candidate. This patch fixes that for most cases, including: 1. host 
compilation 2. explicit host device caller 3. explicit host device callee. Only 
in device compilation when an implicit host device caller calls an implicit 
host device callee we apply the special 'incorrect' overloading resolution 
rule. If the special handling causes undesirable effect on users code, users 
can either mark the caller or callee to be explicit host device to bypass the 
special handling.

> TBH, I do not see any reasonable way to deal with this with the current 
> implementation of how HD functions are treated. This patch and its base do 
> improve things somewhat, but it all comes at the cost of further complexity 
> and potentially paints us even deeper into a corner. Current behavior is 
> already rather hard to explain.
> 
> Some time back @wash from NVIDIA was asking about improving HD function 
> handling. Maybe it's time for all interested parties to figure out whether 
> it's time to come up with a better solution. Not in this patch, obviously.

This patch is trying to fix the incorrect overloading resolution rule about 
host device callee in host device caller. It should be favored over wrong-sided 
callee but currently it is not.

If we reject this patch, we have to bear with the incorrect overloading rule 
until a better fix is implemented.

The complexity introduced by this patch is that it needs to have special rule 
for implicit host device caller and implicit host device callee in device 
compilation, where implicit host device callee is not favored over wrong-sided 
callee to preserve the overloading resolution result as if they are both host 
callees. This is to allow some functions in system headers becoming implicitly 
host device functions without causing undeferrable diagnostics.

The complexity introduced in the compiler code is not significant: a new 
function Sema::IsCUDAImplicitHostDeviceFunction is introduced and used in 
isBetterOverloadCandidate to detect the special situation that needs special 
handling. The code for special handling is trivial.

The complexity introduced in the overloading resolution rule is somehow 
concerning.

Before this patch, the rule is: same sided candidates are favored over wrong 
sided candidates, but host device candidates have same preference as wrong 
sided candidates.

After this patch, the rule is: same sided candidates and host device candidates 
have the same preference over wrong-sided candidates, except implicit host 
device function in device compilation, which preserves original resolution.

The reason to have the exception is that the implicit host device caller may be 
in system headers which users cannot modify. In device compilation, favoring 
implicit host device candidates over host candidates may change the resolution 
results, which incurs diagnostics.

Alternative solution I can think of:

1. defer all diagnostics possibly incurred due to overloading resolution 
change. Since the host device function is invalid, it cannot really be used by 
device code. As long as it is not really emitted, it should be OK. However, 
this requires all the possible diagnostics incurred due to overloading 
resolution change to be deferred. This requires some significant changes since 
1) the PartialDiagnosticBuilder currently has limited input types than the 
DiagnosticBuilder; 2) the diagnostic to be deferred may have accompanying notes 
which need to be deferred in coordination;  3) If there are control flow 
changes depending on whether diagnostics happen, they need to be modified so 
that compilation will continue.

2. change precedence of host-ness: If selection of a candidate will incur 
error, then it is not favored over host-ness, i.e. we would rather choose a 
wrong-sided candidate that does not cause other error, than choosing a implicit 
host device candidate that causes other error.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to