rjmccall added inline comments.

================
Comment at: clang/lib/CodeGen/CGCall.cpp:2169
+    if (!CodeGenOpts.NullPointerIsValid &&
+        getContext().getTargetAddressSpace(FI.arg_begin()->type) == 0) {
+      Attrs.addAttribute(llvm::Attribute::NonNull);
----------------
arichardson wrote:
> rjmccall wrote:
> > rsmith wrote:
> > > jdoerfert wrote:
> > > > rjmccall wrote:
> > > > > rsmith wrote:
> > > > > > jdoerfert wrote:
> > > > > > > arichardson wrote:
> > > > > > > > Isn't the `this` pointer also nonnull in other address spaces?
> > > > > > > > 
> > > > > > > > In our CHERI fork we use AS200 for the this pointer and would 
> > > > > > > > quite like to have the nonnull attribute.
> > > > > > > > I can obviously change this line locally when I next merge from 
> > > > > > > > upstream, but I would like to avoid diffs and it seems to me 
> > > > > > > > like this restriction is unnecessary.
> > > > > > > I also think `NullPointerIsValid` is sufficient. 
> > > > > > It's my understanding that:
> > > > > > * The LLVM `null` value in any address space is the all-zero-bits 
> > > > > > value.
> > > > > > * In address space zero, the `null` value does not correspond to 
> > > > > > addressable memory, but this is not assumed to hold in other 
> > > > > > address spaces.
> > > > > > * An address-space-zero `null` value that is addressspacecast to a 
> > > > > > different address space might not be the `null` in the target 
> > > > > > address space.
> > > > > > * The `nonnull` attribute implies that the pointer value is not the 
> > > > > > `null` value.
> > > > > > * A null pointer in the frontend in a non-zero address space 
> > > > > > corresponds to the value produced by an addressspacecast of an 
> > > > > > address-space-zero `null` value to the target address space.
> > > > > > 
> > > > > > That being the case, there is simply no connection between the C 
> > > > > > and C++ notion of a null pointer and a `null` LLVM pointer value in 
> > > > > > a non-zero address space in general, so it is not correct to use 
> > > > > > the `nonnull` attribute in a non-zero address space in general. 
> > > > > > Only if we know that a C++ null pointer is actually represented by 
> > > > > > the LLVM `null` value in the corresponding address space can we use 
> > > > > > the `nonnull` attribute to expose that fact to LLVM. And we do not 
> > > > > > know that in general.
> > > > > I think all of this is correct except that the frontend does know 
> > > > > what the bit-pattern of the null pointer is in any particular 
> > > > > language-level address space, and it knows what the language-level 
> > > > > address space of `this` is.  So we should be able to ask whether the 
> > > > > null value in the `this` address space is the all-zero value and use 
> > > > > that to decide whether to apply `nonnull`.
> > > > Hm, I think the problem is that we don't couple `NullPointerIsValid` 
> > > > with the address space. As I said before. In LLVM-IR, if we don't have 
> > > > the `null-pointer-is-valid` property, then "memory access" implies 
> > > > `dereferenceable` implies `nonnull`. Now, we usually assume non-0 
> > > > address spaces to have the above property, but that should be 
> > > > decoupled. The query if "null is valid" should take the function and 
> > > > the AS, as it does conceptually in LLVM-core, and then decide if 
> > > > `null-pointer-is-valid` or not. If we had that, @arichardson could 
> > > > define that AS200 does not have valid null pointers. If you do that 
> > > > right now the IR passes will at least deduce `nonnull` based on 
> > > > `derferenceable`.
> > > > I think all of this is correct except that the frontend does know what 
> > > > the bit-pattern of the null pointer is in any particular language-level 
> > > > address space
> > > 
> > > Interesting. How do we get at that? Do we ask the middle-end to 
> > > constant-fold `addrspacecast i8* null to i8* addrspace(N)` and see if we 
> > > get a `null` back, or is there a better way?
> > > 
> > > In any case, this patch follows the same pattern we use for return values 
> > > of reference type, parameters of reference type, and decayed array 
> > > function parameters with static array bounds, all of which apply 
> > > `nonnull` only in address space 0. If we want to use a different pattern, 
> > > to consider whether LLVM's `nonnull` means "not a null pointer" rather 
> > > than assuming that holds only in address space 0, we should make a 
> > > holistic change for that throughout CGCall.cpp, rather than touching only 
> > > the handling of the `this` pointer.
> > > 
> > > It'd also be interesting to consider what we want 
> > > `__attribute__((nonnull))` to mean in address spaces where the null 
> > > pointer isn't the zero pointer: should it mean non-zero at the source 
> > > level / non-null in LLVM IR, or should it mean non-null at the source 
> > > level (which might be unrepresentable in LLVM IR, but static analysis 
> > > etc. could still detect it)? We're currently inconsistent on this: static 
> > > analysis, constant evaluation, and sanitizers treat the attribute as 
> > > meaning non-null, but IR generation emits the `nonnull` IR attribute, 
> > > treating it as meaning non-zero instead.
> > > Interesting. How do we get at that? Do we ask the middle-end to 
> > > constant-fold addrspacecast i8* null to i8* addrspace(N) and see if we 
> > > get a null back, or is there a better way?
> > 
> > In fact, the middle-end has no ability to constant-fold `addrspacecast`, 
> > because LLVM's address-space support is a giant pile of hacks rather than 
> > exhibiting any cohesive design principles; or to put it another way, the 
> > lack of any centralized technical design authority in LLVM means that 
> > nobody has been able to enforce any rules about, say, targets being able to 
> > contribute knowledge about address spaces to the middle-end.
> > 
> > But in the frontend, we essentially have our own independent concept of 
> > address spaces, and the mapping from an AST address space to an IR address 
> > space can be non-obvious (and, in particular, non-injective), targets can 
> > perform arbitrary IR for address-space conversions instead of just using 
> > `addrspacecast`, and so on.  This is necessary because we have to actually 
> > ship a functional compiler, whether with LLVM or despite it.  And one of 
> > the places where it's necessary to work around LLVM is emitting a null 
> > pointer constant, because of course C requires a null pointer constant to 
> > be a constant expression, which in LLVM means we have to produce an 
> > `llvm::Constant*` for it, and we don't get to rely on assumptions like 
> > `addrspacecast` having appropriate semantics.  So instead we just have 
> > `ASTContext::getTargetNullPointerValue`, which is ultimately fed by the 
> > TargetInfo, and we expect that to be consistent with the behavior of e.g. 
> > the address-space-conversion customization point.
> > 
> > I think `__attribute__((nonnull))` clearly needs to be based on the 
> > language-level null pointer concept rather than having anything to do with 
> > a zero bit representation, and if that means we can't use it in IR, maybe 
> > we can improve IR.
> > > I think all of this is correct except that the frontend does know what 
> > > the bit-pattern of the null pointer is in any particular language-level 
> > > address space
> > 
> > Interesting. How do we get at that? Do we ask the middle-end to 
> > constant-fold `addrspacecast i8* null to i8* addrspace(N)` and see if we 
> > get a `null` back, or is there a better way?
> > 
> We try rather hard to avoid any addrspacecasts in the IR and emit `i8* 
> addrspace(200) null` for language-level NULL pointers.
> 
> A bit more context: In our compiler we support two compilation modes: 
> so-called "pure-capability" where we use AS 200 for all pointers. This maps 
> to using capability instructions in the backend (effectively 128-bit fat 
> pointers with a hidden validity tag). We also support a hybrid compilation 
> mode where we emit everything in AS0 (64-bit integer pointers) by default and 
> only use AS200 for annotated language-level pointers.
> In both of these address spaces a zero bit-pattern is NULL and not 
> dereferenceable. 
> In the "pure-capability" mode our AS200 is effectively the default AS0. 
> However, we can't use that since we need to use a different AS to select our 
> instructions in the backend since we extend the MIPS,RISC-V and AArch64 
> backends and AS0 already maps to integer loads/stores/etc.
> 
> So the problem as I see it is that clang needs to be able to tell that a for 
> a given LLVM IR address space llvm IR `null` (i.e. all zero bits) maps to 
> C/C++ language-level `NULL/nullptr`?
> 
> However, I did not see anywhere in the langref that `null` is equivalent to 
> "all-zero-bits", but I may have missed that part.
> All I saw about null was:
> 
> > Null pointer constants
> > The identifier ‘null’ is recognized as a null pointer constant and must be 
> > of pointer type.
> 
> So unless I missed something it seems like AS<N> could define null as being 
> all-ones instead, and that could then map to the representation of 
> language-level NULL on machines where it is not all-zeroes?
> 
> 
> In our fork I added a local hack to replace the checks for adding the nonnull 
> attribute if AS==0 with a `canMarkAsNonNull` helper that checks for 
> `AS==0||AS==200`. Maybe there should be a helper function that checks whether 
> language-level NULL maps to null in address space N, but that would probably 
> require making null-pointer-is-valid a per-AS property?
> 
> 
LLVM's assumption that `null` (`ConstantPointerNull`)  is a zero bit-pattern is 
reflected in places like `isNullValue()`, which (despite the name) actually 
determines whether the value is a zero bit-pattern, as you can see from how it 
operates on `ConstantFP` and so on.

But if your null pointers are actually the zero bit-pattern across all address 
spaces, I don't think that would cause you any problems.

We should be able to just replace this check with 
`getContext().getTargetNullPointerValue(FI.arg_begin()->type) == 0`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D17993/new/

https://reviews.llvm.org/D17993

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to