arichardson added inline comments.
================ Comment at: clang/lib/CodeGen/CGCall.cpp:2169 + if (!CodeGenOpts.NullPointerIsValid && + getContext().getTargetAddressSpace(FI.arg_begin()->type) == 0) { + Attrs.addAttribute(llvm::Attribute::NonNull); ---------------- rjmccall wrote: > arichardson wrote: > > rjmccall wrote: > > > rsmith wrote: > > > > jdoerfert wrote: > > > > > rjmccall wrote: > > > > > > rsmith wrote: > > > > > > > jdoerfert wrote: > > > > > > > > arichardson wrote: > > > > > > > > > Isn't the `this` pointer also nonnull in other address spaces? > > > > > > > > > > > > > > > > > > In our CHERI fork we use AS200 for the this pointer and would > > > > > > > > > quite like to have the nonnull attribute. > > > > > > > > > I can obviously change this line locally when I next merge > > > > > > > > > from upstream, but I would like to avoid diffs and it seems > > > > > > > > > to me like this restriction is unnecessary. > > > > > > > > I also think `NullPointerIsValid` is sufficient. > > > > > > > It's my understanding that: > > > > > > > * The LLVM `null` value in any address space is the all-zero-bits > > > > > > > value. > > > > > > > * In address space zero, the `null` value does not correspond to > > > > > > > addressable memory, but this is not assumed to hold in other > > > > > > > address spaces. > > > > > > > * An address-space-zero `null` value that is addressspacecast to > > > > > > > a different address space might not be the `null` in the target > > > > > > > address space. > > > > > > > * The `nonnull` attribute implies that the pointer value is not > > > > > > > the `null` value. > > > > > > > * A null pointer in the frontend in a non-zero address space > > > > > > > corresponds to the value produced by an addressspacecast of an > > > > > > > address-space-zero `null` value to the target address space. > > > > > > > > > > > > > > That being the case, there is simply no connection between the C > > > > > > > and C++ notion of a null pointer and a `null` LLVM pointer value > > > > > > > in a non-zero address space in general, so it is not correct to > > > > > > > use the `nonnull` attribute in a non-zero address space in > > > > > > > general. Only if we know that a C++ null pointer is actually > > > > > > > represented by the LLVM `null` value in the corresponding address > > > > > > > space can we use the `nonnull` attribute to expose that fact to > > > > > > > LLVM. And we do not know that in general. > > > > > > I think all of this is correct except that the frontend does know > > > > > > what the bit-pattern of the null pointer is in any particular > > > > > > language-level address space, and it knows what the language-level > > > > > > address space of `this` is. So we should be able to ask whether > > > > > > the null value in the `this` address space is the all-zero value > > > > > > and use that to decide whether to apply `nonnull`. > > > > > Hm, I think the problem is that we don't couple `NullPointerIsValid` > > > > > with the address space. As I said before. In LLVM-IR, if we don't > > > > > have the `null-pointer-is-valid` property, then "memory access" > > > > > implies `dereferenceable` implies `nonnull`. Now, we usually assume > > > > > non-0 address spaces to have the above property, but that should be > > > > > decoupled. The query if "null is valid" should take the function and > > > > > the AS, as it does conceptually in LLVM-core, and then decide if > > > > > `null-pointer-is-valid` or not. If we had that, @arichardson could > > > > > define that AS200 does not have valid null pointers. If you do that > > > > > right now the IR passes will at least deduce `nonnull` based on > > > > > `derferenceable`. > > > > > I think all of this is correct except that the frontend does know > > > > > what the bit-pattern of the null pointer is in any particular > > > > > language-level address space > > > > > > > > Interesting. How do we get at that? Do we ask the middle-end to > > > > constant-fold `addrspacecast i8* null to i8* addrspace(N)` and see if > > > > we get a `null` back, or is there a better way? > > > > > > > > In any case, this patch follows the same pattern we use for return > > > > values of reference type, parameters of reference type, and decayed > > > > array function parameters with static array bounds, all of which apply > > > > `nonnull` only in address space 0. If we want to use a different > > > > pattern, to consider whether LLVM's `nonnull` means "not a null > > > > pointer" rather than assuming that holds only in address space 0, we > > > > should make a holistic change for that throughout CGCall.cpp, rather > > > > than touching only the handling of the `this` pointer. > > > > > > > > It'd also be interesting to consider what we want > > > > `__attribute__((nonnull))` to mean in address spaces where the null > > > > pointer isn't the zero pointer: should it mean non-zero at the source > > > > level / non-null in LLVM IR, or should it mean non-null at the source > > > > level (which might be unrepresentable in LLVM IR, but static analysis > > > > etc. could still detect it)? We're currently inconsistent on this: > > > > static analysis, constant evaluation, and sanitizers treat the > > > > attribute as meaning non-null, but IR generation emits the `nonnull` IR > > > > attribute, treating it as meaning non-zero instead. > > > > Interesting. How do we get at that? Do we ask the middle-end to > > > > constant-fold addrspacecast i8* null to i8* addrspace(N) and see if we > > > > get a null back, or is there a better way? > > > > > > In fact, the middle-end has no ability to constant-fold `addrspacecast`, > > > because LLVM's address-space support is a giant pile of hacks rather than > > > exhibiting any cohesive design principles; or to put it another way, the > > > lack of any centralized technical design authority in LLVM means that > > > nobody has been able to enforce any rules about, say, targets being able > > > to contribute knowledge about address spaces to the middle-end. > > > > > > But in the frontend, we essentially have our own independent concept of > > > address spaces, and the mapping from an AST address space to an IR > > > address space can be non-obvious (and, in particular, non-injective), > > > targets can perform arbitrary IR for address-space conversions instead of > > > just using `addrspacecast`, and so on. This is necessary because we have > > > to actually ship a functional compiler, whether with LLVM or despite it. > > > And one of the places where it's necessary to work around LLVM is > > > emitting a null pointer constant, because of course C requires a null > > > pointer constant to be a constant expression, which in LLVM means we have > > > to produce an `llvm::Constant*` for it, and we don't get to rely on > > > assumptions like `addrspacecast` having appropriate semantics. So > > > instead we just have `ASTContext::getTargetNullPointerValue`, which is > > > ultimately fed by the TargetInfo, and we expect that to be consistent > > > with the behavior of e.g. the address-space-conversion customization > > > point. > > > > > > I think `__attribute__((nonnull))` clearly needs to be based on the > > > language-level null pointer concept rather than having anything to do > > > with a zero bit representation, and if that means we can't use it in IR, > > > maybe we can improve IR. > > > > I think all of this is correct except that the frontend does know what > > > > the bit-pattern of the null pointer is in any particular language-level > > > > address space > > > > > > Interesting. How do we get at that? Do we ask the middle-end to > > > constant-fold `addrspacecast i8* null to i8* addrspace(N)` and see if we > > > get a `null` back, or is there a better way? > > > > > We try rather hard to avoid any addrspacecasts in the IR and emit `i8* > > addrspace(200) null` for language-level NULL pointers. > > > > A bit more context: In our compiler we support two compilation modes: > > so-called "pure-capability" where we use AS 200 for all pointers. This maps > > to using capability instructions in the backend (effectively 128-bit fat > > pointers with a hidden validity tag). We also support a hybrid compilation > > mode where we emit everything in AS0 (64-bit integer pointers) by default > > and only use AS200 for annotated language-level pointers. > > In both of these address spaces a zero bit-pattern is NULL and not > > dereferenceable. > > In the "pure-capability" mode our AS200 is effectively the default AS0. > > However, we can't use that since we need to use a different AS to select > > our instructions in the backend since we extend the MIPS,RISC-V and AArch64 > > backends and AS0 already maps to integer loads/stores/etc. > > > > So the problem as I see it is that clang needs to be able to tell that a > > for a given LLVM IR address space llvm IR `null` (i.e. all zero bits) maps > > to C/C++ language-level `NULL/nullptr`? > > > > However, I did not see anywhere in the langref that `null` is equivalent to > > "all-zero-bits", but I may have missed that part. > > All I saw about null was: > > > > > Null pointer constants > > > The identifier ‘null’ is recognized as a null pointer constant and must > > > be of pointer type. > > > > So unless I missed something it seems like AS<N> could define null as being > > all-ones instead, and that could then map to the representation of > > language-level NULL on machines where it is not all-zeroes? > > > > > > In our fork I added a local hack to replace the checks for adding the > > nonnull attribute if AS==0 with a `canMarkAsNonNull` helper that checks for > > `AS==0||AS==200`. Maybe there should be a helper function that checks > > whether language-level NULL maps to null in address space N, but that would > > probably require making null-pointer-is-valid a per-AS property? > > > > > LLVM's assumption that `null` (`ConstantPointerNull`) is a zero bit-pattern > is reflected in places like `isNullValue()`, which (despite the name) > actually determines whether the value is a zero bit-pattern, as you can see > from how it operates on `ConstantFP` and so on. > > But if your null pointers are actually the zero bit-pattern across all > address spaces, I don't think that would cause you any problems. > > We should be able to just replace this check with > `getContext().getTargetNullPointerValue(FI.arg_begin()->type) == 0`. That sounds like a good solution, and we should probably also be explicit in the Langref that null is assumed to be a zero bit-pattern. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D17993/new/ https://reviews.llvm.org/D17993 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits