gribozavr wrote:

## On resugaring bodies of function template instantiations

> I just don't think that can be something we can always enable, as that would 
> make some examples of source code take exponentially more time and memory to 
> compile.

We keep track of propagated and inferred nullability annotations in a [side 
data 
structure](https://github.com/google/crubit/blob/main/nullability/type_nullability.h#L185).
 Our approach allows us to save on memory by not actually re-instantiating 
everything with new sugar. 

But I do have a question - how is the built-in resugarer going to do this 
without maintaining a side data structure like we do? Clang currently does not 
have a concept of non-canonical instantiations. `std::vector<int*>`, 
`std::vector<absl::Nullable<int*>>`, `std::vector<absl::Nonnull<int*>>` are the 
same canonical type, so they must share one instantiation. Are you also 
proposing to change that?

> Right now the priority is to implement the stuff that won't take a big hit on 
> perf. But that is just priority.

But wouldn't it be better if out-of-tree users like us were not blocked on the 
priorities of your work? With the `Subst*` nodes in the AST the information is 
available for out-of-tree users to use in any way they like. Are you sure you 
can predict all use cases? Do you really want to be in the business of 
satisfying them?

---

## On (in)completeness of the out-of-tree resugarer

> I don't think, from looking at the resugarer implemented in 
> type_nullability.cc, it's anywhere close to being able to resugar through a 
> complex type like an STL vector, as shown in the example.

Actually, resugaring in nullability analysis already works for 
`std::vector::push_back()`:

```c++
#include <vector>
#include "absl/base/nullability.h"

void Test() {
  int *p = nullptr;
  std::vector<absl::Nonnull<int*>> xs;
  xs.push_back(p);
}
```

```
$ clang-tidy example.cc
example.cc:7:16: warning: expected parameter '__x' of 'std::vector<int 
*>::push_back' to be nonnull, but a nullable argument was used 
[google-runtime-pointer-nullability]
    7 |   xs.push_back(p);
      |                ^
```

> as for example both libc++ and libstdc++ vector implementations derive the 
> element type not through the first template argument, but through the 
> allocator instead.

As far as I see, libc++ does not actually use the allocator to define 
`const_reference`, `value_type` and so on:

```c++
template <class _Tp, class _Allocator /* = allocator<_Tp> */>
class _LIBCPP_TEMPLATE_VIS vector {
  typedef _Tp value_type;
  typedef value_type& reference;
  typedef const value_type& const_reference;

  /* ... */ void push_back(const_reference __x);
```

(see https://github.com/llvm/llvm-project/blob/main/libcxx/include/vector)

> I see the implementation of the resugarer in `type_nullability.cc`, and it's 
> much more incomplete that the one I am proposing to upstream. So I think if 
> you get it to work through upstream transforms, you will get there faster.

I'm not sure what you mean by "get there faster" - our implementation already 
works, is already deployed internally, and covers a lot of cases that are 
relevant in practice. We are still tweaking things of course, and we are 
working on inference tooling (that infers nullability contracts based on the 
implementation and suggests edits to the source code accordingly).

That is, the difficult part and the focus at the moment is not resugaring in 
the analysis. I do admit that our implementation of resugaring is probably not 
as comprehensive as the one that you're working on, but it works for the vast 
majority of real-world code that people write in our monorepo - we tweaked it 
based on the cases that we actually observed.

---

## On upstreaming the work

> As far as I can see, you don't need to keep the Subst* nodes after resugaring 
> if you implement:

We are in the process of adding Clang's `_Nullable` and `_Nonnull` attributes 
to the implementation of `absl::Nullable` and `absl::Nonnull`. This work is 
taking time because we need to adjust our internal codebase to conform to 
Clang's existing expectations, behaviors, and warnings that get triggered by 
these attributes. These attributes will get picked up by the in-tree resugaring 
implementation because the source of the annotation is physically spelled in 
the code. Good.

However, a part of the nullability language extension is the [pragma that 
changes the default meaning of 
pointers](https://github.com/google/crubit/blob/main/nullability/test/pragma_nonnull.h#L6)
 from "unknown" to "nonnull". That is, we interpret the following two files in 
an identical way:

```c++
#include "absl/base/nullability.h"

#pragma nullability file_default nonnull

void TakePointer(int *p);
```

```c++
#include "absl/base/nullability.h"

void TakePointer(absl::Nonnull<int *> p);
```

Unfortunately we expect this pragma to be a controversial upstream because 
Clang already has a pragma for this exact purpose (`#pragma clang 
assume_nonnull begin/end`), but it was introduced by Apple in context of 
Objective-C and some of its behavior is incompatible with C++. It is long story 
and the issues are subtle, this is not the right place to discuss it.

In the end, we need a separate pragma with slightly different semantics because 
Clang's pragma was not suitable for C++. Getting these differences reconciled 
can take months in discussions alone, and we don't want to spend time on this 
when we are still not 100% sure that the model is going to work for us in the 
long run.

---

Fundamentally, your proposal to reuse the built-in resugarer depends on 
upstreaming an incomplete language extension, which cuts against Clang's 
extension policy, https://clang.llvm.org/get_involved.html:

> Evidence of a significant user community: This is based on a number of 
> factors, including an existing user community

I don't know how to demonstrate a significant user community if the extension 
must live in the Clang core. Of course we could fork Clang, but we would need 
to maintain this fork for years. We have been working on nullability since 
April 2020 - that's 4+ years already. The cost involved in maintaining a Clang 
fork for 4+ years, compared to an out-of-tree ClangTidy check, is just 
unpalatable.

We're lucky that Clang upstream already has nullability annotations that happen 
to be a mostly good match (modulo the pragma and some other issues). But what 
about our work on lifetimes?

---

## On separating resugaring improvements and `Subst*` removal

Let me raise the level of the discussion here. We do appreciate your work on 
improving resugaring. However, simultaneously removing the 
`SubstTemplateTypeParmType` is breaking one out-of-tree language extension that 
is already deployed (nullability) and is blocking research and prototyping on 
another out-of-tree extension (lifetimes).

I do want to emphasize that nullability is already deployed, we depend on 
nullability warnings internally, and developers benefit from them greatly. We 
are still tweaking the semantics though (e.g., we are in the process of 
introducing the pragma throughout the codebase etc.), and that's why it is not 
the right time to upstream things yet.

For lifetimes work, the language extension is not yet deployed, but further 
experimentation and research would be blocked if `Subst*` nodes are removed.

Is it possible to decouple landing the improvements in resugaring and removal 
of `SubstTemplateTypeParmType`? The first you can land immediately, but the 
second blocks our work. Is it separable?

---

## On `Subst*` removal and AST fidelity

Could you provide a more detailed rationale for removing 
`SubstTemplateTypeParmType`?

Generally Clang tries to preserve information in the AST as much as possible. 
Isn't removing this node going against this principle?

https://github.com/llvm/llvm-project/pull/101858
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to