ChuanqiXu9 wrote:

> > If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits 
> > for serialization. We can at most use 64 bits for a slot. In that case, we 
> > can only assume the offset of source location **in its own module** (not 
> > the global offset!) is not large than 2^32. I feel this assumption may be 
> > valid in a lot of places.
> > Or otherwise we can use less bits for module file index (32 bits seems to 
> > be too much honestly), then we can use higher 16 bits to store the module 
> > file index, and leave the lower 48 bits to store the source location. In 
> > this case, the assumption becomes to "the offset of the source location may 
> > not large than 2^48". But it is slightly hard to believe we can reach such 
> > extreme cases.
> 
> Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` 
> typedefs) wants to weight in here.
> 
> > I **tried** to encode this as two separate 32bit values. But it will break 
> > too many codes. Since a lot of places assume that we can encode the source 
> > location as an uint64_t.
> > What I mean is, with VBR6 format 
> > (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can 
> > save more space for small integers in **on-disk** .pcm files (the memory 
> > representation should be the same). For example, for a 64 bits unsigned int 
> > `1`, VBR6 can use only 6 bits to store that `000001` to represent the 64 
> > bits value `1` in the on-disk representations. So that even if I don't use 
> > more slots to store the module file index, the size of the .pcm files will 
> > increase after all.
> 
> Right. My thinking was that single 64bit value with the module file index in 
> the upper 32 bits would basically disable VBR6 encoding for the lower 32 
> bits. If we split this thing into two separate 32bit values, we are more 
> likely to VBR6 encode both of them. But this would actually increase size for 
> (what I assume is the most common case) local source locations. 

Yes, this is the trade offs.

> Still, I think having a rough idea of how alternative implementations compare 
> would be great.
> 
> Do you have any data on how much recompilation this can save for real world 
> projects?

I don't have a specific data though. But I think it is understandable that the 
feature is super important. And we can save countless unnecessary compilations 
after this feature.

After the patch, for reduced BMI, we're already able to avoid changing the BMI 
if we only change the definitions of non-inline function bodies in the module 
unit.

Further more, based on this patch, we can do further optimizations. e.g., if we 
split declaration ID like we did in this patch, we may be able to avoid 
recompilations if a unreferenced module unit adds or deletes declarations. This 
may be pretty common in practice:

```
export module interface:part1;
...

//--- 
export module interface:part2;
...

//--- export module interface;
export import :part1;
export import :part2;

//--- consumer.cppm
export module consumer;
import interface;
// only used declarations in interface:part1;
```

Then if the user only changes module unit `interface:part2`, then we can keep 
the BMI for `consumer.cppm` the same. That implies every user of 
`consumer.cppm` can avoid recompilations.

---

I see this is a killer feature for C++20 modules. I think it is significantly 
valuable. So I really want to make this (including later optimizations) into 
clang19.

https://github.com/llvm/llvm-project/pull/86912
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to