naveen-seth wrote:

Hi, sorry! I left a comment earlier but noticed that a large part of the 
explanation was incorrect and deleted it. Here is the fixed comment:

At first I thought that the issue can be fixed by just checking for duplicates 
while parsing the command-line arguments, but that doesn't seem to be the case.

I will try to describe the crash point and the crash reason with this example 
(without duplicate arguments):

```cpp
//--- a.cppm
export module A;

export int a() {
  return 41;
}
```

```cpp
//--- b.cppm
export module B;
import A;

export int b() {
  return a() + 1;
}
```

```cpp
//--- c.cppm
export module C;

export int c() {
  return 42;
}
```

```cpp
//--- main.cpp
import B;

int main() {
  return b();
}
```

```
# Correctly precompile the modules
clang++ -std=c++20 --precompile a.cppm -o a.pcm
clang++ -std=c++20 --precompile b.cppm -fmodule-file=A=a.pcm -o b.pcm
clang++ -std=c++20 --precompile c.cppm -o c.pcm

# Bad use of -fmodule-file=...
clang++ -std=c++20 main.cpp -std=c++20 main.cpp -fmodule-file=B=b.pcm 
-fmodule-file=A=c.pcm
```

After `import B;` is parsed from `main.cpp`, the following sequence of function 
calls occurs (from top to bottom): 

…
`Sema::ActOnModuleImport`
`CompilerInstance::loadModule`
`CompilerInstance::findOrCompileModuleAndReadAST`
`ASTReader::ReadAST`
`ASTReader::ReadASTCore`
`ASTReader::ReadControlBlock`

When reading a modules control block in `ASTReader::ReadControlBlock`, Clang 
also attempts to get the `ModuleFile` for each of its imports ([see 
here](https://github.com/llvm/llvm-project/blob/943a70717c629f43b309ab56e8141ffb131871a6/clang/lib/Serialization/ASTReader.cpp#L3024)).
First, the matching BMI file for an imported module is looked up, then Clang 
calls back to `ASTReader::ReadASTCore` to ensure the BMI files `ModuleFile`.

Currently (prior to this PR), there is no validation to ensure that the BMI 
file actually contains the module clang is trying to load after having read the 
module name in `ReadControlBlock`.

Because of the faulty `-fmodule-file=` mapping, the `ModuleFile` of the import 
will refer to a BMI built for a different module than the one clang is trying 
to load.
Here, the import of `A` in `b.cppm` will result in loading the `ModuleFile` 
corresponding to `c.pcm` while deserializing `b.pcm`. 

The crash occurs because Clang, assuming it has the correct `ModuleFile`s, 
continues reading AST blocks and attempts to resolve `ModuleFile` remappings 
(any of the mappings mentioned 
[here](https://github.com/llvm/llvm-project/blob/74ace4fd3c71ff59f2d89680c0f1382d0f9933f4/clang/lib/Serialization/ASTReader.cpp#L8765)).
 This results in invalid access into members of the invalid `ModuleFile`s and 
finally the crash.

(Depending on the specific example, the exact location of the crash will be 
different, but it will always happen while trying to resolve the remappings.)

While writing this comment, I noticed that my commit and PR message describe 
the cause of the crash incorrectly and have edited both.

https://github.com/llvm/llvm-project/pull/133462
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to