[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

Iain Sandoe via Phabricator via cfe-commits Tue, 11 Oct 2022 01:21:03 -0700

iains added a comment.

In D134267#3848745 <https://reviews.llvm.org/D134267#3848745>, @ChuanqiXu wrote:


>> FAOD: I think this should not be about "who's patches" - but about the 
>> compilation model and command lines that we want to end up with.
>
> BTW, in the current context, when I say "my" and "your", I just want to refer 
> the corresponding things. There is no means to offend.

sure - no problem.

I guess we ought to be a bit more clear:
There is implementation divergence between GCC and clang and from the point to 
view of users and build systems having two different command line sets is not 
helpful.

The patches I drafted were aimed at removing that divergence.
If that is not possible (because the ODR analysis requires steps that make it 
very difficult or inefficient to do so) then some more thought and/or 
restructuring is needed.

> In D134267#3848672 <https://reviews.llvm.org/D134267#3848672>, @iains wrote:
>
>>> (2) Your scheme saves the time for deserialization. However, clang 
>>> implement many ODR checks in the deserialization part. Skipping the 
>>> deserialization will omit these ODR checks, which will break the semantics. 
>>> And this is the reason why that scheme may not be good.
>>
>> What you seem to be saying here is that:
>>
>> `clang++ -std=c++20 foo.cxxm -c -o foo,o`
>>
>> edit;  maybe I should have written -xc++ foo.cxxm
>>
>> Does different ODR analysis from:
>>
>> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
>> `clang++ -std=c++20 foo.pcm -o foo.o`
>>
>> If so, then that is of some concern since those are both valid compilations 
>> in the current compiler.
>>
>> In the first case, the back end is connected as an AST consumer to the front 
>> - there is no serialise/deserialise.
>
> No. **Currently**, in clang, the following two is the same:
>
> `clang++ -std=c++20 foo.cxxm -c -o foo.o`

That's why I added the edit ...

`clang++ -std=c++20 -xc++ foo.cxxm -c -o foo.o`

AFAIU that will not produce any error for the user?

suppose I have foo.cxx which includes modules and has local declarations?
`clang++ -std=c++20  foo.cxx -c -o foo.o`

> with
>
> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
> `clang++ -std=c++20 foo.pcm -o foo.o`
>
> The first compilation will generate `.pcm` files in the `/tmp` directories 
> and compile the `.pcm` files in the `/tmp` directories.

yes, I should have added the -xc++ in the first case ;)

> We could verify it by inserting debugging informations. Or writing a 
> ODR-mismatch example, then when the compiler emits an error, we'll find the 
> compiler emits the error in the deserialization part. I met these 2 
> situations for many times : )
>
>> The model I implemented simply streams the AST instead of throwing it away 
>> ...
>
> So that model misses the ODR checks, which is not good. Since it matters with 
> the semantics.

Understood - I wonder if we have layering right here - surely Sema should be 
responsible for ODR checks?
Is there any reason that relevant hashes cannot be generated on the AST 
//**without**// streaming it?
Was this model intended from the start ?
(the fact that one can connect a direct AST consumer to the back end, suggests 
that enforcing serialisation/deserialisation was not expected to be a 
requirement at some stage)

>> (repeating here, it is not the deserialisation that I think is key [although 
>> that is also useful] but minimising process launches).
>
> So from my point of view, it is much  more important to preserve the 
> semantics than minimizing the process launches.
>
>> That is a possibility, at the expense of more process launches - and I do 
>> not think that the corresponding model for the server-interactive approach 
>> has yet been fully explored.
>>
>> We go to some trouble (with the driver invoking the compiler in-process) to 
>> avoid unnecessary process launches .. with the current state of hardware, it 
>> could well be more efficient to have many processes waiting rather than many 
>> processes launched.
>>
>> (I am not making a claim here, just observing that the data are incomplete, 
>> since we do not have a corresponding model offered for the client-server 
>> case, and we do not have any kind of measurements - these are all "thought 
>> experiments").
>
> Yeah, I agree that there more space to explore in the client-server model. 
> But for the current building model, the 2 phase compilation is obviously 
> better than the 1 phase compilation. And my point is, if one day we proved 
> that there are many good points in the client/server model, it is not too 
> late to add it in clang. And **currently** it is not a good reason to block 
> the current patch, at least it is pretty simple and innocent.
>
>> There seem to be two schools of thought. (1) It is good to use extensions to 
>> determine the file contents (2) we should not impose a requirement for magic 
>> extension names if the implementation can determine the content without.
>>
>> For my part, I do prefer the second .. but of course will be happy to go 
>> with consensus,
>
> I'm confused the second. I mean what if the other static analyzing tools 
> (including build system but not limited to) want to analyze the modular 
> project, if they can only invoke the compiler to query the result. Then 
> things look complex enough in the state. But assume we can implement great 
> compiler interfaces. There are still problems. The tools need to pass the 
> potential module unit files to the compiler. With the first choice, the tools 
> can pass the module unit files precisely while with the second one, the input 
> number will be much larger. So it looks really worse to me...
>
> ----
>
> Since we discussed many things. Let me try to summarize to avoid missing the 
> topic.
>
> (1) For the client-server model, I think it is too far now. I am not against 
> it. Also I don't feel the current simple patch would is not good for it.

Yeah, I get that - what we need is an example of how both 'pre-scan' and 
'discovery' schemes would work.

> (2) Your implementation skips the deserialization part, which misses many ODR 
> checks. I think this is not good.

We should fix the confusion here - if the serialisation is required for 
correctness, then we should diagnose an incorrect compilation.  We should also 
consider if we have broken layering - ISTM that Sema should be responsible for 
the analysis not the deserialisation.

> (3) I suddenly realize that your **design** and my **design** are not 
> overlapped in some perspective. I mean the patch is working for `*.cppm` 
> files, and your design is intended for `*.cpp` files. Then a question is: 
> "what would the patch do for `*.cppm` files"? If we don't handle it 
> specially, in your implementation, when we compile a `.cppm` file, the `.pcm` 
> file would be generated twice. Once in the `/tmp` and once in the place you 
> specified. I think this is not acceptable.

There are some driver changes associated with the patches I have (not yet 
pushed to Phab) that would avoid that case (we certainly would not want to 
generate the output twice)

You are right that the overlap is not in code, but only in concept - your patch 
would not prevent the "on demand BMI" from being added later - the only 
down-side is that two ways to do things is likely to create confusion.

> So in another shorter word, for the current states of clang, the current 
> patch is more natural and more appropriate with the current states of clang. 
> And for your patch, I feel like it is not so fit for the current clang. I'm 
> not saying not fit the current states is not good. Otherwise we don't have 
> any inventions. I just want to say it requires more work.

yes, it seems that more work is needed.

However, I will say that when I started looking at this problem, I did consider 
trying to make the driver do the work to match GCC's model but that seemed like 
a very bad fit - since that would force the driver to inspect the source (or to 
start a process that does the same) to see if there are module keywords and 
then to query the name mapping interface (whatever that is) in order to decide 
what command lines to construct.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

Reply via email to