Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

Mark Millard Wed, 17 Feb 2016 21:24:07 -0800

My fpr related notes/worries about the fix were wrong.

I finally got some time to look at this again and I see that I somehow missed 
the following code when I looked before:

  // The calling convention either uses 1-2 GPRs or 1 FPR.
  Address NumRegsAddr = Address::invalid();
  if (isInt || IsSoftFloatABI) {
    NumRegsAddr = Builder.CreateStructGEP(VAList, 0, CharUnits::Zero(), "gpr");
  } else {
    NumRegsAddr = Builder.CreateStructGEP(VAList, 1, CharUnits::One(), "fpr");
  }

So the

    Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);

in Case 2 is tracking gpr vs. fpr usage contexts as it should. Also:

  llvm::Value *NumRegs = Builder.CreateLoad(NumRegsAddr, "numUsedRegs");

  // "Align" the register count when TY is i64.
  if (isI64 || (isF64 && IsSoftFloatABI)) {
    NumRegs = Builder.CreateAdd(NumRegs, Builder.getInt8(1));
    NumRegs = Builder.CreateAnd(NumRegs, Builder.getInt8((uint8_t) ~1U));
  }

  llvm::Value *CC =
      Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");

is using the same bounds check figure (8) for gpr and fpr.

Apparently that common bound is one reason that the clang numbering is not the 
same as the ABI document's numbering: clang's numbering allows using the same 
figure for both contexts. (Given the prior alignment for isI64 (or isF64 with 
IsSoftFloatABI).)

Sorry for the prior noise about fpr.

It is still true that DOUBLE_OR_FLOAT is untested so far.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-16, at 5:51 AM, Mark Millard <[email protected]> wrote:

By the way: Nothing tested or seen so far checks DOUBLE_OR_FLOAT handling.

That involves fr (fpr in va_list in clang terms) instead of gr/gpr. fr/fpr has 
its own independent count and bound for using floating point registers vs. 
using the overflow area. There is also condition register bit 6 that indicates 
if floating point is involved overall or not.

Ultimately which of gpr vs. fpr and which bound (if the numbers are distinct in 
value) depends on the type specified in va_arg (SIMPLE_ARG/LONG_LONG vs. 
DOUBLE_OR_FLOAT status).

This may mean that the fix is an improvement for some types of usage but not a 
complete update: It is wrong for DOUBLE_OR_FLOAT instances of var_arg as 
stands. fpr would need to be involved instead. For world I expect it is fairly 
generally an improvement.

Also if the condition register indicates floating point is involved overall 
then there is likely management/handling of floating point state (for context 
switching management). (If it indicates no floating point involvement then 
there might be optimizations possible.)

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-16, at 2:45 AM, Mark Millard <[email protected]> wrote:

I used:

> Index: /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp
> ===================================================================
> --- /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp      
> (revision 295601)
> +++ /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp      
> (working copy)
> @@ -3569,6 +3569,8 @@
>  {
>    CGF.EmitBlock(UsingOverflow);
> 
> +    Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);
> +
>    // Everything in the overflow area is rounded up to a size of at least 4.
>    CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);

as my test change. (Get evidence of operation before potential cleanup of the 
duplicated 8's.)

After a full buildworld/installworld based on the updated compiler. . .

My simple example of the problem no longer fails.

"ls -l -n /" now works.

"svnlite update -r295601 /usr/src" now works.

So whatever you want to do for the details of any submitted code, the basics of 
the change do avoid the SEGVs and allow these programs to work.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-15, at 4:27 PM, Mark Millard <[email protected]> wrote:

On 2016-Feb-15, at 1:20 PM, Mark Millard <markmi at dsl-only.net> wrote:
> 
> On 2016-Feb-15, at 12:18 PM, Roman Divacky <rdivacky at vlakno.cz> wrote:
>> 
>> On Mon, Feb 15, 2016 at 12:17:50PM -0800, Mark Millard wrote:
>>> On 2016-Feb-15, at 11:11 AM, Roman Divacky <rdivacky at vlakno.cz> wrote:
>>>> 
>>>> Mark, I believe you're right. What do you think about this patch?
>>>> 
>>>> Index: tools/clang/lib/CodeGen/TargetInfo.cpp
>>>> ===================================================================
>>>> --- tools/clang/lib/CodeGen/TargetInfo.cpp (revision 260852)
>>>> +++ tools/clang/lib/CodeGen/TargetInfo.cpp (working copy)
>>>> @@ -3599,6 +3599,8 @@
>>>> {
>>>> CGF.EmitBlock(UsingOverflow);
>>>> 
>>>> +    Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
>>>> +
>>>> // Everything in the overflow area is rounded up to a size of at least 4.
>>>> CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);
>>>> 
>>>> 
>>>> Can you test it?
>>> 
>>> It may be later today before I can start the the test process.
>>> 
>>> While your change is not wrong as presented, it does seem to be based on 
>>> the ABI document's numbering with the range 3 <= gr <12, where 3 <= gr < 11 
>>> cover r3-r10 use and gr=11 implies overflow stack area use. (gr being the 
>>> ABI documents name.)
>>> 
>>> The clang code generation that I saw while analyzing the problem and the 
>>> clang source that you had me look at did not use that numbering. Instead it 
>>> seems to be based on 0 <= gpr < 9, where 0 <= gpr < 8 cover r3-r10 use and 
>>> gpr=8 implies overflow stack area use. (gpr being what gdb showed me as I 
>>> remember.) In other words: clang counts the number of "parameter registers" 
>>> already in use as it goes along instead of tracking register numbers that 
>>> have been used.
>>> 
>>> So assigning any value that appears to be positive and >= 8 should work, 
>>> such as:
>>> 
>>> Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);
>> 
>> Can you check what number gcc uses? We want to be interoperable with gcc.
>> 
>> Anyway, thanks for testing!
>> 
>> Roman
> 
> I'll do that check of gcc 4.2.1 code generation before starting the test 
> later today.
> 
> But if the clang numbering is different in gcc 4.2.1 then far more than just 
> adding a
> 
>> Builder.CreateStore(Builder.getInt8(?), NumRegsAddr)
> 
> 
> for some "?" would need to be involved in the changes in order to reach 
> compatibility.
> 
> 
> I'll note that for clang 3.8.0 the actual comparison instruction generated is 
> of the form
> 
>> cmplwi  r?,7
> 
> 
> for some r?, such as r5 or r4, and the conditional branch generated is a bgt 
> instruction.
> 
> ===
> Mark Millard
> markmi at dsl-only.net

gcc 4.2.1 generates comparison instructions for va_arg of the form:

cmplwi  cr7,r0,8

and the conditional branch generated is a "bge cr7, . . ." instruction.

So the same number range is in use by both compilers: They are compatible for 
the bounds checks for reg vs. overflow for how they count, equality 
inclusion/exclusion matching up with the specific number (8 vs. 7) used to make 
things the same overall.

Other aspects of the code generation distinctions would take me time to 
analyze. It will be a while before I will be looking at other points.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "[email protected]"

Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

Reply via email to