Re: [PATCH] Make GO string literals properly NUL terminated

Bernd Edlinger Mon, 20 Aug 2018 05:10:53 -0700

On 08/20/18 13:01, Richard Biener wrote:
> On Wed, Aug 1, 2018 at 3:05 PM Bernd Edlinger <bernd.edlin...@hotmail.de> 
> wrote:
>>
>>
>>
>> On 08/01/18 11:29, Richard Biener wrote:
>>>
>>> Hmm.  I think it would be nice if TREE_STRING_LENGTH would
>>> match char[2] and TYPE_SIZE_UNIT even if that is inconvenient
>>> for your check above.  Because the '\0' doesn't belong to the
>>> string.  Then build_string internally appends a '\0' outside
>>> of TREE_STRING_LENGTH.
>>>
>>
>> Hmm. Yes, but the outside-0 byte is just one byte, not a wide
>> character.
> 
> That could be fixed though (a wide 0 is just N 0s).  Add an elsz = 1
> parameter to build_string and allocate as many extra 0s as needed.
> 
>> There are STRING_CSTs which are not string literals,
>> for instance attribute tags, pragmas, asm constraints, etc.
>> They use the '\0' outside, and have probably no TREE_TYPE.
>>
>>>
>>>> So I would like to be able to assume that the STRING_CST objects
>>>> are internally always generated properly by the front end.
>>>
>>> Yeah, I guess we need to define what "properly" is ;)
>>>
>> Yes.
>>
>>>> And that the ARRAY_TYPE of the string literal either has the
>>>> same length as the TREE_STRING_LENGTH or if it is shorter,
>>>> this is always exactly one (wide) character size less than
>>>> TREE_STRING_LENGTH
>>>
>>> I think it should be always the same...
>>>
>>
>> One could not differentiate between "\0" without zero-termination
>> and "" with zero-termination, theoretically.
> 
> Is that important?  Doesn't the C standard say how to parse string literals?
> 
>> We also have char x[100] = "ab";
>> that is TREE_STRING_LENGTH=3, and TYPE_SIZE_UNIT(TREE_TYPE(x)) = 100.
>> Of course one could create it with a TREE_STRING_LENGTH = 100,
>> but imagine char x[100000000000] = "ab"
> 
> The question is more about TYPE_SIZE_UNIT (TREE_TYPE ("ab")) which I
> hope matches "ab" and not 'x'.  If it matches 'x' then I'd rather have it
> unconditionally be [], thus incomplete (because the literal's "size" depends
> on the context/LHS it is used on).
>


Sorry, but I must say, it is not at all like that.

If I compile x.c:
const char x[100] = "ab";

and set a breakpoint at output_constant:

Breakpoint 1, output_constant (exp=0x7ffff6ff9dc8, size=100, align=256,
     reverse=false) at ../../gcc-trunk/gcc/varasm.c:4821
4821      if (size == 0 || flag_syntax_only)
(gdb) p size
$1 = 100
(gdb) call debug(exp)
"ab"
(gdb) p *exp
$2 = {base = {code = STRING_CST, side_effects_flag = 0, constant_flag = 1,
(gdb) p exp->typed.type->type_common.size_unit
$5 = (tree) 0x7ffff6ff9d80
(gdb) call debug(exp->typed.type->type_common.size_unit)
100
(gdb) p exp->string.length
$6 = 3
(gdb) p exp->string.str[0]
$8 = 97 'a'
(gdb) p exp->string.str[1]
$9 = 98 'b'
(gdb) p exp->string.str[2]
$10 = 0 '\000'
(gdb) p exp->string.str[3]
$11 = 0 '\000'


This is an important property of STRING_CST objects that is used in c_strlen:

It folds c_strlen(&x[4]) directly to 0, because every byte beyond
TREE_STRING_LENGTH is guaranteed to be zero up to the type size.
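
At the source level the same property can be observed with plain C. The
following is only a small sketch, not GCC code: it reuses the x.c declaration
from above, and the helper names tail_strlen and zero_from are made up for
illustration. It relies on the C rule that array elements not covered by the
initializer are zero-initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same declaration as in x.c above: the literal supplies 3 bytes
   ('a', 'b', '\0'); the remaining 97 bytes are zero-initialized.  */
const char x[100] = "ab";

/* Hypothetical helper: strlen of the array starting at byte OFFSET.  */
size_t tail_strlen(size_t offset)
{
    assert(offset < sizeof x);
    return strlen(&x[offset]);
}

/* Hypothetical helper: returns 1 if every byte from FROM up to the
   end of the array is zero, 0 otherwise.  */
int zero_from(size_t from)
{
    for (size_t i = from; i < sizeof x; i++)
        if (x[i] != 0)
            return 0;
    return 1;
}
```

Calling tail_strlen(4) is the run-time counterpart of the c_strlen(&x[4])
fold mentioned above, and zero_from(2) checks the zero fill behind it.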

I would not have spent one thought on implementing an optimization like that,
but that's how it is right now.

All I want to do is make sure that all string constants have the same look
and feel in the middle-end, and restrict the variations that are allowed by
the current implementation.
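
To make the intended restriction concrete, here is a toy model of it. This is
a sketch under assumptions, not the varasm.c check: struct toy_string_cst and
both functions are hypothetical stand-ins, and the rule encoded is only one
reading of this thread (the constant ends in a (wide) NUL inside its length,
and the array type either covers the whole length or is exactly one element
shorter).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a STRING_CST; not GCC's tree representation.  */
struct toy_string_cst {
    const char *str;   /* bytes of the constant */
    size_t length;     /* plays the role of TREE_STRING_LENGTH */
    size_t type_size;  /* plays the role of TYPE_SIZE_UNIT of the array type */
    size_t elsz;       /* element size: 1 for char, N for wide characters */
};

/* Returns 1 if the last element inside LENGTH is an all-zero (wide) NUL.  */
static int ends_in_nul(const struct toy_string_cst *s)
{
    if (s->elsz == 0 || s->length < s->elsz || s->length % s->elsz != 0)
        return 0;
    for (size_t i = s->length - s->elsz; i < s->length; i++)
        if (s->str[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if S has the restricted shape: NUL-terminated within LENGTH,
   and the type either covers all of LENGTH (anything beyond is implicit
   zero fill) or is exactly one element shorter (terminator does not fit).  */
int toy_string_cst_ok(const struct toy_string_cst *s)
{
    assert(s != NULL);
    if (!ends_in_nul(s))
        return 0;
    return s->type_size >= s->length
           || s->type_size + s->elsz == s->length;
}
```

With this model, char x[100] = "ab" (length 3, type size 100) and
char x[2] = "ab" (length 3, type size 2) are both accepted, while a constant
without the trailing NUL inside its length is rejected.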


Bernd.


>>>> The idea is to use this property of string literals where needed,
>>>> and check rigorously in varasm.c.
>>>>
>>>> Does that make sense?
>>>
>>> So if it is not the same then the excess character needs to be
>>> a (wide) NUL in your model?  ISTR your varasm.c patch didn't verify
>>> that.
>>>
>>
>> I think it does.
>>
>>
>> Bernd.
