On 08/20/18 13:01, Richard Biener wrote: > On Wed, Aug 1, 2018 at 3:05 PM Bernd Edlinger <bernd.edlin...@hotmail.de> > wrote: >> >> >> >> On 08/01/18 11:29, Richard Biener wrote: >>> >>> Hmm. I think it would be nice if TREE_STRING_LENGTH would >>> match char[2] and TYPE_SIZE_UNIT even if that is inconvenient >>> for your check above. Because the '\0' doesn't belong to the >>> string. Then build_string internally appends a '\0' outside >>> of TREE_STRING_LENGTH. >>> >> >> Hmm. Yes, but the outside-0 byte is just one byte, not a wide >> character. > > That could be fixed though (a wide 0 is just N 0s). Add a elsz = 1 > parameter to build_string and allocate as many extra 0s as needed. > > There are STRING_CSTs which are not string literals, >> for instance attribute tags, Pragmas, asm constrants, etc. >> They use the '\0' outside, and have probably no TREE_TYPE. >> >>> >>>> So I would like to be able to assume that the STRING_CST objects >>>> are internally always generated properly by the front end. >>> >>> Yeah, I guess we need to define what "properly" is ;) >>> >> Yes. >> >>>> And that the ARRAY_TYPE of the string literal either has the >>>> same length than the TREE_STRING_LENGTH or if it is shorter, >>>> this is always exactly one (wide) character size less than >>>> TREE_STRING_LENGTH >>> >>> I think it should be always the same... >>> >> >> One could not differentiate between "\0" without zero-termination >> and "" with zero-termination, theoretically. > > Is that important? Doesn't the C standard say how to parse string literals? > >> We also have char x[100] = "ab"; >> that is TREE_STRING_LENGTH=3, and TYPE_SIZE_UNIT(TREE_TYPE(x)) = 100. >> Of course one could create it with a TREE_STRING_LENGTH = 100, >> but imagine char x[100000000000] = "ab" > > The question is more about TYPE_SIZE_UNIT (TREE_TYPE ("ab")) which I > hope matches "ab" and not 'x'. If it matches 'x' then I'd rather have it > unconditionally be [], thus incomplete (because the literals "size" depends > on the context/LHS it is used on). >
Sorry, but I must say, it is not at all like that. If I compile x.c: const char x[100] = "ab"; and set a breakpoint at output_constant: Breakpoint 1, output_constant (exp=0x7ffff6ff9dc8, size=100, align=256, reverse=false) at ../../gcc-trunk/gcc/varasm.c:4821 4821 if (size == 0 || flag_syntax_only) (gdb) p size $1 = 100 (gdb) call debug(exp) "ab" (gdb) p *exp $2 = {base = {code = STRING_CST, side_effects_flag = 0, constant_flag = 1, (gdb) p exp->typed.type->type_common.size_unit $5 = (tree) 0x7ffff6ff9d80 (gdb) call debug(exp->typed.type->type_common.size_unit) 100 (gdb) p exp->string.length $6 = 3 (gdb) p exp->string.str[0] $8 = 97 'a' (gdb) p exp->string.str[1] $9 = 98 'b' (gdb) p exp->string.str[2] $10 = 0 '\000' (gdb) p exp->string.str[3] $11 = 0 '\000' This is an important property of string_cst objects, that is used in c_strlen: It folds c_strlen(&x[4]) directly to 0, because every byte beyond TREE_STRING_LENGTH is guaranteed to be zero up to the type size. I would not have spent one thought on implementing an optimization like that, but that's how it is right now. All I want to do, is make sure that all string constants have the same look and feel in the middle-end, and restrict the variations that are allowed by the current implementation. Bernd. >>>> The idea is to use this property of string literals where needed, >>>> and check rigorously in varasm.c. >>>> >>>> Does that make sense? >>> >>> So if it is not the same then the excess character needs to be >>> a (wide) NUL in your model? ISTR your varasm.c patch didn't verify >>> that. >>> >> >> I think it does. >> >> >> Bernd.