On Fri, 17 Aug 2018, Bernd Edlinger wrote:

> On 08/17/18 14:19, Richard Biener wrote:
> > On Fri, 17 Aug 2018, Bernd Edlinger wrote:
> >
> >> Richard Biener wrote:
> >>> +embedded @code{NUL} characters.  However, the
> >>> +@code{TREE_STRING_LENGTH} always includes a trailing @code{NUL} that
> >>> +is not part of the language string literal but appended by the front end.
> >>> +If the string shall not be @code{NUL}-terminated the @code{TREE_TYPE}
> >>> +is one character shorter than @code{TREE_STRING_LENGTH}.
> >>> +Excess caracters other than one trailing @code{NUL} character are not
> >
> > characters btw.
> >
>
> thanks, updated.
>
> > I read the above that the string literal for
> >
> > char x[2] = "1";
> >
> > is actually "1\0\0" - there's one NUL that is not part of the language
> > string literal.  The second sentence then suggests that both \0
> > are removed because 2 is less than 3?
> >
>
> maybe 2 is a bad example, lets consider:
> char x[2000000000] = "1";
>
> That is a string_cst with STRING_LENGTH = 2, content = "2\0\0"
> the array_type is used on both x, and the string_cst,
> I was assuming that both tree objects refer to the same type object.
> which is char[0..2000000000-1]

Oh, didn't realize we use char[200000000] for the STRING_CST.  Makes
my suggestion to use char[] instead not (very) much worse than the
existing practice then.

> varasm assembles the bytes that are given by STRING_LENGTH
> and appends zeros as appropriate.
>
> > As said, having this extra semantics of a STRING_CST tied to
> > another tree node (its TREE_TYPE) looks ugly.
> >
> >>> +permitted.
> >>>
> >>> I find this very confusing and oppose to that change.  Can we get
> >>> back to the drawing board please?  If we want an easy way to
> >>> see whether a string is "properly" terminated then maybe we can
> >>> simply use a flag that gets set by build_string?
> >>>
> >>
> >> What I mean with that is the case like
> >> char x[2] = "123456";
> >>
> >> which is build_string(7, "123456"), but with a type char[2],
> >> so varasm throws away "3456\0".
> >
> > I think varasm throws away chars not because of the type of
> > the STRING_CST but because of the available storage in x.
> >
>
> But at other places we look at the type of the string_cst, don't we?
> Shouldn't those be the same?

I think most (all?) places look at TREE_TYPE (TREE_TYPE (string)) only.
I'm not aware of users of the array domain of the array type of a string
- but I'm far from knowing GCC inside-out ;)

> >> I want to say that this is not okay, the excess precision
> >> should only be used to strip the nul termination, in cases
> >> where it is intended to be a assembled as a not zero terminated
> >> string.  But maybe the wording could be improved?
> >
> > ISTR we always assemble a NUL in .strings to get string merging
> > working.
> >
>
> String merging is not working when the string is not explicitly
> NUL terminated, my followup patch here tries to fix that:
>
> [PATCH] Handle not explicitly zero terminated strings in merge sections
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html

I'd have expected sth as simple as

  if (merge_strings && str[thissize - 1] != '\0')
    thissize++;

being appended in output_constant.

Richard.
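
The check above can be tried outside of GCC with a minimal standalone
sketch.  bytes_to_emit is a hypothetical helper, not a varasm function;
the parameter names follow the snippet, and the thissize > 0 guard is an
added assumption so that an empty buffer is never read before its start.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC: return how many bytes would be
   emitted for string data STR of THISSIZE bytes.  When the data goes
   into a mergeable string section (MERGE_STRINGS nonzero) and its last
   byte is not NUL, count one extra byte for the terminating zero.  */
static size_t
bytes_to_emit (const char *str, size_t thissize, int merge_strings)
{
  /* The thissize > 0 guard is an assumption added for this sketch.  */
  if (merge_strings && thissize > 0 && str[thissize - 1] != '\0')
    thissize++;
  return thissize;
}
```

With this, two bytes "12" destined for a merge section would count as
three bytes, while data that already ends in NUL, or data outside a
merge section, keeps its size.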