Hi Dave, is the patch OK, or do you still have questions?
Thanks Bernd. On 11/2/18 10:48 PM, Bernd Edlinger wrote: > On 11/2/18 9:40 PM, David Malcolm wrote: >> On Sun, 2018-08-05 at 16:59 +0000, Bernd Edlinger wrote: >>> Hi! >>> >>> >>> My other patch with adds assertions to varasm.c regarding correct >>> nul termination of sting literals did make these incorrect string >>> constants in JIT frontend fail. >>> >>> The string constants are not nul terminated if their length exceeds >>> 200 characters. The test cases do not use strings of that size where >>> that would make a difference. But using a fixed index type is >>> clearly >>> wrong. >>> >>> This patch removes the fixed char[200] array type from >>> playback::context, >>> and uses build_string_literal instead of using build_string directly. >>> >>> >>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >>> Is it OK for trunk? >> >> Sorry for the belated response. >> >> Was this tested with --enable-host-shared and --enable-languages=jit ? >> Note that "jit" is not included in --enable-languages=all. >> > > Yes, of course. The test suite contains a few string constants, just > all of them are shorter than 200 characters. But I think removing this > artificial limit enables the existing test cases to test that the > shorter string is in fact zero terminated. > >> The patch seems reasonable, but I'm a little confused over the meaning >> of "len" in build_string_literal and build_string: does it refer to the >> length or the size of the string? >> > > build_string_literal: > For languages that use zero-terminated strings, len is strlen(str)+1, and > str is a zero terminated single-byte character string. > For languages that don't use zero-terminated strings, len is the size of > the string and str is not zero terminated. > > build_string: > constructs a STRING_CST tree object, which is usable as is in some contexts, > like for asm constraints, but as a string literal it is incomplete, and > needs an index type. The index type defines the memory size which must > be larger than the string precision. Excess memory is implicitly cleared. > > This means currently all jit strings shorter than 200 characters > are filled with zero up to the limit of 200 chars as imposed by > m_char_array_type_node. Strings of exactly 200 chars are not zero terminated, > and larger strings should result in an assertion (excess precision was > previously > allowed, but no zero termination was appended, when that is not part of > the original string constant). > > Previously it was allowed to have memory size less than the string len, which > had complicated the STRING_CST semantics in the middle-end, but with the > string_cst semantic rework I did for gcc-9 this is no longer allowed and > results in (checking) assertions in varasm.c. > >>> @@ -617,16 +616,9 @@ playback::rvalue * >>> playback::context:: >>> new_string_literal (const char *value) >>> { >>> - tree t_str = build_string (strlen (value), value); >>> - gcc_assert (m_char_array_type_node); >>> - TREE_TYPE (t_str) = m_char_array_type_node; >>> - >>> - /* Convert to (const char*), loosely based on >>> - c/c-typeck.c: array_to_pointer_conversion, >>> - by taking address of start of string. */ >>> - tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str); >>> + tree t_str = build_string_literal (strlen (value) + 1, value); >>> - return new rvalue (this, t_addr); >>> + return new rvalue (this, t_str); >>> } >> >> In the above, the call to build_string with strlen is replaced with >> build_string_literal with strlen + 1. >> >> >> build_string's comment says: >> >> "Note that for a C string literal, LEN should include the trailing >> NUL." >> >> but has: >> >> length = len + offsetof (struct tree_string, str) + 1; >> >> and: >> >> TREE_STRING_LENGTH (s) = len; >> memcpy (s->string.str, str, len); >> s->string.str[len] = '\0'; >> >> suggesting that the "len" parameter is in fact the length *without* the >> trailing NUL, and that a trailing NUL is added by build_string. >> > > Yes, string constants in tree objects have another zero termiation, > but varasm.c does something different, there the index range takes > precedence. > The index range is built in build_string_literal as follows: > > elem = build_type_variant (char_type_node, 1, 0); > index = build_index_type (size_int (len - 1)); > type = build_array_type (elem, index); > > therefore the string constant hast the type char[0..len-1] > thus only len bytes are significant for code generation, the extra > nul is just for "convenience". > >> However build_string_literal has: >> >> t = build_string (len, str); >> elem = build_type_variant (char_type_node, 1, 0); >> index = build_index_type (size_int (len - 1)); >> >> suggesting that the len is passed directly to build_string (and thus >> ought to be strlen), but the build_index_type uses len - 1 (which >> suggests that len is the size of the string, rather than its length). >> >> What's the intended meaning of len in these functions? >> > > I hope this helps. > > > Thanks > Bernd. > > >> Thanks >> Dave >>