On 6/25/19 5:03 PM, Martin Sebor wrote: > > The caller ensures that handle_char_store is only called for stores > to arrays (MEM_REF) or single elements as wide as char. Where? I don't see it, even after fixing the formatting in strlen_check_and_optimize_stmt :-)
> gimple *stmt = gsi_stmt (*gsi); > > if (is_gimple_call (stmt)) I think we can agree that we don't have a call, so this is false. > else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt)) > { > tree lhs = gimple_assign_lhs (stmt); This should be true IIUC, so we'll go into its THEN block. > if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (lhs))) Should be false. > else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE > (lhs))) Should also be false. > else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs)) Should be true since LHS is a MEM_REF. > { > tree type = TREE_TYPE (lhs); > if (TREE_CODE (type) == ARRAY_TYPE) > type = TREE_TYPE (type); > if (TREE_CODE (type) == INTEGER_TYPE > && TYPE_MODE (type) == TYPE_MODE (char_type_node) > && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node)) > { > if (! handle_char_store (gsi)) > return false; > } > } If TREE_TYPE (type) is an ARRAY_TYPE, we strip the ARRAY_TYPE. We then verify that TYPE is a single character type. _But_ we stripped away the ARRAY_TYPE. So ISTM that we allow either an array of chars or a single char on the LHS. So how does this avoid multiple character stores?!? We could have had an ARRAY_REF on the LHS and we never check the number of elements in the array. There's no check on the RHS either. SO I don't see how we guarantee that we're dealing with a single character store. What am I missing here? > > What you describe sounds like > > char a[N]; > *(int*)a = 0x31323300; > > which is represented as > > MEM[(int *)&b] = 825373440;This would be closer (I realize it's not C): char a[N]; a[0..3] = 0x313233300; > > The LHS type of that is int so the function doesn't get called. I'm concerned about the case where the LHS is an array. >> And if the NUL byte in the original was at byte offset 2, then didn't we >> just change the length by overwriting where the NUL is? > > No, because cmp is the result of compare_nonzero_chars and cmp > 0 > means: > > 1 if SI is known to start with more than OFF nonzero characters > > i.e., the character is being stored before the terminating nul. > This is the basis of the original optimization: > > /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0', > and if we aren't storing '\0', we know that the length of the > string and any other zero terminated string in memory remains > the same. But all this is predicated on the assumption that we're dealing with a single character memory store. I don't see what enforces that precondition.