On 6/26/19 4:31 PM, Jeff Law wrote:
On 6/25/19 5:03 PM, Martin Sebor wrote:


The caller ensures that handle_char_store is only called for stores
to arrays (MEM_REF) or single elements as wide as char.
Where?  I don't see it, even after fixing the formatting in
strlen_check_and_optimize_stmt :-)

   gimple *stmt = gsi_stmt (*gsi);

   if (is_gimple_call (stmt))

I think we can agree that we don't have a call, so this is false.

  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
     {
       tree lhs = gimple_assign_lhs (stmt);
This should be true IIUC, so we'll go into its THEN block.


       if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (lhs)))
Should be false.

       else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE 
(lhs)))

Should also be false.

       else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
Should be true since LHS is a MEM_REF.


        {
           tree type = TREE_TYPE (lhs);
           if (TREE_CODE (type) == ARRAY_TYPE)
             type = TREE_TYPE (type);
           if (TREE_CODE (type) == INTEGER_TYPE
               && TYPE_MODE (type) == TYPE_MODE (char_type_node)
               && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node))
             {
               if (! handle_char_store (gsi))
                 return false;
             }
         }
If TREE_TYPE (type) is an ARRAY_TYPE, we strip the ARRAY_TYPE.  We then
verify that TYPE is a single character type.  _But_ we stripped away the
ARRAY_TYPE.  So ISTM that we allow either an array of chars or a single
char on the LHS.

So how does this avoid multiple character stores?!?  We could have had
an ARRAY_REF on the LHS and we never check the number of elements in the
array.  There's no check on the RHS either.  SO I don't see how we
guarantee that we're dealing with a single character store.

What am I missing here?

Can you show me a test case where it breaks?  If not, I don't know
what you want me to do.  I'll just move on to something else.

Martin





What you describe sounds like

   char a[N];
   *(int*)a = 0x31323300;

which is represented as

   MEM[(int *)&b] = 825373440;This would be closer (I realize it's not C):

   char a[N];
   a[0..3] = 0x313233300;



The LHS type of that is int so the function doesn't get called.
I'm concerned about the case where the LHS is an array.

And if the NUL byte in the original was at byte offset 2, then didn't we
just change the length by overwriting where the NUL is?

No, because cmp is the result of compare_nonzero_chars and cmp > 0
means:

   1 if SI is known to start with more than OFF nonzero characters

i.e., the character is being stored before the terminating nul.
This is the basis of the original optimization:

   /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
      and if we aren't storing '\0', we know that the length of the
      string and any other zero terminated string in memory remains
      the same.
But all this is predicated on the assumption that we're dealing with a
single character memory store.  I don't see what enforces that precondition.


Reply via email to