[RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

David Wohlferd Wed, 24 Sep 2014 00:44:11 -0700

Hans-Peter Nilsson: I should have listened to you back when you raised concerns about this. My apologies for ever doubting you.


In summary:

- The "trick" in the docs for using an arbitrarily sized struct to force register flushes for inline asm does not work.
- Placing the inline asm in a separate routine can sometimes mask the problem with the trick not working.
- The sample that has been in the docs forever performs an unhelpful, unexpected, and probably unwanted stack allocation + memcpy.


Details:

Here is the text from the docs:

-----------

One trick to avoid [using the "memory" clobber] is available if the size of the memory being accessed is known at compile time. For example, if accessing ten bytes of a string, use a memory input like:


    "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
-----------

When I did the re-write of gcc's inline asm docs, I left the description for this (essentially) untouched. I just took it on faith that "magic happens" and the right code gets generated. But reading a recent post raised questions for me, so I tried it. And what I found was that not only does this not work, it actually just makes a mess.


I started with some code that I knew required some memory clobbering:

    #include <stdio.h>

    int main(int argc, char* argv[])
    {
      struct
      {
        int a;
        int b;
      } c;

      c.a = 1;
      c.b = 2;

      int Count = sizeof(c);
      void *Dest;

      __asm__ __volatile__ ("rep; stosb"
           : "=D" (Dest), "+c" (Count)
           : "0" (&c), "a" (0)
           //: "memory"
      );

      printf("%u %u\n", c.a, c.b);
    }

As written, this x64 code (compiled with -O2) will print out "1 2", even though someone might (incorrectly) expect the asm to overwrite the struct with zeros. Adding the memory clobber allows this code to work as expected (printing "0 0").
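For reference, here is a minimal sketch of the variant with the "memory" clobber enabled, wrapped in functions so the result can be checked. The names (struct pair, zero_pair, sum_after_clobbered_zero) are made up for illustration, the counter is a size_t rather than an int so the whole count register is defined, and the memset fallback for non-x86 targets is my own addition:

```c
#include <stddef.h>
#include <string.h>

struct pair { int a; int b; };

/* Hypothetical helper: zero *p with "rep stosb". The "memory" clobber
   tells the compiler the asm may write memory, so values cached in
   registers must be reloaded afterwards. */
static void zero_pair(struct pair *p)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    size_t count = sizeof *p;   /* assumption: full-width count register */
    void *dest;

    __asm__ __volatile__ ("rep; stosb"
         : "=D" (dest), "+c" (count)
         : "0" (p), "a" (0)
         : "memory");
    (void)dest;                 /* final pointer value is not needed */
#else
    memset(p, 0, sizeof *p);    /* fallback so the sketch builds elsewhere */
#endif
}

/* Sets the fields to 1 and 2, zeroes the struct, returns a + b. */
int sum_after_clobbered_zero(void)
{
    struct pair c;

    c.a = 1;
    c.b = 2;
    zero_pair(&c);
    return c.a + c.b;
}
```

With the clobber in place, sum_after_clobbered_zero() should return 0, matching the "0 0" output described above.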

Now that I have code I can use to see if registers are getting flushed, I removed the memory clobber, and tried just 'clobbering' the struct:


    #include <stdio.h>

    int main(int argc, char* argv[])
    {
      struct
      {
        int a;
        int b;
      } c;

      c.a = 1;
      c.b = 2;

      int Count = sizeof(c);
      void *Dest;

      __asm__ __volatile__ ("rep; stosb"
           : "=D" (Dest), "+c" (Count)
           : "0" (&c), "a" (0),
             "m" ( ({ struct foo { char x[8]; } *p = (struct foo *)&c ; *p; }) )
      );

      printf("%u %u\n", c.a, c.b);
    }

I'm using a named struct (foo) to avoid some compiler messages, but other than that, I believe this is the same as what's in the docs. And it doesn't work. I still get "1 2".

At this point I realized that code I've seen using this trick usually has the asm in its own routine. When I try this, it still fails, unless I start cranking up the size of x from 8 to ~250. At ~250, it suddenly starts working. Apparently this is because at that point gcc decides not to inline the routine anymore, and flushes the registers before calling the non-inline code.

And why does changing the size of the structure we are pointing to increase the size of the routine? Reading the -S output, the "*p" at the end of this constraint generates a call to memcpy the 250 characters onto the stack, which it passes to the asm as %4, which is never used. Argh!
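The copy is at least consistent with how GNU C statement expressions behave outside of asm: the value of ({ ...; *p; }) is a value of the struct type, not a reference to the original storage. Here is a small sketch of just that behavior; the function name is made up, and whether this fully explains the asm case is my assumption:

```c
#include <string.h>

struct foo { char x[8]; };

/* Returns 1 if the statement expression produced a snapshot of the
   struct that is unaffected by later writes to the original. */
int statement_expr_is_snapshot(void)
{
    struct foo orig;

    memcpy(orig.x, "before!", 8);          /* 7 chars + NUL = 8 bytes */

    /* Same shape as the operand in the docs: the last expression, *p,
       becomes the value of the whole ({ ... }) construct. */
    struct foo snap = ({ struct foo *p = &orig; *p; });

    memcpy(orig.x, "after!!", 8);          /* modify the original */

    return memcmp(snap.x, "before!", 8) == 0;
}
```

This needs gcc (or a compiler that accepts the statement expression extension). It says nothing about register flushing; it only shows that "*p" in that position yields a copy.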


Conclusion:

What I expected when using that sample code from the docs was that any registers that contain values from the struct would get flushed to memory. This was intended to be a 'cheaper' alternative to doing a full-on "memory" clobber. What I got instead was an unexpected/unneeded stack allocation and memcpy, and STILL didn't get the values flushed. Yeah, not exactly the 'cheaper' I was hoping for.

Is the example in the docs just written incorrectly? Did this get broken somewhere along the line? Or am I just using it wrong?

I'm using gcc version 4.9.0 (x86_64-win32-seh-rev2, Built by MinGW-W64 project). Remember to compile these x64 samples with -O2.

dw
