Hans-Peter Nilsson: I should have listened to you back when you raised
concerns about this. My apologies for ever doubting you.
In summary:
- The "trick" in the docs for using an arbitrarily sized struct to force
register flushes for inline asm does not work.
- Placing the inline asm in a separate routine can sometimes mask the
problem with the trick not working.
- The sample that has been in the docs forever performs an unhelpful,
unexpected, and probably unwanted stack allocation + memcpy.
Details:
Here is the text from the docs:
-----------
One trick to avoid [using the "memory" clobber] is available if the size
of the memory being accessed is known at compile time. For example, if
accessing ten bytes of a string, use a memory input like:
"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
-----------
When I did the re-write of gcc's inline asm docs, I left the description
for this (essentially) untouched. I just took it on faith that "magic
happens" and the right code gets generated. But reading a recent post
raised questions for me, so I tried it. And what I found was that not
only does this not work, it actually just makes a mess.
I started with some code that I knew required some memory clobbering:
    #include <stdio.h>
    int main(int argc, char* argv[])
    {
        struct
        {
            int a;
            int b;
        } c;
        c.a = 1;
        c.b = 2;
        int Count = sizeof(c);
        void *Dest;
        /* rep stosb: store AL into Count bytes starting at &c */
        __asm__ __volatile__ ("rep; stosb"
            : "=D" (Dest), "+c" (Count)
            : "0" (&c), "a" (0)
            //: "memory"
        );
        printf("%d %d\n", c.a, c.b);
    }
As written, this x64 code (compiled with -O2) will print out "1 2", even
though someone might (incorrectly) expect the asm to overwrite the
struct with zeros. Adding the memory clobber allows this code to work
as expected (printing "0 0").
Now that I have code I can use to see if registers are getting flushed,
I removed the memory clobber, and tried just 'clobbering' the struct:
    #include <stdio.h>
    int main(int argc, char* argv[])
    {
        struct
        {
            int a;
            int b;
        } c;
        c.a = 1;
        c.b = 2;
        int Count = sizeof(c);
        void *Dest;
        /* Same asm, with the docs' struct trick as an extra "m" input */
        __asm__ __volatile__ ("rep; stosb"
            : "=D" (Dest), "+c" (Count)
            : "0" (&c), "a" (0),
              "m" ( ({ struct foo { char x[8]; } *p = (struct foo *)&c ;
                       *p; }) )
        );
        printf("%d %d\n", c.a, c.b);
    }
I'm using a named struct (foo) to avoid some compiler messages, but
other than that, I believe this is the same as what's in the docs. And
it doesn't work. I still get "1 2".
At this point I realized that code I've seen using this trick usually
has the asm in its own routine. When I try this, it still fails, unless
I start cranking up the size of x from 8 to ~250. At ~250, suddenly it
starts working. Apparently this is because at that size gcc decides
not to inline the routine anymore, and so it flushes the registers
before calling the non-inline code.
And why does changing the size of the structure we are pointing to
make the routine bigger? Reading the -S output, the "*p" at the end of
this constraint generates a call to memcpy that copies the 250
characters onto the stack. That stack copy is what gets passed to the
asm as %4, which the asm never uses. Argh!
Conclusion:
What I expected when using that sample code from the docs was that any
registers that contain values from the struct would get flushed to
memory. This was intended to be a 'cheaper' alternative to doing a
full-on "memory" clobber. What I got instead was an unexpected/unneeded
stack allocation and memcpy, and STILL didn't get the values flushed.
Yeah, not exactly the 'cheaper' I was hoping for.
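For comparison, here is a sketch of a variation that describes the
struct as a memory output instead of copying it into an input. This is
not from the docs, and I have only reasoned about it, not tested it
across gcc versions. The assumption is that since the asm writes to c,
an "=m" (c) output should tell gcc that c's contents change, without
clobbering everything else. The names struct pair and zero_pair_sum are
made up for illustration, and the memset branch is only there so the
sketch builds on targets other than x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the anonymous struct in the samples above. */
struct pair { int a; int b; };

/* Zero a local struct pair and return a + b as read afterwards. */
int zero_pair_sum(void)
{
    struct pair c;
    c.a = 1;
    c.b = 2;

#if defined(__GNUC__) && defined(__x86_64__)
    void *dest = &c;
    size_t count = sizeof(c);
    /* "=m" (c) names c itself as an output, so gcc should treat its
       contents as changed by the asm (assumption: this replaces the
       need for a full "memory" clobber in this particular case). */
    __asm__ __volatile__ ("rep stosb"
        : "+D" (dest), "+c" (count), "=m" (c)
        : "a" (0)
    );
    /* rep stosb counts rcx down to zero. */
    assert(count == 0);
#else
    /* Fallback so the sketch still builds on other targets. */
    memset(&c, 0, sizeof(c));
#endif

    return c.a + c.b;
}
```

If gcc reloads c.a and c.b after the asm, zero_pair_sum() returns 0.
Treat this as something to try rather than a confirmed fix for the
docs.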
Is the example in the docs just written incorrectly? Did this get
broken somewhere along the line? Or am I just using it wrong?
I'm using gcc version 4.9.0 (x86_64-win32-seh-rev2, Built by MinGW-W64
project). Remember to compile these x64 samples with -O2.
dw