On 3/3/2014 3:36 AM, Richard Sandiford wrote:
dw <limegreenso...@yahoo.com> writes:
On 2/27/2014 11:32 PM, Richard Sandiford wrote:
dw <limegreenso...@yahoo.com> writes:
On 2/27/2014 4:11 AM, Richard Sandiford wrote:
Andrew Haley <a...@redhat.com> writes:
Over the years there has been a great deal of traffic on these lists
caused by misunderstandings of GCC's inline assembler. That's partly
because it's inherently tricky, but the existing documentation needs
to be improved.
dw <limegreenso...@yahoo.com> has done a fairly thorough reworking of
the documentation. I've helped a bit.
Section 6.41 of the GCC manual has been rewritten. It has become:
6.41 How to Use Inline Assembly Language in C Code
6.41.1 Basic Asm - Assembler Instructions with No Operands
6.41.2 Extended Asm - Assembler Instructions with C Expression Operands
We could simply post the patch to GCC-patches and have at it, but I
think it's better to discuss the document here first. You can read it
at
http://www.LimeGreenSocks.com/gcc/Basic-Asm.html
http://www.LimeGreenSocks.com/gcc/Extended-Asm.html
http://www.LimeGreenSocks.com/gcc/extend04.zip (contains .texi, .patch,
and affected html pages)
All comments are very welcome.
Thanks for doing this, looks like a big improvement.
Thanks, I did my best. I appreciate you taking the time to review them.
A couple of comments:
The section on basic asms says:
Do not expect a sequence of asm statements to remain perfectly
consecutive after compilation. To ensure that assembler instructions
maintain their order, use a single asm statement containing multiple
instructions. Note that GCC's optimizer can move asm statements
relative to other code, including across jumps.
The "maintain their order" might be a bit misleading, since volatile asms
(including basic asms) must always be executed in the original order.
Maybe this was meaning placement/address order instead?
This statement is based on this text from the existing docs:
"Similarly, you can't expect a sequence of volatile |asm| instructions
to remain perfectly consecutive. If you want consecutive output, use a
single |asm|."
I do not dispute what you are saying. I just want to confirm that the
existing docs are incorrect before making a change. Also, see Andi's
response re -fno-toplevel-reorder.
It seems to me that recommending "single statement" is both the
clearest, and the safest approach here. But I'm prepared to change my
mind if there is consensus I should.
Right. I agree with that part. I just thought that the "maintain their
order" could be misunderstood as meaning execution order, whereas I think
both sentences of the original docs were talking about being "perfectly
consecutive" (which to me means "there are no other instructions in between").
Hmm. I'm not seeing the differences here that you do.
Well, like you say, things can be moved across branches. So, although
this is a very artificial example:
asm ("x");
asm ("y");
could become:
goto bar;
foo:
asm ("y");
...
bar:
asm ("x");
goto foo;
This has reordered the instructions in the sense that they have a
different order in memory. But they are still _executed_ in the same
order. Actually reordering the execution would be a serious bug.
So I just want to avoid anything that gives the impression that "y" can
be executed before "x" in this example. I still think:
Since the existing docs say "GCC's optimizer can move asm statements
relative to other code", how would you feel about:
"Do not expect a sequence of |asm| statements to remain perfectly
consecutive after compilation. If you want to stop the compiler from
reordering or inserting anything into a sequence of assembler
instructions, use a single |asm| statement containing multiple
instructions. Note that GCC's optimizer can move |asm| statements
relative to other code, including across jumps."
...this gives the impression that we might try to execute volatiles
in a different order.
Ahh! Ok, I see what you mean. Hmm. Based on the description of
"no-toplevel-reorder", I assumed that it actually *might* re-order them.
So, more like:
"GCC's optimizer can move asm statements relative to other
code, including across jumps. This has implications for code
that contains a sequence of asm statements. While the execution
order of asm statements will be preserved, do not expect the sequence of asm
statements to remain perfectly consecutive in the compiler's output.
To ensure that assembler instructions maintain their order, use a
single asm statement containing multiple instructions."
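To make the "single asm statement" advice concrete, here is a minimal sketch. It assumes x86-64 and AT&T syntax, and the names inc_then_double and check_inc_then_double are made up for illustration (they do not come from the thread or the docs). Because both instructions live in one asm statement, the compiler has no opportunity to place anything between them.

```c
#include <assert.h>

/* Hypothetical helper: computes (v + 1) * 2.  Both instructions sit in a
   single asm statement, so they are emitted back to back. */
static long inc_then_double(long v)
{
#if defined(__x86_64__)
    __asm__ ("add $1, %0\n\t"   /* v = v + 1 */
             "add %0, %0"       /* v = v + v */
             : "+r" (v)         /* read-write register operand */
             :
             : "cc");           /* both adds modify the flags */
#else
    v = (v + 1) * 2;            /* plain C fallback for other targets */
#endif
    return v;
}

/* Small self-check of the sketch. */
static void check_inc_then_double(void)
{
    assert(inc_then_double(3) == 8);
    assert(inc_then_double(-1) == 0);
}
```

Had the two adds been written as two separate asm statements, they would still execute in order, but other instructions could end up between them.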
It might also be
worth mentioning that the number of instances of an asm in the output
may be different from the input. (Can it increase as well as decrease?
I'm not sure off-hand, but probably yes.)
So, in the volatile section, how about something like this for decrease:
"GCC does not delete a volatile |asm| if it is reachable, but may delete
it if it can prove that control flow never reaches the location of the
instruction."
It's not just that though. AIUI it would be OK for:
if (foo)
{
...
asm ("x");
}
else
{
...
asm ("x");
}
to become:
if (foo)
...
else
...
asm ("x");
Could be. However, I'm not clear what benefit there would be from
doc'ing this possibility?
I was just thinking that something along the lines of "Optimizations
may introduce or remove duplicates of an asm, provided that this does
not change which asms are executed." would be more general than just
talking about introducing duplicates.
Ok.
In the extended section:
Unless an output operand has the '&' constraint modifier (see
Modifiers), GCC may allocate it in the same register as an unrelated
input operand, [...]
It could also use it for addresses in other (memory) outputs.
Ok. But I'm not sure this really adds anything. Having warned people
that the register may be re-used unless '&' is used seems sufficient.
It matters where it can be reused though. If you talk about input
operands only, people might think it is OK to write asms of the form:
foo tmp,[input0]
bar [output0],tmp
frob [output1],tmp
where output0 is a register and output1 is a memory operand. This safely avoids
using the input operand after assigning to output0, but the address in
output1 is still live and could be changed by bar.
I'm not sure we're talking about the same problem. I'm borrowing this
x86 example from someone else:
static inline char *
lcopy( char *dst, const char *src, long len )
{
asm(
"shr $3,%2; " /* how many qwords to copy */
"rep movsq; " /* copy that many */
"mov %3,%2; " /* how many bytes to copy */
"rep movsb" /* copy that many */
: "+D" (dst), "+S" (src), "+c" (len)
: "r" (len & 7)
: "memory");
return dst;
}
You might expect that "len" and "len & 7" are two different things.
However if the function is called with a constant less than 8, the
compiler knows that they are actually the same and uses rcx for both,
giving mov rcx,rcx for mov %3,%2 and of course by then rcx is zero.
Using & on len forces the use of two separate registers.
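For reference, a sketch of what the example might look like with '&' applied. This is an illustration under assumptions, not the original author's code: the name lcopy_fixed is invented, the sketch assumes x86-64 with AT&T syntax, and it also adds volatile plus '&' on dst and src as extra precautions that were not part of the example above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed variant of lcopy: copies len bytes and returns the
   advanced destination pointer. */
static inline char *
lcopy_fixed(char *dst, const char *src, long len)
{
#if defined(__x86_64__)
    /* volatile: the copy itself is a side effect, so keep the asm even if
       the returned pointer is unused (an assumption added in this sketch). */
    __asm__ volatile (
        "shr $3,%2\n\t"     /* how many qwords to copy */
        "rep movsq\n\t"     /* copy that many */
        "mov %3,%2\n\t"     /* how many bytes to copy */
        "rep movsb"         /* copy that many */
        /* '&' on len keeps %3 out of rcx.  dst and src are also written
           before %3 is read, so this sketch marks them the same way. */
        : "+&D" (dst), "+&S" (src), "+&c" (len)
        : "r" (len & 7)
        : "memory", "cc");
#else
    memcpy(dst, src, (size_t) len);   /* plain C fallback for other targets */
    dst += len;
#endif
    return dst;
}

/* Small self-check using a constant length below 8, the case that
   triggered the problem described above. */
static void check_lcopy_fixed(void)
{
    char out[16] = {0};
    char *end = lcopy_fixed(out, "hello", 5);
    assert(end == out + 5);
    assert(memcmp(out, "hello", 5) == 0);
}
```

With the early-clobber modifier in place, the compiler has to pick a register other than rcx for operand 3, so the byte count survives the shr and rep movsq.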
This seems to me to be a different kind of problem than:
asm ("xxx": "=r" (x), "=m" (x));
Or am I missing your point?
Well, that code is just one instance of (and a good example of)
the principle that GCC assumes all inputs are consumed before any
outputs are written. And the point is that the "inputs" in that
description aren't restricted to input operands: they apply to any
rvalues in the output operands too.
E.g. the same thing could occur for an artificial case like:
asm ("...." : "+r" (ptr), "=m" (*x));
if GCC realises that x==ptr. Then the address in operand 1 might
be the same as operand 0. The same goes for:
asm ("...." : "=r" (ptr), "=m" (*x) : "0" (ptr));
which is really just another way of writing the same thing.
So, more like this:
"Unless an output operand has the '|&|' constraint modifier (see
Modifiers), GCC may allocate it
in the same register as an unrelated input operand, on the assumption
that the assembler code will consume its inputs before producing
outputs. This assumption may be false if the assembler code consists of
more than one instruction. Further, if the compiler determines that two
output operands refer to the same object, output operands can also be
combined to use the same register. To prevent this combining, use '|&|'
for each output operand that must not overlap another operand."
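A small sketch of the register-output-plus-memory-output case discussed above, assuming x86-64 and AT&T syntax. The names store_incremented and check_store_incremented are hypothetical. The register output is written before the memory output's address is used, which is exactly the situation where '&' is needed.

```c
#include <assert.h>

/* Hypothetical helper: returns in + 1 and also stores it through mem_out. */
static long store_incremented(long in, long *mem_out)
{
    long tmp;
#if defined(__x86_64__)
    __asm__ ("mov %2, %0\n\t"   /* tmp is written here, before ...      */
             "add $1, %0\n\t"
             "mov %0, %1"       /* ... the address of *mem_out is used  */
             /* '&' stops tmp from sharing a register with the input or
                with the register holding the address of *mem_out. */
             : "=&r" (tmp), "=m" (*mem_out)
             : "r" (in)
             : "cc");           /* add modifies the flags */
#else
    tmp = in + 1;               /* plain C fallback for other targets */
    *mem_out = tmp;
#endif
    return tmp;
}

/* Small self-check of the sketch. */
static void check_store_incremented(void)
{
    long slot = 0;
    assert(store_incremented(41, &slot) == 42);
    assert(slot == 42);
}
```

Without the '&', the first mov could overwrite the register that holds the address of *mem_out, and the final store would go to the wrong place.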