Re: [Qemu-devel] [PATCH 0/5] tcg conditional set, round 4

Laurent Desnogues Mon, 21 Dec 2009 14:21:21 -0800

On Mon, Dec 21, 2009 at 9:28 PM, Richard Henderson <r...@twiddle.net> wrote:
> On 12/21/2009 01:13 AM, Laurent Desnogues wrote:
>>
>> The question for the generalized movcond is how useful is it?
>> Which front-ends would need it and would the cost to generate
>> code for it on some (most?) back-ends be amortized?
>
> ... Any front end that has a conditional move instruction?
> Sparcv9, Mips32, Alpha, ARM...


As far as I know these CPUs don't need the full movcond but
only the variant with vtrue. Even if host code for movcond were
quick to generate, you'd have to explicitly detect conditional
moves, for instance for ARM, which probably wouldn't be worth
the cost; I might be wrong, since no one has given it a try.
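
To make the distinction concrete, here is a minimal C sketch of the
two forms, assuming "the variant with vtrue" means that the
destination keeps its old value when the condition is false. The
names cond_t, eval_cond, movcond and cmov_vtrue are made up for
illustration; they are not the TCG API.

```c
#include <stdint.h>

/* Hypothetical condition codes for illustration; not TCG's own type. */
typedef enum { COND_EQ, COND_NE, COND_LT, COND_GE } cond_t;

/* Evaluate a signed comparison; returns 1 if it holds, else 0. */
static int eval_cond(cond_t c, int32_t a, int32_t b)
{
    switch (c) {
    case COND_EQ: return a == b;
    case COND_NE: return a != b;
    case COND_LT: return a < b;
    case COND_GE: return a >= b;
    }
    return 0;
}

/* Generalized form: result = (a cond b) ? vtrue : vfalse. */
static int32_t movcond(cond_t c, int32_t a, int32_t b,
                       int32_t vtrue, int32_t vfalse)
{
    return eval_cond(c, a, b) ? vtrue : vfalse;
}

/*
 * Restricted form (assumed reading of "the variant with vtrue"):
 * result = (a cond b) ? vtrue : dest, i.e. dest is left unchanged
 * when the condition fails, as with an ARM "cmp r2, #0; movne r0, r1".
 */
static int32_t cmov_vtrue(cond_t c, int32_t a, int32_t b,
                          int32_t vtrue, int32_t dest)
{
    return movcond(c, a, b, vtrue, dest);
}
```

Under that reading the restricted form is just the general one with
vfalse tied to the destination, so a guest conditional move never
needs an independent fourth input.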

> That said, I think the *biggest* gains are to be had because with movcond --
> at least on some targets -- we can have one BB per TB, and avoid any
> intermediate spilling of global registers back to memory.

I can't count the number of times I thought some branch removal
could only bring improvement, only to see QEMU slow down.
The balance between simplicity and good generated code is
very hard to achieve (and in that particular case, benchmarking
on an Intel host just shows how good Intel engineers are at
designing branch predictors :-).

>> My guess (I use that word given that I didn't do any benchmark
>> to sustain my claim) is that your implementation is too complex.
>
> Too complex for what?  The message against which you are quoting has an
> implementation of 2 lines.

Well, I replied to this mail after seeing the SPARC
implementation :)  Indeed your implementation for i386 setcond2
(setcond is trivial) is not that complex.

>> Of course setcond can be implemented in terms of movcond,
>> but my guess (again that word...) is that setcond could be
>> enough and even faster in most cases.
>
> To implement condition codes, yes, to implement compare instructions (e.g.
> mips slt, alpha cmp{eq,lt,lte}), yes.  To implement conditional moves, no.
>  At least not without using 5 instructions where 1 would suffice.

How many instructions would you need to generate one
host instruction?  If the block is not executed that often, it could
be a waste.  If you want I can provide you with dynamic counts
of ARM conditional moves when running SPEC; but that wouldn't
be enough, someone would need to do that for the kernel
boot too.

I'm not saying movcond is useless, I'm just wondering if it
would bring improvements.  That's why I would prefer to do
all of that stuff incrementally:  setcond, then movcond.
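
For reference, the relationship between the two ops discussed in this
thread can be sketched in plain C. This is only an illustration under
my own assumptions: the helper names are hypothetical (not TCG
functions), and the five-op expansion shown (setcond, neg, and, andc,
or) is one plausible branch-free sequence, not necessarily the one
Richard had in mind.

```c
#include <stdint.h>

/* Hypothetical condition codes for illustration; not TCG's own type. */
typedef enum { COND_EQ, COND_NE, COND_LT, COND_GE } cond_t;

/* setcond: result = (a cond b) ? 1 : 0, using signed comparisons. */
static int32_t setcond(cond_t c, int32_t a, int32_t b)
{
    switch (c) {
    case COND_EQ: return a == b;
    case COND_NE: return a != b;
    case COND_LT: return a < b;
    case COND_GE: return a >= b;
    }
    return 0;
}

/* movcond: result = (a cond b) ? vtrue : vfalse. */
static int32_t movcond(cond_t c, int32_t a, int32_t b,
                       int32_t vtrue, int32_t vfalse)
{
    return setcond(c, a, b) ? vtrue : vfalse;
}

/* setcond expressed through movcond: select between constants 1 and 0. */
static int32_t setcond_via_movcond(cond_t c, int32_t a, int32_t b)
{
    return movcond(c, a, b, 1, 0);
}

/* movcond expressed through setcond without a branch: five simple ops. */
static int32_t movcond_via_setcond(cond_t c, int32_t a, int32_t b,
                                   int32_t vtrue, int32_t vfalse)
{
    uint32_t t = (uint32_t)setcond(c, a, b);   /* 1: setcond -> 0 or 1    */
    uint32_t mask = 0u - t;                    /* 2: neg -> 0 or all ones */
    uint32_t x = (uint32_t)vtrue & mask;       /* 3: and                  */
    uint32_t y = (uint32_t)vfalse & ~mask;     /* 4: andc                 */
    /* 5: or. The cast back to int32_t assumes two's complement. */
    return (int32_t)(x | y);
}
```

Going from movcond to setcond costs nothing extra, while the other
direction needs several ops, which is the asymmetry behind the
"5 instructions where 1 would suffice" remark quoted above.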

>> Regarding your patches, I would like to see setcond put in
>> mainline with a simplified version for i386.
>
> Again, simplified from what?  The last setcond implementation was 2 lines.

I was wrong, sorry; I mixed up several of your patches.


Laurent
