On Mon, Dec 21, 2009 at 9:28 PM, Richard Henderson <r...@twiddle.net> wrote:
> On 12/21/2009 01:13 AM, Laurent Desnogues wrote:
>>
>> The question for the generalized movcond is how useful is it?
>> Which front-ends would need it and would the cost to generate
>> code for it on some (most?) back-ends be amortized?
>
> ... Any front end that has a conditional move instruction?
> Sparcv9, Mips32, Alpha, ARM...

As far as I know these CPUs don't need the full movcond but only
the variant with vtrue.  Even if host code for movcond were quick
to generate, for ARM, for instance, you'd have to explicitly detect
conditional moves, which probably wouldn't be worth the cost; I
might be wrong, since no one has given it a try.

> That said, I think the *biggest* gains are to be had because with movcond --
> at least on some targets -- we can have one BB per TB, and avoid any
> intermediate spilling of global registers back to memory.

I can't count the number of times I thought some branch removal
could only bring improvement, only to see QEMU slow down.  The
balance between simplicity and good generated code is very hard
to achieve (and in that particular case, benchmarking on an Intel
just shows how good Intel engineers are at designing branch
predictors :-).

>> My guess (I use that word given that I didn't do any benchmark
>> to sustain my claim) is that your implementation is too complex.
>
> Too complex for what? The message against which you are quoting has an
> implementation of 2 lines.

Well, I replied to this mail after seeing the SPARC implementation :)
Indeed your implementation for i386 setcond2 (setcond is trivial)
is not that complex.

>> Of course setcond can be implemented in terms of movcond,
>> but my guess (again that word...) is that setcond could be
>> enough and even faster in most cases.
>
> To implement condition codes, yes, to implement compare instructions (e.g.
> mips slt, alpha cmp{eq,lt,lte}), yes. To implement conditional moves, no.
> At least not without using 5 instructions where 1 would suffice.

How many instructions would you need to generate one host
instruction?  If the block is not executed that often, it could be
a waste.  If you want I can provide you with dynamic counts of
ARM conditional mov when running SPEC; but that wouldn't be
enough, someone would need to do that for the kernel boot too.

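To make the comparison concrete, here is a rough C sketch of the
semantics being discussed.  It is not TCG code: the function names
are made up for illustration, only the "less than" condition is
modelled, and the five-step expansion is just one plausible way of
building a conditional move out of setcond.

```c
#include <stdint.h>

/* Hypothetical model of setcond (signed "less than"): yields 1 or 0. */
static uint32_t setcond_lt(int32_t a, int32_t b)
{
    return a < b ? 1u : 0u;
}

/* Hypothetical model of the general movcond: selects vtrue or vfalse. */
static uint32_t movcond_lt(int32_t a, int32_t b,
                           uint32_t vtrue, uint32_t vfalse)
{
    return a < b ? vtrue : vfalse;
}

/*
 * The same selection built from setcond without a branch.
 * Five operations: setcond, negate, and, and-with-complement, or.
 */
static uint32_t movcond_via_setcond_lt(int32_t a, int32_t b,
                                       uint32_t vtrue, uint32_t vfalse)
{
    uint32_t t = setcond_lt(a, b);   /* 1 if a < b, else 0 */
    uint32_t mask = 0u - t;          /* all ones if a < b, else all zeros */
    return (vtrue & mask) | (vfalse & ~mask);
}
```

The variant with only vtrue corresponds to passing the current
value of the destination as vfalse, so the destination is left
unchanged when the condition is false.
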
I'm not saying movcond is useless, I'm just wondering if it would
bring improvements.  That's why I would prefer to do all of that
stuff incrementally: setcond, then movcond.

>> Regarding your patches, I would like to see setcond put in
>> mainline with a simplified version for i386.
>
> Again, simplified from what? The last setcond implementation was 2 lines.

I was wrong, sorry; I mixed up several of your patches.


Laurent