On Tue, Jun 10, 2014 at 10:32 AM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote:
> Hi there,
>
> With recent changes to it, the bswap pass can now replace a series of
> (probably aligned) load + bitwise operation (AND, OR and shifts) + casts
> by a (potentially unaligned) load and a bswap. It was rightly pointed
> out to me that this might be more expensive than the original sequence
> of gimple statements. Therefore I am trying to compute the cost of the
> sequence with and without the transformation to make an informed
> decision.
>
> So far I proceeded by reusing the computation_cost function from
> ivopts and various functions from expmed (shift_cost, convert_cost
> and some new ones: rot_cost for instance). However, this doesn't
> allow me to compute the cost of a function call (the call to the bswap
> builtin) and I am leaning towards exposing expand_gimple_stmt () in
> a new function gimple_stmt_cost (). I am wondering though if it is a
> correct thing to do as I am not familiar with how expansion operates.
> I am also wondering if I should use gimple_stmt_cost as seldom as
> possible or on the contrary make use of it for all statements so as to
> get rid of the modifications in ivopts and expmed.
>
> I'd appreciate any advice on how to compute the cost of a sequence
> of gimple statements.

In general this is impossible to do. I don't have a good answer on how
to determine whether an (unaligned) load + bswap is faster than doing
something else, but there is a very good chance that the original code
is even worse. For the unaligned load you can expect an optimal code
sequence to be generated; likewise for the bswap.

Now, if you want to do the best for the combination of both, I'd say
you add support to the expr.c bitfield extraction code to do the bswap
on the fly, and use TER to see that you are doing the bswap on a memory
source.

Anyway, what you'd really need to do is compare the original code
against the transform, where on GIMPLE it's very-many-stmts vs.
two-stmts, and thus "obviously faster".

There are only two choices: disable unaligned-load + bswap on
SLOW_UNALIGNED_ACCESS targets, or not. Doing something more fancy won't
do the trick and isn't worth the trouble IMHO.

Richard.

> Best regards,
>
> Thomas Preud'homme
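
For reference, the kind of source pattern under discussion might look
like the sketch below. It is only an illustration, not taken from the
bswap pass or its testsuite: the function names read_be32_bytes and
read_be32_bswap are made up here, and the second function is a
hand-written stand-in for the "load + bswap" form, relying on
__builtin_bswap32 and the __BYTE_ORDER__ predefined macros as provided
by GCC and Clang.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form: four single-byte loads combined with shifts and ORs.
   Reads a 32-bit big-endian value from any address, aligned or not.  */
static uint32_t read_be32_bytes(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Combined form: one (possibly unaligned) 32-bit load, followed by a
   byte swap when the host is little-endian.  memcpy keeps the
   unaligned access well defined in C.  */
static uint32_t read_be32_bswap(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}
```

Both functions return the same value for any pointer. On GIMPLE the
first is many statements and the second is two, so which one is cheaper
comes down to how the target handles the unaligned load.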