http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56199



Ondrej Bilka <neleai at seznam dot cz> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |



--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> 2013-02-04 15:19:22 UTC ---

> It should be faster if the string is not in the cache.  Which of course it is
> for your testcase (because you have an artificial loop here).



That is also the expected case, because you did the expansion: the code should be on a hot path, so the string will be in the cache. Otherwise, not doing the expansion at all is faster.



Since you mention cache behaviour, that also includes the instruction cache, and the current implementation is quite hostile to the instruction cache (see the other benchmark).

Cases where 



> So the benchmark does not show that the transform is bad but instead it shows
> that if repeatedly initializing sth from the same (large) constants then it's
> profitable to use a smaller instruction encoding.

One property of a benchmark is minimality. I could write a benchmark with strcpy called at five places with different strings and more complex control flow, if that is your point.
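A less minimal benchmark of the kind described above might look like the following C sketch. It is hypothetical: the names `bench_dst`, `copy_site` and `run_strcpy_benchmark`, the five strings, and the LCG used to pick a call site are illustrative assumptions, not code from this report. Each call site copies a different constant, and the site is chosen by data-dependent control flow rather than one tight loop over a single string.

```c
#include <string.h>
#include <time.h>

/* Hypothetical sketch: five strcpy call sites, each copying a different
   string literal. Whether GCC expands a given call inline or emits a
   library call depends on the compiler version, target and flags. */
char bench_dst[5][64];

/* Dispatch to one of five call sites so the copies are spread over
   several places in the code instead of one hot loop body. */
static void copy_site(unsigned which)
{
    switch (which % 5) {
    case 0: strcpy(bench_dst[0], "alpha"); break;
    case 1: strcpy(bench_dst[1], "a somewhat longer constant string"); break;
    case 2: strcpy(bench_dst[2], "0123456789abcdef0123456789abcdef"); break;
    case 3: strcpy(bench_dst[3], "x"); break;
    default: strcpy(bench_dst[4], "the fifth call site copies this text"); break;
    }
}

/* Perform `iterations` copies, choosing the call site with a simple LCG
   (an assumption for illustration), and return elapsed CPU seconds. */
double run_strcpy_benchmark(unsigned long iterations)
{
    unsigned state = 12345u;
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++) {
        state = state * 1103515245u + 12345u; /* unsigned wraparound is well defined */
        copy_site(state >> 16);               /* use the higher bits to pick a site */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building it once with `-O2` and once with `-O2 -fno-builtin-strcpy` would compare the inline expansion against library calls across several call sites; timings will vary by target and GCC version.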



>  But of course that's again
> likely only true when 'cpy' is not inlined - in which case we cannot
> distinguish the cases.

Please explain.



And for strings larger than 128 bytes you inline a repne strcpy variant that is slower than calling strcpy.
