This speeds up streamer_read_uhwi (the top entry in the Mozilla LTO profile) by deferring the section overrun check until after the decode loop, manually inlining streamer_read_uchar, performing CSE by hand, and special-casing the 1-byte case.
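
For readers without the GCC tree at hand, the idea can be sketched outside of GCC roughly as follows. This is a minimal standalone illustration, not the code from the patch: `struct input_block` and `read_uleb128` are hypothetical stand-ins for `struct lto_input_block` and `streamer_read_uhwi`, the overrun is reported through a return value instead of `lto_section_overrun`, and the sketch assumes that reading a few bytes past `len` stays inside allocated memory. The `shift < 64` guard is an addition that is not in the patch; it only keeps the shift well-defined on malformed input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lto_input_block.  */
struct input_block
{
  const unsigned char *data;  /* start of the section data */
  size_t p;                   /* current read offset */
  size_t len;                 /* number of valid bytes in data */
};

/* Decode one unsigned LEB128 value from IB into *OUT.

   Assumption: up to 10 bytes starting at ib->p are readable even if
   they lie past ib->len, so the bounds check can be done once, after
   the fact.  Returns false on overrun and leaves ib->p untouched.  */
static bool
read_uleb128 (struct input_block *ib, uint64_t *out)
{
  /* Load the fields into locals once so the loop works on registers.  */
  const unsigned char *data = ib->data;
  size_t p = ib->p;

  /* Fast path: a single byte with the continuation bit clear.  */
  uint64_t result = data[p++];

  if (result & 0x80)
    {
      unsigned int shift = 7;
      uint64_t byte;

      result &= 0x7f;
      do
        {
          byte = data[p++];
          result |= (byte & 0x7f) << shift;
          shift += 7;
        }
      /* The shift guard is not part of the patch; it bounds the loop
         on malformed input.  */
      while ((byte & 0x80) != 0 && shift < 64);
    }

  /* Deferred overrun check: one comparison per value instead of one
     per byte.  */
  if (p > ib->len)
    return false;

  ib->p = p;
  *out = result;
  return true;
}
```

The trade-off is that the decoder may touch bytes beyond the logical end of the section before it notices the overrun, which is only acceptable when the caller treats an overrun as fatal or guarantees readable slack after the data.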

LTO bootstrapped on x86_64-unknown-linux-gnu, applied.

Richard.

2013-06-20  Richard Biener  <rguent...@suse.de>

        * data-streamer-in.c (streamer_read_uhwi): Optimize single byte
        case, inline streamer_read_uchar and defer section overrun check.

Index: gcc/data-streamer-in.c
===================================================================
*** gcc/data-streamer-in.c      (revision 200189)
--- gcc/data-streamer-in.c      (working copy)
*************** bp_unpack_string (struct data_in *data_i
*** 120,137 ****
  unsigned HOST_WIDE_INT
  streamer_read_uhwi (struct lto_input_block *ib)
  {
!   unsigned HOST_WIDE_INT result = 0;
!   int shift = 0;
    unsigned HOST_WIDE_INT byte;
  
!   while (true)
      {
!       byte = streamer_read_uchar (ib);
!       result |= (byte & 0x7f) << shift;
!       shift += 7;
!       if ((byte & 0x80) == 0)
!         return result;
      }
  }
--- 120,152 ----
  unsigned HOST_WIDE_INT
  streamer_read_uhwi (struct lto_input_block *ib)
  {
!   unsigned HOST_WIDE_INT result;
!   int shift;
    unsigned HOST_WIDE_INT byte;
+   unsigned int p = ib->p;
+   unsigned int len = ib->len;
  
!   const char *data = ib->data;
!   result = data[p++];
!   if ((result & 0x80) != 0)
      {
!       result &= 0x7f;
!       shift = 7;
!       do
!         {
!           byte = data[p++];
!           result |= (byte & 0x7f) << shift;
!           shift += 7;
!         }
!       while ((byte & 0x80) != 0);
      }
+ 
+   /* We check for section overrun after the fact for performance reason.  */
+   if (p > len)
+     lto_section_overrun (ib);
+ 
+   ib->p = p;
+   return result;
  }