Hello list,
I've been working on my project full time since last week, and on a
part-time basis before then. Hopefully I'll be posting updates/patches
more often now that my exams are over. For anyone that wants to talk to
me, I'm jimis on the IRC.
I've looked a little into hash tables, symtab and hashtab, with no
encouraging results unfortunately. Measurements showed that too many
collisions were happening (more than N/2 for N searches) so I was hoping
that reducing them would make a difference. This was not the case, the
difference was negligible, so I stopped looking into hash tables for the
time being. The only patches worth submitting until now are probably some
statistics printing for various hash tables when -fmem-report is passed.
In the future I plan to try more radical changes like changing the hash
function (we are using Bob Jenkins' v2 hash, upgrade to v3 which is
supposed to be faster and better) and breaking the functionality of
htab_find_slot* functions into smaller ones. Maybe also use the symtab
hash table outside of the preprocessor, where only strings are stored
(file_table in dwarf2out for example). Nicola (CC'd), you'd mentioned that
you have done work on hash tables. Which parts did you change? Have you
seen any measurable differences?
I spent more time fiddling with dwarf2out_* functions that output the
final assembly, and this has proven more fruitful. I measured a
significant amount of time spent into libc's vfprintf(), mostly coming
from ASM_OUTPUT_ASCII macros and dw2_asm_output_data(), with a format
string mostly like "%s %#x". In order to avoid the argument parsing
overhead and the hex conversion (implemented suboptimally in glibc) I
changed the hottest callers with fwrite()/fputs() and implemented a
puthexl() function for hex conversion:
static void puthexl (unsigned long value, FILE *f)
{
static char hex_repr[16]= {'0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
static char buf[2 + 2*sizeof(value)]= "0x";
int i;
int j= 2;
for (i = 8*sizeof(value)-4; i>=0; i-= 4)
{
char c= (value >> i) & 0xf;
if (c!=0 || j>2)
{
buf[j]= hex_repr[(int)c];
j++;
}
}
if (j>2)
fwrite(buf, 1, j, f);
else
putc('0', f);
}
I also performed some other minor changes like implementing
ASM_OUTPUT_LIMITED_STRING with a function using a jump table in elfos.h.
BTW, elfos.h is using a too complex ASM_OUTPUT_ASCII macro, it actually
reparses the whole string to find substrings to output with
ASM_OUTPUT_LIMITED_STRING... is that necessary?
All in all I measured a 30-50 ms speedup in cc1 run time out of ~850ms
total. Take the numbers with a grain of salt until I fix some library
problems with an old PC, I'll publish final numbers then.
All comments are welcome,
Dimitris
P.S. I am keeping notes at http://gcc.gnu.org/wiki/OptimisingGCC feel free
to comment/edit on anything