Hello list,

I've been working on my project full time since last week, and on a part-time basis before then. Hopefully I'll be posting updates/patches more often now that my exams are over. For anyone that wants to talk to me, I'm jimis on the IRC.

I've looked a little into hash tables, symtab and hashtab, with no encouraging results unfortunately. Measurements showed that too many collisions were happening (more than N/2 for N searches) so I was hoping that reducing them would make a difference. This was not the case, the difference was negligible, so I stopped looking into hash tables for the time being. The only patches worth submitting until now are probably some statistics printing for various hash tables when -fmem-report is passed.

In the future I plan to try more radical changes like changing the hash function (we are using Bob Jenkins' v2 hash, upgrade to v3 which is supposed to be faster and better) and breaking the functionality of htab_find_slot* functions into smaller ones. Maybe also use the symtab hash table outside of the preprocessor, where only strings are stored (file_table in dwarf2out for example). Nicola (CC'd), you'd mentioned that you have done work on hash tables. Which parts did you change? Have you seen any measurable differences?

I spent more time fiddling with dwarf2out_* functions that output the final assembly, and this has proven more fruitful. I measured a significant amount of time spent into libc's vfprintf(), mostly coming from ASM_OUTPUT_ASCII macros and dw2_asm_output_data(), with a format string mostly like "%s %#x". In order to avoid the argument parsing overhead and the hex conversion (implemented suboptimally in glibc) I changed the hottest callers with fwrite()/fputs() and implemented a puthexl() function for hex conversion:

static void puthexl (unsigned long value, FILE *f)
{
  static char hex_repr[16]= {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
  static char buf[2 + 2*sizeof(value)]= "0x";
  int i;
  int j= 2;

  for (i = 8*sizeof(value)-4; i>=0; i-= 4)
    {
      char c= (value >> i) & 0xf;
      if (c!=0 || j>2)
        {
          buf[j]= hex_repr[(int)c];
          j++;
        }
    }

  if (j>2)
    fwrite(buf, 1, j, f);
  else
    putc('0', f);
}

I also performed some other minor changes like implementing ASM_OUTPUT_LIMITED_STRING with a function using a jump table in elfos.h. BTW, elfos.h is using a too complex ASM_OUTPUT_ASCII macro, it actually reparses the whole string to find substrings to output with ASM_OUTPUT_LIMITED_STRING... is that necessary?

All in all I measured a 30-50 ms speedup in cc1 run time out of ~850ms total. Take the numbers with a grain of salt until I fix some library problems with an old PC, I'll publish final numbers then.


All comments are welcome,
Dimitris


P.S. I am keeping notes at http://gcc.gnu.org/wiki/OptimisingGCC feel free to comment/edit on anything

Reply via email to