> Jan Hubicka <hubi...@ucw.cz> writes:
>
> > Hi,
> > this patch started as experiment moving hash_table_mod1 inline because it
> > shows high in streaming profiles and it represents a branch-less code that
> > is good to schedule to surrounding instructions.
>
> FWIW with a good modern hash function it shouldn't be needed to have
> prime hash table sizes anymore. Without that just a power of two size
> can be used, so it would be just a mask.
>
> I considered this last time I messed with hashes, but I didn't actually
> see this function as being hot.

If I measure the wpa-stream thread only, I get about 18% in the hashtable
lookup:

  18.33%  lto1-wpa-stream  lto1          [.] hash_table<hash_map<tree_node*, unsigned int, default_hashmap_traits>::hash_entry, xcallocator, true>::find_with_hash(tree_node* const&, u
   9.53%  lto1-wpa-stream  lto1          [.] DFS::DFS_write_tree(output_block*, DFS::sccs*, tree_node*, bool, bool, bool)
   8.43%  lto1-wpa-stream  lto1          [.] linemap_lookup(line_maps*, unsigned int)
   5.81%  lto1-wpa-stream  lto1          [.] streamer_write_uhwi_stream(lto_output_stream*, unsigned long)
   4.47%  lto1-wpa-stream  lto1          [.] lto_output_tree(output_block*, tree_node*, bool, bool)
   3.11%  lto1-wpa-stream  libc-2.13.so  [.] _int_malloc
   2.85%  lto1-wpa-stream  lto1          [.] DFS::DFS_write_tree_body(output_block*, tree_node*, DFS::sccs*, bool, bool)

I am not quite sure the hash function is the bottleneck though - I would
rather expect it to be a cache-miss sink. It may be interesting though to
replace it by something more reasonable than this pre-computed divide.

Honza
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
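
For reference, the two indexing schemes discussed above can be sketched roughly as follows. This is only an illustration, not the code in GCC's hash_table: the names index_prime, index_pow2 and mix32 are hypothetical, and the mixer is the well-known 32-bit finalizer from MurmurHash3, picked here just as one example of a hash that spreads entropy into the low bits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; not GCC's hash_table code.

// Prime-sized table: the slot is hash modulo the size. This needs a
// division, or a precomputed multiply-and-shift that stands in for it.
inline std::size_t index_prime(std::uint32_t hash, std::uint32_t prime_size)
{
  return hash % prime_size;
}

// Power-of-two-sized table: the slot is a single AND with (size - 1).
// Only the low bits of the hash are used, so the hash has to mix well.
inline std::size_t index_pow2(std::uint32_t hash, std::uint32_t pow2_size)
{
  return hash & (pow2_size - 1);
}

// Example mixer (assumption: MurmurHash3's 32-bit finalizer), so that
// the high bits of the input also influence the low bits kept by the mask.
inline std::uint32_t mix32(std::uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}
```

A lookup would then use index_pow2(mix32(h), size) instead of index_prime(h, size). Whether that helps here depends on whether the time really goes into the index computation or into cache misses on the table itself, which the profile above does not tell apart.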