Re: Hash Function for "switch statement"

Jae Hyuk Kwak Thu, 18 Mar 2010 17:10:33 -0700

Hi Michael,

Thank you for the details.

On Thu, Mar 18, 2010 at 8:17 AM, Michael Meissner
<meiss...@linux.vnet.ibm.com> wrote:
>> > Note, that many hash tables are computed by the modulus operation, which is
>> > often fairly expensive (and on machines without a hardware divide unit,
>> > requiring a function call).  I would expect many switch statements would 
>> > slow
>> > down if you switch to a hash operation that used modolus.
>>
>> I agree that the cost of modulation can be high, but it can be even
>> higher if we use a bunch of "else if".
>> Consider the situation that a program has about 400 cases on a single
>> switch statement.
>
> However the compiler doesn't do 400 if-then-else's, instead it generates a
> decision tree using less than/greater than type tests so that log2
> comparison/branches are made.  So if there are 400 cases, it means there are
> probably 10 comparisons/branches done for any given value (and on some
> architectures, the last comparison for equality can be eliminated).  Looking 
> at
> the i386.c tunings, you see divides are in the 66 - 74 cycle ranges for 
> popular
> current chips.  If a comparison/branch is say 6 cycles, that means it is still
> faster on average to do the tree of if's than to do a modulus, load, and
> branch for 400 or so switch elements.

If I understood correctly, your point is that O(log N) is fast enough
because the size of switch is small in practice.
But I am still not convinced that using hash table would not increase the speed.
As we know hash table requires O(N) only.
There must be some point that O(N) bits O(logN), depending on the size
of N and the constant cost of O(N).

Is that true that current implementation of gcc on i386 optimizes
switch statement with decision tree?
I was expecting somebody who can confirm this.

I tried to find the specific implementation on the file, i386.c.
But it is very big file and I have only limited knowledge of gcc yet.
If you can point more specific implementation part, I may be able to
catch up what you meant.

>> If jump table is possible, it can be a choice, but jump table is not
>> always feasible depending on the values on "case".
>
> There are other strategies, for example if most values are clustered together,
> you can use if's for the outlying values, and then use a jump table for the
> values clustered together.  When I did the Data General C compiler many years
> ago, this was a win in our particular environment, because there were often
> code that did a switch on errno with a 0 case, and all of the E values.  In 
> the
> DG environment, the E values were high because each language had its own range
> of error codes, and the C language was a relative late comer.

The paper that Basile pointed well summarized some of strategies you
might refer to.
As I mentioned earlier, the paper uses a branching tree way but does
not dig into hash table part much.
And at the time moment, 2008, gcc did not seem to use even decision
tree for switch statement yet.

Thanks

Jay

Re: Hash Function for "switch statement"

Reply via email to