I'm sorry I haven't gotten around to chasing this sooner.  I needed to learn 
how to use objdump and read the assembler.  The code now lets you define the 
depth.

It generates some pretty crappy code even with gcc-4.6.0 on O3.  The code when 
generating a 20 deep template inlines every 6 levels.  At 50 depth it only does 
2 levels at a time.

Someone mentioned taking control of this myself and implementing array lookups. 
 Sure, I could also do binary searches and hash tables when the data numbers 
aren't continuous.  The problem with doing these things is that it takes 
control away from the compiler's optimizer.  Theoretically an optimizer should 
know best when to go from if/elses to binary lookups to hash tables.  I could 
do this kind of stuff at the metaprogramming level, but that could break an 
optimizer in the future.  It reminds me of all the "if x==0" you see in old 
Fortran code which made things run faster on old machines were branching was 
cheap.

Theoretically I might know which ids are most probable, but profile feedback 
optimization could be used for this.

Would gcc join switch/case statements?
        switch(id) { case 1: foo(); break;}
        switch(id) { case 2: bar(); break;}

Another idea to revive the switch/case is to implement a utility with a N 
switch cases, one for each size.  I could maybe use self including headers or 
generate such code.


By the way this kind of code exists.  Boost::variant::apply_visitor uses 
repeated switch/case statements.  Changing foo() below to switch/case doesn't 
seem to help.

Chris

------------------ improved code
#include <iostream>
#include <cstdlib>
#include <typeinfo>

struct DecodeContext;

struct M1{
        static const int id=13;
        static void decode(DecodeContext& ){ std::cout<<"1\n";}
};

struct M2{
        static const int id=17;
        static void decode(DecodeContext& ){ std::cout<<"2\n";}
};

struct M3{
        static const int id=19;
        static void decode(DecodeContext& ){ std::cout<<"3\n";}
};


template<int N>
struct M{
        static const int id=N*3+1024;
        static void decode(DecodeContext& ){ std::cout<<typeid(M).name()<<" 
"<<N<<"!\n";}
};


template <int> struct ListMember;

template <> struct ListMember<1>{ typedef M1 type; };
template <> struct ListMember<2>{ typedef M2 type; };
template <> struct ListMember<3>{ typedef M3 type; };

template <int N> struct ListMember { typedef M<N> type; };



template<int i>
void foo(int id, DecodeContext& dc) {
        typedef typename ListMember<i>::type M;
        if(M::id==id)
                M::decode(dc);
        else
                foo<i-1>(id,dc);
}

template<>
void foo<0>(int id, DecodeContext& dc) {}



int main(){
        DecodeContext& dc= *(DecodeContext*)0;// junk for now
        int id=std::rand(); // runtime dependent
        foo<50>(id,dc);
        return 0;
}

-----Original Message-----
From: Piotr Rak [mailto:piotr....@gmail.com] 
Sent: Wednesday, July 28, 2010 8:17 PM
To: Hite, Christopher
Cc: Richard Guenther; gcc@gcc.gnu.org
Subject: Re: optimization question: mpl

Hi,

2010/7/28 Hite, Christopher <christopher.h...@partner.commerzbank.com>:
>> Generally without knowing the compiler version you are using it is 
>> hard to tell.
> I'll use whatever's best.  Right now I'm still on 4.4.3.  I'll 
> probably upgrade soon to 4.5.
>
>> The same is true without a complete compilable testcase.
> I didn't want to make a test case that depends on boost::mpl. How's
> this:
>
> struct DecodeContext;
>
> struct M1{
>        static const int id=1;
>        static void decode(DecodeContext& ){} };
>
> struct M2{
>        static const int id=2;
>        static void decode(DecodeContext& ){} };
>
> struct M3{
>        static const int id=3;
>        static void decode(DecodeContext& ){} };
>
>
> template <int> struct ListMember;
>
> template <> struct ListMember<1>{ typedef M1 type; };
> template <> struct ListMember<2>{ typedef M2 type; };
> template <> struct ListMember<3>{ typedef M3 type; };
>
> template<int i>
> void foo(int id, DecodeContext& dc) {
>        typedef typename ListMember<i>::type M;
>        if(M::id==id)
>                M::decode(dc);
>        else
>                foo<i-1>(id,dc);
> }
>
> template<>
> void foo<0>(int id, DecodeContext& dc) {}
>
>
>
> int main(){
>        DecodeContext& dc= *(DecodeContext*)0;// junk for now
>        int id=2; //sometime runtime dependent
>        foo<3>(id,dc);
>        return 0;
> }
>
>
>> You can use the flatten attribute to tell the compiler to inline all
>> calls in a given function, like
>>
>> void __attribute__((flatten)) foo(void)
>> {
>> ...
>> decode1();
>> ...
>> }
>
> That would cause decode1() to be inlined, which might not be what you
> want.
>
> Hmm maybe I could rewrite things so the switch case returns a function
> pointer.  I'm guessing that would make things slower though.
>

Or you could just initialize static array of pointers, like that
(please note, that I never compiled code below):

typedef void (*pfn) (DeclContext&);

tempate <int I>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn,N>& a)
  {
    a[I-1] = ListMemeber<I-1>::foo;
    InitPointersHelper<I-1>::init(a);
  }
};

// End recursion
template<>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn,N>& a)
  {    /*does nothing */  }
};

// Dispatch table;
template <int N>
struct DispatchFoo : std::tr1::array<pfn, N>
{
  DispatchFoo() : std::tr1::array<pfn, N>()
  {
     InitDispatchHelper<N>::init(*this);
  }
};

then your call would look like:

static const int X = /* your maximal I in ListMembers meta-class */

void call_foo(int i, DeclContext& c)
{
  static const DispatchFoo<X> foo_dispatch;
  foo_dispatch[i] (c);
}

Bonus points for benchmarking both solutions and making it policy
class choosing between those two solutions depending on X.

Good luck,

Piotr

Reply via email to