I'm sorry I haven't gotten around to chasing this sooner. I needed to learn how to use objdump and read the assembler. The code now lets you define the depth.
It still generates some pretty crappy code, even with gcc-4.6.0 at -O3.  With a
20-deep template it only inlines about 6 levels at a time; at a depth of 50 it
only does 2 levels at a time.

Someone mentioned taking control of this myself and implementing array lookups.
Sure, I could also do binary searches and hash tables when the ids aren't
contiguous.  The problem with doing these things is that it takes control away
from the compiler's optimizer.  Theoretically an optimizer should know best
when to go from if/elses to binary lookups to hash tables.  I could do this
kind of thing at the metaprogramming level, but that could get in the way of a
better optimizer in the future.  It reminds me of all the "if x==0" you see in
old Fortran code, which made things run faster on old machines where branching
was cheap.  Theoretically I might know which ids are most probable, but
profile-feedback optimization could be used for that.

Would gcc join adjacent switch/case statements on the same variable?  For
example:

    switch(id) { case 1: foo(); break; }
    switch(id) { case 2: bar(); break; }

Another idea to revive the switch/case is to implement a utility with N switch
cases, one for each size (a hand-written sketch of what that would expand to
follows the improved code below).  I could maybe use self-including headers or
generate such code.  By the way, this kind of code already exists:
Boost::variant::apply_visitor uses repeated switch/case statements.  Changing
foo() below to switch/case doesn't seem to help.

Chris

------------------ improved code

#include <iostream>
#include <cstdlib>
#include <typeinfo>

struct DecodeContext;

struct M1{ static const int id=13; static void decode(DecodeContext& ){ std::cout<<"1\n";} };
struct M2{ static const int id=17; static void decode(DecodeContext& ){ std::cout<<"2\n";} };
struct M3{ static const int id=19; static void decode(DecodeContext& ){ std::cout<<"3\n";} };

// M<N> fills out the list beyond the three hand-written members.
template<int N>
struct M{
    static const int id=N*3+1024;
    static void decode(DecodeContext& ){ std::cout<<typeid(M).name()<<" "<<N<<"!\n";}
};

template <int> struct ListMember;

template <> struct ListMember<1>{ typedef M1 type; };
template <> struct ListMember<2>{ typedef M2 type; };
template <> struct ListMember<3>{ typedef M3 type; };
template <int N> struct ListMember { typedef M<N> type; };

// Recursive if/else dispatch: compare id against each list member in turn.
template<int i>
void foo(int id, DecodeContext& dc) {
    typedef typename ListMember<i>::type M;
    if(M::id==id)
        M::decode(dc);
    else
        foo<i-1>(id,dc);
}

// End of recursion: unknown id, do nothing.
template<>
void foo<0>(int id, DecodeContext& dc) {}

int main(){
    DecodeContext& dc= *(DecodeContext*)0; // junk for now
    int id=std::rand(); // runtime dependent
    foo<50>(id,dc);
    return 0;
}
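------------------ switch sketch (untested)

An untested sketch of what the "utility with N switch cases" mentioned above
would have to expand to, written out by hand for a depth of 3.  foo_switch3 is
just a placeholder name; a self-including header or the preprocessor would be
what actually generates the case lines.  The case labels are the static const
ids, so they are constant expressions and gcc is still free to lower the switch
into a jump table or a comparison tree on its own.

// Hand-written stand-in for the generated N-case switch.
// Relies on ListMember and DecodeContext from the code above.
inline void foo_switch3(int id, DecodeContext& dc)
{
    switch(id) {
    case ListMember<1>::type::id: ListMember<1>::type::decode(dc); break;
    case ListMember<2>::type::id: ListMember<2>::type::decode(dc); break;
    case ListMember<3>::type::id: ListMember<3>::type::decode(dc); break;
    default: break; // unknown id: do nothing, like foo<0>
    }
}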
-----Original Message-----
From: Piotr Rak [mailto:piotr....@gmail.com]
Sent: Wednesday, July 28, 2010 8:17 PM
To: Hite, Christopher
Cc: Richard Guenther; gcc@gcc.gnu.org
Subject: Re: optimization question: mpl

Hi,

2010/7/28 Hite, Christopher <christopher.h...@partner.commerzbank.com>:
>> Generally without knowing the compiler version you are using it is
>> hard to tell.
> I'll use whatever's best.  Right now I'm still on 4.4.3.  I'll
> probably upgrade soon to 4.5.
>
>> The same is true without a complete compilable testcase.
> I didn't want to make a test case that depends on boost::mpl.  How's
> this:
>
> struct DecodeContext;
>
> struct M1{
>     static const int id=1;
>     static void decode(DecodeContext& ){} };
>
> struct M2{
>     static const int id=2;
>     static void decode(DecodeContext& ){} };
>
> struct M3{
>     static const int id=3;
>     static void decode(DecodeContext& ){} };
>
>
> template <int> struct ListMember;
>
> template <> struct ListMember<1>{ typedef M1 type; };
> template <> struct ListMember<2>{ typedef M2 type; };
> template <> struct ListMember<3>{ typedef M3 type; };
>
> template<int i>
> void foo(int id, DecodeContext& dc) {
>     typedef typename ListMember<i>::type M;
>     if(M::id==id)
>         M::decode(dc);
>     else
>         foo<i-1>(id,dc);
> }
>
> template<>
> void foo<0>(int id, DecodeContext& dc) {}
>
>
> int main(){
>     DecodeContext& dc= *(DecodeContext*)0; // junk for now
>     int id=2; // sometimes runtime dependent
>     foo<3>(id,dc);
>     return 0;
> }
>
>
>> You can use the flatten attribute to tell the compiler to inline all
>> calls in a given function, like
>>
>> void __attribute__((flatten)) foo(void)
>> {
>>   ...
>>   decode1();
>>   ...
>> }
>
> That would cause decode1() to be inlined, which might not be what you
> want.
>
> Hmm maybe I could rewrite things so the switch case returns a function
> pointer.  I'm guessing that would make things slower though.
>

Or you could just initialize a static array of pointers, like this (please
note that I never compiled the code below):

#include <tr1/array>

typedef void (*pfn)(DecodeContext&);

template <int I, int N>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn, N>& a)
  {
    a[I-1] = &ListMember<I>::type::decode;
    InitDispatchHelper<I-1, N>::init(a);
  }
};

// End recursion
template <int N>
struct InitDispatchHelper<0, N>
{
  static void init(std::tr1::array<pfn, N>&) { /* does nothing */ }
};

// Dispatch table
template <int N>
struct DispatchFoo : std::tr1::array<pfn, N>
{
  DispatchFoo()
    : std::tr1::array<pfn, N>()
  {
    InitDispatchHelper<N, N>::init(*this);
  }
};

then your call would look like:

static const int X = /* your maximal I in the ListMember meta-class */;

void call_foo(int i, DecodeContext& c)
{
  static const DispatchFoo<X> foo_dispatch;
  foo_dispatch[i-1](c);  // i is the 1-based position in the list
}

Bonus points for benchmarking both solutions and making it a policy class
that chooses between those two solutions depending on X.

Good luck,
Piotr
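------------------ policy-class sketch (untested)

An untested sketch of the policy class Piotr suggests at the end, assuming the
largest id (MaxId) is known at compile time; the threshold of 16 and all the
names here are made up.  Small lists keep the inlined foo<N> chain; large ones
build a table indexed directly by id, which wastes space when the ids are
sparse.  Whether that actually beats letting gcc lower foo<N> itself is exactly
the benchmarking question.

// Chain-or-table dispatch policy (untested sketch).
// Reuses pfn, ListMember, DecodeContext and foo<N> from the code above.
template <int I, int MaxId>
struct InitByIdHelper
{
  static void init(pfn (&a)[MaxId + 1])
  {
    // assumes every ListMember id is <= MaxId
    a[ListMember<I>::type::id] = &ListMember<I>::type::decode;
    InitByIdHelper<I - 1, MaxId>::init(a);
  }
};

template <int MaxId>
struct InitByIdHelper<0, MaxId>
{
  static void init(pfn (&)[MaxId + 1]) {}  // end recursion
};

template <int N, int MaxId, bool UseTable = (N > 16)>
struct Dispatch                              // small N: if/else chain
{
  static void call(int id, DecodeContext& dc) { foo<N>(id, dc); }
};

template <int N, int MaxId>
struct Dispatch<N, MaxId, true>              // large N: id-indexed table
{
  static void call(int id, DecodeContext& dc)
  {
    static pfn table[MaxId + 1];             // zero-initialized
    static bool ready = false;
    if (!ready) {                            // not thread-safe; sketch only
      InitByIdHelper<N, MaxId>::init(table);
      ready = true;
    }
    if (id >= 0 && id <= MaxId && table[id])
      table[id](dc);
  }
};

// usage, e.g.:  Dispatch<50, 50*3+1024>::call(id, dc);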