> It's hard to optimize Python code well without global analysis.
> The problem is that you have to make sure that a long list of "weird
> things", like modifying code or variables via getattr/setattr, isn't
> happening before doing significant optimizations. Without that,
> you're doomed to a slow implementation like CPython.
>
> ShedSkin, which imposes some restrictions, is on the right track here.
> The __slots__ feature is useful but doesn't go far enough.
>
> I'd suggest defining "simpleobject" as the base class instead of
> "object", which would become a derived class of "simpleobject".
> Objects descended directly from "simpleobject" would have the
> following restrictions:
>
> - "getattr" and "setattr" are not available (as with __slots__)
> - All class member variables must be initialized in __init__, or
>   in functions called by __init__. The effect is like __slots__,
>   but you don't have to explicitly write declarations.
> - Class members are implicitly typed with the type of the first
>   thing assigned to them. This is the ShedSkin rule. It might
>   be useful to allow assignments like
>
>       self.str = None(string)
>
>   to indicate that a slot holds strings but currently has the null
>   string.
> - Function members cannot be modified after declaration. Subclassing
>   is fine, but replacing a function member via assignment is not.
>   This allows inlining of calls to small functions, which is a
>   big win.
> - Private function members (self._foo and self.__foo) really are
>   private and are not callable outside the class definition.
>
> You get the idea. This basically means that "simpleobject" objects
> have roughly the same restrictions as C++ objects, for which heavy
> compile-time optimization is possible. Most Python classes already
> qualify as "simpleobject". And this approach doesn't require
> un-Pythonic stuff like declarations or extra "decorators".
>
> With this, the heavy optimizations become possible: strength
> reduction, hoisting common subexpressions out of loops, hoisting
> reference-count updates out of loops, keeping frequently used
> variables in registers, and eliminating many unnecessary dictionary
> lookups.
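The proposed `simpleobject` base class doesn't exist; the closest thing in today's Python is `__slots__`, which fixes the attribute set at class-creation time (though, as the proposal notes, it still requires explicit declarations rather than inferring them from `__init__`). A minimal sketch of what that fixed layout buys you:

```python
class Point:
    # __slots__ fixes the attribute layout when the class is created:
    # instances carry no per-instance __dict__, and attributes not
    # listed here cannot be attached later. This is the static layout
    # the proposal wants inferred automatically from __init__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
try:
    p.z = 3.0  # not declared in __slots__
except AttributeError:
    print("fixed layout: no new attributes")
```

Because the layout is known statically, an optimizer could in principle turn `p.x` into a fixed-offset load instead of a dictionary lookup.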
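The "type of the first assignment" rule can be illustrated in plain Python; the code below runs fine in CPython, but under a ShedSkin-style monomorphic typing rule the second function would be rejected at compile time (the function names are made up for illustration):

```python
def monomorphic(n):
    # total is only ever an int, so a restricted compiler can
    # assign it one fixed type from its first assignment.
    total = 0
    for i in range(n):
        total += i
    return total

def polymorphic(flag):
    # x is first an int, then rebound to a str. Valid Python,
    # but under the first-assignment rule this rebinding would
    # be a compile-time type error.
    x = 0
    if flag:
        x = "now a string"
    return x
```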
I won't give you the "prove it by doing it" talk. It's too cheap.
Instead I'd like to say why I don't think that this will buy you much
performance-wise: it's a local optimization only. All it can and will
do is optimize lookups and storage of attributes - either functions
or values - and calls to methods from within one simpleobject. As long
as expressions stay in their own "soup", things might be OK. But the
very moment you mix this with "regular", no-strings-attached Python
code, you have to have the full dynamic machinery in place, plus you
need tons of guarding statements in the optimized code to prevent
access violations. So in the end, I seriously doubt the performance
gains would be noticeable.

Instead I'd rather take the Pyrex road, which can go even further in
optimizing, given some more declarations. But then I at least know
exactly where the boundaries are. As does the compiler.

> Python could get much, much faster. Right now CPython is said to be
> 60X slower than C. It should be possible to get at least an order of
> magnitude over CPython.

Regardless of the possibility of speeding it up - why should one want
this? Coding speed is more important than execution speed in 90%+ of
all cases. For the other cases - well, if you _really_ want speed,
assembler is the way to go. I'm serious about that. There is one
famous mathematical-library author who codes in assembler - because
in the end, it's all about processor architecture and careful
optimization for it. [1] The same is true for e.g. the new Cell
architecture, or the AltiVec-optimized code in Photoshop that still
beats the crap out of Intel processors on PPC machines.

I'm all for making Python faster if it doesn't suffer
functionality-wise. But until there is proof that something really
speeds up Python without crippling it, I'm more than skeptical.
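To make the guarding point concrete: ordinary Python allows a method to be replaced at any time, even out from under live instances, so an optimizer that had inlined the original method body would need a check at every call site. A minimal demonstration (class and method names hypothetical):

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())  # prints "hello"

# Perfectly legal Python: swap the method on the class, changing the
# behavior of instances that already exist. Any compiler that inlined
# the original greet() body must guard every call site against
# exactly this kind of mutation.
Greeter.greet = lambda self: "goodbye"
print(g.greet())  # prints "goodbye"
```

This is why the boundary between restricted and unrestricted code matters so much: one unguarded assignment from "regular" code invalidates the inlined version.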
Diez

[1] http://math-atlas.sourceforge.net/faq.html#auth

    """Kazushige Goto
    His ev5/ev6 GEMM is used directly by ATLAS if the user answers
    "yes" to its use during the configuration procedure on an alpha
    processor. This results in a significant speedup over ATLAS's own
    GEMM codes, and is the fastest ev5/ev6 implementation we are
    aware of."""