Andy Lester <a...@petdance.com> added the comment:
I experimented with a lookup table vs. the switch statement. The relevant diff (not including the patches to the code generator) is:

--- Parser/token.c
+++ Parser/token.c
@@ -77,31 +77,36 @@
 int
 PyToken_OneChar(int c1)
 {
-    switch (c1) {
-    case '%': return PERCENT;
-    case '&': return AMPER;
-    case '(': return LPAR;
-    case ')': return RPAR;
-    case '*': return STAR;
-    case '+': return PLUS;
-    case ',': return COMMA;
-    case '-': return MINUS;
-    case '.': return DOT;
-    case '/': return SLASH;
-    case ':': return COLON;
-    case ';': return SEMI;
-    case '<': return LESS;
-    case '=': return EQUAL;
-    case '>': return GREATER;
-    case '@': return AT;
-    case '[': return LSQB;
-    case ']': return RSQB;
-    case '^': return CIRCUMFLEX;
-    case '{': return LBRACE;
-    case '|': return VBAR;
-    case '}': return RBRACE;
-    case '~': return TILDE;
-    }
+    static char op_lookup[] = {
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, PERCENT, AMPER, OP,
+        LPAR, RPAR, STAR, PLUS, COMMA,
+        MINUS, DOT, SLASH, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, COLON, SEMI,
+        LESS, EQUAL, GREATER, OP, AT,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, LSQB, OP, RSQB, CIRCUMFLEX,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, OP, OP,
+        OP, OP, OP, LBRACE, VBAR,
+        RBRACE, TILDE
+    };
+    if (c1 >= 37 && c1 <= 126)
+        return op_lookup[c1];
     return OP;
 }

In my testing, I didn't use pyperformance, because the only part of the code I wanted to time was the actual compilation of the code.
My solution for this was to find the 100 largest *.py files in the cpython repo and compile them like so:

    python -m py_compile $(List-of-big-*.py-files)

The speedup was significant: my table-driven lookup ran the compile tests about 10% faster than the existing switch approach. That was without --enable-optimizations in my configure. However, as pablogsal suspected, with PGO enabled, the two approaches ran at pretty much the same speed.

I do think there may be merit in a table-driven approach that generates less code and doesn't rely on PGO to speed things up.

If anyone's interested, all my work is on branch Issue39150 in my fork petdance/cpython.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39150>
_______________________________________