New submission from Raymond Hettinger:

On little-endian machines, the decoding of an oparg can be sped up by using a single 16-bit pointer dereference.
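As a rough illustration of the idea (not the patch itself), the two decoding strategies can be sketched in C as below. The sketch assumes a two-byte oparg stored low byte first; all function names are hypothetical. It uses memcpy for the 16-bit load instead of the pointer cast the patch uses, since memcpy sidesteps alignment and aliasing concerns and compilers commonly reduce it to a single load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the host stores the low byte first. */
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Byte-wise decoding: two 8-bit loads, a shift and a combine.
   Gives the same result on any host byte order. */
static unsigned int decode_oparg_bytes(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Single 16-bit load. Matches decode_oparg_bytes() only on
   little-endian hosts. */
static unsigned int decode_oparg_u16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Load first, then advance the instruction pointer, so the load
   does not depend on the updated pointer value. */
static unsigned int fetch_oparg(const unsigned char **next_instr)
{
    unsigned int oparg = decode_oparg_u16(*next_instr);
    *next_instr += 2;
    return oparg;
}
```

On a big-endian host the 16-bit variant would need a byte swap or a fallback to the byte-wise version, which is why the speed-up is limited to little-endian machines.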
Current decoding:

    leaq    2(%rcx), %rbp
    movzbl  -1(%rbp), %eax
    movzbl  -2(%rbp), %r14d
    sall    $8, %eax
    addl    %eax, %r14d

New decoding:

    leaq    2(%rdx), %r12
    movzwl  -2(%r12), %r8d

The patch uses (unsigned short *) like the struct module does, but it could use uint16_t if necessary.

If next_instr can be advanced after the lookup rather than before, the generated code would be tighter still (removing the data dependency and shortening the movzwl instruction to drop the offset byte):

    movzwl  (%rdx), %r8d
    leaq    2(%rdx), %rbp

----------
assignee: serhiy.storchaka
components: Interpreter Core
messages: 256106
nosy: rhettinger, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Speed-up oparg decoding on little-endian machines
type: performance
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25823>
_______________________________________