New submission from Raymond Hettinger:

On little-endian machines, the decoding of an oparg can be sped up by using a 
single 16-bit pointer dereference.

Current decoding:
    leaq    2(%rcx), %rbp
    movzbl  -1(%rbp), %eax
    movzbl  -2(%rbp), %r14d
    sall    $8, %eax
    addl    %eax, %r14d

New decoding:
    leaq    2(%rdx), %r12
    movzwl  -2(%r12), %r8d

The patch uses (unsigned short *) like the struct module does, but it could use 
uint16_t if necessary.
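
For illustration, the two decodings can be sketched in C roughly as follows. 
This is not the patch itself: the function names are hypothetical, and the 
16-bit load goes through memcpy (instead of the (unsigned short *) cast the 
patch uses) so that the sketch stays free of alignment and aliasing 
assumptions. Compilers typically turn such a memcpy into a single 16-bit load.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise decoding, as in the "current" assembly:
   low byte first, then the high byte shifted up by 8. */
static unsigned int decode_oparg_bytes(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Single 16-bit load (hypothetical helper). The result matches the
   byte-wise decoding only on little-endian hosts. */
static unsigned int decode_oparg_u16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Runtime probe for host byte order, used only to guard the comparison. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a big-endian host the two functions disagree, which is why the 
optimization is limited to little-endian machines.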

If next_instr can be advanced after the lookup rather than before, the 
generated code would be tighter still (removing the data dependency and 
shortening the movzwl instruction to drop the offset byte):

    movzwl  (%rdx), %r8d
    leaq    2(%rdx), %rbp
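
The difference between the two orderings can be sketched in C as below. The 
helper names and the pointer-to-pointer interface are assumptions made for 
illustration, not the interpreter's actual macros. Both variants return the 
same value; the second reads through the unmodified pointer, so the load does 
not have to wait for the increment and needs no negative offset.

```c
#include <stdint.h>
#include <string.h>

/* 16-bit load via memcpy (hypothetical helper; little-endian hosts assumed). */
static unsigned int load_u16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Advance first, then look back: the load address depends on the
   updated pointer, as in "movzwl -2(%r12), %r8d". */
static unsigned int fetch_advance_then_load(const unsigned char **next_instr)
{
    *next_instr += 2;
    return load_u16(*next_instr - 2);
}

/* Load first, then advance: the load uses the original pointer with a
   zero offset, as in "movzwl (%rdx), %r8d". */
static unsigned int fetch_load_then_advance(const unsigned char **next_instr)
{
    unsigned int oparg = load_u16(*next_instr);
    *next_instr += 2;
    return oparg;
}
```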

----------
assignee: serhiy.storchaka
components: Interpreter Core
messages: 256106
nosy: rhettinger, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Speed-up oparg decoding on little-endian machines
type: performance
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25823>
_______________________________________