New submission from Ma Lin <malin...@163.com>:
On windows, it seems 32bit builds (3.7.2/3.8.0a2) don't using SSE2 sufficiently. I test on 3.8 branch, python38.dll only uses XMM register 28 times. The official build is the same. After enable this option, python38.dll uses XMM register 11,704 times. --- a/PCbuild/pythoncore.vcxproj +++ b/PCbuild/pythoncore.vcxproj @@ -88,6 +88,7 @@ <AdditionalIncludeDirectories Condition="$(IncludeExternals)">$(zlibDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories> <PreprocessorDefinitions>_USRDLL;Py_BUILD_CORE;Py_ENABLE_SHARED;MS_DLL_ID="$(SysWinVer)";%(PreprocessorDefinitions)</PreprocessorDefinitions> <PreprocessorDefinitions Condition="$(IncludeExternals)">_Py_HAVE_ZLIB;%(PreprocessorDefinitions)</PreprocessorDefinitions> + <EnableEnhancedInstructionSet Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">StreamingSIMDExtensions2</EnableEnhancedInstructionSet> </ClCompile> <Link> <AdditionalDependencies>version.lib;shlwapi.lib;ws2_32.lib;%(AdditionalDependencies)</AdditionalDependencies> x86 instruction set has only a few number of registers. In my understanding, using XMM registers on 32bit build will brings a small speed up. I'm not an expert of this kind knowledge, sorry if I'm wrong. ---------- components: Build, Windows messages: 338317 nosy: Ma Lin, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: Build 32bit Python on Windows with SSE2 instruction set versions: Python 3.8 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue36357> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com