On 2018/11/18 22:03, Samuel Neves wrote:
> On 11/17/18 12:36 AM, Li, Aubrey wrote:
>> On 2018/11/17 7:10, Dave Hansen wrote:
>>> Just to be clear: there are 3 AVX-512 XSAVE states:
>>>
>>>         XFEATURE_OPMASK,
>>>         XFEATURE_ZMM_Hi256,
>>>         XFEATURE_Hi16_ZMM,
>>>
>>> I honestly don't know what XFEATURE_OPMASK does.  It does not appear to
>>> be affected by VZEROUPPER (although VZEROUPPER's SDM documentation isn't
>>> looking too great).
>
> XFEATURE_OPMASK refers to the additional 8 mask registers used in
> AVX512. These are more similar to general purpose registers than
> vector registers, and should not be too relevant here.
>
>>>
>>> But, XFEATURE_ZMM_Hi256 is used for the upper 256 bits of the
>>> registers ZMM0-ZMM15.  Those are AVX-512-only registers.  The only way
>>> to get data into XFEATURE_ZMM_Hi256 state is by using AVX512 instructions.
>>>
>>> XFEATURE_Hi16_ZMM is the same.  The only way to get state in there is
>>> with AVX512 instructions.
>>>
>>> So, first of all, I think you *MUST* check XFEATURE_ZMM_Hi256 and
>>> XFEATURE_Hi16_ZMM.  That's without question.
>>
>> No, XFEATURE_ZMM_Hi256 does not request turbo license 2, so it's of less
>> interest to us.
>>
>
> I think Dave is right, and it's easy enough to check this. See the
> attached program.
> For the "high current" instruction vpmuludq
> operating on zmm0--zmm3 registers, we have (on a Skylake-SP Xeon Gold
> 5120)
>
>        175,097      core_power.lvl0_turbo_license:u    ( +-  2.18% )
>         41,185      core_power.lvl1_turbo_license:u    ( +-  1.55% )
>     83,928,648      core_power.lvl2_turbo_license:u    ( +-  0.00% )
>
> while for the same code operating on zmm28--zmm31 registers, we have
>
>        163,507      core_power.lvl0_turbo_license:u    ( +-  6.85% )
>         47,390      core_power.lvl1_turbo_license:u    ( +- 12.25% )
>     83,927,735      core_power.lvl2_turbo_license:u    ( +-  0.00% )
>
> In other words, the register index does not seem to matter at all for
> turbo license purposes (this makes sense, considering these chips have
> 168 vector registers internally; zmm16--zmm31 are simply newly exposed
> architectural registers).
>
> We can also see that XFEATURE_Hi16_ZMM does not imply license 1 or 2;
> we may be using xmm16--xmm31 purely for the convenient extra register
> space. For example, cases 4 and 5 of the sample program:
>
>     84,064,239      core_power.lvl0_turbo_license:u    ( +-  0.00% )
>              0      core_power.lvl1_turbo_license:u
>              0      core_power.lvl2_turbo_license:u
>
>     84,060,625      core_power.lvl0_turbo_license:u    ( +-  0.00% )
>              0      core_power.lvl1_turbo_license:u
>              0      core_power.lvl2_turbo_license:u
>
Thanks for your program, Samuel, it's very helpful. But I saw a different
output on my side. May I have your glibc version?

Thanks,
-Aubrey

> So what's most important is the width of the vectors being used, not
> the instruction set or the register index. Second to that is the
> instruction type, namely whether those are "heavy" instructions.
> Neither of these things can be accurately captured by the XSAVE state.
>
>>>
>>> It's probably *possible* to run AVX512 instructions by loading state
>>> into the YMM register and then executing AVX512 instructions that only
>>> write to memory and never to register state.  That *might* allow
>>> XFEATURE_Hi16_ZMM and XFEATURE_ZMM_Hi256 to stay in the init state, but
>>> for the frequency to be affected since AVX512 instructions _are_
>>> executing.  But, there's no way to detect this situation from XSAVE
>>> states themselves.
>>>
>>
>> Andi should have more details on this. FWICT, not all AVX512 instructions
>> have high current; those that only touch memory do not cause a notable
>> frequency drop.
>
> According to section 15.26 of the Intel optimization reference manual,
> "heavy" instructions consist of floating-point and integer
> multiplication. Moves, adds, logical operations, etc, will request at
> most turbo license 1 when operating on zmm registers.
>
>>
>> Thanks,
>> -Aubrey
>>
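
For reference, the XSAVE-state check discussed in the quoted thread can be
sketched as below. This is only an illustration, not code from the patch
or from Samuel's attached program: the helper names are hypothetical, and
the mask value is assumed to come from somewhere like the XSTATE_BV header
field. The bit positions (5, 6, 7) are the architectural numbers of the
three AVX-512 state components. As cases 4 and 5 above show, a set
Hi16_ZMM bit does not by itself mean license 1 or 2 was requested, so the
result is at best a conservative hint.

```c
#include <stdint.h>

/* Architectural bit positions of the three AVX-512 XSAVE components. */
#define XFEATURE_MASK_OPMASK     (1ULL << 5)  /* k0-k7 mask registers       */
#define XFEATURE_MASK_ZMM_Hi256  (1ULL << 6)  /* upper 256 bits of ZMM0-15  */
#define XFEATURE_MASK_Hi16_ZMM   (1ULL << 7)  /* full ZMM16-ZMM31           */

/*
 * Hypothetical helper: returns 1 if either ZMM state component is not in
 * its init state, 0 otherwise. The opmask component is ignored on
 * purpose, since mask registers behave more like general purpose
 * registers.
 */
static int zmm_state_in_use(uint64_t xstate_bv)
{
    uint64_t zmm_bits = XFEATURE_MASK_ZMM_Hi256 | XFEATURE_MASK_Hi16_ZMM;

    return (xstate_bv & zmm_bits) != 0;
}

/* Hypothetical helper: returns 1 if only the opmask component is in use. */
static int only_opmask_in_use(uint64_t xstate_bv)
{
    return (xstate_bv & XFEATURE_MASK_OPMASK) != 0 &&
           !zmm_state_in_use(xstate_bv);
}
```

A mask of 0x07 (x87, SSE, AVX only) reports no ZMM state, while 0x87
(Hi16_ZMM set) reports ZMM state in use even if the task only touched
xmm16--xmm31 at license 0. That gap is the point made in the thread.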