On 04/29/2016 01:25 PM, Andy Lutomirski wrote:
> On Fri, Apr 29, 2016 at 1:07 PM, Yu-cheng Yu <yu-cheng...@intel.com> wrote:
>> On Fri, Apr 29, 2016 at 01:03:43PM -0700, Dave Hansen wrote:
>>> That's not feasible.  Think of dynamic libraries or just-in-time
>>> compilers.  What instruction set does /usr/bin/java use, for instance? :)
>>
>> The java argument is true. In that case or when the bitmask is
>> missing, we can allocate for all supported features.
>
> I actually want to see us moving in the direction of unconditionally
> allocating everything on process startup.  If we can stop using CR0.TS
> entirely, I think everything will be better.

We can absolutely allocate the worst-case XSAVE buffer at task startup
for folks that never want to see a latency spike in the life of the app
no matter what.  But I also think it would be pretty nice if 'ls' didn't
pay the 2k cost to have AVX-512 state if it's not using AVX-512.

We also don't have to do this with CR0.TS.  We'd actually use a
combination of out-of-line (not appended to task_struct) XSAVE buffers
and XGETBV1 to check the size of our XSAVE buffer before we call XSAVE*
and resize it when needed.

Maybe nobody will ever care enough about 2kbytes/thread, though.
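
To make the check-and-resize idea concrete, here is a rough userspace sketch, not kernel code. It assumes the in-use feature mask has already been read (XGETBV with ECX=1 returns XCR0 & XINUSE) and is passed in as a plain integer. The names xsave_buf, xsave_size_needed and xsave_buf_ensure are made up for illustration. The per-component sizes are hardcoded for a handful of features and the layout assumes the compacted format; real code would enumerate both from CPUID leaf 0xD.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* XSAVE state-component bits for the features used in this sketch. */
#define XFEATURE_FP        (1ULL << 0)
#define XFEATURE_SSE       (1ULL << 1)
#define XFEATURE_YMM       (1ULL << 2)
#define XFEATURE_OPMASK    (1ULL << 5)
#define XFEATURE_ZMM_HI256 (1ULL << 6)
#define XFEATURE_HI16_ZMM  (1ULL << 7)

/* 512-byte legacy FP/SSE area plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HEADER 576

/*
 * Assumption: sizes are hardcoded here for illustration only.
 * Real code would read them from CPUID leaf 0xD.
 */
static const uint32_t component_size[64] = {
	[2] = 256,	/* upper halves of YMM0-15 */
	[5] = 64,	/* AVX-512 opmask registers */
	[6] = 512,	/* upper halves of ZMM0-15 */
	[7] = 1024,	/* ZMM16-31 */
};

/* Hypothetical out-of-line buffer, i.e. not embedded in task_struct. */
struct xsave_buf {
	void *data;
	size_t size;
};

/* Bytes needed to save the components set in 'inuse' (compacted layout). */
static size_t xsave_size_needed(uint64_t inuse)
{
	size_t size = XSAVE_LEGACY_AND_HEADER;
	int i;

	for (i = 2; i < 64; i++)
		if (inuse & (1ULL << i))
			size += component_size[i];
	return size;
}

/*
 * Grow 'buf' if it cannot hold the state named by 'inuse'.  The old
 * contents are dropped because the save that follows overwrites them.
 * Returns 0 on success and -1 if the allocation fails.
 */
static int xsave_buf_ensure(struct xsave_buf *buf, uint64_t inuse)
{
	size_t need;
	void *p;

	assert(buf != NULL);
	need = xsave_size_needed(inuse);
	if (need <= buf->size)
		return 0;	/* common case: already big enough */

	/* XSAVE wants a 64-byte aligned area; every size here is a multiple of 64. */
	p = aligned_alloc(64, need);
	if (!p)
		return -1;
	free(buf->data);
	buf->data = p;
	buf->size = need;
	return 0;
}
```

The caller would run xsave_buf_ensure() with the freshly read mask right before each save. With these assumed sizes, a task that only ever touches FP/SSE stays at 576 bytes, and the three AVX-512 components add 1600 bytes the first time they show up as in use.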