On Thu, 15 Oct 2015, Konstantin Belousov wrote:
On Wed, Oct 14, 2015 at 08:22:12PM +0000, Garrett Cooper wrote:
Author: ngie
Date: Wed Oct 14 20:22:12 2015
New Revision: 289332
URL: https://svnweb.freebsd.org/changeset/base/289332
Log:
Fix test-fenv:test_dfl_env when run on some amd64 CPUs
Compare the fields that the AMD [1] and Intel [2] specs say will be
set once fnstenv returns.
Not all amd64 capable processors zero out the env.__x87.__other field
(example: AMD Opteron 6308). The AMD64/x64 specs aren't explicit on what the
env.__x87.__other field will contain after fnstenv is executed, so the values
in env.__x87.__other could be filled with arbitrary data depending on how the
CPU-specific implementation of fnstenv.
No Intel or AMD CPU write to __other field at all.
No, they all do.
Test on old i386 on old A64:
Initial state for fegetenv():
X 00000000 7F 12 00 00 00 00 80 1F FF FF FF FF 52 3F 67 C0
--cw- -mxhi --sw- -mxlo --tw- -pad- ----fip----
-----------------
X 00000010 08 00 05 01 48 5F 74 C0 10 00 FF FF
-fcs- -opc- ----foff--- -fds- -pad-
----other[16]----------------------
This always fails the test:
cw: passes
mxhi: passes
sw: passes
pad+tw: passes
fip: always fails. The test wants 0, but the value is always the kernel
eip for the instruction that is used to avoid the security hole.
(The A64 CPU gives the security hole, and the kernel code is
pessimal so it apparently runs always on the first use of the
NPX by a thread.)
fcs: always fails. The test wants 0, but the value is always the kernel
cs, as above.
opc: always fails. The value is the opcode for the kernel instruction
foff: always fails. The value is the data address for the kernel instruction
fds: always fails. The value is the kernel ds extended by padding with 1's
Modified state for fxsave:
X 0000001C 7F 12 00 38 80 00 F0 3F EE EE EE EE EE EE EE EE
X 0000002C EE EE EE EE EE EE EE EE 80 1F 00 00 FF FF 00 00
Partial result of an fxsave instruction after fld1; fstl in the test
program to try to change the kernel data. A byte value of EE indicates
that the byte was written by memset() and not overwritten by fxsave.
There are 16 such bytes. These correspond to __other[16]. You are
probably thinking of fxsave not writing these, but fegetenv() doesn't
use fxsave and always writes these. I think fxsave on all Intel CPUs
write these (freefall's Xeon does), and all AMD CPUs write these after
an error.
The format is slightly different. All the other values seem to be the
same as for fegetenv().
Modified state for fegetenv():
X 0000005C 7F 12 00 00 00 00 80 1F FF FF FF FF 63 82 04 08
--cw- -mxhi --sw- -mxlo --tw- -pad- ----fip----
-----------------
X 0000006C 1F 00 1D 05 A0 92 04 08 2F 00 FF FF
-fcs- -opc- ----foff--- -fds- -pad-
----other[16]----------------------
This is from the same state as the fxsave. It has the following
modifications relative to the initial state:
- all kernel data changed to user addresses, as intended
Modified state for fnsave:
X 0000003C 7F 12 FF FF 00 00 FF FF FF FF FF FF 63 82 04 08
X 0000004C 1F 00 1D 05 A0 92 04 08 2F 00 FF FF
This is from the same state as the fxsave. It has the same format
and values as the state saved by fegetenv() except 2 padding
fields are padded with 1's and not repurposed for mxcsr.
Modified state for fnstenv:
X 00000078 7F 12 FF FF 00 00 FF FF FF FF FF FF 63 82 04 08
X 00000088 1F 00 1D 05 A0 92 04 08 2F 00 FF FF EE EE EE EE
This is from the same state as the fxsave. It has the same format
and values as the state saved by fnsave.
So last instruction/data values are always in fenv_t on i386. They
are undocumented, but might be used. On amd64 in 64-bit mode, they
don't fit in the hardware format used by fnstenv. fenv_t extends
this format only to append mxcsr. amd64 should have used the env
part of the better-designed 64-bit fxsave format.
Test on -current amd64 on Xeon (freefall):
early fegetenv():
X 00000000 7f 03 ff ff 00 00 ff ff ff ff ff ff 00 00 00 00
--cw- -pad- --sw- -pad- --tw- -pad- -----------
X 00000010 00 00 00 00 00 00 00 00 00 00 ff ff 80 1f 00 00
---other[16]------------------------ ---mxcsr---
The test passes. The 2 bytes at the end of other[20] are reserved
and not part of the last instruction/data. They are padded normally
with 1's and not with 0's. The default env hard-codes the assumption
that these reserved bits are 1's.
Since Xeons are not AMD CPUs, the kernel doesn't execute the code that
closes the security hole and all the last instruction/data values are
0. These values are written.
later fxsave:
X 00000020 7f 03 00 00 00 00 1c 05 75 08 40 00 43 00 00 00
--cw- --sw- tw 00 -opc- ----fip---- ----fcs----
X 00000030 20 62 60 00 3b 00 00 00 80 1f 00 00 ff ff 00 00
----fdp---- ----fds---- ---mxcsr--- -mxcsrmask-
fxsave always stores the last instruction/data values on Intel CPUs.
They have the nonzero user values. The format is the 64-bit format.
I now understand why it looked looked 32-bit format before -- it
has segment registers instead of 64-bit offsets, but that is apparently
from the memory model being small -- seg:offset32 is stored instead
of rip and edp if the operand size is 32 bits. fcs == %cs. %ds is
0 but fds == %ss.
Later fegetenv():
X 00000060 7f 03 ff ff 00 00 ff ff ff ff ff ff 75 08 40 00
X 00000070 43 00 1c 05 20 62 60 00 3b 00 ff ff 80 1f 00 00
Later fnsave:
X 00000040 7f 03 ff ff 00 00 ff ff ff ff ff ff 75 08 40 00
X 00000050 43 00 1c 05 20 62 60 00 3b 00 ff ff 00 00 00 00
Later fnstenv:
X 00000080 7f 03 ff ff 00 00 ff ff ff ff ff ff 75 08 40 00
X 00000090 43 00 1c 05 20 62 60 00 3b 00 ff ff ee ee ee ee
Everything works right and the values are the same as for fxsave
except for rearrangement and padding, but only because 64-bit
offsets are not needed.
This shows that:
- it is the __other field that should be least MD, not most MD,
since it has no reserved padding bits and just doesn't
support 64-bit offsets
- there seem to be no hardware differences for the __other field
as saved by fnstenv and fegetenv(). The AMD bug/optimization
is only for fxsave.
- however, the FreeBSD fix for the AMD bug/optimization gives
different initial values in the __other field for AMD CPUs only.
I don't like it, but thought it was smarter than that. Perhaps
it is, but I tested my version. My version is more efficient,
and this involves doing the cleaning more unconditionally. For
the first use of the NPX by a thread, it would be efficient
enough to clean using fstenv; clean; fldenv before the normal
fxrstor. But cleaning is needed on every NPX context switch,
and we don't want to do fstenv; clean; fldenv on every switch.
The AMD bug/optimization breaks some debugging methods and some
error handling in applications, and the FreeBSD cleaning makes
this worse. An application could try to use the last instruction/
data info for its error handling and use methods that preserve
it internally. But the info is lost on any context switch. Even
if no other thread uses the NPX, cleaning guarantees clobbering
of the info when a thread is switched back to. It is lost even
after an exception where AMD hardware doesn't lose it. Its loss
is visible to debuggers -- debuggers will usually see the result
of the cleaning.
Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"