On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
Hi Brian

On Sun, 13 Oct 2024 10:41:58 -0600
Brian Inglis wrote:
On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
Hi Brian,

On Tue, 8 Oct 2024 10:37:14 -0600
Brian Inglis wrote:
On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
On Mon, 7 Oct 2024 15:11:52 +0200
Christian Franke wrote:
$ gcc -o sigtest -O2 sigtest.c

$ ./sigtest > out.txt
(press ^C 42x :-)

$ sort out.txt | uniq -c
          3 x = 0x1.23456789p+0, y = -nan, d = -nan
          6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
         33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0

The problem also occurs if compiled without -O2, but less often. No
problem occurs if compiled with -DWORKS which suggests that only 'long
double' is affected.

Thanks for the report. I looked into this problem and may have found the
cause. It seems to be due to a bug in scripts/gendef, which generates the
signal handler caller (sigfe.s) that stores/restores the registers.

In sigdelayed, the control word is stored/restored by the fnstcw/fldcw
instructions; however, the fninit instruction destroys some of the status
registers of the FPU (x87).

I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
However, I'm not familiar with x87 instructions, so I may overlook
something.

Could any expert on x87 instructions and the sigfe code give some
comments?
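Here is a minimal sketch of the difference. It assumes GCC-style inline asm on an x86 or x86_64 target, and the function names are made up for illustration; they are not taken from gendef. A long double held on the x87 register stack does not survive fnstcw/fninit/fldcw, because fninit marks every register empty. It does survive fnstenv/fninit/fldenv, because the saved environment includes TOP and the tag word, and fninit leaves the data registers themselves alone. Note that fnstenv saves no register contents. So this only helps if nothing in between loads the x87 registers.

```c
#include <math.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "this sketch uses x87 instructions and needs an x86 target"
#endif

/* Illustrative sketch; names are hypothetical, not from gendef/sigfe.s.
   noinline keeps the compiler from holding its own values on the x87
   stack across these asm statements. */

/* Save only the control word around fninit. */
__attribute__((noinline)) long double reload_after_fnstcw_fninit_fldcw(void)
{
    long double out;
    unsigned short cw;
    __asm__ volatile (
        "fld1\n\t"          /* push 1.0: stands in for a live long double */
        "fnstcw %1\n\t"     /* save the control word only */
        "fninit\n\t"        /* TOP = 0, every tag = empty */
        "fldcw  %1\n\t"     /* control word back, but TOP/tags are not */
        "fstpt  %0\n\t"     /* pop an "empty" st(0): stack underflow */
        "fnclex"            /* clear the flags the underflow raised */
        : "=m" (out), "=m" (cw)
        :
        : "memory");
    return out;
}

/* Save the whole 28-byte x87 environment around fninit. */
__attribute__((noinline)) long double reload_after_fnstenv_fninit_fldenv(void)
{
    long double out;
    unsigned char env[28];
    __asm__ volatile (
        "fld1\n\t"          /* push 1.0 */
        "fnstenv %1\n\t"    /* save control, status (incl. TOP), tags, pointers */
        "fninit\n\t"        /* resets TOP/tags; data registers untouched */
        "fldenv  %1\n\t"    /* TOP and tags back: st(0) is valid again */
        "fstpt   %0"        /* pops 1.0 */
        : "=m" (out), "=m" (env)
        :
        : "memory");
    return out;
}
```

The invalid-operation exception is normally masked. In that case the first function returns the x87 "indefinite" NaN, which prints as -nan, much like the output in the report.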

AIUI x87 FP handling is outdated and mostly unused on current systems, which
rely on much more than the legacy x87 instructions and register stack.

See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
modern approaches.

You would have to look into the AMD/Intel/IEEE docs for lower-level details.

This is basically what ISTR:

https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html

where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
SSE... instructions and XMM registers are used.

Thanks for the advice. I read through the web pages and related documents
and posted a patch which uses fxsave/fxrstor and xsave/xrstor to the
cygwin-patc...@cygwin.com mailing list:
https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html

Is this as you intended?

That seems to be the preferred approach now, as long as you can correctly
determine adequate space for fxsave and xsave, given the varying feature sets,
register counts, and register sizes of recent processors:
sse/2/3/4.1/4.2/4a/5/ssse3, avx2/512, and 128/256/512-bit X/Y/ZMM registers.

Thanks for checking.

According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
fxsave uses a fixed-length 512-byte memory area to save the current
state of the x87 FPU, MMX technology, XMM, and MXCSR registers.

The patch allocates 0x238 bytes:
  0x200 (512 bytes): fxsave area
  0x008 (  8 bytes): for 16-byte alignment
  0x010 ( 16 bytes): work area
  0x020 ( 32 bytes): reserved for later processing
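Here is a quick check of that budget, as a sketch. The constant and function names are illustrative, not the patch's. The four parts add up to 0x238. The 8 bytes of alignment slack suffice only if the starting address is already 8-byte aligned, since rounding an 8-aligned address up to a multiple of 16 costs at most 8 bytes. That starting alignment is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative names for the budget listed above (not from the patch). */
enum {
    FXSAVE_BYTES = 0x200,  /* fixed FXSAVE image */
    ALIGN_PAD    = 0x008,  /* slack for 16-byte alignment */
    WORK_BYTES   = 0x010,  /* work area */
    RESERVED     = 0x020,  /* reserved for later processing */
    FX_FRAME     = FXSAVE_BYTES + ALIGN_PAD + WORK_BYTES + RESERVED
};

_Static_assert(FX_FRAME == 0x238, "the parts add up to 0x238");

/* Bytes needed to round addr up to a multiple of align
   (align must be a power of two). */
static size_t pad_to(uintptr_t addr, uintptr_t align)
{
    return (size_t)((align - (addr & (align - 1))) & (align - 1));
}
```

With an arbitrary byte-aligned start, up to 15 bytes of slack would be needed instead.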

That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
Please also note that a 64-bit operand size (REX.W prefix, i.e. FXSAVE64/FXRSTOR64) must be used to save the expanded 64-bit state rather than the legacy state.
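Here is a minimal sketch of saving and restoring that 512-byte image with the 64-bit forms. It assumes GCC-style inline asm and a GNU assembler that accepts the fxsave64/fxrstor64 mnemonics, and the helper names are hypothetical. It reads two fields from the legacy layout documented by Intel: the x87 control word at byte 0 and MXCSR at byte 24. It then checks that fxrstor64 brings a modified MXCSR back.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "this sketch uses FXSAVE/FXRSTOR and needs an x86 target"
#endif

/* Illustrative sketch; names are hypothetical, not from the patch.
   The FXSAVE image is a fixed 512 bytes and must be 16-byte aligned. */
typedef struct { alignas(16) unsigned char b[512]; } fx_area;

__attribute__((noinline)) void fx_save(fx_area *a)
{
#ifdef __x86_64__
    /* REX.W form: FPU instruction/data pointers stored as 64-bit offsets */
    __asm__ volatile ("fxsave64 %0" : "=m" (*a) : : "memory");
#else
    __asm__ volatile ("fxsave %0" : "=m" (*a) : : "memory");
#endif
}

__attribute__((noinline)) void fx_restore(const fx_area *a)
{
#ifdef __x86_64__
    __asm__ volatile ("fxrstor64 %0" : : "m" (*a) : "memory");
#else
    __asm__ volatile ("fxrstor %0" : : "m" (*a) : "memory");
#endif
}

/* Legacy-region fields: x87 control word at byte 0, MXCSR at byte 24. */
uint16_t fx_fcw(const fx_area *a)
{
    uint16_t v;
    memcpy(&v, a->b + 0, sizeof v);
    return v;
}

uint32_t fx_mxcsr(const fx_area *a)
{
    uint32_t v;
    memcpy(&v, a->b + 24, sizeof v);
    return v;
}

/* Direct reads/writes of the live registers, for comparison. */
uint16_t get_fcw(void)   { uint16_t v; __asm__ volatile ("fnstcw %0" : "=m" (v)); return v; }
uint32_t get_mxcsr(void) { uint32_t v; __asm__ volatile ("stmxcsr %0" : "=m" (v)); return v; }
void set_mxcsr(uint32_t v) { __asm__ volatile ("ldmxcsr %0" : : "m" (v)); }
```

Typical use is `fx_area a; fx_save(&a);`, then whatever code may disturb the FP/SSE state, then `fx_restore(&a);`.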

According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
the cpuid instruction with eax=0dh and ecx=00h returns the maximum
size required by xsave in ebx. So the patch allocates
ebx + 0x048 bytes:
  0x018 ( 24 bytes): for 64-byte alignment
  0x010 ( 16 bytes): work area
  0x020 ( 32 bytes): reserved for later processing
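Here is a sketch of reading both sizes from CPUID leaf 0Dh, sub-leaf 0. It assumes GCC's or Clang's <cpuid.h>, and the struct and function names are hypothetical. EBX is the size for the features currently enabled in XCR0. ECX is the size for all features the processor supports. The frame computation mirrors the ebx + 0x048 layout described above.

```c
#include <cpuid.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "needs an x86 target"
#endif

/* Illustrative sketch; names are hypothetical, not from the patch. */
struct xsave_sizes {
    unsigned enabled;    /* EBX: features currently enabled in XCR0 */
    unsigned supported;  /* ECX: all features the CPU supports */
};

/* Returns 0 if CPUID leaf 0Dh is not available. */
int query_xsave_sizes(struct xsave_sizes *s)
{
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, 0) < 0x0d)
        return 0;
    __cpuid_count(0x0d, 0, eax, ebx, ecx, edx);
    (void)eax; (void)edx;
    s->enabled = ebx;
    s->supported = ecx;
    return 1;
}

/* ebx + 0x048, split as listed above. */
unsigned patch_style_frame(const struct xsave_sizes *s)
{
    return s->enabled
         + 0x018   /* slack for 64-byte alignment */
         + 0x010   /* work area */
         + 0x020;  /* reserved for later processing */
}
```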

That is the size for the features currently enabled in XCR0 (user state), not for all the registers of all the features the processor supports, which may be enabled and in use; that larger size is returned in ecx. You need 2KB just to store 32 64B X/Y/ZMM registers, and new real and virtual features may require more. It may be conservative, but I would suggest allocating the size returned in ecx, as documented, just in case of future changes; that can be reduced to 512 bytes if only fxsave is supported. I also suggest checking for fxsave support in cpuid 1:0 edx bit 24, falling back to fnsave/frstor if it is absent, and keeping everything aligned to 64 bytes for safety.
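The suggested selection order could look roughly like the sketch below. It assumes GCC's or Clang's <cpuid.h>, and the names are hypothetical. It prefers xsave sized from ECX when the processor supports it and the OS has enabled it (CPUID.1:ECX bits 26 and 27). Otherwise it uses the fixed 512-byte fxsave image (CPUID.1:EDX bit 24). Failing both, it uses the 108-byte fnsave image. It then rounds the result up to a multiple of 64 bytes. That rounding is a choice made here to keep consecutive areas 64-byte aligned, not something the docs require of the size.

```c
#include <cpuid.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "needs an x86 target"
#endif

/* Illustrative sketch; names are hypothetical. */
#define CPUID1_EDX_FXSR    (1u << 24)  /* FXSAVE/FXRSTOR supported */
#define CPUID1_ECX_XSAVE   (1u << 26)  /* XSAVE family supported */
#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE */

enum fp_save_kind { FP_FNSAVE, FP_FXSAVE, FP_XSAVE };

enum fp_save_kind choose_fp_save(unsigned *bytes)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0, size;
    enum fp_save_kind kind;

    /* __get_cpuid returns 0 if leaf 1 is unavailable; the bits stay 0 then */
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);

    if ((ecx & CPUID1_ECX_XSAVE) && (ecx & CPUID1_ECX_OSXSAVE)
        && __get_cpuid_max(0, 0) >= 0x0d) {
        unsigned a, b, c, d;
        __cpuid_count(0x0d, 0, a, b, c, d);
        (void)a; (void)b; (void)d;
        kind = FP_XSAVE;
        size = c;                 /* ECX: all supported features */
    } else if (edx & CPUID1_EDX_FXSR) {
        kind = FP_FXSAVE;
        size = 512;               /* fixed FXSAVE image */
    } else {
        kind = FP_FNSAVE;
        size = 108;               /* 32-bit protected-mode FNSAVE image */
    }

    *bytes = (size + 63u) & ~63u; /* round up to a multiple of 64 */
    return kind;
}
```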

For my AMD A10-9700 /proc/cpuinfo shows:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes *xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov

and /usr/bin/cpuid (package cpuid) shows (lines of interest marked with !):

...
   feature information (1/edx):
      x87 FPU on chip                        = true
      VME: virtual-8086 mode enhancement     = true
      DE: debugging extensions               = true
      PSE: page size extensions              = true
      TSC: time stamp counter                = true
      RDMSR and WRMSR support                = true
      PAE: physical address extensions       = true
      MCE: machine check exception           = true
      CMPXCHG8B inst.                        = true
      APIC on chip                           = true
      SYSENTER and SYSEXIT                   = true
      MTRR: memory type range registers      = true
      PTE global bit                         = true
      MCA: machine check architecture        = true
      CMOV: conditional move/compare instr   = true
      PAT: page attribute table              = true
      PSE-36: page size extension            = true
      PSN: processor serial number           = false
      CLFLUSH instruction                    = true
      DS: debug store                        = false
      ACPI: thermal monitor and clock ctrl   = false
      MMX Technology                         = true
!     FXSAVE/FXRSTOR                         = true
      SSE extensions                         = true
      SSE2 extensions                        = true
      SS: self snoop                         = false
      hyper-threading / multi-core supported = true
      TM: therm. monitor                     = false
      IA64                                   = false
      PBE: pending break event               = false
   feature information (1/ecx):
      PNI/SSE3: Prescott New Instructions     = true
      PCLMULDQ instruction                    = true
      DTES64: 64-bit debug store              = false
      MONITOR/MWAIT                           = true
      CPL-qualified debug store               = false
      VMX: virtual machine extensions         = false
      SMX: safer mode extensions              = false
      Enhanced Intel SpeedStep Technology     = false
      TM2: thermal monitor 2                  = false
      SSSE3 extensions                        = true
      context ID: adaptive or shared L1 data  = false
      SDBG: IA32_DEBUG_INTERFACE              = false
      FMA instruction                         = true
      CMPXCHG16B instruction                  = true
      xTPR disable                            = false
      PDCM: perfmon and debug                 = false
      PCID: process context identifiers       = false
      DCA: direct cache access                = false
      SSE4.1 extensions                       = true
      SSE4.2 extensions                       = true
      x2APIC: extended xAPIC support          = false
      MOVBE instruction                       = true
      POPCNT instruction                      = true
      time stamp counter deadline             = false
      AES instruction                         = true
      XSAVE/XSTOR states                      = true
!     OS-enabled XSAVE/XSTOR                  = true
      AVX: advanced vector extensions         = true
      F16C half-precision convert instruction = true
      RDRAND instruction                      = true
      hypervisor guest status                 = false
...
   XSAVE features (0xd/0):
      XCR0 valid bit field mask               = 0x4000000000000007
         x87 state                            = true
         SSE state                            = true
         AVX state                            = true
         MPX BNDREGS                          = false
         MPX BNDCSR                           = false
         AVX-512 opmask                       = false
         AVX-512 ZMM_Hi256                    = false
         AVX-512 Hi16_ZMM                     = false
         PKRU state                           = false
         XTILECFG state                       = false
         XTILEDATA state                      = false
      bytes required by fields in XCR0        = 0x00000340 (832)
!     bytes required by XSAVE/XRSTOR area     = 0x000003c0 (960)
      XSAVEOPT instruction                    = true
      XSAVEC instruction                      = false
      XGETBV instruction                      = false
      XSAVES/XRSTORS instructions             = false
      XFD: extended feature disable supported = false
      SAVE area size in bytes                 = 0x00000000 (0)
      IA32_XSS valid bit field mask           = 0x0000000000000000
         PT state                             = false
         PASID state                          = false
         CET_U user state                     = false
         CET_S supervisor state               = false
         HDC state                            = false
         UINTR state                          = false
         LBR state                            = false
         HWP state                            = false
   AVX/YMM features (0xd/2):
      AVX/YMM save state byte size             = 0x00000100 (256)
      AVX/YMM save state byte offset           = 0x00000240 (576)
      supported in IA32_XSS or XCR0            = XCR0 (user state)
      64-byte alignment in compacted XSAVE     = false
      XFD faulting supported                   = false
   LWP features (0xd/0x3e):
      LWP save state byte size                 = 0x00000080 (128)
      LWP save state byte offset               = 0x00000340 (832)
      supported in IA32_XSS or XCR0            = XCR0 (user state)
      64-byte alignment in compacted XSAVE     = false
      XFD faulting supported                   = false
...
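The sizes in that output can be checked by hand. In the standard (non-compacted) XSAVE format, x87 and SSE state sit in the 512-byte legacy region, followed by a 64-byte header, so the first extended component (AVX) starts at 576 = 0x240. Below is a small sketch with the offsets and sizes copied from the output above; the names are hypothetical. The area ends at the furthest component end. That gives 0x240 + 0x100 = 0x340 (832, EBX) for x87/SSE/AVX, and 0x340 + 0x80 = 0x3c0 (960, ECX) once LWP is included. This suggests LWP is supported but not enabled in XCR0 here.

```c
/* Illustrative sketch; names are hypothetical. Offsets and sizes are the
   standard-format values reported by CPUID.(EAX=0Dh, ECX=i) above. */
struct xcomp {
    const char *name;
    unsigned offset;
    unsigned size;
};

/* x87 + SSE live in the 512-byte legacy region; the XSAVE header follows. */
enum { LEGACY_REGION = 512, XSAVE_HEADER = 64 };

unsigned xsave_area_size(const struct xcomp *c, int n)
{
    unsigned end = LEGACY_REGION + XSAVE_HEADER;   /* 576 = 0x240 */
    for (int i = 0; i < n; i++)
        if (c[i].offset + c[i].size > end)
            end = c[i].offset + c[i].size;
    return end;
}

/* Components from the A10-9700 output above. */
static const struct xcomp a10_enabled[] = {
    { "AVX/YMM", 0x240, 0x100 },
};
static const struct xcomp a10_supported[] = {
    { "AVX/YMM", 0x240, 0x100 },
    { "LWP",     0x340, 0x080 },
};
```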

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry


--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple