Hi Brian,

Thanks for the detailed explanation.

On Sun, 13 Oct 2024 16:19:31 -0600
Brian Inglis wrote:
> On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > Hi Brian
> >
> > On Sun, 13 Oct 2024 10:41:58 -0600
> > Brian Inglis wrote:
> >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> >>> Hi Brian,
> >>>
> >>> On Tue, 8 Oct 2024 10:37:14 -0600
> >>> Brian Inglis wrote:
> >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> >>>>>> Christian Franke wrote:
> >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> >>>>>>>
> >>>>>>> $ ./sigtest > out.txt
> >>>>>>> (press ^C 42x :-)
> >>>>>>>
> >>>>>>> $ sort out.txt | uniq -c
> >>>>>>>       3 x = 0x1.23456789p+0, y = -nan, d = -nan
> >>>>>>>       6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> >>>>>>>      33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> >>>>>>>
> >>>>>>> The problem also occurs if compiled without -O2, but less often. No
> >>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
> >>>>>>> double' is affected.
> >>>>>>
> >>>>>> Thanks for the report. I looked into this problem and might find the
> >>>>>> cause. It seems due to a bug of scripts/gendef. It generates signal
> >>>>>> handler caller (sigfe.s) which stores/restores the registers.
> >>>>>>
> >>>>>> In sigdelayed, control word is stored/restored by fnstcw/fldcw
> >>>>>> instruction, however, fninit instruction destroys some status
> >>>>>> registers in FPU (x87).
> >>>>>>
> >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and
> >>>>>> fninit. However, I'm not familiar with x87 instructions, so I may
> >>>>>> overlook something.
> >>>>>>
> >>>>>> Could anyone expert of x87 instructions and sigfe stuff give some
> >>>>>> comments?
> >>>>>
> >>>>> AIUI x87 FP handling is outdated and mainly unused on current systems,
> >>>>> as current systems do more and use more than the legacy x87
> >>>>> instructions and stack.
> >>>>>
> >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for
> >>>>> more modern approaches.
> >>>>>
> >>>>> You would have to look into the AMD/Intel/IEEE docs for lower level
> >>>>> details.
> >>>>
> >>>> This is basically what ISTR:
> >>>>
> >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> >>>>
> >>>> where legacy x87 and MMX registers are not used or preserved on
> >>>> x86_64/amd64, as SSE... instructions and XMM registers are used.
> >>>
> >>> Thanks for the advice. I read through the web pages and related documents
> >>> and made a patch which uses fxsave/fxrstor and xsave/xrstor to
> >>> cygwin-patc...@cygwin.com mailing list.
> >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> >>>
> >>> Is this as you intended?
> >>
> >> That seems to be the preferred approach now, as long as you can correctly
> >> determine adequate space for fxsave and xsave, given the varying feature
> >> sets, register counts, and register sizes of recent processors:
> >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> >
> > Thanks for checking.
> >
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > fxsave uses 512 bytes fixed length memory to save the current
> > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> >
> > The patch allocates 0x238 bytes:
> >   0x200 (512 bytes): fxsave area
> >   0x008 (  8 bytes): for 16-byte alignment
> >   0x010 ( 16 bytes): work area
> >   0x020 ( 32 bytes): reserved for later processing
>
> That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> Please also note that 64 bit operands or REX prefix must be used with
> FXSAVE/FXRSTOR to save expanded state rather than legacy state.

Fixed.

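For illustration, here is a minimal sketch of why restoring only the control word is not enough. It is not the gendef/sigfe.s code; it assumes x86_64 and GCC/Clang inline assembly, and the function names are hypothetical. fninit resets the control, status and tag words, so every x87 register is tagged empty. Popping st(0) afterwards is a stack underflow, and because fninit also masks all exceptions, the FPU stores the negative "indefinite" QNaN, which printf shows as -nan, much like the values in Christian's output. Saving with fxsave64 (the REX.W form) before fninit and restoring with fxrstor64 brings the register contents and tags back.

```c
/* Sketch only: x86_64 with GCC or Clang; function names are hypothetical. */
#include <assert.h>  /* for the usage checks */
#include <math.h>    /* isnan(), signbit() for the usage checks */

#if !defined(__x86_64__) || !defined(__GNUC__)
#error "this sketch needs x86_64 and GCC-style inline assembly"
#endif

/* Old-style sequence: nothing but the control word would survive fninit.
   noinline keeps the x87 stack empty on entry (per the ABI), so no
   compiler-owned x87 value is live across the asm. */
__attribute__((noinline))
static long double reload_after_fninit(long double x)
{
    long double out;
    __asm__ volatile(
        "fldt   %1\n\t"   /* push x onto the x87 stack */
        "fninit\n\t"      /* reset CW/SW/TW: all registers tagged empty */
        "fstpt  %0\n\t"   /* pop empty st(0): masked underflow stores -nan */
        : "=m"(out)
        : "m"(x));
    return out;
}

/* Same, but the whole x87/SSE state is saved and restored around fninit. */
__attribute__((noinline))
static long double reload_with_fxsave(long double x)
{
    long double out;
    struct { _Alignas(16) unsigned char b[512]; } area;  /* FXSAVE image */
    __asm__ volatile(
        "fldt      %2\n\t"  /* push x */
        "fxsave64  %1\n\t"  /* REX.W form: 64-bit FPU IP/DP layout */
        "fninit\n\t"        /* wipe the live x87 state */
        "fxrstor64 %1\n\t"  /* restore registers, tags, CW/SW, MXCSR */
        "fstpt     %0\n\t"  /* pop x back out, intact */
        : "=m"(out), "=m"(area)
        : "m"(x));
    return out;
}
```

With x = 0x1.23456789p+0L, reload_after_fninit(x) is expected to return a negative NaN, while reload_with_fxsave(x) should return x unchanged.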
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > cpuid instruction with eax=0dh and ecx=00h returns the maximum
> > size required by xsave in ebx. So the patch allocates:
> >   ebx + 0x048 bytes.
> >   0x018 ( 24 bytes): for 64-byte alignment
> >   0x010 ( 16 bytes): work area
> >   0x020 ( 32 bytes): reserved for later processing
>
> That is for features currently enabled in XCR0 user state, not all the values
> of all possible registers, for all possible features, in ecx, which are
> supported, may be enabled, and in use.
> You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and virtual
> features may require more.

Do you mean we should use the ecx value rather than the ebx value returned by
cpuid (eax=0dh, ecx=0)? I did not understand the difference between the ebx
and ecx values returned by cpuid.

Fixed.

> It may be conservative, but I would suggest allocating the space in ecx as
> documented, just in case of future changes, and that can be reduced to 512 if
> only fxsave is supported.
> I suggest you should check for fxsave in cpuid 1:0 edx:24, fall back to
> fnsave/frstor if not, and keep everything aligned to 64 bytes for safety.

According to my survey, all x86_64 CPUs, Intel and AMD alike, have
fxsave/fxrstor. So we do not need to check bit 24, do we?

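The sizing discussed above comes down to simple arithmetic. Below is a sketch, with hypothetical names, that reproduces the two layouts from this thread: 0x200 + 0x008 + 0x010 + 0x020 = 0x238 bytes for the fxsave-only case, and the xsave size plus 0x018 + 0x010 + 0x020 = 0x048 bytes otherwise, taking the xsave size from ECX of cpuid (eax=0dh, ecx=0) as Brian suggests. The slot sizes are copied from the thread; whether 0x008 and 0x018 bytes of slack are enough depends on the stack alignment at that point in sigfe.s, which this sketch does not model.

```c
/* Sketch only: constants mirror the layout quoted in this thread. */
#include <assert.h>
#include <stdint.h>

#define FXSAVE_AREA        0x200u  /* fixed 512-byte FXSAVE image */
#define FXSAVE_ALIGN_SLACK 0x008u  /* "for 16-byte alignment" */
#define XSAVE_ALIGN_SLACK  0x018u  /* "for 64-byte alignment" */
#define WORK_AREA          0x010u  /* "work area" */
#define RESERVED_AREA      0x020u  /* "reserved for later processing" */

/* Stack bytes to reserve. xsave_max_size is assumed to be ECX of
   cpuid(eax=0dh, ecx=0), i.e. the size for all supported features. */
static uint32_t save_frame_size(int have_xsave, uint32_t xsave_max_size)
{
    if (have_xsave)
        return xsave_max_size + XSAVE_ALIGN_SLACK + WORK_AREA + RESERVED_AREA;
    return FXSAVE_AREA + FXSAVE_ALIGN_SLACK + WORK_AREA + RESERVED_AREA;
}

/* Round an address down to a power-of-two boundary, the C equivalent of
   "and $-64, %rsp" on a downward-growing stack. */
static uintptr_t align_down(uintptr_t p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return p & ~(align - 1);
}
```

With the 960-byte (0x3c0) value in Brian's cpuid listing, the xsave path would reserve 0x3c0 + 0x48 = 0x408 bytes.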
> For my AMD A10-9700 /proc/cpuinfo shows:
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
> rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid
> aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes
> *xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
> misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm
> perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep
> bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
> decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
>
> and /usr/bin/cpuid (package cpuid) shows (see my added !):
>
> ...
>    feature information (1/edx):
>       x87 FPU on chip = true
>       VME: virtual-8086 mode enhancement = true
>       DE: debugging extensions = true
>       PSE: page size extensions = true
>       TSC: time stamp counter = true
>       RDMSR and WRMSR support = true
>       PAE: physical address extensions = true
>       MCE: machine check exception = true
>       CMPXCHG8B inst. = true
>       APIC on chip = true
>       SYSENTER and SYSEXIT = true
>       MTRR: memory type range registers = true
>       PTE global bit = true
>       MCA: machine check architecture = true
>       CMOV: conditional move/compare instr = true
>       PAT: page attribute table = true
>       PSE-36: page size extension = true
>       PSN: processor serial number = false
>       CLFLUSH instruction = true
>       DS: debug store = false
>       ACPI: thermal monitor and clock ctrl = false
>       MMX Technology = true
> !     FXSAVE/FXRSTOR = true
>       SSE extensions = true
>       SSE2 extensions = true
>       SS: self snoop = false
>       hyper-threading / multi-core supported = true
>       TM: therm. monitor = false
>       IA64 = false
>       PBE: pending break event = false
>    feature information (1/ecx):
>       PNI/SSE3: Prescott New Instructions = true
>       PCLMULDQ instruction = true
>       DTES64: 64-bit debug store = false
>       MONITOR/MWAIT = true
>       CPL-qualified debug store = false
>       VMX: virtual machine extensions = false
>       SMX: safer mode extensions = false
>       Enhanced Intel SpeedStep Technology = false
>       TM2: thermal monitor 2 = false
>       SSSE3 extensions = true
>       context ID: adaptive or shared L1 data = false
>       SDBG: IA32_DEBUG_INTERFACE = false
>       FMA instruction = true
>       CMPXCHG16B instruction = true
>       xTPR disable = false
>       PDCM: perfmon and debug = false
>       PCID: process context identifiers = false
>       DCA: direct cache access = false
>       SSE4.1 extensions = true
>       SSE4.2 extensions = true
>       x2APIC: extended xAPIC support = false
>       MOVBE instruction = true
>       POPCNT instruction = true
>       time stamp counter deadline = false
>       AES instruction = true
>       XSAVE/XSTOR states = true
> !     OS-enabled XSAVE/XSTOR = true
>       AVX: advanced vector extensions = true
>       F16C half-precision convert instruction = true
>       RDRAND instruction = true
>       hypervisor guest status = false
> ...
>    XSAVE features (0xd/0):
>       XCR0 valid bit field mask = 0x4000000000000007
>          x87 state = true
>          SSE state = true
>          AVX state = true
>          MPX BNDREGS = false
>          MPX BNDCSR = false
>          AVX-512 opmask = false
>          AVX-512 ZMM_Hi256 = false
>          AVX-512 Hi16_ZMM = false
>          PKRU state = false
>          XTILECFG state = false
>          XTILEDATA state = false
>       bytes required by fields in XCR0 = 0x00000340 (832)

Is this ebx,

> !     bytes required by XSAVE/XRSTOR area = 0x000003c0 (960)

and is this ecx, from cpuid (0d:0)?

I had checked some of my environments, and in all of them ebx and ecx had
the same value. So I thought either could be used...

Please check the v2 patch.

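One way to compare the two values directly is to read them with the <cpuid.h> header shipped with GCC and Clang. A sketch follows, assuming an x86_64 target; the struct and function names are hypothetical. Per the Intel SDM description of CPUID leaf 0Dh, sub-leaf 0, EBX is the size needed for the features currently enabled in XCR0 and ECX the size needed if every supported feature were enabled, so ECX >= EBX, and the two coincide whenever everything supported is also enabled, which may be why they matched on the machines checked. In the A10-9700 listing, the valid mask 0x4000000000000007 has bit 62 set in addition to x87/SSE/AVX; the 128-byte difference between 960 and 832 is presumably that component (AMD Lightweight Profiling) being supported but not enabled. The sketch also reads the FXSR bit (CPUID.1:EDX bit 24) and OSXSAVE (CPUID.1:ECX bit 27).

```c
/* Sketch only: x86_64 with GCC or Clang <cpuid.h>; names are hypothetical. */
#include <assert.h>  /* for the usage checks */
#include <cpuid.h>
#include <stdint.h>

#if !defined(__x86_64__)
#error "CPUID sketch for x86_64 only"
#endif

struct xsave_info {
    int      fxsr;          /* CPUID.1:EDX bit 24: FXSAVE/FXRSTOR available */
    int      osxsave;       /* CPUID.1:ECX bit 27: OS has enabled XSAVE */
    uint32_t enabled_size;  /* CPUID.(0Dh,0):EBX: size for features enabled in XCR0 */
    uint32_t max_size;      /* CPUID.(0Dh,0):ECX: size for all supported features */
};

static struct xsave_info query_xsave_info(void)
{
    struct xsave_info s = {0, 0, 0, 0};
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        s.fxsr = (edx >> 24) & 1;
        s.osxsave = (ecx >> 27) & 1;
    }
    /* Only query leaf 0Dh when the OS has enabled XSAVE, since
       xsave/xrstor cannot be used otherwise. */
    if (s.osxsave && __get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx)) {
        s.enabled_size = ebx;
        s.max_size = ecx;
    }
    return s;
}
```

If the cpuid tool's two "bytes required" lines map to EBX and ECX as assumed, this would report enabled_size == 832 and max_size == 960 on the A10-9700.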
-- 
Takashi Yano <takashi.y...@nifty.ne.jp>