Hi Brian,

On Mon, 14 Oct 2024 15:38:22 +0900
Takashi Yano wrote:
> On Mon, 14 Oct 2024 14:59:40 +0900
> Takashi Yano wrote:
> > On Mon, 14 Oct 2024 14:29:58 +0900
> > Takashi Yano wrote:
> > > Hi Brian,
> > > 
> > > Thanks for the detailed explanation.
> > > 
> > > On Sun, 13 Oct 2024 16:19:31 -0600
> > > Brian Inglis wrote:
> > > > On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > > > > Hi Brian
> > > > > 
> > > > > On Sun, 13 Oct 2024 10:41:58 -0600
> > > > > Brian Inglis wrote:
> > > > >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> > > > >>> Hi Brian,
> > > > >>>
> > > > >>> On Tue, 8 Oct 2024 10:37:14 -0600
> > > > >>> Brian Inglis wrote:
> > > > >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> > > > >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> > > > >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> > > > >>>>>> Christian Franke wrote:
> > > > >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> > > > >>>>>>>
> > > > >>>>>>> $ ./sigtest > out.txt
> > > > >>>>>>> (press ^C 42x :-)
> > > > >>>>>>>
> > > > >>>>>>> $ sort out.txt | uniq -c
> > > > >>>>>>>           3 x = 0x1.23456789p+0, y = -nan, d = -nan
> > > > >>>>>>>           6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> > > > >>>>>>>          33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> > > > >>>>>>>
> > > > >>>>>>> The problem also occurs if compiled without -O2, but less 
> > > > >>>>>>> often. No
> > > > >>>>>>> problem occurs if compiled with -DWORKS which suggests that 
> > > > >>>>>>> only 'long
> > > > >>>>>>> double' is affected.
> > > > >>>>>>
> > > > >>>>>> Thanks for the report. I looked into this problem and may have
> > > > >>>>>> found the cause. It seems to be a bug in scripts/gendef, which
> > > > >>>>>> generates the signal handler caller (sigfe.s) that stores/restores
> > > > >>>>>> the registers.
> > > > >>>>>>
> > > > >>>>>> In sigdelayed, control word is stored/restored by fnstcw/fldcw 
> > > > >>>>>> instruction,
> > > > >>>>>> however, fninit instruction destroys some status registers in 
> > > > >>>>>> FPU (x87).
> > > > >>>>>>
> > > > >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and 
> > > > >>>>>> fninit.
> > > > >>>>>> However, I'm not familiar with x87 instructions, so I may 
> > > > >>>>>> overlook
> > > > >>>>>> something.
> > > > >>>>>>
> > > > >>>>>> Could anyone with expertise in x87 instructions and the sigfe code
> > > > >>>>>> comment on this?
> > > > >>>>>
> > > > >>>>> AIUI x87 FP handling is outdated and mainly unused on current 
> > > > >>>>> systems, as
> > > > >>>>> current systems do more and use more than the legacy x87 
> > > > >>>>> instructions and stack.
> > > > >>>>>
> > > > >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs 
> > > > >>>>> for more
> > > > >>>>> modern approaches.
> > > > >>>>>
> > > > >>>>> You would have to look into the AMD/Intel/IEEE docs for lower 
> > > > >>>>> level details.
> > > > >>>>
> > > > >>>> This is basically what ISTR:
> > > > >>>>
> > > > >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> > > > >>>>
> > > > >>>> where legacy x87 and MMX registers are not used or preserved on 
> > > > >>>> x86_64/amd64, as
> > > > >>>> SSE... instructions and XMM registers are used.
> > > > >>>
> > > > >>> Thanks for the advice. I read through the web pages and related
> > > > >>> documents and sent a patch which uses fxsave/fxrstor and
> > > > >>> xsave/xrstor to the cygwin-patc...@cygwin.com mailing list.
> > > > >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> > > > >>>
> > > > >>> Is this as you intended?
> > > > >>
> > > > >> That seems to be the preferred approach now, as long as you can 
> > > > >> correctly
> > > > >> determine adequate space for fxsave and xsave, given the varying 
> > > > >> feature sets,
> > > > >> register counts, and register sizes of recent processors:
> > > > >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM 
> > > > >> registers.
> > > > > 
> > > > > Thanks for checking.
> > > > > 
> > > > > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > > > > fxsave uses 512 bytes fixed length memory to save the current
> > > > > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> > > > > 
> > > > > The patch allocates 0x238 bytes:
> > > > >   0x200 (512 bytes): fxsave area
> > > > >   0x008 (  8 bytes): for 16-byte alignment
> > > > >   0x010 ( 16 bytes): work area
> > > > >   0x020 ( 32 bytes): reserved for later processing
> > > > 
> > > > That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> > > > Please also note that 64 bit operands or REX prefix must be used with 
> > > > FXSAVE/FXRSTOR to save expanded state rather than legacy state.
> > > 
> > > Fixed.
> > > 
> > > > > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > > > > cpuid instruction with eax=0dh and ecx=00h returns the maximum
> > > > > size required by xsave in ebx. So the patch allocates:
> > > > > ebx + 0x048 bytes.
> > > > >   0x018 ( 24 bytes): for 64-byte alignment
> > > > >   0x010 ( 16 bytes): work area
> > > > >   0x020 ( 32 bytes): reserved for later processing
> > > > 
> > > > That is for features currently enabled in XCR0 user state, not all the 
> > > > values of 
> > > > all possible registers, for all possible features, in ecx, which are 
> > > > supported, 
> > > > may be enabled, and in use.
> > > > You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and 
> > > > virtual 
> > > > features may require more.
> > > 
> > > Do you mean we should use the ecx value rather than the ebx value
> > > returned by cpuid (eax=0dh,ecx=0)? I did not understand the difference
> > > between the values of ebx and ecx returned by cpuid.
> > > 
> > > Fixed.
> > 
> > On second thought, it is not necessary to use the ecx value
> > because the patch uses the EDX:EAX value of cpuid(0d,0) for xsave,
> > is it? This means that only features enabled in XCR0 are saved.
> > The features not enabled in XCR0 cannot be used in user mode, so
> > we do not need to store the states for them.
> 
> No, I was wrong. The EDX:EAX value of cpuid(0d,0) means the features
> that CAN BE enabled in XCR0.
> 
> Please see v3 patch.
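
For reference, here is a minimal sketch (not taken from the patch) of how
the three cpuid(0dh, 0) outputs discussed above can be read from C. It
assumes GCC or Clang and their <cpuid.h> helper __get_cpuid_count(); the
struct and function names are made up for illustration only.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical container for the outputs of cpuid(eax=0dh, ecx=0). */
struct xsave_info {
    uint64_t supported_mask; /* EDX:EAX: XCR0 bits that CAN BE enabled */
    uint32_t enabled_size;   /* EBX: bytes needed for features enabled in XCR0 */
    uint32_t max_size;       /* ECX: bytes needed for all supported features */
};

/* Returns 1 and fills *info if leaf 0dh is available, otherwise 0. */
static int query_xsave_info(struct xsave_info *info)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() checks the maximum supported leaf first. */
    if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    info->supported_mask = ((uint64_t)edx << 32) | eax;
    info->enabled_size = ebx;
    info->max_size = ecx;
    return 1;
#else
    (void)info; /* not an x86 target: nothing to query */
    return 0;
#endif
}

/* Prints the three values so that ebx and ecx can be compared. */
static void print_xsave_info(void)
{
    struct xsave_info info;

    if (!query_xsave_info(&info)) {
        puts("cpuid leaf 0dh is not available");
        return;
    }
    printf("XCR0 bits supported (edx:eax): 0x%" PRIx64 "\n",
           info.supported_mask);
    printf("size for enabled features (ebx): %" PRIu32 " bytes\n",
           info.enabled_size);
    printf("size for all supported features (ecx): %" PRIu32 " bytes\n",
           info.max_size);
}
```

Calling print_xsave_info() from main() shows whether ebx and ecx differ
on a given machine. They are equal when every supported feature is
already enabled in XCR0.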

Any suggestions?

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
