http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #7 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Rich Felker from comment #6)
> On Sun, Mar 16, 2014 at 11:32:21PM +0000, olegendo at gcc dot gnu.org wrote:
> > If it's OK for a temporary mode switch to clobber other FPSCR bits (such as 
> > in
> > the PR = single mode above), it should also be OK to load the FPSCR value 
> > from
> > a thread local variable inside the thread-control-block:
> > 
> >      mov.l  @(<disp>, gbr),r0  // r0 = address to fpscr value for a
> >                                // particular mode setting
> >      lds.l  @r0+,fpscr         // mode switch
> 
> IMO this is an ugly hack that shouldn't be taken. It has lots of
> complex interactions with other things: signal handlers, the
> ucontext.h functions, fenv, pthread, etc.

Could you please elaborate on the problems you see there?

> that could probably be
> achieved correctly if somebody wanted to spend the effort on it, but
> it would be ugly and SH-specific and honestly we already have a
> shortage of people willing to spend time fixing SH problems without
> introducing even more work.

I failed to mention, that the idea with TLS variables was meant to be a
separate option.  If implemented in the compiler it would need to be turned on
explicitly, similar to the option '-matomic-model=soft-tcb,gbr-offset=<your
offset here>'.  Thus, if a libc/kernel/whatever doesn't want to make use of it,
it won't have to.

> 
> > This would require that any FPSCR setting change is also propagated to
> > the TLS variables.  E.g. setting the rounding mode would have to update
> > FPSCR mode values in all such TLS variables.
> > I guess that it would be useful to be able to select an FPSCR value for at
> > least all combinations of FR and SZ bit settings, in other words having
> > a TLS __fpscr_values array with 4 entries.  However, it would make things
> > such as toggling FPSCR.FR via frchg inefficient due to the required updates
> > of the TLS variables.  Other setting changes such as denorm or rounding
> > mode are probably not so critical.
> 
> If I'm not mistaken, the toggle approach will be efficient without
> this TLS hack once it's implemented, right?

No.  The toggle approach is only efficient on SH4A.  Other SH FPUs don't
implement the fpchg instruction and require sts-lds fpscr sequences, regardless
of toggling or explicitly setting the PR mode. 

> I don't think it makes
> sense to introduce a hack for just a short-term mitigation of the
> performance regression. If you think the short-term fix for this issue
> is too costly, the proper solution is probably to add a -m option to
> turn it back off (using the old __fpscr_values approach).

This was not meant to be a short-term mitigation.  In fact figuring out whether
FPSCR bits can be clobbered during a PR mode switch or not is already not so
simple.  If at all, it would be an additional optimization opportunity after
everything else is in place.

Keeping the 'global __fpscr_values' approach as an option could be useful for
thread model = single, or for bare-metal applications where the rounding mode
etc is never changed from the default and FP status bits are never read back by
application code.

Reply via email to