> Well, it's of course the poor man's solution compared to providing our own
> ifunc-enabled libm ...

One benefit here would be that we could have our own calling convention for
this. So for floor/ceil we may just declare registers to be preserved (as they
are on all modern AVX-enabled CPUs), which would make the code size/speed
tradeoffs more interesting.

Honza
>
> I would expect that for SSE 4.1 the PLT and call overhead is measurable
> and an inline run-time check would be quite a bit more efficient. As you
> have a testcase, would it be possible to measure that by hand-editing the
> assembly (or the benchmark source in case it is not Fortran...)?
>
> The whole point of having the inline expansions was to have inline expansions,
> avoiding the need to spill the whole set of SSE regs around such calls.
>
> > I was just surprised by the glibc check, what would you consider a
> > recent-enough glibc? Or is the check mainly necessary to ensure we
> > are indeed using glibc and not some other libc (and thus something
> > like we do for TARGET_LIBC_PROVIDES_SSP would do)?
> >
> > I will try to come up with a patch.
>
> I don't think this is the appropriate solution. Try disabling the inline
> expansion and run SPEC (without -march=sse4.1 of course).
>
> I realize that doing the inline-expansion with a runtime check
> is going to be quite tricky and the GCC local IFUNC trick doesn't
> solve the inlining (but we might be able to avoid spilling with some
> IPA RA help and/or attributes?).
>
> Richard.
>
> > Thanks,
> >
> > Martin
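
For reference, the "inline run-time check" variant discussed above could be
approximated at the source level roughly as follows. This is only a sketch to
make the comparison concrete, not what the compiler would emit: the names
floor_sse41, floor_generic and floor_dispatch are made up for illustration,
and floor_generic is a simplified stand-in for the libm call (it is only
valid for values that fit in a long long).

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* Built for SSE4.1 regardless of the global -m flags, so roundsd is usable. */
__attribute__((target("sse4.1")))
static double floor_sse41(double x)
{
    __m128d v = _mm_set_sd(x);
    v = _mm_floor_sd(v, v);          /* roundsd, rounding toward -inf */
    return _mm_cvtsd_f64(v);
}
#define HAVE_SSE41_PATH 1
#else
#define HAVE_SSE41_PATH 0
#endif

/* Hypothetical fallback standing in for the out-of-line libm floor().
   Assumption: |x| is small enough to fit in a long long. */
static double floor_generic(double x)
{
    double t = (double)(long long)x;  /* truncates toward zero */
    return (t > x) ? t - 1.0 : t;     /* adjust for negative non-integers */
}

/* Run-time check at the use site: take the SSE4.1 path when the CPU
   supports it, otherwise fall back to the generic version. */
double floor_dispatch(double x)
{
#if HAVE_SSE41_PATH
    if (__builtin_cpu_supports("sse4.1"))
        return floor_sse41(x);
#endif
    return floor_generic(x);
}
```

Hand-editing a benchmark to call something like floor_dispatch in the hot
loop would be one way to compare the cost of the check against the PLT call,
although it says nothing about the register spilling around the call, which
still depends on what the compiler decides to inline.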