For those interested, on a Nokia N900 running hardfp-meego-runfast-by-default:

[root@localhost fpumode]# ./fpumodetest
* run scalar_load<float> testing
=> 28.499384 seconds
* run vector_load<float> testing
=> 14.4294176433 seconds
* run scalar_load<double> testing
=> 28.508026 seconds
* run vector_load<double> testing
=> 21.349396 seconds
[root@localhost fpumode]# LD_PRELOAD=$PWD/fpumode-ieee.so ./fpumodetest
* fpu mode build Oct 29 2010 14:38:19
* current fpu mode is 0x03000000 [RUN FAST]
* changing mode to 0x00001f00 [IEEE]
* run scalar_load<float> testing
=> 28.511074 seconds
* run vector_load<float> testing
=> 21.4294921398 seconds
* run scalar_load<double> testing
=> 29.4294486949 seconds
* run vector_load<double> testing
=> 21.333160 seconds
[root@localhost fpumode]# LD_PRELOAD=$PWD/fpumode-fast.so ./fpumodetest
* fpu mode build Oct 29 2010 14:38:19
* current fpu mode is 0x03000000 [RUN FAST]
* changing mode to 0x03000000 [RUN FAST]
* run scalar_load<float> testing
=> 29.4294486369 seconds
* run vector_load<float> testing
=> 13.230743 seconds
* run scalar_load<double> testing
=> 29.4294424724 seconds
* run vector_load<double> testing
=> 21.397857 seconds
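The fpumode-ieee.so / fpumode-fast.so libraries from Leonid's package are not shown in the thread. What follows is only a minimal sketch of how such an LD_PRELOAD shim could work. It assumes an ARMv7 VFP target, GCC inline assembly, and the FPSCR bit layout from the ARM Architecture Reference Manual. In that layout, RunFast means flush-to-zero (FZ, bit 24) and default NaN (DN, bit 25) are set and all exception trap enables are cleared, which is why 0x03000000 is reported as RUN FAST. The file name fpumode_sketch.c and all function names are hypothetical.

```c
/* fpumode_sketch.c: hypothetical LD_PRELOAD shim that switches the VFP
 * unit to RunFast before main() runs. A sketch, not Leonid's package. */
#include <stdint.h>
#include <stdio.h>

/* FPSCR bit positions (ARMv7 VFP). */
#define FPSCR_FZ (1u << 24)            /* flush-to-zero */
#define FPSCR_DN (1u << 25)            /* default NaN */
/* Trap enables: IOE(8) DZE(9) OFE(10) UFE(11) IXE(12) IDE(15).
 * The "IEEE" value 0x00001f00 in the log sets IOE through IXE. */
#define FPSCR_TRAP_ENABLES 0x00009f00u

/* RunFast = FZ and DN set, every exception trap disabled. */
static int fpscr_is_runfast(uint32_t fpscr)
{
    const uint32_t fast = FPSCR_FZ | FPSCR_DN;
    return (fpscr & fast) == fast && (fpscr & FPSCR_TRAP_ENABLES) == 0;
}

#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
static uint32_t read_fpscr(void)
{
    uint32_t v;
    __asm__ volatile("vmrs %0, fpscr" : "=r"(v));
    return v;
}

static void write_fpscr(uint32_t v)
{
    __asm__ volatile("vmsr fpscr, %0" : : "r"(v));
}

/* Runs when the library is loaded (e.g. via LD_PRELOAD), before main(). */
__attribute__((constructor))
static void fpumode_enable_runfast(void)
{
    uint32_t cur = read_fpscr();
    /* Keep rounding mode and status flags; set FZ|DN, clear trap enables. */
    uint32_t want = (cur & ~FPSCR_TRAP_ENABLES) | FPSCR_FZ | FPSCR_DN;
    fprintf(stderr, "* current fpu mode is 0x%08x [%s]\n", (unsigned)cur,
            fpscr_is_runfast(cur) ? "RUN FAST" : "not RunFast");
    write_fpscr(want);
}
#endif
```

Built on the device with something like `gcc -shared -fPIC -o fpumode-fast.so fpumode_sketch.c`, it would be preloaded the same way as in the session above. On non-ARM hosts the inline-assembly part compiles away and only the FPSCR decoder remains.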

/Carsten

2011/1/14  <leonid.moiseic...@nokia.com>:
> See the attached package. The README contains a usage example. The binaries 
> are compiled for ca8-hardfp, so you can run them without recompilation if the 
> hardware is the same.
>
> With best wishes,
> Leonid
>
>
> -----Original Message-----
> From: carsten.m...@gmail.com [mailto:carsten.m...@gmail.com] On Behalf Of ext 
> Carsten Munk
> Sent: 14 January, 2011 11:21
> To: Moiseichuk Leonid (Nokia-MS/Helsinki)
> Cc: thi...@kde.org; meego-dev@meego.com
> Subject: Re: [MeeGo-dev] ARM RunFast by default in glibc
>
> 2011/1/14  <leonid.moiseic...@nokia.com>:
>> Enabling RunFast mode using -ffast-math is a non-trivial hack. It also requires 
>> updating packages' compilation flags or global options.
>> Patching glibc is much cheaper to implement and safer.
>>
>> In the ideal case the speedup on Cortex-A8 could be around 40% (depending on 
>> vector/matrix size); even non-vector float operations improve by a margin of 
>> more than 10%.
>> BUT all float/double operations could be affected: you may get a 
>> different outcome compared to IEEE mode.
>
> Got any good benchmarks/tools we can run so we can verify this on
> actual MeeGo hardfp?
>>
>> With best wishes,
>> Leonid
>>
>>
>> -----Original Message-----
>> From: meego-dev-boun...@meego.com [mailto:meego-dev-boun...@meego.com] On 
>> Behalf Of ext Thiago Macieira
>> Sent: 12 January, 2011 17:55
>> To: meego-dev@meego.com
>> Subject: Re: [MeeGo-dev] ARM RunFast by default in glibc
>>
>> On Wednesday, 12 January 2011 16:01:31 Carsten Munk wrote:
>>> 2011/1/12 Arjan van de Ven <ar...@linux.intel.com>:
>>> > On 1/12/2011 1:06 AM, Carsten Munk wrote:
>>> >> Hi (ARM toolchain group mostly)
>>> >>
>>> >> Do we have a patch for glibc-2.11-12-g24c0bf7 and/or glibc-2.12.1
>>> >> that enables ARM RunFast[1] mode by default anywhere? Would be good
>>> >> to push it along with hardfp while we're at it and getting things
>>> >> tested through.
>>> >
>>> > can this be turned into something that's passed in via CFLAGS?
>>> > That way apps will not be surprised, and there is an easy way for us
>>> > to toggle it.
>>
>> Right now, it's a context configuration, so there's nothing that will really 
>> work from CFLAGS. Without changing gcc, the only thing we could do is supply 
>> two different crt1.o files: one that puts the FPU in RunFast mode and one 
>> that doesn't.
>>
>> But this will, like I said, apply to all code within a process, so it 
>> doesn't help the library case. Libraries will need to cope with running in 
>> both modes.
>>
>>> > Of course we can have a default in our OBS that you pick, but it
>>> > becomes an easy-to-manage (from a distro perspective) property.
>>>
>>> That would be a better way than patching glibc, I would believe?
>>
>> Not necessarily. To do the right thing, the compiler would need to emit the 
>> code that changes FPSCR before any FP operation, so this means an increase 
>> in code size. I can also bet that there's a pipeline delay in modifying this 
>> register.
>>
>> And there's no such GCC patch.
>>
>>> Wouldn't -ffast-math correspond to this on the x86 side at least?
>>>
>>> Leonid, does this correspond to an auto-setup of RunFast on ARM, when
>>> used there?
>>
>> No, it's different.
>>
>> By the way, I should point out that on Cortex-A8, RunFast only has a 
>> perceptible improvement for float. If you use double, you still have 
>> performance issues.
>>
>> On Cortex-A9, both are fast.
>>
>> --
>> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>>  Senior Product Manager - Nokia, Qt Development Frameworks
>>      PGP/GPG: 0x6EF45358; fingerprint:
>>      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
>> _______________________________________________
>> MeeGo-dev mailing list
>> MeeGo-dev@meego.com
>> http://lists.meego.com/listinfo/meego-dev
>>
>
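Leonid warns that results may differ from IEEE mode, and Thiago notes that libraries must cope with both modes because the setting applies to the whole process. One way for library code to find out which behavior it is getting is a runtime probe. The sketch below is an illustration, not something proposed in the thread: it assumes that RunFast's FZ bit flushes subnormal values to zero, and the function names are hypothetical. Because it is portable C, on x86 it would also notice the FTZ/DAZ bits that GCC sets when a program is linked with -ffast-math.

```c
#include <float.h>

/* Returns 1 if subnormal results are flushed to zero (as with RunFast's
 * FZ bit, or x86 FTZ/DAZ), 0 under IEEE 754 gradual underflow. */
static int fpu_flushes_denormals(void)
{
    volatile float smallest_normal = FLT_MIN;   /* 2^-126 */
    volatile float r = smallest_normal / 4.0f;  /* 2^-128: subnormal */
    return r == 0.0f;
}

/* With gradual underflow, finite x != y implies x - y != 0.
 * Flush-to-zero breaks that; returns 1 if it is broken right now. */
static int distinct_floats_subtract_to_zero(void)
{
    volatile float x = 1.5f * FLT_MIN;  /* normal, exactly representable */
    volatile float y = FLT_MIN;
    volatile float d = x - y;           /* 2^-127: subnormal */
    return x != y && d == 0.0f;
}
```

A library that relies on gradual underflow could call such a probe once at initialization and pick a more careful code path, instead of changing the FPU mode of the host process.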