Re: [fpc-devel] using sse2 packed doubles

2006-10-12 Thread Vincent Snijders
Florian Klaempfl wrote: Daniël Mantione wrote: To get a large speedup, I think you should instead of making pairs of doubles, do the pixels in parallel. I.e. in this benchmark, a row is 3000 pixels wide, so, make an array of 3000 doubles, and do the operation with arrays. With proper compi

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Daniël Mantione
On Sun, 8 Oct 2006, Vincent Snijders wrote: > > You are right. How about doing it in blocks of 8x8 pixels? The high > > iteration loops are concentrated close to the borders of > > the set, so for most blocks the iteration can then be ended early. > > For starters I was thinking about blocks

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Vincent Snijders
Daniël Mantione wrote: On Sun, 8 Oct 2006, Vincent Snijders wrote: Daniël Mantione wrote: On Sat, 7 Oct 2006, Florian Klaempfl wrote: Vincent Snijders wrote: I started to add vector pascal like support, currently only i386/x86_64 are supported (no generic support). The whole (cur

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Daniël Mantione
On Sun, 8 Oct 2006, Vincent Snijders wrote: > Daniël Mantione wrote: > > > > On Sat, 7 Oct 2006, Florian Klaempfl wrote: > > > > > > > Vincent Snijders wrote: > > > > > > I started to add vector pascal like support, currently only > > > i386/x86_64 are > > > supported (no generic suppo

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Marco van de Voort
> The 'problem' in this benchmark is that the number of iterations of the > inner loop isn't fixed, but can vary between 1 and 50. If you pair two > doubles, the chance you can break the loop for all elements of the > vector before iteration 50 is bigger than when you combine 3000 elements. The
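
To make the trade-off in the message above concrete, here is a small scalar model in Python. It is not SSE2 code and is not taken from the thread. It assumes the usual Mandelbrot escape test |z| > 2 and the 50-iteration cap mentioned in the message; the function names `escape_iterations` and `lockstep_cost` are hypothetical.

```python
def escape_iterations(cr, ci, max_iter=50):
    """Scalar Mandelbrot loop: number of iterations run for one pixel.

    Returns the iteration at which |z| first exceeds 2, or max_iter
    if the point never escapes within the cap.
    """
    zr = zi = 0.0
    for i in range(1, max_iter + 1):
        # z = z*z + c, with both parts computed from the old z
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        if zr * zr + zi * zi > 4.0:
            return i
    return max_iter


def lockstep_cost(points, width, max_iter=50):
    """Model `width` pixels advancing in lockstep, as vector lanes would.

    A group can only leave the loop once every lane has escaped, so a
    group costs as many steps as its slowest lane.
    Returns (steps, wasted), where `wasted` counts lane-steps spent on
    pixels that had already escaped (padding lanes in a short final
    group are counted as wasted too).
    """
    counts = [escape_iterations(cr, ci, max_iter) for cr, ci in points]
    groups = [counts[i:i + width] for i in range(0, len(counts), width)]
    steps = sum(max(group) for group in groups)
    wasted = steps * width - sum(counts)
    return steps, wasted


if __name__ == "__main__":
    # Three points that escape immediately and one that never escapes.
    row = [(3.0, 0.0), (0.0, 0.0), (3.0, 0.0), (3.0, 0.0)]
    for width in (1, 2, 4):
        print(width, lockstep_cost(row, width))
```

With these four sample points, a width of 1 takes 53 loop steps and wastes none, while a width of 2 takes 51 steps but spends 49 lane-steps on a pixel that has already escaped. Wider groups need fewer steps in total, but a single slow pixel keeps every other lane in its group busy, which is the effect the message describes.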

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Vincent Snijders
Daniël Mantione wrote: On Sat, 7 Oct 2006, Florian Klaempfl wrote: Vincent Snijders wrote: I started to add vector pascal like support, currently only i386/x86_64 are supported (no generic support). The whole (currently implemented) functionality is demonstrated by the following example.

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Vincent Snijders
Florian Klaempfl wrote: Vincent Snijders wrote: Daniël Mantione wrote: On Fri, 6 Oct 2006, Micha Nelissen wrote: Vincent Snijders wrote: You could also start an assembler implementation of the matrix unit. I suppose using it is allowed, and a Tvector2_double looks a lot like such a

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Florian Klaempfl
Daniël Mantione wrote: On Sat, 7 Oct 2006, Florian Klaempfl wrote: Vincent Snijders wrote: Daniël Mantione wrote: On Fri, 6 Oct 2006, Micha Nelissen wrote: Vincent Snijders wrote: You could also start an assembler implementation of the matrix unit. I suppose using it is allowed,

Re: [fpc-devel] using sse2 packed doubles

2006-10-08 Thread Daniël Mantione
On Sat, 7 Oct 2006, Florian Klaempfl wrote: > Vincent Snijders wrote: > > Daniël Mantione wrote: > > > > > > On Fri, 6 Oct 2006, Micha Nelissen wrote: > > > > > > > > > > Vincent Snijders wrote: > > > > > > > You could also start an assembler implementation of the matrix unit. > > > I

Re: [fpc-devel] using sse2 packed doubles

2006-10-07 Thread Florian Klaempfl
Vincent Snijders wrote: Daniël Mantione wrote: On Fri, 6 Oct 2006, Micha Nelissen wrote: Vincent Snijders wrote: You could also start an assembler implementation of the matrix unit. I suppose using it is allowed, and a Tvector2_double looks a lot like such a double2. Unless the comp

Re: [fpc-devel] using sse2 packed doubles

2006-10-07 Thread Vincent Snijders
Daniël Mantione wrote: On Fri, 6 Oct 2006, Micha Nelissen wrote: Vincent Snijders wrote: You could also start an assembler implementation of the matrix unit. I suppose using it is allowed, and a Tvector2_double looks a lot like such a double2. Unless the compiler somehow helps, inlinin

Re: [fpc-devel] using sse2 packed doubles

2006-10-06 Thread Vincent Snijders
Micha Nelissen wrote: Vincent Snijders wrote: tracker item for this? Compiler hacking is out of my league. It's also "just" pascal, you know ;-). Sure, but modifying the compiler to reach my goal is an extra indirection, which makes it more difficult for me. There is a difference betwe

Re: [fpc-devel] using sse2 packed doubles

2006-10-06 Thread Daniël Mantione
On Fri, 6 Oct 2006, Micha Nelissen wrote: > Vincent Snijders wrote: > > So if fpc hypothetically would support a type double2 = array[0..1] of > > double and operators like + - * on it would use the packed double > > instructions of sse2, then it could be used in the shootout benchmarks. > >

Re: [fpc-devel] using sse2 packed doubles

2006-10-06 Thread Micha Nelissen
Vincent Snijders wrote: > So if fpc hypothetically would support a type double2 = array[0..1] of > double and operators like + - * on it would use the packed double > instructions of sse2, then it could be used in the shootout benchmarks. Isn't it better to generalize this support to vectors? May

Re: [fpc-devel] using sse2 packed doubles

2006-10-06 Thread Vincent Snijders
Florian Klaempfl wrote: Vincent Snijders wrote: Hi, I wondered if it is possible to add support for using sse2 packed doubles in fpc. I tried to create a sse2doubles type and define an operator + on it. Eventually I want to move this type and all operators on it to a separate sse2 unit,

Re: [fpc-devel] using sse2 packed doubles

2006-10-06 Thread Micha Nelissen
Vincent Snijders wrote: > tracker item for this? Compiler hacking is out of my league. It's also "just" pascal, you know ;-). Micha

Re: [fpc-devel] using sse2 packed doubles

2006-10-06 Thread Florian Klaempfl
Vincent Snijders wrote: Hi, I wondered if it is possible to add support for using sse2 packed doubles in fpc. I tried to create a sse2doubles type and define an operator + on it. Eventually I want to move this type and all operators on it to a separate sse2 unit, so that the user of this t