Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Michalis Kamburelis
Marco van de Voort wrote:
> In our previous episode, Jonas Maebe said:
>>> Or do we have to allocate more bytes for blocks that are a multiple of 8?
>> FPC's default memory manager even guarantees 16 byte alignment (for vectors).
>
> So a possible solution is to allocate 16-sizeof(ptruint) bytes m

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Jonas Maebe
On 03 Apr 2010, at 14:09, Micha Nelissen wrote:
> Do C memory managers guarantee any alignment anyway? Not for SSE (16 bytes)
> I'm sure, but 8 bytes I don't know.
From Linux' malloc man page:
For calloc() and malloc(), the value returned is a pointer to the allocated memory, which is suitab

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Marco van de Voort
In our previous episode, Jonas Maebe said:
> > Or do we have to allocate more bytes for blocks that are a multiple of 8?
>
> FPC's default memory manager even guarantees 16 byte alignment (for vectors).
So a possible solution is to allocate 16-sizeof(ptruint) bytes more? for 32-bit that would me

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Micha Nelissen
C Western wrote:
> Inspecting the cmem unit indicates the issue is the extra bytes allocated for the count - is this really needed? Or do we have to allocate more bytes for blocks that are a multiple of 8?
Do C memory managers guarantee any alignment anyway? Not for SSE (16 bytes) I'm sure, but

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Jonas Maebe
On 03 Apr 2010, at 13:00, C Western wrote:
> I notice that the cmem unit does not align memory in the same way as the
> default unit - removing the cmem unit makes a factor of two difference in the
> speed of some double precision matrix code. (My system is i386). Inspecting
> the cmem unit in

[fpc-devel] cmem not aligning memory

2010-04-03 Thread C Western
I notice that the cmem unit does not align memory in the same way as the default unit - removing the cmem unit makes a factor of two difference in the speed of some double precision matrix code. (My system is i386). Inspecting the cmem unit indicates the issue is the extra bytes allocated for t
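
For readers following the thread, the alignment effect under discussion can be illustrated with a small C sketch. It is not the cmem unit's code: the helper names (prefixed_alloc, prefixed_size, prefixed_free, misalignment) and the two header sizes are hypothetical, chosen only to show what happens when a size count is stored in front of a block returned by malloc(). Assuming a 32-bit target whose malloc() returns 8-byte-aligned blocks, a 4-byte prefix leaves the user pointer at offset 4 modulo 8, so doubles stored there are misaligned. Padding the prefix to 16 bytes, roughly what the "16-sizeof(ptruint) bytes more" suggestion amounts to, keeps whatever alignment malloc() provided.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header sizes, for illustration only. */
#define NAIVE_HEADER  sizeof(uintptr_t) /* just the size count           */
#define PADDED_HEADER 16                /* size count padded to 16 bytes */

/* Allocate `size` usable bytes, preceded by `header` bytes whose first
 * sizeof(size_t) bytes record the size. `header` must be >= sizeof(size_t). */
static void *prefixed_alloc(size_t size, size_t header)
{
    unsigned char *raw = malloc(header + size);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);
    return raw + header; /* the pointer handed to the caller */
}

/* Read back the size stored in front of a block from prefixed_alloc(). */
static size_t prefixed_size(const void *p, size_t header)
{
    size_t size;
    memcpy(&size, (const unsigned char *)p - header, sizeof size);
    return size;
}

/* Release a block from prefixed_alloc(); `header` must match the allocation. */
static void prefixed_free(void *p, size_t header)
{
    if (p != NULL)
        free((unsigned char *)p - header);
}

/* Distance of `p` past the previous multiple of `alignment` (0 = aligned). */
static size_t misalignment(const void *p, size_t alignment)
{
    return (size_t)((uintptr_t)p % alignment);
}
```

Calling misalignment(p, 8) on a pointer obtained with NAIVE_HEADER on a 32-bit build would be expected to report 4 under the assumption above, while PADDED_HEADER should report 0. On a 64-bit build the naive header is already 8 bytes, so the difference only becomes visible when checking 16-byte alignment.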