Re: [fpc-pascal] Memory alignment with FPC
On 10 Oct '12, dhkblas...@zeelandnet.nl wrote:

> One more question: when using packed records, is there anything to say about performance? Are there some tests anywhere that show how the performance is impacted?

I did some performance tests on win32 and it appears that both packed and unpacked objects and records all show exactly the same performance. Writing the individual variables in a record or object to file takes about 5.5 times longer than writing them at once. If someone wants my test app to run it on other platforms, please let me know and I can post the code. I will do more testing later on Mac and linux32. I'm interested in how win64 and linux64 behave in this respect, so if someone has these architectures, please let me know.

This makes me wonder if choosing a proper value for $PACKRECORDS could make my file readable safely on all platforms, only needing to convert the endianness if applicable. This would not force me to do manual padding in my structs. Say I use a value of 16, would that cover all ABIs FPC currently supports?

Jonas: do you have an overview of the alignment on all architectures that FPC supports? Perhaps you could pinpoint where in the compiler this is handled? If appreciated, I could make a patch to include this info in the documentation in the future.

Regards,
Darius
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Memory alignment with FPC
On 11 Oct 2012, at 13:59, dhkblas...@zeelandnet.nl wrote:

> I did some performance tests on win32 and it appears that both packed and unpacked objects and records all show exactly the same performance. Writing the individual variables in a record or object to file takes about 5.5 times longer than writing them at once. If someone wants my test app to run it on other platforms, please let me know and I can post the code. I will do more testing later on Mac and linux32. I'm interested in how win64 and linux64 behave in this respect, so if someone has these architectures, please let me know.

As mentioned before, it depends not only on the platform, but also on the contents of the object/record. E.g., a badly misaligned double will generally give much worse performance even on Intel.

> This makes me wonder if choosing a proper value for $PACKRECORDS could make my file readable safely on all platforms, only needing to convert the endianness if applicable. This would not force me to do manual padding in my structs. Say I use a value of 16, would that cover all ABIs FPC currently supports?

Yes.

> Jonas: do you have an overview of the alignment on all architectures that FPC supports?

The information is not just architecture-specific, but also OS-specific (e.g. the alignment of int64 is 4 on Darwin/i386, but 8 on all other i386 platforms). This is defined in the platform ABI (application binary interface) documents.

> Perhaps you could pinpoint where in the compiler this is handled? If appreciated, I could make a patch to include this info in the documentation in the future.

It's a combination of tdef.alignment (and its overridden methods in compiler/symdef.pas), tdef.structalignment (idem) and the varalign information in compiler/systems/i_*.pas. The latter information can in turn be overridden by the programmer with the -Oa switch and the {$codealign ...} directive, and is sometimes also adjusted by us, e.g. when new data types are introduced, when bugs are found, or when support is added for a new ABI that has different requirements (some OSes support multiple ABIs).

I don't think documenting it in our manual is a good idea. It's not something people should depend on beyond what the official platform ABIs say, and those documents are maintained separately from our manual (and unfortunately seldom have stable URLs that can be referred to).

Jonas
Re: [fpc-pascal] Memory alignment with FPC
In our previous episode, Nico Erfurth said:

> x86 can handle unaligned access, but most implementations (I think current Atoms and the VIA Nano are an exception) will suffer a rather high performance penalty.

I thought most modern x86 CPUs only had a penalty when an unaligned access crossed a cache line boundary? (32 bytes now, 64 bytes on Haswell)
Re: [fpc-pascal] Memory alignment with FPC
On 11 Oct '12, Jonas Maebe wrote:

> As mentioned before, it depends not only on the platform, but also on the contents of the object/record. E.g., a badly misaligned double will generally give much worse performance even on Intel.
>
>> This makes me wonder if choosing a proper value for $PACKRECORDS could make my file readable safely on all platforms, only needing to convert the endianness if applicable. This would not force me to do manual padding in my structs. Say I use a value of 16, would that cover all ABIs FPC currently supports?
>
> Yes.

So misalignment of, for instance, a double (or whatever type) will only happen if the record is packed and the packing value is smaller than what the ABI prescribes, correct?

Let's assume I set the record packing to 16 bytes; this would make reading and writing records as a whole safe on all platform/architecture combinations, right? Apart from a few padding bytes, what are the performance penalties of doing this then? Why would there be penalties?

Darius
Re: [fpc-pascal] Memory alignment with FPC
On 11 Oct 2012, at 15:00, dhkblas...@zeelandnet.nl wrote:

> So misalignment of, for instance, a double (or whatever type) will only happen if the record is packed and the packing value is smaller than what the ABI prescribes, correct?

Yes.

> Let's assume I set the record packing to 16 bytes; this would make reading and writing records as a whole safe on all platform/architecture combinations, right? Apart from a few padding bytes, what are the performance penalties of doing this then? Why would there be penalties?

The CPU cache will contain lots of unused padding bytes.

Jonas
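To make the padding cost mentioned above concrete, here is a minimal sketch that declares the same two fields under two different $PACKRECORDS settings and prints the resulting record sizes. The type names TPacked and TPadded are made up for illustration, and the exact sizes printed depend on the compiler version and target, so treat the output as something to inspect rather than rely on.

```pascal
program PackDemo;
{$mode objfpc}

type
  {$PACKRECORDS 1}
  // No padding between fields: Value directly follows Flag.
  TPacked = record
    Flag: Byte;
    Value: Double;
  end;

  {$PACKRECORDS 16}
  // Fields may be aligned on boundaries of up to 16 bytes,
  // so padding can appear between Flag and Value.
  TPadded = record
    Flag: Byte;
    Value: Double;
  end;

  // Restore the compiler's default record alignment.
  {$PACKRECORDS DEFAULT}

begin
  WriteLn('SizeOf(TPacked) = ', SizeOf(TPacked));
  WriteLn('SizeOf(TPadded) = ', SizeOf(TPadded));
end.
```

The difference between the two sizes is the number of padding bytes that every instance carries through the cache and through any copy of the record.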
Re: [fpc-pascal] Memory alignment with FPC
In our previous episode, Jonas Maebe said:

> > reading and writing records as a whole safe on all platform/architecture combinations, right? Apart from a few padding bytes, what are the performance penalties of doing this then? Why would there be penalties?
>
> The CPU cache will contain lots of unused padding bytes.

And operations that move records will move more bytes (e.g. reallocation).
Re: [fpc-pascal] Memory alignment with FPC
On 11 Oct '12, Jonas Maebe wrote:

> On 11 Oct 2012, at 15:00, dhkblas...@zeelandnet.nl wrote:
>
>> So misalignment of, for instance, a double (or whatever type) will only happen if the record is packed and the packing value is smaller than what the ABI prescribes, correct?
>
> Yes.
>
>> Let's assume I set the record packing to 16 bytes; this would make reading and writing records as a whole safe on all platform/architecture combinations, right? Apart from a few padding bytes, what are the performance penalties of doing this then? Why would there be penalties?
>
> The CPU cache will contain lots of unused padding bytes.

Thanks, I think everything is clear now. My plan now is to respect the default padding and write records to disk in one go. The padding value will be written to the file header, so the records can be read back one variable at a time when the padding differs; otherwise they will be read back in one go again. This will surely come at a cost, but only if the file is shared between different ABIs (as is the case when sharing between different endiannesses). The result will be that the data structures always use the default padding internally, making optimal use of the CPU.

So is there a way to get the padding value at runtime?

Darius
Re: [fpc-pascal] Memory alignment with FPC
On 11 Oct 2012, at 15:23, dhkblas...@zeelandnet.nl wrote:

> Thanks, I think everything is clear now. My plan now is to respect the default padding and write records to disk in one go. The padding value will be written to the file header, so the records can be read back one variable at a time when the padding differs; otherwise they will be read back in one go again. This will surely come at a cost, but only if the file is shared between different ABIs (as is the case when sharing between different endiannesses). The result will be that the data structures always use the default padding internally, making optimal use of the CPU.
>
> So is there a way to get the padding value at runtime?

No. You really should write the fields one by one. Yes, it's slower. That's the cost of portability. You can always optimize by first writing them to a buffer and then writing the buffer in one go.

Jonas
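The advice above (write fields one by one, but collect them in a buffer and write the buffer in one go) could look roughly like the following sketch. The record TSample, the procedure WriteSample and the file name sample.bin are hypothetical, and storing everything little-endian on disk is an assumption made for this example, not something prescribed in the thread. It also assumes that a Double uses the same byte order as a 64-bit integer on the target.

```pascal
program WriteFields;
{$mode objfpc}{$H+}

uses
  Classes;

type
  // Hypothetical record; its in-memory padding is irrelevant here
  // because each field is written separately.
  TSample = record
    Id: LongInt;
    Flag: Byte;
    Value: Double;
  end;

// Append the fields of S to Dest in a fixed order and in
// little-endian byte order (assumed on-disk convention).
procedure WriteSample(Dest: TStream; const S: TSample);
var
  LeId: LongInt;
  Bits: QWord;
begin
  LeId := NtoLE(S.Id);
  Dest.WriteBuffer(LeId, SizeOf(LeId));
  Dest.WriteBuffer(S.Flag, SizeOf(S.Flag));
  // Copy the raw bits of the Double into an integer so that
  // NtoLE can swap them on big-endian targets.
  Move(S.Value, Bits, SizeOf(Bits));
  Bits := NtoLE(Bits);
  Dest.WriteBuffer(Bits, SizeOf(Bits));
end;

var
  Buf: TMemoryStream;
  S: TSample;
begin
  S.Id := 42;
  S.Flag := 1;
  S.Value := 3.5;
  Buf := TMemoryStream.Create;
  try
    WriteSample(Buf, S);          // field by field, into memory
    Buf.SaveToFile('sample.bin'); // one write to disk
    WriteLn('Bytes written: ', Buf.Size); // 4 + 1 + 8, no padding
  finally
    Buf.Free;
  end;
end.
```

Because no padding bytes reach the file, the on-disk layout no longer depends on the ABI the writer was compiled for; a reader would mirror the same field order and apply LEtoN to each multi-byte value.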
Re: [fpc-pascal] Memory alignment with FPC
Marco van de Voort wrote:

> In our previous episode, Nico Erfurth said:
>
>> x86 can handle unaligned access, but most implementations (I think current Atoms and the VIA Nano are an exception) will suffer a rather high performance penalty.
>
> I thought most modern x86 CPUs only had a penalty when an unaligned access crossed a cache line boundary? (32 bytes now, 64 bytes on Haswell)

In any event, I run FPC and Lazarus on SPARC, which is susceptible to misalignment, and am not currently aware of any problems.

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Re: [fpc-pascal] Memory alignment with FPC
On 11 Oct '12, Jonas Maebe wrote:

> On 11 Oct 2012, at 15:23, dhkblas...@zeelandnet.nl wrote:
>
>> Thanks, I think everything is clear now. My plan now is to respect the default padding and write records to disk in one go. The padding value will be written to the file header, so the records can be read back one variable at a time when the padding differs; otherwise they will be read back in one go again. This will surely come at a cost, but only if the file is shared between different ABIs (as is the case when sharing between different endiannesses). The result will be that the data structures always use the default padding internally, making optimal use of the CPU.
>>
>> So is there a way to get the padding value at runtime?
>
> No. You really should write the fields one by one. Yes, it's slower. That's the cost of portability. You can always optimize by first writing them to a buffer and then writing the buffer in one go.
>
> Jonas

Sorry I keep asking questions, but why write them one by one? If I stored the offset each variable has at the time of writing (this only needs to be done once per record type), I could easily make the loading work (even if the ABI changes when the file is read back). What makes you prefer writing the variables one by one over writing them all at once?

Darius
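For reference, the per-field offsets that the proposal above would store can be obtained by subtracting the record's address from each field's address. The sketch below only prints them; TSample is a hypothetical record, and the values shown will vary with the target ABI and any $PACKRECORDS setting in effect, which is exactly why they would have to be stored alongside the data.

```pascal
program FieldOffsets;
{$mode objfpc}

type
  // Hypothetical record compiled with the default alignment.
  TSample = record
    Id: LongInt;
    Flag: Byte;
    Value: Double;
  end;

var
  S: TSample;
begin
  FillChar(S, SizeOf(S), 0);
  // Offset of a field = address of the field - address of the record.
  WriteLn('Id     at offset ', PtrUInt(@S.Id) - PtrUInt(@S));
  WriteLn('Flag   at offset ', PtrUInt(@S.Flag) - PtrUInt(@S));
  WriteLn('Value  at offset ', PtrUInt(@S.Value) - PtrUInt(@S));
  WriteLn('SizeOf(TSample) = ', SizeOf(TSample));
end.
```

This yields the layout of one record type on the platform the program was compiled for; it does not give a single global "padding value".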
Re: [fpc-pascal] Memory alignment with FPC
On 11 Oct 2012, at 16:11, dhkblas...@zeelandnet.nl wrote:

> On 11 Oct '12, Jonas Maebe wrote:
>
>> No. You really should write the fields one by one. Yes, it's slower. That's the cost of portability. You can always optimize by first writing them to a buffer and then writing the buffer in one go.
>
> Sorry I keep asking questions, but why write them one by one? If I stored the offset each variable has at the time of writing (this only needs to be done once per record type), I could easily make the loading work (even if the ABI changes when the file is read back). What makes you prefer writing the variables one by one over writing them all at once?

I always prefer simple techniques over elaborate strategies aimed at optimizing things, especially if it's not clear that they will ever be the performance bottleneck in the first place. Moreover, you're trading space (storing all the offsets) for CPU operations here, and I/O is generally two or more orders of magnitude slower than moving data in memory.

Jonas