On Sun, 2014-11-23 at 23:30 +0100, Julian Taylor wrote:
> On 23.11.2014 23:11, Simon McVittie wrote:
> > On 23/11/14 17:55, Simon McVittie wrote:
> >> Unfortunately, on my x86-64 laptop, my patched liblzo2 with
> >> -DLZO_CFG_NO_UNALIGNED on all architectures seems to be half as fast as
> >> the unpatched one
> > [...]
> >> I'm trying out a slightly different approach: keeping the unaligned
> >> accesses via casts like *(uint16_t *) on architectures where lzodefs.h
> >> specifically allows them, but disabling the casts via
> >> struct { char[n] } conditional on alignof(that struct) == 1, which seem
> >> to be the problematic ones.
> >
> > That fixed the performance regression on amd64 while still working
> > correctly on armv5tel, so I've uploaded it as a DELAYED/7 NMU. See
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=757037 for nmudiff.
> >
> > If anyone has better ideas, I'm happy to cancel the delayed upload and
> > let someone take over fixing the bug.
> >
> >
> what works well is just replacing the offending memory loads with the
> memcpy call.
[...]

That is not necessarily true, e.g. in this function

    void copy_foo(struct foo *dst, const struct foo *src)
    {
        memcpy(dst, src, sizeof(*dst));
    }

the compiler is still allowed to assume that src has the proper
alignment for struct foo and to optimise the memcpy() accordingly. And
yes, this is something that gcc really does.

Pointers to an unaligned instance of a structure generally need to be
declared as void *, char * or unsigned char * (or const-qualified
versions).

Ben.

-- 
Ben Hutchings
Never put off till tomorrow what you can avoid all together.
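
For comparison, a minimal sketch of the variant where the source pointer is declared as const unsigned char *, so the compiler can assume nothing beyond byte alignment. The struct foo layout and the name load_foo are hypothetical, chosen only for illustration; neither comes from liblzo2.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical structure for illustration; not taken from liblzo2. */
struct foo {
    uint32_t a;
    uint16_t b;
};

/* src is declared as const unsigned char *, so the compiler may assume
   only byte alignment and has to emit a copy that is safe for any
   address. dst is still a struct foo * and must be properly aligned. */
static void load_foo(struct foo *dst, const unsigned char *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

A caller would pass something like buf + 1, where buf is an unsigned char array, without first casting it to struct foo *. How well the compiler optimises the copy on a given architecture is a separate question and not something this sketch measures.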