Chris Wilson <ch...@chris-wilson.co.uk> writes: > Quoting Scott D Phillips (2018-03-20 20:39:25) >> When building intel_tiled_memcpy for i686, the stack will only be >> 4-byte aligned. This isn't sufficient for SSE temporaries which >> require 16-byte alignment. Use the force_align_arg_pointer >> function attribute in that case to ensure sufficient alignment. >> --- >> src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> index 69306828d72..bd8bafbd2d7 100644 >> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> @@ -42,6 +42,12 @@ >> #include <emmintrin.h> >> #endif >> >> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || >> defined(__SSE2__)) >> +#define REALIGN __attribute__((force_align_arg_pointer)) >> +#else >> +#define REALIGN >> +#endif > > It would be a harmless no-op on x86-64 (or essential ;) > >> + >> #define FILE_DEBUG_FLAG DEBUG_TEXTURE >> >> #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b) >> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src) >> /** >> * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned. >> */ >> -static inline void * >> +static REALIGN inline void * >> rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes) >> { >> assert(bytes == 0 || !(((uintptr_t)dst) & 0xf)); > > Hmm, if aligned_dst is spilling to stack, so would be aligned_src. As > these are supposed to be inlined (and constant folded), do you not > want to realign the callers instead?
Playing around with this function attribute more, it seems not to work as I had hoped. Specifically I was expecting that having the attribute on an inline function would cause the attribute to be applied to any inliners as well, but that seems to not happen. Instead it seems to work more like "if you emit a prologue here, then use this alternative one". I guess the right solution is to just apply -mstackrealign to the whole file and let the stack fix get applied to all entry points (whatever that winds up being after optimization) like we have for main/streaming-load-memcpy.c. And then relying on inlining to keep any stack munging out of our loops. > Perhaps with an explicit FLATTEN. Ya, it looks like flatten would be good there. > -Chris _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev