> > Frankly speaking, I do not understand what's wrong here.
> > IMHO, this change is pretty mechanical: we just extend maximal alignment
> > available.  Because of 512-bit data types we now extend maximal alignment
> > to 512 bits.
>
> Nothing wrong per se, but...
>
> > I suspect that an issue is here:
> >   if (opt
> >       && AGGREGATE_TYPE_P (type)
> >       && TYPE_SIZE (type)
> >       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
> >       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= (unsigned) max_align
> >           || TREE_INT_CST_HIGH (TYPE_SIZE (type)))
> >       && align < max_align)
> >     align = max_align;
>
> ...yes, bumping max_align has the unexpected side effect of changing the
> behavior for sizes between the old value and the new value because of this
> code.  I'm no x86 specialist, but I think that this should be fixed.
>
> > Maybe we can split it and handle 256-bit aggregates separately?
>
> Probably, and we should also add a warning just before the declaration of
> max_align, as well as investigate whether this didn't already happen when
> max_align was bumped from 128 to 256.

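To make the side effect concrete, here is a minimal standalone model of the
quoted check.  It is only a sketch, not the GCC code: model_data_alignment is
a hypothetical name, sizes and alignments are plain bit counts instead of
trees, and the opt flag, the aggregate test and any other alignment rules
are left out.

```c
/* Hypothetical model of the quoted check: an aggregate whose size in bits
   is at least max_align has its alignment raised to max_align.  Smaller
   aggregates keep the alignment they came in with.  */
static unsigned
model_data_alignment (unsigned size_bits, unsigned align, unsigned max_align)
{
  if (size_bits >= max_align && align < max_align)
    align = max_align;
  return align;
}
```

In this model, a 384-bit aggregate with a natural alignment of 32 bits is
raised to 256 bits when max_align is 256, but stays at 32 bits when
max_align is 512, because it no longer reaches the size threshold.  Only
aggregates of 512 bits or more see their alignment go up.
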
The x86-64 ABI has a clause about aligning static variables to a 128-bit
boundary once they reach a given size.  It was introduced to help the
compiler generate aligned vector loads/stores even if the object may bind
to another object file.  This is set in stone and cannot be changed for
AVX/SSE.

For other objects that are fully under local control we can bump the
alignment up further.  I remember this code was originally supposed to bump
up to 128 bits, since it was written long before AVX.  I suppose it would
make sense to bump it further when AVX is enabled and we expect to use it.

I am not quite sure, however, how important it is, given that we have a pass
to increase alignment for vectorizable arrays.  The other case where we can
autogenerate SSE is memcpy/memset, but sadly only for the variably sized
case, and we don't do that by default yet (I hope to teach
move_by_pieces/store_by_pieces about SSE soonish, but not for 4.9).

This logic all comes from a time when vectorization was in its infancy.

Honza