BIT_FIELD_REF can extract almost any kind of type, but it is specifically used to extract vector elements (that very special case is handled) and also sub-vectors (which is missing). RTL expansion of stores seems to rely on an appropriate mode being chosen in order to use vector stores.
The following patch relaxes the condition under which we force VOIDmode: all non-integral types where the extraction size matches the type size (so the access isn't "bitfieldish") now use the mode of the extraction type. The patch leaves alone things like QImode extracts from SImode, since handling those would require checking the offset as well, whereas I assume we cannot extract non-INTEGRAL entities at non-byte-aligned offsets(?).

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

2019-09-25  Richard Biener  <rguent...@suse.de>

	PR middle-end/91897
	* expr.c (get_inner_reference): For BIT_FIELD_REF with
	non-integral type and matching access size retain the
	original mode.

	* gcc.target/i386/pr91897.c: New testcase.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 276123)
+++ gcc/expr.c	(working copy)
@@ -7232,8 +7232,9 @@ get_inner_reference (tree exp, poly_int6
       /* For vector types, with the correct size of access, use the mode of
 	 inner type.  */
-      if (TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0))) == VECTOR_TYPE
-	  && TREE_TYPE (exp) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0)))
+      if (((TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0))) == VECTOR_TYPE
+	    && TREE_TYPE (exp) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))))
+	   || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
 	  && tree_int_cst_equal (size_tree, TYPE_SIZE (TREE_TYPE (exp))))
 	mode = TYPE_MODE (TREE_TYPE (exp));
     }

Index: gcc/testsuite/gcc.target/i386/pr91897.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr91897.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr91897.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+typedef double Double16 __attribute__((vector_size(8*16)));
+
+void mult(Double16 *res, const Double16 *v1, const Double16 *v2)
+{
+  *res = *v1 * *v2;
+}
+
+/* We want 4 ymm loads and 4 ymm stores.  */
+/* { dg-final { scan-assembler-times "movapd" 8 } } */