https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82450
            Bug ID: 82450
           Summary: Consider optimizing multidimensional arrays access without -ftree-vectorize
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Iterating over a multidimensional array uses a separate counter for each dimension. For the code

    using array_t = unsigned[10][10];

    void multidim_array_fill_1(array_t& data) {
        for (unsigned i = 0; i < 10; ++i) {
            for (unsigned j = 0; j < 10; ++j) {
                data[i][j] = 1;
            }
        }
    }

the following assembly is generated with -O2:

    multidim_array_fill_1(unsigned int (&) [10][10]):
            lea     rdx, [rdi+40]
            lea     rcx, [rdi+440]      <=== This could be avoided
    .L3:
            lea     rax, [rdx-40]       <=== This could be avoided
    .L2:
            mov     DWORD PTR [rax], 1
            add     rax, 4
            cmp     rax, rdx
            jne     .L2
            lea     rdx, [rax+40]       <=== This could be avoided
            cmp     rdx, rcx            <=== This could be avoided
            jne     .L3                 <=== This could be avoided
            rep ret

The optimal assembly would be

    multidim_array_fill_1_opt(unsigned int (&) [10][10]):
            lea     rax, [rdi+400]
    .L2:
            mov     DWORD PTR [rdi], 1
            add     rdi, 4
            cmp     rdi, rax
            jne     .L2
            rep ret

as if the initial C++ code had been rewritten as:

    void multidim_array_fill_1_opt(array_t& data_md) {
        unsigned* data = &data_md[0][0];
        for (unsigned i = 0; i < 100; ++i) {
            data[i] = 1;
        }
    }

It seems that treating the array as single-dimensional, without vectorizing, could be enabled at -O2 because it is always better: fewer registers are used, the code is smaller, and the loop contains fewer comparisons and instructions.

P.S.: With -ftree-vectorize the array is treated as a single-dimensional array, but the memory access is vectorized, which increases code size:

    .L2:
            mov     DWORD PTR [rdi+32], 1
            mov     DWORD PTR [rdi+36], 1
            add     rdi, 40
            movups  XMMWORD PTR [rdi-40], xmm0
            movups  XMMWORD PTR [rdi-24], xmm0
            cmp     rax, rdi
            jne     .L2
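
For anyone who wants to reproduce this, the two functions from the report can be put side by side in one translation unit, as sketched below. The function bodies are taken from the report. The helper `fills_match` is a hypothetical addition, not part of the original report, and is only there to confirm that both variants write the same 100 elements.

```cpp
#include <cassert>   // for callers that want assert(fills_match())
#include <cstring>   // std::memcmp

using array_t = unsigned[10][10];

// Nested-loop version from the report: one counter per dimension.
void multidim_array_fill_1(array_t& data) {
    for (unsigned i = 0; i < 10; ++i) {
        for (unsigned j = 0; j < 10; ++j) {
            data[i][j] = 1;
        }
    }
}

// Flattened version from the report: a single counter over all 100 elements.
void multidim_array_fill_1_opt(array_t& data_md) {
    unsigned* data = &data_md[0][0];
    for (unsigned i = 0; i < 100; ++i) {
        data[i] = 1;
    }
}

// Hypothetical helper (not in the report): fills two zero-initialized
// arrays, one with each function, and returns true when the resulting
// contents are byte-for-byte identical.
bool fills_match() {
    array_t a = {};
    array_t b = {};
    multidim_array_fill_1(a);
    multidim_array_fill_1_opt(b);
    return std::memcmp(a, b, sizeof(array_t)) == 0;
}
```

Compiling this file with -O2 -S, with and without -ftree-vectorize, should make it possible to compare the loops discussed above, though the exact output will depend on the GCC version and target. The flat pointer walk mirrors the report's rewrite. Whether hand-written source of that form is strictly conforming C++ is a separate question from what the optimizer is allowed to emit for the nested loops.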