https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92892
Bug ID: 92892
Summary: [AARCH64] TBL-based permutations can be implemented more efficiently for 2-element vectors
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dpochepk at gmail dot com
Target Milestone: ---

The current vector element permutation implementation generates different
instructions depending on the specific form of the permutation. For
permutations like:

    target[0] = src1[0];
    target[1] = src2[1];

the TBL instruction is used and the following instruction sequence is
generated:

    mov tmpReg1, src1
    mov tmpReg2, src2
    tbl target, {tmpReg1, tmpReg2}, ...

Here tmpReg1 and tmpReg2 are consecutively numbered registers, as required
by the tbl instruction.

For 2-element vectors this sequence can be reduced to:

    mov target[0], src1[0]
    mov target[1], src2[1]

It can be further reduced to a single mov when target = src, which is
already implemented in the patch prototype I'm working on.
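A possible reduced test case for the permutation described above, given only as a sketch. It assumes GCC's generic vector extensions and the GCC-specific `__builtin_shuffle` builtin. The names `v2di` and `blend_lo_hi` are hypothetical and do not come from the patch prototype. Whether this exact source goes through the TBL path depends on the GCC revision and options, which I have not verified here.

```c
#include <stdint.h>

/* Two-element vector of 64-bit integers (GCC generic vector extension). */
typedef int64_t v2di __attribute__((vector_size(16)));

/* Hypothetical test function: result[0] = a[0], result[1] = b[1].
   With two input vectors, mask indices 0..1 select lanes of a and
   indices 2..3 select lanes of b, so { 0, 3 } picks a[0] and b[1]. */
v2di blend_lo_hi(v2di a, v2di b)
{
    const v2di mask = { 0, 3 };
    return __builtin_shuffle(a, b, mask);
}
```

Compiling this for an AArch64 target with something like `-O2 -S` should show whether the mov/mov/tbl sequence or the two-instruction lane-move sequence is emitted.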