On Tue, 29 Nov 2022 at 20:43, Andrew Pinski <pins...@gmail.com> wrote:
>
> On Tue, Nov 29, 2022 at 6:40 AM Prathamesh Kulkarni via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> > For the following test-case:
> >
> > int16x8_t foo(int16_t x, int16_t y)
> > {
> >   return (int16x8_t) { x, y, x, y, x, y, x, y };
> > }
>
> (Not to block this patch)
> Seems like this trick can be done even with less than perfect initializer too:
> e.g.
> int16x8_t foo(int16_t x, int16_t y)
> {
>   return (int16x8_t) { x, y, x, y, x, y, x, 0 };
> }
>
> Which should generate something like:
> dup v0.8h, w0
> dup v1.8h, w1
> zip1 v0.8h, v0.8h, v1.8h
> ins v0.h[7], wzr

Hi Andrew,
Nice catch, thanks for the suggestions!
More generally, code-gen with constants involved seems to be sub-optimal.
For example:

int16x8_t foo(int16_t x)
{
  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
}
results in:

foo:
        movi    v0.8h, 0x1
        ins     v0.h[0], w0
        ins     v0.h[1], w0
        ins     v0.h[2], w0
        ins     v0.h[3], w0
        ins     v0.h[4], w0
        ins     v0.h[5], w0
        ins     v0.h[6], w0
        ret

which I suppose could instead be the following:

foo:
        dup     v0.8h, w0
        mov     w1, 0x1
        ins     v0.h[7], w1
        ret

I will try to address this in a follow-up patch.

Thanks,
Prathamesh
>
> Thanks,
> Andrew Pinski
>
> >
> >
> > Code gen at -O3:
> > foo:
> >         dup     v0.8h, w0
> >         ins     v0.h[1], w1
> >         ins     v0.h[3], w1
> >         ins     v0.h[5], w1
> >         ins     v0.h[7], w1
> >         ret
> >
> > For 16 elements, it results in 8 ins instructions, which might not be
> > optimal.
> > I guess the above code-gen would be equivalent to the following?
> > dup v0.8h, w0
> > dup v1.8h, w1
> > zip1 v0.8h, v0.8h, v1.8h
> >
> > I have attached a patch to do the same if the number of elements >= 8,
> > which should possibly be better than the current code-gen?
> > Patch passes bootstrap+test on aarch64-linux-gnu.
> > Does the patch look OK?
> >
> > Thanks,
> > Prathamesh
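As a sanity check on the lane arithmetic in this thread, below is a small portable C sketch that models `dup`, `zip1` and `ins` as plain loops over an 8 x 16-bit vector. The type `v8h` and the helpers `dup_8h`, `zip1_8h`, `ins_8h`, `interleave_xy` and friends are hypothetical illustrations, not ACLE intrinsics or GCC internals. The sketch only shows that each proposed instruction sequence produces the same lanes as the corresponding initializer. It says nothing about which sequence is faster.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of an int16x8_t register (8 lanes of 16 bits). */
#define LANES 8
typedef struct { int16_t h[LANES]; } v8h;

/* Models "dup vD.8h, wS": broadcast one scalar into every lane. */
static v8h dup_8h(int16_t s) {
    v8h r;
    for (int i = 0; i < LANES; i++)
        r.h[i] = s;
    return r;
}

/* Models "zip1 vD.8h, vA.8h, vB.8h": interleave the low halves of A and B,
   so r = { a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3] }. */
static v8h zip1_8h(v8h a, v8h b) {
    v8h r;
    for (int i = 0; i < LANES / 2; i++) {
        r.h[2 * i] = a.h[i];
        r.h[2 * i + 1] = b.h[i];
    }
    return r;
}

/* Models "ins vD.h[lane], wS": overwrite a single lane, keep the rest. */
static v8h ins_8h(v8h v, int lane, int16_t s) {
    assert(lane >= 0 && lane < LANES);
    v.h[lane] = s;
    return v;
}

/* Lane-wise equality of two modelled registers. */
static int v8h_eq(v8h a, v8h b) {
    return memcmp(a.h, b.h, sizeof a.h) == 0;
}

/* { x, y, x, y, x, y, x, y }: dup v0, dup v1, zip1 v0, v0, v1. */
static v8h interleave_xy(int16_t x, int16_t y) {
    return zip1_8h(dup_8h(x), dup_8h(y));
}

/* { x, y, x, y, x, y, x, 0 }: the zip1 sequence plus "ins v0.h[7], wzr". */
static v8h interleave_xy_zero_last(int16_t x, int16_t y) {
    return ins_8h(interleave_xy(x, y), 7, 0);
}

/* { x, x, x, x, x, x, x, 1 }: "dup v0.8h, w0; mov w1, 0x1; ins v0.h[7], w1". */
static v8h splat_x_one_last(int16_t x) {
    return ins_8h(dup_8h(x), 7, 1);
}
```

For example, `interleave_xy(3, -7)` compares equal to `(v8h){{3, -7, 3, -7, 3, -7, 3, -7}}`. The model mirrors the ZIP1 semantics of reading only the lower half of each source, which is why two full-width `dup`s are enough to cover all 8 result lanes.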