Kewen:

On Fri, 2023-06-30 at 15:20 -0700, Carl Love wrote:
> Segher never liked the above way of looking at the assembly. He
> prefers:
>
>    gcc -S -g -mcpu=power8 -o vsx-vector-6-func-2lop.s vsx-vector-6-func-2lop.c
>
>    grep xxlor vsx-vector-6-func-2lop.s | wc
>    34 68 516
>
> So, again, I get the same count of 34 on both makalu and genoa. But
> again, that doesn't agree with what make script/scan-assembler thinks
> the counts should be.
>
> When I looked at the vsx-vector-6-func-2lop.s I see on BE:
>
>    ....
>    lxvd2x 0,10,9
>    xxlor 0,12,0
>    xxlnor 0,0,0
>    ...
>
> I was guessing that it was adjusting the data layout from the load.
> But looking again more carefully versus LE:
>
>    ....
>    lxvd2x 0,31,9
>    xxpermdi 0,0,0,2
>    xxlor 0,12,0
>    xxlnor 0,0,0
>    xxpermdi 0,0,0,2
>    ....
>
> the xxpermdi is probably what is really doing the data layout change.
>
> So, we have the issue that looking at the assembly gives different
> instruction counts then what
>
>    dg-final { scan-assembler-times {\mxxlor\M} }
>
> comes up with??? Now I am really confused. I don't know how the
> scan-assembler-times works but I will go see if I can find it and see
> if I can figure out what the issue is. I would expect that the
> scan-assembler is working off the --save-temp files, which get deleted
> as part of the run. I would guess that scan-assembler does a grep to
> find the instructions and then maybe uses wc to count them??? I will
> go see if I can figure out how scan-assembler-times works.
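On the question of how the counting works: as far as I understand it,
scan-assembler-times applies the Tcl regexp to the saved assembler
output and counts the matches, rather than running grep and wc. Here is
a rough Python sketch of that kind of count, not the testsuite's actual
code. The names count_insn and LE_SAMPLE are made up for illustration,
and Python's \b is only a stand-in for Tcl's \m and \M word anchors.

```python
import re

# Tcl's \m and \M match at the start and end of a word. Python has no
# direct equivalent, so \b (word boundary) is used here as a stand-in.
XXLOR = re.compile(r"\bxxlor\b")


def count_insn(asm_text: str, pattern: re.Pattern = XXLOR) -> int:
    """Count non-overlapping matches of pattern in the assembly text."""
    return len(pattern.findall(asm_text))


# The LE fragment quoted above, used as sample input.
LE_SAMPLE = """\
\tlxvd2x 0,31,9
\txxpermdi 0,0,0,2
\txxlor 0,12,0
\txxlnor 0,0,0
\txxpermdi 0,0,0,2
"""

if __name__ == "__main__":
    print(count_insn(LE_SAMPLE))
    print(count_insn(LE_SAMPLE, re.compile(r"\bxxpermdi\b")))
```

The word anchors matter because a bare substring match would also pick
up longer mnemonics that begin with the same letters. The count only
means something if the .s file was generated with the same options the
test itself uses.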
OK, I figured out why I was getting 34 xxlor instructions instead of the
22 that scan-assembler-times was getting. The difference is that when I
compiled the program by hand I forgot to use -O2. With -O2 I get the
same number of xxlor instructions as scan-assembler-times does. I get 34
if I do not specify optimization.

So, I think the scan-assembler-times counts are all correct.

As Peter says, counting xxlor is a bit problematic in general. We could
just drop the xxlor count, or add an LE/BE qualifier to the counts for
those instructions. Your call.

                    Carl