> > Just to make sure I don't misunderstand:
> > Does this mean intrinsics are suboptimal to write assembly
> > code?
>
> Here's what I mean: All variables below are of type "vector int"
>
> 1. v0 = v2 * v3
> 2. v0 = v4 * v5 + v6 * v7 + v8 * v9
>
> The first statement produces 1 multiply, 1 multiply-sum and 1 addition
> instruction in assembly.
>
> The second produces 6 multiply, 6 multiply-sum, and 10 addition
> instructions in assembly! I expected 3, 3, 3 of each respective
> operations from (1) plus 2 additions.
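For reference, the two quoted statements can be put into a small standalone file along the lines below. This is only a sketch under assumptions: it uses the GCC/Clang vector_size extension as a portable stand-in for the AltiVec "vector int" type, and the names vec_i32, mul1 and mul3_sum are made up for illustration.

```c
/* Sketch only: vec_i32 stands in for AltiVec "vector int" via the
   GCC/Clang vector_size extension; function names are hypothetical. */
#include <stdint.h>

typedef int32_t vec_i32 __attribute__((vector_size(16)));

/* Statement 1: a single element-wise product. */
vec_i32 mul1(vec_i32 v2, vec_i32 v3)
{
    return v2 * v3;
}

/* Statement 2: three element-wise products added together. */
vec_i32 mul3_sum(vec_i32 v4, vec_i32 v5, vec_i32 v6,
                 vec_i32 v7, vec_i32 v8, vec_i32 v9)
{
    return v4 * v5 + v6 * v7 + v8 * v9;
}
```

Compiling a file like this with something like "gcc -O2 -S" and counting the multiply and add instructions in the generated assembly is one way to compare what different compiler versions emit for the same source.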
The operation counts given above were obtained using gcc 5.3.1 on
Fedora 22. I just created a simple test with those same statements and
compiled it using gcc 6.1.1 on Fedora 24. The assembly operation counts
are what I had initially expected, and are more reasonable.

So I'm going to move my ffmpeg development onto the Fedora 24 cloud
image and see whether SIMD performance there is better than it was on
Fedora 22. The reason I'm moving to Fedora 24 instead of trying to
upgrade gcc on Fedora 22 is that I've learned to prefer standard
pre-installed images to the wrecks I've managed to create doing my own
sysadmin on the POWER8 cloud.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel