On Thu, 2014-11-06 at 19:30 -0500, Frank Henigman wrote:
> I tested your patch with the "teximage" program in mesa demos, the
> same thing I used to benchmark when I developed this code.
> As Matt and Chad point out, the odd-looking _faster functions are
> there for a reason.  Your change causes a huge slowdown.

Yes, I should have known better than to assume it was leftover code. I
didn't know that gcc could inline memcpy like that, very nice. In fact,
I was reading a blog just last week that said msvc was better than gcc
for memcpy because gcc relied on a library implementation. A good
reminder not to believe everything you read on the internet.

Anyway, I've had another go at it and the performance regression should
be fixed. In my testing I couldn't spot any real difference. The main
downside is that the ssse3 code can't be inlined, so there will be a
small trade-off compared to the current way of building with ssse3
enabled.

Also, thanks for pointing out "teximage"; I didn't know the mesa demos
contained perf tools.

> I tested on a sandybridge system with a "Intel(R) Celeron(R) CPU 857 @
> 1.20GHz."  Mesa compiled with -O2.
>
> original code:
>   TexSubImage(RGBA/ubyte 256 x 256): 9660.4 images/sec, 2415.1 MB/sec
>   TexSubImage(RGBA/ubyte 1024 x 1024): 821.2 images/sec, 3284.7 MB/sec
>   TexSubImage(RGBA/ubyte 4096 x 4096): 76.3 images/sec, 4884.9 MB/sec
>
>   TexSubImage(BGRA/ubyte 256 x 256): 11307.1 images/sec, 2826.8 MB/sec
>   TexSubImage(BGRA/ubyte 1024 x 1024): 944.6 images/sec, 3778.6 MB/sec
>   TexSubImage(BGRA/ubyte 4096 x 4096): 76.7 images/sec, 4908.3 MB/sec
>
>   TexSubImage(L/ubyte 256 x 256): 17847.5 images/sec, 1115.5 MB/sec
>   TexSubImage(L/ubyte 1024 x 1024): 3068.2 images/sec, 3068.2 MB/sec
>   TexSubImage(L/ubyte 4096 x 4096): 224.6 images/sec, 3593.0 MB/sec
>
> your code:
>   TexSubImage(RGBA/ubyte 256 x 256): 3271.6 images/sec, 817.9 MB/sec
>   TexSubImage(RGBA/ubyte 1024 x 1024): 232.3 images/sec, 929.2 MB/sec
>   TexSubImage(RGBA/ubyte 4096 x 4096): 47.5 images/sec, 3038.6 MB/sec
>
>   TexSubImage(BGRA/ubyte 256 x 256): 2426.5 images/sec, 606.6 MB/sec
>   TexSubImage(BGRA/ubyte 1024 x 1024): 164.1 images/sec, 656.4 MB/sec
>   TexSubImage(BGRA/ubyte 4096 x 4096): 13.4 images/sec, 854.8 MB/sec
>
>   TexSubImage(L/ubyte 256 x 256): 9514.5 images/sec, 594.7 MB/sec
>   TexSubImage(L/ubyte 1024 x 1024): 864.1 images/sec, 864.1 MB/sec
>   TexSubImage(L/ubyte 4096 x 4096): 59.7 images/sec, 955.2 MB/sec
>
> This is just one run, not an average, but you can see it's slower
> across the board up to a factor of around 6.
> Also I couldn't configure the build after your patch.  I think you
> left out a change to configure.ac to define SSSE3_SUPPORTED.
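
For anyone following along, here is a rough sketch of the kind of dispatch
that lets gcc inline memcpy, as discussed above. It is not the actual Mesa
code: copy_fn, swap_rb_copy, copy_rows and copy_rows_faster are names made
up for illustration, and swap_rb_copy is a plain C stand-in for an SSSE3
routine. The idea is that the wrapper re-calls a static inline worker with
memcpy and a constant size spelled out literally, so the compiler can
specialize that call site instead of calling through a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature shared by memcpy and any swizzling copy. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Illustrative copy that swaps the R and B channels of 4-byte pixels.
 * Stands in for an SSSE3 routine; n is assumed to be a multiple of 4. */
static void *
swap_rb_copy(void *dst, const void *src, size_t n)
{
   uint8_t *d = dst;
   const uint8_t *s = src;
   for (size_t i = 0; i + 3 < n; i += 4) {
      d[i + 0] = s[i + 2];
      d[i + 1] = s[i + 1];
      d[i + 2] = s[i + 0];
      d[i + 3] = s[i + 3];
   }
   return dst;
}

/* Generic worker: span and copy are ordinary parameters here, but because
 * the function is static inline, a call site that passes constants can be
 * specialized by the compiler. */
static inline void
copy_rows(uint8_t *dst, const uint8_t *src, size_t span, size_t rows,
          size_t pitch, copy_fn copy)
{
   assert(span <= pitch);
   for (size_t r = 0; r < rows; r++)
      copy(dst + r * pitch, src + r * pitch, span);
}

/* The "odd-looking" wrapper: same behaviour as copy_rows, but the common
 * case is written out with literal arguments so gcc may expand memcpy
 * inline with a known 16-byte size. */
static void
copy_rows_faster(uint8_t *dst, const uint8_t *src, size_t span, size_t rows,
                 size_t pitch, copy_fn copy)
{
   if (copy == memcpy) {
      if (span == 16)
         copy_rows(dst, src, 16, rows, pitch, memcpy);
      else
         copy_rows(dst, src, span, rows, pitch, memcpy);
   } else {
      /* Opaque function pointer: nothing for the compiler to inline. */
      copy_rows(dst, src, span, rows, pitch, copy);
   }
}
```

Whether gcc really inlines the 16-byte case depends on the optimization
level and flags, so treat this as an illustration of the pattern rather
than a guarantee. The last branch is roughly where the trade-off mentioned
above comes from: a copy routine that is only reachable through a pointer
cannot be folded into the loop.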