I tested your patch with the "teximage" program in Mesa demos, the same program I used for benchmarking when I developed this code. As Matt and Chad point out, the odd-looking _faster functions are there for a reason: your change causes a huge slowdown. I tested on a Sandy Bridge system with an "Intel(R) Celeron(R) CPU 857 @ 1.20GHz", with Mesa compiled with -O2.

                               original code             your code
    TexSubImage             images/sec    MB/sec    images/sec    MB/sec
    RGBA/ubyte 256 x 256        9660.4    2415.1        3271.6     817.9
    RGBA/ubyte 1024 x 1024       821.2    3284.7         232.3     929.2
    RGBA/ubyte 4096 x 4096        76.3    4884.9          47.5    3038.6
    BGRA/ubyte 256 x 256       11307.1    2826.8        2426.5     606.6
    BGRA/ubyte 1024 x 1024       944.6    3778.6         164.1     656.4
    BGRA/ubyte 4096 x 4096        76.7    4908.3          13.4     854.8
    L/ubyte 256 x 256          17847.5    1115.5        9514.5     594.7
    L/ubyte 1024 x 1024         3068.2    3068.2         864.1     864.1
    L/ubyte 4096 x 4096          224.6    3593.0          59.7     955.2

This is a single run, not an average, but your code is slower across the board, by up to a factor of about 6 (BGRA/ubyte 1024 x 1024 drops from 944.6 to 164.1 images/sec).

Also, I couldn't configure the build after applying your patch. I think you left out a change to configure.ac that defines SSSE3_SUPPORTED.

On Thu, Nov 6, 2014 at 6:08 PM, Chad Versace <chad.vers...@intel.com> wrote:
> On Thu 06 Nov 2014, Timothy Arceri wrote:
>>
>> Also cleans up some if statements in the *faster functions.
>
>
> I have comments about the cleanup below.
>
>> diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> index cb5738a..0deeb75 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>
>
> /**
>  * Copy texture data from linear to X tile layout, faster.
>  *
>  * Same as \ref xtile_copy but faster, because it passes constant parameters
>  * for common cases, allowing the compiler to inline code optimized for those
>  * cases.
>  *
>  * \copydoc tile_copy_fn
>  */
> static FLATTEN void
> xtile_copy_faster(...)
>
>> @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
>>                    mem_copy_fn mem_copy)
>
>
>>  {
>>     if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
>> -      if (mem_copy == memcpy)
>> -         return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
>> -                           dst, src, src_pitch, swizzle_bit, memcpy);
>> -      else if (mem_copy == rgba8_copy)
>> -         return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
>> -                           dst, src, src_pitch, swizzle_bit, rgba8_copy);
>> -   } else {
>> -      if (mem_copy == memcpy)
>> -         return xtile_copy(x0, x1, x2, x3, y0, y1,
>> -                           dst, src, src_pitch, swizzle_bit, memcpy);
>> -      else if (mem_copy == rgba8_copy)
>> -         return xtile_copy(x0, x1, x2, x3, y0, y1,
>> -                           dst, src, src_pitch, swizzle_bit, rgba8_copy);
>> +      return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
>> +                        dst, src, src_pitch, swizzle_bit, mem_copy);
>>     }
>>     xtile_copy(x0, x1, x2, x3, y0, y1,
>>                dst, src, src_pitch, swizzle_bit, mem_copy);
>
>
> The "cleanup" of this if tree concerns me. According to the function
> comment, the original author of this function, fjhenigman, clearly created
> the weird 'if' tree with the intention that the compiler would "inline
> code optimized for those cases".
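
For reference, here is a minimal, self-contained sketch of the pattern that comment describes. All names (copy_rows, copy_rows_faster, swap_rb_copy and the local mem_copy_fn typedef) are hypothetical stand-ins, not the actual intel_tex_subimage.c code, and FLATTEN is assumed to expand to GCC's __attribute__((flatten)). Each branch of the if tree calls the inlined worker with a constant function pointer, so the compiler can replace the per-row indirect call with a direct call to memcpy or an inlined swap_rb_copy; passing mem_copy straight through, as the patch does, leaves only the generic indirect-call version. Whether a given gcc actually performs this specialization depends on its version and flags, so this illustrates the intent rather than proving the effect.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: FLATTEN maps to GCC/Clang's flatten attribute, which asks the
 * compiler to inline every call made inside the annotated function. */
#if defined(__GNUC__)
#define FLATTEN __attribute__((flatten))
#else
#define FLATTEN
#endif

/* Hypothetical local stand-in for the driver's mem_copy_fn type. */
typedef void *(*mem_copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical analogue of rgba8_copy: copies 4-byte pixels while swapping
 * the first and third bytes (R and B). Trailing bytes beyond a multiple of 4
 * are not copied. */
static void *
swap_rb_copy(void *dst, const void *src, size_t n)
{
   uint8_t *d = dst;
   const uint8_t *s = src;
   for (size_t i = 0; i + 4 <= n; i += 4) {
      d[i + 0] = s[i + 2];
      d[i + 1] = s[i + 1];
      d[i + 2] = s[i + 0];
      d[i + 3] = s[i + 3];
   }
   return dst;
}

/* Generic worker: one mem_copy call per row. Called through a variable
 * function pointer, each of those calls is an indirect call. */
static inline void
copy_rows(uint8_t *dst, const uint8_t *src, size_t width, size_t height,
          size_t dst_pitch, size_t src_pitch, mem_copy_fn mem_copy)
{
   for (size_t y = 0; y < height; y++)
      mem_copy(dst + y * dst_pitch, src + y * src_pitch, width);
}

/* Same result as copy_rows, but the first two branches pass a constant
 * function pointer. Once flatten inlines copy_rows into each branch, the
 * compiler may emit a direct call to memcpy or inline swap_rb_copy into the
 * loop. The last branch keeps the generic path for any other copy function. */
static FLATTEN void
copy_rows_faster(uint8_t *dst, const uint8_t *src, size_t width, size_t height,
                 size_t dst_pitch, size_t src_pitch, mem_copy_fn mem_copy)
{
   if (mem_copy == memcpy)
      copy_rows(dst, src, width, height, dst_pitch, src_pitch, memcpy);
   else if (mem_copy == swap_rb_copy)
      copy_rows(dst, src, width, height, dst_pitch, src_pitch, swap_rb_copy);
   else
      copy_rows(dst, src, width, height, dst_pitch, src_pitch, mem_copy);
}
```

One way to check what a particular compiler does is to build with gcc -O2 -S and look for indirect call instructions in the loop emitted for each branch, then compare against a variant whose branches all pass mem_copy through unchanged.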
>
> Without one of the following, I object to this cleanup:
>   - Frank's approval, or
>   - Proof that gcc never does the desired optimizations, or
>   - Proof that this change does not harm Chrome's texture upload
>     performance.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev