Re: [Mesa-dev] [PATCH] tgsi: lowering support for alpha test
Rob Clark writes: > Eric, > > we've managed to find the alpha-ref bitfield on a4xx, so suddenly I > don't actually need this anymore.. > > I've pushed a slightly updated version where I'd fixed a few more > things (and was passing the fbo-alphatest-formats tests for me) to my > github tree, in case you feel motivated to take it over: > > > https://github.com/freedreno/mesa/commit/70df2f9a2bbddf8e1608bab2a041a81b9b7c34e8 > > what was left was to drop the const semantic and convert it over to an > interface where driver passes in which const slot it wants to use. To > do that, we might want to re-work the tgsi_lowering interface slightly > so the driver does the first tgsi_scan_shader() (so it knows # of > consts) and tgsi_lowering only does the 2nd tgsi_scan_shader() in > cases where the shader is transformed. We seems like a sane change to > me. With NIR getting ready to land, I'm planning on doing lowering in that instead. It already has a way to describe state updates required for uniforms, which I would just translate to my driver's per-uniform state update enum. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 85380] FTBFS in Clover: /usr/lib/llvm-3.6/include/llvm/Config/Targets.def:26: undefined reference to `LLVMInitialize*Target'
https://bugs.freedesktop.org/show_bug.cgi?id=85380 Niels Ole Salscheider changed: What|Removed |Added CC||niels_ole@salscheider-onlin ||e.de -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600: fix texture gradients instruction emission (v2)
Hello guys, Can anyone backport the following patches for 10.3 and let me know if we should drop them. They have a bunch of non-trivial conflicts, thus i would rather avoid butchering things up. 38ec1844193570486bf35e25a7a4339a00da127e r600: fix texture gradients instruction emission (v2) c88385603ae8d983314b736a9459bbf7d002cf11 r600g: do all CUBE ALU operations before gradient texture operations (v2.1) 07ae69753c6818bcce5d4edaf2cca39c20e37f4c r600g: merge the TXQ and BUFFER constant buffers (v1.1) b10ddf962f2ca09073a13ad19817bf7c9b158294 r600g: fix fallout from last patch 91a827624c01d40613e97322632aaffe319540f1 r600g: make llvm code compile this time Thanks Emil On 26 November 2014 at 20:46, Emil Velikov wrote: > Hello Dave, > > Cherry picking this commit for the 10.3 branch resulted in a number of > non-trivial conflicts. Can you or anyone else familiar with the code > backport this for 10.3 ? > > Thanks > Emil > > On 18/11/14 22:53, Dave Airlie wrote: >> From: Dave Airlie >> >> The piglit tests were failing, and it appeared to be SB >> optimising out things, but Glenn pointed out the gradients >> are meant to be clause local, so we should emit the texture >> instructions in the same clause. This moves things around >> to always copy to a temp and then emit the texture clauses >> for H/V. >> >> v2: Glenn pointed out we could get another ALU fetch in >> the wrong place, so load the src gpr earlier as well. >> >> Fixes at least: >> ./bin/tex-miplevel-selection textureGrad 2D >> >> Signed-off-by: Dave Airlie >> --- >> src/gallium/drivers/r600/r600_shader.c | 59 >> ++ >> 1 file changed, 31 insertions(+), 28 deletions(-) >> >> diff --git a/src/gallium/drivers/r600/r600_shader.c >> b/src/gallium/drivers/r600/r600_shader.c >> index 76daf2c..41caac3 100644 >> --- a/src/gallium/drivers/r600/r600_shader.c >> +++ b/src/gallium/drivers/r600/r600_shader.c >> @@ -5110,11 +5110,37 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) >> } >> >> if (inst->Instruction.Opcode == TGSI_OPCODE_TXD) { >> + int temp_h, temp_v; >> /* TGSI moves the sampler to src reg 3 for TXD */ >> sampler_src_reg = 3; >> >> sampler_index_mode = inst->Src[sampler_src_reg].Indirect.Index >> == 2 ? 2 : 0; // CF_INDEX_1 : CF_INDEX_NONE >> >> + src_loaded = TRUE; >> + for (i = 0; i < 3; i++) { >> + int treg = r600_get_temp(ctx); >> + >> + if (i == 0) >> + src_gpr = treg; >> + else if (i == 1) >> + temp_h = treg; >> + else >> + temp_v = treg; >> + >> + for (j = 0; j < 4; j++) { >> + memset(&alu, 0, sizeof(struct >> r600_bytecode_alu)); >> + alu.op = ALU_OP1_MOV; >> +r600_bytecode_src(&alu.src[0], >> &ctx->src[i], j); >> +alu.dst.sel = treg; >> +alu.dst.chan = j; >> +if (j == 3) >> + alu.last = 1; >> +alu.dst.write = 1; >> +r = r600_bytecode_add_alu(ctx->bc, &alu); >> +if (r) >> +return r; >> + } >> + } >> for (i = 1; i < 3; i++) { >> /* set gradients h/v */ >> memset(&tex, 0, sizeof(struct r600_bytecode_tex)); >> @@ -5125,35 +5151,12 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) >> tex.resource_id = tex.sampler_id + >> R600_MAX_CONST_BUFFERS; >> tex.resource_index_mode = sampler_index_mode; >> >> - if (tgsi_tex_src_requires_loading(ctx, i)) { >> - tex.src_gpr = r600_get_temp(ctx); >> - tex.src_sel_x = 0; >> - tex.src_sel_y = 1; >> - tex.src_sel_z = 2; >> - tex.src_sel_w = 3; >> + tex.src_gpr = (i == 1) ? temp_h : temp_v; >> + tex.src_sel_x = 0; >> + tex.src_sel_y = 1; >> + tex.src_sel_z = 2; >> + tex.src_sel_w = 3; >> >> - for (j = 0; j < 4; j++) { >> - memset(&alu, 0, sizeof(struct >> r600_bytecode_alu)); >> - alu.op = ALU_OP1_MOV; >> -r600_bytecode_src(&alu.src[0], >> &ctx->src[i], j); >> -alu.dst.sel = tex.src_gpr; >> -alu.dst.chan = j; >> -
Re: [Mesa-dev] [PATCH 1/8] meta: Put _mesa_meta_in_progress in the header file
Jason asked on IRC what the overall results were for the series. Here is that data. n=50 for all four. Gl32Batch7 Gl32Multithread 32-bit 4.11846% +/- 0.216796% 1.97044% +/- 0.161024% 64-bit 5.48027% +/- 0.368804% 2.46053% +/- 0.19554% On 12/19/2014 02:22 PM, Ian Romanick wrote: > I should have also mentioned that this series is on the > byt-cpu-optimizations branch of my fd.o tree. > > On 12/19/2014 02:20 PM, Ian Romanick wrote: >> From: Ian Romanick >> >> ...so that it can be inlined in the two places that call it. >> >> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic >> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects >> Gl32Batch7: >> >> 32-bit: No difference proven at 95.0% confidence (n=120) >> 64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40) >> >> Signed-off-by: Ian Romanick >> --- >> src/mesa/drivers/common/meta.c | 10 -- >> src/mesa/drivers/common/meta.h | 7 +-- >> 2 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c >> index 87532c1..3284ad1 100644 >> --- a/src/mesa/drivers/common/meta.c >> +++ b/src/mesa/drivers/common/meta.c >> @@ -1213,16 +1213,6 @@ _mesa_meta_end(struct gl_context *ctx) >> >> >> /** >> - * Determine whether Mesa is currently in a meta state. >> - */ >> -GLboolean >> -_mesa_meta_in_progress(struct gl_context *ctx) >> -{ >> - return ctx->Meta->SaveStackDepth != 0; >> -} >> - >> - >> -/** >> * Convert Z from a normalized value in the range [0, 1] to an object-space >> * Z coordinate in [-1, +1] so that drawing at the new Z position with the >> * default/identity ortho projection results in the original Z value. >> diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h >> index 6ecf3c0..c6aef01 100644 >> --- a/src/mesa/drivers/common/meta.h >> +++ b/src/mesa/drivers/common/meta.h >> @@ -446,8 +446,11 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield >> state); >> extern void >> _mesa_meta_end(struct gl_context *ctx); >> >> -extern GLboolean >> -_mesa_meta_in_progress(struct gl_context *ctx); >> +static inline bool >> +_mesa_meta_in_progress(struct gl_context *ctx) >> +{ >> + return ctx->Meta->SaveStackDepth != 0; >> +} >> >> extern void >> _mesa_meta_fb_tex_blit_begin(const struct gl_context *ctx, >> > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/8] meta: Put _mesa_meta_in_progress in the header file
On 12/19/2014 06:10 PM, Matt Turner wrote: > On Fri, Dec 19, 2014 at 2:20 PM, Ian Romanick wrote: >> From: Ian Romanick >> >> ...so that it can be inlined in the two places that call it. > >if (brw->meta_in_progress != _mesa_meta_in_progress(ctx)) { > brw->meta_in_progress = _mesa_meta_in_progress(ctx); > brw->state.dirty.brw |= BRW_NEW_META_IN_PROGRESS; >} > > I suspect the compiler wasn't able to deduce that > _mesa_meta_in_progress() was a pure function, so it actually called it > twice. Inlining it obviously gives it that information. The other problem is that one of the callers was in a different file, so it would never be inlined there. Of course, that callsite was never executed in my testing. > Reviewed-by: Matt Turner ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/8] i965: Micro-optimize brw_get_index_type
On 12/19/2014 06:22 PM, Matt Turner wrote: > On Fri, Dec 19, 2014 at 2:20 PM, Ian Romanick wrote: >> From: Ian Romanick >> >> With the switch-statement, GCC 4.8.3 produces a small pile of code with >> a branch. >> >> : >> 00: 8b 54 24 04 mov0x4(%esp),%edx >> 04: b8 01 00 00 00 mov$0x1,%eax >> 09: 81 fa 03 14 00 00 cmp$0x1403,%edx >> 0f: 74 0d je 1e >> >> 11: 31 c0 xor%eax,%eax >> 13: 81 fa 05 14 00 00 cmp$0x1405,%edx >> 19: 0f 94 c0sete %al >> 1c: 01 c0 add%eax,%eax >> 1e: c3 ret >> >> However, this could be two instructions. >> >> : >> 00: 2d 01 14 00 00 sub$0x1401,%eax >> 05: d1 e8 shr%eax >> 07: 90 nop >> 08: 90 nop >> 09: 90 nop >> 0a: 90 nop >> 0b: c3 ret >> >> The function was also moved to the header so that it could be inlined at >> the two call sites. Without this, 32-bit also needs to pull the > > I would have liked to see what inlining the original function did, and > then separately optimizing it, but oh well. > >> parameter from the stack. This means there is a push, a call, a move, >> and a ret added to a two instruction function. The above code shows the >> function with __attribute__((regparm=1)), but even this adds several >> extra instructions. There is also an extra instruction on 64-bit to >> move the parameter to %eax for the subtract. >> >> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic >> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects >> Gl32Batch7: >> >> 32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40) >> 64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40) > > What are we benchmarking? +/- 0.35% on a 0.54% increase is kind of large. Gl32Batch7. I've noticed quite a few cases with that benchmark where the standard deviation is quite large compared to the magnitude of the results. I tried to account for that with my test procedure, but it occasionally gets tricked by results that are quite possible "no difference." My procedure has been as follows: - Run before vs. after 40 times. - While there is no difference with 99.5% confidence and we're run less than 120 tests, run 10 more times. - Report result with 95% confidence. After the results that I sent out, I've increased this to 50 initial runs, +25, and up to 150. I used this for the overall results I sent a few minutes ago. On that run I got 1.23843% +/- 0.239952% on 32-bit an no difference on 64-bit. >> Signed-off-by: Ian Romanick >> --- >> src/mesa/drivers/dri/i965/brw_context.h | 22 +- >> src/mesa/drivers/dri/i965/brw_draw_upload.c | 13 + >> src/mesa/drivers/dri/i965/gen8_draw_upload.c | 2 +- >> 3 files changed, 23 insertions(+), 14 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_context.h >> b/src/mesa/drivers/dri/i965/brw_context.h >> index a63c483..92eb022 100644 >> --- a/src/mesa/drivers/dri/i965/brw_context.h >> +++ b/src/mesa/drivers/dri/i965/brw_context.h >> @@ -1602,7 +1602,27 @@ gl_clip_plane *brw_select_clip_planes(struct >> gl_context *ctx); >> /* brw_draw_upload.c */ >> unsigned brw_get_vertex_surface_type(struct brw_context *brw, >> const struct gl_client_array *glarray); >> -unsigned brw_get_index_type(GLenum type); >> + >> +static inline unsigned >> +brw_get_index_type(GLenum type) >> +{ >> + assert((type == GL_UNSIGNED_BYTE) >> + || (type == GL_UNSIGNED_SHORT) >> + || (type == GL_UNSIGNED_INT)); > > I don't think the extra parenthesis really make anything more clear? Ok >> + >> + /* The possible values for type are GL_UNSIGNED_BYTE (0x1401), >> +* GL_UNSIGNED_SHORT (0x1403), and GL_UNSIGNED_INT (0x1405) which we want >> +* to map to scale factors of 0, 1, and 2, respectively. These scale >> +* factors are then left-shfited by 8 to be in the correct position in >> the > > typo: shifted Oops. >> +* CMD_INDEX_BUFFER packet. >> +* >> +* Subtracting 0x1401 gives 0, 2, and 4. Shifting left by 7 afterwards >> +* gives 0x, 0x0100, and 0x0200. These just happen to be >> +* the values the need to be written in the CMD_INDEX_BUFFER packet. >> +*/ >> + return (type - 0x1401) << 7; > > Wow, nice. I had a similar trick for index_bytes (in src/mesa/main/api_validate.c) that used inline assembly to emit an ADC, but then I figured out I could just delete the function (in patch 6). :) Part of me wanted to send that patch out anyway... >> +} >> + >>
Re: [Mesa-dev] [PATCH] mesa: Add mesa SHA-1 functions
On 20 December 2014 at 14:21, Olivier Galibert wrote: > Here is an implementation I've written myself, so no license issues. > Thanks OG, Afaics the main issue is not the lack of implementation, but that no-one wants to step up to "maintain" it. Even adding code that is x2 the size is considered a better solution :'-( If you're up-to the maintenance task, we can resolve all the issues (linking, multi platform support) in half the size and a lot cleaner build :-) Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] mesa: Always generate GL_INVALID_OPERATION in _mesa_GetProgramBinary
From: Ian Romanick There are no binary formats supported, so what are you doing? At least this gives the application developer some feedback about what's going on. The spec gives no guidance about what to do in this scenario. Signed-off-by: Ian Romanick Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87516 --- src/mesa/main/shaderapi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index c7e2633..92cafdc 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -1719,6 +1719,8 @@ _mesa_GetProgramBinary(GLuint program, GLsizei bufSize, GLsizei *length, } *length = 0; + _mesa_error(ctx, GL_INVALID_OPERATION, + "glGetProgramBinary(driver supports zero binary formats)"); (void) binaryFormat; (void) binary; -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] mesa: Add a missing error check in _mesa_GetProgramBinary
From: Ian Romanick Signed-off-by: Ian Romanick Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87516 --- src/mesa/main/shaderapi.c | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index 108e3f5..c7e2633 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -1681,16 +1681,35 @@ _mesa_GetProgramBinary(GLuint program, GLsizei bufSize, GLsizei *length, GLenum *binaryFormat, GLvoid *binary) { struct gl_shader_program *shProg; + GLsizei length_dummy; GET_CURRENT_CONTEXT(ctx); shProg = _mesa_lookup_shader_program_err(ctx, program, "glGetProgramBinary"); if (!shProg) return; + /* The ARB_get_program_binary spec says: +* +* "If is NULL, then no length is returned." +* +* Ensure that length always points to valid storage to avoid multiple NULL +* pointer checks below. +*/ + if (length != NULL) + *length = &length_dummy; + + + /* The ARB_get_program_binary spec says: +* +* "When a program object's LINK_STATUS is FALSE, its program binary +* length is zero, and a call to GetProgramBinary will generate an +* INVALID_OPERATION error. +*/ if (!shProg->LinkStatus) { _mesa_error(ctx, GL_INVALID_OPERATION, "glGetProgramBinary(program %u not linked)", shProg->Name); + *length = 0; return; } @@ -1699,12 +1718,7 @@ _mesa_GetProgramBinary(GLuint program, GLsizei bufSize, GLsizei *length, return; } - /* The ARB_get_program_binary spec says: -* -* "If is NULL, then no length is returned." -*/ - if (length != NULL) - *length = 0; + *length = 0; (void) binaryFormat; (void) binary; -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] mesa: Add missing error checks in _mesa_ProgramBinary
From: Ian Romanick Signed-off-by: Ian Romanick Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87516 --- src/mesa/main/shaderapi.c | 27 +-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index 6d831f7..108e3f5 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -1723,8 +1723,31 @@ _mesa_ProgramBinary(GLuint program, GLenum binaryFormat, (void) binaryFormat; (void) binary; - (void) length; - _mesa_error(ctx, GL_INVALID_OPERATION, __FUNCTION__); + + /* Section 2.3.1 (Errors) of the OpenGL 4.5 spec says: +* +* "If a negative number is provided where an argument of type sizei or +* sizeiptr is specified, an INVALID_VALUE error is generated." +*/ + if (length < 0) { + _mesa_error(ctx, GL_INVALID_VALUE, "glProgramBinary(length < 0)"); + return; + } + + /* The ARB_get_program_binary spec says: +* +* " and must be those returned by a previous +* call to GetProgramBinary, and must be the length of the +* program binary as returned by GetProgramBinary or GetProgramiv with +* PROGRAM_BINARY_LENGTH. Loading the program binary will fail, +* setting the LINK_STATUS of to FALSE, if these conditions +* are not met." +* +* Since any value of binaryFormat passed "is not one of those specified as +* allowable for [this] command, an INVALID_ENUM error is generated." +*/ + shProg->LinkStatus = GL_FALSE; + _mesa_error(ctx, GL_INVALID_ENUM, __FUNCTION__); } -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/8] i965: Micro-optimize brw_get_index_type
On Sun, Dec 21, 2014 at 11:40 AM, Ian Romanick wrote: >>> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c >>> b/src/mesa/drivers/dri/i965/brw_draw_upload.c >>> index 6e0cf3e..e46c54b 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c >>> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c >>> @@ -344,17 +344,6 @@ brw_get_vertex_surface_type(struct brw_context *brw, >>> } >>> } >>> >>> -unsigned >>> -brw_get_index_type(GLenum type) >>> -{ >>> - switch (type) { >>> - case GL_UNSIGNED_BYTE: return BRW_INDEX_BYTE; >>> - case GL_UNSIGNED_SHORT: return BRW_INDEX_WORD; >>> - case GL_UNSIGNED_INT: return BRW_INDEX_DWORD; >> >> The BRW_INDEX_* defines are now unused. I'd remove them. > > Separate patch or with this patch? No preference really. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 87516] glProgramBinary violates spec
https://bugs.freedesktop.org/show_bug.cgi?id=87516 --- Comment #3 from Ian Romanick --- I sent some patches to the mesa-dev mailing list: http://lists.freedesktop.org/archives/mesa-dev/2014-December/073210.html http://lists.freedesktop.org/archives/mesa-dev/2014-December/073212.html http://lists.freedesktop.org/archives/mesa-dev/2014-December/073211.html The last one I'm a little iffy about. -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 87516] glProgramBinary violates spec
https://bugs.freedesktop.org/show_bug.cgi?id=87516 --- Comment #4 from Leith Bade --- Looks good. Are you able to send feedback upstream to the GL ARB about the vagueness around having no support binaryFormats in the spec? -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/8] i965/vec4: Use Vector-float immediates.
This series adds support to i965's vec4 backend for using vector-float immediates. The shader-db results are pretty nice: total instructions in shared programs: 5889529 -> 5876617 (-0.22%) instructions in affected programs: 465347 -> 452435 (-2.77%) and there's still room for improvement. It helps 4914 shaders and hurts none. 50 Valley and Heaven shaders are cut by 30% and there are a few that do mov(8) g25<1>F [-1.5F, -1.5F, -0.5F, -1.5F]VF mov(8) g27<1>F [ 0.5F, -1.5F, 1.5F, -1.5F]VF mov(8) g29<1>F [-1.5F, -0.5F, -0.5F, -0.5F]VF mov(8) g31<1>F [ 0.5F, -0.5F, 1.5F, -0.5F]VF mov(8) g33<1>F [-1.5F, 0.5F, -0.5F, 0.5F]VF mov(8) g35<1>F [ 0.5F, 0.5F, 1.5F, 0.5F]VF mov(8) g37<1>F [-1.5F, 1.5F, -0.5F, 1.5F]VF mov(8) g39<1>F [ 0.5F, 1.5F, 1.5F, 1.5F]VF before using each register once in a MAD instruction. I'll have to think more about a general solution that would optimize that into a single mov(8) g25<1>F [-1.5F, 1.5F, -0.5F, 0.5F]VF and then sources it 8 times with different swizzles. Probably something like the constant-combining pass I've floated for the FS backend. On a side note, can someone tell me how to run Heaven or Valley in an automated benchmarking mode? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/8] i965/vec4: Add parameter to skip doing constant propagation.
After CSEing some MOV ..., VF instructions we have code like mov tmp, [1F, 2F, 3F, 4F]VF mov r10, tmp mov r11, tmp ... use r10 use r11 We want to copy propagate tmp into the uses of r10 and r11, but *not* constant propagate the VF immediate into the uses of tmp. --- src/mesa/drivers/dri/i965/brw_vec4.h| 2 +- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 4a8a467..75ecaf1 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -369,7 +369,7 @@ public: bool opt_reduce_swizzle(); bool dead_code_eliminate(); bool virtual_grf_interferes(int a, int b); - bool opt_copy_propagation(); + bool opt_copy_propagation(bool do_constant_prop = true); bool opt_cse_local(bblock_t *block); bool opt_cse(); bool opt_algebraic(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 65564c9..9deaffa 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -330,7 +330,7 @@ try_copy_propagate(struct brw_context *brw, vec4_instruction *inst, } bool -vec4_visitor::opt_copy_propagation() +vec4_visitor::opt_copy_propagation(bool do_constant_prop) { bool progress = false; struct copy_entry entries[virtual_grf_reg_count]; @@ -395,7 +395,7 @@ vec4_visitor::opt_copy_propagation() if (c != 4) continue; -if (try_constant_propagate(brw, inst, i, &entry)) + if (do_constant_prop && try_constant_propagate(brw, inst, i, &entry)) progress = true; if (try_copy_propagate(brw, inst, i, &entry, reg)) -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/8] i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().
total instructions in shared programs: 5869005 -> 5868220 (-0.01%) instructions in affected programs: 70208 -> 69423 (-1.12%) --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index c1aaeea..d36a735 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1798,7 +1798,11 @@ vec4_visitor::run() OPT(opt_register_coalesce); } while (progress); - opt_vector_float(); + if (opt_vector_float()) { + opt_cse(); + opt_copy_propagation(); + dead_code_eliminate(); + } if (failed) return false; -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/8] i965/vec4: Add pass to gather constants into a vector-float MOV.
Currently only handles consecutive instructions with the same destination that collectively write all channels. total instructions in shared programs: 5879798 -> 5869011 (-0.18%) instructions in affected programs: 465236 -> 454449 (-2.32%) --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 61 ++ src/mesa/drivers/dri/i965/brw_vec4.h | 1 + 2 files changed, 62 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 7619eb6..c1aaeea 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -336,6 +336,66 @@ src_reg::equals(const src_reg &r) const sizeof(fixed_hw_reg)) == 0); } +bool +vec4_visitor::opt_vector_float() +{ + bool progress = false; + + int last_reg = -1; + int remaining_channels; + uint8_t imm[4]; + int inst_count; + vec4_instruction *imm_inst[4]; + + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { + if (last_reg != inst->dst.reg) { + last_reg = inst->dst.reg; + remaining_channels = WRITEMASK_XYZW; + + inst_count = 0; + } + + if (inst->opcode != BRW_OPCODE_MOV || + inst->dst.writemask == WRITEMASK_XYZW || + inst->src[0].file != IMM) + continue; + + int vf = brw_float_to_vf(inst->src[0].fixed_hw_reg.dw1.f); + if (vf == -1) + continue; + + if ((inst->dst.writemask & WRITEMASK_X) != 0) + imm[0] = vf; + if ((inst->dst.writemask & WRITEMASK_Y) != 0) + imm[1] = vf; + if ((inst->dst.writemask & WRITEMASK_Z) != 0) + imm[2] = vf; + if ((inst->dst.writemask & WRITEMASK_W) != 0) + imm[3] = vf; + + imm_inst[inst_count++] = inst; + + remaining_channels &= ~inst->dst.writemask; + if (remaining_channels == 0) { + vec4_instruction *mov = MOV(inst->dst, imm); + mov->dst.type = BRW_REGISTER_TYPE_F; + mov->dst.writemask = WRITEMASK_XYZW; + inst->insert_after(block, mov); + last_reg = -1; + + for (int i = 0; i < inst_count; i++) { +imm_inst[i]->remove(block); + } + progress = true; + } + } + + if (progress) + invalidate_live_intervals(); + + return progress; +} + /* Replaces unused channels of a swizzle with channels that are used. * * For instance, this pass transforms @@ -1738,6 +1798,7 @@ vec4_visitor::run() OPT(opt_register_coalesce); } while (progress); + opt_vector_float(); if (failed) return false; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 0c44ad3..4a8a467 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -365,6 +365,7 @@ public: void calculate_live_intervals(); void invalidate_live_intervals(); void split_virtual_grfs(); + bool opt_vector_float(); bool opt_reduce_swizzle(); bool dead_code_eliminate(); bool virtual_grf_interferes(int a, int b); -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/8] i965/vec4: Allow constant propagation of VF immediates.
total instructions in shared programs: 5877951 -> 5877012 (-0.02%) instructions in affected programs: 155923 -> 154984 (-0.60%) Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the next commit. --- .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 +- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 9deaffa..9e47dd9 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -50,7 +50,9 @@ is_direct_copy(vec4_instruction *inst) inst->dst.file == GRF && !inst->dst.reladdr && !inst->src[0].reladdr && - inst->dst.type == inst->src[0].type); + (inst->dst.type == inst->src[0].type || +(inst->dst.type == BRW_REGISTER_TYPE_F && + inst->src[0].type == BRW_REGISTER_TYPE_VF))); } static bool @@ -77,6 +79,22 @@ is_channel_updated(vec4_instruction *inst, src_reg *values[4], int ch) inst->dst.writemask & (1 << BRW_GET_SWZ(src->swizzle, ch))); } +static unsigned +swizzle_vf_imm(unsigned vf4, unsigned swizzle) +{ + union { + unsigned vf4; + uint8_t vf[4]; + } v = { vf4 }, ret; + + ret.vf[0] = v.vf[BRW_GET_SWZ(swizzle, 0)]; + ret.vf[1] = v.vf[BRW_GET_SWZ(swizzle, 1)]; + ret.vf[2] = v.vf[BRW_GET_SWZ(swizzle, 2)]; + ret.vf[3] = v.vf[BRW_GET_SWZ(swizzle, 3)]; + + return ret.vf4; +} + static bool try_constant_propagate(struct brw_context *brw, vec4_instruction *inst, int arg, struct copy_entry *entry) @@ -98,6 +116,8 @@ try_constant_propagate(struct brw_context *brw, vec4_instruction *inst, if (inst->src[arg].abs) { if (value.type == BRW_REGISTER_TYPE_F) { value.fixed_hw_reg.dw1.f = fabs(value.fixed_hw_reg.dw1.f); + } else if (value.type == BRW_REGISTER_TYPE_VF) { + value.fixed_hw_reg.dw1.ud &= ~0x80808080; } else if (value.type == BRW_REGISTER_TYPE_D) { if (value.fixed_hw_reg.dw1.d < 0) value.fixed_hw_reg.dw1.d = -value.fixed_hw_reg.dw1.d; @@ -107,10 +127,16 @@ try_constant_propagate(struct brw_context *brw, vec4_instruction *inst, if (inst->src[arg].negate) { if (value.type == BRW_REGISTER_TYPE_F) value.fixed_hw_reg.dw1.f = -value.fixed_hw_reg.dw1.f; + else if (value.type == BRW_REGISTER_TYPE_VF) + value.fixed_hw_reg.dw1.ud ^= 0x80808080; else value.fixed_hw_reg.dw1.ud = -value.fixed_hw_reg.dw1.ud; } + if (value.type == BRW_REGISTER_TYPE_VF) + value.fixed_hw_reg.dw1.ud = swizzle_vf_imm(value.fixed_hw_reg.dw1.ud, + inst->src[arg].swizzle); + switch (inst->opcode) { case BRW_OPCODE_MOV: inst->src[arg] = value; -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/8] i965/vec4: Perform CSE on MOV ..., VF instructions.
Port of commit a28ad9d4 from the fs backend. No shader-db changes since we don't emit MOV ..., VF instructions yet. --- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index 7071213..37c930c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp @@ -48,6 +48,7 @@ static bool is_expression(const vec4_instruction *const inst) { switch (inst->opcode) { + case BRW_OPCODE_MOV: case BRW_OPCODE_SEL: case BRW_OPCODE_NOT: case BRW_OPCODE_AND: @@ -154,11 +155,16 @@ vec4_visitor::opt_cse_local(bblock_t *block) } if (!found) { -/* Our first sighting of this expression. Create an entry. */ -aeb_entry *entry = ralloc(cse_ctx, aeb_entry); -entry->tmp = src_reg(); /* file will be BAD_FILE */ -entry->generator = inst; -aeb.push_tail(entry); +if (inst->opcode != BRW_OPCODE_MOV || +(inst->opcode == BRW_OPCODE_MOV && + inst->src[0].file == IMM && + inst->src[0].type == BRW_REGISTER_TYPE_VF)) { + /* Our first sighting of this expression. Create an entry. */ + aeb_entry *entry = ralloc(cse_ctx, aeb_entry); + entry->tmp = src_reg(); /* file will be BAD_FILE */ + entry->generator = inst; + aeb.push_tail(entry); +} } else { /* This is at least our second sighting of this expression. * If we don't have a temporary already, make one. -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/8] i965: Add fs_reg/src_reg constructors that take vf[4].
Sometimes it's easier to generate 4x values into an array, and the memcpy is 1 instruction, rather than 11 to piece 4 arguments together. I'd forgotten to remove the prototype from fs_reg from a previous patch, so it's already there for us here. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 9 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + 3 files changed, 19 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 3639ed2..2837fc0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -585,6 +585,15 @@ fs_reg::fs_reg(uint32_t u) } /** Vector float immediate value constructor. */ +fs_reg::fs_reg(uint8_t vf[4]) +{ + init(); + this->file = IMM; + this->type = BRW_REGISTER_TYPE_VF; + memcpy(&this->fixed_hw_reg.dw1.ud, vf, sizeof(unsigned)); +} + +/** Vector float immediate value constructor. */ fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3) { init(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 2fb578e..b303eb6 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -113,6 +113,15 @@ src_reg::src_reg(int32_t i) this->fixed_hw_reg.dw1.d = i; } +src_reg::src_reg(uint8_t vf[4]) +{ + init(); + + this->file = IMM; + this->type = BRW_REGISTER_TYPE_VF; + memcpy(&this->fixed_hw_reg.dw1.ud, vf, sizeof(unsigned)); +} + src_reg::src_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3) { init(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index be52fbc..0c44ad3 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -82,6 +82,7 @@ public: src_reg(float f); src_reg(uint32_t u); src_reg(int32_t i); + src_reg(uint8_t vf[4]); src_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3); src_reg(struct brw_reg reg); -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/8] i965: Add support for saturating immediates.
I don't feel great about unreachable("unimplemented: ...") but these cases do only seem possible under some currently impossible circumstances. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 16 +++ src/mesa/drivers/dri/i965/brw_shader.cpp | 47 src/mesa/drivers/dri/i965/brw_shader.h | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 +++ 4 files changed, 80 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 2837fc0..789f967 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2292,6 +2292,22 @@ fs_visitor::opt_algebraic() foreach_block_and_inst(block, fs_inst, inst, cfg) { switch (inst->opcode) { + case BRW_OPCODE_MOV: + if (inst->src[0].file != IMM) +break; + + if (inst->saturate) { +if (inst->dst.type != inst->src[0].type) + unreachable("unimplemented: saturate mixed types"); + +if (brw_saturate_immediate(inst->dst.type, + &inst->src[0].fixed_hw_reg)) { + inst->saturate = false; + progress = true; +} + } + break; + case BRW_OPCODE_MUL: if (inst->src[1].file != IMM) continue; diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 1e5227c..a2fc471 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -575,6 +575,53 @@ brw_instruction_name(enum opcode op) unreachable("not reached"); } +bool +brw_saturate_immediate(enum brw_reg_type type, struct brw_reg *reg) +{ + union { + unsigned ud; + int d; + float f; + } imm = { reg->dw1.ud }, sat_imm; + + switch (type) { + case BRW_REGISTER_TYPE_UD: + case BRW_REGISTER_TYPE_D: + case BRW_REGISTER_TYPE_UQ: + case BRW_REGISTER_TYPE_Q: + /* Nothing to do. */ + return false; + case BRW_REGISTER_TYPE_UW: + sat_imm.ud = CLAMP(imm.ud, 0, USHRT_MAX); + break; + case BRW_REGISTER_TYPE_W: + sat_imm.d = CLAMP(imm.d, SHRT_MIN, SHRT_MAX); + break; + case BRW_REGISTER_TYPE_F: + sat_imm.f = CLAMP(imm.f, 0.0f, 1.0f); + break; + case BRW_REGISTER_TYPE_UB: + sat_imm.ud = CLAMP(imm.ud, 0, UCHAR_MAX); + break; + case BRW_REGISTER_TYPE_B: + sat_imm.d = CLAMP(imm.d, CHAR_MIN, CHAR_MAX); + break; + case BRW_REGISTER_TYPE_V: + case BRW_REGISTER_TYPE_UV: + case BRW_REGISTER_TYPE_VF: + unreachable("unimplemented: saturate vector immediate"); + case BRW_REGISTER_TYPE_DF: + case BRW_REGISTER_TYPE_HF: + unreachable("unimplemented: saturate DF/HF immediate"); + } + + if (imm.ud != sat_imm.ud) { + reg->dw1.ud = sat_imm.ud; + return true; + } + return false; +} + backend_visitor::backend_visitor(struct brw_context *brw, struct gl_shader_program *shader_prog, struct gl_program *prog, diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h index 05434a7..233e224 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.h +++ b/src/mesa/drivers/dri/i965/brw_shader.h @@ -192,6 +192,7 @@ enum brw_reg_type brw_type_for_base_type(const struct glsl_type *type); enum brw_conditional_mod brw_conditional_for_comparison(unsigned int op); uint32_t brw_math_function(enum opcode op); const char *brw_instruction_name(enum opcode op); +bool brw_saturate_immediate(enum brw_reg_type type, struct brw_reg *reg); #ifdef __cplusplus extern "C" { diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index b303eb6..7619eb6 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -572,6 +572,22 @@ vec4_visitor::opt_algebraic() foreach_block_and_inst(block, vec4_instruction, inst, cfg) { switch (inst->opcode) { + case BRW_OPCODE_MOV: + if (inst->src[0].file != IMM) +break; + + if (inst->saturate) { +if (inst->dst.type != inst->src[0].type) + unreachable("unimplemented: saturate mixed types"); + +if (brw_saturate_immediate(inst->dst.type, + &inst->src[0].fixed_hw_reg)) { + inst->saturate = false; + progress = true; +} + } + break; + case VEC4_OPCODE_UNPACK_UNIFORM: if (inst->src[0].file != UNIFORM) { inst->opcode = BRW_OPCODE_MOV; -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 8/8] i965/vec4: Do copy before constant propagation after opt_vector_float().
total instructions in shared programs: 5877012 -> 5876617 (-0.01%) instructions in affected programs: 33140 -> 32745 (-1.19%) From before the commit that allows VF constant propagation (which hurt some programs) to here, the results are: total instructions in shared programs: 5877951 -> 5876617 (-0.02%) instructions in affected programs: 123444 -> 122110 (-1.08%) with no programs hurt. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index d36a735..c9ba338 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1800,6 +1800,7 @@ vec4_visitor::run() if (opt_vector_float()) { opt_cse(); + opt_copy_propagation(false); opt_copy_propagation(); dead_code_eliminate(); } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Consider SEL.{GE, L} to be commutative operations.
--- No shader-db changes, unfortunately. src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 14 ++ src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 18 -- 2 files changed, 22 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index 38fae17..2488596 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -109,22 +109,28 @@ is_expression(const fs_inst *const inst) } static bool -is_expression_commutative(enum opcode op) +is_expression_commutative(const fs_inst *inst) { - switch (op) { + switch (inst->opcode) { case BRW_OPCODE_AND: case BRW_OPCODE_OR: case BRW_OPCODE_XOR: case BRW_OPCODE_ADD: case BRW_OPCODE_MUL: return true; + case BRW_OPCODE_SEL: + if (inst->conditional_mod == BRW_CONDITIONAL_GE || + inst->conditional_mod == BRW_CONDITIONAL_L) { + return true; + } + /* fallthrough */ default: return false; } } static bool -operands_match(fs_inst *a, fs_inst *b) +operands_match(const fs_inst *a, fs_inst *b) { fs_reg *xs = a->src; fs_reg *ys = b->src; @@ -133,7 +139,7 @@ operands_match(fs_inst *a, fs_inst *b) return xs[0].equals(ys[0]) && ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) || (xs[2].equals(ys[1]) && xs[1].equals(ys[2]))); - } else if (!is_expression_commutative(a->opcode)) { + } else if (!is_expression_commutative(a)) { bool match = true; for (int i = 0; i < a->sources; i++) { if (!xs[i].equals(ys[i])) { diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index 7071213..1b47de4 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp @@ -88,28 +88,34 @@ is_expression(const vec4_instruction *const inst) } static bool -is_expression_commutative(enum opcode op) +is_expression_commutative(const vec4_instruction *inst) { - switch (op) { + switch (inst->opcode) { case BRW_OPCODE_AND: case BRW_OPCODE_OR: case BRW_OPCODE_XOR: case BRW_OPCODE_ADD: case BRW_OPCODE_MUL: return true; + case BRW_OPCODE_SEL: + if (inst->conditional_mod == BRW_CONDITIONAL_GE || + inst->conditional_mod == BRW_CONDITIONAL_L) { + return true; + } + /* fallthrough */ default: return false; } } static bool -operands_match(enum opcode op, src_reg *xs, src_reg *ys) +operands_match(const vec4_instruction *inst, src_reg *xs, src_reg *ys) { - if (op == BRW_OPCODE_MAD) { + if (inst->opcode == BRW_OPCODE_MAD) { return xs[0].equals(ys[0]) && ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) || (xs[2].equals(ys[1]) && xs[1].equals(ys[2]))); - } else if (!is_expression_commutative(op)) { + } else if (!is_expression_commutative(inst)) { return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2]); } else { return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) || @@ -125,7 +131,7 @@ instructions_match(vec4_instruction *a, vec4_instruction *b) a->conditional_mod == b->conditional_mod && a->dst.type == b->dst.type && a->dst.writemask == b->dst.writemask && - operands_match(a->opcode, a->src, b->src); + operands_match(a, a->src, b->src); } bool -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Consider SEL.{GE, L} to be commutative operations.
On Sunday, December 21, 2014 03:37:25 PM Matt Turner wrote: > --- > No shader-db changes, unfortunately. Unsurprising :) This patch doesn't implement what you meant. 1. instructions_match() checks a->conditional_mod == b->conditional_mod. So you won't ever see mixed conditional mods in operands_match() or is_expression_commutative(). I suspect this is why you see no changes. 2. It looks like your patch allows CSE of (a >= b) with (b >= a), which are clearly different. So any changes would be wrong. What you want is to use brw_swap_cmod. It's then trivial to handle /all/ comparitors, not just L/GE. I think what you want is: static bool instructions match(fs_inst *a, fs_inst *b) { bool match = a->opcode == b->opcode && a->saturate == b->saturate && a->predicate == b->predicate && a->predicate_inverse == b->predicate_inverse && a->dst.type == b->dst.type && a->sources == b->sources; if (a->is_tex()) { match = match && a->offset == b->offset && a->mlen == b->mlen && a->regs_written == b->regs_written && a->base_mrf == b->base_mrf && a->eot == b->eot && a->header_present == b->header_present && a->shadow_compare == b->shadow_compare); } if (!match) return false; /* Comparisons match if both the comparitor and operands are flipped. */ if (a->opcode == BRW_OPCODE_SEL && a->conditional_mod == brw_swap_cmod(b->conditional_mod) a->src[0].equals(b->src[1]) && a->src[1].equals(b->src[0])) { return true; } return a->conditional_mod == b->conditional_mod && operands_match(a, b); } I'm also not sure why you special case SEL - it seems like CMP would work fine as well, and would probably help even more. Maybe just drop the opcode check. FWIW, I haven't tested the above code. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/8] i965: Micro-optimize brw_get_index_type
On Friday, December 19, 2014 02:20:53 PM Ian Romanick wrote: [snip] > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > index a63c483..92eb022 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.h > +++ b/src/mesa/drivers/dri/i965/brw_context.h > @@ -1602,7 +1602,27 @@ gl_clip_plane *brw_select_clip_planes(struct > gl_context *ctx); > /* brw_draw_upload.c */ > unsigned brw_get_vertex_surface_type(struct brw_context *brw, > const struct gl_client_array *glarray); > -unsigned brw_get_index_type(GLenum type); > + > +static inline unsigned > +brw_get_index_type(GLenum type) > +{ > + assert((type == GL_UNSIGNED_BYTE) > + || (type == GL_UNSIGNED_SHORT) > + || (type == GL_UNSIGNED_INT)); > + > + /* The possible values for type are GL_UNSIGNED_BYTE (0x1401), > +* GL_UNSIGNED_SHORT (0x1403), and GL_UNSIGNED_INT (0x1405) which we want > +* to map to scale factors of 0, 1, and 2, respectively. These scale > +* factors are then left-shfited by 8 to be in the correct position in the > +* CMD_INDEX_BUFFER packet. > +* > +* Subtracting 0x1401 gives 0, 2, and 4. Shifting left by 7 afterwards > +* gives 0x, 0x0100, and 0x0200. These just happen to be > +* the values the need to be written in the CMD_INDEX_BUFFER packet. > +*/ > + return (type - 0x1401) << 7; > +} Clever! How about putting it in brw_draw.h? It seems like the right place, and would avoid cluttering brw_context.h with even more random stuff. With Matt's suggestions, this is: Reviewed-by: Kenneth Graunke I don't care about removing the BRW_INDEX_* #defines - either way's fine. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Consider SEL.{GE, L} to be commutative operations.
On Sun, Dec 21, 2014 at 7:50 PM, Kenneth Graunke wrote: > On Sunday, December 21, 2014 03:37:25 PM Matt Turner wrote: >> --- >> No shader-db changes, unfortunately. > > Unsurprising :) This patch doesn't implement what you meant. > > 1. instructions_match() checks a->conditional_mod == b->conditional_mod. >So you won't ever see mixed conditional mods in operands_match() or >is_expression_commutative(). I suspect this is why you see no changes. I'm not expecting to CSE SELs with different conditional mods. Maybe I'm misunderstanding. > 2. It looks like your patch allows CSE of (a >= b) with (b >= a), >which are clearly different. So any changes would be wrong. I don't think so? Those operations aren't implemented with SEL. I think thinking about this as SEL+conditional mod is more confusing than just MIN/MAX. The intention of this patch is to allow MAX(a, b) and MAX(b, a) to be recognized as identical expressions. Same for MIN. > What you want is to use brw_swap_cmod. It's then trivial to handle /all/ > comparitors, not just L/GE. I think what you want is: > > static bool > instructions match(fs_inst *a, fs_inst *b) > { >bool match = a->opcode == b->opcode && > a->saturate == b->saturate && > a->predicate == b->predicate && > a->predicate_inverse == b->predicate_inverse && > a->dst.type == b->dst.type && > a->sources == b->sources; > >if (a->is_tex()) { >match = match && >a->offset == b->offset && >a->mlen == b->mlen && >a->regs_written == b->regs_written && >a->base_mrf == b->base_mrf && >a->eot == b->eot && >a->header_present == b->header_present && >a->shadow_compare == b->shadow_compare); >} > >if (!match) > return false; > >/* Comparisons match if both the comparitor and operands are flipped. */ >if (a->opcode == BRW_OPCODE_SEL && >a->conditional_mod == brw_swap_cmod(b->conditional_mod) >a->src[0].equals(b->src[1]) && a->src[1].equals(b->src[0])) { > return true; >} > >return a->conditional_mod == b->conditional_mod && operands_match(a, b); > } > > I'm also not sure why you special case SEL - it seems like CMP would work fine > as well, and would probably help even more. Maybe just drop the opcode check. I think you're confused. :) I'm special casing SEL because it's used to implement MIN/MAX and MIN and MAX are commutative operations. You're right that we could probably consider things like CMP with different conditional mods as identical expressions (I'm not sure, the comparison rules for CMP are non-obvious when you consider NaNs), but that's not related to this patch. The IVB PRM says about SEL with .l/.ge (specifically those, since they are used to implement MIN/MAX): For a sel instruction with a .l or .ge conditional modifier, if one source is NaN and the other not NaN, the non-NaN source is the result. If both sources are NaNs, the result is NaN. For all other conditional modifiers, if either source is NaN then src1 is selected. That's the behavior I suggested GLSL should adopt in another thread. That's the inspiration for this patch. Does that clear it up? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context
On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote: > From: Ian Romanick > > Instead of having an extra pointer indirection in one of the hottest > loops in the driver. > > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects > Gl32Batch7: > > 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40) > 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60) > > Signed-off-by: Ian Romanick > --- > src/mesa/drivers/dri/i965/brw_context.h | 2 +- > src/mesa/drivers/dri/i965/brw_state_upload.c | 19 --- > 2 files changed, 17 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > index 92eb022..5f5f807 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.h > +++ b/src/mesa/drivers/dri/i965/brw_context.h > @@ -1384,7 +1384,7 @@ struct brw_context > } perfmon; > > int num_atoms; > - const struct brw_tracked_state **atoms; > + const struct brw_tracked_state atoms[64]; Making this atoms[57] would save memory, and the static asserts make it easy to remember to increase the size if we ever add additional atoms. Honestly, it seems like there should be a better way to do this. Currently, the atoms list only depends on the generation number, and thus is identical for all GL contexts. Storing an additional 1.5 kB of redundant data in each GL context seems a bit wasteful. Would it help to make brw_upload_state cache the array and size? void brw_upload_state(struct brw_context *brw) { static const struct brw_tracked_state *atoms = brw->atoms; static const int num_atoms = brw->num_atoms; ... for (i = 0; i < num_atoms; i++) { ...use atoms[i]... } ... } It seems like that would achieve the same effect - one indirection. One downside to that approach is that we couldn't have separate atom lists for API_CORE vs. API_COMPAT (while we could using your approach). (We could skip stippling for core contexts, and skip GS/HS/DS for legacy contexts, which might help a tiny bit...) > /* If (INTEL_DEBUG & DEBUG_BATCH) */ > struct { > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c > b/src/mesa/drivers/dri/i965/brw_state_upload.c > index a579781..9a5d49a 100644 > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c > @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw ) > const struct brw_tracked_state **atoms; > int num_atoms; > > + STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms)); > + STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms)); > + STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms)); > + STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms)); I do like the static asserts. > + > brw_init_caches(brw); > > if (brw->gen >= 8) { > @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw ) >num_atoms = ARRAY_SIZE(gen4_atoms); > } > > - brw->atoms = atoms; > brw->num_atoms = num_atoms; > > + /* This is to work around brw_context::atoms being declared const. We > want > +* it to be const, but it needs to be initialized somehow! > +*/ > + struct brw_tracked_state *context_atoms = > + (struct brw_tracked_state *) &brw->atoms[0]; Ugly, but... not sure what else to suggest. I suppose this patch is fine. It does offer a small CPU overhead reduction in the hottest path, for a small memory usage penalty. I think we can do better, but I don't know that it makes sense to block your patch over nebulous ideas for future improvements. We can do those in the future... Reviewed-by: Kenneth Graunke > + > + for (int i = 0; i < num_atoms; i++) > + context_atoms[i] = *atoms[i]; > + > while (num_atoms--) { >assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw); >assert((*atoms)->emit); > @@ -607,7 +620,7 @@ void brw_upload_state(struct brw_context *brw) >prev = *state; > >for (i = 0; i < brw->num_atoms; i++) { > - const struct brw_tracked_state *atom = brw->atoms[i]; > + const struct brw_tracked_state *atom = &brw->atoms[i]; >struct brw_state_flags generated; > >if (check_state(state, &atom->dirty)) { > @@ -627,7 +640,7 @@ void brw_upload_state(struct brw_context *brw) > } > else { >for (i = 0; i < brw->num_atoms; i++) { > - const struct brw_tracked_state *atom = brw->atoms[i]; > + const struct brw_tracked_state *atom = &brw->atoms[i]; > >if (check_state(state, &atom->dirty)) { > atom->emit(brw); signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context
On Sun, Dec 21, 2014 at 9:11 PM, Kenneth Graunke wrote: > > On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote: > > From: Ian Romanick > > > > Instead of having an extra pointer indirection in one of the hottest > > loops in the driver. > > > > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic > > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects > > Gl32Batch7: > > > > 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40) > > 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60) > > > > Signed-off-by: Ian Romanick > > --- > > src/mesa/drivers/dri/i965/brw_context.h | 2 +- > > src/mesa/drivers/dri/i965/brw_state_upload.c | 19 --- > > 2 files changed, 17 insertions(+), 4 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > > index 92eb022..5f5f807 100644 > > --- a/src/mesa/drivers/dri/i965/brw_context.h > > +++ b/src/mesa/drivers/dri/i965/brw_context.h > > @@ -1384,7 +1384,7 @@ struct brw_context > > } perfmon; > > > > int num_atoms; > > - const struct brw_tracked_state **atoms; > > + const struct brw_tracked_state atoms[64]; > > Making this atoms[57] would save memory, and the static asserts make it > easy > to remember to increase the size if we ever add additional atoms. > > Honestly, it seems like there should be a better way to do this. > Currently, > the atoms list only depends on the generation number, and thus is identical > for all GL contexts. Storing an additional 1.5 kB of redundant data in > each > GL context seems a bit wasteful. > > Would it help to make brw_upload_state cache the array and size? > > void > brw_upload_state(struct brw_context *brw) > { >static const struct brw_tracked_state *atoms = brw->atoms; >static const int num_atoms = brw->num_atoms; > >... > >for (i = 0; i < num_atoms; i++) { > ...use atoms[i]... >} > >... > } > > It seems like that would achieve the same effect - one indirection. > Yeah, but then we have deal with threading issues. It might be ok, but I think we'd have to be a bit more careful than that. > One downside to that approach is that we couldn't have separate atom > lists for API_CORE vs. API_COMPAT (while we could using your approach). > (We could skip stippling for core contexts, and skip GS/HS/DS for legacy > contexts, which might help a tiny bit...) > Another option would be to make the atoms list a screen thing and do the copy at screen creation time. That would allow it to be shared across contexts without the above threading issues. If we wanted, we could store one core version and one compat version and have the context point to whichever one it needs. The problem with this is that it's wasteful in the one-context case where we would have both core an compat lists. I guess you can't win them all. > /* If (INTEL_DEBUG & DEBUG_BATCH) */ > > struct { > > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c > b/src/mesa/drivers/dri/i965/brw_state_upload.c > > index a579781..9a5d49a 100644 > > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c > > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c > > @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw ) > > const struct brw_tracked_state **atoms; > > int num_atoms; > > > > + STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms)); > > + STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms)); > > + STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms)); > > + STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms)); > > I do like the static asserts. > > > + > > brw_init_caches(brw); > > > > if (brw->gen >= 8) { > > @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw ) > >num_atoms = ARRAY_SIZE(gen4_atoms); > > } > > > > - brw->atoms = atoms; > > brw->num_atoms = num_atoms; > > > > + /* This is to work around brw_context::atoms being declared const. > We want > > +* it to be const, but it needs to be initialized somehow! > > +*/ > > + struct brw_tracked_state *context_atoms = > > + (struct brw_tracked_state *) &brw->atoms[0]; > > Ugly, but... not sure what else to suggest. > I don't like it either, but I did exactly the same thing when I wrote it, so I can't blame Ian. > > I suppose this patch is fine. It does offer a small CPU overhead > reduction in > the hottest path, for a small memory usage penalty. I think we can do > better, > but I don't know that it makes sense to block your patch over nebulous > ideas > for future improvements. We can do those in the future... > > Reviewed-by: Kenneth Graunke > > > + > > + for (int i = 0; i < num_atoms; i++) > > + context_atoms[i] = *atoms[i]; > > + > > while (num_atoms--) { > >assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw); > >assert((*atoms)->emit); > > @@ -607,7 +620,7 @@ void
Re: [Mesa-dev] [PATCH 4/8] mesa: Only validate shaders that can exist in the context
On Friday, December 19, 2014 02:20:55 PM Ian Romanick wrote: > From: Ian Romanick > > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects > Gl32Batch7: > > 32-bit: Difference at 95.0% confidence 0.495267% +/- 0.202063% (n=40) > 64-bit: Difference at 95.0% confidence 3.57576% +/- 0.288175% (n=40) > > Signed-off-by: Ian Romanick I'm guessing it would help legacy context programs even more. > --- > src/mesa/main/context.c | 78 > +++-- > 1 file changed, 49 insertions(+), 29 deletions(-) > > diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c > index 400c158..74f9004 100644 > --- a/src/mesa/main/context.c > +++ b/src/mesa/main/context.c > @@ -1903,49 +1903,69 @@ shader_linked_or_absent(struct gl_context *ctx, > GLboolean > _mesa_valid_to_render(struct gl_context *ctx, const char *where) > { > - bool from_glsl_shader[MESA_SHADER_COMPUTE] = { false }; > unsigned i; > > /* This depends on having up to date derived state (shaders) */ > if (ctx->NewState) >_mesa_update_state(ctx); > > - for (i = 0; i < MESA_SHADER_COMPUTE; i++) { > - if (!shader_linked_or_absent(ctx, ctx->_Shader->CurrentProgram[i], > - &from_glsl_shader[i], where)) > - return GL_FALSE; > - } > + if (ctx->API == API_OPENGL_CORE || ctx->API == API_OPENGLES2) { > + bool from_glsl_shader[MESA_SHADER_COMPUTE] = { false }; > > - /* Any shader stages that are not supplied by the GLSL shader and have > -* assembly shaders enabled must now be validated. > -*/ > - if (!from_glsl_shader[MESA_SHADER_VERTEX] > - && ctx->VertexProgram.Enabled && !ctx->VertexProgram._Enabled) { > - _mesa_error(ctx, GL_INVALID_OPERATION, > - "%s(vertex program not valid)", where); > - return GL_FALSE; > - } > + for (i = 0; i < MESA_SHADER_COMPUTE; i++) { > + if (!shader_linked_or_absent(ctx, ctx->_Shader->CurrentProgram[i], > + &from_glsl_shader[i], where)) > +return GL_FALSE; > + } > > - /* FINISHME: If GL_NV_geometry_program4 is ever supported, the current > -* FINISHME: geometry program should validated here. > -*/ > - (void) from_glsl_shader[MESA_SHADER_GEOMETRY]; > + /* In OpenGL Core Profile and OpenGL ES 2.0 / 3.0, there are no > assembly > + * shaders. Don't check state related to those. > + */ FWIW, I don't think we actually enforce this restriction. Although we don't advertise the extension, the API functions don't appear to contain any API_OPENGL_CORE -> raise GL error code. We might want to do that. In the unlikely event that there are applications attempting to use ARB programs in core profile without checking for the extension, landing those checks first would make debugging the problem easier: 1. broken app somehow works 2. broken app starts returning INVALID_OPERATION 3. code to support broken applications is removed. *shrug*. It probably doesn't actually matter. > + } else { > + bool has_vertex_shader = false; > + bool has_fragment_shader = false; > + > + /* In OpenGL Compatibility Profile, there is only vertex shader and > + * fragment shader. We take this path also for API_OPENGLES because "there is only vertex shader" sounds a little odd. Perhaps "there are only vertex and fragment shaders" or "there are only vertex and fragment shader stages"? Whatever you decide is fine. > + * optimizing that path would make the other (more common) paths > + * slightly slower. > + */ > + if (!shader_linked_or_absent(ctx, > + > ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX], > + &has_vertex_shader, where)) > + return GL_FALSE; > > - if (!from_glsl_shader[MESA_SHADER_FRAGMENT]) { > - if (ctx->FragmentProgram.Enabled && !ctx->FragmentProgram._Enabled) { > - _mesa_error(ctx, GL_INVALID_OPERATION, > - "%s(fragment program not valid)", where); > - return GL_FALSE; > - } > + if (!shader_linked_or_absent(ctx, > + > ctx->_Shader->CurrentProgram[MESA_SHADER_FRAGMENT], > + &has_fragment_shader, where)) > + return GL_FALSE; > > - /* If drawing to integer-valued color buffers, there must be an > - * active fragment shader (GL_EXT_texture_integer). > + /* Any shader stages that are not supplied by the GLSL shader and have > + * assembly shaders enabled must now be validated. > */ > - if (ctx->DrawBuffer && ctx->DrawBuffer->_IntegerColor) { > + if (!has_vertex_shader > + && ctx->VertexProgram.Enabled && !ctx->VertexProgram._Enabled) { > _mesa_error(ctx, GL_INVALID_OPERATION, > -
Re: [Mesa-dev] [PATCH 8/8] mesa: Micro-optimize _mesa_is_valid_prim_mode
On Friday, December 19, 2014 02:20:59 PM Ian Romanick wrote: > From: Ian Romanick > > You would not believe the mess GCC 4.8.3 generated for the old > switch-statement. > > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects > Gl32Batch7: > > 32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40) > 64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40) > > The regression on 32-bit is odd. Callgrind says the caller, > _mesa_is_valid_prim_mode is faster. Before it says 2,293,760 > cycles, and after it says 917,504. > > Signed-off-by: Ian Romanick > --- > src/mesa/main/api_validate.c | 30 -- > 1 file changed, 12 insertions(+), 18 deletions(-) > > diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c > index b882f0e..9c2e29e 100644 > --- a/src/mesa/main/api_validate.c > +++ b/src/mesa/main/api_validate.c > @@ -113,27 +113,21 @@ check_valid_to_render(struct gl_context *ctx, const > char *function) > bool > _mesa_is_valid_prim_mode(struct gl_context *ctx, GLenum mode) > { > - switch (mode) { > - case GL_POINTS: > - case GL_LINES: > - case GL_LINE_LOOP: > - case GL_LINE_STRIP: > - case GL_TRIANGLES: > - case GL_TRIANGLE_STRIP: > - case GL_TRIANGLE_FAN: > + /* The overwhelmingly common case is (mode <= GL_TRIANGLE_FAN). Test that > +* first and exit. You would think that a switch-statement would be the > +* right approach, but at least GCC 4.7.2 generates some pretty dire code Perhaps update this to 4.8.3 to match your commit message? > +* for the common case. > +*/ > + if (likely(mode <= GL_TRIANGLE_FAN)) I'm not sure if likely() is a good idea. My understanding is the penalty for getting it wrong is pretty hefty, and the other modes do happen. Getting rid of the switch statement makes a lot of sense. I like the new approach. With likely removed, this is: Reviewed-by: Kenneth Graunke >return true; > - case GL_QUADS: > - case GL_QUAD_STRIP: > - case GL_POLYGON: > + > + if (mode <= GL_POLYGON) >return (ctx->API == API_OPENGL_COMPAT); > - case GL_LINES_ADJACENCY: > - case GL_LINE_STRIP_ADJACENCY: > - case GL_TRIANGLES_ADJACENCY: > - case GL_TRIANGLE_STRIP_ADJACENCY: > + > + if (mode <= GL_TRIANGLE_STRIP_ADJACENCY) >return _mesa_has_geometry_shaders(ctx); > - default: > - return false; > - } > + > + return false; > } signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/8] meta: Put _mesa_meta_in_progress in the header file
On Friday, December 19, 2014 02:20:52 PM Ian Romanick wrote: > From: Ian Romanick > > ...so that it can be inlined in the two places that call it. > > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects > Gl32Batch7: > > 32-bit: No difference proven at 95.0% confidence (n=120) > 64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40) > > Signed-off-by: Ian Romanick > --- > src/mesa/drivers/common/meta.c | 10 -- > src/mesa/drivers/common/meta.h | 7 +-- > 2 files changed, 5 insertions(+), 12 deletions(-) I sent comments on patches 2, 3, 4, and 8. The series is: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context
On Sunday, December 21, 2014 09:19:53 PM Jason Ekstrand wrote: > On Sun, Dec 21, 2014 at 9:11 PM, Kenneth Graunke > wrote: > > > > On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote: > > > From: Ian Romanick > > > > > > Instead of having an extra pointer indirection in one of the hottest > > > loops in the driver. > > > > > > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic > > > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects > > > Gl32Batch7: > > > > > > 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40) > > > 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60) > > > > > > Signed-off-by: Ian Romanick > > > --- > > > src/mesa/drivers/dri/i965/brw_context.h | 2 +- > > > src/mesa/drivers/dri/i965/brw_state_upload.c | 19 --- > > > 2 files changed, 17 insertions(+), 4 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > > b/src/mesa/drivers/dri/i965/brw_context.h > > > index 92eb022..5f5f807 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_context.h > > > +++ b/src/mesa/drivers/dri/i965/brw_context.h > > > @@ -1384,7 +1384,7 @@ struct brw_context > > > } perfmon; > > > > > > int num_atoms; > > > - const struct brw_tracked_state **atoms; > > > + const struct brw_tracked_state atoms[64]; > > > > Making this atoms[57] would save memory, and the static asserts make it > > easy > > to remember to increase the size if we ever add additional atoms. > > > > Honestly, it seems like there should be a better way to do this. > > Currently, > > the atoms list only depends on the generation number, and thus is identical > > for all GL contexts. Storing an additional 1.5 kB of redundant data in > > each > > GL context seems a bit wasteful. > > > > Would it help to make brw_upload_state cache the array and size? > > > > void > > brw_upload_state(struct brw_context *brw) > > { > >static const struct brw_tracked_state *atoms = brw->atoms; > >static const int num_atoms = brw->num_atoms; > > > >... > > > >for (i = 0; i < num_atoms; i++) { > > ...use atoms[i]... > >} > > > >... > > } > > > > It seems like that would achieve the same effect - one indirection. > > > > Yeah, but then we have deal with threading issues. It might be ok, but I > think we'd have to be a bit more careful than that. Should be doable. But yeah, I guess it is kind of hacky. > > One downside to that approach is that we couldn't have separate atom > > lists for API_CORE vs. API_COMPAT (while we could using your approach). > > (We could skip stippling for core contexts, and skip GS/HS/DS for legacy > > contexts, which might help a tiny bit...) > > > > Another option would be to make the atoms list a screen thing and do the > copy at screen creation time. That would allow it to be shared across > contexts without the above threading issues. If we wanted, we could store > one core version and one compat version and have the context point to > whichever one it needs. The problem with this is that it's wasteful in the > one-context case where we would have both core an compat lists. I guess > you can't win them all. Ah, store the actual atoms, like Ian's doing...but in the screen. That's not a crazy plan either. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev