[Mesa-dev] [PATCH 43/57] i965/fs: Print fs_reg::offset field consistently for all register files.

2016-09-07 Thread Francisco Jerez
The offset printing code in fs_visitor::dump_instruction() was doing things differently for sources and destinations and for each register file -- In some cases it would be added to the base register number fs_reg::nr, in other cases it would follow the base register separated with a plus sign, in

[Mesa-dev] [PATCH 13/57] i965/fs: Return more accurate read size for LINTERP from fs_inst::size_read.

2016-09-07 Thread Francisco Jerez
The LINTERP virtual instruction only reads three scalar components from the first 16B of the second source; we can now teach size_read() about it since its return value is represented with byte granularity. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion

[Mesa-dev] [PATCH 22/57] i965/fs: Fix can_propagate_from() source/destination overlap check.

2016-09-07 Thread Francisco Jerez
The previous overlap condition only made sure that the VGRF numbers or GRF-aligned offsets were different without taking the amount of data written and read by the instruction into consideration. Use the regions_overlap() helper instead. --- src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp |

[Mesa-dev] [PATCH 16/57] i965/fs: Take into account trailing padding in regs_written() and regs_read().

2016-09-07 Thread Francisco Jerez
This fixes regs_written() and regs_read() to return a more accurate value when the padding left between components due to a stride value greater than one causes the region bounds given by size_written or size_read to overflow into the next register. This could become a problem in optimization pass

[Mesa-dev] [PATCH 34/57] i965/fs: Simplify and fix buggy stride/offset calculations using subscript().

2016-09-07 Thread Francisco Jerez
These were bashing the 'offset' and 'stride' values of several registers without taking the previous value into account, which probably didn't matter in practice for optimize_frontfacing_ternary() because the 'tmp' register already had a known region, but it would have given the wrong region as res
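To illustrate why subscript() avoids this, here is a minimal sketch over a hypothetical `Reg` struct (byte `offset`, element `stride`, element size in bytes). It is not Mesa's fs_reg, only a model of the offset/stride arithmetic for a VGRF-like register: the existing region is adjusted rather than overwritten.

```cpp
#include <cassert>

// Hypothetical, simplified register model (not Mesa's fs_reg):
// offset is in bytes, stride is in units of the element type.
struct Reg {
   unsigned offset;    // byte offset from the start of the virtual register
   unsigned stride;    // distance between consecutive channels, in elements
   unsigned type_size; // size of one element, in bytes
};

// Sketch of subscript(): view element i of the narrower type 'type_size'
// inside each channel of 'reg'. The previous offset and stride are kept
// and scaled instead of being replaced.
Reg subscript(Reg reg, unsigned type_size, unsigned i)
{
   assert((i + 1) * type_size <= reg.type_size);
   reg.offset += i * type_size;
   reg.stride *= reg.type_size / type_size;
   reg.type_size = type_size;
   return reg;
}
```

For example, taking the upper 16-bit half of a 32-bit region that already starts at byte 32 yields offset 34 and stride 2, so a region with a prior offset or stride is not silently reset.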

[Mesa-dev] [PATCH 38/57] i965/fs: Change region_contained_in() to use byte units.

2016-09-07 Thread Francisco Jerez
This makes the function less annoying to use and more accurate -- We shouldn't propagate a copy into a register region that wasn't fully contained in the destination of the copy (IOW, a source region that wasn't fully defined by the copy) just because the number of registers written and read by eac
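A minimal sketch of the difference, assuming a hypothetical `Region` (register number plus byte offset) and a 32-byte GRF. The register-granular variant is only there for comparison and is not taken from Mesa; it shows how a source read can look "contained" when the copy only defined half a register.

```cpp
#include <cassert>

constexpr unsigned REG_SIZE = 32; // bytes per GRF

// Hypothetical region: virtual register number plus byte offset.
struct Region { unsigned nr; unsigned offset; };

// Byte-granular check: is [r, r + dr) fully inside [s, s + ds)?
bool region_contained_in(Region r, unsigned dr, Region s, unsigned ds)
{
   return r.nr == s.nr &&
          r.offset >= s.offset &&
          r.offset + dr <= s.offset + ds;
}

// Register-granular approximation of the same question, for comparison.
// Assumes dr > 0 and ds > 0.
bool region_contained_in_regs(Region r, unsigned dr, Region s, unsigned ds)
{
   const unsigned r_first = r.offset / REG_SIZE;
   const unsigned r_last = (r.offset + dr - 1) / REG_SIZE;
   const unsigned s_first = s.offset / REG_SIZE;
   const unsigned s_last = (s.offset + ds - 1) / REG_SIZE;
   return r.nr == s.nr && r_first >= s_first && r_last <= s_last;
}
```

If a copy writes 16 bytes at offset 0 and a later instruction reads 32 bytes at offset 0, both touch one register, so the register-granular check says "contained" while the byte check correctly says no.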

[Mesa-dev] [PATCH 11/57] i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte units.

2016-09-07 Thread Francisco Jerez
The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the int

[Mesa-dev] [PATCH 05/57] i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written.

2016-09-07 Thread Francisco Jerez
This is in preparation for dropping fs_inst::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will c

[Mesa-dev] [PATCH 50/57] i965/vec4: Compare full register offsets in opt_register_coalesce nop move check.

2016-09-07 Thread Francisco Jerez
In preparation for adding support for sub-GRF offsets to the VEC4 IR. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index f97de18..d9dbc4c 100644 --- a/

[Mesa-dev] [PATCH 17/57] i965/fs: Take into account misalignment in regs_written() and regs_read().

2016-09-07 Thread Francisco Jerez
There was a workaround for this in fs_inst::size_read() for the SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file *only*. We should take this possibility into account for the sources and destinations of all instructions on all optimization passes that need to quantize dataflow in
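The register-count arithmetic described here and in the trailing-padding patch can be sketched as follows. This is a simplified model under stated assumptions (hypothetical `Dst` struct, 32-byte GRF, `size_written` counting `stride * type_size` bytes per channel, trailing padding included), not a copy of Mesa's helper.

```cpp
#include <algorithm>
#include <cassert>

constexpr unsigned REG_SIZE = 32;

constexpr unsigned div_round_up(unsigned n, unsigned d) { return (n + d - 1) / d; }

// Hypothetical destination description (not Mesa's fs_reg).
struct Dst {
   unsigned offset;    // byte offset of the first component
   unsigned stride;    // component stride, in elements
   unsigned type_size; // element size, in bytes
};

// Padding left after the last component when stride > 1.
unsigned reg_padding(const Dst &d)
{
   return (std::max(1u, d.stride) - 1) * d.type_size;
}

// Naive count: ignores where in the register the region starts.
unsigned regs_written_naive(unsigned size_written)
{
   return div_round_up(size_written, REG_SIZE);
}

// Accounts for the misaligned start and drops the trailing padding
// that size_written includes.
unsigned regs_written(const Dst &d, unsigned size_written)
{
   return div_round_up(d.offset % REG_SIZE + size_written -
                       std::min(size_written, reg_padding(d)),
                       REG_SIZE);
}
```

A 32-byte write starting at byte 16 spans two GRFs, although the naive count says one. Conversely, two stride-2 floats at byte 20 end exactly at byte 32, so subtracting the 4 bytes of trailing padding keeps the count at one instead of two.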

[Mesa-dev] [PATCH 47/57] i965/vec4: Simplify src/dst_reg to brw_reg conversion by using byte_offset().

2016-09-07 Thread Francisco Jerez
This should also have the side effect of fixing convert_to_hw_regs() to handle sub-GRF register offsets. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i96

[Mesa-dev] [PATCH 45/57] i965/ir: Don't print ARF subnr values twice.

2016-09-07 Thread Francisco Jerez
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 2 files changed, 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a90aba0..6d4303c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp

[Mesa-dev] [PATCH 40/57] i965/fs: Use region_contained_in() in compute-to-mrf coalescing pass.

2016-09-07 Thread Francisco Jerez
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 1668ac0..df66dc7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/br

[Mesa-dev] [PATCH 09/57] i965/ir: Drop backend_instruction::regs_written field.

2016-09-07 Thread Francisco Jerez
--- src/mesa/drivers/dri/i965/brw_shader.h | 1 - 1 file changed, 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h index 2173f32..0de0808 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.h +++ b/src/mesa/drivers/dri/i965/brw_shader.h @@

[Mesa-dev] [PATCH 27/57] i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().

2016-09-07 Thread Francisco Jerez
This makes sure that overlap checks are done correctly throughout the back-end when the '*this' register starts before the register/size pair provided as argument, and is actually less annoying to use than in_range() at this point since regions_overlap() takes its size arguments in bytes. --- src/
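A small model of the false negative, assuming a hypothetical `Reg` (register number plus byte offset) and a 32-byte GRF. The in_range()-style check only asks whether the start of `*this` falls inside the other range, so a region that begins earlier and extends into it is missed.

```cpp
#include <cassert>

constexpr unsigned REG_SIZE = 32;

// Hypothetical register model: register number plus byte offset.
struct Reg { unsigned nr; unsigned offset; };

// in_range()-style check: does the *start* of 'self' fall inside the
// n-register range beginning at 'r'? Nothing else about 'self' is tested.
bool in_range(const Reg &self, const Reg &r, unsigned n)
{
   return self.nr == r.nr &&
          self.offset >= r.offset &&
          self.offset < r.offset + n * REG_SIZE;
}

// Byte-based overlap test between [r, r + dr) and [s, s + ds).
bool regions_overlap(const Reg &r, unsigned dr, const Reg &s, unsigned ds)
{
   return r.nr == s.nr &&
          !(r.offset + dr <= s.offset || s.offset + ds <= r.offset);
}
```

A two-register region starting at byte 0 overlaps a one-register region starting at byte 32, but in_range() reports false because the first region starts before the second.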

[Mesa-dev] [PATCH 37/57] i965/fs: Simplify copy propagation LOAD_PAYLOAD ACP setup.

2016-09-07 Thread Francisco Jerez
By keeping track of 'offset' in byte units. --- src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index bd534b

[Mesa-dev] [PATCH 12/57] i965/fs: Return more accurate read size from fs_inst::size_read for IMM and UNIFORM files.

2016-09-07 Thread Francisco Jerez
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 5979461..7473f07 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.

[Mesa-dev] [PATCH 18/57] i965/vec4: Take into account misalignment in regs_written() and regs_read().

2016-09-07 Thread Francisco Jerez
Unlike the FS counterpart of this commit this was likely not (yet) a bug, but let's fix it already in preparation for implementing support for sub-GRF offsets in the VEC4 back-end. --- src/mesa/drivers/dri/i965/brw_ir_vec4.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --

[Mesa-dev] [PATCH 55/57] i965/vec4: Don't spill non-GRF-aligned register regions.

2016-09-07 Thread Francisco Jerez
A better fix would be to do something along the lines of the FS back-end spilling code and emit a scratch read before any instruction that overwrites the register to spill partially due to a non-zero sub-register offset. In the meantime mark registers used with a non-zero sub-register offset as no

[Mesa-dev] [PATCH 42/57] i965/fs: Misc simplification.

2016-09-07 Thread Francisco Jerez
Get rid of some leftover redundant arithmetic introduced during the conversion to byte offsets and sizes that can be simplified easily. --- src/mesa/drivers/dri/i965/brw_fs_combine_constants.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_nir.cpp| 2 +- src/mesa/drivers/dri/i965/brw_

[Mesa-dev] [PATCH 25/57] i965/fs: Stop using fs_reg::in_range() in favor of regions_overlap().

2016-09-07 Thread Francisco Jerez
Its only use left in the FS back-end should be using regions_overlap() instead to avoid getting a false negative result in cases where source and destination overlap but the former starts before the latter in the VGRF file. --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 ++- 1 file changed, 2 in

[Mesa-dev] [PATCH 14/57] i965/fs: Handle arbitrary offsets in brw_reg_from_fs_reg for MRF/VGRF registers.

2016-09-07 Thread Francisco Jerez
This restriction seemed rather artificial... Removing it actually simplifies things slightly. --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw

[Mesa-dev] [PATCH 57/57] i965/vec4: Assert that pull constant load offsets are 16B-aligned.

2016-09-07 Thread Francisco Jerez
Non-16B-aligned pull constant loads are unlikely to be particularly useful given that you can get roughly the same effect by using swizzles on the result. --- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visit

[Mesa-dev] [PATCH 35/57] i965/fs: Simplify result_live calculation in dead_code_eliminate().

2016-09-07 Thread Francisco Jerez
No need to unroll the first iteration of the loop manually. --- src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_fs_dea

[Mesa-dev] [PATCH 33/57] i965/fs: Simplify get_fpu_lowered_simd_width() by using inequalities instead of rounding.

2016-09-07 Thread Francisco Jerez
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c9d3d7d..743929a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw

[Mesa-dev] [PATCH 41/57] i965/fs: Get rid of fs_inst::set_smear().

2016-09-07 Thread Francisco Jerez
component() was generally a better alternative because of several issues set_smear() had: - It wouldn't take the original stride and offset of the register into account, which means that set_smear() on the result of e.g. another set_smear() call or an offset() call would give a bogus reg
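A rough sketch of the first issue, using a hypothetical `Reg` model. `component()` here steps along the existing region and then broadcasts; `smear_ignoring_region()` is an illustrative stand-in for the behavior the message criticizes (rewriting the sub-register offset from scratch), not Mesa's actual set_smear().

```cpp
#include <cassert>

constexpr unsigned REG_SIZE = 32;

// Hypothetical register model (not Mesa's fs_reg).
struct Reg {
   unsigned offset;    // bytes
   unsigned stride;    // elements; 0 means every channel reads the same value
   unsigned type_size; // bytes
};

// component(): advance 'idx' channels along the existing region, then broadcast.
Reg component(Reg reg, unsigned idx)
{
   reg.offset += idx * reg.stride * reg.type_size;
   reg.stride = 0;
   return reg;
}

// Illustrative set_smear()-like helper: recomputes the sub-register offset
// from the start of the register, ignoring the original stride and any
// sub-register offset that was already applied.
Reg smear_ignoring_region(Reg reg, unsigned idx)
{
   reg.offset = reg.offset / REG_SIZE * REG_SIZE + idx * reg.type_size;
   reg.stride = 0;
   return reg;
}
```

For a float region with stride 2 starting at byte 8, channel 1 lives at byte 16; component() finds it, while the region-ignoring helper lands on byte 4.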

[Mesa-dev] [PATCH 21/57] i965/fs: Compare full register offsets in cmod propagation pass.

2016-09-07 Thread Francisco Jerez
This could potentially have misoptimized a program in cases where inst->src[0] had a non-zero sub-GRF offset. --- src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp b/src

[Mesa-dev] [PATCH 20/57] i965/fs: Don't consider LOAD_PAYLOAD with stride > 1 source to behave like a raw copy.

2016-09-07 Thread Francisco Jerez
Noticed the problem by inspection while typing in the previous commit. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f6a9ec1..57a3494 100644 --- a/src/m

[Mesa-dev] [PATCH 48/57] i965/vec4: Change opt_vector_float to keep track of the last offset seen in bytes.

2016-09-07 Thread Francisco Jerez
This simplifies things slightly and makes the pass more correct in presence of sub-GRF offsets. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp inde

[Mesa-dev] [PATCH 31/57] i965/fs: Fix signedness of the return value of fs_inst::size_read().

2016-09-07 Thread Francisco Jerez
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- src/mesa/drivers/dri/i965/brw_ir_fs.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index e033a65..c9d3d7d 100644 --- a/src/mesa/drivers/dri/i965/

[Mesa-dev] [PATCH 39/57] i965/fs: Move region_contained_in to the IR header and fix for non-VGRF files.

2016-09-07 Thread Francisco Jerez
Also changed the argument names since 'src' and 'dst' don't make that much sense outside of the context of copy propagation. --- src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 14 -- src/mesa/drivers/dri/i965/brw_ir_fs.h | 13 + 2 files changed, 13

[Mesa-dev] [PATCH 30/57] i965/fs: Switch mask_relative_to() used in compute-to-mrf to byte units.

2016-09-07 Thread Francisco Jerez
This makes the helper function less annoying to use and somewhat more accurate. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 703

[Mesa-dev] [PATCH 08/57] i965/vec4: Replace vec4_instruction::regs_written with ::size_written field in bytes.

2016-09-07 Thread Francisco Jerez
The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previ
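The rewrite rule quoted above can be expressed as a pair of helpers. `Inst`, `regs_written()` and `set_regs_written()` are hypothetical names used only to show the arithmetic; `reg_unit` is kept as a parameter.

```cpp
#include <cassert>

constexpr unsigned div_round_up(unsigned n, unsigned d) { return (n + d - 1) / d; }

// Hypothetical instruction carrying only the new byte-based field.
struct Inst { unsigned size_written; };

// Rvalue uses: 'x = i.regs_written' becomes this.
unsigned regs_written(const Inst &i, unsigned reg_unit)
{
   return div_round_up(i.size_written, reg_unit);
}

// Lvalue uses: 'i.regs_written = x' becomes this.
void set_regs_written(Inst &i, unsigned x, unsigned reg_unit)
{
   i.size_written = x * reg_unit;
}
```

Rounding up means a partially written register still counts as a whole one, so a 48-byte write reports two 32-byte registers, matching the old register-granular view.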

[Mesa-dev] [PATCH 26/57] i965/vec4: Port regions_overlap() to the vec4 IR.

2016-09-07 Thread Francisco Jerez
This is copy-pasted almost line by line from the FS back-end. The only reason it cannot be implemented in terms of backend_reg is that the backend_reg::nr field doesn't have the same meaning for uniforms on both back-ends. It could be easily deduplicated by using a template function. --- src/mes
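One way the suggested template deduplication might look, as a sketch only. It assumes, for illustration, that an FS uniform `nr` counts 4-byte scalar slots while a vec4 uniform `nr` counts 16-byte vec4 slots; the types and the `byte_start()` hooks are hypothetical.

```cpp
#include <cassert>

enum class File { VGRF, UNIFORM };

// Hypothetical FS-style register: a UNIFORM 'nr' counts 4-byte scalar slots.
struct FsReg { File file; unsigned nr; unsigned offset; };

// Hypothetical vec4-style register: a UNIFORM 'nr' counts 16-byte vec4 slots.
struct Vec4Reg { File file; unsigned nr; unsigned offset; };

// Per-back-end hooks: byte position of the register within its file.
unsigned byte_start(const FsReg &r)
{
   return r.file == File::UNIFORM ? r.nr * 4 + r.offset : r.offset;
}

unsigned byte_start(const Vec4Reg &r)
{
   return r.file == File::UNIFORM ? r.nr * 16 + r.offset : r.offset;
}

// Shared overlap test. Distinct VGRF numbers name separate allocations and
// never overlap; for uniforms the number is folded into the byte position.
template <typename Reg>
bool regions_overlap(const Reg &r, unsigned dr, const Reg &s, unsigned ds)
{
   if (r.file != s.file)
      return false;
   if (r.file == File::VGRF && r.nr != s.nr)
      return false;
   const unsigned rs = byte_start(r), ss = byte_start(s);
   return !(rs + dr <= ss || ss + ds <= rs);
}
```

Only the hook differs between back-ends, so uniform slot 1 overlaps a 16-byte read of slot 0 in the FS model but not in the vec4 model.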

[Mesa-dev] [PATCH 23/57] i965/fs: Fix LOAD_PAYLOAD handling in register coalesce is_nop_mov().

2016-09-07 Thread Francisco Jerez
is_nop_mov() was broken for LOAD_PAYLOAD instructions in two ways: On the one hand the original destination register offset wasn't being taken into account which would give incorrect results if it was already non-zero, and on the other hand all source registers were being treated as if they had a s

[Mesa-dev] [PATCH 15/57] i965/fs: Handle fixed HW GRF subnr in reg_offset().

2016-09-07 Thread Francisco Jerez
This will be useful later on when we start using reg_offset() on fixed hardware registers. --- src/mesa/drivers/dri/i965/brw_ir_fs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h index 2e5c8e5..4

[Mesa-dev] [PATCH 32/57] i965/fs: Simplify byte_offset().

2016-09-07 Thread Francisco Jerez
In the most common case this can now be implemented as a simple addition because the offset is already encoded as a single scalar value in bytes. --- src/mesa/drivers/dri/i965/brw_ir_fs.h | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw
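A minimal sketch of the idea, assuming a hypothetical `Reg` with two representative files: a VGRF keeps a single byte `offset`, while a fixed hardware register is addressed by `nr` plus a byte `subnr`, so offsets must carry into the register number.

```cpp
#include <cassert>

constexpr unsigned REG_SIZE = 32;

enum class File { VGRF, FIXED_GRF };

// Hypothetical register model, not Mesa's fs_reg.
struct Reg {
   File file;
   unsigned nr;
   unsigned offset; // used for VGRF
   unsigned subnr;  // used for FIXED_GRF
};

Reg byte_offset(Reg reg, unsigned delta)
{
   switch (reg.file) {
   case File::VGRF:
      // Common case: the offset is already a single scalar byte count.
      reg.offset += delta;
      break;
   case File::FIXED_GRF: {
      // Carry whole registers into 'nr'; keep the remainder in 'subnr'.
      const unsigned suboffset = reg.subnr + delta;
      reg.nr += suboffset / REG_SIZE;
      reg.subnr = suboffset % REG_SIZE;
      break;
   }
   }
   return reg;
}
```

Adding 16 bytes to a fixed GRF at `nr` 2, `subnr` 24 moves it to `nr` 3, `subnr` 8, whereas a VGRF just gets a larger offset.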

[Mesa-dev] [PATCH 06/57] i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ::regs_written.

2016-09-07 Thread Francisco Jerez
This is in preparation for dropping vec4_instruction::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpe

[Mesa-dev] [PATCH 46/57] i965/ir: Update several stale comments.

2016-09-07 Thread Francisco Jerez
--- src/mesa/drivers/dri/i965/brw_defines.h| 2 +- src/mesa/drivers/dri/i965/brw_fs.cpp | 18 +++--- src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp| 12 ++-- .../drivers/dri/i965/brw_schedule_instructions.cpp | 8 src/me

[Mesa-dev] [PATCH 56/57] i965/vec4: Assert that ATTR regions are register-aligned.

2016-09-07 Thread Francisco Jerez
It might be useful to actually handle this once copy propagation becomes smarter about register-misaligned offsets. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index

[Mesa-dev] [PATCH 28/57] i965/fs: Take into account copy register offset during compute-to-mrf.

2016-09-07 Thread Francisco Jerez
This was dropping 'inst->dst.offset' on the floor. Nothing in the code above seems to guarantee that it's zero, and if it isn't, the offset of the register being coalesced into wouldn't be taken into account while rewriting the generating instruction. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +

[Mesa-dev] [PATCH 53/57] i965/vec4: Compare full register offsets in cmod propagation.

2016-09-07 Thread Francisco Jerez
Cmod propagation would misoptimize the program if the destination offset of the generating instruction wasn't exactly the same as the source region offset of the copy instruction. In preparation for adding support for sub-GRF offsets to the VEC4 IR. --- src/mesa/drivers/dri/i965/brw_vec4_cmod_pro

[Mesa-dev] [PATCH 44/57] i965/vec4: Print src/dst_reg::offset field consistently for all register files.

2016-09-07 Thread Francisco Jerez
C.f. 'i965/fs: Print fs_reg::offset field consistently for all register files.'. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp in

[Mesa-dev] [PATCH 24/57] i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().

2016-09-07 Thread Francisco Jerez
fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap c

[Mesa-dev] [PATCH 51/57] i965/vec4: Don't coalesce registers with overlapping writes not matching the MOV source.

2016-09-07 Thread Francisco Jerez
In preparation for adding support for sub-GRF offsets to the VEC4 IR. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index d9dbc4c..8f8d262 10

[Mesa-dev] [PATCH 49/57] i965/vec4: Check that the write offsets match when setting dependency controls.

2016-09-07 Thread Francisco Jerez
For simplicity just assume that two writes to the same GRF with different sub-GRF offsets will potentially interfere and break the dependency control chain. This is in preparation for adding sub-GRF offset support to the VEC4 IR. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 ++ 1 file changed,

[Mesa-dev] [PATCH 29/57] i965/fs: Fix bogus sub-MRF offset calculation in compute-to-mrf.

2016-09-07 Thread Francisco Jerez
The 'scan_inst->dst.offset % REG_SIZE' term in the final 'scan_inst->dst.offset' calculation is obviously bogus. The offset from the start of the copy destination register 'inst->dst' where the destination of the generating instruction 'scan_inst' would be written to (before compute-to-mrf runs) i
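Because the message is truncated, the following is only a sketch of the rebasing arithmetic under an assumption: the copy reads `inst->src[0]` and writes `inst->dst`, and the generating write should land at the same relative position within the copy destination. The function name is hypothetical.

```cpp
#include <cassert>

// Rebase a write at 'scan_dst_offset' inside the region read by a copy
// (whose source starts at 'copy_src_offset') onto the copy's destination
// (starting at 'copy_dst_offset'). All values are in bytes.
// Assumes the generating write does not start before the copied region.
unsigned rebase_offset(unsigned scan_dst_offset, unsigned copy_src_offset,
                       unsigned copy_dst_offset)
{
   assert(scan_dst_offset >= copy_src_offset);
   return copy_dst_offset + (scan_dst_offset - copy_src_offset);
}
```

No `% REG_SIZE` term appears: the relative distance already carries any sub-register part. When the generating write starts exactly at the copy's source offset, the result is simply the copy's destination offset.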

[Mesa-dev] [PATCH 54/57] i965/vec4: Fix copy propagation for non-register-aligned regions.

2016-09-07 Thread Francisco Jerez
This prevents it from trying to propagate a copy through a register-misaligned region. MOV instructions with a misaligned destination shouldn't be treated as a direct GRF copy, because they only define the destination GRFs partially. Also fix the interference check implemented with is_channel_upd

[Mesa-dev] [PATCH 07/57] i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.

2016-09-07 Thread Francisco Jerez
The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previ

[Mesa-dev] [PATCH 19/57] i965/fs: Don't consider LOAD_PAYLOAD with sub-GRF offset to behave like a raw copy.

2016-09-07 Thread Francisco Jerez
This was likely the original intention, and at least register coalesce relies on it. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 51c0d55..f6a9ec1 1006

[Mesa-dev] [PATCH 36/57] i965/fs: Simplify a bunch of fs_inst::size_written calculations by using component_size().

2016-09-07 Thread Francisco Jerez
Using component_size() is easier and generally more correct because it takes into account the register type and stride for you. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 28 -- .../drivers/dri/i965/brw_f

[Mesa-dev] [PATCH 52/57] i965/vec4: Assign correct destination offset to rewritten instruction in register coalesce.

2016-09-07 Thread Francisco Jerez
Because the pass already checks that the destination offset of each 'scan_inst' that needs to be rewritten matches 'inst->src[0].offset' exactly, the final offset of the rewritten instruction is just the original destination offset of the copy. This is in preparation for adding support for sub-GRF

Re: [Mesa-dev] [PATCH] i965/fs: Fail the shader compile instead of asserting when we can't spill

2016-09-08 Thread Francisco Jerez
Jason Ekstrand writes: > Blorp doesn't handle spilling so we set allow_spilling to false in that > case. The blorp 16x MSAA resolve shader spills in 16-wide but not 8-wide. > This commit makes it so that we fail the 16-wide compile and successfully > fall back to 8-wide instead of just assert-fa

Re: [Mesa-dev] [PATCH] i965/fs: Fail the shader compile instead of asserting when we can't spill

2016-09-08 Thread Francisco Jerez
Jason Ekstrand writes: > On Thu, Sep 8, 2016 at 2:39 PM, Francisco Jerez > wrote: > >> Jason Ekstrand writes: >> >> > Blorp doesn't handle spilling so we set allow_spilling to false in that >> > case. The blorp 16x MSAA resolve shader spills in

Re: [Mesa-dev] [PATCH 01/57] i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.

2016-09-08 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: > (...) >> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp >> b/src/mesa/drivers/dri/i965/brw_shader.cpp >> index ea39252..29435f6 100644 >> --- a/src/mesa/drivers/dri/i965/br

Re: [Mesa-dev] [PATCH 02/57] i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset expressed in bytes.

2016-09-08 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: > (...) >>  src/mesa/drivers/dri/i965/brw_ir_vec4.h|  4 +- >>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 61 >> -- >>  .../drivers/dri/i965/brw_

Re: [Mesa-dev] [PATCH 06/57] i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ::regs_written.

2016-09-08 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This is in preparation for dropping vec4_instruction::regs_read and >> ::regs_written in favor of more accurate alternatives expressed in >> byte units.  The main reason these wrappers

Re: [Mesa-dev] [PATCH 07/57] i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.

2016-09-08 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: > (...) >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> index 12ab7b3..a678351 100644 >> --- a/src/mesa/drivers

Re: [Mesa-dev] [PATCH 11/57] i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte units.

2016-09-08 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> The previous regs_read value can be recovered by rewriting each >> reference of regs_read() like 'x = i.regs_read(j)' to 'x = >> DIV_ROUND_UP(i.size_read(j), reg_unit)'

Re: [Mesa-dev] [PATCH 00/57] i965/ir: Switch representation of register offsets and sizes to byte units.

2016-09-08 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This series reworks the representation of register region offsets in >> the i965 IR to be universally byte-based instead of the rather >> awkward >> split between reg_offset and subreg_of

Re: [Mesa-dev] [RFT PATCH 2/2] nv20: enable ARB_texture_border_clamp support

2016-09-08 Thread Francisco Jerez
hardware patch is: Acked-by: Francisco Jerez > src/mesa/drivers/dri/nouveau/nv20_context.c | 1 + > src/mesa/drivers/dri/nouveau/nv20_state_tex.c | 29 > ++- > 2 files changed, 29 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/

Re: [Mesa-dev] [PATCH] glsl: Make blend_colordodge compare against 1.0 - FLT_EPSILON.

2016-09-08 Thread Francisco Jerez
Alejandro Piñeiro writes: > On 02/09/16 23:13, Kenneth Graunke wrote: >> On Friday, August 26, 2016 10:49:18 PM PDT Kenneth Graunke wrote: >>> This fixes a numerical precision issue that was causing two CTS >>> failures: >>> >>> ES31-CTS.blend_equation_advanced.blend_specific.GL_COLORBURN_KHR >>>

Re: [Mesa-dev] [PATCH 25/57] i965/fs: Stop using fs_reg::in_range() in favor of regions_overlap().

2016-09-09 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> Its only use left in the FS back-end should be using >> regions_overlap() >> instead to avoid getting a false negative result in cases where >> source >> and destination overlap

Re: [Mesa-dev] [PATCH 15/57] i965/fs: Handle fixed HW GRF subnr in reg_offset().

2016-09-09 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This will be useful later on when we start using reg_offset() on >> fixed >> hardware registers. >> --- >>  src/mesa/drivers/dri/i965/brw_ir_fs.h | 3 ++- >>  1 fi

Re: [Mesa-dev] [PATCH 16/57] i965/fs: Take into account trailing padding in regs_written() and regs_read().

2016-09-09 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This fixes regs_written() and regs_read() to return a more accurate >> value when the padding left between components due to a stride value >> greater than one causes the region bounds giv

Re: [Mesa-dev] [PATCH 27/57] i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().

2016-09-09 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This makes sure that overlap checks are done correctly throughout the >> back-end when the '*this' register starts before the register/size >> pair provided as argument, and is act

Re: [Mesa-dev] [PATCH 28/57] i965/fs: Take into account copy register offset during compute-to-mrf.

2016-09-09 Thread Francisco Jerez
Iago Toral writes: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This was dropping 'inst->dst.offset' on the floor.  Nothing in the >> code above seems to guarantee that it's zero and in that case the >> offset of the register bein

Re: [Mesa-dev] [PATCH] glsl: Make blend_colordodge compare against 1.0 - FLT_EPSILON.

2016-09-10 Thread Francisco Jerez
Alejandro Piñeiro writes: > On 09/09/16 07:33, Francisco Jerez wrote: >> Alejandro Piñeiro writes: >> >>> On 02/09/16 23:13, Kenneth Graunke wrote: >>>> On Friday, August 26, 2016 10:49:18 PM PDT Kenneth Graunke wrote: >>>>> This fixes a

Re: [Mesa-dev] [PATCH] st/clover: Define __OPENCL_VERSION__ on the device side

2016-09-10 Thread Francisco Jerez
Niels Ole Salscheider writes: > On Wednesday, 31 August 2016, 15:53:05 CEST, Serge Martin wrote: >> On Wednesday 31 August 2016 12:39:23 Vedran Miletić wrote: >> > On 08/28/2016 04:42 PM, Niels Ole Salscheider wrote: >> > > This is required by the OpenCL standard. >> > > >> > > Signed-off-by: Ni

Re: [Mesa-dev] [PATCH 00/57] i965/ir: Switch representation of register offsets and sizes to byte units.

2016-09-12 Thread Francisco Jerez
Iago Toral writes: > On Fri, 2016-09-09 at 11:37 +0200, Iago Toral wrote: >> On Thu, 2016-09-08 at 11:36 +0200, Iago Toral wrote: >> > >> > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> > > >> > > >> > > Th

Re: [Mesa-dev] [PATCH 00/57] i965/ir: Switch representation of register offsets and sizes to byte units.

2016-09-12 Thread Francisco Jerez
Francisco Jerez writes: > Iago Toral writes: > >> On Fri, 2016-09-09 at 11:37 +0200, Iago Toral wrote: >>> On Thu, 2016-09-08 at 11:36 +0200, Iago Toral wrote: >>> > >>> > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >>> > &

Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg

2016-09-12 Thread Francisco Jerez
Iago Toral Quiroga writes: > We will use this in cases where we want to force the vstride of a src_reg > to 0 to exploit a particular behavior of the hardware. It will come in > handy to implement access to components Z/W. > --- > src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 + > src/mesa/drivers

Re: [Mesa-dev] [PATCH 53/95] i965/disasm: fix subreg for dst in Align16 mode

2016-09-12 Thread Francisco Jerez
brw_inst_dst_da16_subreg_nr(devinfo, inst) / > reg_type_size[brw_inst_dst_reg_type(devinfo, inst)]); If brw_inst_dst_da16_subreg_nr(devinfo, inst) is guaranteed to be one, isn't this equivalent to '16 / reg_type_size[brw_inst_dst_reg_type(devinfo, inst)]'? With that fixed

Re: [Mesa-dev] [PATCH 62/95] i965/vec4: Add a shuffle_64bit_data helper

2016-09-12 Thread Francisco Jerez
Iago Toral Quiroga writes: > SIMD4x2 64bit data is stored in register space like this: > > r0.0:DF x0 y0 z0 w0 > r0.1:DF x1 y1 z1 w1 > > When we need to write data such as this to memory using 32-bit write > messages we need to shuffle it in this fashion: > > r0.0:DF x0 y0 x1 y1 > r0.1:DF z0
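The described permutation can be modeled on a flat array of eight 64-bit lanes, two vertices of xyzw each. The first output half (x0 y0 x1 y1) comes from the quoted description; the second half (z0 w0 z1 w1) is an assumption because the quoted layout is truncated. This is a data-layout sketch, not the vec4 helper itself.

```cpp
#include <array>
#include <cassert>
#include <string>

// Lanes 0-3 hold vertex 0 (x0 y0 z0 w0), lanes 4-7 hold vertex 1 (x1 y1 z1 w1).
template <typename T>
std::array<T, 8> shuffle_64bit(const std::array<T, 8> &in)
{
   // First half: x0 y0 x1 y1 (as described in the message).
   // Second half: z0 w0 z1 w1 (assumed; the quoted layout is cut off).
   return {in[0], in[1], in[4], in[5],
           in[2], in[3], in[6], in[7]};
}
```

Using string labels as lanes makes the permutation easy to check by inspection.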

Re: [Mesa-dev] [PATCH] anv/cmd_buffer: Set the L3 atomic disable mask bit in CHICKEN3 on HSW

2016-09-12 Thread Francisco Jerez
ason Ekstrand > Cc: Lionel Landwerlin > Cc: Francisco Jerez Reviewed-by: Francisco Jerez > --- > src/intel/genxml/gen75.xml | 1 + > src/intel/vulkan/genX_cmd_buffer.c | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/src/intel/genxml/gen75.xml b/src/

[Mesa-dev] [PATCH 1/2] glapi: Move PrimitiveBoundingBox and BlendBarrier definitions into ES3.2 category.

2016-10-18 Thread Francisco Jerez
These two GLES 3.2 entry points were being defined in the category of the ARB_ES3_2_compatibility and KHR_blend_equation_advanced extensions respectively instead of in the ES3.2 category. Defining them in the ES3.2 category makes sure that the gl_procs.py generator emits declarations in the glproc

[Mesa-dev] [PATCH 2/2] Revert "Revert "mapi: export all GLES 3.2 functions in libGLESv2.so""

2016-10-18 Thread Francisco Jerez
This reverts commit 85e9bbc14d93fa7166c9ae075ee7ae29a8313e3f. The previous commit should help with the scons build failure caused by the original commit. --- src/mapi/glapi/gen/static_data.py | 12 1 file changed, 12 insertions(+) diff --git a/src/mapi/glapi/gen/static_data.py b/sr

Re: [Mesa-dev] [PATCH 1/2] glapi: Move PrimitiveBoundingBox and BlendBarrier definitions into ES3.2 category.

2016-10-18 Thread Francisco Jerez
erstand the dispatch generation mess, but apparently the gl_procs.py treats the ES (and GL_OES) categories specially and emits forward declarations for them before the actual table -- Possibly to hack around build failures with GLES entry points not defined in desktop GL headers. > On Tue, Oct 18, 2016

Re: [Mesa-dev] [PATCH] glsl: Skip invariant/precision linker checks for built-in variables.

2016-10-20 Thread Francisco Jerez
Ian Romanick writes: > On 10/19/2016 04:52 PM, Jason Ekstrand wrote: >> On Wed, Oct 19, 2016 at 3:44 PM, Ian Romanick > > wrote: >> >> On 10/19/2016 02:31 PM, Kenneth Graunke wrote: >> > On Wednesday, October 19, 2016 1:40:39 PM PDT Ian Romanick wrote: >>

Re: [Mesa-dev] [PATCH] i965: Skip register write checks if cmd_parser_version >= 2.

2016-10-25 Thread Francisco Jerez
Kenneth Graunke writes: > If the kernel advertises a new enough command parser version, then we > can just assume that register writes will work and not bother executing > commands on the GPU to test it. > > This should speed up context creation. > > From the command parser version history (i915_
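The logic proposed in the quoted patch amounts to a short-circuit before the GPU test. The sketch below assumes the kernel's command parser version has already been queried and that the runtime probe is available as a callable. Both `register_writes_allowed` and `run_gpu_probe` are hypothetical names, not i965 code.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper, not Mesa's code: trust the kernel when it advertises
// command parser version 2 or newer, otherwise fall back to running the
// (comparatively expensive) GPU test that checks whether register writes land.
bool register_writes_allowed(int cmd_parser_version,
                             const std::function<bool()> &run_gpu_probe) {
   if (cmd_parser_version >= 2)
      return true;          // skip the probe: cheaper context creation
   return run_gpu_probe();  // older kernels: test it on the hardware
}
```

The speed-up comes from the first branch. On a new enough kernel, the probe batch is never built or submitted.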

Re: [Mesa-dev] [PATCH] i965: Skip register write checks if cmd_parser_version >= 2.

2016-10-26 Thread Francisco Jerez
Daniel Vetter writes: > On Tue, Oct 25, 2016 at 11:16:56AM -0700, Francisco Jerez wrote: >> Kenneth Graunke writes: >> >> > If the kernel advertises a new enough command parser version, then we >> > can just assume that register writes will work and not both

Re: [Mesa-dev] [PATCH v2.1] i965/gen7: expose OpenGL 4.0 on Haswell

2016-10-26 Thread Francisco Jerez
Iago Toral writes: > On Thu, 2016-10-20 at 16:25 -0700, Ian Romanick wrote: >> On 10/20/2016 12:09 AM, Iago Toral Quiroga wrote: >> > >> > ARB_gpu_shader_fp64 was the last piece missing. Notice that some >> > hardware and kernel combinations do not support pipelined register >> > writes, which a

Re: [Mesa-dev] [PATCH 1/3] i965/vec4: add a byte_offset helper

2016-10-26 Thread Francisco Jerez
assert(bytes == 0); > + } > +} > + I suggest you wrap this helper in 'namespace detail { ... }' to make clear that this is an implementation detail of the byte_offset() functions below not intended to be used independently. With that changed: Reviewed-by: Francisco Jerez
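The review suggestion above is an ordinary C++ idiom. The sketch uses a toy register made of a register number plus a byte offset and assumes 32-byte registers. `toy_reg` and `add_byte_offset` are hypothetical names; the point is only that the public entry point is `byte_offset()`, while the helper sits in `namespace detail` to mark it as internal.

```cpp
#include <cassert>

// Toy register (assumption): a register number plus a byte offset into it.
struct toy_reg {
   unsigned nr;
   unsigned offset;
};

namespace detail {
   // Implementation detail of byte_offset() below; not meant to be called
   // on its own. Carries whole registers into nr, assuming 32-byte registers.
   inline void add_byte_offset(unsigned &nr, unsigned &offset, unsigned bytes)
   {
      const unsigned reg_size = 32;
      offset += bytes;
      nr += offset / reg_size;
      offset %= reg_size;
   }
}

// Public interface: callers only ever use this.
inline toy_reg byte_offset(toy_reg reg, unsigned bytes)
{
   detail::add_byte_offset(reg.nr, reg.offset, bytes);
   return reg;
}
```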

Re: [Mesa-dev] [PATCH 3/3] i965/vec4: make offset() work in terms of a simd width and scalar components

2016-10-26 Thread Francisco Jerez
delta); | } Note that this also makes sure that we step by a whole multiple of a vec4. With that changed [and the other offset function below] patch is: Reviewed-by: Francisco Jerez > > @@ -176,11 +178,13 @@ byte_offset(dst_reg reg, unsigned bytes) > } > > static inline dst_

Re: [Mesa-dev] [PATCH 2/3] i965/vec4: use byte_offset() instead of offset()

2016-10-26 Thread Francisco Jerez
ore(block, copy); These last two hunks seem rather suspect... They're emitting a MOV instruction for each register written even though you'd need one for each SIMD-wide vector written. Seems like the loop upper bound will need to be changed to 'inst->size_writ

Re: [Mesa-dev] [PATCH] glsl: Improve accuracy of alpha scaling in advanced blend lowering.

2016-10-28 Thread Francisco Jerez
swizzle(src, SWIZZLE_, > 3)), > + imm3(1), > + div(swizzle_xyz(src), > src_alpha); > > ir_variable *factor = f.make_temp(glsl_type::vec3_type, &
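The visible hunk divides the colour by `src_alpha` (`div(swizzle_xyz(src), src_alpha)`). The fragments around it suggest, though the excerpt is too truncated to confirm, that this replaces a multiplication by a precomputed reciprocal. A plain C++ analogue of the division form follows. `unpremultiply` is a hypothetical name, and the handling of `alpha == 0` is not visible in the excerpt, so the sketch simply requires a non-zero alpha.

```cpp
#include <array>
#include <cassert>

using vec3 = std::array<float, 3>;

// Sketch: recover straight (non-premultiplied) colour by dividing each
// channel by alpha. A single IEEE division rounds once, whereas computing
// 1/alpha first and then multiplying rounds twice.
vec3 unpremultiply(const vec3 &rgb, float alpha) {
   assert(alpha != 0.0f); // alpha == 0 handling is not shown in the thread
   return vec3{rgb[0] / alpha, rgb[1] / alpha, rgb[2] / alpha};
}
```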

Re: [Mesa-dev] [PATCH v4] clover: Pass unquoted compiler arguments to Clang

2016-10-30 Thread Francisco Jerez
Vedran Miletić writes: > OpenCL apps can quote arguments they pass to the OpenCL compiler, most > commonly include paths containing spaces. > > If the Clang OpenCL compiler was called via a shell, the shell would > split the arguments with respect to quotes and then remove quotes > before pass
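The shell behaviour described in the quoted patch can be sketched as a small tokenizer. This is a simplified illustration, not clover's implementation. It splits on unquoted whitespace and strips single or double quotes, and it deliberately ignores backslash escapes and other shell features. `split_quoted_args` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tokenizer: split on unquoted whitespace and drop the quote
// characters, roughly what a POSIX shell does for simple cases (no escapes).
std::vector<std::string> split_quoted_args(const std::string &s) {
   std::vector<std::string> args;
   std::string cur;
   bool in_token = false;
   char quote = 0; // currently open quote character, or 0 if none

   for (char c : s) {
      if (quote) {
         if (c == quote)
            quote = 0;     // closing quote: drop it
         else
            cur += c;      // whitespace inside quotes is kept
      } else if (c == '"' || c == '\'') {
         quote = c;        // opening quote: drop it, but start a token
         in_token = true;
      } else if (c == ' ' || c == '\t' || c == '\n') {
         if (in_token) {
            args.push_back(cur);
            cur.clear();
            in_token = false;
         }
      } else {
         cur += c;
         in_token = true;
      }
   }
   if (in_token)
      args.push_back(cur);
   return args;
}
```

For example, `-I "/path with spaces" -DFOO` yields three arguments, the middle one without quotes. This is the form Clang expects when it is not invoked through a shell.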

Re: [Mesa-dev] [PATCH v2] clover: Allow OpenCL version override

2016-10-30 Thread Francisco Jerez
Vedran Miletić writes: > CLOVER_CL_VERSION_OVERRIDE allows overriding default OpenCL version > supported by Clover, analogous to MESA_GL_VERSION_OVERRIDE for OpenGL. > CLOVER_CL_C_VERSION_OVERRIDE allows overriding default OpenCL C version. > > v2: > - move version getters to version.hpp, simpli
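An override variable of this kind is usually read with `std::getenv` and parsed as `MAJOR.MINOR`. The sketch below is hypothetical and not clover's `version.hpp`. It takes the variable's value as a parameter so it can be tested without touching the environment, and it falls back to a caller-supplied default when the value is missing or malformed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical parser for a "MAJOR.MINOR" override such as the value of
// CLOVER_CL_VERSION_OVERRIDE. Returns the fallback when the variable is
// unset (nullptr) or does not parse as two unsigned numbers.
std::string resolve_cl_version(const char *override_value,
                               const std::string &fallback) {
   unsigned major = 0, minor = 0;
   if (override_value &&
       std::sscanf(override_value, "%u.%u", &major, &minor) == 2)
      return std::to_string(major) + "." + std::to_string(minor);
   return fallback;
}
```

A caller would pass `std::getenv("CLOVER_CL_VERSION_OVERRIDE")`, which returns a null pointer when the variable is unset.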

Re: [Mesa-dev] [PATCH] clover: clGetExtensionFunctionAddressForPlatform

2016-10-30 Thread Francisco Jerez
Serge Martin writes: > On Saturday 01 October 2016 19:03:11 Serge Martin wrote: >> On Sunday 27 September 2015 11:15:14 Serge Martin wrote: >> > add clGetExtensionFunctionAddressForPlatform (CL 1.2) >> >> ping (one year reminder :p ) > > CC curro > R-b and pushed. >> >> > --- >> > >> > src/

Re: [Mesa-dev] [PATCH v6] clover: Introduce CLOVER_EXTRA_*_OPTIONS environment variables

2016-10-30 Thread Francisco Jerez
Vedran Miletić writes: > The options specified in the CLOVER_EXTRA_BUILD_OPTIONS shell > variable are appended to the options specified by the OpenCL program > in the clBuildProgram function call, if any. > Analogously, the options specified in the CLOVER_EXTRA_COMPILE_OPTIONS > and CLOVER_EXTRA_
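The quoted semantics ("appended to the options specified by the OpenCL program ... if any") can be sketched as plain string concatenation. `append_extra_options` is a hypothetical helper. Separating with a single space is an assumption about how the two option strings are joined.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the value of an extra-options environment
// variable (e.g. CLOVER_EXTRA_BUILD_OPTIONS) to whatever options the
// application passed to clBuildProgram, if any.
std::string append_extra_options(const std::string &app_options,
                                 const char *extra) {
   if (extra == nullptr || *extra == '\0')
      return app_options;          // variable unset or empty: nothing to add
   if (app_options.empty())
      return std::string(extra);   // the application passed no options
   return app_options + " " + extra;
}
```

Appending rather than prepending means that, for options where the last occurrence wins, the user's environment setting takes precedence over the application's.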

Re: [Mesa-dev] [PATCH v2 2/2] clover: add missing clGetDeviceInfo CL1.2 queries

2016-10-30 Thread Francisco Jerez
However, clover is not ready yet to support it > + buf.as_scalar() = 0 /* 1024 */; > + break; > + AFAIUI /dev/null is a valid implementation of the printf built-in as far as the CL spec is concerned, so I'd go ahead and report 1 MB of buffer space already as the

Re: [Mesa-dev] [PATCH v2] clover: add GetKernelArgInfo (CL 1.2)

2016-10-30 Thread Francisco Jerez
Serge Martin writes: > --- > src/gallium/state_trackers/clover/api/kernel.cpp | 47 -- > src/gallium/state_trackers/clover/core/kernel.cpp | 6 +++ > src/gallium/state_trackers/clover/core/kernel.hpp | 1 + > src/gallium/state_trackers/clover/core/module.hpp | 19 +-- >

[Mesa-dev] [PATCHv3 1/2] clover: Add CL_PROGRAM_BINARY_TYPE support (CL1.2).

2016-10-30 Thread Francisco Jerez
From: Serge Martin v3 [Francisco Jerez]: Loosely based on Serge's v1 of this patch in order to avoid CL-specific enums in the clover module binary format. In addition to other changes made in v2: Represent the CL program binary type as the section type instead of adding a CL

[Mesa-dev] [PATCH] nir: Flip gl_SamplePosition in nir_lower_wpos_ytransform().

2016-11-01 Thread Francisco Jerez
Assuming the hardware is set up to use a screen coordinate system flipped vertically with respect to the GL's window coordinate system, the SYSTEM_VALUE_SAMPLE_POS vector will also be flipped vertically with respect to the value expected by the GL, so we need to give it the same treatment as gl_Fra
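Because `gl_SamplePosition` holds a sub-pixel position in the range 0.0 to 1.0, a vertical flip of the coordinate system maps `y` to `1 - y` and leaves `x` unchanged. The sketch below shows only that arithmetic. The `flipped` flag is a hypothetical stand-in for whatever state tells the NIR pass that the hardware's y axis is inverted.

```cpp
#include <cassert>

struct sample_pos {
   float x, y;
};

// Sketch: convert a hardware sample position to the value GL expects.
// 'flipped' is an assumed flag meaning "hardware y axis is inverted".
sample_pos to_gl_sample_pos(sample_pos hw, bool flipped) {
   if (!flipped)
      return hw;
   return sample_pos{hw.x, 1.0f - hw.y};
}
```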

Re: [Mesa-dev] [PATCH 3/7] glsl/lower_if: don't lower branches touching tess control outputs

2016-11-03 Thread Francisco Jerez
Ian Romanick writes: > On 10/28/2016 04:13 PM, Marek Olšák wrote: >> From: Marek Olšák >> >> --- >> src/compiler/glsl/ir_optimization.h | 3 ++- >> src/compiler/glsl/lower_if_to_cond_assign.cpp | 23 --- >> src/compiler/glsl/test_optpass.cpp| 2 +- >>

Re: [Mesa-dev] [PATCH 2/2] i965: Advertise 8 subpixel bits always.

2016-11-07 Thread Francisco Jerez
tests use the GL_SUBPIXEL_BITS value to determine the error tolerance so increasing the value could potentially uncover additional approximation errors. Doesn't seem to cause any regressions though in our CI system, series is: Reviewed-by: Francisco Jerez > Signed-off-by: Chris For

Re: [Mesa-dev] [PATCH 2/2] i965: Advertise 8 subpixel bits always.

2016-11-07 Thread Francisco Jerez
k we fall back to swrast for anything that would be sensitive to the subpixel precision, but it's definitely worth checking. :) > - Chris > > On Tue, Nov 8, 2016 at 11:01 AM, Francisco Jerez > wrote: > >> Chris Forbes writes: >> >> > The mesa default is 4, bu
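To see why the reported value matters for test tolerances, note that `N` subpixel bits correspond to a snapping grid of `1 / 2^N` pixel. Going from the Mesa default of 4 to 8 therefore shrinks the grid step from 1/16 to 1/256 of a pixel. The thread does not show how any particular test turns `GL_SUBPIXEL_BITS` into a tolerance, so this hypothetical helper only computes the grid spacing.

```cpp
#include <cassert>

// Sketch: size of the subpixel grid step, in pixels, for a given
// GL_SUBPIXEL_BITS value.
double subpixel_step(unsigned subpixel_bits) {
   assert(subpixel_bits < 31); // keep the shift well defined
   return 1.0 / static_cast<double>(1u << subpixel_bits);
}
```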

Re: [Mesa-dev] [PATCH] ralloc: don't use ralloc_set_destructor() for linear allocations

2016-11-09 Thread Francisco Jerez
Brian Paul writes: > With older gcc versions and MSVC we were using _ralloc_destructor() in > with the linear allocator. That led to a failed canary assertion. > > This patch prevents _ralloc_destructor() from being used in those cases. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=

Re: [Mesa-dev] [PATCH] ralloc: don't use ralloc_set_destructor() for linear allocations

2016-11-09 Thread Francisco Jerez
Francisco Jerez writes: > Brian Paul writes: > >> With older gcc versions and MSVC we were using _ralloc_destructor() in >> with the linear allocator. That led to a failed canary assertion. >> >> This patch prevents _ralloc_destructor() from being used in thos

Re: [Mesa-dev] [PATCH] util: Fix Clang trivial destructor check.

2016-11-14 Thread Francisco Jerez
is trivially destructible.") > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98526 > Signed-off-by: Vinson Lee Reviewed-by: Francisco Jerez > --- > src/util/macros.h | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/src/ut
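The question a trivial-destructor check answers can be expressed in standard C++11 with `std::is_trivially_destructible`. The thread does not show the macro's compiler-specific definition in `src/util/macros.h`. The types and the `can_skip_destructor` wrapper below are hypothetical and serve only to illustrate the trait. It connects to the ralloc discussion above: memory freed in bulk without running destructors is only safe for types for which the trait is true.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// A plain aggregate: destroying it runs no code.
struct plain_node {
   int value;
   plain_node *next;
};

// Owns a std::string, so its (implicit) destructor must run.
struct owning_node {
   std::string name;
};

// Standard C++11 form of the "is this trivially destructible?" question.
template <typename T>
constexpr bool can_skip_destructor() {
   return std::is_trivially_destructible<T>::value;
}

static_assert(can_skip_destructor<plain_node>(), "aggregate needs no destructor");
static_assert(!can_skip_destructor<owning_node>(), "std::string member needs one");
```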
