On Sat, Oct 19, 2013 at 3:05 PM, Paul Berry <stereotype...@gmail.com> wrote: > On 14 October 2013 10:12, Anuj Phogat <anuj.pho...@gmail.com> wrote: >> >> Implement the FS backend for new builtins added by the extension: >> in vec2 gl_SamplePosition >> in int gl_SampleID >> in int gl_NumSamples >> out int gl_SampleMask[] > > > There is a lot going on in this one patch, and it's getting hard to follow > all the patch review that's going on. If you wind up re-spinning this patch > series, can you please break it into four separate patches, one to add > support for each builtin? > Yes, I'll split it in to four patches. >> >> >> Signed-off-by: Anuj Phogat <anuj.pho...@gmail.com> >> --- >> src/mesa/drivers/dri/i965/brw_fs.cpp | 109 >> +++++++++++++++++++++++++++ >> src/mesa/drivers/dri/i965/brw_fs.h | 4 + >> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 23 ++++++ >> src/mesa/drivers/dri/i965/brw_wm.h | 1 + >> 4 files changed, 137 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index e5d6e4b..e4f7745 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -1115,6 +1115,109 @@ >> fs_visitor::emit_frontfacing_interpolation(ir_variable *ir) >> return reg; >> } >> >> +void >> +fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos) >> +{ >> + int num_samples = ctx->DrawBuffer->Visual.samples; > > > This isn't safe. When compilation depends on a piece of GL state, you need > to include the state in brw_wm_prog_key. Otherwise the program won't get > recompiled when the value changes. > > To avoid unnecessary recompiles, I'd recommend adding a boolean to > brw_wm_prog_key which is: > - true if ctx->Multisample.Enabled && num_samples != 0 && (shader accesses > gl_SamplePosition) > - false otherwise > yes, this make sense. >> >> + assert(num_samples >= 0 && num_samples <= 8); >> + >> + /* From arb_sample_shading specification: >> + * "When rendering to a non-multisample buffer, or if multisample >> + * rasterization is disabled, gl_SamplePosition will always be >> + * (0.5, 0.5). >> + */ >> + if (!ctx->Multisample.Enabled || num_samples == 0) { >> + emit(BRW_OPCODE_MOV, dst, fs_reg(0.5f)); > > > It looks like you're using the old, more verbose style of emitting > instructions. Can we convert this (and the other instructions in this > patch) to the more compact style: > > emit(MOV(dst, fs_reg(0.5f))); > ok. >> >> + } >> + else { >> + /* For num_samples = {4, 8} */ >> + emit(BRW_OPCODE_MOV, dst, int_sample_pos); >> + emit(BRW_OPCODE_MUL, dst, dst, fs_reg(1 / 16.0f)); > > > Like Ken, I was confused as to why we needed a separate MOV before the MUL. > The assertion Ken recommended would have been a useful clue, but I'd prefer > to just have explicit comments explaining why we need the MOV. How about > this: > > /* Convert int_sample_pos to floating point */ > emit(BRW_OPCODE_MOV, dst, int_sample_pos); > /* Scale to the range [0, 1] */ > emit(BRW_OPCODE_MUL, dst, dst, fs_reg(1 / 16.0f)); > >> >> + } >> +} >> + >> +fs_reg * >> +fs_visitor::emit_samplepos_interpolation(ir_variable *ir) >> +{ >> + assert(brw->gen >= 6); >> + >> + this->current_annotation = "compute sample position"; >> + fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type); > > > Since this code assigns to two consecutive registers, it relies on the fact > that ir->type is vec2. Just to make that explicit, can we add: > > assert(ir->type == glsl_type::vec2_type); > >> >> + fs_reg pos = *reg; >> + fs_reg int_sample_x = fs_reg(this, glsl_type::int_type); >> + fs_reg int_sample_y = fs_reg(this, glsl_type::int_type); >> + >> + /* WM will be run in MSDISPMODE_PERSAMPLE. So, only SIMD8 mode will be >> + * enabled. The X, Y sample positions come in as bytes in thread >> payload. >> + * Sample IDs and sample positions remain same for all four slots in a >> + * subspan. So, read the positions using vstride=2, width=4, >> hstride=0. > > > I have similar concerns to Ken about why MSDISPMODE_PERSAMPLE implies that > only SIMD8 mode will be enabled. I'm assuming the two of you have > adequately resolved that. > I was facing few problems with enabling 'SIMD16 only' dispatch. So, we agreed to enable 'SIMD8 only' dispatch in this series. I'll do SIMD16 in a followup patch. I'll update this comment with correct information. >> >> + */ >> + emit(BRW_OPCODE_AND, int_sample_x, >> + fs_reg(stride(retype(brw_vec1_grf(c->sample_pos_reg, 0), >> + BRW_REGISTER_TYPE_D), 2, 4, 0)), >> + fs_reg(brw_imm_d(0xff))); > > > If I understand correctly, this is creating the instruction > > AND(8) int_sample_x<1>:d sample_pos_reg<2;4,0>:d 0x000000ff:d > > I think this works, but it would be more straightforward to do this, which > just selects the X coordinate bytes from the sample_pos_reg register: > > MOV(8) int_sample_x<1>:d sample_pos_reg<16;8,2>:b > > That would have the advantage that it doesn't rely on the fact that sample > IDs and sample positions are the same for all four slots in a subspan. > > (Note: there are some weird restrictions on byte operations, and I can't > remember all the details. So you might need to try this in the simulator > and tweak it to get it to work.) > yes, this looks better. I didn't think about using a byte operation. Thanks. >> >> + >> + /* Compute gl_SamplePosition.x */ >> + compute_sample_position(pos, int_sample_x); >> + pos.reg_offset++; > > > I think if we change this to: > > pos.reg_offset += dispatch_width / 8; > > then the code will work correctly for both SIMD8 and SIMD16 execution modes. > right. >> >> + >> + emit(BRW_OPCODE_SHR, int_sample_y, >> + fs_reg(stride(retype(brw_vec1_grf(c->sample_pos_reg, 0), >> + BRW_REGISTER_TYPE_D), 2, 4, 0)), >> + fs_reg(8)); >> + emit(BRW_OPCODE_AND, int_sample_y, int_sample_y, >> fs_reg(brw_imm_d(0xff))); > > > Similarly, these two instructions could be replaced with simply: > > MOV(8) int_sample_y<1>:d sample_pos_reg.1<16;8,2>:b > > Which just selects the Y coordintae bytes from the sample_pos_register. > >> >> + >> + /* Compute gl_SamplePosition.y */ >> + compute_sample_position(pos, int_sample_y); >> + return reg; >> +} >> + >> +fs_reg * >> +fs_visitor::emit_sampleid_interpolation(ir_variable *ir) >> +{ >> + assert(brw->gen >= 6); >> + bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1; > > > As I mentioned in my comments on compute_sample_position(), it's not safe to > consult the context like this. This information needs to go in the program > key. > >> >> + >> + this->current_annotation = "compute sample id"; >> + fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type); >> + >> + if (multisampled_fbo && ctx->Multisample.Enabled) { > > > This too. > >> >> + fs_reg t1 = fs_reg(this, glsl_type::int_type); >> + fs_reg t2 = fs_reg(this, glsl_type::int_type); >> + t2.type = BRW_REGISTER_TYPE_UW; >> + >> + /* The WM will be run in MSDISPMODE_PERSAMPLE with num_samples = 8. >> + * Therefore, subspan 0 will represent sample N (where N is 0, 2, 4 >> + * or 6), subspan 1 will represent sample 1, 3, 5 or 7. We can >> find >> + * the value of N by looking at R0.0 bits 7:6 ("Starting Sample >> Pair >> + * Index (SSPI)") and multiplying by two (since samples are always >> + * delivered in pairs). That is, we compute 2*((R0.0 & 0xc0) >> 6) >> + * == (R0.0 & 0xc0) >> 5. >> + * >> + * Then we need to add N to the sequence (0, 0, 0, 0, 1, 1, 1, 1), >> + * which we compute by populating a temporary variable with the >> + * sequence (0, 1), and then reading from it using vstride=1, >> + * width=4, hstride=0. >> + * Same holds true for num_samples = 4. >> + */ >> + emit(BRW_OPCODE_AND, t1, >> + fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)), >> + fs_reg(brw_imm_d(0xc0))); >> + emit(BRW_OPCODE_SHR, t1, t1, fs_reg(5)); >> + emit(BRW_OPCODE_MOV, t2, fs_reg(brw_imm_v(0x11110000))); >> >> + emit(BRW_OPCODE_ADD, *reg, t1, t2); > > > I believe this works, but it's not what the comment says you're doing. It > looks like you borrowed the comment from blorp, but I think that what you've > actually done is actually a lot simpler (and better) than what blorp does. > Rather than populate a register with (0, 1) and then reading from it using > vstride=1, width=4, hstride=0, you're just populating the register with (0, > 0, 0, 0, 1, 1, 1, 1) directly. Pleas update the comment accordingly. > Yes, last few lines are not applicable here. I'll update the comment.
> For SIMD16 mode, we'll need to do something slightly more complicated, since > we'll need t2 (which will be a pair of registers in SIMD16 mode) to contain > (0, 0, 0, 0, 1, 1, 1, 1) in the first register and (2, 2, 2, 2, 3, 3, 3, 3) > in the second register. I *think* you can do this with something like: > > push_force_uncompressed(); > emit(MOV(t2, fs_reg(brw_imm_v(0x11110000)))); > push_force_sechalf(); > t2.reg_offset++; > emit(MOV(t2, fs_reg(brw_imm_v(0x33332222)))); > pop_force_sechalf(); > pop_force_uncompressed(); > > But please double check in the simulator to make sure it does the right > thing :) > This won't work as it is because t2 is a UW register and it's also a requirement when working with immediate vectors. It looks like what you've done in blorp is the right thing to do here. i.e. populate a register with (0, 1, 2, 3) and then reading from it using vstride=1, width=4, hstride=0. I think I've to create a special FS backend instruction to adjust the stride to (1, 4, 0). >> >> + } >> + else { >> + /* As per GL_ARB_sample_shading specification: >> + * "When rendering to a non-multisample buffer, or if multisample >> + * rasterization is disabled, gl_SampleID will always be zero." >> + */ >> + emit(BRW_OPCODE_MOV, *reg, fs_reg(0)); >> + } >> + >> + return reg; >> +} >> + >> fs_reg >> fs_visitor::fix_math_operand(fs_reg src) >> { >> @@ -2966,7 +3069,13 @@ fs_visitor::setup_payload_gen6() >> c->nr_payload_regs++; >> } >> } >> + >> /* R31: MSAA position offsets. */ >> + if (fp->Base.InputsRead & VARYING_BIT_SAMPLE_POS) { >> + c->sample_pos_reg = c->nr_payload_regs; >> + c->nr_payload_regs++; >> + } >> + >> /* R32-: bary for 32-pixel. */ >> /* R58-59: interp W for 32-pixel. */ >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h >> b/src/mesa/drivers/dri/i965/brw_fs.h >> index c78f9ae..812e9c7 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.h >> +++ b/src/mesa/drivers/dri/i965/brw_fs.h >> @@ -332,9 +332,12 @@ public: >> glsl_interp_qualifier interpolation_mode, >> bool is_centroid); >> fs_reg *emit_frontfacing_interpolation(ir_variable *ir); >> + fs_reg *emit_samplepos_interpolation(ir_variable *ir); >> + fs_reg *emit_sampleid_interpolation(ir_variable *ir); >> fs_reg *emit_general_interpolation(ir_variable *ir); >> void emit_interpolation_setup_gen4(); >> void emit_interpolation_setup_gen6(); >> + void compute_sample_position(fs_reg dst, fs_reg int_sample_pos); >> fs_reg rescale_texcoord(ir_texture *ir, fs_reg coordinate, >> bool is_rect, int sampler, int texunit); >> fs_inst *emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg >> coordinate, >> @@ -432,6 +435,7 @@ public: >> >> struct hash_table *variable_ht; >> fs_reg frag_depth; >> + fs_reg sample_mask; >> fs_reg outputs[BRW_MAX_DRAW_BUFFERS]; >> unsigned output_components[BRW_MAX_DRAW_BUFFERS]; >> fs_reg dual_src_output; >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> index e659203..75ced1d 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> @@ -61,9 +61,14 @@ fs_visitor::visit(ir_variable *ir) >> reg = emit_fragcoord_interpolation(ir); >> } else if (!strcmp(ir->name, "gl_FrontFacing")) { >> reg = emit_frontfacing_interpolation(ir); >> + } else if (!strcmp(ir->name, "gl_SamplePosition")) { >> + reg = emit_samplepos_interpolation(ir); >> + } else if (!strcmp(ir->name, "gl_SampleID")) { >> + reg = emit_sampleid_interpolation(ir); >> } else { >> reg = emit_general_interpolation(ir); >> } >> + >> assert(reg); >> hash_table_insert(this->variable_ht, reg, ir); >> return; >> @@ -82,6 +87,8 @@ fs_visitor::visit(ir_variable *ir) >> } >> } else if (ir->location == FRAG_RESULT_DEPTH) { >> this->frag_depth = *reg; >> + } else if (ir->location == FRAG_RESULT_SAMPLE_MASK) { >> + this->sample_mask = *reg; >> } else { >> /* gl_FragData or a user-defined FS output */ >> assert(ir->location >= FRAG_RESULT_DATA0 && >> @@ -2502,6 +2509,22 @@ fs_visitor::emit_fb_writes() >> pop_force_uncompressed(); >> } >> >> + if (fp->Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK)) >> { >> + this->current_annotation = "FB write oMask"; >> + assert(this->sample_mask.file != BAD_FILE); >> + fs_reg reg = fs_reg(this, glsl_type::int_type); >> + >> + /* Hand over gl_SampleMask. Only lower 16 bits are relevant. */ >> + emit(AND(this->sample_mask, this->sample_mask, fs_reg(255))); >> + emit(SHL(reg, this->sample_mask, fs_reg(16))); >> + >> + this->sample_mask.type = BRW_REGISTER_TYPE_UW; >> >> + reg.type = BRW_REGISTER_TYPE_UW; >> + emit(OR(fs_reg(MRF, nr, BRW_REGISTER_TYPE_UW), this->sample_mask, >> reg)); > > > I agree with Ken that this is going to do the wrong thing. You are filling > up all the data in the oMask with values, but you're not filling it with the > right values. You're delivering: > > sample_mask slot 0 -> oMask slot 0 and 1 > sample_mask slot 1 -> oMask slot 2 and 3 > sample_mask slot 2 -> oMask slot 4 and 5 > sample_mask slot 3 -> oMask slot 6 and 7 > and sample_mask slots 4-7 are getting delivered to oMask slots 8-15, which > have no effect in SIMD8 mode. > > That's definitely not what we want. > Yes, I made a false assumption that all the slots will have a same oMask value. But someone may want to use per fragment sample mask and that's the purpose of putting it in a fragment shader. > I believe that what we need can actually be done in a single instruction. > For SIMD8: > > MOV(8) oMask<1>:uw sample_mask<16;8,2>:uw > > and for SIMD16: > > MOV(16) oMask<1>:uw sample_mask<16;8,2>:uw > > To get that to happen you'll need to create a new FS backend opcode, so that > you can adjust the stride on sample_mask. > Thanks for explaining. I needed exactly this information. >> >> + >> + nr += 1; >> + } >> + >> /* Reserve space for color. It'll be filled in per MRT below. */ >> int color_mrf = nr; >> nr += 4 * reg_width; >> diff --git a/src/mesa/drivers/dri/i965/brw_wm.h >> b/src/mesa/drivers/dri/i965/brw_wm.h >> index aa786de..fda74ab 100644 >> --- a/src/mesa/drivers/dri/i965/brw_wm.h >> +++ b/src/mesa/drivers/dri/i965/brw_wm.h >> @@ -83,6 +83,7 @@ struct brw_wm_compile { >> uint8_t source_w_reg; >> uint8_t aa_dest_stencil_reg; >> uint8_t dest_depth_reg; >> + uint8_t sample_pos_reg; >> uint8_t barycentric_coord_reg[BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT]; >> uint8_t nr_payload_regs; >> GLuint source_depth_to_render_target:1; >> -- >> 1.8.1.4 >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev