On Tue, Oct 15, 2013 at 5:42 PM, Anuj Phogat <anuj.pho...@gmail.com> wrote: > On Tue, Oct 15, 2013 at 1:49 PM, Kenneth Graunke <kenn...@whitecape.org> > wrote: >> On 10/14/2013 10:12 AM, Anuj Phogat wrote: >>> Implement the FS backend for new builtins added by the extension: >>> in vec2 gl_SamplePosition >>> in int gl_SampleID >>> in int gl_NumSamples >>> out int gl_SampleMask[] >>> >>> Signed-off-by: Anuj Phogat <anuj.pho...@gmail.com> >>> --- >>> src/mesa/drivers/dri/i965/brw_fs.cpp | 109 >>> +++++++++++++++++++++++++++ >>> src/mesa/drivers/dri/i965/brw_fs.h | 4 + >>> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 23 ++++++ >>> src/mesa/drivers/dri/i965/brw_wm.h | 1 + >>> 4 files changed, 137 insertions(+) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> index e5d6e4b..e4f7745 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> @@ -1115,6 +1115,109 @@ >>> fs_visitor::emit_frontfacing_interpolation(ir_variable *ir) >>> return reg; >>> } >>> >>> +void >>> +fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos) >>> +{ >>> + int num_samples = ctx->DrawBuffer->Visual.samples; >>> + assert(num_samples >= 0 && num_samples <= 8); >>> + >>> + /* From arb_sample_shading specification: >> >> "From the ARB_sample_shading specification:" >> >> (when I search for extension names in code, I capitalize ARB/EXT/NV, so >> this would make a case-sensitive search work) >> >>> + * "When rendering to a non-multisample buffer, or if multisample >>> + * rasterization is disabled, gl_SamplePosition will always be >>> + * (0.5, 0.5). >>> + */ >>> + if (!ctx->Multisample.Enabled || num_samples == 0) { >>> + emit(BRW_OPCODE_MOV, dst, fs_reg(0.5f)); >>> + } >>> + else { >> >> } else { >> >>> + /* For num_samples = {4, 8} */ >>> + emit(BRW_OPCODE_MOV, dst, int_sample_pos); >>> + emit(BRW_OPCODE_MUL, dst, dst, fs_reg(1 / 16.0f)); >> >> Ah, I see...the MOV here converts the sample position from <D> to <F>, >> which is necessary for the next multiply. >> >> Adding an assertion would probably make this a little bit more obvious: >> assert(dst.type == BRW_REGISTER_TYPE_F); >> > right. >>> + } >>> +} >>> + >>> +fs_reg * >>> +fs_visitor::emit_samplepos_interpolation(ir_variable *ir) >>> +{ >>> + assert(brw->gen >= 6); >>> + >>> + this->current_annotation = "compute sample position"; >>> + fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type); >>> + fs_reg pos = *reg; >>> + fs_reg int_sample_x = fs_reg(this, glsl_type::int_type); >>> + fs_reg int_sample_y = fs_reg(this, glsl_type::int_type); >>> + >>> + /* WM will be run in MSDISPMODE_PERSAMPLE. So, only SIMD8 mode will be >> >> Looking at the docs for 3DSTATE_PS, I believe that SIMD16 is allowed in >> some cases... >> >> SNB: Must be SIMD8 only. >> IVB: >> - 4x MSAA: Either SIMD8 or SIMD16 should work. >> - 8x MSAA: Must be SIMD8 only. >> >> Maybe I'm crazy, though...I haven't looked into it too closely... >> > Yes, you're right. I should have explained it here that I've chosen > to use 'SIMD8 only' for 4x and 8x MSAA on both SNB and IVB. > I decided to use SIMD8 over SIMD16 in case of 4x MSAA after > reading following lines in IVB PRM vol 2, part 1, page 330: > "Object Size: If the number of very small objects (e.g., covering 2 > subspans or fewer) is expected to comprise a significant portion > of the workload, supporting the 8-pixel dispatch mode may be > advantageous Otherwise there could be a large number of 16-pixel > dispatches with only 1 or 2 valid subspans, resulting in low > efficiency for those threads" > > Do you have other reasons to suggest using SIMD16 for 4x MSAA > on IVB? >>> + * enabled. The X, Y sample positions come in as bytes in thread >>> payload. >>> + * Sample IDs and sample positions remain same for all four slots in a >> >> It would be great to have a documentation reference here... >> > I'll add the reference. >>> + * subspan. So, read the positions using vstride=2, width=4, hstride=0. >>> + */ >>> + emit(BRW_OPCODE_AND, int_sample_x, >>> + fs_reg(stride(retype(brw_vec1_grf(c->sample_pos_reg, 0), >>> + BRW_REGISTER_TYPE_D), 2, 4, 0)), >>> + fs_reg(brw_imm_d(0xff))); >>> + >>> + /* Compute gl_SamplePosition.x */ >>> + compute_sample_position(pos, int_sample_x); >>> + pos.reg_offset++; >>> + >>> + emit(BRW_OPCODE_SHR, int_sample_y, >>> + fs_reg(stride(retype(brw_vec1_grf(c->sample_pos_reg, 0), >>> + BRW_REGISTER_TYPE_D), 2, 4, 0)), >>> + fs_reg(8)); >>> + emit(BRW_OPCODE_AND, int_sample_y, int_sample_y, >>> fs_reg(brw_imm_d(0xff))); >>> + >>> + /* Compute gl_SamplePosition.y */ >>> + compute_sample_position(pos, int_sample_y); >>> + return reg; >>> +} >>> + >>> +fs_reg * >>> +fs_visitor::emit_sampleid_interpolation(ir_variable *ir) >> >> This...isn't exactly interpolation. Maybe call it emit_sampleid_setup? >> > ok. I'll also change the name of emit_samplepos_interpolation() function. >>> +{ >>> + assert(brw->gen >= 6); >>> + bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1; >>> + >>> + this->current_annotation = "compute sample id"; >>> + fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type); >>> + >>> + if (multisampled_fbo && ctx->Multisample.Enabled) { >>> + fs_reg t1 = fs_reg(this, glsl_type::int_type); >>> + fs_reg t2 = fs_reg(this, glsl_type::int_type); >>> + t2.type = BRW_REGISTER_TYPE_UW; >>> + >>> + /* The WM will be run in MSDISPMODE_PERSAMPLE with num_samples = 8. >> >> With num_samples = 8? >> >> Also, could you please say "The PS will be run in MSDISPMODE_PERSAMPLE >> mode"? On Gen7+, the docs begin referring to the Pixel Shader >> separately from the Windowizer/Masker, and technically, this is PS state >> and the PS thread payload. Minor nit :) >> > Language of the sentence is not clear. I'll change it to: > "The PS will be run in MSDISPMODE_PERSAMPLE. With 8x multisampling > ...." > >>> + * Therefore, subspan 0 will represent sample N (where N is 0, 2, 4 >>> + * or 6), subspan 1 will represent sample 1, 3, 5 or 7. We can find >>> + * the value of N by looking at R0.0 bits 7:6 ("Starting Sample Pair >>> + * Index (SSPI)") and multiplying by two (since samples are always >>> + * delivered in pairs). That is, we compute 2*((R0.0 & 0xc0) >> 6) >>> + * == (R0.0 & 0xc0) >> 5. >>> + * >>> + * Then we need to add N to the sequence (0, 0, 0, 0, 1, 1, 1, 1), >>> + * which we compute by populating a temporary variable with the >>> + * sequence (0, 1), and then reading from it using vstride=1, >>> + * width=4, hstride=0. >>> + * Same holds true for num_samples = 4. >> >> It sounds like num_samples could actually be 4, so you might want to >> adjust your comment... >> > I'll change it to: > " These computations hold good for 4x multisampling as well." >>> + */ >>> + emit(BRW_OPCODE_AND, t1, >>> + fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)), >>> + fs_reg(brw_imm_d(0xc0))); >>> + emit(BRW_OPCODE_SHR, t1, t1, fs_reg(5)); >>> + emit(BRW_OPCODE_MOV, t2, fs_reg(brw_imm_v(0x11110000))); >>> + emit(BRW_OPCODE_ADD, *reg, t1, t2); >> >> This looks right to me. >> >>> + } >>> + else { >> >> } else { >> >>> + /* As per GL_ARB_sample_shading specification: >>> + * "When rendering to a non-multisample buffer, or if multisample >>> + * rasterization is disabled, gl_SampleID will always be zero." >>> + */ >>> + emit(BRW_OPCODE_MOV, *reg, fs_reg(0)); >>> + } >>> + >>> + return reg; >>> +} >>> + >>> fs_reg >>> fs_visitor::fix_math_operand(fs_reg src) >>> { >>> @@ -2966,7 +3069,13 @@ fs_visitor::setup_payload_gen6() >>> c->nr_payload_regs++; >>> } >>> } >>> + >>> /* R31: MSAA position offsets. */ >>> + if (fp->Base.InputsRead & VARYING_BIT_SAMPLE_POS) { >>> + c->sample_pos_reg = c->nr_payload_regs; >>> + c->nr_payload_regs++; >>> + } >>> + >>> /* R32-: bary for 32-pixel. */ >>> /* R58-59: interp W for 32-pixel. */ >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h >>> b/src/mesa/drivers/dri/i965/brw_fs.h >>> index c78f9ae..812e9c7 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs.h >>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h >>> @@ -332,9 +332,12 @@ public: >>> glsl_interp_qualifier interpolation_mode, >>> bool is_centroid); >>> fs_reg *emit_frontfacing_interpolation(ir_variable *ir); >>> + fs_reg *emit_samplepos_interpolation(ir_variable *ir); >>> + fs_reg *emit_sampleid_interpolation(ir_variable *ir); >>> fs_reg *emit_general_interpolation(ir_variable *ir); >>> void emit_interpolation_setup_gen4(); >>> void emit_interpolation_setup_gen6(); >>> + void compute_sample_position(fs_reg dst, fs_reg int_sample_pos); >>> fs_reg rescale_texcoord(ir_texture *ir, fs_reg coordinate, >>> bool is_rect, int sampler, int texunit); >>> fs_inst *emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg >>> coordinate, >>> @@ -432,6 +435,7 @@ public: >>> >>> struct hash_table *variable_ht; >>> fs_reg frag_depth; >>> + fs_reg sample_mask; >>> fs_reg outputs[BRW_MAX_DRAW_BUFFERS]; >>> unsigned output_components[BRW_MAX_DRAW_BUFFERS]; >>> fs_reg dual_src_output; >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>> index e659203..75ced1d 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>> @@ -61,9 +61,14 @@ fs_visitor::visit(ir_variable *ir) >>> reg = emit_fragcoord_interpolation(ir); >>> } else if (!strcmp(ir->name, "gl_FrontFacing")) { >>> reg = emit_frontfacing_interpolation(ir); >>> + } else if (!strcmp(ir->name, "gl_SamplePosition")) { >>> + reg = emit_samplepos_interpolation(ir); >>> + } else if (!strcmp(ir->name, "gl_SampleID")) { >>> + reg = emit_sampleid_interpolation(ir); >>> } else { >>> reg = emit_general_interpolation(ir); >>> } >>> + >>> assert(reg); >>> hash_table_insert(this->variable_ht, reg, ir); >>> return; >>> @@ -82,6 +87,8 @@ fs_visitor::visit(ir_variable *ir) >>> } >>> } else if (ir->location == FRAG_RESULT_DEPTH) { >>> this->frag_depth = *reg; >>> + } else if (ir->location == FRAG_RESULT_SAMPLE_MASK) { >>> + this->sample_mask = *reg; >>> } else { >>> /* gl_FragData or a user-defined FS output */ >>> assert(ir->location >= FRAG_RESULT_DATA0 && >>> @@ -2502,6 +2509,22 @@ fs_visitor::emit_fb_writes() >>> pop_force_uncompressed(); >>> } >>> >>> + if (fp->Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK)) { >>> + this->current_annotation = "FB write oMask"; >>> + assert(this->sample_mask.file != BAD_FILE); >>> + fs_reg reg = fs_reg(this, glsl_type::int_type); >>> + >>> + /* Hand over gl_SampleMask. Only lower 16 bits are relevant. */ >>> + emit(AND(this->sample_mask, this->sample_mask, fs_reg(255))); >> >> If the lower 16 bits are relevant, then why are you masking off all but >> the lower 8 bits? Today, we only support 8x MSAA, but we may want 16x >> for future hardware. >> >> I think I'd prefer to see fs_reg(0xffff), unless there's some reason why >> that doesn't work. >> > Yes, I should use 0xffff here. Thanks for catching it. > >>> + emit(SHL(reg, this->sample_mask, fs_reg(16))); >> >> I'm having a hard time understanding why you need to shift by 16 here. >> >> My understanding is that gl_SampleMask[1] is single 32-bit integer, >> where bit N == 0 means that sample N is uncovered. The oMask payload >> has the same meaning, but is only a 16-bit integer (since we don't >> support 32x MSAA). >> >> Since we run in SIMD8 mode, sample_mask actually contains 8 integers (or >> 16 for SIMD16) - one for each pixel. oMask also contains 16 "slots" >> (one per pixel - slots 8-15 are unused for SIMD8). >> >> So sample_mask looks like: >> 7 6 5 4 3 2 1 0 >> hhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxx >> >> Masking off the top 16-bits gives: >> 0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx >> >> The oMask payload is packed, and looks like: >> ________________________________xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx >> >> The SHL should produce: >> xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000 >> > We enable both SIMD8 and SIMD16 mode with gl_SampleMask[] because > it doesn't require MSDISPMODE_PERSAMPLE. > > for SIMD8 > sample_mask looks like: > 7 6 5 4 3 2 1 0 > hhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxxhhhhxxxx > > Masking off the top 16-bits gives: > 0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx > > SHL will produce: > xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000xxxx0000 > > OR will produce: > xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx > > Relevant data for oMask paylaod is: > ________________________________xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx > > rest of the data is ignored. > Let me know if I'm making some crazy mistake here. > Ken, did this help you to understand what I'm trying to do here?
>>> + >>> + this->sample_mask.type = BRW_REGISTER_TYPE_UW; >>> + reg.type = BRW_REGISTER_TYPE_UW; >>> + emit(OR(fs_reg(MRF, nr, BRW_REGISTER_TYPE_UW), this->sample_mask, >>> reg)); >> >> ...which you then OR into an uninitialized MRF (AFAICT), and...I don't >> think this is going to work. > This seems to work with the piglit test case (builtin-gl-sample-mask.cpp) > which I developed for gl_SampleMask. I also verified the register data. > >> >> I'm pretty sure you want MOV, not OR, and you want some sort of special >> stride on the sample_mask source register. I know we can specify stride if we have a brw_reg :- fs_reg (stride(brw_vec1_grf(0, 0), 2, 4, 0)); But I could not find a reliable way to use stride if we have a fs_reg. That's why I used OR to get the desired result. >>> + nr += 1; >>> + } >>> + >>> /* Reserve space for color. It'll be filled in per MRT below. */ >>> int color_mrf = nr; >>> nr += 4 * reg_width; >>> diff --git a/src/mesa/drivers/dri/i965/brw_wm.h >>> b/src/mesa/drivers/dri/i965/brw_wm.h >>> index aa786de..fda74ab 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_wm.h >>> +++ b/src/mesa/drivers/dri/i965/brw_wm.h >>> @@ -83,6 +83,7 @@ struct brw_wm_compile { >>> uint8_t source_w_reg; >>> uint8_t aa_dest_stencil_reg; >>> uint8_t dest_depth_reg; >>> + uint8_t sample_pos_reg; >>> uint8_t barycentric_coord_reg[BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT]; >>> uint8_t nr_payload_regs; >>> GLuint source_depth_to_render_target:1; >>> >> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev