On Mon, Aug 26, 2013 at 8:59 PM, Marko Ristola <marko.rist...@kolumbus.fi> wrote:
>
> Hi
>
>
> 15.08.2013 13:54, Marek Olšák wrote:
>>
>> On Thu, Aug 15, 2013 at 10:27 AM, Christian König
>> <deathsim...@vodafone.de> wrote:
>>>
>>> On 15.08.2013 05:25, Marek Olšák wrote:
>>>
>>>> (This should be applied before MSAA, which will need to be rebased.)
>>>>
>>>> It moves all sampler view descriptors to a buffer.
>>>> It supports partial resource updates and it can also unbind resources
>>>> (required for FMASK texturing).
>>>>
>>>> The buffer contains all sampler view descriptors for one shader stage,
>>>> represented as an array. On top of that, there are N arrays in the
>>>> buffer, which are used to emulate context registers as implemented by
>>>> the previous ASICs (each array is a context).
>>>>
>>>> This uses the RCU synchronization approach to avoid read-after-write
>>>> hazards as discussed in the thread:
>>>> "radeonsi: add FMASK texture binding slots and resource setup"
>>>>
>>>> CP DMA is used to clear the descriptors at context initialization and
>>>> to copy the descriptors from one context to the next.
>>>>
>>>> IMPORTANT:
>>>> 128 resource contexts are needed, 64 doesn't work. If I set
>>>> SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are
>>>> needed. I don't have an explanation for this.
>>>> ---
>>>
>>>
>>>
>>> The idea itself looks really good to me, but we should probably also move
>>> all the resources and samplers to the new model and then rip out the code
>>> that stores them directly into the IB.
>>
>>
>> I'd like MSAA to land first, but yes, the plan is to eventually move
>> all resources and samplers to the new model.
>>
>>>
>>>
>>>> +/* Emit a CP DMA packet to do a copy from one buffer to another.
>>>> + * The size must fit in bits [20:0]. Notes:
>>>> + *
>>>> + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done.
>>>> + *
>>>> + * 2) Set raw_hazard_wait to true if the source data was used as a destination
>>>> + *    in a previous CP DMA packet. It's for preventing a read-after-write hazard
>>>> + *    between two CP DMA packets.
>>>> + */
>>>> +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
>>>> +                                       uint64_t dst_va, uint64_t src_va,
>>>> +                                       unsigned size,
>>>> +                                       bool sync, bool raw_hazard_wait)
>>>> +{
>>>> +        struct radeon_winsys_cs *cs = rctx->cs;
>>>> +        uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>>>> +        uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
>>>> +
>>>> +        assert(size);
>>>> +        assert((size & ((1<<21)-1)) == size);
>>>> +
>>>> +        cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>>>> +        cs->buf[cs->cdw++] = src_va; /* SRC_ADDR_LO [31:0] */
>>>> +        cs->buf[cs->cdw++] = sync_flag | ((src_va >> 32) & 0xff); /* CP_SYNC [31] | SRC_ADDR_HI [7:0] */
>>>> +        cs->buf[cs->cdw++] = dst_va; /* DST_ADDR_LO [31:0] */
>>>> +        cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff; /* DST_ADDR_HI [7:0] */
>>>> +        cs->buf[cs->cdw++] = size | raw_wait; /* COMMAND [29:22] | BYTE_COUNT [20:0] */
>>>> +}
>>>> +
>>>> +/* Emit a CP DMA packet to clear a buffer. The size must fit in bits [20:0]. */
>>>> +static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx,
>>>> +                                        uint64_t dst_va, unsigned size,
>>>> +                                        uint32_t clear_value,
>>>> +                                        bool sync, bool raw_hazard_wait)
>>>> +{
>>>> +        struct radeon_winsys_cs *cs = rctx->cs;
>>>> +        uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>>>> +        uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
>>>> +
>>>> +        assert(size);
>>>> +        assert((size & ((1<<21)-1)) == size);
>>>> +
>>>> +        cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>>>> +        cs->buf[cs->cdw++] = clear_value; /* DATA [31:0] */
>>>> +        cs->buf[cs->cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /* CP_SYNC [31] | SRC_SEL[30:29] */
>>>> +        cs->buf[cs->cdw++] = dst_va; /* DST_ADDR_LO [31:0] */
>>>> +        cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff; /* DST_ADDR_HI [7:0] */
>>>> +        cs->buf[cs->cdw++] = size | raw_wait; /* COMMAND [29:22] | BYTE_COUNT [20:0] */
>>>> +}
>>>
>>>
>>>
>>> Can we use some kind of macro or inline function instead of
>>> "cs->buf[cs->cdw++]"? That should help if we need to port that over to
>>> a different CS mechanism.
>>
>>
>> How about this?
>>
>> static INLINE void
>> r600_write_value(struct radeon_winsys_cs *cs, unsigned value)
>> {
>>         cs->buf[cs->cdw++] = value;
>> }
>>
>>
>>>
>>> And IIRC the CP DMA is identical on all chipset generations (maybe
>>> excluding early R6xx, but I'm not 100% sure of that), so it might be a
>>> good idea to start sharing code again by putting this under
>>> "src/gallium/drivers/radeon/radeon_cp_dma.c". Not necessary now, but more
>>> as a general idea. What do you think?
>>
>>
>> I agree.
>>
>> CP DMA is indeed identical on all chipsets. The copying is supported
>> since R600 and the clearing is supported since Evergreen.
>
>
> Maybe you already thought of this: one way to emulate clearing is to
> copy with CP DMA from a constant cleared memory area.

We emulate clearing with streamout (AKA transform feedback) on
r600-r700. The clearing binds the clear value as a vertex buffer with
stride=0, so that it's broadcast to all elements in the destination.

Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
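
For illustration only, here is a CPU-side sketch of the stride=0 broadcast
described in the reply above. It is not the r600g streamout code and it does
not touch the GPU. The names fetch_with_stride and emulate_clear are
hypothetical and exist only for this example. The point it shows is that when
the source advances by zero bytes per element, every fetch returns the same
dword, so one clear value fills the whole destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Mesa code): copy `count` 32-bit elements from a
 * source "vertex buffer" into `dst`, advancing the source by `stride` bytes
 * per element. With stride == 0 every fetch reads the same dword. */
static void fetch_with_stride(uint32_t *dst, size_t count,
                              const void *src, size_t stride)
{
        const unsigned char *base = src;
        size_t i;

        assert(dst && src);
        for (i = 0; i < count; i++)
                memcpy(&dst[i], base + i * stride, sizeof(dst[i]));
}

/* Clear emulation: the clear value acts as a one-element source buffer
 * that is fetched with stride 0, so it is broadcast to all of `dst`. */
static void emulate_clear(uint32_t *dst, size_t count, uint32_t clear_value)
{
        fetch_with_stride(dst, count, &clear_value, 0);
}
```

Calling emulate_clear(buf, n, value) fills n dwords with value, while calling
fetch_with_stride with stride = sizeof(uint32_t) degenerates to a plain copy.
On the hardware the same selection between "copy" and "fill" is made by the
vertex buffer stride alone.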