Hi

15.08.2013 13:54, Marek Olšák wrote:
On Thu, Aug 15, 2013 at 10:27 AM, Christian König
<deathsim...@vodafone.de> wrote:
Am 15.08.2013 05:25, schrieb Marek Olšák:

(This should be applied before MSAA, which will need to be rebased.)

It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).

The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).

This uses the RCU synchronization approach to avoid read-after-write
hazards
as discussed in the thread:
"radeonsi: add FMASK texture binding slots and resource setup"

CP DMA is used to clear the descriptors at context initialization and to
copy
the descriptors from one context to the next.

IMPORTANT:
    128 resource contexts are needed, 64 doesn't work. If I set
    SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are
needed.
    I don't have an explanation for this.
---


The idea itself looks really good to me, but we should probably also move
the all resources and samplers to the new model and then rip out the code
that stores them directly into the IB.

I'd like MSAA to land first, but yes, the plan is to eventually move
all resources and samplers to the new model.



+/* Emit a CP DMA packet to do a copy from one buffer to another.
+ * The size must fit in bits [20:0]. Notes:
+ *
+ * 1) Set sync to true if you want the 3D engine to wait until CP DMA is
done.
+ *
+ * 2) Set raw_hazard_wait to true if the source data was used as a
destination
+ *    in a previous CP DMA packet. It's for preventing a read-after-write
hazard
+ *    between two CP DMA packets.
+ */
+static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
+                                      uint64_t dst_va, uint64_t src_va,
+                                      unsigned size,
+                                      bool sync, bool raw_hazard_wait)
+{
+       struct radeon_winsys_cs *cs = rctx->cs;
+       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
+       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
0;
+
+       assert(size);
+       assert((size & ((1<<21)-1)) == size);
+
+       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
+       cs->buf[cs->cdw++] = src_va;                    /* SRC_ADDR_LO
[31:0] */
+       cs->buf[cs->cdw++] = sync_flag | ((src_va >> 32) & 0xff); /*
CP_SYNC [31] | SRC_ADDR_HI [7:0] */
+       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO
[31:0] */
+       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI
[7:0] */
+       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22]
| BYTE_COUNT [20:0] */
+}
+
+/* Emit a CP DMA packet to clear a buffer. The size must fit in bits
[20:0]. */
+static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx,
+                                       uint64_t dst_va, unsigned size,
+                                       uint32_t clear_value,
+                                       bool sync, bool raw_hazard_wait)
+{
+       struct radeon_winsys_cs *cs = rctx->cs;
+       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
+       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
0;
+
+       assert(size);
+       assert((size & ((1<<21)-1)) == size);
+
+       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
+       cs->buf[cs->cdw++] = clear_value;               /* DATA [31:0] */
+       cs->buf[cs->cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /*
CP_SYNC [31] | SRC_SEL[30:29] */
+       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO
[31:0] */
+       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI
[7:0] */
+       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22]
| BYTE_COUNT [20:0] */
+}


Can we use some kind of macro or inline function instead of
"cs->buf[cs->cdw++] " ? That should help of we need to port that over to a
different CS mechanism.

How about this?

static INLINE void
r600_write_value(struct radeon_winsys_cs *cs, unsigned value)
{
     cs->buf[cs->cdw++] = value;
}



And IIRC the CP DMA is identical on all chipset generation (maybe excluding
early R6xx, but I'm not 100% sure of that), so it might be a good idea to
start sharing code again by putting this under
"src/gallium/drivers/radeon/radeon_cp_dma.c". Not necessary now, but more as
a general idea. What do you think?

I agree.

CP DMA is indeed identical on all chipsets. The copying is supported
since R600 and the clearing is supported since Evergreen.

Maybe you already thought: One way to emulate clearing is to
copy with CP DMA from a constant cleared memory area.

Regards,
Marko Ristola


Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to