On 1 December 2017 at 20:49, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
> On 01.12.2017 06:06, Dave Airlie wrote:
>>
>> From: Dave Airlie <airl...@redhat.com>
>>
>> On Cayman we don't use the append/consume counters (fglrx doesn't)
>> and they don't seem to work well with compute shaders.
>>
>> This just uses GDS instead to do the atomic operations.
>
>
> Interesting. This is kind of what I'd have expected to be used from the
> beginning at least for GCN.
>
> Don't you still need to use an EOS event for proper synchronization? I mean,
> I guess you looked at fglrx traces, but still... CP_DMA definitely isn't
> waiting for shaders on newer hardware, and I don't know why it would do that
> on older hardware.
>
> FWIW, I don't have the packet specification for pre-GCN hardware here, but
> on GCN it should be:
>
>         radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOS, 3, 0) | pkt_flags);
>         radeon_emit(cs, EVENT_TYPE(event) | EVENT_INDEX(6));
>         radeon_emit(cs, (dst_offset) & 0xffffffff);
>         radeon_emit(cs, (1 << 29) | ((dst_offset >> 32) & 0xffff));
>         radeon_emit(cs, (gds_index & 0xffff) | (num_dwords << 16));
>
> to copy GDS data to memory at EOS.

My guess is that WRITE_EOS is broken on Cayman for compute shaders, which
would explain why they don't use it. I'll dump some non-compute atomics to
make sure they don't use it there either.

It at least appears that the GDS append/consume counters work for
non-compute shaders (which is why I didn't notice this earlier), but they
fail when it comes to compute.

I see no sign of fglrx using EVENT_WRITE_EOS in the Cayman traces, which
leads me to suspect I just have to flush hard before CP_DMA.

0xc0031503 // PKT3 0x15 4dw: COMPUTE
0x00000001
0x00000001
0x00000001
0x00000001
0xc0004600 // PKT3 0x46 1dw:
0x00000006
0xc0004600 // PKT3 0x46 1dw:
0x00000410
0xc0004600 // PKT3 0x46 1dw:
0x00000407
0xc0016800 // PKT3 0x68 2dw:
0x00000363 // CFG_OFFSET 0x00008d8c
0x00000100 // 0x00008d8c SQ_DYN_GPR_CNTL_PS_FLUSH_REQ
0xc0034300 // PKT3 0x43 4dw:
0x80107ffc
0xffffffff
0x00000000
0x00000004
0xc0016900 // PKT3 0x69 2dw:
0x00000290 // CTX_OFFSET 0x00028a40
0x00000000 // 0x00028a40 VGT_GS_MODE
0xc0016900 // PKT3 0x69 2dw:
0x000002d5 // CTX_OFFSET 0x00028b54
0x00000000 // 0x00028b54 VGT_SHADER_STAGES_EN
0xc0016900 // PKT3 0x69 2dw:
0x000001ba // CTX_OFFSET 0x000286e8
0x00000000 // 0x000286e8 SPI_COMPUTE_INPUT_CNTL
0xc0056900 // PKT3 0x69 6dw:
0x000001be // CTX_OFFSET 0x000286f8
0x00000000 // 0x000286f8 ??
0x0000ffff // 0x000286fc ??
0x00000000 // 0x00028700 ??
0x00000000 // 0x00028704 ??
0x00000000 // 0x00028708 ??
0xc0004600 // PKT3 0x46 1dw:
0x00000407
0xc0004600 // PKT3 0x46 1dw:
0x00000407
0xc0044102 // PKT3 0x41 5dw: COMPUTE
0x00000004
0xa0000000
0x7f1d0000
0x000000f8
0x04000004
0xc0044102 // PKT3 0x41 5dw: COMPUTE
0x00000000
0xa0000000
0x7f1d0004
0x000000f8
0x04000004

is the command stream from fglrx. It executes the dispatch, does a
CACHE_FLUSH, PS_PARTIAL_FLUSH and CS_PARTIAL_FLUSH, resets
DYN_GPR_PS_FLUSH_REQ, then does a SURFACE_SYNC, resets a bunch of
registers, and does another couple of CS_PARTIAL_FLUSHes.

However, fglrx definitely does some things differently: we have this
DEALLOC_STATE workaround and they never do it, which means they must
construct something else differently in the kernel or a lot earlier in
userspace. If I do all those flushes directly after the dispatch, I hang
unless I emit the DEALLOC_STATE packet.

Dave.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev