Hello,

When profiling my workload on an AMD E-350 (PALM GPU) to see why it still wasn't performing well with Jerome's WIP macrotiling patches, I noticed that r600_fence_finish was taking 10% of my CPU time. I determined experimentally that changing from sched_yield() to os_time_sleep(10) fixed this and resolved my last performance issue on AMD Fusion as compared to Intel Atom, but I felt that this was hacky.
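For illustration, the two polling strategies compared above look roughly like the sketch below. This is a user-space model, not the actual r600_fence_finish code: fence_wait, fence_word_t and enum wait_mode are hypothetical names, the fence is modelled as a plain atomic word that becomes non-zero when signalled, and I'm assuming os_time_sleep() takes microseconds, which is modelled here with nanosleep().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a fence word that the GPU writes when done. */
typedef _Atomic uint32_t fence_word_t;

enum wait_mode {
    WAIT_YIELD,      /* give up the CPU but stay runnable: sched_yield() */
    WAIT_SLEEP_10US  /* actually sleep for about 10 us between polls */
};

/*
 * Poll the fence until it becomes non-zero or max_polls is reached.
 * Returns the number of polls that found the fence still unsignalled.
 */
static unsigned long fence_wait(fence_word_t *fence, enum wait_mode mode,
                                unsigned long max_polls)
{
    const struct timespec ten_us = { .tv_sec = 0, .tv_nsec = 10 * 1000 };
    unsigned long polls = 0;

    assert(fence != NULL);
    while (atomic_load_explicit(fence, memory_order_acquire) == 0 &&
           polls < max_polls) {
        polls++;
        if (mode == WAIT_YIELD)
            sched_yield();            /* returns at once if nothing else is runnable */
        else
            nanosleep(&ten_us, NULL); /* blocks, so the CPU can go idle */
    }
    return polls;
}
```

With WAIT_YIELD on an otherwise idle system, the loop is effectively a busy-wait, which would be consistent with the CPU time I saw in the profile. With WAIT_SLEEP_10US the thread blocks between polls, at the cost of up to one sleep interval (plus timer slack) of extra latency.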
I've therefore tried to use INT_SEL of 0b10 in the EVENT_WRITE_EOP in Mesa, combined with a new ioctl to wait for a changed value, but it's not working the way I would expect.

I'll be sending patches as replies to this message, so that you can see exactly what I've done. In brief, I have an ioctl that uses wait_event to wait for a chosen offset in a BO to change value. I've added a suitable waitqueue, and made radeon_fence_process call wake_up_all.

I'm seeing behaviour from this that I can't explain. As you'll see in the patches, I've moved some IRQ prints from DRM_DEBUG to printk(KERN_INFO), and I'm seeing that I don't get the EOP interrupt in a timely fashion. Either memory is not as coherent between the GPU and CPU as I would like (so I'm reading stale data when I call wait_event), or the interrupt is genuinely delayed.