Ben,

I'm looking at a way to make TTM memory management asynchronous with the CPU. The idea is that you should basically be able to DMA data to and from memory regions without waiting for the GPU to go idle, as long as the GPU has some means of ordering operations.
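
To make the ordering idea a bit more concrete, here is a rough sketch. It is a toy model and not TTM or Nouveau code: struct ring, struct buffer, queue_move(), fence_signaled(), gpu_retire_one() and poll_wait() are all made-up names, and it assumes a single ring that retires commands in order. The CPU queues a move and returns immediately; the buffer only remembers the sequence number it has to wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not TTM or Nouveau code: one in-order GPU ring. */
struct ring {
    uint32_t emitted;   /* seqno of the last command queued by the CPU */
    uint32_t completed; /* seqno of the last command the GPU finished */
};

struct buffer {
    uint32_t fence;     /* seqno of the last queued op touching this buffer */
};

/* Queue a DMA move and return at once; the CPU never waits for idle. */
static uint32_t queue_move(struct ring *r, struct buffer *bo)
{
    bo->fence = ++r->emitted;
    return bo->fence;
}

/* Wrap-safe check: has the ring already passed this seqno? */
static bool fence_signaled(const struct ring *r, uint32_t seqno)
{
    return (int32_t)(r->completed - seqno) >= 0;
}

/* Stand-in for the GPU retiring the next command in ring order. */
static void gpu_retire_one(struct ring *r)
{
    assert(r->completed != r->emitted); /* nothing left to retire otherwise */
    r->completed++;
}

/*
 * Polling-only wait: re-check the fence until it signals and return the
 * number of polls. Real hardware advances on its own; here the loop
 * drives the simulated GPU so the example terminates.
 */
static unsigned poll_wait(struct ring *r, uint32_t seqno)
{
    unsigned polls = 0;

    while (!fence_signaled(r, seqno)) {
        gpu_retire_one(r);
        polls++;
    }
    return polls;
}
```

Because the ring is in-order, waiting on the later of two fences implies the earlier one has signaled too, so two back-to-back moves need no CPU-side wait between them. An IRQ-driven wait would replace the loop in poll_wait() with a sleep until the interrupt handler sees the seqno go by.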

While doing that, I looked a bit at the Nouveau fencing. It appears that waiting for fences is polling-only (there is no IRQ to signal fences). Is that correct?

/Thomas