On 11/28/2010 05:11 PM, Francisco Jerez wrote:
> Francisco Jerez<currojerez at riseup.net> writes:
>
>> Thomas Hellstrom<thomas at shipmail.org> writes:
>>
>>> Ben,
>>>
>>> I'm looking at a way to make TTM memory management asynchronous with
>>> the CPU. The idea is that you should basically be able to DMA data to
>>> and from memory regions without waiting for idle, as long as the GPU
>>> has a means to provide operation ordering.
>>>
>> Sounds good. I guess you're mainly dealing with BO eviction
>> synchronization? The only problem I see on our side is that calls to our
>> move() hook aren't guaranteed to be carried out in order (because of the
>> multiple hardware channels). I'm thinking that move() could be extended
>> with an optional sync_obj argument, that way move() would be able to
>> make sure that evictions are strictly ordered with respect to the fence
>> specified.
>>

The way evictions will work is that they appear to take place
"instantly", but are scheduled on a channel, and there will be a data
structure that keeps track about what fences need to be signaled before
a managed area can be reused.

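To make the bookkeeping idea above a bit more concrete, here is a rough sketch of a per-area list of fences that must signal before the area can be reused. This is not TTM code: every name in it (sketch_fence, sketch_region, region_add_fence, region_reusable) and the fixed array sizes are made up for illustration, and it assumes each channel exposes a monotonically increasing 32-bit sequence number that may wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CHANNELS 4   /* hypothetical limit */
#define SKETCH_MAX_PENDING  8   /* hypothetical limit */

/* Hypothetical fence: a sequence number on one hardware channel. */
struct sketch_fence {
    uint32_t channel;
    uint32_t seq;
};

/* Hypothetical managed area plus the fences that guard its reuse. */
struct sketch_region {
    struct sketch_fence pending[SKETCH_MAX_PENDING];
    size_t num_pending;
};

/*
 * True if sequence number a is at or past b. The signed difference keeps
 * the comparison valid across 32-bit wraparound (assumes the usual
 * two's-complement conversion).
 */
static bool seq_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/*
 * Record that work scheduled on a channel must finish before the region
 * is reused. Returns false if the pending list is full.
 */
static bool region_add_fence(struct sketch_region *r, struct sketch_fence f)
{
    assert(f.channel < SKETCH_MAX_CHANNELS);
    if (r->num_pending == SKETCH_MAX_PENDING)
        return false;
    r->pending[r->num_pending++] = f;
    return true;
}

/*
 * completed[c] is the last sequence number channel c has signaled.
 * Drops fences that have already signaled and reports whether the
 * region is free for reuse (no fences left pending).
 */
static bool region_reusable(struct sketch_region *r,
                            const uint32_t completed[SKETCH_MAX_CHANNELS])
{
    size_t kept = 0;

    for (size_t i = 0; i < r->num_pending; i++) {
        struct sketch_fence f = r->pending[i];
        if (!seq_passed(completed[f.channel], f.seq))
            r->pending[kept++] = f;
    }
    r->num_pending = kept;
    return kept == 0;
}
```

In this sketch an eviction would call region_add_fence() and return right away, which is what makes it look "instant" to the CPU; only the next user of the area has to check region_reusable().
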
The driver will need to provide a function that, given a list of fences,
returns a fence that, when it signals, guarantees that all other fences
in the list have signaled. Single-channel hardware will just return the
fence with the highest sequence. Multi-channel hardware may need to
insert command stream barriers, if available, and create a new sync
object to return, or resort to simply waiting to determine which fence
signals last. I guess Nouveau can do command stream barriers (waiting
for other channels to reach a certain command before progressing)?

Needless to say, drivers need not activate async operation if they don't
want to, but for single-channel hardware it will hopefully be very
simple.

>>
>>> While doing that I looked a bit at the Nouveau fencing. It appears
>>> like waiting for fences is polling only (no irq to signal fences)? Is
>>> that correct?
>>>
>> That's right, nvidia hardware has no nice way to schedule a fence-like
>> interrupt we could selectively turn on and off around the sync_obj_wait
>> hook. There's a bunch of (more or less) chipset-specific hacks that
>> could be used to get an equivalent effect, but polling has seemed good
>> enough so far (in the typical case we only take the "lazy" path so CPU
>> usage is still OK).
>>

Indeed, I saw the same with unichromes: lazy for throttling and not lazy
for other waits, although I ended up with an hrtimer polling loop in the
non-lazy case, since software fallbacks tended to eat a lot of CPU while
waiting for buffer idle.

/Thomas
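
For reference, the driver function described earlier in this message (given a list of fences, return one whose signaling implies all the others have signaled) might look roughly like this in the single-channel case. It is only a sketch under assumptions: demo_fence, demo_seq_after and demo_last_fence are hypothetical names, not an existing TTM or Nouveau interface, and sequence numbers are assumed to be 32-bit and allowed to wrap. When the fences span more than one channel the sketch just returns NULL, leaving the barrier-or-wait decision to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: a sequence number on one hardware channel. */
struct demo_fence {
    uint32_t channel;
    uint32_t seq;
};

/*
 * True if sequence number a is strictly later than b, tolerating 32-bit
 * wraparound (assumes the usual two's-complement conversion).
 */
static bool demo_seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/*
 * If all fences are on the same channel, return the one with the highest
 * sequence number: once it signals, every other fence in the list has
 * signaled too. If the fences span several channels, return NULL so the
 * caller can insert a command stream barrier or fall back to waiting.
 */
static const struct demo_fence *
demo_last_fence(const struct demo_fence *fences, size_t count)
{
    assert(fences != NULL && count > 0);

    const struct demo_fence *last = &fences[0];

    for (size_t i = 1; i < count; i++) {
        if (fences[i].channel != last->channel)
            return NULL;
        if (demo_seq_after(fences[i].seq, last->seq))
            last = &fences[i];
    }
    return last;
}
```

The NULL return is a design shortcut for the sketch; a multi-channel driver would presumably create and return a new sync object there instead.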