On 11/28/2010 10:55 PM, Francisco Jerez wrote:
> Thomas Hellstrom<thomas at shipmail.org> writes:
>
>> On 11/28/2010 05:11 PM, Francisco Jerez wrote:
>>> Francisco Jerez<currojerez at riseup.net> writes:
>>>
>>>> Thomas Hellstrom<thomas at shipmail.org> writes:
>>>>
>>>>> Ben,
>>>>>
>>>>> I'm looking at a way to make TTM memory management asynchronous with
>>>>> the CPU. The idea is that you should basically be able to DMA data to
>>>>> and from memory regions without waiting for idle, as long as the GPU
>>>>> has a means to provide operation ordering.
>>>>>
>>>> Sounds good. I guess you're mainly dealing with BO eviction
>>>> synchronization? The only problem I see on our side is that calls to our
>>>> move() hook aren't guaranteed to be carried out in order (because of the
>>>> multiple hardware channels). I'm thinking that move() could be extended
>>>> with an optional sync_obj argument, that way move() would be able to
>>>> make sure that evictions are strictly ordered with respect to the fence
>>>> specified.
>>>>
>> The way evictions will work is that they appear to take place
>> "instantly", but are scheduled on a channel, and there will be a data
>> structure that keeps track about what fences need to be signaled
>> before a managed area can be reused.
>>
>> The driver will need to provide a function that, given a list of
>> fences, returns a fence that when it signals, guarantees that all
>> other fences in the list have signaled.
>>
> Ah, so, evictions made in response to ttm_bo_mem_force_space() are still
> going to be synchronous after the changes you have in mind (because in
> that case you need to reuse the freed memory immediately), right?
>

No and yes. Evictions will be asynchronous, but the new user of the
memory area needs to take appropriate action to make sure it doesn't
overwrite old contents. If it's a CPU upload, it needs to wait on a
fence. Single-channel GPU with dma uploads needs to do nothing.
Multi-channel GPU needs to insert a barrier before uploading, that waits on the eviction DMA.
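
To make the three cases concrete, here is a minimal sketch in C of the
decision the new user of a memory area would have to make. All names
(upload_kind, sync_action, reuse_sync_action) are made up for
illustration and are not TTM or Nouveau interfaces. The eviction_pending
short-circuit is also an assumption: if the eviction has already
completed, presumably nothing needs to be done in any case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of this is existing TTM API. */
enum upload_kind {
    UPLOAD_CPU,         /* CPU writes into the reused area */
    UPLOAD_DMA_SINGLE,  /* GPU DMA upload, single-channel hardware */
    UPLOAD_DMA_MULTI    /* GPU DMA upload, multi-channel hardware */
};

enum sync_action {
    SYNC_NONE,          /* nothing to do */
    SYNC_CPU_WAIT,      /* block the CPU on the eviction fence */
    SYNC_GPU_BARRIER    /* insert a command stream barrier */
};

/*
 * Decide what the new user of a memory area must do before writing to
 * it, given that an eviction DMA may still be in flight.
 */
enum sync_action reuse_sync_action(enum upload_kind kind,
                                   bool eviction_pending)
{
    /* Assumption: a completed eviction needs no synchronization. */
    if (!eviction_pending)
        return SYNC_NONE;

    switch (kind) {
    case UPLOAD_CPU:
        return SYNC_CPU_WAIT;
    case UPLOAD_DMA_SINGLE:
        /* Same channel as the eviction, so ordering is implicit. */
        return SYNC_NONE;
    case UPLOAD_DMA_MULTI:
        return SYNC_GPU_BARRIER;
    }

    /* Unknown kind: fall back to the conservative choice. */
    assert(!"unhandled upload_kind");
    return SYNC_CPU_WAIT;
}
```

The point of the sketch is only that the CPU stalls in one case out of
three, and only while the eviction is actually outstanding.
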

So you're right in that we need to give the new move function
information on what to wait on / insert barriers for. I was initially
thinking of a single fence object (and that's why the order function is
needed).

> In other cases (e.g. evictions triggered by BO validation), what exactly
> would we gain from this function? I mean, why can't we just push waiting
> down to ttm_bo_move_ttm/memcpy?
>

That's essentially what's going to happen, but those functions also need
to know what exactly to wait on.

>> Single-channel hardware will just return the fence with the highest
>> sequence. Multi-channel hardware may need to insert command stream
>> barriers if available and create a new sync object to return or resort
>> to simply waiting to determine which fence signals last.
>>
>> I guess Nouveau can do command stream barriers, (waiting for other
>> channels to reach a certain command before progressing?)
>>
> Yep, that's what nouveau_fence_sync() does.
>

OK, thanks.

/Thomas
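
P.S. For reference, a rough sketch in C of the single-channel case of
the order function discussed above: given a list of fences, return the
one with the highest sequence. The names toy_fence, seq_after and
toy_fence_order are hypothetical, and the wraparound-safe comparison is
an addition that the thread does not discuss. The multi-channel case
(barriers plus a new sync object) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: just a sequence number on one channel. */
struct toy_fence {
    uint32_t seq;
};

/*
 * Return nonzero if sequence a was emitted after sequence b.
 * Assumes the two are less than 2^31 apart, so that the comparison
 * survives the counter wrapping around.
 */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/*
 * Single-channel order function: a channel signals its fences in
 * sequence order, so once the latest fence in the list has signaled,
 * all the others have as well.
 */
const struct toy_fence *
toy_fence_order(const struct toy_fence *const *fences, size_t n)
{
    const struct toy_fence *last;
    size_t i;

    assert(n > 0);

    last = fences[0];
    for (i = 1; i < n; i++) {
        if (seq_after(fences[i]->seq, last->seq))
            last = fences[i];
    }
    return last;
}
```
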