op 15-05-14 15:19, Christian K?nig schreef: > Am 15.05.2014 15:04, schrieb Maarten Lankhorst: >> op 15-05-14 11:42, Christian K?nig schreef: >>> Am 15.05.2014 11:38, schrieb Maarten Lankhorst: >>>> op 15-05-14 11:21, Christian K?nig schreef: >>>>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst: >>>>>> op 14-05-14 17:29, Christian K?nig schreef: >>>>>>>> + /* did fence get signaled after we enabled the sw irq? */ >>>>>>>> + if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >>>>>>>> >= fence->seq) { >>>>>>>> + radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring); >>>>>>>> + return false; >>>>>>>> + } >>>>>>>> + >>>>>>>> + fence->fence_wake.flags = 0; >>>>>>>> + fence->fence_wake.private = NULL; >>>>>>>> + fence->fence_wake.func = radeon_fence_check_signaled; >>>>>>>> + __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake); >>>>>>>> + fence_get(f); >>>>>>> That looks like a race condition to me. The fence needs to be added to >>>>>>> the wait queue before the check, not after. >>>>>>> >>>>>>> Apart from that the whole approach looks like a really bad idea to me. >>>>>>> How for example is lockup detection supposed to happen with this? >>>>>> It's not a race condition because fence_queue.lock is held when this >>>>>> function is called. >>>>> Ah, I see. That's also the reason why you moved the wake_up_all out of >>>>> the processing function. >>>> Correct. :-) >>>>>> Lockup's a bit of a weird problem, the changes wouldn't allow core ttm >>>>>> code to handle the lockup any more, >>>>>> but any driver specific wait code would still handle this. I did this by >>>>>> design, because in future patches the wait >>>>>> function may be called from outside of the radeon driver. The official >>>>>> wait function takes a timeout parameter, >>>>>> so lockups wouldn't be fatal if the timeout is set to something like >>>>>> 30*HZ for example, it would still return >>>>>> and report that the function timed out. >>>>> Timeouts help with the detection of the lockup, but not at all with the >>>>> handling of them. >>>>> >>>>> What we essentially need is a wait callback into the driver that is >>>>> called in non atomic context without any locks held. >>>>> >>>>> This way we can block for the fence to become signaled with a timeout and >>>>> can then also initiate the reset handling if necessary. >>>>> >>>>> The way you designed the interface now means that the driver never gets a >>>>> chance to wait for the hardware to become idle and so never has the >>>>> opportunity to the reset the whole thing. >>>> You could set up a hangcheck timer like intel does, and end up with a >>>> reliable hangcheck detection that doesn't depend on cpu waits. :-) Or >>>> override the default wait function and restore the old behavior. >>> >>> Overriding the default wait function sounds better, please implement it >>> this way. >>> >>> Thanks, >>> Christian. >> >> Does this modification look sane? > Adding the timeout is on my todo list for quite some time as well, so this > part makes sense. > >> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout) >> +{ >> + struct radeon_fence *fence = to_radeon_fence(f); >> + u64 target_seq[RADEON_NUM_RINGS] = {}; >> + >> + target_seq[fence->ring] = fence->seq; >> + return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, >> timeout); >> +} > When this call is comming from outside the radeon driver you need to lock > rdev->exclusive_lock here to make sure not to interfere with a possible reset. Ah thanks, I'll add that.
>> .get_timeline_name = radeon_fence_get_timeline_name, >> .enable_signaling = radeon_fence_enable_signaling, >> .signaled = __radeon_fence_signaled, > Do we still need those callback when we implemented the wait callback? .get_timeline_name is used for debugging (trace events). .signaled is the non-blocking call to check if the fence is signaled or not. .enable_signaling is used for adding callbacks upon fence completion, the default 'fence_default_wait' uses it, so when it works no separate implementation is needed unless you want to do more than just waiting. It's also used when fence_add_callback is called. i915 can be patched to use it. ;-) ~Maarten