Am 18.08.2014 um 17:28 schrieb Maarten Lankhorst: > Op 18-08-14 om 17:12 schreef Christian K?nig: >> Am 18.08.2014 um 16:45 schrieb Maarten Lankhorst: >>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com> >>> --- >>> V1 had a nasty bug breaking gpu lockup recovery. The fix is not >>> allowing radeon_fence_driver_check_lockup to take exclusive_lock, >>> and kill it during lockup recovery instead. >>> V2 used delayed work that ran during lockup recovery, but required >>> read lock. I've fixed this by downgrading the write, and retrying if >>> recovery fails. >>> --- >>> drivers/gpu/drm/radeon/radeon.h | 2 + >>> drivers/gpu/drm/radeon/radeon_fence.c | 115 >>> +++++++++++++++++----------------- >>> 2 files changed, 61 insertions(+), 56 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/radeon/radeon.h >>> b/drivers/gpu/drm/radeon/radeon.h >>> index 1d806983ec7b..29504efe8971 100644 >>> --- a/drivers/gpu/drm/radeon/radeon.h >>> +++ b/drivers/gpu/drm/radeon/radeon.h >>> @@ -355,6 +355,8 @@ struct radeon_fence_driver { >>> uint64_t sync_seq[RADEON_NUM_RINGS]; >>> atomic64_t last_seq; >>> bool initialized; >>> + struct delayed_work fence_check_work; >>> + struct radeon_device *rdev; >> Put the reference to the device as the first field in the structure. >> >>> }; >>> struct radeon_fence { >>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c >>> b/drivers/gpu/drm/radeon/radeon_fence.c >>> index 913787085dfa..94eca53d99f8 100644 >>> --- a/drivers/gpu/drm/radeon/radeon_fence.c >>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c >>> @@ -125,16 +125,7 @@ int radeon_fence_emit(struct radeon_device *rdev, >>> return 0; >>> } >>> -/** >>> - * radeon_fence_process - process a fence >>> - * >>> - * @rdev: radeon_device pointer >>> - * @ring: ring index the fence is associated with >>> - * >>> - * Checks the current fence value and wakes the fence queue >>> - * if the sequence number has increased (all asics). >>> - */ >>> -void radeon_fence_process(struct radeon_device *rdev, int ring) >>> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring) >> Don't use "__" for internal radeon function names and especially don't >> remove the function documentation. > I've moved the documentation to the new radeon_fence_process function that > calls the __radeon_fence_process one, > which is an internal function that is only used by the driver. I guess I can > rename it to __radeon_fence_process_nowake or something instead, > but documentation doesn't make sense since it's not used outside of > radeon_fence.c
We try to document even static functions in doxygen style even if you can't see them outside the C file. >>> { >>> uint64_t seq, last_seq, last_emitted; >>> unsigned count_loop = 0; >>> @@ -190,7 +181,53 @@ void radeon_fence_process(struct radeon_device *rdev, >>> int ring) >>> } >>> } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq); >>> - if (wake) >>> + if (seq < last_emitted && !rdev->in_reset) >>> + mod_delayed_work(system_power_efficient_wq, >>> + &rdev->fence_drv[ring].fence_check_work, >>> + RADEON_FENCE_JIFFIES_TIMEOUT); >> Am I wrong or do you queue the work only after radeon_fence_process is >> called for the first time? >> >> Might be a good idea to have an explicit queue_delayed_work in >> radeon_fence_emit as well. > Yeah might as well, with these changes it makes sense to run it as soon as > possible. > > But if there are no waiters, there's no real need. It's a tree falling in a > forest with no-one around to hear it. ;-) That's why I suggested to schedule the work item only when the IRQ is enabled for waiting, otherwise it indeed doesn't make much sense. Christian.