[PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
> +/* The 64-bit ABI is the authoritative version. */
> +#pragma pack(push, 8)
> +

Don't do this, pad and align things explicitly in structs.

> +struct kfd_ioctl_create_queue_args {
> +	uint64_t ring_base_address;	/* to KFD */
> +	uint32_t ring_size;		/* to KFD */
> +	uint32_t gpu_id;		/* to KFD */
> +	uint32_t queue_type;		/* to KFD */
> +	uint32_t queue_percentage;	/* to KFD */
> +	uint32_t queue_priority;	/* to KFD */
> +	uint64_t write_pointer_address;	/* to KFD */
> +	uint64_t read_pointer_address;	/* to KFD */
> +
> +	uint64_t doorbell_address;	/* from KFD */
> +	uint32_t queue_id;		/* from KFD */
> +};
> +

Maybe put all the uint64_t at the start, or add explicit padding.

Dave.
[Intel-gfx] [v3 09/13] drm/i915: Add rotation property for sprites
On Tue, Jul 08, 2014 at 10:31:59AM +0530, sonika.jindal at intel.com wrote:
> From: Ville Syrjälä
>
> Sprite planes support 180 degree rotation. The lower layers are now in
> place, so hook in the standard rotation property to expose the feature
> to the users.
>
> v2: Moving rotation_property to drm_plane
>
> Cc: dri-devel at lists.freedesktop.org
> Signed-off-by: Ville Syrjälä
> Signed-off-by: Sonika Jindal
> Reviewed-by: Imre Deak

Also, this r-b tag was for v1 (which was ok), not for v2. If you carry
over such a reviewed-by tag and make functional changes not discussed
with the reviewer you _must_ at least mark the r-b with a (v1) or, if
it's a big change, drop the tag completely.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
[Intel-gfx] [v3 09/13] drm/i915: Add rotation property for sprites
On Tue, Jul 08, 2014 at 10:31:59AM +0530, sonika.jindal at intel.com wrote:
> From: Ville Syrjälä
>
> Sprite planes support 180 degree rotation. The lower layers are now in
> place, so hook in the standard rotation property to expose the feature
> to the users.
>
> v2: Moving rotation_property to drm_plane
>
> Cc: dri-devel at lists.freedesktop.org
> Signed-off-by: Ville Syrjälä
> Signed-off-by: Sonika Jindal
> Reviewed-by: Imre Deak
> ---
>  drivers/gpu/drm/i915/intel_sprite.c | 40 ++-
>  include/drm/drm_crtc.h              |  1 +

One more: a patch titled with "drm/i915: ..." really shouldn't touch
anything outside of drm/i915 directories and so shouldn't introduce any
changes to core drm code. Such changes always need to be split out into
a separate drm patch. Exceptions (like refactoring function interfaces)
obviously apply.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
[Bug 73053] dpm hangs with BTC parts
https://bugs.freedesktop.org/show_bug.cgi?id=73053

--- Comment #39 from Alexandre Demers ---

(In reply to comment #38)
> Attachment 102081 [details] fixes the "hard lockup with small vertical
> blue stripes" issue, when applied to 3.15.4, and AFAICS dpm works fine.
>
> The new problem is that I get kernel panic after a few hours if dpm is
> enabled. With the good old profile method the system is stable.

Could you test with the latest kernel 3.16 RC (the patch is already
included)? I have been running kernel 3.16-rc4 with this patch for
Cayman and I don't get any lockups anymore. I've been running my system
for a few days (games, movies and so on) without problem.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 73053] dpm hangs with BTC parts
https://bugs.freedesktop.org/show_bug.cgi?id=73053

--- Comment #40 from Alex Deucher ---

(In reply to comment #38)
> The new problem is that I get kernel panic after a few hours if dpm is
> enabled. With the good old profile method the system is stable.

Can you get a copy of the panic? I think it may be related to the page
flipping changes in the last couple of kernels. It's not likely dpm
would cause a panic.
[PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
Confirmed. The locking functions are removed from the interface in
commit 82:

[PATCH 82/83] drm/radeon: Remove lock functions from kfd2kgd interface

There is an elegant symmetry there, but yeah, we need to find a way to
make this less awkward to review without screwing up all the work
you've done so far. It's not obvious how to do that, though. I looked at
squashing into a smaller number of big commits earlier on, but unless we
completely rip the code out and recreate it from scratch I don't see
anything better than:

- a few foundation commits
- a big code dump that covers everything up to ~patch 54 (with 71
  squashed in)
- remaining commits squashed a bit to combine fixes with initial code

Is that what you had in mind when you said ~10 big commits? Our feeling
was that the need to skip over the original scheduler would make it more
like "one really big commit and 10-20 smaller ones", and I think we all
felt that the "big code dump" required to skip over the original
scheduler would be a non-starter.

I guess there is another option, and maybe that's what you had in mind:
breaking the "big code dump" into smaller commits would be possible if
we were willing to not have working code until we got to the equivalent
of ~patch 54 (+71), when all the new scheduler bits were in. Maybe that
would still be an improvement?

Thanks,
JB

>-----Original Message-----
>From: Bridgman, John
>Sent: Friday, July 11, 2014 1:48 PM
>To: 'Jerome Glisse'; Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Lewycky, Andrew; Joerg Roedel; Gabbay, Oded; Koenig, Christian
>Subject: RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
>
>Checking... we shouldn't need to call the lock from kfd any more. We
>should be able to do any required locking in radeon kgd code.
>
>>-----Original Message-----
>>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>>Sent: Friday, July 11, 2014 12:35 PM
>>To: Oded Gabbay
>>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Bridgman, John; Lewycky, Andrew; Joerg Roedel; Gabbay, Oded; Koenig, Christian
>>Subject: Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
>>
>>On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
>>> This patch adds a new interface to kfd2kgd_calls structure, which
>>> allows the kfd to lock and unlock the srbm_gfx_cntl register
>>
>>Why does kfd needs to lock this register if kfd can not access any of
>>those register ? This sounds broken to me, exposing a driver internal
>>mutex to another driver is not something i am fan of.
>>
>>Cheers,
>>Jérôme
>>
>>>
>>> Signed-off-by: Oded Gabbay
>>> ---
>>>  drivers/gpu/drm/radeon/radeon_kfd.c | 20
>>>  include/linux/radeon_kfd.h          |  4
>>>  2 files changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> index 66ee36b..594020e 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
>>>
>>>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>>
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +
>>> +
>>>  static const struct kfd2kgd_calls kfd2kgd = {
>>>  	.allocate_mem = allocate_mem,
>>>  	.free_mem = free_mem,
>>> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>>  	.kmap_mem = kmap_mem,
>>>  	.unkmap_mem = unkmap_mem,
>>>  	.get_vmem_size = get_vmem_size,
>>> +	.lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
>>> +	.unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>>>  };
>>>
>>>  static const struct kgd2kfd_calls *kgd2kfd;
>>> @@ -233,3 +239,17 @@ static uint64_t get_vmem_size(struct kgd_dev *kgd)
>>>
>>>  	return rdev->mc.real_vram_size;
>>>  }
>>> +
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd)
>>> +{
>>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +	mutex_lock(&rdev->srbm_mutex);
>>> +}
>>> +
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd)
>>> +{
>>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +	mutex_unlock(&rdev->srbm_mutex);
>>> +}
>>> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
>>> index c7997d4..40b691c 100644
>>> --- a/include/linux/radeon_kfd.h
>>> +++ b/include/linux/radeon_kfd.h
>>> @@ -81,6 +81,10 @@ struct kfd2kgd_calls {
>>> 	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>>>
>>> 	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>>> +
>>> +	/* SRBM_GFX_CNTL mutex */
>>> +	void (*lock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>> +	void (*unlock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>>  };
>>>
>>>  bool kgd2kfd_init(unsigned interface_version,
Recall: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
Bridgman, John would like to recall the message, "[PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register".
[Bug 81255] New: EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255

          Priority: medium
            Bug ID: 81255
          Assignee: dri-devel at lists.freedesktop.org
           Summary: EDEADLK with S. Islands APU+dGPU during ib test on
                    ring 5
          Severity: normal
    Classification: Unclassified
                OS: Linux (All)
          Reporter: joshua.r.marshall.1991 at gmail.com
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: XOrg CVS
         Component: DRM/Radeon
           Product: DRI

Created attachment 102648
  --> https://bugs.freedesktop.org/attachment.cgi?id=102648&action=edit
sudo lshw -sanitize

Recently installed an R9 270X in my system. For the life of me, I cannot
determine what is wrong. The BIOS uses it by default and the system uses
it during early boot, but during the VT switch-off my system switches to
APU graphics during a series of GPU page faults. Also, the first attempt
to run startxfce after boot fails, and the following attempts use only
the APU graphics.

sudo lshw -sanitize > http://pastebin.com/tkihM4rH
sudo journalctl -b -kxam > http://pastebin.com/EJcwcYU6
cat /var/log/Xorg.0.log > http://pastebin.com/wCbzws9d
cat /var/log/Xorg.0.log.old > http://pastebin.com/GaAL5Tqa

Xorg is working on auto.
[Bug 81255] EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255

--- Comment #1 from joshua.r.marshall.1991 at gmail.com ---

Created attachment 102649
  --> https://bugs.freedesktop.org/attachment.cgi?id=102649&action=edit
sudo journalctl -b -kxam
[Bug 81255] EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255

--- Comment #2 from joshua.r.marshall.1991 at gmail.com ---

Created attachment 102650
  --> https://bugs.freedesktop.org/attachment.cgi?id=102650&action=edit
Xorg.0.log
[Bug 81255] EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255

--- Comment #3 from joshua.r.marshall.1991 at gmail.com ---

Created attachment 102651
  --> https://bugs.freedesktop.org/attachment.cgi?id=102651&action=edit
Xorg.0.log.old
[Bug 81255] EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255

--- Comment #4 from joshua.r.marshall.1991 at gmail.com ---

I would start at drivers/gpu/drm/radeon/radeon_fence.c:368, since that
is the clause being hit. So on line 364, on iteration 5 (the last),
radeon_ring_is_lockup returns true. The function's definition is beyond
me, so that's where I'll have to leave you off.
[PATCH v2 0/2] drm: rework flip-work framework
Hello,

This patch series reworks the flip-work framework to make it safe when
calling drm_flip_work_queue from atomic contexts.

The 2nd patch of this series is optional, as it only reworks the
drm_flip_work_init prototype to remove the unneeded size argument and
return code (this function cannot fail anymore).

Best Regards,

Boris

Boris BREZILLON (2):
  drm: rework flip-work helpers to avoid calling func when the FIFO is
    full
  drm: flip-work: change drm_flip_work_init prototype

 drivers/gpu/drm/drm_flip_work.c          | 104 ++-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c |  19 ++
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c |  16 +
 drivers/gpu/drm/omapdrm/omap_plane.c     |  14 +
 drivers/gpu/drm/tilcdc/tilcdc_crtc.c     |   6 +-
 include/drm/drm_flip_work.h              |  31 ++---
 6 files changed, 105 insertions(+), 85 deletions(-)

--
1.8.3.2
[PATCH v2 1/2] drm: rework flip-work helpers to avoid calling func when the FIFO is full
Make use of lists instead of kfifo in order to dynamically allocate
task entry when someone require some delayed work, and thus preventing
drm_flip_work_queue from directly calling func instead of queuing this
call.
This allow drm_flip_work_queue to be safely called even within irq
handlers.

Add new helper functions to allocate a flip work task and queue it when
needed. This prevents allocating data within irq context (which might
impact the time spent in the irq handler).

Signed-off-by: Boris BREZILLON
---
 drivers/gpu/drm/drm_flip_work.c | 96 ++---
 include/drm/drm_flip_work.h     | 29 +
 2 files changed, 93 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c
index f9c7fa3..7441aa8 100644
--- a/drivers/gpu/drm/drm_flip_work.c
+++ b/drivers/gpu/drm/drm_flip_work.c
@@ -25,6 +25,44 @@
 #include "drm_flip_work.h"

 /**
+ * drm_flip_work_allocate_task - allocate a flip-work task
+ * @data: data associated to the task
+ * @flags: allocator flags
+ *
+ * Allocate a drm_flip_task object and attach private data to it.
+ */
+struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags)
+{
+	struct drm_flip_task *task;
+
+	task = kzalloc(sizeof(*task), flags);
+	if (task)
+		task->data = data;
+
+	return task;
+}
+EXPORT_SYMBOL(drm_flip_work_allocate_task);
+
+/**
+ * drm_flip_work_queue_task - queue a specific task
+ * @work: the flip-work
+ * @task: the task to handle
+ *
+ * Queues task, that will later be run (passed back to drm_flip_func_t
+ * func) on a work queue after drm_flip_work_commit() is called.
+ */
+void drm_flip_work_queue_task(struct drm_flip_work *work,
+			      struct drm_flip_task *task)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&work->lock, flags);
+	list_add_tail(&task->node, &work->queued);
+	spin_unlock_irqrestore(&work->lock, flags);
+}
+EXPORT_SYMBOL(drm_flip_work_queue_task);
+
+/**
  * drm_flip_work_queue - queue work
  * @work: the flip-work
  * @val: the value to queue
@@ -34,10 +72,14 @@
  */
 void drm_flip_work_queue(struct drm_flip_work *work, void *val)
 {
-	if (kfifo_put(&work->fifo, val)) {
-		atomic_inc(&work->pending);
+	struct drm_flip_task *task;
+
+	task = drm_flip_work_allocate_task(val,
+			drm_can_sleep() ? GFP_KERNEL : GFP_ATOMIC);
+	if (task) {
+		drm_flip_work_queue_task(work, task);
 	} else {
-		DRM_ERROR("%s fifo full!\n", work->name);
+		DRM_ERROR("%s could not allocate task!\n", work->name);
 		work->func(work, val);
 	}
 }
@@ -56,9 +98,12 @@ EXPORT_SYMBOL(drm_flip_work_queue);
 void drm_flip_work_commit(struct drm_flip_work *work,
 		struct workqueue_struct *wq)
 {
-	uint32_t pending = atomic_read(&work->pending);
-	atomic_add(pending, &work->count);
-	atomic_sub(pending, &work->pending);
+	unsigned long flags;
+
+	spin_lock_irqsave(&work->lock, flags);
+	list_splice_tail(&work->queued, &work->commited);
+	INIT_LIST_HEAD(&work->queued);
+	spin_unlock_irqrestore(&work->lock, flags);
 	queue_work(wq, &work->worker);
 }
 EXPORT_SYMBOL(drm_flip_work_commit);
@@ -66,14 +111,26 @@ EXPORT_SYMBOL(drm_flip_work_commit);
 static void flip_worker(struct work_struct *w)
 {
 	struct drm_flip_work *work = container_of(w, struct drm_flip_work, worker);
-	uint32_t count = atomic_read(&work->count);
-	void *val = NULL;
+	struct list_head tasks;
+	unsigned long flags;

-	atomic_sub(count, &work->count);
+	while (1) {
+		struct drm_flip_task *task, *tmp;

-	while(count--)
-		if (!WARN_ON(!kfifo_get(&work->fifo, &val)))
-			work->func(work, val);
+		INIT_LIST_HEAD(&tasks);
+		spin_lock_irqsave(&work->lock, flags);
+		list_splice_tail(&work->commited, &tasks);
+		INIT_LIST_HEAD(&work->commited);
+		spin_unlock_irqrestore(&work->lock, flags);
+
+		if (list_empty(&tasks))
+			break;
+
+		list_for_each_entry_safe(task, tmp, &tasks, node) {
+			work->func(work, task->data);
+			kfree(task);
+		}
+	}
 }

 /**
@@ -91,19 +148,11 @@ static void flip_worker(struct work_struct *w)
 int drm_flip_work_init(struct drm_flip_work *work, int size,
 		const char *name, drm_flip_func_t func)
 {
-	int ret;
-
 	work->name = name;
-	atomic_set(&work->count, 0);
-	atomic_set(&work->pending, 0);
+	INIT_LIST_HEAD(&work->queued);
+	INIT_LIST_HEAD(&work->commited);
 	work->func = func;

-	ret = kfifo_alloc(&work->fifo, size, GFP_KERNEL);
-	if (ret) {
-		DRM_ERROR("could
[PATCH v2 2/2] drm: flip-work: change drm_flip_work_init prototype
Now that we're using lists instead of kfifo to store drm flip-work
tasks we do not need the size parameter passed to the drm_flip_work_init
function anymore. Moreover this function cannot fail anymore, we can
thus remove the return code.

Modify drm_flip_work_init users to take account of these changes.

Signed-off-by: Boris BREZILLON
---
 drivers/gpu/drm/drm_flip_work.c          |  8 +---
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c | 19 ---
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c | 16 +++-
 drivers/gpu/drm/omapdrm/omap_plane.c     | 14 ++
 drivers/gpu/drm/tilcdc/tilcdc_crtc.c     |  6 +-
 include/drm/drm_flip_work.h              |  2 +-
 6 files changed, 12 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c
index 7441aa8..2b557f2 100644
--- a/drivers/gpu/drm/drm_flip_work.c
+++ b/drivers/gpu/drm/drm_flip_work.c
@@ -136,16 +136,12 @@ static void flip_worker(struct work_struct *w)
 /**
  * drm_flip_work_init - initialize flip-work
  * @work: the flip-work to initialize
- * @size: the max queue depth
  * @name: debug name
  * @func: the callback work function
  *
  * Initializes/allocates resources for the flip-work
- *
- * RETURNS:
- * Zero on success, error code on failure.
  */
-int drm_flip_work_init(struct drm_flip_work *work, int size,
+void drm_flip_work_init(struct drm_flip_work *work,
 		const char *name, drm_flip_func_t func)
 {
 	work->name = name;
@@ -154,8 +150,6 @@ int drm_flip_work_init(struct drm_flip_work *work, int size,
 	work->func = func;

 	INIT_WORK(&work->worker, flip_worker);
-
-	return 0;
 }
 EXPORT_SYMBOL(drm_flip_work_init);

diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
index 74cebb5..44d4f93 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
@@ -755,10 +755,8 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev,
 	int ret;

 	mdp4_crtc = kzalloc(sizeof(*mdp4_crtc), GFP_KERNEL);
-	if (!mdp4_crtc) {
-		ret = -ENOMEM;
-		goto fail;
-	}
+	if (!mdp4_crtc)
+		return ERR_PTR(-ENOMEM);

 	crtc = &mdp4_crtc->base;

@@ -779,12 +777,9 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev,

 	spin_lock_init(&mdp4_crtc->cursor.lock);

-	ret = drm_flip_work_init(&mdp4_crtc->unref_fb_work, 16,
+	drm_flip_work_init(&mdp4_crtc->unref_fb_work,
 			"unref fb", unref_fb_worker);
-	if (ret)
-		goto fail;
-
-	ret = drm_flip_work_init(&mdp4_crtc->unref_cursor_work, 64,
+	drm_flip_work_init(&mdp4_crtc->unref_cursor_work,
 			"unref cursor", unref_cursor_worker);

 	INIT_FENCE_CB(&mdp4_crtc->pageflip_cb, pageflip_cb);
@@ -795,10 +790,4 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev,
 	mdp4_plane_install_properties(mdp4_crtc->plane, &crtc->base);

 	return crtc;
-
-fail:
-	if (crtc)
-		mdp4_crtc_destroy(crtc);
-
-	return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
index ebe2e60..a0cb374 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
@@ -537,10 +537,8 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev,
 	int ret;

 	mdp5_crtc = kzalloc(sizeof(*mdp5_crtc), GFP_KERNEL);
-	if (!mdp5_crtc) {
-		ret = -ENOMEM;
-		goto fail;
-	}
+	if (!mdp5_crtc)
+		return ERR_PTR(-ENOMEM);

 	crtc = &mdp5_crtc->base;

@@ -553,10 +551,8 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev,
 	snprintf(mdp5_crtc->name, sizeof(mdp5_crtc->name), "%s:%d",
 			pipe2name(mdp5_plane_pipe(plane)), id);

-	ret = drm_flip_work_init(&mdp5_crtc->unref_fb_work, 16,
+	drm_flip_work_init(&mdp5_crtc->unref_fb_work,
 			"unref fb", unref_fb_worker);
-	if (ret)
-		goto fail;

 	INIT_FENCE_CB(&mdp5_crtc->pageflip_cb, pageflip_cb);

@@ -566,10 +562,4 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev,
 	mdp5_plane_install_properties(mdp5_crtc->plane, &crtc->base);

 	return crtc;
-
-fail:
-	if (crtc)
-		mdp5_crtc_destroy(crtc);
-
-	return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/omapdrm/omap_plane.c b/drivers/gpu/drm/omapdrm/omap_plane.c
index 3cf31ee..847d1ca 100644
--- a/drivers/gpu/drm/omapdrm/omap_plane.c
+++ b/drivers/gpu/drm/omapdrm/omap_plane.c
@@ -397,14 +397,10 @@ struct drm_plane *omap_plane_init(struct drm_device *dev,

 	omap_plane = kzalloc(sizeof(*omap_plane), GFP_KERNEL);
 	if (!omap_plane)
-		goto fail;
+		return NULL;

-	ret = drm_flip_work_init(&omap_plane->unpin_work, 16,
+	drm_fl
[PATCH v2 1/2] drm: rework flip-work helpers to avoid calling func when the FIFO is full
On Sat, 12 Jul 2014 09:00:08 +0200
Boris BREZILLON wrote:

> Make use of lists instead of kfifo in order to dynamically allocate
> task entry when someone require some delayed work, and thus preventing
> drm_flip_work_queue from directly calling func instead of queuing this
> call.
> This allow drm_flip_work_queue to be safely called even within irq
> handlers.
>
> Add new helper functions to allocate a flip work task and queue it when
> needed. This prevents allocating data within irq context (which might
> impact the time spent in the irq handler).
>
> Signed-off-by: Boris BREZILLON
> ---
>  drivers/gpu/drm/drm_flip_work.c | 96 ++---
>  include/drm/drm_flip_work.h     | 29 +
>  2 files changed, 93 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c
> index f9c7fa3..7441aa8 100644
> --- a/drivers/gpu/drm/drm_flip_work.c
> +++ b/drivers/gpu/drm/drm_flip_work.c
> @@ -25,6 +25,44 @@
>  #include "drm_flip_work.h"
>
>  /**
> + * drm_flip_work_allocate_task - allocate a flip-work task
> + * @data: data associated to the task
> + * @flags: allocator flags
> + *
> + * Allocate a drm_flip_task object and attach private data to it.
> + */
> +struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags)
> +{
> +	struct drm_flip_task *task;
> +
> +	task = kzalloc(sizeof(*task), flags);
> +	if (task)
> +		task->data = data;
> +
> +	return task;
> +}
> +EXPORT_SYMBOL(drm_flip_work_allocate_task);
> +
> +/**
> + * drm_flip_work_queue_task - queue a specific task
> + * @work: the flip-work
> + * @task: the task to handle
> + *
> + * Queues task, that will later be run (passed back to drm_flip_func_t
> + * func) on a work queue after drm_flip_work_commit() is called.
> + */
> +void drm_flip_work_queue_task(struct drm_flip_work *work,
> +			      struct drm_flip_task *task)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&work->lock, flags);
> +	list_add_tail(&task->node, &work->queued);
> +	spin_unlock_irqrestore(&work->lock, flags);
> +}
> +EXPORT_SYMBOL(drm_flip_work_queue_task);
> +
> +/**
>  * drm_flip_work_queue - queue work
>  * @work: the flip-work
>  * @val: the value to queue
> @@ -34,10 +72,14 @@
>  */
>  void drm_flip_work_queue(struct drm_flip_work *work, void *val)
>  {
> -	if (kfifo_put(&work->fifo, val)) {
> -		atomic_inc(&work->pending);
> +	struct drm_flip_task *task;
> +
> +	task = drm_flip_work_allocate_task(val,
> +			drm_can_sleep() ? GFP_KERNEL : GFP_ATOMIC);
> +	if (task) {
> +		drm_flip_work_queue_task(work, task);
>  	} else {
> -		DRM_ERROR("%s fifo full!\n", work->name);
> +		DRM_ERROR("%s could not allocate task!\n", work->name);
>  		work->func(work, val);
>  	}
>  }
> @@ -56,9 +98,12 @@ EXPORT_SYMBOL(drm_flip_work_queue);
>  void drm_flip_work_commit(struct drm_flip_work *work,
>  		struct workqueue_struct *wq)
>  {
> -	uint32_t pending = atomic_read(&work->pending);
> -	atomic_add(pending, &work->count);
> -	atomic_sub(pending, &work->pending);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&work->lock, flags);
> +	list_splice_tail(&work->queued, &work->commited);
> +	INIT_LIST_HEAD(&work->queued);
> +	spin_unlock_irqrestore(&work->lock, flags);
>  	queue_work(wq, &work->worker);
>  }
>  EXPORT_SYMBOL(drm_flip_work_commit);
> @@ -66,14 +111,26 @@ EXPORT_SYMBOL(drm_flip_work_commit);
>  static void flip_worker(struct work_struct *w)
>  {
>  	struct drm_flip_work *work = container_of(w, struct drm_flip_work, worker);
> -	uint32_t count = atomic_read(&work->count);
> -	void *val = NULL;
> +	struct list_head tasks;
> +	unsigned long flags;
>
> -	atomic_sub(count, &work->count);
> +	while (1) {
> +		struct drm_flip_task *task, *tmp;
>
> -	while(count--)
> -		if (!WARN_ON(!kfifo_get(&work->fifo, &val)))
> -			work->func(work, val);
> +		INIT_LIST_HEAD(&tasks);
> +		spin_lock_irqsave(&work->lock, flags);
> +		list_splice_tail(&work->commited, &tasks);
> +		INIT_LIST_HEAD(&work->commited);
> +		spin_unlock_irqrestore(&work->lock, flags);
> +
> +		if (list_empty(&tasks))
> +			break;
> +
> +		list_for_each_entry_safe(task, tmp, &tasks, node) {
> +			work->func(work, task->data);
> +			kfree(task);
> +		}
> +	}
>  }
>
>  /**
> @@ -91,19 +148,11 @@ static void flip_worker(struct work_struct *w)
>  int drm_flip_work_init(struct drm_flip_work *work, int size,
>  		const char *name, drm_flip_func_t func)
>  {
> -	int ret;
> -
>  	work->name = name;
> -	atomic_set(&work->count, 0);
> -	atomic_set(&work
[Bug 81255] EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255 --- Comment #5 from Kertesz Laszlo --- You should try disabling the APU from the BIOS. -- You are receiving this mail because: You are the assignee for the bug. -- next part -- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140712/098ed5a3/attachment.html>
[PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
Am 11.07.2014 18:22, schrieb Alex Deucher: > On Fri, Jul 11, 2014 at 12:18 PM, Christian König > wrote: >> Am 11.07.2014 18:05, schrieb Jerome Glisse: >> >>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote: To support HSA on KV, we need to limit the number of vmids and pipes that are available for radeon's use with KV. This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs 0-7) and also makes radeon think that KV has only a single MEC with a single pipe in it Signed-off-by: Oded Gabbay >>> Reviewed-by: Jérôme Glisse >> >> At least for the VMIDs on demand allocation should be trivial to implement, >> so I would rather prefer this instead of a fixed assignment. > IIRC, the way the CP hw scheduler works you have to give it a range of > vmids and it assigns them dynamically as queues are mapped so > effectively they are potentially in use once the CP scheduler is set > up. That's not what I meant. Changing it completely on the fly is nice to have, but we should at least make it configurable as a module parameter. And even if we hardcode it we should use a define for it somewhere instead of hardcoding 8 VMIDs on the KGD side and 8 VMIDs on the KFD side without any relation to each other. Christian. > Alex > > >> Christian.
>> >> --- drivers/gpu/drm/radeon/cik.c | 48 ++-- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c index 4bfc2c0..e0c8052 100644 --- a/drivers/gpu/drm/radeon/cik.c +++ b/drivers/gpu/drm/radeon/cik.c @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device *rdev) /* * KV:2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total +* Nonetheless, we assign only 1 pipe because all other pipes will +* be handled by KFD */ - if (rdev->family == CHIP_KAVERI) - rdev->mec.num_mec = 2; - else - rdev->mec.num_mec = 1; - rdev->mec.num_pipe = 4; + rdev->mec.num_mec = 1; + rdev->mec.num_pipe = 1; rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8; if (rdev->mec.hpd_eop_obj == NULL) { @@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct radeon_device *rdev) /* init the pipes */ mutex_lock(&rdev->srbm_mutex); - for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) { - int me = (i < 4) ? 1 : 2; - int pipe = (i < 4) ? 
i : (i - 4); - eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2); + eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr; - cik_srbm_select(rdev, me, pipe, 0, 0); + cik_srbm_select(rdev, 0, 0, 0, 0); - /* write the EOP addr */ - WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8); - WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8); + /* write the EOP addr */ + WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8); + WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8); - /* set the VMID assigned */ - WREG32(CP_HPD_EOP_VMID, 0); + /* set the VMID assigned */ + WREG32(CP_HPD_EOP_VMID, 0); + + /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */ + tmp = RREG32(CP_HPD_EOP_CONTROL); + tmp &= ~EOP_SIZE_MASK; + tmp |= order_base_2(MEC_HPD_SIZE / 8); + WREG32(CP_HPD_EOP_CONTROL, tmp); - /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */ - tmp = RREG32(CP_HPD_EOP_CONTROL); - tmp &= ~EOP_SIZE_MASK; - tmp |= order_base_2(MEC_HPD_SIZE / 8); - WREG32(CP_HPD_EOP_CONTROL, tmp); - } - cik_srbm_select(rdev, 0, 0, 0, 0); mutex_unlock(&rdev->srbm_mutex); /* init the queues. Just two for now. */ @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib) */ int cik_vm_init(struct radeon_device *rdev) { - /* number of VMs */ - rdev->vm_manager.nvm = 16; + /* +* number of VMs +* VMID 0 is reserved for Graphics +* radeon compute will use VMIDs 1-7 +* KFD will use VMIDs 8-15 +*/ + rdev->vm_manager.nvm = 8; /* base offset of vram pages */ if (rdev->flags & RADEON_IS_IGP) { u64 t
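Christian's point about replacing the two independently hardcoded "8"s with a shared define could be sketched roughly like this — the header and macro names are invented for illustration, not the actual radeon/KFD interface:

```c
#include <assert.h>

/* Hypothetical shared header: one authoritative VMID split that both the
 * radeon graphics driver (KGD) and the KFD side would include, instead of
 * each side hardcoding "8" with no relation to the other. */
#define RADEON_TOTAL_VMIDS	16
#define RADEON_KGD_NUM_VMIDS	8	/* VMIDs 0-7: gfx + radeon compute */
#define RADEON_KFD_NUM_VMIDS	(RADEON_TOTAL_VMIDS - RADEON_KGD_NUM_VMIDS)

/* First VMID owned by KFD (VMIDs 8-15 in the patch above). */
static inline int radeon_kfd_first_vmid(void)
{
	return RADEON_KGD_NUM_VMIDS;
}
```

Turning the split into a module parameter, as suggested, would then only need to touch this one place.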
[PATCH 00/83] AMD HSA kernel driver
Am 11.07.2014 23:18, schrieb Jerome Glisse: > On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote: >> On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote: >>> On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote: This patch set implements a Heterogeneous System Architecture (HSA) driver for radeon-family GPUs. >>> >>> This is just quick comments on few things. Given size of this, people >>> will need to have time to review things. >>> HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to share system resources more effectively via HW features including shared pageable memory, userspace-accessible work queues, and platform-level atomics. In addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea Islands family of GPUs also performs HW-level validation of commands passed in through the queues (aka rings). The code in this patch set is intended to serve both as a sample driver for other HSA-compatible hardware devices and as a production driver for radeon-family processors. The code is architected to support multiple CPUs each with connected GPUs, although the current implementation focuses on a single Kaveri/Berlin APU, and works alongside the existing radeon kernel graphics driver (kgd). AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware functionality between HSA compute and regular gfx/compute (memory, interrupts, registers), while other functionality has been added specifically for HSA compute (hw scheduler for virtualized compute rings). All shared hardware is owned by the radeon graphics driver, and an interface between kfd and kgd allows the kfd to make use of those shared resources, while HSA-specific functionality is managed directly by kfd by submitting packets into an HSA-specific command queue (the "HIQ"). 
During kfd module initialization a char device node (/dev/kfd) is created (surviving until module exit), with ioctls for queue creation & management, and data structures are initialized for managing HSA device topology. The rest of the initialization is driven by calls from the radeon kgd at the following points : - radeon_init (kfd_init) - radeon_exit (kfd_fini) - radeon_driver_load_kms (kfd_device_probe, kfd_device_init) - radeon_driver_unload_kms (kfd_device_fini) During the probe and init processing per-device data structures are established which connect to the associated graphics kernel driver. This information is exposed to userspace via sysfs, along with a version number allowing userspace to determine if a topology change has occurred while it was reading from sysfs. The interface between kfd and kgd also allows the kfd to request buffer management services from kgd, and allows kgd to route interrupt requests to kfd code since the interrupt block is shared between regular graphics/compute and HSA compute subsystems in the GPU. The kfd code works with an open source usermode library ("libhsakmt") which is in the final stages of IP review and should be published in a separate repo over the next few days. The code operates in one of three modes, selectable via the sched_policy module parameter : - sched_policy=0 uses a hardware scheduler running in the MEC block within CP, and allows oversubscription (more queues than HW slots) - sched_policy=1 also uses HW scheduling but does not allow oversubscription, so create_queue requests fail when we run out of HW slots - sched_policy=2 does not use HW scheduling, so the driver manually assigns queues to HW slots by programming registers The "no HW scheduling" option is for debug & new hardware bringup only, so has less test coverage than the other options. 
Default in the current code is "HW scheduling without oversubscription" since that is where we have the most test coverage but we expect to change the default to "HW scheduling with oversubscription" after further testing. This effectively removes the HW limit on the number of work queues available to applications. Programs running on the GPU are associated with an address space through the VMID field, which is translated to a unique PASID at access time via a set of 16 VMID-to-PASID mapping registers. The available VMIDs (currently 16) are partitioned (under control of the radeon kgd) between current gfx/compute and HSA compute, with each getting 8 in the current code. The VMID-to
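The three sched_policy modes described in the cover letter can be modeled as a small decision function — an illustrative sketch of the policy semantics, not the actual kfd code:

```c
#include <assert.h>

enum kfd_sched_policy {
	SCHED_POLICY_HWS = 0,		/* HW scheduler, oversubscription allowed */
	SCHED_POLICY_HWS_NO_OVERSUB = 1,/* HW scheduler, bounded by HW slots */
	SCHED_POLICY_NO_HWS = 2,	/* driver programs HW queue slots itself */
};

/* Returns 1 if a create_queue request can succeed under the given policy. */
static int create_queue_allowed(enum kfd_sched_policy policy,
				int used_slots, int hw_slots)
{
	if (policy == SCHED_POLICY_HWS)
		return 1;		/* oversubscription: no HW slot limit */
	return used_slots < hw_slots;	/* must map onto a free HW slot */
}
```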
[Intel-gfx] [Xen-devel] [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type
On Fri, Jul 11, 2014 at 08:30:59PM +, Tian, Kevin wrote: > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk at oracle.com] > > Sent: Friday, July 11, 2014 12:42 PM > > > > On Fri, Jul 11, 2014 at 08:29:56AM +0200, Daniel Vetter wrote: > > > On Thu, Jul 10, 2014 at 09:08:24PM +, Tian, Kevin wrote: > > > > actually I'm curious whether it's still necessary to __detect__ PCH. > > > > Could > > > > we assume a 1:1 mapping between GPU and PCH, e.g. BDW already hard > > > > code the knowledge: > > > > > > > > } else if (IS_BROADWELL(dev)) { > > > > dev_priv->pch_type = PCH_LPT; > > > > dev_priv->pch_id = > > > > > > INTEL_PCH_LPT_LP_DEVICE_ID_TYPE; > > > > DRM_DEBUG_KMS("This is Broadwell, > > assuming " > > > > "LynxPoint LP PCH\n"); > > > > > > > > Or if there is real usage on non-fixed mapping (not majority), could it > > > > be a > > > > better option to have fixed mapping as a fallback instead of leaving as > > > > PCH_NONE? Then even when Qemu doesn't provide a special tweaked > > PCH, > > > > the majority case just works. > > > > > > I guess we can do it, at least I haven't seen any strange combinations in > > > the wild outside of Intel ... > > > > How big is the QA matrix for this? Would it make sense to just > > include the latest hardware (say going two generations back) > > and ignore the older one? > > suppose minimal or no QA effort on bare metal, if we only conservatively > change the fallback path which is today not supposed to function with > PCH_NONE. so it's only same amount of QA effort as whatever else is > proposed in this passthru upstreaming task. I agree no need to cover > older model, possibly just snb, ivb and hsw, but will leave Tiejun to answer > the overall goal. Yeah, I'd be ok with the approach of using defaults if we can't recognize the pch - if anyone screams we can either quirk or figure something else out. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
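The fallback Daniel agrees to could look roughly like this — a sketch of the idea only, with invented names, not the i915 detection code (the quoted snippet only confirms the Broadwell→LynxPoint LP default):

```c
#include <assert.h>

enum pch_type { PCH_NONE, PCH_CPT, PCH_LPT };

/* Sketch: if probing the ISA bridge fails (e.g. under passthrough with no
 * tweaked virtual PCH), fall back to an assumed 1:1 GPU->PCH default for
 * known generations instead of leaving PCH_NONE. */
static enum pch_type detect_pch(int probed, enum pch_type probed_type,
				int is_broadwell)
{
	if (probed)
		return probed_type;
	if (is_broadwell)
		return PCH_LPT;	/* assumed default mapping, per the hardcoding above */
	return PCH_NONE;	/* unknown generation: keep the old behaviour */
}
```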
[Bug 79074] PRIME with compositing rendering hangs and other rendering issues
https://bugs.freedesktop.org/show_bug.cgi?id=79074 Christoph Haag changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #1 from Christoph Haag --- I think this is not a problem anymore with DRI3 offloading and the other improvements in that commit. http://cgit.freedesktop.org/mesa/mesa/commit/?id=9320c8fea947fd0f6eb723c67f0bdb947e45c4c3
[PATCH 00/83] AMD HSA kernel driver
On Sat, Jul 12, 2014 at 11:24:49AM +0200, Christian König wrote: > Am 11.07.2014 23:18, schrieb Jerome Glisse: > >On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote: > >>On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote: > >>>On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote: > This patch set implements a Heterogeneous System Architecture > (HSA) driver > for radeon-family GPUs. > >>>This is just quick comments on few things. Given size of this, people > >>>will need to have time to review things. > HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to > share > system resources more effectively via HW features including > shared pageable > memory, userspace-accessible work queues, and platform-level > atomics. In > addition to the memory protection mechanisms in GPUVM and > IOMMUv2, the Sea > Islands family of GPUs also performs HW-level validation of > commands passed > in through the queues (aka rings). > The code in this patch set is intended to serve both as a sample > driver for > other HSA-compatible hardware devices and as a production driver > for > radeon-family processors. The code is architected to support > multiple CPUs > each with connected GPUs, although the current implementation > focuses on a > single Kaveri/Berlin APU, and works alongside the existing radeon > kernel > graphics driver (kgd). > AMD GPUs designed for use with HSA (Sea Islands and up) share > some hardware > functionality between HSA compute and regular gfx/compute (memory, > interrupts, registers), while other functionality has been added > specifically for HSA compute (hw scheduler for virtualized > compute rings). > All shared hardware is owned by the radeon graphics driver, and > an interface > between kfd and kgd allows the kfd to make use of those shared > resources, > while HSA-specific functionality is managed directly by kfd by > submitting > packets into an HSA-specific command queue (the "HIQ").
> During kfd module initialization a char device node (/dev/kfd) is > created > (surviving until module exit), with ioctls for queue creation & > management, > and data structures are initialized for managing HSA device > topology. > The rest of the initialization is driven by calls from the radeon > kgd at > the following points : > - radeon_init (kfd_init) > - radeon_exit (kfd_fini) > - radeon_driver_load_kms (kfd_device_probe, kfd_device_init) > - radeon_driver_unload_kms (kfd_device_fini) > During the probe and init processing per-device data structures > are > established which connect to the associated graphics kernel > driver. This > information is exposed to userspace via sysfs, along with a > version number > allowing userspace to determine if a topology change has occurred > while it > was reading from sysfs. > The interface between kfd and kgd also allows the kfd to request > buffer > management services from kgd, and allows kgd to route interrupt > requests to > kfd code since the interrupt block is shared between regular > graphics/compute and HSA compute subsystems in the GPU. > The kfd code works with an open source usermode library > ("libhsakmt") which > is in the final stages of IP review and should be published in a > separate > repo over the next few days. > The code operates in one of three modes, selectable via the > sched_policy > module parameter : > - sched_policy=0 uses a hardware scheduler running in the MEC > block within > CP, and allows oversubscription (more queues than HW slots) > - sched_policy=1 also uses HW scheduling but does not allow > oversubscription, so create_queue requests fail when we run out > of HW slots > - sched_policy=2 does not use HW scheduling, so the driver > manually assigns > queues to HW slots by programming registers > The "no HW scheduling" option is for debug & new hardware bringup > only, so > has less test coverage than the other options. 
Default in the > current code > is "HW scheduling without oversubscription" since that is where > we have the > most test coverage but we expect to change the default to "HW > scheduling > with oversubscription" after further testing. This effectively > removes the > HW limit on the number of work queues available to applications. > Programs running on the GPU are associated with an address space > through the > VMID field, which is translated to a unique PASID at access time > via a set > of 16 VMID-to-PASID mapping registers. The available VMIDs > (currently 16) > are par
[PATCH v2 1/2] drm: rework flip-work helpers to avoid calling func when the FIFO is full
On Sat, Jul 12, 2014 at 3:00 AM, Boris BREZILLON wrote: > Make use of lists instead of kfifo in order to dynamically allocate > task entry when someone require some delayed work, and thus preventing > drm_flip_work_queue from directly calling func instead of queuing this > call. > This allow drm_flip_work_queue to be safely called even within irq > handlers. > > Add new helper functions to allocate a flip work task and queue it when > needed. This prevents allocating data within irq context (which might > impact the time spent in the irq handler). > > Signed-off-by: Boris BREZILLON > --- > drivers/gpu/drm/drm_flip_work.c | 96 > ++--- > include/drm/drm_flip_work.h | 29 + > 2 files changed, 93 insertions(+), 32 deletions(-) > > diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c > index f9c7fa3..7441aa8 100644 > --- a/drivers/gpu/drm/drm_flip_work.c > +++ b/drivers/gpu/drm/drm_flip_work.c > @@ -25,6 +25,44 @@ > #include "drm_flip_work.h" > > /** > + * drm_flip_work_allocate_task - allocate a flip-work task > + * @data: data associated to the task > + * @flags: allocator flags > + * > + * Allocate a drm_flip_task object and attach private data to it. > + */ > +struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags) > +{ > + struct drm_flip_task *task; > + > + task = kzalloc(sizeof(*task), flags); > + if (task) > + task->data = data; > + > + return task; > +} > +EXPORT_SYMBOL(drm_flip_work_allocate_task); > + > +/** > + * drm_flip_work_queue_task - queue a specific task > + * @work: the flip-work > + * @task: the task to handle > + * > + * Queues task, that will later be run (passed back to drm_flip_func_t > + * func) on a work queue after drm_flip_work_commit() is called. 
> + */ > +void drm_flip_work_queue_task(struct drm_flip_work *work, > + struct drm_flip_task *task) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&work->lock, flags); > + list_add_tail(&task->node, &work->queued); > + spin_unlock_irqrestore(&work->lock, flags); > +} > +EXPORT_SYMBOL(drm_flip_work_queue_task); > + > +/** > * drm_flip_work_queue - queue work > * @work: the flip-work > * @val: the value to queue > @@ -34,10 +72,14 @@ > */ > void drm_flip_work_queue(struct drm_flip_work *work, void *val) > { > - if (kfifo_put(&work->fifo, val)) { > - atomic_inc(&work->pending); > + struct drm_flip_task *task; > + > + task = drm_flip_work_allocate_task(val, > + drm_can_sleep() ? GFP_KERNEL : GFP_ATOMIC); > + if (task) { > + drm_flip_work_queue_task(work, task); > } else { > - DRM_ERROR("%s fifo full!\n", work->name); > + DRM_ERROR("%s could not allocate task!\n", work->name); > work->func(work, val); > } > } > @@ -56,9 +98,12 @@ EXPORT_SYMBOL(drm_flip_work_queue); > void drm_flip_work_commit(struct drm_flip_work *work, > struct workqueue_struct *wq) > { > - uint32_t pending = atomic_read(&work->pending); > - atomic_add(pending, &work->count); > - atomic_sub(pending, &work->pending); > + unsigned long flags; > + > + spin_lock_irqsave(&work->lock, flags); > + list_splice_tail(&work->queued, &work->commited); > + INIT_LIST_HEAD(&work->queued); > + spin_unlock_irqrestore(&work->lock, flags); > queue_work(wq, &work->worker); > } > EXPORT_SYMBOL(drm_flip_work_commit); > @@ -66,14 +111,26 @@ EXPORT_SYMBOL(drm_flip_work_commit); > static void flip_worker(struct work_struct *w) > { > struct drm_flip_work *work = container_of(w, struct drm_flip_work, > worker); > - uint32_t count = atomic_read(&work->count); > - void *val = NULL; > + struct list_head tasks; > + unsigned long flags; > > - atomic_sub(count, &work->count); > + while (1) { > + struct drm_flip_task *task, *tmp; > > - while(count--) > - if (!WARN_ON(!kfifo_get(&work->fifo, &val))) > - work->func(work, val); > 
+ INIT_LIST_HEAD(&tasks); > + spin_lock_irqsave(&work->lock, flags); > + list_splice_tail(&work->commited, &tasks); > + INIT_LIST_HEAD(&work->commited); > + spin_unlock_irqrestore(&work->lock, flags); > + > + if (list_empty(&tasks)) > + break; > + > + list_for_each_entry_safe(task, tmp, &tasks, node) { > + work->func(work, task->data); > + kfree(task); > + } > + } > } > > /** > @@ -91,19 +148,11 @@ static void flip_worker(struct work_struct *w) > int drm_flip_work_init(struct drm_flip_work *work, int size, > const char *name, drm_flip_func_t func) > { > - i
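The queue/commit split in this patch boils down to moving tasks between two lists under a spinlock, with the worker splicing everything out before running callbacks. Here is a minimal userspace model of the same pattern — plain C with a pthread mutex standing in for the spinlock and a singly-linked list standing in for list_head (note: pushing to the head reverses order, whereas the kernel code uses list_add_tail to keep FIFO order):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct task { struct task *next; void *data; };

struct flip_work {
	pthread_mutex_t lock;
	struct task *queued;	/* tasks queued, not yet committed */
	struct task *commited;	/* tasks committed, awaiting the worker */
};

/* queue: safe from "irq" context -- only links a preallocated task */
static void work_queue_task(struct flip_work *w, struct task *t)
{
	pthread_mutex_lock(&w->lock);
	t->next = w->queued;
	w->queued = t;
	pthread_mutex_unlock(&w->lock);
}

/* commit: atomically move everything queued so far onto the commited list */
static void work_commit(struct flip_work *w)
{
	pthread_mutex_lock(&w->lock);
	while (w->queued) {
		struct task *t = w->queued;
		w->queued = t->next;
		t->next = w->commited;
		w->commited = t;
	}
	pthread_mutex_unlock(&w->lock);
}

/* worker: splice the commited list out under the lock, then run callbacks
 * outside it (mirroring the list_splice_tail to a local list above) */
static int work_drain(struct flip_work *w, void (*func)(void *))
{
	struct task *tasks;
	int n = 0;

	pthread_mutex_lock(&w->lock);
	tasks = w->commited;
	w->commited = NULL;
	pthread_mutex_unlock(&w->lock);

	while (tasks) {
		struct task *t = tasks;
		tasks = t->next;
		func(t->data);
		free(t);
		n++;
	}
	return n;
}

static int counter;
static void count_cb(void *data) { (void)data; counter++; }

/* end-to-end demo: queue n tasks, commit, drain; returns tasks processed */
static int demo(int ntasks)
{
	struct flip_work w = { PTHREAD_MUTEX_INITIALIZER, NULL, NULL };
	int i;

	for (i = 0; i < ntasks; i++)
		work_queue_task(&w, calloc(1, sizeof(struct task)));
	work_commit(&w);
	return work_drain(&w, count_cb);
}
```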
[Bug 81255] EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255 --- Comment #6 from joshua.r.marshall.1991 at gmail.com --- That is not a BIOS option.
[Bug 73053] dpm hangs with BTC parts
https://bugs.freedesktop.org/show_bug.cgi?id=73053 --- Comment #41 from almos --- (In reply to comment #40) > (In reply to comment #38) > > The new problem is that I get kernel panic after a few hours if dpm is > > enabled. With the good old profile method the system is stable. > > Can you get a copy of the panic? I think it may be related to the page > flipping changes in the last couple kernels. It's not likely dpm would > cause a panic. It seems I spoke too soon. 3.15.4 panics even without dpm.
[Bug 73053] dpm hangs with BTC parts
https://bugs.freedesktop.org/show_bug.cgi?id=73053 --- Comment #42 from almos --- Created attachment 102669 --> https://bugs.freedesktop.org/attachment.cgi?id=102669&action=edit image of panic.jpg
[Bug 73911] Color Banding on radeon
https://bugs.freedesktop.org/show_bug.cgi?id=73911 tomimaki changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #30 from tomimaki --- Well, I think we can close it as fix is now in stable and longterm kernels (except 3.12). :)
[Bug 81255] EDEADLK with S. Islands APU+dGPU during ib test on ring 5
https://bugs.freedesktop.org/show_bug.cgi?id=81255 joshua.r.marshall.1991 at gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |WORKSFORME --- Comment #7 from joshua.r.marshall.1991 at gmail.com --- Re-read the manual. There is an option...albeit described as something else.
[PATCH v3 1/2] drm: rework flip-work helpers to avoid calling func when the FIFO is full
Make use of lists instead of kfifo in order to dynamically allocate task entry when someone require some delayed work, and thus preventing drm_flip_work_queue from directly calling func instead of queuing this call. This allow drm_flip_work_queue to be safely called even within irq handlers. Add new helper functions to allocate a flip work task and queue it when needed. This prevents allocating data within irq context (which might impact the time spent in the irq handler). Signed-off-by: Boris BREZILLON Reviewed-by: Rob Clark --- drivers/gpu/drm/drm_flip_work.c | 97 +++-- include/drm/drm_flip_work.h | 31 + 2 files changed, 96 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c index f9c7fa3..6f4ae5b 100644 --- a/drivers/gpu/drm/drm_flip_work.c +++ b/drivers/gpu/drm/drm_flip_work.c @@ -25,6 +25,44 @@ #include "drm_flip_work.h" /** + * drm_flip_work_allocate_task - allocate a flip-work task + * @data: data associated to the task + * @flags: allocator flags + * + * Allocate a drm_flip_task object and attach private data to it. + */ +struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags) +{ + struct drm_flip_task *task; + + task = kzalloc(sizeof(*task), flags); + if (task) + task->data = data; + + return task; +} +EXPORT_SYMBOL(drm_flip_work_allocate_task); + +/** + * drm_flip_work_queue_task - queue a specific task + * @work: the flip-work + * @task: the task to handle + * + * Queues task, that will later be run (passed back to drm_flip_func_t + * func) on a work queue after drm_flip_work_commit() is called. 
+ */ +void drm_flip_work_queue_task(struct drm_flip_work *work, + struct drm_flip_task *task) +{ + unsigned long flags; + + spin_lock_irqsave(&work->lock, flags); + list_add_tail(&task->node, &work->queued); + spin_unlock_irqrestore(&work->lock, flags); +} +EXPORT_SYMBOL(drm_flip_work_queue_task); + +/** * drm_flip_work_queue - queue work * @work: the flip-work * @val: the value to queue @@ -34,10 +72,14 @@ */ void drm_flip_work_queue(struct drm_flip_work *work, void *val) { - if (kfifo_put(&work->fifo, val)) { - atomic_inc(&work->pending); + struct drm_flip_task *task; + + task = drm_flip_work_allocate_task(val, + drm_can_sleep() ? GFP_KERNEL : GFP_ATOMIC); + if (task) { + drm_flip_work_queue_task(work, task); } else { - DRM_ERROR("%s fifo full!\n", work->name); + DRM_ERROR("%s could not allocate task!\n", work->name); work->func(work, val); } } @@ -56,9 +98,12 @@ EXPORT_SYMBOL(drm_flip_work_queue); void drm_flip_work_commit(struct drm_flip_work *work, struct workqueue_struct *wq) { - uint32_t pending = atomic_read(&work->pending); - atomic_add(pending, &work->count); - atomic_sub(pending, &work->pending); + unsigned long flags; + + spin_lock_irqsave(&work->lock, flags); + list_splice_tail(&work->queued, &work->commited); + INIT_LIST_HEAD(&work->queued); + spin_unlock_irqrestore(&work->lock, flags); queue_work(wq, &work->worker); } EXPORT_SYMBOL(drm_flip_work_commit); @@ -66,14 +111,26 @@ EXPORT_SYMBOL(drm_flip_work_commit); static void flip_worker(struct work_struct *w) { struct drm_flip_work *work = container_of(w, struct drm_flip_work, worker); - uint32_t count = atomic_read(&work->count); - void *val = NULL; + struct list_head tasks; + unsigned long flags; - atomic_sub(count, &work->count); + while (1) { + struct drm_flip_task *task, *tmp; - while(count--) - if (!WARN_ON(!kfifo_get(&work->fifo, &val))) - work->func(work, val); + INIT_LIST_HEAD(&tasks); + spin_lock_irqsave(&work->lock, flags); + list_splice_tail(&work->commited, &tasks); + 
INIT_LIST_HEAD(&work->commited); + spin_unlock_irqrestore(&work->lock, flags); + + if (list_empty(&tasks)) + break; + + list_for_each_entry_safe(task, tmp, &tasks, node) { + work->func(work, task->data); + kfree(task); + } + } } /** @@ -91,19 +148,12 @@ static void flip_worker(struct work_struct *w) int drm_flip_work_init(struct drm_flip_work *work, int size, const char *name, drm_flip_func_t func) { - int ret; - work->name = name; - atomic_set(&work->count, 0); - atomic_set(&work->pending, 0); + INIT_LIST_HEAD(&work->queued); + INIT_LIST_HEAD(&work->commited); + spin_lock_init(&work->lock); work->func = func; - ret = kfifo_alloc(&work->fifo, size, GFP_
[PATCH v3 0/2] drm: rework flip-work framework
Hello, This patch series reworks the flip-work framework to make it safe when calling drm_flip_work_queue from atomic contexts. The 2nd patch of this series is optional, as it only reworks the drm_flip_work_init prototype to remove the unneeded size argument and return code (this function cannot fail anymore). Best Regards, Boris Changes since v2: - add missing spin_lock_init - fix flip utils description Changes since v1: - add gfp flags argument to drm_flip_work_allocate_task function - make drm_flip_work_queue safe when called from atomic context Boris BREZILLON (2): drm: rework flip-work helpers to avoid calling func when the FIFO is full drm: flip-work: change drm_flip_work_init prototype drivers/gpu/drm/drm_flip_work.c | 105 ++- drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c | 19 ++ drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c | 16 + drivers/gpu/drm/omapdrm/omap_plane.c | 14 + drivers/gpu/drm/tilcdc/tilcdc_crtc.c | 6 +- include/drm/drm_flip_work.h | 33 +++--- 6 files changed, 108 insertions(+), 85 deletions(-) -- 1.8.3.2
[PATCH v3 2/2] drm: flip-work: change drm_flip_work_init prototype
Now that we're using lists instead of kfifo to store drm flip-work tasks we do not need the size parameter passed to drm_flip_work_init function anymore. Moreover this function cannot fail anymore, we can thus remove the return code. Modify drm_flip_work_init users to take account of these changes. Signed-off-by: Boris BREZILLON Reviewed-by: Rob Clark --- drivers/gpu/drm/drm_flip_work.c | 8 +--- drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c | 19 --- drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c | 16 +++- drivers/gpu/drm/omapdrm/omap_plane.c | 14 ++ drivers/gpu/drm/tilcdc/tilcdc_crtc.c | 6 +- include/drm/drm_flip_work.h | 2 +- 6 files changed, 12 insertions(+), 53 deletions(-) diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c index 6f4ae5b..43d9b95 100644 --- a/drivers/gpu/drm/drm_flip_work.c +++ b/drivers/gpu/drm/drm_flip_work.c @@ -136,16 +136,12 @@ static void flip_worker(struct work_struct *w) /** * drm_flip_work_init - initialize flip-work * @work: the flip-work to initialize - * @size: the max queue depth * @name: debug name * @func: the callback work function * * Initializes/allocates resources for the flip-work - * - * RETURNS: - * Zero on success, error code on failure. 
*/ -int drm_flip_work_init(struct drm_flip_work *work, int size, +void drm_flip_work_init(struct drm_flip_work *work, const char *name, drm_flip_func_t func) { work->name = name; @@ -155,8 +151,6 @@ int drm_flip_work_init(struct drm_flip_work *work, int size, work->func = func; INIT_WORK(&work->worker, flip_worker); - - return 0; } EXPORT_SYMBOL(drm_flip_work_init); diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c index 74cebb5..44d4f93 100644 --- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c +++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c @@ -755,10 +755,8 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev, int ret; mdp4_crtc = kzalloc(sizeof(*mdp4_crtc), GFP_KERNEL); - if (!mdp4_crtc) { - ret = -ENOMEM; - goto fail; - } + if (!mdp4_crtc) + return ERR_PTR(-ENOMEM); crtc = &mdp4_crtc->base; @@ -779,12 +777,9 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev, spin_lock_init(&mdp4_crtc->cursor.lock); - ret = drm_flip_work_init(&mdp4_crtc->unref_fb_work, 16, + drm_flip_work_init(&mdp4_crtc->unref_fb_work, "unref fb", unref_fb_worker); - if (ret) - goto fail; - - ret = drm_flip_work_init(&mdp4_crtc->unref_cursor_work, 64, + drm_flip_work_init(&mdp4_crtc->unref_cursor_work, "unref cursor", unref_cursor_worker); INIT_FENCE_CB(&mdp4_crtc->pageflip_cb, pageflip_cb); @@ -795,10 +790,4 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev, mdp4_plane_install_properties(mdp4_crtc->plane, &crtc->base); return crtc; - -fail: - if (crtc) - mdp4_crtc_destroy(crtc); - - return ERR_PTR(ret); } diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c index ebe2e60..a0cb374 100644 --- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c +++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c @@ -537,10 +537,8 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev, int ret; mdp5_crtc = kzalloc(sizeof(*mdp5_crtc), GFP_KERNEL); - if (!mdp5_crtc) { - ret = -ENOMEM; - goto fail; - } + if (!mdp5_crtc) + 
return ERR_PTR(-ENOMEM); crtc = &mdp5_crtc->base; @@ -553,10 +551,8 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev, snprintf(mdp5_crtc->name, sizeof(mdp5_crtc->name), "%s:%d", pipe2name(mdp5_plane_pipe(plane)), id); - ret = drm_flip_work_init(&mdp5_crtc->unref_fb_work, 16, + drm_flip_work_init(&mdp5_crtc->unref_fb_work, "unref fb", unref_fb_worker); - if (ret) - goto fail; INIT_FENCE_CB(&mdp5_crtc->pageflip_cb, pageflip_cb); @@ -566,10 +562,4 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev, mdp5_plane_install_properties(mdp5_crtc->plane, &crtc->base); return crtc; - -fail: - if (crtc) - mdp5_crtc_destroy(crtc); - - return ERR_PTR(ret); } diff --git a/drivers/gpu/drm/omapdrm/omap_plane.c b/drivers/gpu/drm/omapdrm/omap_plane.c index 3cf31ee..847d1ca 100644 --- a/drivers/gpu/drm/omapdrm/omap_plane.c +++ b/drivers/gpu/drm/omapdrm/omap_plane.c @@ -397,14 +397,10 @@ struct drm_plane *omap_plane_init(struct drm_device *dev, omap_plane = kzalloc(sizeof(*omap_plane), GFP_KERNEL); if (!omap_plane) - goto fail; + return NULL; - ret = drm_flip_work_init(&omap_plane->unpin_
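The caller-visible effect of this prototype change is that the error path disappears entirely; a before/after sketch with toy stand-ins (not the real drm structures):

```c
#include <assert.h>
#include <string.h>

/* v2: int drm_flip_work_init(work, size, name, func) could fail because
 * kfifo_alloc could fail.  v3: with lists there is nothing to preallocate,
 * so init returns void and every caller drops its "if (ret) goto fail". */
struct toy_flip_work { const char *name; int ready; };

static void toy_flip_work_init(struct toy_flip_work *w, const char *name)
{
	w->name = name;
	w->ready = 1;	/* cannot fail: no allocation involved */
}

static int toy_check(void)
{
	struct toy_flip_work w;

	toy_flip_work_init(&w, "unref fb");	/* no return value to test */
	return w.ready == 1 && strcmp(w.name, "unref fb") == 0;
}
```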
[PATCH] drm/nouveau/disp/dp: drop dead code
Since this commit:

    commit 55f083c33feb7231c7574a64cd01b0477715a370
    Author: Ben Skeggs
    Date:   Tue May 20 10:18:03 2014 +1000

        drm/nouveau/disp/dp: maintain link in response to hpd signal

a few bits of code have been dead. This was noticed by Coverity Scan.

Signed-off-by: Brian Norris
Cc: Ben Skeggs
---
Compile tested only

 drivers/gpu/drm/nouveau/core/engine/disp/dport.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/dport.c b/drivers/gpu/drm/nouveau/core/engine/disp/dport.c
index 5a5b59b21130..0f6fbe020c41 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/dport.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/dport.c
@@ -331,7 +331,6 @@ nouveau_dp_train(struct work_struct *w)
 	struct dp_state _dp = {
 		.outp = outp,
 	}, *dp = &_dp;
-	u32 datarate = 0;
 	int ret;

 	/* bring capabilities within encoder limits */
@@ -345,20 +344,13 @@ nouveau_dp_train(struct work_struct *w)
 		outp->dpcd[1] = outp->base.info.dpconf.link_bw;
 	dp->pc2 = outp->dpcd[2] & DPCD_RC02_TPS3_SUPPORTED;

-	/* restrict link config to the lowest required rate, if requested */
-	if (datarate) {
-		datarate = (datarate / 8) * 10; /* 8B/10B coding overhead */
-		while (cfg[1].rate >= datarate)
-			cfg++;
-	}
-	cfg--;
-
 	/* disable link interrupt handling during link training */
 	nouveau_event_put(outp->irq);

 	/* enable down-spreading and execute pre-train script from vbios */
 	dp_link_train_init(dp, outp->dpcd[3] & 0x01);

+	cfg--;
 	while (ret = -EIO, (++cfg)->rate) {
 		/* select next configuration supported by encoder and sink */
 		while (cfg->nr > (outp->dpcd[2] & DPCD_RC02_MAX_LANE_COUNT) ||
--
1.7.9.5
[PATCH] drm: omapdrm: fix compiler errors
Regular randconfig nightly testing has detected problems with omapdrm.

omapdrm fails to build when the kernel is built to support 64-bit DMA
addresses and/or 64-bit physical addresses due to an assumption about the
width of these types.

Use %pad to print DMA addresses, rather than %x or %Zx (which is even
more wrong than %x). Avoid passing a uint32_t pointer into a function
which expects a dma_addr_t pointer.

drivers/gpu/drm/omapdrm/omap_plane.c: In function 'omap_plane_pre_apply':
drivers/gpu/drm/omapdrm/omap_plane.c:145:2: error: format '%x' expects argument of type 'unsigned int', but argument 5 has type 'dma_addr_t' [-Werror=format]
drivers/gpu/drm/omapdrm/omap_plane.c:145:2: error: format '%x' expects argument of type 'unsigned int', but argument 6 has type 'dma_addr_t' [-Werror=format]
make[5]: *** [drivers/gpu/drm/omapdrm/omap_plane.o] Error 1
drivers/gpu/drm/omapdrm/omap_gem.c: In function 'omap_gem_get_paddr':
drivers/gpu/drm/omapdrm/omap_gem.c:794:4: error: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'dma_addr_t' [-Werror=format]
drivers/gpu/drm/omapdrm/omap_gem.c: In function 'omap_gem_describe':
drivers/gpu/drm/omapdrm/omap_gem.c:991:4: error: format '%Zx' expects argument of type 'size_t', but argument 7 has type 'dma_addr_t' [-Werror=format]
drivers/gpu/drm/omapdrm/omap_gem.c: In function 'omap_gem_init':
drivers/gpu/drm/omapdrm/omap_gem.c:1470:4: error: format '%x' expects argument of type 'unsigned int', but argument 7 has type 'dma_addr_t' [-Werror=format]
make[5]: *** [drivers/gpu/drm/omapdrm/omap_gem.o] Error 1
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c: In function 'dmm_txn_append':
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:226:2: error: passing argument 3 of 'alloc_dma' from incompatible pointer type [-Werror]
make[5]: *** [drivers/gpu/drm/omapdrm/omap_dmm_tiler.o] Error 1
make[5]: Target `__build' not remade because of errors.
make[4]: *** [drivers/gpu/drm/omapdrm] Error 2

Signed-off-by: Russell King
---
 drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 6 --
 drivers/gpu/drm/omapdrm/omap_gem.c | 10 +-
 drivers/gpu/drm/omapdrm/omap_plane.c | 4 ++--
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
index f926b4caf449..56c60552abba 100644
--- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
+++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
@@ -199,7 +199,7 @@ static struct dmm_txn *dmm_txn_init(struct dmm *dmm, struct tcm *tcm)
 static void dmm_txn_append(struct dmm_txn *txn, struct pat_area *area,
 		struct page **pages, uint32_t npages, uint32_t roll)
 {
-	dma_addr_t pat_pa = 0;
+	dma_addr_t pat_pa = 0, data_pa = 0;
 	uint32_t *data;
 	struct pat *pat;
 	struct refill_engine *engine = txn->engine_handle;
@@ -223,7 +223,9 @@ static void dmm_txn_append(struct dmm_txn *txn, struct pat_area *area,
 		.lut_id = engine->tcm->lut_id,
 	};

-	data = alloc_dma(txn, 4*i, &pat->data_pa);
+	data = alloc_dma(txn, 4*i, &data_pa);
+	/* FIXME: what if data_pa is more than 32-bit ? */
+	pat->data_pa = data_pa;

 	while (i--) {
 		int n = i + roll;
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 95dbce286a41..d9f5e5241af4 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -791,7 +791,7 @@ int omap_gem_get_paddr(struct drm_gem_object *obj,
 			omap_obj->paddr = tiler_ssptr(block);
 			omap_obj->block = block;

-			DBG("got paddr: %08x", omap_obj->paddr);
+			DBG("got paddr: %pad", &omap_obj->paddr);
 		}

 		omap_obj->paddr_cnt++;
@@ -985,9 +985,9 @@ void omap_gem_describe(struct drm_gem_object *obj, struct seq_file *m)

 	off = drm_vma_node_start(&obj->vma_node);

-	seq_printf(m, "%08x: %2d (%2d) %08llx %08Zx (%2d) %p %4d",
+	seq_printf(m, "%08x: %2d (%2d) %08llx %pad (%2d) %p %4d",
 			omap_obj->flags, obj->name, obj->refcount.refcount.counter,
-			off, omap_obj->paddr, omap_obj->paddr_cnt,
+			off, &omap_obj->paddr, omap_obj->paddr_cnt,
 			omap_obj->vaddr, omap_obj->roll);

 	if (omap_obj->flags & OMAP_BO_TILED) {
@@ -1467,8 +1467,8 @@ void omap_gem_init(struct drm_device *dev)
 			entry->paddr = tiler_ssptr(block);
 			entry->block = block;

-			DBG("%d:%d: %dx%d: paddr=%08x stride=%d", i, j, w, h,
-					entry->paddr,
+			DBG("%d:%d: %dx%d: paddr=%pad stride=%d", i, j, w, h,
+					&entry->paddr,
 					usergart[i].stride_pfn << PAGE_SHIFT);
 		}
[PATCH] drm: bochs: fix warnings
Regular nightly randconfig build testing discovered these warnings:

drivers/gpu/drm/bochs/bochs_drv.c:100:12: warning: 'bochs_pm_suspend' defined but not used [-Wunused-function]
drivers/gpu/drm/bochs/bochs_drv.c:117:12: warning: 'bochs_pm_resume' defined but not used [-Wunused-function]

Fix these by adding the same condition that SET_SYSTEM_SLEEP_PM_OPS() uses.

Signed-off-by: Russell King
---
There is no maintainers entry for this driver, so I don't know who this
should be sent to.

 drivers/gpu/drm/bochs/bochs_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/bochs/bochs_drv.c b/drivers/gpu/drm/bochs/bochs_drv.c
index 9c13df29fd20..f5e0ead974a6 100644
--- a/drivers/gpu/drm/bochs/bochs_drv.c
+++ b/drivers/gpu/drm/bochs/bochs_drv.c
@@ -97,6 +97,7 @@ static struct drm_driver bochs_driver = {
 /* -- */
 /* pm interface */

+#ifdef CONFIG_PM_SLEEP
 static int bochs_pm_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -131,6 +132,7 @@ static int bochs_pm_resume(struct device *dev)
 	drm_kms_helper_poll_enable(drm_dev);
 	return 0;
 }
+#endif

 static const struct dev_pm_ops bochs_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(bochs_pm_suspend,
--
1.8.3.1
[PATCH] drm: cirrus: fix warnings
Regular nightly randconfig build testing discovered these warnings:

drivers/gpu/drm/cirrus/cirrus_drv.c:79:12: warning: 'cirrus_pm_suspend' defined but not used [-Wunused-function]
drivers/gpu/drm/cirrus/cirrus_drv.c:96:12: warning: 'cirrus_pm_resume' defined but not used [-Wunused-function]

Fix these by adding the same condition that SET_SYSTEM_SLEEP_PM_OPS() uses.

Signed-off-by: Russell King
---
There is no maintainers entry for this driver, so I don't know who this
should be sent to.

 drivers/gpu/drm/cirrus/cirrus_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/cirrus/cirrus_drv.c b/drivers/gpu/drm/cirrus/cirrus_drv.c
index 08ce520f61a5..4516b052cc67 100644
--- a/drivers/gpu/drm/cirrus/cirrus_drv.c
+++ b/drivers/gpu/drm/cirrus/cirrus_drv.c
@@ -76,6 +76,7 @@ static void cirrus_pci_remove(struct pci_dev *pdev)
 	drm_put_dev(dev);
 }

+#ifdef CONFIG_PM_SLEEP
 static int cirrus_pm_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -110,6 +111,7 @@ static int cirrus_pm_resume(struct device *dev)
 	drm_kms_helper_poll_enable(drm_dev);
 	return 0;
 }
+#endif

 static const struct file_operations cirrus_driver_fops = {
 	.owner = THIS_MODULE,
--
1.8.3.1
[PATCH 00/83] AMD HSA kernel driver
On Sat, Jul 12, 2014 at 01:10:32PM +0200, Daniel Vetter wrote:
> On Sat, Jul 12, 2014 at 11:24:49AM +0200, Christian König wrote:
> > Am 11.07.2014 23:18, schrieb Jerome Glisse:
> > >On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
> > >>On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > >>>On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > This patch set implements a Heterogeneous System Architecture
> > (HSA) driver
> > for radeon-family GPUs.
> > >>>This is just quick comments on few things. Given size of this, people
> > >>>will need to have time to review things.
> > HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to
> > share
> > system resources more effectively via HW features including
> > shared pageable
> > memory, userspace-accessible work queues, and platform-level
> > atomics. In
> > addition to the memory protection mechanisms in GPUVM and
> > IOMMUv2, the Sea
> > Islands family of GPUs also performs HW-level validation of
> > commands passed
> > in through the queues (aka rings).
> > The code in this patch set is intended to serve both as a sample
> > driver for
> > other HSA-compatible hardware devices and as a production driver
> > for
> > radeon-family processors. The code is architected to support
> > multiple CPUs
> > each with connected GPUs, although the current implementation
> > focuses on a
> > single Kaveri/Berlin APU, and works alongside the existing radeon
> > kernel
> > graphics driver (kgd).
> > AMD GPUs designed for use with HSA (Sea Islands and up) share
> > some hardware
> > functionality between HSA compute and regular gfx/compute (memory,
> > interrupts, registers), while other functionality has been added
> > specifically for HSA compute (hw scheduler for virtualized
> > compute rings).
> > All shared hardware is owned by the radeon graphics driver, and
> > an interface
> > between kfd and kgd allows the kfd to make use of those shared
> > resources,
> > while HSA-specific functionality is managed directly by kfd by
> > submitting
> > packets into an HSA-specific command queue (the "HIQ").
> > During kfd module initialization a char device node (/dev/kfd) is
> > created
> > (surviving until module exit), with ioctls for queue creation &
> > management,
> > and data structures are initialized for managing HSA device
> > topology.
> > The rest of the initialization is driven by calls from the radeon
> > kgd at
> > the following points :
> > - radeon_init (kfd_init)
> > - radeon_exit (kfd_fini)
> > - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > - radeon_driver_unload_kms (kfd_device_fini)
> > During the probe and init processing per-device data structures
> > are
> > established which connect to the associated graphics kernel
> > driver. This
> > information is exposed to userspace via sysfs, along with a
> > version number
> > allowing userspace to determine if a topology change has occurred
> > while it
> > was reading from sysfs.
> > The interface between kfd and kgd also allows the kfd to request
> > buffer
> > management services from kgd, and allows kgd to route interrupt
> > requests to
> > kfd code since the interrupt block is shared between regular
> > graphics/compute and HSA compute subsystems in the GPU.
> > The kfd code works with an open source usermode library
> > ("libhsakmt") which
> > is in the final stages of IP review and should be published in a
> > separate
> > repo over the next few days.
> > The code operates in one of three modes, selectable via the
> > sched_policy
> > module parameter :
> > - sched_policy=0 uses a hardware scheduler running in the MEC
> > block within
> > CP, and allows oversubscription (more queues than HW slots)
> > - sched_policy=1 also uses HW scheduling but does not allow
> > oversubscription, so create_queue requests fail when we run out
> > of HW slots
> > - sched_policy=2 does not use HW scheduling, so the driver
> > manually assigns
> > queues to HW slots by programming registers
> > The "no HW scheduling" option is for debug & new hardware bringup
> > only, so
> > has less test coverage than the other options. Default in the
> > current code
> > is "HW scheduling without oversubscription" since that is where
> > we have the
> > most test coverage but we expect to change the default to "HW
> > scheduling
> > with oversubscription" after further testing. This effectively
> > removes the
> > HW limit on the number of work queues available to applications.
> > Programs
[RESEND PATCH v3 05/11] drm: add Atmel HLCDC Display Controller support
Hello,

On Mon, 7 Jul 2014 18:42:58 +0200
Boris BREZILLON wrote:

> +int atmel_hlcdc_layer_disable(struct atmel_hlcdc_layer *layer)
> +{
> +	struct atmel_hlcdc_layer_dma_channel *dma = &layer->dma;
> +	unsigned long flags;
> +	int i;
> +
> +	spin_lock_irqsave(&dma->lock, flags);
> +	for (i = 0; i < layer->max_planes; i++) {
> +		if (!dma->cur[i])
> +			break;
> +
> +		dma->cur[i]->ctrl = 0;
> +	}
> +	spin_unlock_irqrestore(&dma->lock, flags);
> +
> +	return 0;
> +}

I'm trying to simplify the hlcdc_layer code, and in order to do that I
need to know what's expected when a user calls plane_disable (or more
exactly, a DRM_IOCTL_MODE_SETPLANE ioctl call with the frame buffer ID
set to 0).

The HLCDC Display Controller supports two types of disable:

1) The plane is disabled at the end of the current frame (this is the
solution I'm using)

2) The plane is disabled right away (I haven't tested it, but I think
this solution could generate some sort of artifacts for a short period
of time, because the framebuffer might be partially displayed)

If solution 1 is chosen, should I wait for the plane to be actually
disabled before returning?

At the moment, I'm not: I'm just asking for the plane to be disabled
and then returning. And this is where some of my complicated code comes
from, because I must handle the case where a user disables the plane and
then re-enables it right away (modetest's cursor test does a lot of
cursor enable/disable in a short period of time, and this is how I
tested all these weird use cases).

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
[RESEND PATCH v3 05/11] drm: add Atmel HLCDC Display Controller support
On Sat, Jul 12, 2014 at 2:16 PM, Boris BREZILLON wrote:
> Hello,
>
> On Mon, 7 Jul 2014 18:42:58 +0200
> Boris BREZILLON wrote:
>
>> +int atmel_hlcdc_layer_disable(struct atmel_hlcdc_layer *layer)
>> +{
>> +	struct atmel_hlcdc_layer_dma_channel *dma = &layer->dma;
>> +	unsigned long flags;
>> +	int i;
>> +
>> +	spin_lock_irqsave(&dma->lock, flags);
>> +	for (i = 0; i < layer->max_planes; i++) {
>> +		if (!dma->cur[i])
>> +			break;
>> +
>> +		dma->cur[i]->ctrl = 0;
>> +	}
>> +	spin_unlock_irqrestore(&dma->lock, flags);
>> +
>> +	return 0;
>> +}
>
> I'm trying to simplify the hlcdc_layer code, and in order to do that I
> need to know what's expected when a user calls plane_disable (or more
> exactly, a DRM_IOCTL_MODE_SETPLANE ioctl call with the frame buffer ID
> set to 0).
>
> The HLCDC Display Controller supports two types of disable:
>
> 1) The plane is disabled at the end of the current frame (this is the
> solution I'm using)
>
> 2) The plane is disabled right away (I haven't tested it, but I think
> this solution could generate some sort of artifacts for a short period
> of time, because the framebuffer might be partially displayed)
>
> If solution 1 is chosen, should I wait for the plane to be actually
> disabled before returning?

For cursor in particular, if you block, it is going to be a massive
slowdown for some apps. I remember at least older gdm would rapidly
flash a spinning cursor. As a result, if you wait for vsync each time,
it would take a couple of minutes to log in!

If #2 works, I'd recommend it. Otherwise you may have to do some of the
same hijinks that I have to do in mdp4_crtc for the cursor.

BR,
-R

> At the moment, I'm not: I'm just asking for the plane to be disabled
> and then returning. And this is where some of my complicated code comes
> from, because I must handle the case where a user disables the plane
> and then re-enables it right away (modetest's cursor test does a lot of
> cursor enable/disable in a short period of time, and this is how I
> tested all these weird use cases).
>
> Best Regards,
>
> Boris
>
> --
> Boris Brezillon, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 73901] Kernel crash after modprobe radeon runpm=1
https://bugzilla.kernel.org/show_bug.cgi?id=73901

Pali Rohár changed:

           What      |Removed  |Added
           Status    |NEW      |RESOLVED
           Resolution|---      |CODE_FIX

--- Comment #17 from Pali Rohár ---
Ok, when auto control is set via

  $ echo auto > /sys/bus/pci/devices/:01:00.0/power/control

the card is automatically turned off when it is not used. When I set on via

  $ echo on > /sys/bus/pci/devices/:01:00.0/power/control

then it is always on.

So it is working as expected; closing this bug as fixed.

--
You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 79591] possible circular locking dependency detected
https://bugzilla.kernel.org/show_bug.cgi?id=79591

--- Comment #5 from Martin Peres ---
Created attachment 142831
  --> https://bugzilla.kernel.org/attachment.cgi?id=142831&action=edit
drm/nouveau/therm: fix a potential deadlock in the therm monitoring code

Sorry for the wait. Can you try to reproduce the issue with this patch?

--
You are receiving this mail because:
You are watching the assignee of the bug.
[PATCH 00/83] AMD HSA kernel driver
On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
> > On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > > On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > > > This patch set implements a Heterogeneous System Architecture
> > > > (HSA) driver
> > > > for radeon-family GPUs.
> > > This is just quick comments on few things. Given size of this,
> > > people
> > > will need to have time to review things.
> > > > HSA allows different processor types (CPUs, DSPs, GPUs,
> > > > etc..) to
> > > > share
> > > > system resources more effectively via HW features including
> > > > shared pageable
> > > > memory, userspace-accessible work queues, and platform-level
> > > > atomics. In
> > > > addition to the memory protection mechanisms in GPUVM and
> > > > IOMMUv2, the Sea
> > > > Islands family of GPUs also performs HW-level validation of
> > > > commands passed
> > > > in through the queues (aka rings).
> > > > The code in this patch set is intended to serve both as a
> > > > sample
> > > > driver for
> > > > other HSA-compatible hardware devices and as a production
> > > > driver
> > > > for
> > > > radeon-family processors. The code is architected to support
> > > > multiple CPUs
> > > > each with connected GPUs, although the current implementation
> > > > focuses on a
> > > > single Kaveri/Berlin APU, and works alongside the existing
> > > > radeon
> > > > kernel
> > > > graphics driver (kgd).
> > > > AMD GPUs designed for use with HSA (Sea Islands and up) share
> > > > some hardware
> > > > functionality between HSA compute and regular gfx/compute
> > > > (memory,
> > > > interrupts, registers), while other functionality has been
> > > > added
> > > > specifically for HSA compute (hw scheduler for virtualized
> > > > compute rings).
> > > > All shared hardware is owned by the radeon graphics driver,
> > > > and
> > > > an interface
> > > > between kfd and kgd allows the kfd to make use of those
> > > > shared
> > > > resources,
> > > > while HSA-specific functionality is managed directly by kfd
> > > > by
> > > > submitting
> > > > packets into an HSA-specific command queue (the "HIQ").
> > > > During kfd module initialization a char device node
> > > > (/dev/kfd) is
> > > > created
> > > > (surviving until module exit), with ioctls for queue
> > > > creation &
> > > > management,
> > > > and data structures are initialized for managing HSA device
> > > > topology.
> > > > The rest of the initialization is driven by calls from the
> > > > radeon
> > > > kgd at
> > > > the following points :
> > > > - radeon_init (kfd_init)
> > > > - radeon_exit (kfd_fini)
> > > > - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > > > - radeon_driver_unload_kms (kfd_device_fini)
> > > > During the probe and init processing per-device data
> > > > structures
> > > > are
> > > > established which connect to the associated graphics kernel
> > > > driver. This
> > > > information is exposed to userspace via sysfs, along with a
> > > > version number
> > > > allowing userspace to determine if a topology change has
> > > > occurred
> > > > while it
> > > > was reading from sysfs.
> > > > The interface between kfd and kgd also allows the kfd to
> > > > request
> > > > buffer
> > > > management services from kgd, and allows kgd to route
> > > > interrupt
> > > > requests to
> > > > kfd code since the interrupt block is shared between regular
> > > > graphics/compute and HSA compute subsystems in the GPU.
> > > > The kfd code works with an open source usermode library
> > > > ("libhsakmt") which
> > > > is in the final stages of IP review and should be published
> > > > in a
> > > > separate
> > > > repo over the next few days.
> > > > The code operates in one of three modes, selectable via the
> > > > sched_policy
> > > > module parameter :
> > > > - sched_policy=0 uses a hardware scheduler running in the MEC
> > > > block within
> > > > CP, and allows oversubscription (more queues than HW slots)
> > > > - sched_policy=1 also uses HW scheduling but does not allow
> > > > oversubscription, so create_queue requests fail when we run
> > > > out
> > > > of HW slots
> > > > - sched_policy=2 does not use HW scheduling, so the driver
> > > > manually assigns
> > > > queues to HW slots by programming registers
> > > > The "no HW scheduling" option is for debug & new hardware
> > > > bringup
> > > > only, so
> > > > has less test coverage than the other options. Default in the
> > > > current code
> > > > is "HW scheduling without oversubscription" since that is
> > > > where
> > > > we have the
> > > > most test coverage but we expect to change the default to "HW
> > > > scheduling
> > > > with oversubscription" after further testing. This
> > > > effectively
> > > >
[PATCH 00/83] AMD HSA kernel driver
On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:
> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> > On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
> > > On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > > > On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > > > > This patch set implements a Heterogeneous System Architecture
> > > > > (HSA) driver
> > > > > for radeon-family GPUs.
> > > > This is just quick comments on few things. Given size of this,
> > > > people
> > > > will need to have time to review things.
> > > > > HSA allows different processor types (CPUs, DSPs, GPUs,
> > > > > etc..) to
> > > > > share
> > > > > system resources more effectively via HW features including
> > > > > shared pageable
> > > > > memory, userspace-accessible work queues, and platform-level
> > > > > atomics. In
> > > > > addition to the memory protection mechanisms in GPUVM and
> > > > > IOMMUv2, the Sea
> > > > > Islands family of GPUs also performs HW-level validation of
> > > > > commands passed
> > > > > in through the queues (aka rings).
> > > > > The code in this patch set is intended to serve both as a
> > > > > sample
> > > > > driver for
> > > > > other HSA-compatible hardware devices and as a production
> > > > > driver
> > > > > for
> > > > > radeon-family processors. The code is architected to support
> > > > > multiple CPUs
> > > > > each with connected GPUs, although the current implementation
> > > > > focuses on a
> > > > > single Kaveri/Berlin APU, and works alongside the existing
> > > > > radeon
> > > > > kernel
> > > > > graphics driver (kgd).
> > > > > AMD GPUs designed for use with HSA (Sea Islands and up) share
> > > > > some hardware
> > > > > functionality between HSA compute and regular gfx/compute
> > > > > (memory,
> > > > > interrupts, registers), while other functionality has been
> > > > > added
> > > > > specifically for HSA compute (hw scheduler for virtualized
> > > > > compute rings).
> > > > > All shared hardware is owned by the radeon graphics driver,
> > > > > and
> > > > > an interface
> > > > > between kfd and kgd allows the kfd to make use of those
> > > > > shared
> > > > > resources,
> > > > > while HSA-specific functionality is managed directly by kfd
> > > > > by
> > > > > submitting
> > > > > packets into an HSA-specific command queue (the "HIQ").
> > > > > During kfd module initialization a char device node
> > > > > (/dev/kfd) is
> > > > > created
> > > > > (surviving until module exit), with ioctls for queue
> > > > > creation &
> > > > > management,
> > > > > and data structures are initialized for managing HSA device
> > > > > topology.
> > > > > The rest of the initialization is driven by calls from the
> > > > > radeon
> > > > > kgd at
> > > > > the following points :
> > > > > - radeon_init (kfd_init)
> > > > > - radeon_exit (kfd_fini)
> > > > > - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > > > > - radeon_driver_unload_kms (kfd_device_fini)
> > > > > During the probe and init processing per-device data
> > > > > structures
> > > > > are
> > > > > established which connect to the associated graphics kernel
> > > > > driver. This
> > > > > information is exposed to userspace via sysfs, along with a
> > > > > version number
> > > > > allowing userspace to determine if a topology change has
> > > > > occurred
> > > > > while it
> > > > > was reading from sysfs.
> > > > > The interface between kfd and kgd also allows the kfd to
> > > > > request
> > > > > buffer
> > > > > management services from kgd, and allows kgd to route
> > > > > interrupt
> > > > > requests to
> > > > > kfd code since the interrupt block is shared between regular
> > > > > graphics/compute and HSA compute subsystems in the GPU.
> > > > > The kfd code works with an open source usermode library
> > > > > ("libhsakmt") which
> > > > > is in the final stages of IP review and should be published
> > > > > in a
> > > > > separate
> > > > > repo over the next few days.
> > > > > The code operates in one of three modes, selectable via the
> > > > > sched_policy
> > > > > module parameter :
> > > > > - sched_policy=0 uses a hardware scheduler running in the MEC
> > > > > block within
> > > > > CP, and allows oversubscription (more queues than HW slots)
> > > > > - sched_policy=1 also uses HW scheduling but does not allow
> > > > > oversubscription, so create_queue requests fail when we run
> > > > > out
> > > > > of HW slots
> > > > > - sched_policy=2 does not use HW scheduling, so the driver
> > > > > manually assigns
> > > > > queues to HW slots by programming registers
> > > > > The "no HW scheduling" option is for debug & new hardware
> > > > > bringup
> > > > > only, so
> > > > > has less test coverage than the other options. Default in the
> > > > > current co