Re: [amdgpu] deadlock

2021-02-03 Thread Bridgman, John
>>Uh, that doesn't work. If you want infinite compute queues you need the
amdkfd model with preempt-ctx dma_fence. If you allow normal cs ioctl to
run forever, you just hang the kernel whenever userspace feels like. Not
just the gpu, the kernel (anything that allocates memory, irrespective of
process can hang). That's no good.

We have moved from using gfx paths to using kfd paths as of the 20.45 release a 
couple of months ago. Not sure if that applies to APUs yet, but if not I would 
expect it to just be a matter of time.
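(For illustration only, a minimal sketch rather than amdgpu's actual code: the gfx 
CS model Daniel describes relies on every dma_fence signalling in bounded time, so 
in-kernel waiters bound the wait and let TDR/GPU reset clean up instead of blocking 
forever.)

/* Minimal sketch, not amdgpu code: bound the fence wait and fall back to
 * GPU reset (TDR) rather than waiting forever on a stuck fence. */
#include <linux/dma-fence.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

static int wait_for_cs_fence(struct dma_fence *fence)
{
	/* interruptible, bounded wait; returns 0 on timeout */
	long r = dma_fence_wait_timeout(fence, true,
					msecs_to_jiffies(10 * 1000));

	if (r == 0)
		return -ETIMEDOUT;	/* TDR window expired: reset and force-complete fences */
	return r < 0 ? r : 0;		/* -ERESTARTSYS etc. propagated to the caller */
}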

Thanks,
John
  Original Message
From: Daniel Vetter
Sent: Wednesday, February 3, 2021 9:27 AM
To: Alex Deucher
Cc: Linux Kernel Mailing List; dri-devel; amd-gfx list; Deucher, Alexander; 
Daniel Gomez; Koenig, Christian
Subject: Re: [amdgpu] deadlock


On Wed, Feb 03, 2021 at 08:56:17AM -0500, Alex Deucher wrote:
> On Wed, Feb 3, 2021 at 7:30 AM Christian König  
> wrote:
> >
> > Am 03.02.21 um 13:24 schrieb Daniel Vetter:
> > > On Wed, Feb 03, 2021 at 01:21:20PM +0100, Christian König wrote:
> > >> Am 03.02.21 um 12:45 schrieb Daniel Gomez:
> > >>> On Wed, 3 Feb 2021 at 10:47, Daniel Gomez  wrote:
> >  On Wed, 3 Feb 2021 at 10:17, Daniel Vetter  wrote:
> > > On Wed, Feb 3, 2021 at 9:51 AM Christian König 
> > >  wrote:
> > >> Am 03.02.21 um 09:48 schrieb Daniel Vetter:
> > >>> On Wed, Feb 3, 2021 at 9:36 AM Christian König 
> > >>>  wrote:
> >  Hi Daniel,
> > 
> >  this is not a deadlock, but rather a hardware lockup.
> > >>> Are you sure? Ime getting stuck in dma_fence_wait has generally good
> > >>> chance of being a dma_fence deadlock. GPU hang should never result 
> > >>> in
> > >>> a forever stuck dma_fence.
> > >> Yes, I'm pretty sure. Otherwise the hardware clocks wouldn't go up 
> > >> like
> > >> this.
> > > Maybe clarifying, could be both. TDR should notice and get us out of
> > > this, but if there's a dma_fence deadlock and we can't re-emit or
> > > force complete the pending things, then we're stuck for good.
> > > -Daniel
> > >
> > >> Question is rather why we end up in the userptr handling for GFX? Our
> > >> ROCm OpenCL stack shouldn't use this.
> > >>
> > >>> Daniel, can you pls re-hang your machine and then dump backtraces of
> > >>> all tasks into dmesg with sysrq-t, and then attach that? Without all
> > >>> the backtraces it's tricky to construct the full dependency chain of
> > >>> what's going on. Also is this plain -rc6, not some more patches on
> > >>> top?
> > >> Yeah, that's still a good idea to have.
> >  Here the full backtrace dmesg logs after the hang:
> >  https://pastebin.com/raw/kzivm2L3
> > 
> >  This is another dmesg log with the backtraces after SIGKILL the matrix 
> >  process:
> >  (I didn't have the sysrq enable at the time):
> >  https://pastebin.com/raw/pRBwGcj1
> > >>> I've now removed all our v4l2 patches and did the same test with the 
> > >>> 'plain'
> > >>> mainline version (-rc6).
> > >>>
> > >>> Reference: 3aaf0a27ffc29b19a62314edd684b9bc6346f9a8
> > >>>
> > >>> Same error, same behaviour. Full dmesg log attached:
> > >>> https://pastebin.com/raw/KgaEf7Y1
> > >>> Note:
> > >>> dmesg with sysrq-t before running the test starts in [  122.016502]
> > >>> sysrq: Show State
> > >>> dmesg with sysrq-t after the test starts in: [  495.587671] sysrq: 
> > >>> Show State
> > >> There is nothing amdgpu related in there except for waiting for the
> > >> hardware.
> > > Yeah, but there's also no other driver that could cause a stuck dma_fence,
> > > so why is reset not cleaning up the mess here? Irrespective of why the gpu
> > > is stuck, the kernel should at least complete all the dma_fences even if
> > > the gpu for some reason is terminally ill ...
> >
> > That's a good question as well. I'm digging 

Re: [bug] Radeon 3900XT not switch to graphic mode on kernel 5.10

2020-12-27 Thread Bridgman, John
[AMD Official Use Only - Internal Distribution Only]

If you want to pick up the firmware directly it is maintained at...

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu


-rw-r--r-- sienna_cichlid_ce.bin     263296
-rw-r--r-- sienna_cichlid_dmcub.bin   80244
-rw-r--r-- sienna_cichlid_me.bin     263424
-rw-r--r-- sienna_cichlid_mec.bin    268592
-rw-r--r-- sienna_cichlid_mec2.bin   268592
-rw-r--r-- sienna_cichlid_pfp.bin    263424
-rw-r--r-- sienna_cichlid_rlc.bin    128592
-rw-r--r-- sienna_cichlid_sdma.bin    34048
-rw-r--r-- sienna_cichlid_smc.bin    247396
-rw-r--r-- sienna_cichlid_sos.bin    215152
-rw-r--r-- sienna_cichlid_ta.bin     333568
-rw-r--r-- sienna_cichlid_vcn.bin    504224

My understanding was that the firmware was also added to Fedora back in 
November but I'm having a tough time finding confirmation of that.
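For reference, the "error -2" in the dmesg quoted below is -ENOENT coming back from 
the kernel firmware loader; a simplified sketch (not amdgpu's exact code) of the 
request that fails when the file is missing from /lib/firmware:

/* Simplified sketch: request_firmware() returns -ENOENT (-2) when
 * amdgpu/sienna_cichlid_sos.bin is not installed under /lib/firmware,
 * which is exactly what the dmesg below shows. */
#include <linux/firmware.h>
#include <linux/device.h>

static int load_sos_firmware(struct device *dev, const struct firmware **fw)
{
	int err = request_firmware(fw, "amdgpu/sienna_cichlid_sos.bin", dev);

	if (err)
		dev_err(dev, "Direct firmware load failed with error %d\n", err);
	return err;	/* caller releases with release_firmware() when done */
}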



From: amd-gfx  on behalf of Mikhail 
Gavrilov 
Sent: December 27, 2020 11:39 AM
To: amd-gfx list ; Linux List Kernel Mailing 
; dri-devel 
Subject: [bug] Radeon 3900XT not switch to graphic mode on kernel 5.10

Hi folks.
I bought myself a gift: a new AMD 6900 XT graphics card to replace the
AMD Radeon VII.
But all the joy was overshadowed by the fact that this video card did not work in Linux.
Output on my boot screen ended with the message "fb0: switching to
amdgpudrmfb from EFI VGA" and the video card did not switch to graphic mode.
https://photos.app.goo.gl/zwpErNrusq9CNyES7

I suppose the root cause of my problem is here:

[3.961326] amdgpu :0b:00.0: Direct firmware load for
amdgpu/sienna_cichlid_sos.bin failed with error -2
[3.961359] amdgpu :0b:00.0: amdgpu: failed to init sos firmware
[3.961433] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
[3.961529] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init
of IP block  failed -2
[3.961549] amdgpu :0b:00.0: amdgpu: amdgpu_device_ip_init failed
[3.961569] amdgpu :0b:00.0: amdgpu: Fatal error during GPU init
[3.961911] amdgpu: probe of :0b:00.0 failed with error -2

Can anybody here help me get the firmware?
my distro: Fedora Rawhide
kernel: 5.10 rc6
mesa: from git 21.0.0 devel

Sorry for the disturbance, and merry Xmas.


--
Best Regards,
Mike Gavrilov.


[PATCH 00/13] drm/amdgpu: Add virtual display feature.

2016-08-04 Thread Bridgman, John
>-Original Message-
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Daniel Vetter
>Sent: Thursday, August 04, 2016 1:23 PM
>To: Alex Deucher
>Cc: Deng, Emily; amd-gfx list; Maling list - DRI developers
>Subject: Re: [PATCH 00/13] drm/amdgpu: Add virtual display feature.
>
>On Thu, Aug 04, 2016 at 12:53:04PM -0400, Alex Deucher wrote:
>> On Thu, Aug 4, 2016 at 12:24 PM, Daniel Vetter  wrote:
>> > On Thu, Aug 04, 2016 at 10:59:38AM -0400, Alex Deucher wrote:
>> >> Adding dri-devel.
>> >>
>> >> This patch set basically adds a driver option to enable virtual
>> >> display hw if the user needs it (e.g., virtualization environments,
>> >> headless cards, pre-silicon environments, etc.).  It looks like a
>> >> regular KMS crtc/encoder/connector and works with existing
>> >> userspace unchanged.
>> >
>> > We autodetect this already for virtualized envirnments and pre-silicon.
>>
>> What do you mean?  What do you do in those cases?
>
>So virtualized (xengt) I think just fakes one display/connector, I didn't 
>really
>check the details tbh. We had code floating around (but not merged) for the
>management console on servers, which injected a special config into iirc the
>VGA port. So a mix of kernel driver and hw/firmware tricks.
>
>Pre-silicon just has a bunch of things to fake enough of a display since not 
>all
>environments have the full display block simulated.
>
>Anyway just wanted to say that we have piles of precendence for faking semi-
>virtual outputs.
>
>> > Not so sure about headless cards, on those we don't bother to expose
>> > anything if there's nothing connected (i.e. no crtc/encoder/plane or
>> > connector objects at all). Why do you want fake outputs in that case?
>>
>> We have some customers that want to run X or other desktops on
>> hardware without display connectors or even chips without display hw
>> at all.
>
>Hm, for that I'd say virtual output/screen in X. Adding a fake output in the
>kernel where there really is nothing at all feels a bit wrong. All the examples
>above mean that the output is actually connected to something on the other
>side (screen on the management console, host OS for xengt or pre-silicon
>simulations).
>
>> > Anyway, if this is just a modparam and disabled by default I don't
>> > see any issue really at all.
>>
>> Yes, this is controlled by a module option and is intended to only be
>> enabled by the user for specific use cases.
>
>Another option for entirely fake outputs would be vkms.ko, similar to
>vgem.ko. With the simple display driver it should be fairly easy to a simple
>fake kms driver with just 1 crtc/encoder/connector/plane, all virtual,
>up&running. Needs a few lines to implement dumb mmap on top of shmem
>(but nothing else, since the driver never reads the buffer), plus prime
>import/export scaffolding. One module option (could even adjust at
>runtime) to configure how many drm_device instance there should be.
>Output configuration could be done by injecting a suitable EDID plus forcing
>the connector state (we have interfaces for that already, and iirc even patches
>to expose them all in sysfs).
>
>I'd say if you really want entirely fake/virtual outputs that go exactly 
>nowhere
>at all, vkms.ko would be the cleanest approach. And that would have lots of
>use-cases outputs of just what you need, for e.g. testing kms helpers/ioctls
>and other nice things.

FWIW I don't think we ever plan to have virtual outputs that go nowhere - the 
display content just might go out via VNC or a compressed streaming interface 
rather than through a local display controller. 
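As a purely hypothetical illustration of the vkms-style idea (sketched against 
current DRM helper headers, not any real driver), an entirely virtual connector 
would just probe as connected and report synthesized modes:

/* Hypothetical sketch: a connector with no hardware behind it, always
 * "connected", reporting synthesized modes instead of parsing an EDID. */
#include <drm/drm_connector.h>
#include <drm/drm_edid.h>
#include <drm/drm_modeset_helper_vtables.h>
#include <drm/drm_probe_helper.h>

static int virt_conn_get_modes(struct drm_connector *connector)
{
	/* no EDID to parse; synthesize standard modes up to 1920x1080 */
	return drm_add_modes_noedid(connector, 1920, 1080);
}

static enum drm_connector_status
virt_conn_detect(struct drm_connector *connector, bool force)
{
	return connector_status_connected;	/* nothing physical to probe */
}

static const struct drm_connector_helper_funcs virt_conn_helper_funcs = {
	.get_modes = virt_conn_get_modes,
};

static const struct drm_connector_funcs virt_conn_funcs = {
	.detect = virt_conn_detect,
	.fill_modes = drm_helper_probe_single_connector_modes,
	.destroy = drm_connector_cleanup,
};

static int virt_conn_init(struct drm_device *dev, struct drm_connector *conn)
{
	int ret = drm_connector_init(dev, conn, &virt_conn_funcs,
				     DRM_MODE_CONNECTOR_VIRTUAL);
	if (ret)
		return ret;
	drm_connector_helper_add(conn, &virt_conn_helper_funcs);
	return 0;
}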

>-Daniel
>
>>
>> Alex
>>
>> > -Daniel
>> >
>> >>
>> >> Alex
>> >>
>> >> On Thu, Aug 4, 2016 at 3:04 AM, Emily Deng 
>wrote:
>> >> > The Virtual Display feature is to fake a display engine in amdgpu kernel
>driver, which allows any other kernel modules or user mode components to
>work as expected even without real display HW. User can get the
>desktop/primary surface through remote desktop tools instead of displaying
>HW associated with the GPU.
>> >> > The virtual display feature is designed for following cases:
>> >> > 1)Headless GPU, which has no display engine, while for some
>> >> > reason the X server is required to initialize in this GPU; 2)GPU
>> >> > with head (display engine) but Video BIOS disables display capability 
>> >> > for
>some reason. For example, SR-IOV virtualization enabled Video BIOS often
>disables display connector. Some S-series Pro-Graphics designed for headless
>computer also disable display capability in Video BIOS; 3)For whatever reason,
>end user wants to enable a virtual display (don’t need HW display 
>capability).
>> >> >
>> >> > Emily Deng (13):
>> >> >   drm/amdgpu: Add virtual connector and encoder macros.
>> >> >   drm/amdgpu: Initialize dce_virtual_ip_funcs
>> >> >   drm/amdgpu: Initialize dce_virtual_display_funcs.
>> >> >   drm/amdgpu: Initialize crtc, pageflip irq funcs
>> >> >   drm/amdgpu: Initialize dce_virtual

Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact on services

2020-03-01 Thread Bridgman, John
[AMD Official Use Only - Internal Distribution Only]

The one suggestion I saw that definitely seemed worth looking at was adding 
download caches if the larger CI systems didn't already have them.

Then again do we know that CI traffic is generating the bulk of the costs ? My 
guess would have been that individual developers and users would be generating 
as much traffic as the CI rigs.


From: amd-gfx  on behalf of Jason 
Ekstrand 
Sent: March 1, 2020 3:18 PM
To: Jacob Lifshay ; Nicolas Dufresne 

Cc: Erik Faye-Lund ; Daniel Vetter 
; Michel Dänzer ; X.Org development 
; amd-gfx list ; wayland 
; X.Org Foundation Board 
; Xorg Members List ; dri-devel 
; Mesa Dev ; 
intel-gfx ; Discussion of the development of 
and with GStreamer 
Subject: Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact 
on services

I don't think we need to worry so much about the cost of CI that we need to 
micro-optimize to get the minimal number of CI runs. We especially shouldn't 
if it begins to impact coffee quality, people's ability to merge patches in a 
timely manner, or visibility into what went wrong when CI fails. I've seen a 
number of suggestions which will do one or both of those things including:

 - Batching merge requests
 - Not running CI on the master branch
 - Shutting off CI
 - Preventing CI on other non-MR branches
 - Disabling CI on WIP MRs
 - I'm sure there are more...

I think there are things we can do to make CI runs more efficient with some 
sort of end-point caching and we can probably find some truly wasteful CI to 
remove. Most of the things in the list above, I've seen presented by people who 
are only lightly involved in the project to my knowledge (no offense to anyone 
intended).  Developers depend on the CI system for their day-to-day work and 
hampering it will only slow down development, reduce code quality, and 
ultimately hurt our customers and community. If we're so desperate as to be 
considering painful solutions which will have a negative impact on development, 
we're better off trying to find more money.

--Jason


On March 1, 2020 13:51:32 Jacob Lifshay  wrote:

One idea for Marge-bot (don't know if you already do this):
Rust-lang has their bot (bors) automatically group together a few merge 
requests into a single merge commit, which it then tests; then, when the tests 
pass, it merges. This could help reduce CI runs to once a day (or some other 
rate). If the tests fail, then it could automatically deduce which one failed, 
by recursive subdivision or similar. There's also a mechanism to adjust 
priority and grouping behavior when the defaults aren't sufficient.

Jacob
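A tiny sketch in C of the recursive-subdivision step described above (hypothetical 
helper, assuming exactly one bad merge request per batch):

/* Hypothetical sketch: if a batched merge fails CI, retest halves of the
 * batch until the offending MR is isolated. run_ci() is an assumed callback
 * returning true when CI passes for the given subset. */
#include <stddef.h>
#include <stdbool.h>

typedef bool (*run_ci_fn)(const int *mr_ids, size_t count);

/* Returns the index of the failing MR, assuming exactly one is bad. */
static size_t find_failing_mr(const int *mr_ids, size_t n, run_ci_fn run_ci)
{
	size_t lo = 0, hi = n;

	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (run_ci(mr_ids + lo, mid - lo))
			lo = mid;	/* first half passes: culprit is in [mid, hi) */
		else
			hi = mid;	/* first half already fails: culprit is in [lo, mid) */
	}
	return lo;
}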


Re: [pull] radeon drm-fixes-4.11

2017-03-29 Thread Bridgman, John
This is a request for Dave to pull changes from Alex's tree into Dave's 
"drm-fixes" tree, which is the last step before it gets sent to Linus.


Dave is the drm subsystem maintainer, and drm-next / drm-fixes branches are 
where code from multiple GPU driver maintainers comes together. Dave would get 
similar requests from Intel, Nouveau developers etc...



From: amd-gfx  on behalf of Panariti, 
David 
Sent: March 29, 2017 1:53 PM
To: Alex Deucher; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; airl...@gmail.com
Cc: Deucher, Alexander
Subject: RE: [pull] radeon drm-fixes-4.11

Hi,

I'm still new to this stuff.
Is this informational or some action items?

thanks,
davep

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Alex Deucher
> Sent: Wednesday, March 29, 2017 12:55 PM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org;
> airl...@gmail.com
> Cc: Deucher, Alexander 
> Subject: [pull] radeon drm-fixes-4.11
>
> Hi Dave,
>
> One small fix for radeon.
>
> The following changes since commit
> d64a04720b0e64c1cd0726a3a27b360822fbee22:
>
>   Merge branch 'drm-fixes-4.11' of git://people.freedesktop.org/~agd5f/linux
> into drm-fixes (2017-03-24 11:05:06 +1000)
>
> are available in the git repository at:
>
>   git://people.freedesktop.org/~agd5f/linux drm-fixes-4.11
>
> for you to fetch changes up to
> ce4b4f228e51219b0b79588caf73225b08b5b779:
>
>   drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags
> (2017-03-27 16:17:30 -0400)
>
> 
> Michel Dänzer (1):
>   drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags
>
>  drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)


Enable AMDGPU for CIK by default

2016-11-07 Thread Bridgman, John

>-Original Message-
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Michel Dänzer
>Sent: Monday, November 07, 2016 2:24 AM
>To: Sandeep
>Cc: dri-devel at lists.freedesktop.org
>Subject: Re: Enable AMDGPU for CIK by default
>
>On 07/11/16 03:56 AM, Sandeep wrote:
>> Hello,
>>
>> I was wondering when DRM_AMDGPU_CIK would be turned on by default
>in
>> the upstream kernel (or is this upto individual distros?)
>>
>> Is there any work left to be done/bugs to be fixed before it can be
>> enabled by default?
>
>There are still some functional regressions, notably amdgpu doesn't support
>HDMI/DP audio yet.
>
>Also, simply enabling DRM_AMDGPU_CIK by default isn't a good solution,
>since the radeon driver also still supports the CIK devices, and there's no
>good mechanism to choose which driver gets to drive a particular GPU at
>runtime.

Right... we would need corresponding logic to disable CIK support in radeon. 

Would we need two flags, one for each driver, or could we define a flag at
drivers/gpu/drm level which would choose between radeon and amdgpu for
CIK hardware ? Even as I type that I don't like it... so two flags I guess.
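Purely as an illustration of the two-flags idea (a sketch, not how anything is 
actually implemented), radeon could compile out its CIK PCI IDs whenever amdgpu's 
CIK option is enabled, so only one kernel driver ever claims those devices:

/* Hypothetical sketch: gate radeon's CIK PCI IDs on amdgpu's Kconfig option
 * so the two kernel drivers never try to bind to the same ASIC. */
#include <linux/pci.h>

static const struct pci_device_id radeon_cik_ids[] = {
#ifndef CONFIG_DRM_AMDGPU_CIK
	/* CIK parts stay with radeon only when amdgpu's CIK support is off */
	{ 0x1002, 0x1304, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 /* CHIP_KAVERI */ },
	/* ... remaining Kaveri/Bonaire/Hawaii/Kabini/Mullins IDs ... */
#endif
	{ 0, }
};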

>
>
>--
>Earthling Michel Dänzer   |   http://www.amd.com
>Libre software enthusiast | Mesa and X developer


RE: Slower 3D with kernel 4.11.x

2017-06-01 Thread Bridgman, John
Hmm... more powerplay error messages than I am used to seeing, plus a bunch of 
GPUVM faults, plus a stack trace. 

My first thought would be to ask if you could go back to the previous kernel, 
boot up and send a dmesg from that to see how many of those error messages are 
new. 

>-Original Message-
>From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf
>Of Daniel Mota Leite
>Sent: Thursday, June 01, 2017 4:18 PM
>To: Alex Deucher
>Cc: DRI Development
>Subject: Re: Slower 3D with kernel 4.11.x
>
>On Thu, 1 Jun 2017 08:44:31 -0400, Alex Deucher 
>wrote:
>> > I upgraded a few weeks ago to kernel 4.11.x from 4.10.x and
>> > notice a drop in performance in my AMD RX480, using mesa 17.2-dev
>> > and a A10-7890k APU
>> Please attach your dmesg output.
>
>   Dmesg attached.
>
>   Thanks for the help
>higuita
>--
>Naturally the common people don't want war... but after all it is the leaders 
>of
>a country who determine the policy, and it is always a simple matter to drag
>the people along, whether it is a democracy, or a fascist dictatorship, or a
>parliament, or a communist dictatorship.
>Voice or no voice, the people can always be brought to the bidding of the
>leaders. That is easy. All you have to do is tell them they are being attacked,
>and denounce the pacifists for lack of patriotism and exposing the country to
>danger.  It works the same in every country.
>   -- Hermann Goering, Nazi and war criminal, 1883-1946


RE: Slower 3D with kernel 4.11.x

2017-06-02 Thread Bridgman, John

>-Original Message-
>From: Daniel Mota Leite [mailto:dan...@motaleite.net]
>Sent: Friday, June 02, 2017 9:47 PM
>To: Bridgman, John
>Cc: Alex Deucher; DRI Development
>Subject: Re: Slower 3D with kernel 4.11.x
>
>On Thu, 1 Jun 2017 21:09:03 +, "Bridgman, John"
> wrote:
>> Hmm... more powerplay error messages than I am used to seeing, plus a
>> bunch of GPUVM faults, plus a stack trace.
>>
>> My first thought would be to ask if you could go back to the previous
>> kernel, boot up and send a dmesg from that to see how many of those
>> error messages are new.
>
>   See attached file in kernel 4.10.12, boot and after running mad max
>vulkan benchmark , opengl benchmark and then war thunder benchmark.

OK, no powerplay messages that time. Was performance back to what you expected ?

>
>   I would say that the gpu faults are happening in vulkan.
>I can try to update mesa to the latest git or downgrade to 17.1 if you want or
>change the libdrm version (2.4.80 right now)
>
>Thanks
>higuita
>--
>Naturally the common people don't want war... but after all it is the leaders 
>of
>a country who determine the policy, and it is always a simple matter to drag
>the people along, whether it is a democracy, or a fascist dictatorship, or a
>parliament, or a communist dictatorship.
>Voice or no voice, the people can always be brought to the bidding of the
>leaders. That is easy. All you have to do is tell them they are being attacked,
>and denounce the pacifists for lack of patriotism and exposing the country to
>danger.  It works the same in every country.
>   -- Hermann Goering, Nazi and war criminal, 1883-1946


amdgpu/radeonsi support for mobile FirePro?

2016-01-08 Thread Bridgman, John
I'm pretty sure that the 5170M and 5130M are both based on the Cape Verde GPU, 
so they would use the radeon stack rather than the amdgpu stack. Note that 
"radeonsi" refers to the Mesa GL driver, which is used with both radeon and 
amdgpu - it's primarily the kernel driver that is different between the stacks.

You can run lspci -nn to get the PCI IDs for the parts (if you have access to 
HW). If not, I believe the DID for 5170M was 0x6820 but it can vary from one 
OEM to the next. Couldn't find a reference ID for the 5130M but 6821 might be 
it.

Support for the parts should be pretty solid although if the IDs aren't yet in 
the driver source then the drivers won't recognize them until that is changed.
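Illustrative only, since the IDs above are unconfirmed: if 0x6820/0x6821 do turn 
out to be these parts, recognizing them amounts to Cape Verde entries along these 
lines in radeon's PCI ID table; the Mesa (radeonsi) side needs no per-device change.

/* Illustrative sketch, IDs unconfirmed: Cape Verde (SI) entries for the
 * mobile FirePro parts in radeon's PCI ID table. The driver_data value is a
 * stand-in for the real CHIP_VERDE family flags. */
#include <linux/pci.h>

#define FIREPRO_VERDE_DATA	0	/* stands in for CHIP_VERDE|RADEON_NEW_MEMMAP */

static const struct pci_device_id firepro_mobile_ids[] = {
	{ 0x1002, 0x6820, PCI_ANY_ID, PCI_ANY_ID, 0, 0, FIREPRO_VERDE_DATA },
	{ 0x1002, 0x6821, PCI_ANY_ID, PCI_ANY_ID, 0, 0, FIREPRO_VERDE_DATA },
	{ 0, }
};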

From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
Alex G.S.
Sent: Friday, January 08, 2016 3:01 PM
To: dri-devel at lists.freedesktop.org
Subject: AMD: amdgpu/radeonsi support for mobile FirePro?

Dear Radeon Devs,

What's the support status of the  AMD FirePro W5170M and AMD FirePro W5130M.  
I'm confused as to whether these will support 'radeonsi' or 'amdgpu'.  After 
doing exhaustive research I've come across codenames like Tropo XT/LE and Mars 
XT/LE.  On some forums a pci-id was mentioned '6821' but I can't find this 
anywhere in the pcidb lists in either 'xf86-video-amdgpu' or 'xf86-video-ati'.

When I looked at the Radeon feature matrix on [1] I was unable to categorize 
these two GPU models into any of the categories S.Islands, C.Islands or 
V.Islands.  So are these GPUs supported by 'radeonsi' or by 'amdgpu'?  If 
they're 'radeonsi', what's the support like for those models? Can I expect 
relatively good performance?

Thank you!

--- Alex G.S.

[1] http://xorg.freedesktop.org/wiki/RadeonFeature/



[Bug 91880] Radeonsi on Grenada cards (r9 390) exceptionally unstable and poorly performing

2016-07-20 Thread Bridgman, John
Yep, agree. Will see if I can get that documented. Thanks !!

From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf 
Of bugzilla-dae...@freedesktop.org
Sent: Tuesday, July 19, 2016 11:23 PM
To: dri-devel at lists.freedesktop.org
Subject: [Bug 91880] Radeonsi on Grenada cards (r9 390) exceptionally unstable 
and poorly performing

Comment # 112 on bug 91880 from Chris Waters

> Yep, that's a fair point. I was just trying to make sure we were collecting
> good data. Thanks.

I'm more than willing to help test, just need directions. Having everything
split up over all these comments is messy and makes this nigh impossible to
figure out for those not familiar with the process.

You are receiving this mail because:
  *   You are the assignee for the bug.



[PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV

2014-07-11 Thread Bridgman, John


>-Original Message-
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Alex Deucher
>Sent: Friday, July 11, 2014 12:23 PM
>To: Koenig, Christian
>Cc: Oded Gabbay; Lewycky, Andrew; LKML; Maling list - DRI developers;
>Deucher, Alexander
>Subject: Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and
>pipes in KV
>
>On Fri, Jul 11, 2014 at 12:18 PM, Christian K?nig 
>wrote:
>> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>>
>>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:

 To support HSA on KV, we need to limit the number of vmids and pipes
 that are available for radeon's use with KV.

 This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
 0-7) and also makes radeon thinks that KV has only a single MEC with
 a single pipe in it

 Signed-off-by: Oded Gabbay 
>>>
>>> Reviewed-by: J?r?me Glisse 
>>
>>
>> At least fro the VMIDs on demand allocation should be trivial to
>> implement, so I would rather prefer this instead of a fixed assignment.
>
>IIRC, the way the CP hw scheduler works you have to give it a range of vmids
>and it assigns them dynamically as queues are mapped so effectively they
>are potentially in use once the CP scheduler is set up.
>
>Alex

Right. The SET_RESOURCES packet (kfd_pm4_headers.h, added in patch 49) 
allocates a range of HW queues, VMIDs and GDS to the HW scheduler, then the 
scheduler uses the allocated VMIDs to support a potentially larger number of 
user processes by dynamically mapping PASIDs to VMIDs and memory queue 
descriptors (MQDs) to HW queues.
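
A simplified, hypothetical view of that packet (field names invented here; the real 
layout is in kfd_pm4_headers.h):

/* Hypothetical, simplified layout - the real definition lives in
 * kfd_pm4_headers.h. The point is that a single packet hands the HW
 * scheduler a pool of VMIDs, HW queues and GDS, which it then maps to user
 * processes (PASIDs) on its own. */
#include <linux/types.h>

struct pm4_set_resources_sketch {
	u32 header;		/* PM4 type-3 header with the SET_RESOURCES opcode */
	u32 vmid_mask;		/* e.g. 0xff00: VMIDs 8-15 handed to the scheduler */
	u64 queue_mask;		/* which MEC HW queues the scheduler may use */
	u32 gds_heap_base;	/* GDS carve-out given to the scheduler */
	u32 gds_heap_size;
};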

BTW Oded, I think we have some duplicated defines at the end of 
kfd_pm4_headers.h; if they are really duplicates it would be great to remove 
those before the pull request.

Thanks,
JB

>
>
>>
>> Christian.
>>
>>
>>>
 ---
   drivers/gpu/drm/radeon/cik.c | 48
 ++--
   1 file changed, 24 insertions(+), 24 deletions(-)

 diff --git a/drivers/gpu/drm/radeon/cik.c
 b/drivers/gpu/drm/radeon/cik.c index 4bfc2c0..e0c8052 100644
 --- a/drivers/gpu/drm/radeon/cik.c
 +++ b/drivers/gpu/drm/radeon/cik.c
 @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device
 *rdev)
 /*
  * KV:2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
  * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues
 total
 +* Nonetheless, we assign only 1 pipe because all other
 + pipes
 will
 +* be handled by KFD
  */
 -   if (rdev->family == CHIP_KAVERI)
 -   rdev->mec.num_mec = 2;
 -   else
 -   rdev->mec.num_mec = 1;
 -   rdev->mec.num_pipe = 4;
 +   rdev->mec.num_mec = 1;
 +   rdev->mec.num_pipe = 1;
 rdev->mec.num_queue = rdev->mec.num_mec * rdev-
>>mec.num_pipe * 8;
 if (rdev->mec.hpd_eop_obj == NULL) { @@ -4809,28 +4808,24 @@
 static int cik_cp_compute_resume(struct radeon_device *rdev)
 /* init the pipes */
 mutex_lock(&rdev->srbm_mutex);
 -   for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
 -   int me = (i < 4) ? 1 : 2;
 -   int pipe = (i < 4) ? i : (i - 4);
   - eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i *
 MEC_HPD_SIZE * 2);
 +   eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
   - cik_srbm_select(rdev, me, pipe, 0, 0);
 +   cik_srbm_select(rdev, 0, 0, 0, 0);
   - /* write the EOP addr */
 -   WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
 -   WREG32(CP_HPD_EOP_BASE_ADDR_HI,
 upper_32_bits(eop_gpu_addr) >> 8);
 +   /* write the EOP addr */
 +   WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
 +   WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>upper_32_bits(eop_gpu_addr)
 + >>
 8);
   - /* set the VMID assigned */
 -   WREG32(CP_HPD_EOP_VMID, 0);
 +   /* set the VMID assigned */
 +   WREG32(CP_HPD_EOP_VMID, 0);
 +
 +   /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
 +   tmp = RREG32(CP_HPD_EOP_CONTROL);
 +   tmp &= ~EOP_SIZE_MASK;
 +   tmp |= order_base_2(MEC_HPD_SIZE / 8);
 +   WREG32(CP_HPD_EOP_CONTROL, tmp);
   - /* set the EOP size, register value is 2^(EOP_SIZE+1)
 dwords */
 -   tmp = RREG32(CP_HPD_EOP_CONTROL);
 -   tmp &= ~EOP_SIZE_MASK;
 -   tmp |= order_base_2(MEC_HPD_SIZE / 8);
 -   WREG32(CP_HPD_EOP_CONTROL, tmp);
 -   }
 -   cik_srbm_select(rdev, 0, 0, 0, 0);
 mutex_unlock(&rdev->srbm_mutex);
 /* init the queues.  Just two for now. */ @@ -5876,8
 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct
 radeon

[PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register

2014-07-11 Thread Bridgman, John
Checking... we shouldn't need to call the lock from kfd any more. We should be 
able to do any required locking in radeon kgd code.
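
Something along these lines (a sketch only, callback name invented): the kgd-side 
callback takes srbm_mutex internally, so kfd never sees the radeon mutex at all.

/* Sketch only: keep srbm_mutex private to radeon by taking it inside the
 * kgd-side callback itself instead of exporting lock/unlock calls to kfd.
 * The callback name and register programming here are illustrative. */
static void kgd_program_queue_regs(struct kgd_dev *kgd, u32 me, u32 pipe,
				   u32 queue, u32 vmid)
{
	struct radeon_device *rdev = (struct radeon_device *)kgd;

	mutex_lock(&rdev->srbm_mutex);
	cik_srbm_select(rdev, me, pipe, queue, vmid);	/* SRBM window now points at our queue */
	/* ... program the per-queue registers here ... */
	cik_srbm_select(rdev, 0, 0, 0, 0);
	mutex_unlock(&rdev->srbm_mutex);
}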

>-Original Message-
>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>Sent: Friday, July 11, 2014 12:35 PM
>To: Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; Bridgman, John; Lewycky, Andrew; Joerg
>Roedel; Gabbay, Oded; Koenig, Christian
>Subject: Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking
>srbm_gfx_cntl register
>
>On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
>> This patch adds a new interface to kfd2kgd_calls structure, which
>> allows the kfd to lock and unlock the srbm_gfx_cntl register
>
>Why does kfd needs to lock this register if kfd can not access any of those
>register ? This sounds broken to me, exposing a driver internal mutex to
>another driver is not something i am fan of.
>
>Cheers,
>J?r?me
>
>>
>> Signed-off-by: Oded Gabbay 
>> ---
>>  drivers/gpu/drm/radeon/radeon_kfd.c | 20 
>>  include/linux/radeon_kfd.h  |  4 
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c
>> b/drivers/gpu/drm/radeon/radeon_kfd.c
>> index 66ee36b..594020e 100644
>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd, struct
>> kgd_mem *mem);
>>
>>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>
>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd); static void
>> +unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
>> +
>> +
>>  static const struct kfd2kgd_calls kfd2kgd = {
>>  .allocate_mem = allocate_mem,
>>  .free_mem = free_mem,
>> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>  .kmap_mem = kmap_mem,
>>  .unkmap_mem = unkmap_mem,
>>  .get_vmem_size = get_vmem_size,
>> +.lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
>> +.unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>>  };
>>
>>  static const struct kgd2kfd_calls *kgd2kfd; @@ -233,3 +239,17 @@
>> static uint64_t get_vmem_size(struct kgd_dev *kgd)
>>
>>  return rdev->mc.real_vram_size;
>>  }
>> +
>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd) {
>> +struct radeon_device *rdev = (struct radeon_device *)kgd;
>> +
>> +mutex_lock(&rdev->srbm_mutex);
>> +}
>> +
>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd) {
>> +struct radeon_device *rdev = (struct radeon_device *)kgd;
>> +
>> +mutex_unlock(&rdev->srbm_mutex);
>> +}
>> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
>> index c7997d4..40b691c 100644
>> --- a/include/linux/radeon_kfd.h
>> +++ b/include/linux/radeon_kfd.h
>> @@ -81,6 +81,10 @@ struct kfd2kgd_calls {
>>  void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>>
>>  uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>> +
>> +/* SRBM_GFX_CNTL mutex */
>> +void (*lock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>> +void (*unlock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>  };
>>
>>  bool kgd2kfd_init(unsigned interface_version,
>> --
>> 1.9.1
>>


[PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs

2014-07-11 Thread Bridgman, John


>-Original Message-
>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>Sent: Friday, July 11, 2014 1:04 PM
>To: Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; Bridgman, John; Lewycky, Andrew; Joerg
>Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon Vijay
>Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada; Santosh
>Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>AMD's GPUs
>
>On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>> This patch adds the code base of the hsa driver for
>> AMD's GPUs.
>>
>> This driver is called kfd.
>>
>> This initial version supports the first HSA chip, Kaveri.
>>
>> This driver is located in a new directory structure under drivers/gpu.
>>
>> Signed-off-by: Oded Gabbay 
>
>There is too coding style issues. While we have been lax on the enforcing the
>scripts/checkpatch.pl rules i think there is a limit to that. I am not strict
>on the 80chars per line but others things needs fixing so we stay inline.
>
>Also i am a bit worried about the license, given top comment in each of the
>files i am not sure this is GPL2 compatible. I would need to ask lawyer to
>review that.
>

Hi Jerome,

Which line in the license are you concerned about ? In theory we're using the 
same license as the initial code pushes for radeon, and I just did a side-by-side 
compare with the license header on cik.c in the radeon tree and confirmed 
that the two licenses are identical.

The cik.c header has an additional "Authors:" line which the kfd files do not, 
but AFAIK that is not part of the license text proper.

JB


[PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs

2014-07-11 Thread Bridgman, John


>-Original Message-
>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>Sent: Friday, July 11, 2014 2:11 PM
>To: Bridgman, John
>Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Lewycky, Andrew;
>Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
>Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
>Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>AMD's GPUs
>
>On Fri, Jul 11, 2014 at 06:02:39PM +, Bridgman, John wrote:
>> >From: Jerome Glisse [mailto:j.glisse at gmail.com]
>> >Sent: Friday, July 11, 2014 1:04 PM
>> >To: Oded Gabbay
>> >Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org;
>> >dri- devel at lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
>> >Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
>> >Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp
>> >Zabel
>> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>> >for AMD's GPUs
>> >
>> >On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>> >> This patch adds the code base of the hsa driver for AMD's GPUs.
>> >>
>> >> This driver is called kfd.
>> >>
>> >> This initial version supports the first HSA chip, Kaveri.
>> >>
>> >> This driver is located in a new directory structure under drivers/gpu.
>> >>
>> >> Signed-off-by: Oded Gabbay 
>> >
>> >There is too coding style issues. While we have been lax on the
>> >enforcing the scripts/checkpatch.pl rules i think there is a limit to
>> >that. I am not strict on the 80chars per line but others things needs fixing
>so we stay inline.
>> >
>> >Also i am a bit worried about the license, given top comment in each
>> >of the files i am not sure this is GPL2 compatible. I would need to
>> >ask lawyer to review that.
>> >
>>
>> Hi Jerome,
>>
>> Which line in the license are you concerned about ? In theory we're using
>the same license as the initial code pushes for radeon, and I just did a 
>side-by
>side compare with the license header on cik.c in the radeon tree and
>confirmed that the two licenses are identical.
>>
>> The cik.c header has an additional "Authors:" line which the kfd files do
>not, but AFAIK that is not part of the license text proper.
>>
>
>You can not claim GPL if you want to use this license. radeon is weird best for
>historical reasons as we wanted to share code with BSD thus it is dual
>licensed and this is reflected with :
>MODULE_LICENSE("GPL and additional rights");
>
>inside radeon_drv.c
>
>So if you want to have MODULE_LICENSE(GPL) then you should have header
>that use the GPL license wording and no wording from BSD like license.
>Otherwise change the MODULE_LICENSE and it would also be good to say
>dual licensed at top of each files (or least next to each license) so that it 
>is
>clear this is BSD & GPL license.

Got it. Missed that we had a different MODULE_LICENSE.

Since the goal is license compatibility with radeon (so we can update the 
interface and move code between the drivers in future), I guess my preference 
would be to update MODULE_LICENSE in the kfd code to "GPL and additional 
rights". Do you think that would be OK ?
>
>Cheers,
>J?r?me


[PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs

2014-07-11 Thread Bridgman, John


>-Original Message-
>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>Sent: Friday, July 11, 2014 2:52 PM
>To: Bridgman, John
>Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Lewycky, Andrew;
>Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
>Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
>Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>AMD's GPUs
>
>On Fri, Jul 11, 2014 at 06:46:30PM +, Bridgman, John wrote:
>> >From: Jerome Glisse [mailto:j.glisse at gmail.com]
>> >Sent: Friday, July 11, 2014 2:11 PM
>> >To: Bridgman, John
>> >Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>> >kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Lewycky,
>> >Andrew; Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J.
>> >Wysocki; Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke;
>> >Srinivas Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>> >Philipp Zabel
>> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>> >for AMD's GPUs
>> >
>> >On Fri, Jul 11, 2014 at 06:02:39PM +, Bridgman, John wrote:
>> >> >From: Jerome Glisse [mailto:j.glisse at gmail.com]
>> >> >Sent: Friday, July 11, 2014 1:04 PM
>> >> >To: Oded Gabbay
>> >> >Cc: David Airlie; Deucher, Alexander;
>> >> >linux-kernel at vger.kernel.org;
>> >> >dri- devel at lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>> >> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
>> >> >Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
>> >> >Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>> >> >Philipp Zabel
>> >> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>> >> >for AMD's GPUs
>> >> >
>> >> >On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>> >> >> This patch adds the code base of the hsa driver for AMD's GPUs.
>> >> >>
>> >> >> This driver is called kfd.
>> >> >>
>> >> >> This initial version supports the first HSA chip, Kaveri.
>> >> >>
>> >> >> This driver is located in a new directory structure under drivers/gpu.
>> >> >>
>> >> >> Signed-off-by: Oded Gabbay 
>> >> >
>> >> >There is too coding style issues. While we have been lax on the
>> >> >enforcing the scripts/checkpatch.pl rules i think there is a limit
>> >> >to that. I am not strict on the 80chars per line but others things
>> >> >needs fixing
>> >so we stay inline.
>> >> >
>> >> >Also i am a bit worried about the license, given top comment in
>> >> >each of the files i am not sure this is GPL2 compatible. I would
>> >> >need to ask lawyer to review that.
>> >> >
>> >>
>> >> Hi Jerome,
>> >>
>> >> Which line in the license are you concerned about ? In theory we're
>> >> using
>> >the same license as the initial code pushes for radeon, and I just
>> >did a side-by side compare with the license header on cik.c in the
>> >radeon tree and confirmed that the two licenses are identical.
>> >>
>> >> The cik.c header has an additional "Authors:" line which the kfd
>> >> files do
>> >not, but AFAIK that is not part of the license text proper.
>> >>
>> >
>> >You can not claim GPL if you want to use this license. radeon is
>> >weird best for historical reasons as we wanted to share code with BSD
>> >thus it is dual licensed and this is reflected with :
>> >MODULE_LICENSE("GPL and additional rights");
>> >
>> >inside radeon_drv.c
>> >
>> >So if you want to have MODULE_LICENSE(GPL) then you should have
>> >header that use the GPL license wording and no wording from BSD like
>license.
>> >Otherwise change the MODULE_LICENSE and it would also be good to say
>> >dual licensed at top of each files (or least next to each license) so
>> >that it is clear this is BSD & GPL license.
>>
>> Got it. Missed that we had a different MODULE_LICENSE.
>>
>> Since the goal is license compatibility with radeon so we can update the
>interface and move code between the drivers in future I guess my
>preference would be to update MODULE_LICENSE in the kfd code to "GPL and
>additional rights", do you think that would be OK ?
>
>I am not a lawyer and nothing that i said should be considered as legal advice
>(on the contrary ;)) I think you need to be more clear with each license to
>clear says GPLv2 or BSD ie dual licensed but the dual license is a beast you
>would definitly want to talk to lawyer about.

Yeah, dual license seems horrid in its implications for developers so we've 
always tried to avoid it. GPL hurts us for porting to other OSes so the X11 / 
"GPL with additional rights" combo seemed like the ideal solution and we made 
it somewhat of a corporate standard. Hope that doesn't come back to haunt us. 

Meditate on this I will. Thanks !

>
>Cheers,
>J?r?me


[PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV

2014-07-11 Thread Bridgman, John
>From: Ilyes Gouta [mailto:ilyes.gouta at gmail.com] 
>Sent: Friday, July 11, 2014 2:00 PM
>To: Bridgman, John
>Cc: Alex Deucher; Koenig, Christian; Oded Gabbay; Deucher, Alexander; Lewycky, 
>Andrew; LKML; Maling list - DRI developers
>Subject: Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes 
>in KV
>
>Hi,
>
>Just a side question (for information),
>
>On Fri, Jul 11, 2014 at 6:07 PM, Bridgman, John  
>wrote:
>
>Right. The SET_RESOURCES packet (kfd_pm4_headers.h, added in patch 49) 
>allocates a range of HW queues, VMIDs and GDS to the HW scheduler, then >the 
>scheduler uses the allocated VMIDs to support a potentially larger number of 
>user processes by dynamically mapping PASIDs to VMIDs and memory >queue 
>descriptors (MQDs) to HW queues.
>
>Are there any documentation/specifications online describing these mechanisms?

Nothing yet, but we should write some docco for this similar to what was 
written for the gfx blocks. I'll add that to the list, thanks.


[PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register

2014-07-12 Thread Bridgman, John
Confirmed. The locking functions are removed from the interface in commit 82 :

[PATCH 82/83] drm/radeon: Remove lock functions from kfd2kgd interface

There is an elegant symmetry there, but yeah we need to find a way to make this 
less awkward to review without screwing up all the work you've done so far. 
It's not obvious how to do that though. I looked at squashing into a smaller 
number of big commits earlier on but unless we completely rip the code out and 
recreate from scratch I don't see anything better than :

- a few foundation commits
- a big code dump that covers everything up to ~patch 54 (with 71 squashed in)
- remaining commits squashed a bit to combine fixes with initial code

Is that what you had in mind when you said ~10 big commits ? Our feeling was 
that the need to skip over the original scheduler would make it more like "one 
really big commit and 10-20 smaller ones", and I think we all felt that the 
"big code dump" required to skip over the original scheduler would be a 
non-starter. 

I guess there is another option, and maybe that's what you had in mind -- 
breaking the "big code dump" into smaller commits would be possible if we were 
willing to not have working code until we got to the equivalent of ~patch 54 
(+71) when all the new scheduler bits were in. Maybe that would still be an 
improvement ?

Thanks,
JB

>-Original Message-
>From: Bridgman, John
>Sent: Friday, July 11, 2014 1:48 PM
>To: 'Jerome Glisse'; Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; Lewycky, Andrew; Joerg Roedel; Gabbay, Oded;
>Koenig, Christian
>Subject: RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking
>srbm_gfx_cntl register
>
>Checking... we shouldn't need to call the lock from kfd any more.We should
>be able to do any required locking in radeon kgd code.
>
>>-Original Message-
>>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>>Sent: Friday, July 11, 2014 12:35 PM
>>To: Oded Gabbay
>>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org;
>>dri- devel at lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>>Joerg Roedel; Gabbay, Oded; Koenig, Christian
>>Subject: Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of
>>locking srbm_gfx_cntl register
>>
>>On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
>>> This patch adds a new interface to kfd2kgd_calls structure, which
>>> allows the kfd to lock and unlock the srbm_gfx_cntl register
>>
>>Why does kfd needs to lock this register if kfd can not access any of
>>those register ? This sounds broken to me, exposing a driver internal
>>mutex to another driver is not something i am fan of.
>>
>>Cheers,
>>J?r?me
>>
>>>
>>> Signed-off-by: Oded Gabbay 
>>> ---
>>>  drivers/gpu/drm/radeon/radeon_kfd.c | 20 
>>>  include/linux/radeon_kfd.h  |  4 
>>>  2 files changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c
>>> b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> index 66ee36b..594020e 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd,
>struct
>>> kgd_mem *mem);
>>>
>>>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>>
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd); static void
>>> +unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +
>>> +
>>>  static const struct kfd2kgd_calls kfd2kgd = {
>>> .allocate_mem = allocate_mem,
>>> .free_mem = free_mem,
>>> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>> .kmap_mem = kmap_mem,
>>> .unkmap_mem = unkmap_mem,
>>> .get_vmem_size = get_vmem_size,
>>> +   .lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
>>> +   .unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>>>  };
>>>
>>>  static const struct kgd2kfd_calls *kgd2kfd; @@ -233,3 +239,17 @@
>>> static uint64_t get_vmem_size(struct kgd_dev *kgd)
>>>
>>> return rdev->mc.real_vram_size;
>>>  }
>>> +
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd) {
>>> +   struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +   mutex_lock(&rdev->srbm_mutex);
>>> +}
>>> +
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd) {
>>> +   struct radeon_dev

Recall: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register

2014-07-12 Thread Bridgman, John
Bridgman, John would like to recall the message, "[PATCH 07/83] drm/radeon: Add 
kfd-->kgd interface of locking srbm_gfx_cntl register".


[PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register

2014-07-12 Thread Bridgman, John
>-Original Message-
>From: Bridgman, John
>Sent: Friday, July 11, 2014 1:48 PM
>To: 'Jerome Glisse'; Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; Lewycky, Andrew; Joerg Roedel; Gabbay, Oded;
>Koenig, Christian
>Subject: RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking
>srbm_gfx_cntl register
>
>Checking... we shouldn't need to call the lock from kfd any more.We should
>be able to do any required locking in radeon kgd code.

Confirmed. The locking functions are removed from the interface in commit 82 :

[PATCH 82/83] drm/radeon: Remove lock functions from kfd2kgd interface

There is an elegant symmetry there, but yeah we need to find a way to make this 
less awkward to review without screwing up all the work you've done so far. 
It's not obvious how to do that though. I looked at squashing into a smaller 
number of big commits earlier on but unless we completely rip the code out and 
recreate from scratch I don't see anything better than :

- a few foundation commits
- a big code dump that covers everything up to ~patch 54 (with 71 squashed in)
- remaining commits squashed a bit to combine fixes with initial code

Is that what you had in mind when you said ~10 big commits ? Our feeling was 
that the need to skip over the original scheduler would make it more like "one 
really big commit and 10-20 smaller ones", and I think we all felt that the 
"big code dump" required to skip over the original scheduler would be a 
non-starter. 

I guess there is another option, and maybe that's what you had in mind -- 
breaking the "big code dump" into smaller commits would be possible if we were 
willing to not have working code until we got to the equivalent of ~patch 54 
(+71) when all the new scheduler bits were in. Maybe that would still be an 
improvement ?

Thanks,
JB

>
>>-Original Message-
>>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>>Sent: Friday, July 11, 2014 12:35 PM
>>To: Oded Gabbay
>>Cc: David Airlie; Deucher, Alexander; linux-kernel at vger.kernel.org;
>>dri- devel at lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>>Joerg Roedel; Gabbay, Oded; Koenig, Christian
>>Subject: Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of
>>locking srbm_gfx_cntl register
>>
>>On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
>>> This patch adds a new interface to kfd2kgd_calls structure, which
>>> allows the kfd to lock and unlock the srbm_gfx_cntl register
>>
>>Why does kfd needs to lock this register if kfd can not access any of
>>those register ? This sounds broken to me, exposing a driver internal
>>mutex to another driver is not something i am fan of.
>>
>>Cheers,
>>J?r?me
>>
>>>
>>> Signed-off-by: Oded Gabbay 
>>> ---
>>>  drivers/gpu/drm/radeon/radeon_kfd.c | 20 
>>>  include/linux/radeon_kfd.h  |  4 
>>>  2 files changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c
>>> b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> index 66ee36b..594020e 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd,
>struct
>>> kgd_mem *mem);
>>>
>>>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>>
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd); static void
>>> +unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +
>>> +
>>>  static const struct kfd2kgd_calls kfd2kgd = {
>>> .allocate_mem = allocate_mem,
>>> .free_mem = free_mem,
>>> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>> .kmap_mem = kmap_mem,
>>> .unkmap_mem = unkmap_mem,
>>> .get_vmem_size = get_vmem_size,
>>> +   .lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
>>> +   .unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>>>  };
>>>
>>>  static const struct kgd2kfd_calls *kgd2kfd; @@ -233,3 +239,17 @@
>>> static uint64_t get_vmem_size(struct kgd_dev *kgd)
>>>
>>> return rdev->mc.real_vram_size;
>>>  }
>>> +
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd) {
>>> +   struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +   mutex_lock(&rdev->srbm_mutex);
>>> +}
>>> +
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd) {
>>> +   struct radeon_dev

[PATCH 00/83] AMD HSA kernel driver

2014-07-13 Thread Bridgman, John


>-Original Message-
>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>Sent: Saturday, July 12, 2014 11:56 PM
>To: Gabbay, Oded
>Cc: linux-kernel at vger.kernel.org; Bridgman, John; Deucher, Alexander;
>Lewycky, Andrew; joro at 8bytes.org; akpm at linux-foundation.org; dri-
>devel at lists.freedesktop.org; airlied at linux.ie; oded.gabbay at gmail.com
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:
>> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
>> > On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
>> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
>> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
>> > > > >   This patch set implements a Heterogeneous System
>> > > > > Architecture
>> > > > >  (HSA) driver
>> > > > >   for radeon-family GPUs.
>> > > >  This is just quick comments on few things. Given size of this,
>> > > > people  will need to have time to review things.
>> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs,
>> > > > > etc..) to
>> > > > >  share
>> > > > >   system resources more effectively via HW features including
>> > > > > shared pageable
>> > > > >   memory, userspace-accessible work queues, and platform-level
>> > > > > atomics. In
>> > > > >   addition to the memory protection mechanisms in GPUVM and
>> > > > > IOMMUv2, the Sea
>> > > > >   Islands family of GPUs also performs HW-level validation of
>> > > > > commands passed
>> > > > >   in through the queues (aka rings).
>> > > > >   The code in this patch set is intended to serve both as a
>> > > > > sample  driver for
>> > > > >   other HSA-compatible hardware devices and as a production
>> > > > > driver  for
>> > > > >   radeon-family processors. The code is architected to support
>> > > > > multiple CPUs
>> > > > >   each with connected GPUs, although the current
>> > > > > implementation  focuses on a
>> > > > >   single Kaveri/Berlin APU, and works alongside the existing
>> > > > > radeon  kernel
>> > > > >   graphics driver (kgd).
>> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up)
>> > > > > share  some hardware
>> > > > >   functionality between HSA compute and regular gfx/compute
>> > > > > (memory,
>> > > > >   interrupts, registers), while other functionality has been
>> > > > > added
>> > > > >   specifically for HSA compute  (hw scheduler for virtualized
>> > > > > compute rings).
>> > > > >   All shared hardware is owned by the radeon graphics driver,
>> > > > > and  an interface
>> > > > >   between kfd and kgd allows the kfd to make use of those
>> > > > > shared  resources,
>> > > > >   while HSA-specific functionality is managed directly by kfd
>> > > > > by  submitting
>> > > > >   packets into an HSA-specific command queue (the "HIQ").
>> > > > >   During kfd module initialization a char device node
>> > > > > (/dev/kfd) is
>> > > > >  created
>> > > > >   (surviving until module exit), with ioctls for queue
>> > > > > creation &  management,
>> > > > >   and data structures are initialized for managing HSA device
>> > > > > topology.
>> > > > >   The rest of the initialization is driven by calls from the
>> > > > > radeon  kgd at
>> > > > >   the following points :
>> > > > >   - radeon_init (kfd_init)
>> > > > >   - radeon_exit (kfd_fini)
>> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
>> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
>> > > > >   During the probe and init processing per-device data
>> > > > > structures  are
>> > > > >   established which connect to the associated graphics kernel
>> > > > > driver. This
>> > > > >   information is exposed to userspace via sysfs, along with a
>>

[PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


>-Original Message-
>From: Dave Airlie [mailto:airlied at gmail.com]
>Sent: Tuesday, July 15, 2014 12:35 AM
>To: Christian König
>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Deucher,
>Alexander; akpm at linux-foundation.org
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On 14 July 2014 18:37, Christian König  wrote:
>>> I vote for HSA module that expose ioctl and is an intermediary with
>>> the kernel driver that handle the hardware. This gives a single point
>>> for HSA hardware and yes this enforce things for any hardware
>manufacturer.
>>> I am more than happy to tell them that this is it and nothing else if
>>> they want to get upstream.
>>
>> I think we should still discuss this single point of entry a bit more.
>>
>> Just to make it clear the plan is to expose all physical HSA capable
>> devices through a single /dev/hsa device node to userspace.
>
>This is why we don't design kernel interfaces in secret foundations, and
>expect anyone to like them.

Understood and agree. In this case though this isn't a cross-vendor interface 
designed by a secret committee, it's supposed to be more of an inoffensive 
little single-vendor interface designed *for* a secret committee. I'm hoping 
that's better ;)

>
>So before we go any further, how is this stuff planned to work for multiple
>GPUs/accelerators?

Three classes of "multiple" :

1. Single CPU with IOMMUv2 and multiple GPUs:

- all devices accessible via /dev/kfd
- topology information identifies CPU + GPUs, each has "node ID" at top of 
userspace API, "global ID" at user/kernel interface
 (don't think we've implemented CPU part yet though)
- userspace builds snapshot from sysfs info & exposes to HSAIL runtime, which 
in turn exposes the "standard" API
- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast for 
APU, relatively less so for dGPU over PCIE)
- to-be-added memory operations allow allocation & residency control (within 
existing gfx driver limits) of buffers in VRAM & carved-out system RAM
- queue operations specify a node ID to userspace library, which translates to 
"global ID" before calling kfd

2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more 
GPUs:

- topology information exposes CPUs & GPUs, along with affinity info showing 
what is connected to what
- everything else works as in (1) above

3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or 
more GPUs

- no attempt to cover this with HSA topology, each CPU and associated GPUs is 
accessed independently via separate /dev/kfd instances

>
>Do we have a userspace to exercise this interface so we can see how such a
>thing would look?

Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
final approval this week

There's a separate test harness to exercise the userspace lib calls, haven't 
started IP review or sanitizing for that but legal stuff is done

>
>Dave.


[PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


>-Original Message-
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Bridgman, John
>Sent: Tuesday, July 15, 2014 1:07 PM
>To: Dave Airlie; Christian König
>Cc: Lewycky, Andrew; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; Deucher, Alexander; akpm at linux-
>foundation.org
>Subject: RE: [PATCH 00/83] AMD HSA kernel driver
>
>
>
>>-Original Message-
>>From: Dave Airlie [mailto:airlied at gmail.com]
>>Sent: Tuesday, July 15, 2014 12:35 AM
>>To: Christian König
>>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>>kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Deucher,
>>Alexander; akpm at linux-foundation.org
>>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>>
>>On 14 July 2014 18:37, Christian König  wrote:
>>>> I vote for HSA module that expose ioctl and is an intermediary with
>>>> the kernel driver that handle the hardware. This gives a single
>>>> point for HSA hardware and yes this enforce things for any hardware
>>manufacturer.
>>>> I am more than happy to tell them that this is it and nothing else
>>>> if they want to get upstream.
>>>
>>> I think we should still discuss this single point of entry a bit more.
>>>
>>> Just to make it clear the plan is to expose all physical HSA capable
>>> devices through a single /dev/hsa device node to userspace.
>>
>>This is why we don't design kernel interfaces in secret foundations,
>>and expect anyone to like them.
>
>Understood and agree. In this case though this isn't a cross-vendor interface
>designed by a secret committee, it's supposed to be more of an inoffensive
>little single-vendor interface designed *for* a secret committee. I'm hoping
>that's better ;)
>
>>
>>So before we go any further, how is this stuff planned to work for
>>multiple GPUs/accelerators?
>
>Three classes of "multiple" :
>
>1. Single CPU with IOMMUv2 and multiple GPUs:
>
>- all devices accessible via /dev/kfd
>- topology information identifies CPU + GPUs, each has "node ID" at top of
>userspace API, "global ID" at user/kernel interface  (don't think we've
>implemented CPU part yet though)
>- userspace builds snapshot from sysfs info & exposes to HSAIL runtime,
>which in turn exposes the "standard" API
>- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast
>for APU, relatively less so for dGPU over PCIE)
>- to-be-added memory operations allow allocation & residency control
>(within existing gfx driver limits) of buffers in VRAM & carved-out system
>RAM
>- queue operations specify a node ID to userspace library, which translates to
>"global ID" before calling kfd
>
>2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
>more GPUs:
>
>- topology information exposes CPUs & GPUs, along with affinity info
>showing what is connected to what
>- everything else works as in (1) above

This is probably a good point to stress that HSA topology is only intended as 
an OS-independent way of communicating system info up to higher levels of the 
HSA stack, not as a new and competing way to *manage* system properties inside 
Linux or any other OS.

>
>3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or
>more GPUs
>
>- no attempt to cover this with HSA topology, each CPU and associated GPUs
>is accessed independently via separate /dev/kfd instances
>
>>
>>Do we have a userspace to exercise this interface so we can see how
>>such a thing would look?
>
>Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
>final
>approval this week
>
>There's a separate test harness to exercise the userspace lib calls, haven't
>started IP review or sanitizing for that but legal stuff is done
>
>>
>>Dave.


[PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


>-Original Message-
>From: Jerome Glisse [mailto:j.glisse at gmail.com]
>Sent: Tuesday, July 15, 2014 1:37 PM
>To: Bridgman, John
>Cc: Dave Airlie; Christian König; Lewycky, Andrew; linux-
>kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Deucher,
>Alexander; akpm at linux-foundation.org
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On Tue, Jul 15, 2014 at 05:06:56PM +, Bridgman, John wrote:
>> >From: Dave Airlie [mailto:airlied at gmail.com]
>> >Sent: Tuesday, July 15, 2014 12:35 AM
>> >To: Christian König
>> >Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>> >kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Deucher,
>> >Alexander; akpm at linux-foundation.org
>> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>> >
>> >On 14 July 2014 18:37, Christian König  wrote:
>> >>> I vote for HSA module that expose ioctl and is an intermediary
>> >>> with the kernel driver that handle the hardware. This gives a
>> >>> single point for HSA hardware and yes this enforce things for any
>> >>> hardware
>> >manufacturer.
>> >>> I am more than happy to tell them that this is it and nothing else
>> >>> if they want to get upstream.
>> >>
>> >> I think we should still discuss this single point of entry a bit more.
>> >>
>> >> Just to make it clear the plan is to expose all physical HSA
>> >> capable devices through a single /dev/hsa device node to userspace.
>> >
>> >This is why we don't design kernel interfaces in secret foundations,
>> >and expect anyone to like them.
>>
>> Understood and agree. In this case though this isn't a cross-vendor
>> interface designed by a secret committee, it's supposed to be more of
>> an inoffensive little single-vendor interface designed *for* a secret
>> committee. I'm hoping that's better ;)
>>
>> >
>> >So before we go any further, how is this stuff planned to work for
>> >multiple GPUs/accelerators?
>>
>> Three classes of "multiple" :
>>
>> 1. Single CPU with IOMMUv2 and multiple GPUs:
>>
>> - all devices accessible via /dev/kfd
>> - topology information identifies CPU + GPUs, each has "node ID" at
>> top of userspace API, "global ID" at user/kernel interface  (don't
>> think we've implemented CPU part yet though)
>> - userspace builds snapshot from sysfs info & exposes to HSAIL
>> runtime, which in turn exposes the "standard" API
>
>This is why i do not like the sysfs approach, it would be lot nicer to have
>device file per provider and thus hsail can listen on device file event and
>discover if hardware is vanishing or appearing. Periodicaly going over sysfs
>files is not the right way to do that.

Agree that wouldn't be good. There's an event mechanism still to come - mostly 
for communicating fences and shader interrupts back to userspace, but also used 
for "device change" notifications, so no polling of sysfs.

>
>> - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2
>> (fast for APU, relatively less so for dGPU over PCIE)
>> - to-be-added memory operations allow allocation & residency control
>> (within existing gfx driver limits) of buffers in VRAM & carved-out
>> system RAM
>> - queue operations specify a node ID to userspace library, which
>> translates to "global ID" before calling kfd
>>
>> 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
>more GPUs:
>>
>> - topology information exposes CPUs & GPUs, along with affinity info
>> showing what is connected to what
>> - everything else works as in (1) above
>>
>
>This is suppose to be part of HSA ? This is lot broader than i thought.

Yes although it can be skipped on most systems. We figured that topology needed 
to cover everything that would be handled by a single OS image, so in a NUMA 
system it would need to cover all the CPUs. I think that is still the right 
scope, do you agree ?

>
>> 3. Multiple CPUs not connected via fabric (eg a blade server) each
>> with 0 or more GPUs
>>
>> - no attempt to cover this with HSA topology, each CPU and associated
>> GPUs is accessed independently via separate /dev/kfd instances
>>
>> >
>> >Do we have a userspace to exercise this interface so we can see how
>> >such a thing would look?
>>
>> Yes -- initial IP review done, legal stuff done, sanitizing WIP,
>> hoping for final approval this week
>>
>> There's a separate test harness to exercise the userspace lib calls,
>> haven't started IP review or sanitizing for that but legal stuff is
>> done
>>
>> >
>> >Dave.


[PATCH v2 00/25] AMDKFD kernel driver

2014-07-21 Thread Bridgman, John


>-Original Message-
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Jerome Glisse
>Sent: Monday, July 21, 2014 7:06 PM
>To: Gabbay, Oded
>Cc: Lewycky, Andrew; Pinchuk, Evgeny; Daenzer, Michel; linux-
>kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; linux-mm;
>Skidanov, Alexey; Andrew Morton
>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>
>On Tue, Jul 22, 2014 at 12:56:13AM +0300, Oded Gabbay wrote:
>> On 21/07/14 22:28, Jerome Glisse wrote:
>> > On Mon, Jul 21, 2014 at 10:23:43PM +0300, Oded Gabbay wrote:
>> >> On 21/07/14 21:59, Jerome Glisse wrote:
>> >>> On Mon, Jul 21, 2014 at 09:36:44PM +0300, Oded Gabbay wrote:
>>  On 21/07/14 21:14, Jerome Glisse wrote:
>> > On Mon, Jul 21, 2014 at 08:42:58PM +0300, Oded Gabbay wrote:
>> >> On 21/07/14 18:54, Jerome Glisse wrote:
>> >>> On Mon, Jul 21, 2014 at 05:12:06PM +0300, Oded Gabbay wrote:
>>  On 21/07/14 16:39, Christian König wrote:
>> > Am 21.07.2014 14:36, schrieb Oded Gabbay:
>> >> On 20/07/14 20:46, Jerome Glisse wrote:
>> >>> On Thu, Jul 17, 2014 at 04:57:25PM +0300, Oded Gabbay wrote:
>>  Forgot to cc mailing list on cover letter. Sorry.
>> 
>>  As a continuation to the existing discussion, here is a
>>  v2 patch series restructured with a cleaner history and
>>  no totally-different-early-versions of the code.
>> 
>>  Instead of 83 patches, there are now a total of 25
>>  patches, where 5 of them are modifications to radeon driver
>and 18 of them include only amdkfd code.
>>  There is no code going away or even modified between
>patches, only added.
>> 
>>  The driver was renamed from radeon_kfd to amdkfd and
>>  moved to reside under drm/radeon/amdkfd. This move was
>>  done to emphasize the fact that this driver is an
>>  AMD-only driver at this point. Having said that, we do
>>  foresee a generic hsa framework being implemented in the
>future and in that case, we will adjust amdkfd to work within that
>framework.
>> 
>>  As the amdkfd driver should support multiple AMD gfx
>>  drivers, we want to keep it as a seperate driver from
>>  radeon. Therefore, the amdkfd code is contained in its
>>  own folder. The amdkfd folder was put under the radeon
>>  folder because the only AMD gfx driver in the Linux
>>  kernel at this point is the radeon driver. Having said
>>  that, we will probably need to move it (maybe to be directly
>under drm) after we integrate with additional AMD gfx drivers.
>> 
>>  For people who like to review using git, the v2 patch set is
>located at:
>>  http://cgit.freedesktop.org/~gabbayo/linux/log/?h=kfd-nex
>>  t-3.17-v2
>> 
>>  Written by Oded Gabbay
>> >>>
>> >>> So quick comments before i finish going over all patches.
>> >>> There is many things that need more documentation
>> >>> espacialy as of right now there is no userspace i can go look at.
>> >> So quick comments on some of your questions but first of
>> >> all, thanks for the time you dedicated to review the code.
>> >>>
>> >>> There few show stopper, biggest one is gpu memory pinning
>> >>> this is a big no, that would need serious arguments for
>> >>> any hope of convincing me on that side.
>> >> We only do gpu memory pinning for kernel objects. There are
>> >> no userspace objects that are pinned on the gpu memory in
>> >> our driver. If that is the case, is it still a show stopper ?
>> >>
>> >> The kernel objects are:
>> >> - pipelines (4 per device)
>> >> - mqd per hiq (only 1 per device)
>> >> - mqd per userspace queue. On KV, we support up to 1K
>> >> queues per process, for a total of 512K queues. Each mqd is
>> >> 151 bytes, but the allocation is done in
>> >> 256 alignment. So total *possible* memory is 128MB
>> >> - kernel queue (only 1 per device)
>> >> - fence address for kernel queue
>> >> - runlists for the CP (1 or 2 per device)
>> >
>> > The main questions here are if it's avoid able to pin down
>> > the memory and if the memory is pinned down at driver load,
>> > by request from userspace or by anything else.
>> >
>> > As far as I can see only the "mqd per userspace queue" might
>> > be a bit questionable, everything else sounds reasonable.
>> >
>> > Christian.
>> 
>>  Most of the pin downs are done on device initialization.
>>  The "mqd per userspace" is done per userspace queue creation.
>>  However, as

[PATCH v2 00/25] AMDKFD kernel driver

2014-07-23 Thread Bridgman, John


>-Original Message-
>From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch]
>Sent: Wednesday, July 23, 2014 3:06 AM
>To: Gabbay, Oded
>Cc: Jerome Glisse; Christian König; David Airlie; Alex Deucher; Andrew
>Morton; Bridgman, John; Joerg Roedel; Lewycky, Andrew; Daenzer, Michel;
>Goz, Ben; Skidanov, Alexey; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; linux-mm; Sellek, Tom
>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>
>On Wed, Jul 23, 2014 at 8:50 AM, Oded Gabbay 
>wrote:
>> On 22/07/14 14:15, Daniel Vetter wrote:
>>>
>>> On Tue, Jul 22, 2014 at 12:52:43PM +0300, Oded Gabbay wrote:
>>>>
>>>> On 22/07/14 12:21, Daniel Vetter wrote:
>>>>>
>>>>> On Tue, Jul 22, 2014 at 10:19 AM, Oded Gabbay
>
>>>>> wrote:
>>>>>>>
>>>>>>> Exactly, just prevent userspace from submitting more. And if you
>>>>>>> have misbehaving userspace that submits too much, reset the gpu
>>>>>>> and tell it that you're sorry but won't schedule any more work.
>>>>>>
>>>>>>
>>>>>> I'm not sure how you intend to know if a userspace misbehaves or not.
>>>>>> Can
>>>>>> you elaborate ?
>>>>>
>>>>>
>>>>> Well that's mostly policy, currently in i915 we only have a check
>>>>> for hangs, and if userspace hangs a bit too often then we stop it.
>>>>> I guess you can do that with the queue unmapping you've describe in
>>>>> reply to Jerome's mail.
>>>>> -Daniel
>>>>>
>>>> What do you mean by hang ? Like the tdr mechanism in Windows (checks
>>>> if a gpu job takes more than 2 seconds, I think, and if so,
>>>> terminates the job).
>>>
>>>
>>> Essentially yes. But we also have some hw features to kill jobs
>>> quicker, e.g. for media workloads.
>>> -Daniel
>>>
>>
>> Yeah, so this is what I'm talking about when I say that you and Jerome
>> come from a graphics POV and amdkfd come from a compute POV, no
>offense intended.
>>
>> For compute jobs, we simply can't use this logic to terminate jobs.
>> Graphics are mostly Real-Time while compute jobs can take from a few
>> ms to a few hours!!! And I'm not talking about an entire application
>> runtime but on a single submission of jobs by the userspace app. We
>> have tests with jobs that take between 20-30 minutes to complete. In
>> theory, we can even imagine a compute job which takes 1 or 2 days (on
>larger APUs).
>>
>> Now, I understand the question of how do we prevent the compute job
>> from monopolizing the GPU, and internally here we have some ideas that
>> we will probably share in the next few days, but my point is that I
>> don't think we can terminate a compute job because it is running for more
>than x seconds.
>> It is like you would terminate a CPU process which runs more than x
>seconds.
>>
>> I think this is a *very* important discussion (detecting a misbehaved
>> compute process) and I would like to continue it, but I don't think
>> moving the job submission from userspace control to kernel control
>> will solve this core problem.
>
>Well graphics gets away with cooperative scheduling since usually people
>want to see stuff within a few frames, so we can legitimately kill jobs after a
>fairly short timeout. Imo if you want to allow userspace to submit compute
>jobs that are atomic and take a few minutes to hours with no break-up in
>between and no hw means to preempt then that design is screwed up. We
>really can't tell the core vm that "sorry we will hold onto these gobloads of
>memory you really need now for another few hours". Pinning memory like
>that essentially without a time limit is restricted to root.

Hi Daniel;

I don't really understand the reference to "gobloads of memory". Unlike radeon 
graphics, the userspace data for HSA applications is maintained in pageable 
system memory and accessed via the IOMMUv2 (ATC/PRI). The IOMMUv2 driver and mm 
subsystem take care of faulting in memory pages as needed; nothing is 
long-term pinned.

The only pinned memory we are talking about here is per-queue and per-process 
data structures in the driver, which are tiny by comparison. Oded provided the 
"hardware limits" (ie an insane number of process & threads) for context, but 
real-world limits will be one or two orders of magnitude lower. Agree we should 
have included those limits in the initial code, that would have made the "real 
world" memory footprint much more visible. 

Make sense ?

>-Daniel
>--
>Daniel Vetter
>Software Engineer, Intel Corporation
>+41 (0) 79 365 57 48 - http://blog.ffwll.ch


[PATCH v2 00/25] AMDKFD kernel driver

2014-07-23 Thread Bridgman, John


>-Original Message-
>From: Christian König [mailto:deathsimple at vodafone.de]
>Sent: Wednesday, July 23, 2014 3:04 AM
>To: Gabbay, Oded; Jerome Glisse; David Airlie; Alex Deucher; Andrew
>Morton; Bridgman, John; Joerg Roedel; Lewycky, Andrew; Daenzer, Michel;
>Goz, Ben; Skidanov, Alexey; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; linux-mm; Sellek, Tom
>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>
>Am 23.07.2014 08:50, schrieb Oded Gabbay:
>> On 22/07/14 14:15, Daniel Vetter wrote:
>>> On Tue, Jul 22, 2014 at 12:52:43PM +0300, Oded Gabbay wrote:
>>>> On 22/07/14 12:21, Daniel Vetter wrote:
>>>>> On Tue, Jul 22, 2014 at 10:19 AM, Oded Gabbay
>
>>>>> wrote:
>>>>>>> Exactly, just prevent userspace from submitting more. And if you
>>>>>>> have misbehaving userspace that submits too much, reset the gpu
>>>>>>> and tell it that you're sorry but won't schedule any more work.
>>>>>>
>>>>>> I'm not sure how you intend to know if a userspace misbehaves or
>>>>>> not. Can you elaborate ?
>>>>>
>>>>> Well that's mostly policy, currently in i915 we only have a check
>>>>> for hangs, and if userspace hangs a bit too often then we stop it.
>>>>> I guess you can do that with the queue unmapping you've describe in
>>>>> reply to Jerome's mail.
>>>>> -Daniel
>>>>>
>>>> What do you mean by hang ? Like the tdr mechanism in Windows (checks
>>>> if a gpu job takes more than 2 seconds, I think, and if so,
>>>> terminates the job).
>>>
>>> Essentially yes. But we also have some hw features to kill jobs
>>> quicker, e.g. for media workloads.
>>> -Daniel
>>>
>>
>> Yeah, so this is what I'm talking about when I say that you and Jerome
>> come from a graphics POV and amdkfd come from a compute POV, no
>> offense intended.
>>
>> For compute jobs, we simply can't use this logic to terminate jobs.
>> Graphics are mostly Real-Time while compute jobs can take from a few
>> ms to a few hours!!! And I'm not talking about an entire application
>> runtime but on a single submission of jobs by the userspace app. We
>> have tests with jobs that take between 20-30 minutes to complete. In
>> theory, we can even imagine a compute job which takes 1 or 2 days (on
>> larger APUs).
>>
>> Now, I understand the question of how do we prevent the compute job
>> from monopolizing the GPU, and internally here we have some ideas that
>> we will probably share in the next few days, but my point is that I
>> don't think we can terminate a compute job because it is running for
>> more than x seconds. It is like you would terminate a CPU process
>> which runs more than x seconds.
>
>Yeah that's why one of the first things I've did was making the timeout
>configurable in the radeon module.
>
>But it doesn't necessary needs be a timeout, we should also kill a running job
>submission if the CPU process associated with the job is killed.
>
>> I think this is a *very* important discussion (detecting a misbehaved
>> compute process) and I would like to continue it, but I don't think
>> moving the job submission from userspace control to kernel control
>> will solve this core problem.
>
>We need to get this topic solved, otherwise the driver won't make it
>upstream. Allowing userpsace to monopolizing resources either memory,
>CPU or GPU time or special things like counters etc... is a strict no go for a
>kernel module.
>
>I agree that moving the job submission from userpsace to kernel wouldn't
>solve this problem. As Daniel and I pointed out now multiple times it's rather
>easily possible to prevent further job submissions from userspace, in the
>worst case by unmapping the doorbell page.
>
>Moving it to an IOCTL would just make it a bit less complicated.

Hi Christian;

HSA uses usermode queues so that programs running on the GPU can dispatch work 
to themselves or to other GPUs with a consistent dispatch mechanism for CPU and 
GPU code. We could potentially use s_msg and trap every GPU dispatch back 
through CPU code, but that gets slow and ugly very quickly.
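
To make the usermode-queue model concrete, here is a much-simplified 
conceptual sketch of a dispatch from the submitting side; the packet layout 
and field names are stand-ins for illustration, not the real AQL packet or 
MQD definitions:

#include <stdint.h>

struct dispatch_packet {		/* simplified stand-in for a real dispatch packet */
	uint32_t header;
	uint32_t grid_size[3];
	uint64_t kernel_address;
	uint64_t kernarg_address;
};

struct user_queue {
	struct dispatch_packet *ring;	/* ring buffer in pageable system memory */
	uint32_t ring_slots;		/* number of packet slots, power of two */
	volatile uint32_t *doorbell;	/* doorbell page mmap'ed from the kernel driver */
	uint32_t write_index;
};

static void submit(struct user_queue *q, const struct dispatch_packet *pkt)
{
	uint32_t slot = q->write_index & (q->ring_slots - 1);

	q->ring[slot] = *pkt;		/* 1. place the packet in the ring */
	__sync_synchronize();		/* 2. make it visible before ringing the doorbell */
	q->write_index++;
	*q->doorbell = q->write_index;	/* 3. ring the doorbell -- no kernel call needed */
}

The point of the design is step 3: work submission is a plain memory write, so 
GPU code can dispatch to itself without bouncing through the CPU or the kernel.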

>
>Christian.
>
>>
>> Oded



[PATCH v2 00/25] AMDKFD kernel driver

2014-07-23 Thread Bridgman, John


>-Original Message-
>From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel
>Vetter
>Sent: Wednesday, July 23, 2014 10:42 AM
>To: Bridgman, John
>Cc: Daniel Vetter; Gabbay, Oded; Jerome Glisse; Christian König; David Airlie;
>Alex Deucher; Andrew Morton; Joerg Roedel; Lewycky, Andrew; Daenzer,
>Michel; Goz, Ben; Skidanov, Alexey; linux-kernel at vger.kernel.org; dri-
>devel at lists.freedesktop.org; linux-mm; Sellek, Tom
>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>
>On Wed, Jul 23, 2014 at 01:33:24PM +, Bridgman, John wrote:
>>
>>
>> >-Original Message-
>> >From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch]
>> >Sent: Wednesday, July 23, 2014 3:06 AM
>> >To: Gabbay, Oded
>> >Cc: Jerome Glisse; Christian König; David Airlie; Alex Deucher;
>> >Andrew Morton; Bridgman, John; Joerg Roedel; Lewycky, Andrew;
>> >Daenzer, Michel; Goz, Ben; Skidanov, Alexey;
>> >linux-kernel at vger.kernel.org; dri- devel at lists.freedesktop.org;
>> >linux-mm; Sellek, Tom
>> >Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>> >
>> >On Wed, Jul 23, 2014 at 8:50 AM, Oded Gabbay 
>> >wrote:
>> >> On 22/07/14 14:15, Daniel Vetter wrote:
>> >>>
>> >>> On Tue, Jul 22, 2014 at 12:52:43PM +0300, Oded Gabbay wrote:
>> >>>>
>> >>>> On 22/07/14 12:21, Daniel Vetter wrote:
>> >>>>>
>> >>>>> On Tue, Jul 22, 2014 at 10:19 AM, Oded Gabbay
>> >
>> >>>>> wrote:
>> >>>>>>>
>> >>>>>>> Exactly, just prevent userspace from submitting more. And if
>> >>>>>>> you have misbehaving userspace that submits too much, reset
>> >>>>>>> the gpu and tell it that you're sorry but won't schedule any more
>work.
>> >>>>>>
>> >>>>>>
>> >>>>>> I'm not sure how you intend to know if a userspace misbehaves or
>not.
>> >>>>>> Can
>> >>>>>> you elaborate ?
>> >>>>>
>> >>>>>
>> >>>>> Well that's mostly policy, currently in i915 we only have a
>> >>>>> check for hangs, and if userspace hangs a bit too often then we stop
>it.
>> >>>>> I guess you can do that with the queue unmapping you've describe
>> >>>>> in reply to Jerome's mail.
>> >>>>> -Daniel
>> >>>>>
>> >>>> What do you mean by hang ? Like the tdr mechanism in Windows
>> >>>> (checks if a gpu job takes more than 2 seconds, I think, and if
>> >>>> so, terminates the job).
>> >>>
>> >>>
>> >>> Essentially yes. But we also have some hw features to kill jobs
>> >>> quicker, e.g. for media workloads.
>> >>> -Daniel
>> >>>
>> >>
>> >> Yeah, so this is what I'm talking about when I say that you and
>> >> Jerome come from a graphics POV and amdkfd come from a compute
>POV,
>> >> no
>> >offense intended.
>> >>
>> >> For compute jobs, we simply can't use this logic to terminate jobs.
>> >> Graphics are mostly Real-Time while compute jobs can take from a
>> >> few ms to a few hours!!! And I'm not talking about an entire
>> >> application runtime but on a single submission of jobs by the
>> >> userspace app. We have tests with jobs that take between 20-30
>> >> minutes to complete. In theory, we can even imagine a compute job
>> >> which takes 1 or 2 days (on
>> >larger APUs).
>> >>
>> >> Now, I understand the question of how do we prevent the compute job
>> >> from monopolizing the GPU, and internally here we have some ideas
>> >> that we will probably share in the next few days, but my point is
>> >> that I don't think we can terminate a compute job because it is
>> >> running for more
>> >than x seconds.
>> >> It is like you would terminate a CPU process which runs more than x
>> >seconds.
>> >>
>> >> I think this is a *very* important discussion (detecting a
>> >> misbehaved compute process) and I would like to continue it, but I
>> >> don't think moving the job submission from userspace control to
>> >> kernel control will solve this core p

[PATCH v2 00/25] AMDKFD kernel driver

2014-07-23 Thread Bridgman, John


>-Original Message-
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Bridgman, John
>Sent: Wednesday, July 23, 2014 11:07 AM
>To: Daniel Vetter
>Cc: Lewycky, Andrew; linux-mm; Daniel Vetter; Daenzer, Michel; linux-
>kernel at vger.kernel.org; Sellek, Tom; Skidanov, Alexey; dri-
>devel at lists.freedesktop.org; Andrew Morton
>Subject: RE: [PATCH v2 00/25] AMDKFD kernel driver
>
>
>
>>-Original Message-
>>From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel
>>Vetter
>>Sent: Wednesday, July 23, 2014 10:42 AM
>>To: Bridgman, John
>>Cc: Daniel Vetter; Gabbay, Oded; Jerome Glisse; Christian König; David
>>Airlie; Alex Deucher; Andrew Morton; Joerg Roedel; Lewycky, Andrew;
>>Daenzer, Michel; Goz, Ben; Skidanov, Alexey;
>>linux-kernel at vger.kernel.org; dri- devel at lists.freedesktop.org;
>>linux-mm; Sellek, Tom
>>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>>
>>On Wed, Jul 23, 2014 at 01:33:24PM +, Bridgman, John wrote:
>>>
>>>
>>> >-Original Message-
>>> >From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch]
>>> >Sent: Wednesday, July 23, 2014 3:06 AM
>>> >To: Gabbay, Oded
>>> >Cc: Jerome Glisse; Christian König; David Airlie; Alex Deucher;
>>> >Andrew Morton; Bridgman, John; Joerg Roedel; Lewycky, Andrew;
>>> >Daenzer, Michel; Goz, Ben; Skidanov, Alexey;
>>> >linux-kernel at vger.kernel.org; dri- devel at lists.freedesktop.org;
>>> >linux-mm; Sellek, Tom
>>> >Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>>> >
>>> >On Wed, Jul 23, 2014 at 8:50 AM, Oded Gabbay 
>>> >wrote:
>>> >> On 22/07/14 14:15, Daniel Vetter wrote:
>>> >>>
>>> >>> On Tue, Jul 22, 2014 at 12:52:43PM +0300, Oded Gabbay wrote:
>>> >>>>
>>> >>>> On 22/07/14 12:21, Daniel Vetter wrote:
>>> >>>>>
>>> >>>>> On Tue, Jul 22, 2014 at 10:19 AM, Oded Gabbay
>>> >
>>> >>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> Exactly, just prevent userspace from submitting more. And if
>>> >>>>>>> you have misbehaving userspace that submits too much, reset
>>> >>>>>>> the gpu and tell it that you're sorry but won't schedule any
>>> >>>>>>> more
>>work.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> I'm not sure how you intend to know if a userspace misbehaves
>>> >>>>>> or
>>not.
>>> >>>>>> Can
>>> >>>>>> you elaborate ?
>>> >>>>>
>>> >>>>>
>>> >>>>> Well that's mostly policy, currently in i915 we only have a
>>> >>>>> check for hangs, and if userspace hangs a bit too often then we
>>> >>>>> stop
>>it.
>>> >>>>> I guess you can do that with the queue unmapping you've
>>> >>>>> describe in reply to Jerome's mail.
>>> >>>>> -Daniel
>>> >>>>>
>>> >>>> What do you mean by hang ? Like the tdr mechanism in Windows
>>> >>>> (checks if a gpu job takes more than 2 seconds, I think, and if
>>> >>>> so, terminates the job).
>>> >>>
>>> >>>
>>> >>> Essentially yes. But we also have some hw features to kill jobs
>>> >>> quicker, e.g. for media workloads.
>>> >>> -Daniel
>>> >>>
>>> >>
>>> >> Yeah, so this is what I'm talking about when I say that you and
>>> >> Jerome come from a graphics POV and amdkfd come from a compute
>>POV,
>>> >> no
>>> >offense intended.
>>> >>
>>> >> For compute jobs, we simply can't use this logic to terminate jobs.
>>> >> Graphics are mostly Real-Time while compute jobs can take from a
>>> >> few ms to a few hours!!! And I'm not talking about an entire
>>> >> application runtime but on a single submission of jobs by the
>>> >> userspace app. We have tests with jobs that take between 20-30
>>> >> minutes to complete. In theory, we can even imagine a compute job
>>> >> which t

[PATCH v2 00/25] AMDKFD kernel driver

2014-07-23 Thread Bridgman, John

>-Original Message-
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Jesse Barnes
>Sent: Wednesday, July 23, 2014 5:00 PM
>To: dri-devel at lists.freedesktop.org
>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>
>On Mon, 21 Jul 2014 19:05:46 +0200
>daniel at ffwll.ch (Daniel Vetter) wrote:
>
>> On Mon, Jul 21, 2014 at 11:58:52AM -0400, Jerome Glisse wrote:
>> > On Mon, Jul 21, 2014 at 05:25:11PM +0200, Daniel Vetter wrote:
>> > > On Mon, Jul 21, 2014 at 03:39:09PM +0200, Christian König wrote:
>> > > > Am 21.07.2014 14:36, schrieb Oded Gabbay:
>> > > > >On 20/07/14 20:46, Jerome Glisse wrote:
>
>[snip!!]
My BlackBerry thumb thanks you ;)
>
>> > > >
>> > > > The main questions here are if it's avoid able to pin down the
>> > > > memory and if the memory is pinned down at driver load, by
>> > > > request from userspace or by anything else.
>> > > >
>> > > > As far as I can see only the "mqd per userspace queue" might be
>> > > > a bit questionable, everything else sounds reasonable.
>> > >
>> > > Aside, i915 perspective again (i.e. how we solved this): When
>> > > scheduling away from contexts we unpin them and put them into the
>> > > lru. And in the shrinker we have a last-ditch callback to switch
>> > > to a default context (since you can't ever have no context once
>> > > you've started) which means we can evict any context object if it's
>getting in the way.
>> >
>> > So Intel hardware report through some interrupt or some channel when
>> > it is not using a context ? ie kernel side get notification when
>> > some user context is done executing ?
>>
>> Yes, as long as we do the scheduling with the cpu we get interrupts
>> for context switches. The mechanic is already published in the
>> execlist patches currently floating around. We get a special context
>> switch interrupt.
>>
>> But we have this unpin logic already on the current code where we
>> switch contexts through in-line cs commands from the kernel. There we
>> obviously use the normal batch completion events.
>
>Yeah and we can continue that going forward.  And of course if your hw can
>do page faulting, you don't need to pin the normal data buffers.
>
>Usually there are some special buffers that need to be pinned for longer
>periods though, anytime the context could be active.  Sounds like in this case
>the userland queues, which makes some sense.  But maybe for smaller
>systems the size limit could be clamped to something smaller than 128M.  Or
>tie it into the rlimit somehow, just like we do for mlock() stuff.
>
Yeah, even the queues are in pageable memory; it's just a ~256-byte structure 
per queue (the Memory Queue Descriptor) that describes the queue to hardware, 
plus a couple of pages for each process using HSA to hold things like 
doorbells. Current thinking is to limit the number of processes using HSA to 
~256 and the number of queues per process to ~1024 by default in the initial 
code, although my guess is that we could take the per-process queue default 
limit even lower.
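
For a rough sense of scale, a worked example using the approximate defaults 
above, and taking "a couple of pages" as two 4 KB pages per process (these 
are illustrative numbers, not hardware limits):

  256 processes x 1024 queues x ~256 bytes per MQD  ~= 64 MB worst case
  256 processes x 2 pages x 4 KB per doorbell page  ~=  2 MB worst case

so even the theoretical worst case for these per-queue/per-process structures 
is a few tens of MB, and a realistic system with a handful of HSA processes 
would use far less.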

>> > The issue with radeon hardware AFAICT is that the hardware do not
>> > report any thing about the userspace context running ie you do not
>> > get notification when a context is not use. Well AFAICT. Maybe hardware
>do provide that.
>>
>> I'm not sure whether we can do the same trick with the hw scheduler.
>> But then unpinning hw contexts will drain the pipeline anyway, so I
>> guess we can just stop feeding the hw scheduler until it runs dry. And
>> then unpin and evict.
>
>Yeah we should have an idea which contexts have been fed to the scheduler,
>at least with kernel based submission.  With userspace submission we'll be in a
>tougher spot...  but as you say we can always idle things and unpin everything
>under pressure.  That's a really big hammer to apply though.
>
>> > Like the VMID is a limited resources so you have to dynamicly bind
>> > them so maybe we can only allocate pinned buffer for each VMID and
>> > then when binding a PASID to a VMID it also copy back pinned buffer to
>pasid unpinned copy.
>>
>> Yeah, pasid assignment will be fun. Not sure whether Jesse's patches
>> will do this already. We _do_ already have fun with ctx id assigments
>> though since we move them around (and the hw id is the ggtt address
>> afaik). So we need to remap them already. Not sure on the details for
>> pasid mapping, iirc it's a separate field somewhere in the context
>> struct. Jesse knows the details.
>
>The PASID space is a bit bigger, 20 bits iirc.  So we probably won't run out
>quickly or often.  But when we do I thought we could apply the same trick
>Linux uses for ASID management on SPARC and ia64 (iirc on sparc anyway,
>maybe MIPS too): "allocate" a PASID everytime you need one, but don't tie it
>to the process at all, just use it as a counter that lets you know when you 
>need
>to do a full TLB flush, then start the allocation process over.  This lets you
>minimize TLB flushing and gracefully handles oversubscription.
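
For reference, a minimal sketch of that counter-style allocation scheme 
(generation-based ASID/PASID recycling), independent of any particular driver 
and with locking omitted for brevity:

#define PASID_BITS	20
#define PASID_LIMIT	(1u << PASID_BITS)

struct pasid_slot {
	unsigned int pasid;	/* 0 means "not allocated yet" */
	unsigned int gen;
};

extern void flush_all_tlbs(void);	/* full IOMMU TLB flush, provided elsewhere */

static unsigned int next_pasid = 1;	/* 0 reserved as invalid */
static unsigned int generation = 1;

static unsigned int pasid_get(struct pasid_slot *slot)
{
	if (slot->pasid && slot->gen == generation)
		return slot->pasid;		/* still valid in the current generation */

	if (next_pasid == PASID_LIMIT) {	/* space exhausted: flush once, start over */
		flush_all_tlbs();
		generation++;
		next_pasid = 1;
	}
	slot->pasid = next_pasid++;
	slot->gen = generation;
	return slot->pasid;
}

IDs are never explicitly freed; a holder whose generation is stale simply 
re-allocates on its next use, which is what keeps TLB flushes down to one per 
wrap of the counter.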

IIRC we h

[RFC] Using DC in amdgpu for upcoming GPU

2016-12-12 Thread Bridgman, John
Thanks Dave. Apologies in advance for top posting but I'm stuck on a mail 
client that makes a big mess when I try...


>If DC was ready for the next-gen GPU it would be ready for the current
>GPU, it's not the specific ASIC code that is the problem, it's the
>huge midlayer sitting in the middle.


We realize that (a) we are getting into the high-risk-of-breakage part of the 
rework and (b) no matter how much we change the code structure there's a good 
chance that a month after it goes upstream one of us is going to find that more 
structural changes are required.


I was kinda thinking that if we are doing high-risk activities (risk of subtle 
breakage not obvious regression, and/or risk of making structural changes that 
turn out to be a bad idea even though we all thought they were correct last 
week) there's an argument for doing it in code which only supports cards that 
people can't buy yet.


From: Dave Airlie 
Sent: December 11, 2016 9:57 PM
To: Wentland, Harry
Cc: dri-devel; amd-gfx mailing list; Bridgman, John; Deucher, Alexander; 
Lazare, Jordan; Cheng, Tony; Cyr, Aric; Grodzovsky, Andrey
Subject: Re: [RFC] Using DC in amdgpu for upcoming GPU

On 8 December 2016 at 12:02, Harry Wentland  wrote:
> We propose to use the Display Core (DC) driver for display support on
> AMD's upcoming GPU (referred to by uGPU in the rest of the doc). In order to
> avoid a flag day the plan is to only support uGPU initially and transition
> to older ASICs gradually.

[FAQ: from past few days]

1) Hey you replied to Daniel, you never addressed the points of the RFC!
I've read it being said that I hadn't addressed the RFC, and you know
I've realised I actually had, because the RFC is great but it
presupposes the codebase as designed can get upstream eventually, and
I don't think it can. The code is too littered with midlayering and
other problems, that actually addressing the individual points of the
RFC would be missing the main point I'm trying to make.

This code needs rewriting, not cleaning, not polishing, it needs to be
split into its constituent parts, and reintegrated in a form more
Linux process friendly.

I feel that if I reply to the individual points Harry has raised in
this RFC, that it means the code would then be suitable for merging,
which it still won't, and I don't want people wasting another 6
months.

If DC was ready for the next-gen GPU it would be ready for the current
GPU, it's not the specific ASIC code that is the problem, it's the
huge midlayer sitting in the middle.

2) We really need to share all of this code between OSes, why does
Linux not want it?

Sharing code is a laudable goal and I appreciate the resourcing
constraints that led us to the point at which we find ourselves, but
the way forward involves finding resources to upstream this code,
dedicated people (even one person) who can spend time on a day by day
basis talking to people in the open and working upstream, improving
other pieces of the drm as they go, reading atomic patches and
reviewing them, and can incrementally build the DC experience on top
of the Linux kernel infrastructure. Then having the corresponding
changes in the DC codebase happen internally to correspond to how the
kernel code ends up looking. Lots of this code overlaps with stuff the
drm already does, lots of is stuff the drm should be doing, so patches
to the drm should be sent instead.

3) Then how do we upstream it?
Resource(s) need(s) to start concentrating at splitting this thing up
and using portions of it in the upstream kernel. We don't land fully
formed code in the kernel if we can avoid it. Because you can't review
the ideas and structure as easy as when someone builds up code in
chunks and actually develops in the Linux kernel. This has always
produced better more maintainable code. Maybe the result will end up
improving the AMD codebase as well.

4) Why can't we put this in staging?
People have also mentioned staging, Daniel has called it a dead end,
I'd have considered staging for this code base, and I still might.
However staging has rules, and the main one is code in staging needs a
TODO list, and agreed criteria for exiting staging, I don't think we'd
be able to get an agreement on what the TODO list should contain and
how we'd ever get all things on it done. If this code ended up in
staging, it would most likely require someone dedicated to recreating
it in the mainline driver in an incremental fashion, and I don't see
that resource being available.

5) Why is a midlayer bad?
I'm not going to go into specifics on the DC midlayer, but we abhor
midlayers for a fair few reasons. The main reason I find causes the
most issues is locking. When you have breaks in code flow between
multiple layers, but having layers calling back into previous layers
it becomes near imposs

[RFC] Using DC in amdgpu for upcoming GPU

2016-12-12 Thread Bridgman, John
couple of typo fixes re: top posting and "only supports" -> "is only used for"


____
From: Bridgman, John
Sent: December 11, 2016 10:21 PM
To: Dave Airlie; Wentland, Harry
Cc: dri-devel; amd-gfx mailing list; Deucher, Alexander; Lazare, Jordan; Cheng, 
Tony; Cyr, Aric; Grodzovsky, Andrey
Subject: Re: [RFC] Using DC in amdgpu for upcoming GPU


Thanks Dave. Apologies in advance for top posting but I'm stuck on a mail 
client that makes a big mess when I try anything else...


>If DC was ready for the next-gen GPU it would be ready for the current
>GPU, it's not the specific ASIC code that is the problem, it's the
>huge midlayer sitting in the middle.


We realize that (a) we are getting into the high-risk-of-breakage part of the 
rework and (b) no matter how much we change the code structure there's a good 
chance that a month after it goes upstream one of us is going to find that more 
structural changes are required.


I was kinda thinking that if we are doing high-risk activities (risk of subtle 
breakage not obvious regression, and/or risk of making structural changes that 
turn out to be a bad idea even though we all thought they were correct last 
week) there's an argument for doing it in code which is only used for cards 
that people can't buy yet.


From: Dave Airlie 
Sent: December 11, 2016 9:57 PM
To: Wentland, Harry
Cc: dri-devel; amd-gfx mailing list; Bridgman, John; Deucher, Alexander; 
Lazare, Jordan; Cheng, Tony; Cyr, Aric; Grodzovsky, Andrey
Subject: Re: [RFC] Using DC in amdgpu for upcoming GPU

On 8 December 2016 at 12:02, Harry Wentland  wrote:
> We propose to use the Display Core (DC) driver for display support on
> AMD's upcoming GPU (referred to by uGPU in the rest of the doc). In order to
> avoid a flag day the plan is to only support uGPU initially and transition
> to older ASICs gradually.

[FAQ: from past few days]

1) Hey you replied to Daniel, you never addressed the points of the RFC!
I've read it being said that I hadn't addressed the RFC, and you know
I've realised I actually had, because the RFC is great but it
presupposes the codebase as designed can get upstream eventually, and
I don't think it can. The code is too littered with midlayering and
other problems, that actually addressing the individual points of the
RFC would be missing the main point I'm trying to make.

This code needs rewriting, not cleaning, not polishing, it needs to be
split into its constituent parts, and reintegrated in a form more
Linux process friendly.

I feel that if I reply to the individual points Harry has raised in
this RFC, that it means the code would then be suitable for merging,
which it still won't, and I don't want people wasting another 6
months.

If DC was ready for the next-gen GPU it would be ready for the current
GPU, it's not the specific ASIC code that is the problem, it's the
huge midlayer sitting in the middle.

2) We really need to share all of this code between OSes, why does
Linux not want it?

Sharing code is a laudable goal and I appreciate the resourcing
constraints that led us to the point at which we find ourselves, but
the way forward involves finding resources to upstream this code,
dedicated people (even one person) who can spend time on a day by day
basis talking to people in the open and working upstream, improving
other pieces of the drm as they go, reading atomic patches and
reviewing them, and can incrementally build the DC experience on top
of the Linux kernel infrastructure. Then having the corresponding
changes in the DC codebase happen internally to correspond to how the
kernel code ends up looking. Lots of this code overlaps with stuff the
drm already does, lots of is stuff the drm should be doing, so patches
to the drm should be sent instead.

3) Then how do we upstream it?
Resource(s) need(s) to start concentrating at splitting this thing up
and using portions of it in the upstream kernel. We don't land fully
formed code in the kernel if we can avoid it. Because you can't review
the ideas and structure as easy as when someone builds up code in
chunks and actually develops in the Linux kernel. This has always
produced better more maintainable code. Maybe the result will end up
improving the AMD codebase as well.

4) Why can't we put this in staging?
People have also mentioned staging, Daniel has called it a dead end,
I'd have considered staging for this code base, and I still might.
However staging has rules, and the main one is code in staging needs a
TODO list, and agreed criteria for exiting staging, I don't think we'd
be able to get an agreement on what the TODO list should contain and
how we'd ever get all things on it done. If this code ended up in
staging, it would most likely require someone dedicated to recreating
it in the ma

[RFC] Using DC in amdgpu for upcoming GPU

2016-12-12 Thread Bridgman, John
v3 with typo fixes and additional comments/questions..



From: Bridgman, John
Sent: December 11, 2016 10:21 PM
To: Dave Airlie; Wentland, Harry
Cc: dri-devel; amd-gfx mailing list; Deucher, Alexander; Lazare, Jordan; Cheng, 
Tony; Cyr, Aric; Grodzovsky, Andrey
Subject: Re: [RFC] Using DC in amdgpu for upcoming GPU


Thanks Dave. Apologies in advance for top posting but I'm stuck on a mail 
client that makes a big mess when I try anything else...


>This code needs rewriting, not cleaning, not polishing, it needs to be
>split into its constituent parts, and reintegrated in a form more
>Linux process friendly.


Can we say "restructuring" just for consistency with Daniel's message (the 
HW-dependent bits don't need to be rewritten but the way they are used/called 
needs to change) ?


>I feel that if I reply to the individual points Harry has raised in
>this RFC, that it means the code would then be suitable for merging,
>which it still won't, and I don't want people wasting another 6
>months.


That's fair. There was an implicit "when it's suitable" assumption in the RFC, 
but we'll make that explicit in the future.


>If DC was ready for the next-gen GPU it would be ready for the current
>GPU, it's not the specific ASIC code that is the problem, it's the
>huge midlayer sitting in the middle.


We realize that (a) we are getting into the high-risk-of-breakage part of the 
rework and (b) no matter how much we change the code structure there's a good 
chance that a month after it goes upstream one of us is going to find that more 
structural changes are required.


I was kinda thinking that if we are doing high-risk activities (risk of subtle 
breakage not obvious regression, and/or risk of making structural changes that 
turn out to be a bad idea even though we all thought they were correct last 
week) there's an argument for doing it in code which is only used for cards 
that people can't buy yet.

____
From: Dave Airlie 
Sent: December 11, 2016 9:57 PM
To: Wentland, Harry
Cc: dri-devel; amd-gfx mailing list; Bridgman, John; Deucher, Alexander; 
Lazare, Jordan; Cheng, Tony; Cyr, Aric; Grodzovsky, Andrey
Subject: Re: [RFC] Using DC in amdgpu for upcoming GPU

On 8 December 2016 at 12:02, Harry Wentland  wrote:
> We propose to use the Display Core (DC) driver for display support on
> AMD's upcoming GPU (referred to by uGPU in the rest of the doc). In order to
> avoid a flag day the plan is to only support uGPU initially and transition
> to older ASICs gradually.

[FAQ: from past few days]

1) Hey you replied to Daniel, you never addressed the points of the RFC!
I've read it being said that I hadn't addressed the RFC, and you know
I've realised I actually had, because the RFC is great but it
presupposes the codebase as designed can get upstream eventually, and
I don't think it can. The code is too littered with midlayering and
other problems, that actually addressing the individual points of the
RFC would be missing the main point I'm trying to make.

This code needs rewriting, not cleaning, not polishing, it needs to be
split into its constituent parts, and reintegrated in a form more
Linux process friendly.

I feel that if I reply to the individual points Harry has raised in
this RFC, that it means the code would then be suitable for merging,
which it still won't, and I don't want people wasting another 6
months.

If DC was ready for the next-gen GPU it would be ready for the current
GPU, it's not the specific ASIC code that is the problem, it's the
huge midlayer sitting in the middle.

2) We really need to share all of this code between OSes, why does
Linux not want it?

Sharing code is a laudable goal and I appreciate the resourcing
constraints that led us to the point at which we find ourselves, but
the way forward involves finding resources to upstream this code,
dedicated people (even one person) who can spend time on a day by day
basis talking to people in the open and working upstream, improving
other pieces of the drm as they go, reading atomic patches and
reviewing them, and can incrementally build the DC experience on top
of the Linux kernel infrastructure. Then having the corresponding
changes in the DC codebase happen internally to correspond to how the
kernel code ends up looking. Lots of this code overlaps with stuff the
drm already does, lots of is stuff the drm should be doing, so patches
to the drm should be sent instead.

3) Then how do we upstream it?
Resource(s) need(s) to start concentrating at splitting this thing up
and using portions of it in the upstream kernel. We don't land fully
formed code in the kernel if we can avoid it. Because you can't review
the ideas and structure as easy as when someone builds up code in
chunks and actually 

[RFC] Using DC in amdgpu for upcoming GPU

2016-12-12 Thread Bridgman, John
Yep, good point. We have tended to stay a bit behind bleeding edge because our 
primary tasks so far have been:


1. Support enterprise distros (with old kernels) via the hybrid driver 
(AMDGPU-PRO), where the closer to upstream we get the more of a gap we have to 
paper over with KCL code


2. Push architecturally simple code (new GPU support) upstream, where being 
closer to upstream makes the up-streaming task simpler but not by that much


So 4.7 isn't as bad a compromise as it might seem.


That said, in the case of DAL/DC it's a different story as you say... 
architecturally complex code needing to be woven into a fast-moving subsystem 
of the kernel. So for DAL/DC anything other than upstream is going to be a big 
pain.


OK, need to think that through.


Thanks !


From: dri-devel  on behalf of 
Daniel Vetter 
Sent: December 12, 2016 2:22 AM
To: Wentland, Harry
Cc: Grodzovsky, Andrey; amd-gfx at lists.freedesktop.org; dri-devel at 
lists.freedesktop.org; Deucher, Alexander; Cheng, Tony
Subject: Re: [RFC] Using DC in amdgpu for upcoming GPU

On Wed, Dec 07, 2016 at 09:02:13PM -0500, Harry Wentland wrote:
> Current version of DC:
>
>  * 
> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7
>
> Once Alex pulls in the latest patches:
>
>  * 
> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7

One more: That 4.7 here is going to be unbelievable amounts of pain for
you. Yes it's a totally sensible idea to just freeze your baseline kernel
because then linux looks a lot more like Windows where the driver abi is
frozen. But it makes following upstream entirely impossible, because
rebasing is always a pain and hence postponed. Which means you can't just
use the latest stuff in upstream drm, which means collaboration with
others and sharing bugfixes in core is a lot more pain, which then means
you do more than necessary in your own code and results in HALs like DAL,
perpetuating the entire mess.

So I think you don't just need to demidlayer DAL/DC, you also need to
demidlayer your development process. In our experience here at Intel that
needs continuous integration testing (in drm-tip), because even 1 month of
not resyncing with drm-next is sometimes way too long. See e.g. the
controlD regression we just had. And DAL is stuck on a 1 year old kernel,
so pretty much only of historical significance and otherwise dead code.

And then for any stuff which isn't upstream yet (like your internal
enabling, or DAL here, or our own internal enabling) you need continuous
rebasing&re-validation. When we started doing this years ago it was still
manually, but we still rebased like every few days to keep the pain down
and adjust continuously to upstream evolution. But then going to a
continous rebase bot that sends you mail when something goes wrong was
again a massive improvement.

I guess in the end Conway's law that your software architecture
necessarily reflects how you organize your teams applies again. Fix your
process and it'll become glaringly obvious to everyone involved that
DC-the-design as-is is entirely unworkeable and how it needs to be fixed.

>From my own experience over the past few years: Doing that is a fun
journey ;-)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



[RFC] Using DC in amdgpu for upcoming GPU

2016-12-13 Thread Bridgman, John
>>If the Linux community contributes to DC, I guess those contributions
can generally be assumed to be GPLv2 licensed.  Yet a future version
of the macOS driver would incorporate those contributions in the same
binary as their closed source OS-specific portion.


My understanding of the "general rule" was that contributions are normally 
assumed to be made under the "local license", ie GPLv2 for kernel changes in 
general, but the appropriate lower-level license when made to a specific 
subsystem with a more permissive license (eg the X11 license aka MIT aka "GPL 
plus additional rights" license we use for almost all of the graphics 
subsystem). If DC is not X11 licensed today it should be (but I'm pretty sure it 
already is).


We need to keep the graphics subsystem permissively licensed in general to 
allow uptake by other free OS projects such as *BSD, not just closed source.


Either way, driver-level maintainers are going to have to make sure that 
contributions have clear licensing.


Thanks,

John


From: dri-devel  on behalf of Lukas 
Wunner 
Sent: December 13, 2016 4:40 AM
To: Cheng, Tony
Cc: Grodzovsky, Andrey; dri-devel; amd-gfx mailing list; Deucher, Alexander
Subject: Re: [RFC] Using DC in amdgpu for upcoming GPU

On Mon, Dec 12, 2016 at 09:52:08PM -0500, Cheng, Tony wrote:
> With DC the display hardware programming, resource optimization, power
> management and interaction with rest of system will be fully validated
> across multiple OSs.

Do I understand DAL3.jpg correctly that the macOS driver builds on top
of DAL Core?  I'm asking because the graphics drivers shipping with
macOS as well as on Apple's EFI Firmware Volume are closed source.

If the Linux community contributes to DC, I guess those contributions
can generally be assumed to be GPLv2 licensed.  Yet a future version
of the macOS driver would incorporate those contributions in the same
binary as their closed source OS-specific portion.

I don't quite see how that would be legal but maybe I'm missing
something.

Presumably the situation with the Windows driver is the same.

I guess you could maintain a separate branch sans community contributions
which would serve as a basis for closed source drivers, but not sure if
that is feasible given your resource constraints.

Thanks,

Lukas



HDMI Audio screwed up w/ recent kernels

2016-12-27 Thread Bridgman, John
IIRC it depends on the chip generation - for example amdgpu does not yet 
include HDMI audio support on SI parts.


From: dri-devel  on behalf of 
Daniel Vetter 
Sent: December 27, 2016 1:20 PM
To: James Cloos
Cc: dri-devel at lists.freedesktop.org
Subject: Re: HDMI Audio screwed up w/ recent kernels

On Tue, Dec 27, 2016 at 12:17:43PM -0500, James Cloos wrote:
> > "DV" == Daniel Vetter  writes:
>
> DV> amdgpu doesn't yet support hdmi audio.
>
> Then why does it support the amdgpu.audio command line option, and why
> does booting an amdgpu kernel with amdgpu.audio=1 sound the same as
> booting a radeon kernel w/ radeon.audio=1?

Hm, I thought there was an issue there still. Anyway, it was just a drive-by
comment, please ignore me ;-)
-Daniel

>
> In linux/drivers/gpu/drm/amd/amdgpu:
>
>   :; grep -l audio_enable *.c|xargs
>   dce_v10_0.c dce_v11_0.c dce_v6_0.c dce_v8_0.c
>
>   :; grep -l amdgpu_audio *.c |xargs
>   amdgpu_connectors.c amdgpu_display.c amdgpu_drv.c atombios_encoders.c
>   dce_v10_0.c dce_v11_0.c dce_v6_0.c dce_v8_0.c
>
> -JimC
> --
> James Cloos  OpenPGP: 0x997A9F17ED7DAEA6

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



[PATCH] hsakmt: allow building with gcc 4.x

2016-03-29 Thread Bridgman, John
The hsakmt code requires C99 support; however, gcc 4.x defaults to
C89 while gcc 5 defaults to C11. Adding this macro provides C99
support on older gcc while not forcing gcc 5 back from C11 to C99.

Signed-off-by: John Bridgman 
---
 configure.ac | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configure.ac b/configure.ac
index b8e9bea..8f32cbb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -66,6 +66,7 @@ AC_CANONICAL_HOST
 AC_PROG_AWK
 test_CFLAGS=${CFLAGS+set} # We may override autoconf default CFLAGS.
 AC_PROG_CC
+AC_PROG_CC_STDC
 AC_PROG_INSTALL
 AC_PROG_LIBTOOL
 AC_PROG_MAKE_SET
-- 
1.9.1
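
For context on why the macro matters: gcc 4.x in its default gnu89 mode rejects 
everyday C99 constructs outright. A small standalone illustration (mine, not 
code from hsakmt) of the kind of thing that fails until the compiler is put 
into C99 mode, which is what the autoconf macro arranges on older gcc:

/* c99_demo.c - illustrative only, not taken from hsakmt.
 * With gcc 4.x defaults (gnu89) the for-loop declaration below is an error;
 * adding -std=gnu99 (which is effectively what AC_PROG_CC_C99 does for gcc)
 * makes it build. */
#include <stdio.h>

int main(void)
{
	int sum = 0;

	for (int i = 0; i < 4; i++)	/* declaring 'i' here requires C99 */
		sum += i;

	printf("sum=%d\n", sum);
	return 0;
}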



[PATCH] hsakmt: allow building with gcc 4.x

2016-03-29 Thread Bridgman, John

>-Original Message-
>From: Emil Velikov [mailto:emil.l.velikov at gmail.com]
>Sent: Tuesday, March 29, 2016 4:08 PM
>To: Bridgman, John
>Cc: dri-devel at lists.freedesktop.org
>Subject: Re: [PATCH] hsakmt: allow building with gcc 4.x
>
>Hi John,
>
>On 29 March 2016 at 16:39, Bridgman, John 
>wrote:
>> The hsakmt code requires c99 support, however gcc 4.x defaults to
>> c89 while gcc 5 defaults to c11. Adding this macro provides c99
>> support on older gcc while not forcing gcc 5 back from c11 to c99.
>>
>> Signed-off-by: John Bridgman 
>> ---
>>  configure.ac | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/configure.ac b/configure.ac index b8e9bea..8f32cbb 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -66,6 +66,7 @@ AC_CANONICAL_HOST
>>  AC_PROG_AWK
>>  test_CFLAGS=${CFLAGS+set} # We may override autoconf default CFLAGS.
>>  AC_PROG_CC
>> +AC_PROG_CC_STDC
>Some versions of autoconf have AC_PROG_CC_STDC as obsolete, while
>others will silently fall back to C89 (according to the autoconf ML).
>
>I've used AC_PROG_CC_C99 and $ac_cv_prog_cc_c99 for libdrm [1]. Did not
>have old enough compiler to test it against though :-\

Hi Emil,

I looked at both the AC_PROG_CC_STDC and _C99 options, but found more anecdotal 
concerns about _C99, so I went with _STDC for the patch... but after sending it 
out I realized that the concerns I found seemed to apply to both _STDC and _C99 
equally. I read about _STDC being obsoleted, but also noticed that it was 
"unobsoleted" in autoconf 2.60, the same release where AC_PROG_CC_C99 was 
apparently added.

I did run across a 2013 email suggesting that _C99 and _STDC were *both* going 
to be obsoleted, but was not able to find any further discussion and did not 
see any mention of either macro in subsequent autoconf release notes. 

Anyway, if you are using AC_PROG_CC_C99 in libdrm then I think it makes sense 
to do the same in libhsakmt. I will spin a v2 of the patch.

Thanks,
John
>
>-Emil
>
>[1]
>https://cgit.freedesktop.org/mesa/drm/commit/?id=e59f00fb43c2b83bdadb1
>7fa35c3018f817a3806


[PATCH] hsakmt: allow building with gcc 4.x v2

2016-03-29 Thread Bridgman, John
The hsakmt code requires C99 compiler support; however, gcc 4.x
defaults to C89 (gcc 5 defaults to C11). The v2 patch copies code
from libdrm, using AC_PROG_CC_C99 and checking that it succeeded.

v1 used AC_PROG_CC_STDC and did not check that C99 was enabled.

Signed-off-by: John Bridgman 
---
 configure.ac | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/configure.ac b/configure.ac
index b8e9bea..0111067 100644
--- a/configure.ac
+++ b/configure.ac
@@ -66,6 +66,12 @@ AC_CANONICAL_HOST
 AC_PROG_AWK
 test_CFLAGS=${CFLAGS+set} # We may override autoconf default CFLAGS.
 AC_PROG_CC
+AC_PROG_CC_C99
+
+if test "x$ac_cv_prog_cc_c99" = xno; then
+   AC_MSG_ERROR([Building hsakmt requires C99 enabled compiler])
+fi
+
 AC_PROG_INSTALL
 AC_PROG_LIBTOOL
 AC_PROG_MAKE_SET
-- 
1.9.1



[PATCH v3 00/23] AMDKFD Kernel Driver

2014-08-05 Thread Bridgman, John
>>To be clear, when we ask for open source userspace that shows how things are 
supposed to be used, we are thinking of something like Mesa, but in this case most 
likely something like an open source OpenCL implementation on top of that 
kernel API.

Yep, understood. We're working on that too. Next should be the HSA API runtime, 
which is essentially the user mode driver for HSA that language toolchains run 
over.

I think Sumatra (Java) will probably be the first open source language runtime 
rather than OpenCL -- it's working today albeit via an older version of the HSA 
API.

Thanks,
JB



- Original Message -
From: Jerome Glisse [mailto:j.gli...@gmail.com]
Sent: Tuesday, August 05, 2014 01:51 PM Eastern Standard Time
To: Gabbay, Oded
Cc: Lewycky, Andrew; Daenzer, Michel; linux-kernel at vger.kernel.org 
; dri-devel at lists.freedesktop.org 
; Andrew Morton 
Subject: Re: [PATCH v3 00/23] AMDKFD Kernel Driver

On Tue, Aug 05, 2014 at 06:30:28PM +0300, Oded Gabbay wrote:
> Hi,
> Here is the v3 patch set of amdkfd.
> 
> This version contains changes and fixes to code, as agreed on during the 
> review
> of the v2 patch set.
> 
> The major changes are:
> 
> - There are two new module parameters: # of processes and # of queues per 
>   process. The defaults, as agreed on in the v2 review, are 32 and 128 
>   respectively. This sets the default amount of GART address space that amdkfd
>   requires to 3.5MB (3MB for userspace queues mqds and 0.5MB for other stuff,
>   such as mqd for kernel queue, hpd for pipelines, etc.)
>   
> - All the GART address space usage of amdkfd is done inside a single 
> contiguous
>   buffer that is allocated from system memory, and pinned to the start of the 
>   GART during the startup of amdkfd (which is just after the startup of 
>   radeon). The management of this buffer is done by the radeon sa manager. 
>   This buffer is not evict-able.
>   
> - Mapping of doorbells is initiated by the userspace lib (by mmap syscall), 
>   instead of initiating it from inside an ioctl (using vm_mmap).
>   
> - Removed ioctls for exclusive access to performance counters
>   
> - Added documentation about the QCM (Queue Control Management), apertures and
>   interfaces between amdkfd and radeon.
> 
> Two important notes:
> 
> - The topology patch has not been changed. Look at 
>   http://lists.freedesktop.org/archives/dri-devel/2014-July/065042.html
>   for my response. I also put my answer as an explanation in the commit msg
>   of the patch.
>   
> - There are still some minor code style issues I need to fix. I didn't want
>   to delay v3 any further but I will publish either v4 with those fixes,
>   or just relevant patches if the whole patch set will be merged.
> 
> For people who like to review using git, the v3 patch set is located at:
> http://cgit.freedesktop.org/~gabbayo/linux/log/?h=kfd-next-3.17-v3
> 
> In addition, I would like to announce that we have uploaded the userspace lib
> that accompanies amdkfd. That lib is called "libhsakmt" and you can view it 
> at:
> http://cgit.freedesktop.org/~gabbayo/libhsakmt

Not commenting on the patchset yet; I will try to find some time outside work
hours to do that. But the userspace you released is just a libdrm-like thing,
and this is not what we mean when we say we need userspace that shows how the
kernel API is used.

So this library is nothing but a wrapper and has almost no value for any
serious review of the kernel API.

To be clear, when we ask for open source userspace that shows how things are
supposed to be used, we are thinking of something like Mesa, but in this case most
likely something like an open source OpenCL implementation on top of that
kernel API.


Btw, this library code reminds me of VHDL... though code style for a userspace
library is anybody's choice.

Cheers,
Jérôme

> 
> Alexey Skidanov (1):
>   amdkfd: Implement the Get Process Aperture IOCTL
> 
> Andrew Lewycky (3):
>   amdkfd: Add basic modules to amdkfd
>   amdkfd: Add interrupt handling module
>   amdkfd: Implement the Set Memory Policy IOCTL
> 
> Ben Goz (8):
>   amdkfd: Add queue module
>   amdkfd: Add mqd_manager module
>   amdkfd: Add kernel queue module
>   amdkfd: Add module parameter of scheduling policy
>   amdkfd: Add packet manager module
>   amdkfd: Add process queue manager module
>   amdkfd: Add device queue manager module
>   amdkfd: Implement the create/destroy/update queue IOCTLs
> 
> Evgeny Pinchuk (2):
>   amdkfd: Add topology module to amdkfd
>   amdkfd: Implement the Get Clock Counters IOCTL
> 
> Oded Gabbay (9):
>   drm/radeon: reduce number of free VMIDs and pipes in KV
>   drm/radeon/cik: Don't touch int of pipes 1-7
>   drm/radeon: Report doorbell configuration to amdkfd
>   drm/radeon: adding synchronization for GRBM GFX
>   drm/radeon: Add radeon <--> amdkfd interface
>   Update MAINTAINERS and CREDITS files with amdkfd info
>   amdkfd: Add IOCTL set definitions of amdkfd
>   amdkfd: Add amdkfd skeleton driver
>   amdkfd: Add binding/unbinding
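
A side note for readers skimming the cover letter above: the "two new module 
parameters" Oded mentions are ordinary kernel module parameters. A rough sketch 
of what declaring such limits looks like (illustrative only; the real amdkfd 
parameter names, permissions and bounds checking live in the driver itself):

/* limits_demo.c - sketch of module parameters along the lines of the
 * amdkfd defaults described above (32 processes, 128 queues per process).
 * Parameter and symbol names here are made up for illustration. */
#include <linux/module.h>
#include <linux/moduleparam.h>

static int max_num_of_processes = 32;
module_param(max_num_of_processes, int, 0444);
MODULE_PARM_DESC(max_num_of_processes,
		 "Number of HSA processes supported (default 32)");

static int max_num_of_queues_per_process = 128;
module_param(max_num_of_queues_per_process, int, 0444);
MODULE_PARM_DESC(max_num_of_queues_per_process,
		 "Number of queues per HSA process (default 128)");

static int __init limits_demo_init(void)
{
	pr_info("limits_demo: %d processes x %d queues each\n",
		max_num_of_processes, max_num_of_queues_per_process);
	return 0;
}

static void __exit limits_demo_exit(void)
{
}

module_init(limits_demo_init);
module_exit(limits_demo_exit);
MODULE_LICENSE("GPL");

With 0444 permissions the chosen values can then be read back at runtime from
/sys/module/<module name>/parameters/.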

[PATCH 49/88] drm/amdgpu: remove AMDGPU_GEM_CREATE_CPU_GTT_UC

2015-06-12 Thread Bridgman, John

From: dri-devel [dri-devel-bounces at lists.freedesktop.org] on behalf of Emil 
Velikov [emil.l.veli...@gmail.com]
Sent: June 12, 2015 11:47 AM
To: Alex Deucher
Cc: ML dri-devel
Subject: Re: [PATCH 49/88] drm/amdgpu: remove AMDGPU_GEM_CREATE_CPU_GTT_UC

On 27 May 2015 at 04:19, Alex Deucher  wrote:
> From: Jammy Zhou 
>
> This flag isn't used by user mode drivers, remove it to avoid
> confusion. And rename GTT_WC to GTT_USWC to make it clear.
>
>Just a wild question:
>Assuming that user mode drivers means UMS, does this mean that there
>will be such drivers in the future? Or is that what's the nature of
>the proprietary/binary drivers?

I believe "user mode drivers" in this context means open or closed source GL 
drivers. The plan is to always use the open source X driver and KMS.

>Thanks
>Emil


[hsakmt] hsakmt organization and formal releases

2015-10-18 Thread Bridgman, John
Hi Oded,

Looks good. We now have a nice automated build/test system running internally; 
I imagine automake/autoconf should be able to fit OK into that (although I guess
it might trigger the usual religious war about build tools :))

re: #2, IIRC we used to do the even/odd numbering on minor, not micro - is that
a deliberate change?

I'm sure I'll get questions about the hsakmt/hsakmt folder structure, but if you 
get rid of the src folder I'm not sure what would be better. Just curious: why 
not keep build files at the top level and then have src/include folders 
underneath?

That include/linux folder has been bugging me for a while, so we'll probably get
rid of it at some point. I don't remember the history, so unless someone else 
does...

Everything else looks good. Thanks !!

  Original Message
From: Oded Gabbay
Sent: Sunday, October 18, 2015 2:39 AM
To: Bridgman, John
Cc: Maling list - DRI developers; Alex Deucher
Subject: Re: [hsakmt] hsakmt organization and formal releases


On Fri, Oct 9, 2015 at 7:46 PM, Alex Deucher  wrote:
> On Tue, Oct 6, 2015 at 8:00 AM, Oded Gabbay  wrote:
>> Hi,
>>
>> I had some time during the recent local holidays, so I thought I'd
>> improve the hsakmt library in terms of releases:
>>
>> 1. I added automake/autoconf files to standardize the package to be
>> created using configure/make/make install.
>>
>> 2. I created a very simple scheme of numbering so we could track releases.
>>
>> 3. I created a git repository under freedesktop.org amd's folder that
>> will hold hsakmt code (instead of using my private git repo). You can
>> clone the new repo from: git://anongit.freedesktop.org/amd/hsakmt
>>
>> 4. I created two new sub-components in freedesktop.org bugzilla, under
>> DRI, for hsakmt and amdkfd, so bugs can be filed correctly.
>>
>> As part of point 1, I rearranged the layout of the source files a bit,
>> although I kept the formation of the include files so it would be easy
>> to use inside AMD :)
>>
>> I would like to get (positive) feedback on this, and then I will
>> create the first official release and also F22, F23 and F24 (rawhide)
>> rpm packages that will be part of the distribution.
>>
>> As a reminder, this repository will be used to manage only the
>> upstream version of hsakmt. Private/non-yet-upstreamed releases of AMD
>> are done in GitHub.
>
> Looks good to me!  thanks,
>
> Alex
>
>>
>> Thanks,
>>
>>Oded

John, any comment from you ?
If not, I assume I can go ahead.

  Oded


[hsakmt] hsakmt organization and formal releases

2015-10-18 Thread Bridgman, John
Hi Oded,

Looking at it now...

  Original Message
From: Oded Gabbay
Sent: Sunday, October 18, 2015 2:39 AM
To: Bridgman, John
Cc: Maling list - DRI developers; Alex Deucher
Subject: Re: [hsakmt] hsakmt organization and formal releases


On Fri, Oct 9, 2015 at 7:46 PM, Alex Deucher  wrote:
> On Tue, Oct 6, 2015 at 8:00 AM, Oded Gabbay  wrote:
>> Hi,
>>
>> I had some time during the recent local holidays, so I thought I'd
>> improve the hsakmt library in terms of releases:
>>
>> 1. I added automake/autoconf files to standardize the package to be
>> created using configure/make/make install.
>>
>> 2. I created a very simple scheme of numbering so we could track releases.
>>
>> 3. I created a git repository under freedesktop.org amd's folder that
>> will hold hsakmt code (instead of using my private git repo). You can
>> clone the new repo from: git://anongit.freedesktop.org/amd/hsakmt
>>
>> 4. I created two new sub-components in freedesktop.org bugzilla, under
>> DRI, for hsakmt and amdkfd, so bugs can be filed correctly.
>>
>> As part of point 1, I rearranged the layout of the source files a bit,
>> although I kept the formation of the include files so it would be easy
>> to use inside AMD :)
>>
>> I would like to get (positive) feedback on this, and then I will
>> create the first official release and also F22, F23 and F24 (rawhide)
>> rpm packages that will be part of the distribution.
>>
>> As a reminder, this repository will be used to manage only the
>> upstream version of hsakmt. Private/non-yet-upstreamed releases of AMD
>> are done in GitHub.
>
> Looks good to me!  thanks,
>
> Alex
>
>>
>> Thanks,
>>
>>Oded

John, any comment from you ?
If not, I assume I can go ahead.

  Oded


[hsakmt] hsakmt organization and formal releases

2015-10-18 Thread Bridgman, John
Thanks, Oded. Just to be clear, I wasn't talking about getting rid of the include 
folder, just the include/linux subfolder.

From: Oded Gabbay [oded.gab...@gmail.com]
Sent: October 18, 2015 6:51 AM
To: Bridgman, John
Cc: Maling list - DRI developers; Alex Deucher
Subject: Re: [hsakmt] hsakmt organization and formal releases

On Sun, Oct 18, 2015 at 1:14 PM, Bridgman, John  
wrote:
> Hi Oded,
>
> Looks good. We now have a nice automated build/test system running internally,
> I imagine automake/autoconf should be able to fit OK into that (although I 
> guess
> it might trigger the usual religious war about build tools :))
>
> re: #2, IIRC we used to do the even/odd numbering on minor not micro, is that
> a deliberate change ?
>
Actually, the whole version numbering in hsakmt is confusing...
Up until now, there were:

1. HSAKMT_VERSION_MAJOR/HSAKMT_VERSION_MINOR in hsakmttypes.h. I have
no idea what those defines are used for - probably left-overs from the
Windows thunk design. Maybe Paul knows.

2. KFD_IOCTL_MAJOR_VERSION/KFD_IOCTL_MINOR_VERSION in kfd_ioctl.h.
Those are left-overs from the time we returned those defines to the HSA
RT layer. Now we get them from amdkfd using an IOCTL call, which
allows the HSA RT to know which features amdkfd provides. I think
there were discussions about replacing them with some bitmask field, but
I don't know the current status.

Therefore, I didn't want to rely on any of the above, so I created a
new system in configure.ac that will be used just to provide numbers
for the sake of releases, as the two existing methods don't change
according to releases.

In my scheme, major will be used for ABI breakage and/or when minor
gets too big, minor for important releases, and micro for
differentiating between a released version and an in-development version.

e.g. 1.0.4 - released version
     1.0.5 - version in development
     1.0.6 - next released version
     1.0.7 - next version in development
     1.1.0 - next important/big release
     1.1.1 - development tree after the release
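
As a concrete illustration of the IOCTL-based query mentioned in point 2: from 
userspace it boils down to an open() on /dev/kfd plus one ioctl(). The sketch 
below assumes the AMDKFD_IOC_GET_VERSION ioctl and struct 
kfd_ioctl_get_version_args from the kernel's uapi kfd_ioctl.h; the exact names 
have shifted between kernel releases, so treat it as illustrative rather than 
copy-paste ready:

/* kfd_version.c - illustrative userspace sketch, not libhsakmt code. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kfd_ioctl.h>

int main(void)
{
	struct kfd_ioctl_get_version_args args = {0};
	int fd = open("/dev/kfd", O_RDWR);

	if (fd < 0) {
		perror("open /dev/kfd");
		return 1;
	}
	if (ioctl(fd, AMDKFD_IOC_GET_VERSION, &args) == 0)
		printf("amdkfd interface version %u.%u\n",
		       args.major_version, args.minor_version);
	else
		perror("AMDKFD_IOC_GET_VERSION");
	close(fd);
	return 0;
}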

> I'm sure I'll get questions about the hsakmt/hsakmt folder structure but if 
> you
> get rid of the src folder not sure what would be better. Just curious, why 
> not keep
> build files at top level then have src/include folders underneath ?
>

This folder structure is standard when using the autoconf tools. So you
have a root folder, i.e. hsakmt, and then a folder with the same name
which contains the sources (and headers). You can have
other folders at this level, for "tests", "docs", etc. Folders are also
created automatically at this level, e.g. autom4te.cache.

Under the "second" hsakmt folder, you have your project sources, in
any structure you want. The resulting build outputs (objects,
libraries, etc.) end up in those folders and in hidden .deps and
.libs folders. They are ignored using .gitignore files.

I modelled it on the pixman library, which I now also maintain.
Initially I even deleted the include folder, but I brought it back to
make the change less radical for you (see below).

> That include/linux folder has been bugging me for a while so we'll probably 
> get
> rid of it at some point. Don't remember the history so unless someone else 
> does...
>
The reason for the include folder is to map it directly to the
internal source control tool you use, so that it can be shared between
Windows and Linux development.
The reason for include/linux is to separate Linux-only files from
Windows-only files.

As I said above, I have no problem removing these, but I feared it
might be too radical for you...
If you want to see how it would look without the include folder,
check out 
http://cgit.freedesktop.org/amd/hsakmt/tree/?id=e8a6286922e9add2d34ca29fd2b3c6b3ace35f69

Thanks,

Oded

> Everything else looks good. Thanks !!
> 
>   Original Message
> From: Oded Gabbay
> Sent: Sunday, October 18, 2015 2:39 AM
> To: Bridgman, John
> Cc: Maling list - DRI developers; Alex Deucher
> Subject: Re: [hsakmt] hsakmt organization and formal releases
>
>
> On Fri, Oct 9, 2015 at 7:46 PM, Alex Deucher  wrote:
>> On Tue, Oct 6, 2015 at 8:00 AM, Oded Gabbay  wrote:
>>> Hi,
>>>
>>> I had some time during the recent local holidays, so I thought I'd
>>> improve the hsakmt library in terms of releases:
>>>
>>> 1. I added automake/autoconf files to standardize the package to be
>>> created using configure/make/make install.
>>>
>>> 2. I created a very simple scheme of numbering so we could track releases.
>>>
>>> 3. I created a git repository under freedesktop.org amd's folder that
>>> will hold hsakmt code (instead of