Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> I have tested a dmabuf-based drm gem module for exynos and I found one problem.
> you can refer to this test repository:
> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> 
> in this repository, I added some exception code for resource release
> in addition to Dave's patch sets.
> 
> let's suppose we use dmabuf-based vb2 and drm gem with physically
> contiguous memory (no IOMMU) and we try to share an allocated buffer
> between them (the v4l2 and drm drivers).
> 
> 1. request a memory allocation through the drm gem interface.
> 2. request the DRM_SET_PRIME ioctl with the gem handle to get a fd for
> the gem object.
> - internally, the private gem-based dmabuf module calls drm_buf_export()
> to register the allocated gem object with a fd.
> 3. request qbuf with the fd (from step 2) and the DMABUF type to queue
> the buffer to the v4l2-based device.
> - internally, the vb2 plug-in module gets a buffer from the fd and then
> calls the dmabuf->ops->map_dmabuf() callback to get the sg table
> containing the physical memory info for the gem object. the physical
> memory info is then copied into the vb2_xx_buf object.
> for the DMABUF feature for v4l2 and the videobuf2 framework, you can
> refer to this repository:
> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> 
> after that, if the v4l2 driver wants to release the vb2_xx_buf object
> and its allocated memory region at the user's request, what should we
> do? the refcount of vb2_xx_buf is internal to the videobuf2 framework,
> so when the vb2_xx_buf object is released, the videobuf2 framework
> doesn't know who else is using the physical memory region. the region is
> released, and when the drm driver tries to access it, or to release it
> as well, a problem occurs.
> 
> for this problem, I added a get_shared_cnt() callback to dma-buf.h, but
> I'm not sure this is a good way; maybe there is a better one.
> if I am missing anything, please let me know.

The dma_buf object needs to hold a reference on the underlying
(necessarily reference-counted) buffer object when the exporter creates
the dma_buf handle. This reference should then get dropped in the
exporter's dma_buf->ops->release() function, which is only called
when the last reference to the dma_buf disappears.

If this doesn't work like that currently, we have a bug, and exporting the
reference count or something similar can't fix that.
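A minimal userspace sketch of the lifetime rule described above (all names invented for illustration; this is not the kernel API): exporting takes a reference on the underlying buffer object, and only the dma_buf's release path drops it, so the backing storage outlives the gem handle for as long as any dma_buf reference exists.

```c
#include <stdlib.h>

/* Toy model: a refcounted buffer object standing in for a GEM bo. */
struct gem_bo { int refcount; int freed; };

static void bo_get(struct gem_bo *bo) { bo->refcount++; }

static void bo_put(struct gem_bo *bo)
{
    if (--bo->refcount == 0)
        bo->freed = 1;  /* backing storage goes away here, and only here */
}

struct toy_dma_buf { int refcount; struct gem_bo *priv; };

/* Export: the dma_buf handle pins the underlying bo. */
static struct toy_dma_buf *toy_export(struct gem_bo *bo)
{
    struct toy_dma_buf *buf = malloc(sizeof(*buf));
    buf->refcount = 1;
    buf->priv = bo;
    bo_get(bo);  /* reference held for the dma_buf's whole lifetime */
    return buf;
}

static void toy_dma_buf_put(struct toy_dma_buf *buf)
{
    if (--buf->refcount == 0) {
        /* the ->release() moment: drop the export-time reference */
        bo_put(buf->priv);
        free(buf);
    }
}

/* Returns 1 iff the bo survives the gem-handle close but not the last
 * dma_buf reference. */
static int toy_demo(void)
{
    struct gem_bo bo = { 1, 0 };
    struct toy_dma_buf *buf = toy_export(&bo);

    bo_put(&bo);           /* userspace closes its gem handle ... */
    if (bo.freed)
        return 0;          /* ... which must NOT free the memory yet */

    toy_dma_buf_put(buf);  /* last dma_buf reference dropped */
    return bo.freed;
}
```

With this rule, the scenario in the quoted mail is safe: releasing the buffer on one side leaves the pages pinned until the fd-based reference is gone, so no extra shared-count callback is needed.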

Yours, Daniel

PS: Please cut down the original mail when replying, otherwise it's pretty
hard to find your response ;-)
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Dave Airlie
On Mon, Jan 9, 2012 at 8:10 AM, Daniel Vetter  wrote:
> On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> [original message trimmed]
>
> The dma_buf object needs to hold a reference on the underlying
> (necessarily reference-counted) buffer object when the exporter creates
> the dma_buf handle. This reference should then get dropped in the
> exporter's dma_buf->ops->release() function, which is only called
> when the last reference to the dma_buf disappears.
>
> If this doesn't work like that currently, we have a bug, and exporting the
> reference count or something similar can't fix that.
>
> Yours, Daniel
>
> PS: Please cut down the original mail when replying, otherwise it's pretty
> hard to find your response ;-)

And also the importer needs to realise it doesn't own the pages in the
sg_table, and when it's freeing its backing memory it shouldn't free
those pages. So for GEM objects we have to keep track of whether we
allocated the pages ourselves or got them from a dma-buf.
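Dave's point can be sketched the same way (invented names, not the actual GEM code): the object records where its pages came from, and the free path only releases pages it allocated itself.

```c
/* Toy model of importer-side page ownership tracking. */
struct toy_pages { int refs; };

struct toy_gem {
    struct toy_pages *pages;
    int pages_imported;  /* 1 if the pages arrived via a dma-buf sg_table */
};

static void toy_gem_free(struct toy_gem *obj)
{
    /* Imported pages belong to the exporter; never free them here. */
    if (!obj->pages_imported)
        obj->pages->refs--;
    obj->pages = (struct toy_pages *)0;
}

/* Returns 1 iff freeing the importer leaves the pages alive and freeing
 * the exporter finally releases them. */
static int toy_import_demo(void)
{
    struct toy_pages pages = { 1 };
    struct toy_gem exporter = { &pages, 0 };
    struct toy_gem importer = { &pages, 1 };

    toy_gem_free(&importer);
    if (pages.refs != 1)
        return 0;            /* importer must not drop the exporter's pages */

    toy_gem_free(&exporter);
    return pages.refs == 0;
}
```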

Dave.


Re: [PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
> On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
> > Hi all,
> >
> > Meh, I've wanted to port the small set of helpers nouveau already has to
> > handle per open fd gpu virtual address spaces to core drm, so that I could
> > reuse them for i915. Just to go one small step towards unifying drivers in
> > drm/* a bit ...
> >
> > Looks like I'll have another driver to wrestle, or just forget about it and
> > reinvent that wheel for i915, too.
> >
> > 
> >
> > Cheers, Daniel
> > --
> 
> I looked at nouveau before writing this code; thing is, in the end
> there is little common code, especially when you take different paths on
> how you handle things (persistent or dynamic page tables, for instance).
> Though a couple of things can still be shared. Note that the whole radeon
> code is designed with the possibility of having several address spaces
> per process; though there is no use for such a thing today, we believe
> things like opencl+opengl can benefit from each having their own address
> space.

- I've realized when looking through nouveau that we likely can't share
  much more than a gem_bo->vma lookup plus a bunch of helper functions.

- Imo having more than one gpu virtual address space per fd doesn't make
  much sense. libdrm (at least for i915) is mostly just about issuing
  cmdbuffers. Hence it's probably easier to just open two fds and
  instantiate two libdrm buffer managers if you want two address spaces,
  because otherwise you have to teach libdrm that the same buffer object
  can still have different addresses (which is pretty much against the
  point of gpu virtual address spaces).

I also realize that in the dri1 days there was way too much common code
that only gets used by one or two drivers and hence isn't really commonly
usable at all (and also not really of decent quality). So I'm all in
favour of driver-specific stuff, especially for execution and memory
management. But:

- nouveau already has gpu virtual address spaces, radeon just grew them
  with this patch and i915 is on track to get them, too: Patches to enable
  the different hw addressing modes for Sandybridge and later are ready,
  and with Ivybridge the hw engineers ironed out the remaining bugs, so we
  can actually context-switch between different address spaces without
  hitting hw bugs.

- The more general picture is that with the advent of more general-purpose
  apis and usecases for gpus like opencl (or also background video
  encoding/decoding/transcoding with libva), users will want to control gpu
  resources. So I expect that we'll grow resource limits, schedulers with
  priorities and maybe also something like control groups in a few years.
  But if we don't put a bit of thought into the commonalities of things
  like gpu virtual address spaces, scheduling and similar things, I fear we
  won't be able to create a sensible common interface to allocate and
  control resources in the future. Which will result in a sub-par
  experience.

But if my google-fu doesn't fail me, gpu address spaces for radeon were
posted for the first time ever on a public list as v22 and merged right
away, so there's been simply no time to discuss cross-driver issues.
Which is part of why I'm slightly miffed ;-)

Cheers, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: TTM and AGP conflicts

2012-01-09 Thread Thomas Hellstrom

On 01/06/2012 04:51 PM, James Simmons wrote:

You can achieve what you want either by adding a new domain, so you would
have system, vram, agp, and pcidma, and an object can be bound to one and
only one; or you can hack your own agp ttm backend that can bind a bo to
agp or pci or both at the same time, depending on what you want to achieve.

The question is how does one know which domain you want in tt_create.
Currently drivers are using their dev_priv, but if you have more than
one option available how does one choose? Would you be okay with passing
in a domain flag?

Well I agree that something would be useful there so the driver knows
which bind/unbind function it should use. Thomas, I would prefer
passing the bo to the tt_create callback, but a flag is the minimum we
need.

We can discuss this after the merge window. Jerome, your patch does fix a
regression whereas my proposal is an enhancement.


..Back from parental leave.

I'm not 100% sure I understand the problem correctly, but I assume the
problem is that you receive a "bind" request for pages allocated out of
the wrong DMA pool? Is that correct?


In that case we need to make both of the following happen:
1) The backend should be required to supply a fallback that can bind
anyway, by allocating the correct page type and copying.
2) I'm OK with providing a hint to tt_create. IIRC we're passing bo::mem
to tt_bind. Would it be OK to pass the same to tt_create?
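Thomas's suggestion could look roughly like this (a hypothetical sketch with invented names; the real TTM signatures differ): tt_create receives the same placement information that tt_bind later sees, so the backend choice can be made up front instead of guessed from dev_priv.

```c
/* Hypothetical placement hint passed down to tt_create. */
enum toy_domain { TOY_DOM_AGP, TOY_DOM_PCIDMA };

struct toy_mem_reg { enum toy_domain domain; };

struct toy_backend { const char *name; };

static struct toy_backend toy_agp_backend    = { "agp" };
static struct toy_backend toy_pcidma_backend = { "pcidma" };

/* With the hint available at creation time, the driver can pick the
 * matching bind/unbind implementation immediately. */
static struct toy_backend *toy_tt_create(const struct toy_mem_reg *mem)
{
    return mem->domain == TOY_DOM_AGP ? &toy_agp_backend
                                      : &toy_pcidma_backend;
}
```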


/Thomas



[RFC] Future TTM DMA direction

2012-01-09 Thread Thomas Hellstrom

Hi!

When TTM was originally written, it was assumed that GPU apertures could 
address pages directly, and that the CPU could access those pages 
without explicit synchronization. The process of binding a page to a GPU 
translation table was a simple one-step operation, and we needed to 
worry about fragmentation in the GPU aperture only.


Now that we "sort of" support DMA memory there are three things I think 
are missing:


1) We can't gracefully handle coherent DMA OOMs or coherent DMA 
(Including CMA) memory fragmentation leading to failed allocations.
2) We can't handle dynamic mapping of pages into and out of dma, and 
corresponding IOMMU space shortage or fragmentation, and CPU 
synchronization.

3) We have no straightforward way of moving pages between devices.

I think a reasonable way to support this is to make binding to a 
non-fixed (system page based) TTM memory type a two-step binding 
process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE) 
instead of only (MEMORY_TYPE).


In step 1) the bo is bound to a specific DMA type. These could be for 
example:
(NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be 
allowed as well.
In this step, we perform dma_sync_for_device, or allocate dma-specific 
pages maintaining LRU lists so that if we receive a DMA memory 
allocation OOM, we can unbind bo:s bound to the same DMA type. Standard 
graphics cards would then, for example, use the NONE DMA type when run 
on bare metal or COHERENT when run on Xen. A "COHERENT" OOM condition 
would then lead to eviction of another bo. (Note that DMA eviction might 
involve data copies and be costly, but still better than failing).
Binding with the DYNAMIC memory type would mean that CPU accesses are 
disallowed, and that user-space CPU page mappings might need to be 
killed, with a corresponding sync_for_cpu if they are faulted in again 
(perhaps on a page-by-page basis). Any attempt to bo_kmap() a bo page 
bound to DYNAMIC DMA mapping should trigger a BUG.


In step 2) The bo is bound to the GPU in the same way it's done today. 
Evicting from DMA will of course also trigger an evict from GPU, but an 
evict from GPU will not trigger a DMA evict.


Making a bo "anonymous" and thus moveable between devices would then 
mean binding it to the "NONE" DMA type.


Comments, suggestions?

/Thomas









Re: [RFC] Future TTM DMA direction

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
> [original proposal trimmed]

Well I think we need to solve the outstanding issues in the dma_buf
framework first. Currently dma_buf isn't really up to par for handling
coherency between the cpu and devices, and there's also not yet any way
to handle dma address space fragmentation/exhaustion.

I fear that if you jump ahead with improving the ttm support alone we
might end up with something incompatible with the stuff dma_buf
eventually will grow, resulting in a decent amount of wasted effort.

Cc'ed a bunch of relevant lists to foster input from people.

For a starter you seem to want much more low-level integration with the
dma api than existing users commonly need. E.g. if I understand things
correctly, drivers just call dma_alloc_coherent and the platform/board
code then decides whether the device needs a contiguous allocation from
cma or whether something else is good, too (e.g. vmalloc for the cpu +
iommu). Another thing is that I think doing lru eviction in case of dma
address space exhaustion (or fragmentation) needs at least awareness of
what's going on in the upper layers. iommus are commonly shared between
devices and I presume that two ttm drivers sitting behind the same iommu
and fighting over its resources can lead to some hilarious outcomes.

Cheers, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2012/1/9 Daniel Vetter :
> On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> [original message trimmed]
>
> The dma_buf object needs to hold a reference on the underlying
> (necessarily reference-counted) buffer object when the exporter creates
> the dma_buf handle. This reference should then get dropped in the
> exporter's dma_buf->ops->release() function, which is only called
> when the last reference to the dma_buf disappears.
>

when the exporter creates the dma_buf handle (for example, gem -> fd),
I think the refcount of the gem object should be increased at that point,
and decreased again by dma_buf->ops->release(), because when the dma_buf
is created and dma_buf_export() is called, the dma_buf refers to the gem
object once. and in the importer case (fd -> gem), file->f_count of the
dma_buf is increased, and then when the gem object is released by a user
request such as drm close or drm_gem_close_ioctl, dma_buf_put() should be
called by dma_buf->ops->detach() to decrease file->f_count again, because
the gem object refers to the dma_buf. for this, you can refer to my test
repository mentioned above. but the problem is that when a buffer is
released by one side, the other can't know whether the buffer was
already released or not.
note: in case of sharing a buffer between the v4l2 and drm drivers, the
memory info is copied from vb2_xx_buf to xx_gem or from xx_gem to
vb2_xx_buf through an sg table. in this case, only the memory info is
shared, not the objects themselves.

> If this doesn't work like that currently, we have a bug, and exporting the
> reference count or something similar can't fix that.
>
> Yours, Daniel
>
> PS: Please cut down the original mail when replying, otherwise it's pretty
> hard to find your response ;-)

Ok, got it. thanks. :)

> --
> Daniel Vetter
> Mail: dan...@ffwll.ch
> Mobile: +41 (0)79 365 57 48


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
> 2012/1/9 Daniel Vetter :
> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> >> I has test dmabuf based drm gem module for exynos and I found one problem.
> >> you can refer to this test repository:
> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> >>
> >> at this repository, I added some exception codes for resource release
> >> in addition to Dave's patch sets.
> >>
> >> let's suppose we use dmabuf based vb2 and drm gem with physically
> >> continuous memory(no IOMMU) and we try to share allocated buffer
> >> between them(v4l2 and drm driver).
> >>
> >> 1. request memory allocation through drm gem interface.
> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
> >> gem object.
> >> - internally, private gem based dmabuf moudle calls drm_buf_export()
> >> to register allocated gem object to fd.
> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
> >> buffer to v4l2 based device.
> >> - internally, vb2 plug in module gets a buffer to the fd and then
> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
> >> containing physical memory info to the gem object. and then the
> >> physical memory info would be copied to vb2_xx_buf object.
> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
> >> this repository:
> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> >>
> >> after that, if v4l2 driver want to release vb2_xx_buf object with
> >> allocated memory region by user request, how should we do?. refcount
> >> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
> >> object is released videobuf2 framework don't know who is using the
> >> physical memory region. so this physical memory region is released and
> >> when drm driver tries to access the region or to release it also, a
> >> problem would be induced.
> >>
> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
> >> I'm not sure that this is good way. maybe there may be better way.
> >> if there is any missing point, please let me know.
> >
> > The dma_buf object needs to hold a reference on the underlying
> > (necessarily reference-counted) buffer object when the exporter creates
> > the dma_buf handle. This reference should then get dropped in the
> > exporter's dma_buf->ops->release() function, which is only called
> > when the last reference to the dma_buf disappears.
> >
> 
> when the exporter creates the dma_buf handle (for example, gem -> fd),
> I think the refcount of the gem object should be increased at that point,
> and decreased again by dma_buf->ops->release(), because when the dma_buf
> is created and dma_buf_export() is called, the dma_buf refers to the gem
> object once. and in the importer case (fd -> gem), file->f_count of the
> dma_buf is increased, and then when the gem object is released by a user
> request such as drm close or drm_gem_close_ioctl, dma_buf_put() should be
> called by dma_buf->ops->detach() to decrease file->f_count again, because
> the gem object refers to the dma_buf. for this, you can refer to my test
> repository mentioned above. but the problem is that when a buffer is
> released by one side, the other can't know whether the buffer was
> already released or not.

Nope, dma_buf_put should not be called by ->detach. The importer gets his
reference when importing the dma_buf and needs to drop that reference
himself when it's done using the buffer by calling dma_buf_put (i.e. after
the last ->detach call). I think adding separate reference counting to
->attach and ->detach is a waste of time and only papers over buggy
importers.

Additionally the importer does _not_ control the lifetime of a dma_buf
object and its underlying backing storage. It hence may _never_ free the
backing storage itself; that's the job of the exporter.

With that cleared up, referencing the exporter's underlying buffer object
from the dma_buf will just do the right thing.
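As a toy model of the importer-side rule (invented names, not the kernel API): the importer takes its dma_buf reference once at import time, attach/detach do no reference counting of their own, and the single dma_buf_put() comes only after the last detach.

```c
/* Toy dma_buf: a file-style refcount plus an attachment count. */
struct toy_dbuf { int f_count; int attachments; };

static void toy_get(struct toy_dbuf *d)    { d->f_count++; }
static void toy_put(struct toy_dbuf *d)    { d->f_count--; }
static void toy_attach(struct toy_dbuf *d) { d->attachments++; }

static void toy_detach(struct toy_dbuf *d)
{
    d->attachments--;  /* note: no toy_put() in here */
}

/* Returns 1 iff the importer's reference survives repeated attach/detach
 * cycles and is dropped exactly once at the end. */
static int toy_importer_demo(void)
{
    struct toy_dbuf d = { 1, 0 };  /* exporter holds the initial ref */

    toy_get(&d);                   /* importer: fd -> dma_buf */
    toy_attach(&d);
    toy_detach(&d);
    toy_attach(&d);
    toy_detach(&d);
    if (d.f_count != 2)            /* detach must not have dropped it */
        return 0;

    toy_put(&d);                   /* importer is done with the buffer */
    return d.f_count == 1 && d.attachments == 0;
}
```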

> note: in case of sharing a buffer between the v4l2 and drm drivers, the
> memory info is copied from vb2_xx_buf to xx_gem or from xx_gem to
> vb2_xx_buf through an sg table. in this case, only the memory info is
> shared, not the objects themselves.

Hm, maybe I need to take a look at the currently proposed v4l dma_buf
patches ;-) atm I don't have an idea what exactly you're talking about.

Yours, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


[Bug 44549] "Ring buffer test failed" on Radeon 5450

2012-01-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44549

Michel Dänzer changed:

    What                          |Removed |Added
    ------------------------------+--------+------
    Attachment #55245 is obsolete |0       |1

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


[Bug 44549] "Ring buffer test failed" on Radeon 5450

2012-01-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44549

Michel Dänzer changed:

    What                          |Removed |Added
    ------------------------------+--------+------
    Attachment #55247 is obsolete |0       |1



Re: [RFC] Future TTM DMA direction

2012-01-09 Thread Thomas Hellstrom

On 01/09/2012 11:11 AM, Daniel Vetter wrote:
> On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
>> [original proposal trimmed]
>
> Well I think we need to solve outstanding issues in the dma_buf framework
> first. Currently dma_buf isn't really up to par to handle coherency
> between the cpu and devices and there's also not yet any way to handle dma
> address space fragmentation/exhaustion.
>
> I fear that if you jump ahead with improving the ttm support alone we
> might end up with something incompatible to the stuff dma_buf eventually
> will grow, resulting in decent amounts of wasted efforts.
>
> Cc'ed a bunch of relevant lists to foster input from people.


Daniel,

Thanks for your input. I think this is mostly orthogonal to dma_buf, and
really a way to adapt TTM to be DMA-api aware. That's currently done
within the TTM backends. CMA was merely included as an example that
might not be relevant.

I haven't followed dma_buf that closely lately, but if it's growing from
being just a way to share buffer objects between devices into something
that also provides low-level allocators with fragmentation prevention,
there's definitely an overlap. However, at the dma_buf meeting in
Budapest there seemed to be little or no interest in robust buffer
allocation / fragmentation prevention, although I remember bringing it up
to the point where I felt annoying :).


> For a starter you seem to want much more low-level integration with the
> dma api than existing users commonly need. E.g. if I understand things
> correctly, drivers just call dma_alloc_coherent and the platform/board
> code then decides whether the device needs a contiguous allocation from
> cma or whether something else is good, too (e.g. vmalloc for the cpu +
> iommu). Another thing is that I think doing lru eviction in case of dma
> address space exhaustion (or fragmentation) needs at least awareness of
> what's going on in the upper layers. iommus are commonly shared between
> devices and I presume that two ttm drivers sitting behind the same iommu
> and fighting over its resources can lead to some hilarious outcomes.

A good point, I didn't think of that.

For TTM drivers sharing the same IOMMU it's certainly possible to make
such an LRU global (assuming the IOMMU identity is available to the
TTM-aware drivers), but unless fragmentation prevention the way we use it
for graphics drivers (allocate - submit - fence) ends up in the IOMMU
space management code, it's impossible to make this scheme system-wide.



Cheers, Daniel
   


Thanks,

/Thomas
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2012/1/9 Daniel Vetter :
> On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
>> 2012/1/9 Daniel Vetter :
>> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> >> I has test dmabuf based drm gem module for exynos and I found one problem.
>> >> you can refer to this test repository:
>> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
>> >>
>> >> at this repository, I added some exception codes for resource release
>> >> in addition to Dave's patch sets.
>> >>
>> >> let's suppose we use dmabuf based vb2 and drm gem with physically
>> >> continuous memory(no IOMMU) and we try to share allocated buffer
>> >> between them(v4l2 and drm driver).
>> >>
>> >> 1. request memory allocation through drm gem interface.
>> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
>> >> gem object.
>> >> - internally, private gem based dmabuf moudle calls drm_buf_export()
>> >> to register allocated gem object to fd.
>> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
>> >> buffer to v4l2 based device.
>> >> - internally, vb2 plug in module gets a buffer to the fd and then
>> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
>> >> containing physical memory info to the gem object. and then the
>> >> physical memory info would be copied to vb2_xx_buf object.
>> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
>> >> this repository:
>> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
>> >>
>> >> after that, if v4l2 driver want to release vb2_xx_buf object with
>> >> allocated memory region by user request, how should we do?. refcount
>> >> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
>> >> object is released videobuf2 framework don't know who is using the
>> >> physical memory region. so this physical memory region is released and
>> >> when drm driver tries to access the region or to release it also, a
>> >> problem would be induced.
>> >>
>> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
>> >> I'm not sure that this is good way. maybe there may be better way.
>> >> if there is any missing point, please let me know.
>> >
>> > The dma_buf object needs to hold a reference on the underlying
>> > (necessarily reference-counted) buffer object when the exporter creates
>> > the dma_buf handle. This reference should then get dropped in the
>> > exporters dma_buf->ops->release() function, which is only getting called
>> > when the last reference to the dma_buf disappears.
>> >
>>
>> when the exporter creates the dma_buf handle(for example, gem -> fd),
>> I think the refcount of gem object should be increased at this point,
>> and decreased by dma_buf->ops->release() again because when the
>> dma_buf is created and dma_buf_export() is called, this dma_buf refers
>> to the gem object one time. and in case of inporter(fd -> gem),
>> file->f_count of the dma_buf is increased and then when this gem
>> object is released by user request such as drm close or
>> drn_gem_close_ioctl, dma_buf_put() should be called by
>> dma_buf->ops->detach() to decrease file->f_count again because the gem
>> object refers to the dma_buf. for this, you can refer to my test
>> repository I mentioned above. but the problem is that when a buffer is
>> released by one side, another can't know whether the buffer already
>> was released or not.
>
> Nope, dma_buf_put should not be called by ->detach. The importer gets his
> reference when importing the dma_buf and needs to drop that reference
> himself when it's done using the buffer by calling dma_buf_put (i.e. after
> the last ->detach call).

I'm afraid I may be missing something; I'm confused. Who do you think is
the importer and who the exporter? Is the importer the side where an fd
goes to a private buffer, and the exporter the side where a private
buffer goes to an fd? If so, then yes: when the importer needs to drop
its reference (i.e. the importer wants to release the buffer),
dma_buf_put() should be called somewhere, and in my case that function is
called by drm_prime_gem_destroy(). This function is included in Dave's
patch set, and dma_buf_detach() is also called there, so I thought that
was the right place. I didn't find anywhere else that dma_buf_put() is
called. Could you please tell me where you think dma_buf_put() should be
called?

for this, you can refer to Dave's repository:
http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

> I think adding separate reference counting to
> ->attach and ->detach is a waste of time and only papers over buggy
> importers.
>

I mean that when an fd goes to a private buffer, that reference
(file->f_count) would be increased by dma_buf_get(), and only ->detach is
used to drop that reference.

> Additionally the importer does _not_ control the lifetime of an dma_buf
> object and it's underlying backing storage. It hence may _never_ free the
> backing storage itself, that's the job of the exporter.
>

yes,
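The scheme Daniel describes can be condensed into a small userspace sketch. This is illustrative only — gem_export, dma_buf_get_ref, and friends are made-up stand-ins, not the kernel interfaces — but it shows why neither side can free the backing storage early: the exporter's reference is dropped only in ->release(), which runs after the last dma_buf_put().

```c
#include <stdlib.h>

/* Userspace sketch of the refcounting scheme under discussion.  The names
 * mirror the kernel objects, but none of this is the real kernel API. */
struct gem_object { int refcount; };
struct dma_buf   { int refcount; struct gem_object *priv; };

static void gem_get(struct gem_object *g) { g->refcount++; }
static void gem_put(struct gem_object *g) { if (--g->refcount == 0) free(g); }

/* Exporter side: creating the dma_buf takes its own reference on the
 * underlying GEM object ... */
static struct dma_buf *gem_export(struct gem_object *g)
{
    struct dma_buf *buf = malloc(sizeof(*buf));
    buf->refcount = 1;          /* held by the fd the exporter hands out */
    buf->priv = g;
    gem_get(g);
    return buf;
}

/* ... and that reference is dropped only in ->release(), i.e. when the
 * last dma_buf reference disappears. */
static void dma_buf_release(struct dma_buf *buf)
{
    gem_put(buf->priv);
    free(buf);
}

/* Importer side: takes a dma_buf reference at import time and drops it
 * itself with dma_buf_put() after its last ->detach, never in ->detach. */
static void dma_buf_get_ref(struct dma_buf *buf) { buf->refcount++; }
static void dma_buf_put(struct dma_buf *buf)
{
    if (--buf->refcount == 0)
        dma_buf_release(buf);
}
```

With this shape, either side can close its handle first; the backing object survives until both the exporter's GEM handle and every dma_buf reference are gone.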

[patch 1/2] drm/radeon: use after free in radeon_vm_bo_add()

2012-01-09 Thread Dan Carpenter
"bo_va" is dereferenced in the error message.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
b/drivers/gpu/drm/radeon/radeon_gart.c
index 3ef58ca..2944c78 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -486,10 +486,10 @@ int radeon_vm_bo_add(struct radeon_device *rdev,
}
if (bo_va->soffset >= tmp->soffset && bo_va->soffset < 
tmp->eoffset) {
/* bo and tmp overlap, invalid offset */
-   kfree(bo_va);
dev_err(rdev->dev, "bo %p va 0x%08X conflict with (bo 
%p 0x%08X 0x%08X)\n",
bo, (unsigned)bo_va->soffset, tmp->bo,
(unsigned)tmp->soffset, (unsigned)tmp->eoffset);
+   kfree(bo_va);
mutex_unlock(&vm->mutex);
return -EINVAL;
}
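The bug class is worth spelling out: in an error path, any diagnostic that dereferences an object must come before the kfree(), not after. A minimal userspace illustration of the corrected ordering — the names are invented for this sketch, not taken from the radeon code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Invented stand-in for the error path fixed above: report the conflict
 * while bo_va is still alive, and free it only afterwards. */
struct bo_va { unsigned soffset, eoffset; };

static int vm_bo_add(unsigned soffset, unsigned eoffset,
                     unsigned tmp_s, unsigned tmp_e,
                     char *err, size_t errlen)
{
    struct bo_va *bo_va = malloc(sizeof(*bo_va));
    bo_va->soffset = soffset;
    bo_va->eoffset = eoffset;

    if (bo_va->soffset >= tmp_s && bo_va->soffset < tmp_e) {
        /* 1. use the object for the error message ... */
        snprintf(err, errlen, "va 0x%08X conflicts with 0x%08X-0x%08X",
                 bo_va->soffset, tmp_s, tmp_e);
        /* 2. ... and only then free it */
        free(bo_va);
        return -1;
    }
    /* (a real implementation would link bo_va into the vm here) */
    free(bo_va);
    return 0;
}
```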


[patch 2/2] drm/radeon: double lock typo in radeon_vm_bo_rmv()

2012-01-09 Thread Dan Carpenter
The second lock should be an unlock or it causes a deadlock.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
b/drivers/gpu/drm/radeon/radeon_gart.c
index 2944c78..4a9f797 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -595,7 +595,7 @@ int radeon_vm_bo_rmv(struct radeon_device *rdev,
radeon_vm_bo_update_pte(rdev, vm, bo, NULL);
radeon_mutex_unlock(&rdev->cs_mutex);
list_del(&bo_va->vm_list);
-   mutex_lock(&vm->mutex);
+   mutex_unlock(&vm->mutex);
 
kfree(bo_va);
return 0;
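The fix follows the usual lock-pairing rule: every exit path of a function that takes a mutex must release it exactly once, and a stray second lock() where an unlock() was meant blocks the next acquirer forever. A pthread sketch of the corrected shape — a hypothetical analogue, not the radeon code itself:

```c
#include <pthread.h>

static pthread_mutex_t vm_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical analogue of radeon_vm_bo_rmv(): lock, mutate, and unlock
 * on every path out of the function. */
static int vm_bo_remove(int have_mapping)
{
    pthread_mutex_lock(&vm_mutex);
    if (!have_mapping) {
        pthread_mutex_unlock(&vm_mutex);   /* early return unlocks too */
        return -1;
    }
    /* ... list_del(&bo_va->vm_list); kfree(bo_va); ... */
    pthread_mutex_unlock(&vm_mutex);       /* unlock, not lock(): a second
                                            * lock() here would deadlock
                                            * the next caller */
    return 0;
}
```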


Re: [patch 1/2] drm/radeon: use after free in radeon_vm_bo_add()

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 7:44 AM, Dan Carpenter  wrote:
> "bo_va" is dereferenced in the error message.
>
> Signed-off-by: Dan Carpenter 

Reviewed-by: Alex Deucher 

>
> diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
> b/drivers/gpu/drm/radeon/radeon_gart.c
> index 3ef58ca..2944c78 100644
> --- a/drivers/gpu/drm/radeon/radeon_gart.c
> +++ b/drivers/gpu/drm/radeon/radeon_gart.c
> @@ -486,10 +486,10 @@ int radeon_vm_bo_add(struct radeon_device *rdev,
>                }
>                if (bo_va->soffset >= tmp->soffset && bo_va->soffset < 
> tmp->eoffset) {
>                        /* bo and tmp overlap, invalid offset */
> -                       kfree(bo_va);
>                        dev_err(rdev->dev, "bo %p va 0x%08X conflict with (bo 
> %p 0x%08X 0x%08X)\n",
>                                bo, (unsigned)bo_va->soffset, tmp->bo,
>                                (unsigned)tmp->soffset, 
> (unsigned)tmp->eoffset);
> +                       kfree(bo_va);
>                        mutex_unlock(&vm->mutex);
>                        return -EINVAL;
>                }


Re: [patch 2/2] drm/radeon: double lock typo in radeon_vm_bo_rmv()

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 7:45 AM, Dan Carpenter  wrote:
> The second lock should be an unlock or it causes a deadlock.
>
> Signed-off-by: Dan Carpenter 

Reviewed-by: Alex Deucher 

>
> diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
> b/drivers/gpu/drm/radeon/radeon_gart.c
> index 2944c78..4a9f797 100644
> --- a/drivers/gpu/drm/radeon/radeon_gart.c
> +++ b/drivers/gpu/drm/radeon/radeon_gart.c
> @@ -595,7 +595,7 @@ int radeon_vm_bo_rmv(struct radeon_device *rdev,
>        radeon_vm_bo_update_pte(rdev, vm, bo, NULL);
>        radeon_mutex_unlock(&rdev->cs_mutex);
>        list_del(&bo_va->vm_list);
> -       mutex_lock(&vm->mutex);
> +       mutex_unlock(&vm->mutex);
>
>        kfree(bo_va);
>        return 0;


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Rob Clark
On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae  wrote:
> note : in case of sharing a buffer between v4l2 and drm driver, the
> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
> vb2_xx_buf through sg table. in this case, only memory info is used to
> share, not some objects.

Which v4l2/vb2 patches are you looking at?  With the patches I was using,
vb2 holds a reference to the 'struct dma_buf *' internally, not just the
sg_table.

BR,
-R


Re: TTM and AGP conflicts

2012-01-09 Thread Jerome Glisse
On Mon, Jan 09, 2012 at 10:07:02AM +0100, Thomas Hellstrom wrote:
> On 01/06/2012 04:51 PM, James Simmons wrote:
> You can achieve what you want by either adding a new domain so you would 
> have
> system, vram, agp, pcidma and object can be bound to one and only one. Or 
> you
> can hack your own agp ttm backend that could bind bo to agp or pci or
> both at the same time depending on what you want to achieve.
> >>>The question is how does one know which domain you want in tt_create.
> >>>Currenty drivers are using there dev_priv but if you have have more than
> >>>one option available how does one choose? Would you be okay with passing
> >>>in a domain flag?
> >>>
> >>Well i agree that something would be usefull there so the driver know
> >>which bind/unbind function it should use. Thomas i would prefer
> >>passing the bo to the tt_create callback but a flag is the minimum we
> >>need.
> >We can discuss this after the merge widow. Jerome your patch does fix a
> >regression whereas my proposal is a enhancement.
> 
> ..Back from parental leave.
> 
> I'm not 100% sure I understand the problem correctly, but I assume
> the problem is that
> you receive a "bind" request for pages allocated out of the wrong
> DMA pool? Is that correct?
> 
> In that case we need to make both the following happen:
> 1) The backend should be required to supply a fallback that can bind
> anyway by allocating the correct page type and copy.
> 2) I'm OK with providing a hint to tt_create. IIRC we're passing
> bo::mem to tt_bind. Would it be ok to pass the same to tt_create?
> 
> /Thomas
> 

The issue here is that different backends (AGP, PCI DMA, or some other
communication path between the GPU and system memory) need to use
different page allocators. Thus tt_create needs to know either about the
bo, or at least about bo::mem.
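A rough sketch of the shape being discussed — a placement hint passed down so tt_create can pick the matching page pool instead of guessing from dev_priv. Every name below is hypothetical; this is the idea, not the actual TTM interface:

```c
/* Hypothetical placement hint for tt_create; real TTM would derive this
 * from bo::mem rather than from a bare enum. */
enum tt_domain { TT_DOMAIN_SYSTEM, TT_DOMAIN_AGP, TT_DOMAIN_PCIDMA };

/* With the hint available, the backend can select the right page
 * allocator (AGP pool vs coherent DMA pool) per buffer.  The fallback
 * path corresponds to Thomas's point (1): allocate the other page type
 * and copy if a direct AGP bind isn't possible. */
static enum tt_domain tt_pick_backend(enum tt_domain hint, int has_agp)
{
    if (hint == TT_DOMAIN_AGP && has_agp)
        return TT_DOMAIN_AGP;
    return TT_DOMAIN_PCIDMA;
}
```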

Cheers,
Jerome


Re: [PATCH] drm/radeon/kms: add support for streamout

2012-01-09 Thread Jerome Glisse
On Thu, Jan 05, 2012 at 10:21:31PM -0500, Alex Deucher wrote:
> On Thu, Jan 5, 2012 at 9:30 PM, Marek Olšák  wrote:
> > People can start using transform feedback on r6xx with this.
> >
> > Strict CS checking will be implemented later.

We really need to have the checking in the same patch; otherwise someone
can claim that the non-checking property of what you introduce is part of
the API and thus we can't change it. I know it's painful.

Cheers,
Jerome

> >
> > Signed-off-by: Marek Olšák 
> > ---
> >  drivers/gpu/drm/radeon/evergreen_cs.c     |  104 
> > +++--
> >  drivers/gpu/drm/radeon/evergreend.h       |   10 +++
> >  drivers/gpu/drm/radeon/r600_cs.c          |  105 
> > +++--
> >  drivers/gpu/drm/radeon/r600d.h            |    6 ++
> >  drivers/gpu/drm/radeon/radeon_drv.c       |    3 +-
> >  drivers/gpu/drm/radeon/reg_srcs/cayman    |   10 +++
> >  drivers/gpu/drm/radeon/reg_srcs/evergreen |   10 +++
> >  drivers/gpu/drm/radeon/reg_srcs/r600      |   10 +++
> >  8 files changed, 246 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
> > b/drivers/gpu/drm/radeon/evergreen_cs.c
> > index cd4590a..3150489 100644
> > --- a/drivers/gpu/drm/radeon/evergreen_cs.c
> > +++ b/drivers/gpu/drm/radeon/evergreen_cs.c
> > @@ -60,6 +60,10 @@ struct evergreen_cs_track {
> >        u32                     cb_shader_mask;
> >        u32                     vgt_strmout_config;
> >        u32                     vgt_strmout_buffer_config;
> > +       struct radeon_bo        *vgt_strmout_bo[4];
> > +       u64                     vgt_strmout_bo_mc[4];
> > +       u32                     vgt_strmout_bo_offset[4];
> > +       u32                     vgt_strmout_size[4];
> >        u32                     db_depth_control;
> >        u32                     db_depth_view;
> >        u32                     db_depth_size;
> > @@ -159,18 +163,19 @@ static void evergreen_cs_track_init(struct 
> > evergreen_cs_track *track)
> >        track->db_s_write_offset = 0x;
> >        track->db_s_read_bo = NULL;
> >        track->db_s_write_bo = NULL;
> > +
> > +       for (i = 0; i < 4; i++) {
> > +               track->vgt_strmout_size[i] = 0;
> > +               track->vgt_strmout_bo[i] = NULL;
> > +               track->vgt_strmout_bo_offset[i] = 0x;
> > +               track->vgt_strmout_bo_mc[i] = 0x;
> > +       }
> >  }
> >
> >  static int evergreen_cs_track_check(struct radeon_cs_parser *p)
> >  {
> >        struct evergreen_cs_track *track = p->track;
> >
> > -       /* we don't support stream out buffer yet */
> > -       if (track->vgt_strmout_config || track->vgt_strmout_buffer_config) {
> > -               dev_warn(p->dev, "this kernel doesn't support SMX output 
> > buffer\n");
> > -               return -EINVAL;
> > -       }
> > -
> >        /* XXX fill in */
> >        return 0;
> >  }
> > @@ -597,6 +602,37 @@ static int evergreen_cs_check_reg(struct 
> > radeon_cs_parser *p, u32 reg, u32 idx)
> >        case VGT_STRMOUT_BUFFER_CONFIG:
> >                track->vgt_strmout_buffer_config = radeon_get_ib_value(p, 
> > idx);
> >                break;
> > +       case VGT_STRMOUT_BUFFER_BASE_0:
> > +       case VGT_STRMOUT_BUFFER_BASE_1:
> > +       case VGT_STRMOUT_BUFFER_BASE_2:
> > +       case VGT_STRMOUT_BUFFER_BASE_3:
> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
> > +               if (r) {
> > +                       dev_warn(p->dev, "bad SET_CONTEXT_REG "
> > +                                       "0x%04X\n", reg);
> > +                       return -EINVAL;
> > +               }
> > +               tmp = (reg - VGT_STRMOUT_BUFFER_BASE_0) / 16;
> > +               track->vgt_strmout_bo_offset[tmp] = radeon_get_ib_value(p, 
> > idx) << 8;
> > +               ib[idx] += (u32)((reloc->lobj.gpu_offset >> 8) & 
> > 0x);
> > +               track->vgt_strmout_bo[tmp] = reloc->robj;
> > +               track->vgt_strmout_bo_mc[tmp] = reloc->lobj.gpu_offset;
> > +               break;
> > +       case VGT_STRMOUT_BUFFER_SIZE_0:
> > +       case VGT_STRMOUT_BUFFER_SIZE_1:
> > +       case VGT_STRMOUT_BUFFER_SIZE_2:
> > +       case VGT_STRMOUT_BUFFER_SIZE_3:
> > +               tmp = (reg - VGT_STRMOUT_BUFFER_SIZE_0) / 16;
> > +               track->vgt_strmout_size[tmp] = radeon_get_ib_value(p, idx);
> > +               break;
> > +       case CP_COHER_BASE:
> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
> > +               if (r) {
> > +                       dev_warn(p->dev, "missing reloc for CP_COHER_BASE "
> > +                                       "0x%04X\n", reg);
> > +                       return -EINVAL;
> > +               }
> > +               ib[idx] += (u32)((reloc->lobj.gpu_offset >> 8) & 
> > 0x);
> >        case CB_TARGET_MASK:
> >                track->cb_target_mask = radeon_get_ib_value(p, idx);
> >                break;
> > @@ -1451,6 +14

Re: [PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Jerome Glisse
On Mon, Jan 09, 2012 at 09:31:16AM +0100, Daniel Vetter wrote:
> On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
> > On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
> > > Hi all,
> > >
> > > Meh, I've wanted to port the small set of helpers nouveau already has to
> > > handle per open fd gpu virtual address spaces to core drm, so that I could
> > > reuse them for i915. Just to go one small step towards unifying drivers in
> > > drm/* a bit ...
> > >
> > > Looks like I'll have another driver to wrestle or just forget about it and
> > > reinvet that wheel for i915, too.
> > >
> > > 
> > >
> > > Cheers, Daniel
> > > --
> > 
> > I looked at nouveau before writing this code, thing is, in the end
> > there is little common code especialy when you take different path on
> > how you handle things (persistent or dynamic page table for instance).
> > Thought couple things can still be share. Note that the whole radeon
> > code is designed with the possibility of having several address space
> > per process, thought there is no use for such things today we believe
> > things like opencl+opengl can benefit of each having their own address
> > space.
> 
> - I've realized when looking through nouveau that we likely can't share
>   match more than a gem_bo->vma lookup plus a bunch of helper functions.
> 
> - Imo having more than one gpu virtual address space per fd doesn't make
>   much sense. libdrm (at least for i915) is mostly just about issueing
>   cmdbuffers. Hence it's probably easier to just open two fds and
>   instantiate two libdrm buffer managers if you want two address spaces
>   for otherwise you have to teach libdrm that the same buffer object still
>   can have different addresses (which is pretty much against the point of
>   gpu virtual address spaces).

Radeon barely uses libdrm (only the DDX uses part of it).

> 
> I also realize that in the dri1 days there's been way too much common code
> that only gets used by one or two drivers and hence isn't really commonly
> useable at all (and also not really of decent quality). So I'm all in
> favour for driver-specific stuff, especially for execution and memory
> management. But:
> 
> - nouveau already has gpu virtual address spaces, radeon just grew them
>   with this patch and i915 is on track to get them, too: Patches to enable
>   the different hw addressing mode for Sandybridge and later are ready,
>   and with Ivybridge hw engineers kinked out the remaining bugs so we can
>   actually context-switch between different address spaces without hitting
>   hw bugs.
> 
> - The more general picture is that with the advent of more general-purpose
>   apis and usecases for gpus like opencl (or also background video
>   encoding/decoding/transcoding with libva) users will want to control gpu
>   resources. So I expect that we'll grow resource limits, schedulers with
>   priorities and maybe also something like control groups in a few years.
>   But if we don't put a bit of thought into the commonalities of things
>   like gpu virtual address spaces, scheduling and similar things I fear we
>   won't be able to create a sensible common interface to allocate and
>   control resources in the feature. Which will result in a sub-par
>   experience. 

I wish we could come up with a common API for different GPUs, but the
fact is that every kernel API we have exposed so far is specific to each
GPU, aside from modesetting. Anything that uses the processing power of a
GPU uses a dedicated API.

For instance, radeon's virtual addressing gives control to userspace,
i.e. userspace decides the virtual address at which each bo will be
placed. Nouveau, on the contrary, does the allocation in the kernel. We
chose to put userspace in charge for a few reasons (off the top of my
head):
- it allows a 1:1 mapping between the cpu and gpu address spaces without
  playing with the process vm in the kernel
- it allows easy command stream replay (no need to rewrite the command
  stream because the virtual addresses are different)
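The replay point can be made concrete with a toy allocator: when userspace owns the GPU virtual address space, a resubmitted command stream needs no relocation, because each bo keeps the address userspace assigned it. A minimal bump allocator, purely illustrative:

```c
#include <stdint.h>

/* Toy userspace GPU-VA allocator: userspace picks the address and tells
 * the kernel where each bo should live, along the lines of radeon's
 * interface.  This is a sketch, not any driver's real allocator. */
struct va_space { uint64_t next; };

static uint64_t va_align(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

static uint64_t va_alloc(struct va_space *vs, uint64_t size, uint64_t align)
{
    uint64_t va = va_align(vs->next, align);
    vs->next = va + size;
    return va;   /* stable across replays: same allocation order, same va */
}
```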

Scheduling is the next big thing coming up, with the latest generation of
GPUs growing the capacity to preempt GPU tasks and also offering finer
control over the GPU cores. Again, here I fear that we will each grow our
own API.

My belief is that as long as we expose a low-level API to use the GPU,
there is no way we can provide a common API for things like scheduling or
virtual address spaces. The way you submit commands is tightly related to
scheduling and the virtual address handling.

> But if my google-fu doesn't fail me gpu address spaces for radeon was
> posted the first time as v22 ever on a public list and merged right away,
> so there's been simply no time to discuss cross-driver issues.  Which is
> part of why I'm slightly miffed ;-)
> 

I too wish we could have released this earlier. I guess it was included
because we still managed to get enough eyes familiar with radeon to look
at and play with this code.

Cheers,
Jerome

Re: [PATCH] drm/radeon/kms: add support for streamout

2012-01-09 Thread Marek Olšák
You're both right. I wanted to put this patch out so that somebody
else could try this stuff out. I'll finish it when time allows.

Marek

On Mon, Jan 9, 2012 at 4:26 PM, Jerome Glisse  wrote:
> On Thu, Jan 05, 2012 at 10:21:31PM -0500, Alex Deucher wrote:
>> On Thu, Jan 5, 2012 at 9:30 PM, Marek Olšák  wrote:
>> > People can start using transform feedback on r6xx with this.
>> >
>> > Strict CS checking will be implemented later.
>
> We really need to have checking in the same patch, otherwise someone can
> claim that the non checking property of what you introduce is part of
> the API and thus we can't change it. I know it's painful.
>
> Cheers,
> Jerome
>
>> >
>> > Signed-off-by: Marek Olšák 
>> > ---
>> >  drivers/gpu/drm/radeon/evergreen_cs.c     |  104 
>> > +++--
>> >  drivers/gpu/drm/radeon/evergreend.h       |   10 +++
>> >  drivers/gpu/drm/radeon/r600_cs.c          |  105 
>> > +++--
>> >  drivers/gpu/drm/radeon/r600d.h            |    6 ++
>> >  drivers/gpu/drm/radeon/radeon_drv.c       |    3 +-
>> >  drivers/gpu/drm/radeon/reg_srcs/cayman    |   10 +++
>> >  drivers/gpu/drm/radeon/reg_srcs/evergreen |   10 +++
>> >  drivers/gpu/drm/radeon/reg_srcs/r600      |   10 +++
>> >  8 files changed, 246 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
>> > b/drivers/gpu/drm/radeon/evergreen_cs.c
>> > index cd4590a..3150489 100644
>> > --- a/drivers/gpu/drm/radeon/evergreen_cs.c
>> > +++ b/drivers/gpu/drm/radeon/evergreen_cs.c
>> > @@ -60,6 +60,10 @@ struct evergreen_cs_track {
>> >        u32                     cb_shader_mask;
>> >        u32                     vgt_strmout_config;
>> >        u32                     vgt_strmout_buffer_config;
>> > +       struct radeon_bo        *vgt_strmout_bo[4];
>> > +       u64                     vgt_strmout_bo_mc[4];
>> > +       u32                     vgt_strmout_bo_offset[4];
>> > +       u32                     vgt_strmout_size[4];
>> >        u32                     db_depth_control;
>> >        u32                     db_depth_view;
>> >        u32                     db_depth_size;
>> > @@ -159,18 +163,19 @@ static void evergreen_cs_track_init(struct 
>> > evergreen_cs_track *track)
>> >        track->db_s_write_offset = 0x;
>> >        track->db_s_read_bo = NULL;
>> >        track->db_s_write_bo = NULL;
>> > +
>> > +       for (i = 0; i < 4; i++) {
>> > +               track->vgt_strmout_size[i] = 0;
>> > +               track->vgt_strmout_bo[i] = NULL;
>> > +               track->vgt_strmout_bo_offset[i] = 0x;
>> > +               track->vgt_strmout_bo_mc[i] = 0x;
>> > +       }
>> >  }
>> >
>> >  static int evergreen_cs_track_check(struct radeon_cs_parser *p)
>> >  {
>> >        struct evergreen_cs_track *track = p->track;
>> >
>> > -       /* we don't support stream out buffer yet */
>> > -       if (track->vgt_strmout_config || track->vgt_strmout_buffer_config) 
>> > {
>> > -               dev_warn(p->dev, "this kernel doesn't support SMX output 
>> > buffer\n");
>> > -               return -EINVAL;
>> > -       }
>> > -
>> >        /* XXX fill in */
>> >        return 0;
>> >  }
>> > @@ -597,6 +602,37 @@ static int evergreen_cs_check_reg(struct 
>> > radeon_cs_parser *p, u32 reg, u32 idx)
>> >        case VGT_STRMOUT_BUFFER_CONFIG:
>> >                track->vgt_strmout_buffer_config = radeon_get_ib_value(p, 
>> > idx);
>> >                break;
>> > +       case VGT_STRMOUT_BUFFER_BASE_0:
>> > +       case VGT_STRMOUT_BUFFER_BASE_1:
>> > +       case VGT_STRMOUT_BUFFER_BASE_2:
>> > +       case VGT_STRMOUT_BUFFER_BASE_3:
>> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
>> > +               if (r) {
>> > +                       dev_warn(p->dev, "bad SET_CONTEXT_REG "
>> > +                                       "0x%04X\n", reg);
>> > +                       return -EINVAL;
>> > +               }
>> > +               tmp = (reg - VGT_STRMOUT_BUFFER_BASE_0) / 16;
>> > +               track->vgt_strmout_bo_offset[tmp] = radeon_get_ib_value(p, 
>> > idx) << 8;
>> > +               ib[idx] += (u32)((reloc->lobj.gpu_offset >> 8) & 
>> > 0x);
>> > +               track->vgt_strmout_bo[tmp] = reloc->robj;
>> > +               track->vgt_strmout_bo_mc[tmp] = reloc->lobj.gpu_offset;
>> > +               break;
>> > +       case VGT_STRMOUT_BUFFER_SIZE_0:
>> > +       case VGT_STRMOUT_BUFFER_SIZE_1:
>> > +       case VGT_STRMOUT_BUFFER_SIZE_2:
>> > +       case VGT_STRMOUT_BUFFER_SIZE_3:
>> > +               tmp = (reg - VGT_STRMOUT_BUFFER_SIZE_0) / 16;
>> > +               track->vgt_strmout_size[tmp] = radeon_get_ib_value(p, idx);
>> > +               break;
>> > +       case CP_COHER_BASE:
>> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
>> > +               if (r) {
>> > +                       dev_warn(p->dev, "missing reloc for CP_COHER_BASE "
>> > +                                       "0x%04

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 09:06:56PM +0900, InKi Dae wrote:
> 2012/1/9 Daniel Vetter :
> > On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
> >> 2012/1/9 Daniel Vetter :
> >> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> >> >> I has test dmabuf based drm gem module for exynos and I found one 
> >> >> problem.
> >> >> you can refer to this test repository:
> >> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> >> >>
> >> >> at this repository, I added some exception codes for resource release
> >> >> in addition to Dave's patch sets.
> >> >>
> >> >> let's suppose we use dmabuf based vb2 and drm gem with physically
> >> >> continuous memory(no IOMMU) and we try to share allocated buffer
> >> >> between them(v4l2 and drm driver).
> >> >>
> >> >> 1. request memory allocation through drm gem interface.
> >> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
> >> >> gem object.
> >> >> - internally, private gem based dmabuf moudle calls drm_buf_export()
> >> >> to register allocated gem object to fd.
> >> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
> >> >> buffer to v4l2 based device.
> >> >> - internally, vb2 plug in module gets a buffer to the fd and then
> >> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
> >> >> containing physical memory info to the gem object. and then the
> >> >> physical memory info would be copied to vb2_xx_buf object.
> >> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
> >> >> this repository:
> >> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> >> >>
> >> >> after that, if v4l2 driver want to release vb2_xx_buf object with
> >> >> allocated memory region by user request, how should we do?. refcount
> >> >> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
> >> >> object is released videobuf2 framework don't know who is using the
> >> >> physical memory region. so this physical memory region is released and
> >> >> when drm driver tries to access the region or to release it also, a
> >> >> problem would be induced.
> >> >>
> >> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
> >> >> I'm not sure that this is good way. maybe there may be better way.
> >> >> if there is any missing point, please let me know.
> >> >
> >> > The dma_buf object needs to hold a reference on the underlying
> >> > (necessarily reference-counted) buffer object when the exporter creates
> >> > the dma_buf handle. This reference should then get dropped in the
> >> > exporters dma_buf->ops->release() function, which is only getting called
> >> > when the last reference to the dma_buf disappears.
> >> >
> >>
> >> when the exporter creates the dma_buf handle(for example, gem -> fd),
> >> I think the refcount of gem object should be increased at this point,
> >> and decreased by dma_buf->ops->release() again because when the
> >> dma_buf is created and dma_buf_export() is called, this dma_buf refers
> >> to the gem object one time. and in case of inporter(fd -> gem),
> >> file->f_count of the dma_buf is increased and then when this gem
> >> object is released by user request such as drm close or
> >> drn_gem_close_ioctl, dma_buf_put() should be called by
> >> dma_buf->ops->detach() to decrease file->f_count again because the gem
> >> object refers to the dma_buf. for this, you can refer to my test
> >> repository I mentioned above. but the problem is that when a buffer is
> >> released by one side, another can't know whether the buffer already
> >> was released or not.
> >
> > Nope, dma_buf_put should not be called by ->detach. The importer gets his
> > reference when importing the dma_buf and needs to drop that reference
> > himself when it's done using the buffer by calling dma_buf_put (i.e. after
> > the last ->detach call).
> 
> I'm afraid I may be missing some points; I'm confused. who do you think
> is the Importer and who is the Exporter? the Importer is fd -> private
> buffer and the Exporter is private buffer -> fd? if so, yes, when the
> importer needs to drop its reference (i.e. the importer wants to release
> that buffer), dma_buf_put() should be called somewhere, and in my case
> that function is called by drm_prime_gem_destroy(). this function is
> included in Dave's patch sets and dma_buf_detach() is also called there.
> and I just thought that was the right place. I didn't find anywhere else
> that dma_buf_put() is called. could you please tell me where you think
> dma_buf_put() should be called?
> 
> for this, you can refer to Dave's repository:
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

I haven't really looked at Dave's latest prime patches, but he reported
some reference counting issues the last time we chatted about it on irc.
So maybe you're just right and the dma_buf_put is indeed missing from
drm_prime_gem_destroy ;-) But as I've said, haven't really re

Re: [PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 10:44 AM, Jerome Glisse  wrote:
> On Mon, Jan 09, 2012 at 09:31:16AM +0100, Daniel Vetter wrote:
>> On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
>> > On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
>> > > Hi all,
>> > >
>> > > Meh, I've wanted to port the small set of helpers nouveau already has to
>> > > handle per open fd gpu virtual address spaces to core drm, so that I 
>> > > could
>> > > reuse them for i915. Just to go one small step towards unifying drivers 
>> > > in
>> > > drm/* a bit ...
>> > >
>> > > Looks like I'll have another driver to wrestle or just forget about it 
>> > > and
>> > > reinvet that wheel for i915, too.
>> > >
>> > > 
>> > >
>> > > Cheers, Daniel
>> > > --
>> >
>> > I looked at nouveau before writing this code, thing is, in the end
>> > there is little common code especialy when you take different path on
>> > how you handle things (persistent or dynamic page table for instance).
>> > Thought couple things can still be share. Note that the whole radeon
>> > code is designed with the possibility of having several address space
>> > per process, thought there is no use for such things today we believe
>> > things like opencl+opengl can benefit of each having their own address
>> > space.
>>
>> - I've realized when looking through nouveau that we likely can't share
>>   match more than a gem_bo->vma lookup plus a bunch of helper functions.
>>
>> - Imo having more than one gpu virtual address space per fd doesn't make
>>   much sense. libdrm (at least for i915) is mostly just about issueing
>>   cmdbuffers. Hence it's probably easier to just open two fds and
>>   instantiate two libdrm buffer managers if you want two address spaces
>>   for otherwise you have to teach libdrm that the same buffer object still
>>   can have different addresses (which is pretty much against the point of
>>   gpu virtual address spaces).
>
> Radeon barely use libdrm (only the ddx use part of it).
>
>>
>> I also realize that in the dri1 days there's been way too much common code
>> that only gets used by one or two drivers and hence isn't really commonly
>> useable at all (and also not really of decent quality). So I'm all in
>> favour for driver-specific stuff, especially for execution and memory
>> management. But:
>>
>> - nouveau already has gpu virtual address spaces, radeon just grew them
>>   with this patch and i915 is on track to get them, too: Patches to enable
>>   the different hw addressing mode for Sandybridge and later are ready,
>>   and with Ivybridge hw engineers kinked out the remaining bugs so we can
>>   actually context-switch between different address spaces without hitting
>>   hw bugs.
>>
>> - The more general picture is that with the advent of more general-purpose
>>   apis and usecases for gpus like opencl (or also background video
>>   encoding/decoding/transcoding with libva) users will want to control gpu
>>   resources. So I expect that we'll grow resource limits, schedulers with
>>   priorities and maybe also something like control groups in a few years.
>>   But if we don't put a bit of thought into the commonalities of things
>>   like gpu virtual address spaces, scheduling and similar things I fear we
>>   won't be able to create a sensible common interface to allocate and
>>   control resources in the feature. Which will result in a sub-par
>>   experience.
>
> I wish we could come up with a common API for different GPUs, but the
> fact is that every kernel API we have exposed so far is specific to each
> GPU besides modesetting. Anything that uses the processing power of the
> GPU uses a dedicated API.
>
> For instance radeon virtual address give control to userspace, ie userspace
> decide the virtual address space at which each bo will be. On the contrary
> nouveau do the allocation in the kernel. We did choose userspace to be in
> charge for few reasons (top of my head) :
> - allow to have 1:1 mapping btw cpu address space and gpu without having
>  playing with process vm in the kernel
> - allow easy command stream replay (no need to rewrite the command
>  stream because virtual address are different)
>
> Scheduling is the next big thing coming up, with the latest gen of GPUs
> growing the capacity to preempt GPU tasks and also offering finer
> control over the GPU cores. Again here I fear that we will each grow our
> own API.
>
> My belief is that as long as we expose a low level API to use the GPU
> there is no way we can provide a common API for things like scheduling
> or virtual address space. The way you submit commands is tightly related
> to scheduling and virtual address stuff.
>
>> But if my google-fu doesn't fail me gpu address spaces for radeon was
>> posted the first time as v22 ever on a public list and merged right away,
>> so there's been simply no time to discuss cross-driver issues.  Which is
>> part of why I'm slightly miffed ;-)
>>
>
> I too wish that we could have released this earlier. I guess it was included
> because we still

Re: [ANNOUNCE] libdrm 2.4.30

2012-01-09 Thread Eric Anholt
On Sun, 08 Jan 2012 00:48:56 -0800, Jeremy Huddleston  
wrote:
> Well that's a ppc box, so maybe endianness issues... ?

Oh, that's probably it.  But I'd say it's more a bug that Intel libdrm
is being built on ppc.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 10:44:15AM -0500, Jerome Glisse wrote:
> On Mon, Jan 09, 2012 at 09:31:16AM +0100, Daniel Vetter wrote:
> > On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
> > > On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
> > > > Hi all,
> > > >
> > > > Meh, I've wanted to port the small set of helpers nouveau already has to
> > > > handle per open fd gpu virtual address spaces to core drm, so that I 
> > > > could
> > > > reuse them for i915. Just to go one small step towards unifying drivers 
> > > > in
> > > > drm/* a bit ...
> > > >
> > > > Looks like I'll have another driver to wrestle or just forget about it 
> > > > and
> > > > reinvet that wheel for i915, too.
> > > >
> > > > 
> > > >
> > > > Cheers, Daniel
> > > > --
> > > 
> > > I looked at nouveau before writing this code, thing is, in the end
> > > there is little common code especialy when you take different path on
> > > how you handle things (persistent or dynamic page table for instance).
> > > Thought couple things can still be share. Note that the whole radeon
> > > code is designed with the possibility of having several address space
> > > per process, thought there is no use for such things today we believe
> > > things like opencl+opengl can benefit of each having their own address
> > > space.
> > 
> > - I've realized when looking through nouveau that we likely can't share
> >   match more than a gem_bo->vma lookup plus a bunch of helper functions.
> > 
> > - Imo having more than one gpu virtual address space per fd doesn't make
> >   much sense. libdrm (at least for i915) is mostly just about issueing
> >   cmdbuffers. Hence it's probably easier to just open two fds and
> >   instantiate two libdrm buffer managers if you want two address spaces
> >   for otherwise you have to teach libdrm that the same buffer object still
> >   can have different addresses (which is pretty much against the point of
> >   gpu virtual address spaces).
> 
> Radeon barely use libdrm (only the ddx use part of it).

Ok, that explains things a bit.

> > I also realize that in the dri1 days there's been way too much common code
> > that only gets used by one or two drivers and hence isn't really commonly
> > useable at all (and also not really of decent quality). So I'm all in
> > favour for driver-specific stuff, especially for execution and memory
> > management. But:
> > 
> > - nouveau already has gpu virtual address spaces, radeon just grew them
> >   with this patch and i915 is on track to get them, too: Patches to enable
> >   the different hw addressing mode for Sandybridge and later are ready,
> >   and with Ivybridge hw engineers kinked out the remaining bugs so we can
> >   actually context-switch between different address spaces without hitting
> >   hw bugs.
> > 
> > - The more general picture is that with the advent of more general-purpose
> >   apis and usecases for gpus like opencl (or also background video
> >   encoding/decoding/transcoding with libva) users will want to control gpu
> >   resources. So I expect that we'll grow resource limits, schedulers with
> >   priorities and maybe also something like control groups in a few years.
> >   But if we don't put a bit of thought into the commonalities of things
> >   like gpu virtual address spaces, scheduling and similar things I fear we
> >   won't be able to create a sensible common interface to allocate and
> >   control resources in the feature. Which will result in a sub-par
> >   experience. 
> 
> I wish we could come up with a common API for different GPUs, but the
> fact is that every kernel API we have exposed so far is specific to each
> GPU besides modesetting. Anything that uses the processing power of the
> GPU uses a dedicated API.
> 
> For instance radeon virtual address give control to userspace, ie userspace
> decide the virtual address space at which each bo will be. On the contrary
> nouveau do the allocation in the kernel. We did choose userspace to be in
> charge for few reasons (top of my head) :
> - allow to have 1:1 mapping btw cpu address space and gpu without having
>   playing with process vm in the kernel
> - allow easy command stream replay (no need to rewrite the command
>   stream because virtual address are different)

This stuff here is awesome and I've really missed it from the changelog. I
think the value in having a discussion across drivers isn't always in
coming up with a common interface, but sometimes just in learning one
another's design tradeoffs.

> Scheduling is the next big thing coming up, with the latest gen of GPUs
> growing the capacity to preempt GPU tasks and also offering finer
> control over the GPU cores. Again here I fear that we will each grow our
> own API.
> 
> My belief is that as long as we expose a low level API to use the GPU
> there is no way we can provide a common API for things like scheduling
> or virtual address space. The way you submit commands is tightly related
> to scheduling and virtual address stuff.

Well I think 

Re: [PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 11:07:44AM -0500, Alex Deucher wrote:
> We would have liked to as well, but with the holidays and internal
> review process and people being on vacation, and the kernel merge
> window timing it didn't work out this time.  Still as Jerome said, our
> VM setup is tightly coupled with the way our HW works.  I'm not sure
> how much code could actually be shared.

Again to clarify, I'm not so much interested in sharing code here - I
know it'll be nigh on impossible, having looked through nouveau already
and quickly glossed through your stuff. I see the value just in having
a minimal set of common idioms that we can use to build cool stuff like
resource management.

Think of gem, which is actually about next to nothing. There's the flink
stuff, a common close ioctl and recently also a dummy create unusable
for anything other than boot splash screens. Plus the shmem backing
store, which seems to be not so much in favour with embedded chips, i.e.
it doesn't really count. Ridiculous compared to all the other stuff
required to draw triangles on a gpu, but good enough to make things like
dri2 and wayland possible.

Cheers, daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: TTM and AGP conflicts

2012-01-09 Thread Konrad Rzeszutek Wilk
On Mon, Jan 09, 2012 at 10:07:02AM +0100, Thomas Hellstrom wrote:
> On 01/06/2012 04:51 PM, James Simmons wrote:
> You can achieve what you want by either adding a new domain so you would 
> have
> system, vram, agp, pcidma and object can be bound to one and only one. Or 
> you
> can hack your own agp ttm backend that could bind bo to agp or pci or
> both at the same time depending on what you want to achieve.
> >>>The question is how does one know which domain you want in tt_create.
> >>>Currenty drivers are using there dev_priv but if you have have more than
> >>>one option available how does one choose? Would you be okay with passing
> >>>in a domain flag?
> >>>
> >>Well i agree that something would be usefull there so the driver know
> >>which bind/unbind function it should use. Thomas i would prefer
> >>passing the bo to the tt_create callback but a flag is the minimum we
> >>need.
> >We can discuss this after the merge window. Jerome your patch does fix a
> >regression whereas my proposal is an enhancement.
> 
> ..Back from parental leave.

Congratulations!
> 
> I'm not 100% sure I understand the problem correctly, but I assume
> the problem is that
> you receive a "bind" request for pages allocated out of the wrong
> DMA pool? Is that correct?
> 
> In that case we need to make both the following happen:
> 1) The backend should be required to supply a fallback that can bind
> anyway by allocating the correct page type and copy.
> 2) I'm OK with providing a hint to tt_create. IIRC we're passing
> bo::mem to tt_bind. Would it be ok to pass the same to tt_create?
> 
> /Thomas
> 


Re: [PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 11:49 AM, Daniel Vetter  wrote:
> On Mon, Jan 09, 2012 at 11:07:44AM -0500, Alex Deucher wrote:
>> We would have liked to as well, but with the holidays and internal
>> review process and people being on vacation, and the kernel merge
>> window timing it didn't work out this time.  Still as Jerome said, our
>> VM setup is tightly coupled with the way our HW works.  I'm not sure
>> how much code could actually be shared.
>
> Again to clarify, I'm not so much interesting in sharing code here - I
> know it'll be nigh to impossible by having looked through nouveau already
> and quickly glossed through your stuff. I see the value is just in having
> a minimal set of common idioms that we can use to build cool stuff like
> resource management.
>
> Think of gem, which is actually about next to nothing. There's a the flink
> stuff and a common close ioctl and recently also a dummy create unusable
> for anythine else than boot spalsh screens. Plus the shmem backing store
> which seems to be not so much in favour with embedded chips, i.e doesn't
> really count. Ridiculous compared to all the other stuff required to draw
> triangles on a gpu, but good enough to make things like dri2 and wayland
> possible.
>

I generally agree here, but I don't think this patch prevents
something like that in the future.  Also, I think we need to actually
use GPU VM for a bit and see how everyone's implementations and plans
shake out before we can decide which idioms are common.

Alex

> Cheers, daniel
> --
> Daniel Vetter
> Mail: dan...@ffwll.ch
> Mobile: +41 (0)79 365 57 48


Re: [PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 11:56:20AM -0500, Alex Deucher wrote:
> On Mon, Jan 9, 2012 at 11:49 AM, Daniel Vetter  wrote:
> > On Mon, Jan 09, 2012 at 11:07:44AM -0500, Alex Deucher wrote:
> >> We would have liked to as well, but with the holidays and internal
> >> review process and people being on vacation, and the kernel merge
> >> window timing it didn't work out this time.  Still as Jerome said, our
> >> VM setup is tightly coupled with the way our HW works.  I'm not sure
> >> how much code could actually be shared.
> >
> > Again to clarify, I'm not so much interesting in sharing code here - I
> > know it'll be nigh to impossible by having looked through nouveau already
> > and quickly glossed through your stuff. I see the value is just in having
> > a minimal set of common idioms that we can use to build cool stuff like
> > resource management.
> >
> > Think of gem, which is actually about next to nothing. There's a the flink
> > stuff and a common close ioctl and recently also a dummy create unusable
> > for anythine else than boot spalsh screens. Plus the shmem backing store
> > which seems to be not so much in favour with embedded chips, i.e doesn't
> > really count. Ridiculous compared to all the other stuff required to draw
> > triangles on a gpu, but good enough to make things like dri2 and wayland
> > possible.
> >
> 
> I generally agree here, but I don't think this patch prevents
> something like that in the future.  Also, I think we need to actually
> use GPU VM for a bit and see how everyone's implementations and plans
> shake out before we can decide which idioms are common.

Agreed. My slight miff has really only been that this got posted so
recently that there was no chance to have this quick discussion before it
got merged.

Cheers, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


[Bug 27314] displayport link training fails on certain panels (channel equalization fails)

2012-01-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=27314

--- Comment #63 from Rafael Ávila de Espíndola  
2012-01-09 17:11:34 PST ---
Created attachment 55359
  --> https://bugs.freedesktop.org/attachment.cgi?id=55359
dmesg from kernel 3.2

I just tried KMS with kernel 3.2 and I still get a black screen :-(

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2012/1/10 Rob Clark :
> On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae  wrote:
>> note : in case of sharing a buffer between v4l2 and drm driver, the
>> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
>> vb2_xx_buf through sg table. in this case, only memory info is used to
>> share, not some objects.
>
> which v4l2/vb2 patches are you looking at?  The patches I was using,
> vb2 holds a reference to the 'struct dma_buf *' internally, not just
> keeping the sg_table
>

yes, it is not keeping the sg_table. I mean... please see an example below.

static void vb2_dma_contig_map_dmabuf(void *mem_priv)
{
struct sg_table *sg;
 ...
 sg = dma_buf_map_attachment(buf->db_attach, dir);
 ...
 buf->dma_addr = sg_dma_address(sg->sgl);
 ...
}

at least with no IOMMU, the memory information (containing the physical
memory address) would be copied into the vb2_xx_buf object if drm gem
exported its own buffer and vb2 wants to use that buffer; at this point
the sg table is used to share the buffer. and the problem I pointed out
is that this buffer (i.e. the physical memory region) could be released
by the vb2 framework (as you know, the vb2_xx_buf object and the memory
region that buf->dma_addr points to) without the Exporter (drm gem)
knowing about it, so some problems would be induced once drm gem tries
to release or access that buffer. I have tried to resolve this issue by
adding a get_shared_cnt() callback to dma-buf.h, but I'm not sure this
is a good way. maybe there is a better way.

Thanks.

> BR,
> -R


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Rob Clark
On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae  wrote:
> 2012/1/10 Rob Clark :
>> On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae  wrote:
>>> note : in case of sharing a buffer between v4l2 and drm driver, the
>>> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
>>> vb2_xx_buf through sg table. in this case, only memory info is used to
>>> share, not some objects.
>>
>> which v4l2/vb2 patches are you looking at?  The patches I was using,
>> vb2 holds a reference to the 'struct dma_buf *' internally, not just
>> keeping the sg_table
>>
>
> yes, not keeping the sg_table. I mean... see a example below please.
>
> static void vb2_dma_contig_map_dmabuf(void *mem_priv)
> {
>    struct sg_table *sg;
>     ...
>     sg = dma_buf_map_attachment(buf->db_attach, dir);
>     ...
>     buf->dma_addr = sg_dma_address(sg->sgl);
>     ...
> }
>
> at least with no IOMMU, the memory information(containing physical
> memory address) would be copied to vb2_xx_buf object if drm gem
> exported its own buffer and vb2 wants to use that buffer at this time,
> sg table is used to share that buffer. and the problem I pointed out
> is that this buffer(also physical memory region) could be released by
> vb2 framework(as you know, vb2_xx_buf object and the memory region for
> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
> so some problems would be induced once drm gem tries to release or
> access that buffer. and I have tried to resolve this issue adding
> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
> good way. maybe there would be better way.

the exporter (in this case your driver's drm/gem bits) shouldn't
release that mapping / sgtable until the importer (in this case v4l2)
calls dma_buf_unmap fxn..

It would be an error if the importer did a dma_buf_put() without first
calling dma_buf_unmap_attachment() (if currently mapped) and then
dma_buf_detach() (if currently attached).  Perhaps somewhere there
should be some sanity checking debug code which could be enabled to do
a WARN_ON() if the importer does the wrong thing.  It shouldn't really
be part of the API, I don't think, but it actually does seem like a
good thing, esp. as new drivers start trying to use dmabuf, to have
some debug options which could be enabled.

It is entirely possible that something was missed on the vb2 patches,
but the way it is intended to work is like this:
https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92

where it does a detach() before the dma_buf_put(), and the vb2-contig
backend checks here that it is also unmapped():
https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251

BR,
-R

> Thanks.
>
>> BR,
>> -R


Re: [PATCH] drm/nouveau: fix ttm move notify callback

2012-01-09 Thread Ben Skeggs
On Fri, 2012-01-06 at 16:00 -0500, Jerome Glisse wrote:
> On Fri, Jan 06, 2012 at 02:52:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > Still having difficulty to reproduce can you reproduce with the attached
> > > printk debuging patch and provide the log (only few printk preceding the
> > > oops or segfault are interesting).
> > 
> > http://darnok.org/vga/move_notify-v212.log
> > 
> 
> Looks like nouveau doesn't like move notify being called on driver
> shutdown or when something on nv50 is down. Ben, i think you will
> be better at finding a fix for that than me.
I'm also not able to reproduce this issue on an NV98 chipset (so I'd
expect every nv50+ chipset to behave the same) with the current code in
Dave's drm-core-next tree..

Am I missing something?

Ben.

> 
> Cheers,
> Jerome




Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Semwal, Sumit
On Tue, Jan 10, 2012 at 7:44 AM, Rob Clark  wrote:
> On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae  wrote:
>> 2012/1/10 Rob Clark :
>> at least with no IOMMU, the memory information(containing physical
>> memory address) would be copied to vb2_xx_buf object if drm gem
>> exported its own buffer and vb2 wants to use that buffer at this time,
>> sg table is used to share that buffer. and the problem I pointed out
>> is that this buffer(also physical memory region) could be released by
>> vb2 framework(as you know, vb2_xx_buf object and the memory region for
>> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
>> so some problems would be induced once drm gem tries to release or
>> access that buffer. and I have tried to resolve this issue adding
>> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
>> good way. maybe there would be better way.
Hi Inki,
As also mentioned in the documentation patch, importer (the user of
the buffer) - in this case for current RFC patches on
v4l2-as-a-user[1] vb2 framework - shouldn't release the backing memory
of the buffer directly - it should only use the dma-buf callbacks in
the right sequence to let the exporter know that it is done using this
buffer, so the exporter can release it if allowed and needed.
>
> the exporter (in this case your driver's drm/gem bits) shouldn't
> release that mapping / sgtable until the importer (in this case v4l2)
> calls dma_buf_unmap fxn..
>
> It would be an error if the importer did a dma_buf_put() without first
> calling dma_buf_unmap_attachment() (if currently mapped) and then
> dma_buf_detach() (if currently attached).  Perhaps somewhere there
> should be some sanity checking debug code which could be enabled to do
> a WARN_ON() if the importer does the wrong thing.  It shouldn't really
> be part of the API, I don't think, but it actually does seem like a
> good thing, esp. as new drivers start trying to use dmabuf, to have
> some debug options which could be enabled.
>
> It is entirely possible that something was missed on the vb2 patches,
> but the way it is intended to work is like this:
> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92
>
> where it does a detach() before the dma_buf_put(), and the vb2-contig
> backend checks here that it is also unmapped():
> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251

The proposed RFC for V4L2 adaptation at [1] does exactly the same
thing; detach() before dma_buf_put(), and check in vb2-contig backend
for unmapped() as mentioned above.

>
> BR,
> -R
>
BR,
Sumit.

[1]: V4l2 as a dma-buf user RFC:
http://comments.gmane.org/gmane.linux.drivers.video-input-infrastructure/42966


Re: [patch] drm/nv50/pm: signedness bug in nv50_pm_clocks_pre()

2012-01-09 Thread Martin Peres

Le 04/01/2012 08:20, Dan Carpenter a écrit :

calc_mclk() returns zero on success and negative on failure but clk is
a u32.

Signed-off-by: Dan Carpenter

diff --git a/drivers/gpu/drm/nouveau/nv50_pm.c b/drivers/gpu/drm/nouveau/nv50_pm.c
index 0393721..3508de9 100644
--- a/drivers/gpu/drm/nouveau/nv50_pm.c
+++ b/drivers/gpu/drm/nouveau/nv50_pm.c
@@ -540,7 +540,7 @@ nv50_pm_clocks_pre(struct drm_device *dev, struct nouveau_pm_level *perflvl)
 	info->mclk_hwsq.len = 0;
 	if (perflvl->memory) {
 		clk = calc_mclk(dev, perflvl->memory, &info->mclk_hwsq);
-		if (clk < 0) {
+		if ((int)clk < 0) {
 			ret = clk;
 			goto error;
 		}

Well spotted Dan!

Sorry for the late answer, was busy reworking this file for safe reclocking.

I have a slightly different fix for that. Please tell me if it suits
you:
https://gitorious.org/linux-nouveau-pm/linux-nouveau-pm/commit/c1b80360ezd1aa7dd780ac383aae9437c66ef3b89



Re: [patch] drm/nv50/pm: signedness bug in nv50_pm_clocks_pre()

2012-01-09 Thread Dan Carpenter
On Tue, Jan 10, 2012 at 12:28:13AM +0100, Martin Peres wrote:
> Le 04/01/2012 08:20, Dan Carpenter a écrit :
> >calc_mclk() returns zero on success and negative on failure but clk is
> >a u32.
> >
> >Signed-off-by: Dan Carpenter
> >
> >diff --git a/drivers/gpu/drm/nouveau/nv50_pm.c b/drivers/gpu/drm/nouveau/nv50_pm.c
> >index 0393721..3508de9 100644
> >--- a/drivers/gpu/drm/nouveau/nv50_pm.c
> >+++ b/drivers/gpu/drm/nouveau/nv50_pm.c
> >@@ -540,7 +540,7 @@ nv50_pm_clocks_pre(struct drm_device *dev, struct nouveau_pm_level *perflvl)
> > 	info->mclk_hwsq.len = 0;
> > 	if (perflvl->memory) {
> > 		clk = calc_mclk(dev, perflvl->memory, &info->mclk_hwsq);
> >-		if (clk < 0) {
> >+		if ((int)clk < 0) {
> > 			ret = clk;
> > 			goto error;
> > 		}
> Well spotted Dan!
> 
> Sorry for the late answer, was busy reworking this file for safe reclocking.
> 
> I have a slightly different fix for that. Please tell me if It suits
> you: 
> https://gitorious.org/linux-nouveau-pm/linux-nouveau-pm/commit/c1b80360ezd1aa7dd780ac383aae9437c66ef3b89

That link redirects to
https://gitorious.org/linux-nouveau-pm/linux-nouveau-pm/commits/master
and it doesn't show the patch.

But I wasn't a huge fan of adding the cast either, so I'm sure your
patch is good.

regards,
dan carpenter




Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2012/1/10 Semwal, Sumit :
> On Tue, Jan 10, 2012 at 7:44 AM, Rob Clark  wrote:
>> On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae  wrote:
>>> 2012/1/10 Rob Clark :
>>> at least with no IOMMU, the memory information(containing physical
>>> memory address) would be copied to vb2_xx_buf object if drm gem
>>> exported its own buffer and vb2 wants to use that buffer at this time,
>>> sg table is used to share that buffer. and the problem I pointed out
>>> is that this buffer(also physical memory region) could be released by
>>> vb2 framework(as you know, vb2_xx_buf object and the memory region for
>>> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
>>> so some problems would be induced once drm gem tries to release or
>>> access that buffer. and I have tried to resolve this issue adding
>>> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
>>> good way. maybe there would be better way.
> Hi Inki,
> As also mentioned in the documentation patch, importer (the user of
> the buffer) - in this case for current RFC patches on
> v4l2-as-a-user[1] vb2 framework - shouldn't release the backing memory
> of the buffer directly - it should only use the dma-buf callbacks in
> the right sequence to let the exporter know that it is done using this
> buffer, so the exporter can release it if allowed and needed.

thank you for your comments. :) below are some tables showing ideal use
of the dmabuf operations; they track the reference counts as a buffer is
created, shared and released. if there are any problems, please let me
know. P.S. these are just simple cases, so there would be others.


in case of using only drm gem and dmabuf,

operations                 gem refcount   file f_count   buf refcount
---------------------------------------------------------------------
1. gem create              A:1                           A:0
2. export(handle A -> fd)  A:2            A:1            A:0
3. import(fd -> handle B)  A:2, B:1       A:2            A:1
4. file close(A)           A:2, B:1       A:1            A:1
5. gem close(A)            A:1, B:1       A:1            A:1
6. gem close(B)            A:1, B:0       A:1            A:0
7. file close(A)           A:0            A:0
---------------------------------------------------------------------
3. handle B shares the buf of handle A.
6. handle B is released, but its buf is not.
7. gem handle A, the dmabuf file A and the physical memory region are released.


and in case of using drm gem, vb2 and dmabuf,

operations                            gem, vb2 refcount  file f_count  buf refcount
-----------------------------------------------------------------------------------
1. gem create (GEM side)              A:1                              A:0
2. export(handle A -> fd) (GEM side)  A:2                A:1           A:0
3. import(fd -> handle B) (VB2 side)  A:2, B:1           A:2           A:1
4. file close(A) (VB2 side)           A:2, B:1           A:1           A:1
5. vb2 close(B) (VB2 side)            A:2, B:0           A:1           A:0
6. gem close(A) (GEM side)            A:1                A:1           A:0
7. file close(A) (GEM side)           A:0                A:0
-----------------------------------------------------------------------------------
3. vb2 handle B shares the buf of gem handle A.
5. vb2 handle B is released, and the refcount of the buf it points to is decreased.
7. gem handle A, the dmabuf file A and the physical memory region are released.

>>
>> the exporter (in this case your driver's drm/gem bits) shouldn't
>> release that mapping / sgtable until the importer (in this case v4l2)
>> calls dma_buf_unmap fxn..
>>
>> It would be an error if the importer did a dma_buf_put() without first
>> calling dma_buf_unmap_attachment() (if currently mapped) and then
>> dma_buf_detach() (if currently attached).  Perhaps somewhere there
>> should be some sanity checking debug code which could be enabled to do
>> a WARN_ON() if the importer does the wrong thing.  It shouldn't really
>> be part of the API, I don't think, but it actually does seem like a
>> good thing, esp. as new drivers start trying to use dmabuf, to have
>> some debug options which could be enabled.
>>
>> It is entirely possible that something was missed on the vb2 patches,
>> but the way it is intended to work is like this:
>> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92
>>
>> where it does a detach() before the dma_buf_put(), and the vb2-contig
>> backend checks here that it is also unmapped():
>> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251
>
> The proposed RFC for V4L2 adaptation at [1] does exactly the same
> thing.

[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2011/12/2 Sumit Semwal :
> This is the first step in defining a dma buffer sharing mechanism.
>
> A new buffer object dma_buf is added, with operations and API to allow easy
> sharing of this buffer object across devices.
>
> The framework allows:
> - different devices to 'attach' themselves to this buffer, to facilitate
> ?backing storage negotiation, using dma_buf_attach() API.
> - association of a file pointer with each user-buffer and associated
> ? allocator-defined operations on that buffer. This operation is called the
> ? 'export' operation.
> - this exported buffer-object to be shared with the other entity by asking for
> ? its 'file-descriptor (fd)', and sharing the fd across.
> - a received fd to get the buffer object back, where it can be accessed using
> ? the associated exporter-defined operations.
> - the exporter and user to share the scatterlist using map_dma_buf and
> ? unmap_dma_buf operations.
>
> Atleast one 'attach()' call is required to be made prior to calling the
> map_dma_buf() operation.
>
> Couple of building blocks in map_dma_buf() are added to ease introduction
> of sync'ing across exporter and users, and late allocation by the exporter.
>
> *OPTIONALLY*: mmap() file operation is provided for the associated 'fd', as
> wrapper over the optional allocator defined mmap(), to be used by devices
> that might need one.
>
> More details are there in the documentation patch.
>
> This is based on design suggestions from many people at the mini-summits[1],
> most notably from Arnd Bergmann , Rob Clark  
> and
> Daniel Vetter .
>
> The implementation is inspired from proof-of-concept patch-set from
> Tomasz Stanislawski , who demonstrated buffer 
> sharing
> between two v4l2 devices. [2]
>
> [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
> [2]: http://lwn.net/Articles/454389
>
> Signed-off-by: Sumit Semwal 
> Signed-off-by: Sumit Semwal 
> ---
>  drivers/base/Kconfig    |   10 ++
>  drivers/base/Makefile   |    1 +
>  drivers/base/dma-buf.c  |  290 +++
>  include/linux/dma-buf.h |  176 
>  4 files changed, 477 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-buf.c
>  create mode 100644 include/linux/dma-buf.h
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 21cf46f..07d8095 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>
>  source "drivers/base/regmap/Kconfig"
>
> +config DMA_SHARED_BUFFER
> +	bool "Buffer framework to be shared between drivers"
> +	default n
> +	depends on ANON_INODES
> +	help
> +	  This option enables the framework for buffer-sharing between
> +	  multiple drivers. A buffer is associated with a file using driver
> +	  APIs extension; the file's descriptor can then be passed on to other
> +	  driver.
> +
>  endmenu
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 99a375a..d0df046 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
>  obj-y			+= power/
>  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
>  obj-$(CONFIG_ISA)	+= isa.o
>  obj-$(CONFIG_FW_LOADER)	+= firmware_class.o
>  obj-$(CONFIG_NUMA)	+= node.o
> diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
> new file mode 100644
> index 000..4b9005e
> --- /dev/null
> +++ b/drivers/base/dma-buf.c
> @@ -0,0 +1,290 @@
> +/*
> + * Framework for buffer objects that can be shared across devices/subsystems.
> + *
> + * Copyright(C) 2011 Linaro Limited. All rights reserved.
> + * Author: Sumit Semwal 
> + *
> + * Many thanks to linaro-mm-sig list, and specially
> + * Arnd Bergmann , Rob Clark  and
> + * Daniel Vetter  for their support in creation and
> + * refining of this idea.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static inline int is_dma_buf_file(struct file *);
> +
> +static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct dma_buf *dmabuf;
> +
> +	if (!is_dma_buf_file(file))
> +		return -EINVAL;
> +
> +	dmabuf = file->private_data;

[git pull] dma-buf tree

2012-01-09 Thread Sumit Semwal
On 9 January 2012 03:38, Linus Torvalds  
wrote:
> On Fri, Jan 6, 2012 at 7:06 AM, Dave Airlie  wrote:
>>
>> Now we've all agreed that the initial implementation is a good baseline
>> for us to move forward on, but its messy working with others when the core
>> code is out of tree. So we'd like to merge the core dma-buf code now so we
>> can all build on top of it for 3.4. I know some people would say we
>> shouldn't merge things with no users, but it will really make our lives
>> easier going forward to get this baseline into the tree.
>
> Ok, merged.
Hi Linus,
Thanks!
>
> However, I ask myself whether it's ever sane to ask the user to enable
> this? Isn't this very much a "drivers will use 'select' to pick up the
> intfrastructure code" kind of thing?
Yes, you're right; I will submit a patch to change that appropriately,
and document it as well. Would it be ok to submit this change along
with one of the first users of the framework, or would you rather
prefer it in the rc2 / rc3 time frame?

>
> ? ? ? ? ? ? ? ? ? ? ?Linus
Best regards,
~Sumit.


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> I has test dmabuf based drm gem module for exynos and I found one problem.
> you can refer to this test repository:
> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> 
> at this repository, I added some exception codes for resource release
> in addition to Dave's patch sets.
> 
> let's suppose we use dmabuf based vb2 and drm gem with physically
> continuous memory(no IOMMU) and we try to share allocated buffer
> between them(v4l2 and drm driver).
> 
> 1. request memory allocation through drm gem interface.
> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
> gem object.
> - internally, private gem based dmabuf moudle calls drm_buf_export()
> to register allocated gem object to fd.
> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
> buffer to v4l2 based device.
> - internally, vb2 plug in module gets a buffer to the fd and then
> calls dmabuf->ops->map_dmabuf() callback to get the sg table
> containing physical memory info to the gem object. and then the
> physical memory info would be copied to vb2_xx_buf object.
> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
> this repository:
> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> 
> after that, if v4l2 driver want to release vb2_xx_buf object with
> allocated memory region by user request, how should we do?. refcount
> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
> object is released videobuf2 framework don't know who is using the
> physical memory region. so this physical memory region is released and
> when drm driver tries to access the region or to release it also, a
> problem would be induced.
> 
> for this problem, I added get_shared_cnt() callback to dma-buf.h but
> I'm not sure that this is good way. maybe there may be better way.
> if there is any missing point, please let me know.

The dma_buf object needs to hold a reference on the underlying
(necessarily reference-counted) buffer object when the exporter creates
the dma_buf handle. This reference should then get dropped in the
exporter's dma_buf->ops->release() function, which only gets called
when the last reference to the dma_buf disappears.

If this doesn't work like that currently, we have a bug, and exporting the
reference count or something similar can't fix that.

Yours, Daniel

PS: Please cut down the original mail when replying, otherwise it's pretty
hard to find your response ;-)
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Dave Airlie
On Mon, Jan 9, 2012 at 8:10 AM, Daniel Vetter  wrote:
> On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> I has test dmabuf based drm gem module for exynos and I found one problem.
>> you can refer to this test repository:
>> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
>>
>> at this repository, I added some exception codes for resource release
>> in addition to Dave's patch sets.
>>
>> let's suppose we use dmabuf based vb2 and drm gem with physically
>> continuous memory(no IOMMU) and we try to share allocated buffer
>> between them(v4l2 and drm driver).
>>
>> 1. request memory allocation through drm gem interface.
>> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
>> gem object.
>> - internally, private gem based dmabuf moudle calls drm_buf_export()
>> to register allocated gem object to fd.
>> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
>> buffer to v4l2 based device.
>> - internally, vb2 plug in module gets a buffer to the fd and then
>> calls dmabuf->ops->map_dmabuf() callback to get the sg table
>> containing physical memory info to the gem object. and then the
>> physical memory info would be copied to vb2_xx_buf object.
>> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
>> this repository:
>> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
>>
>> after that, if v4l2 driver want to release vb2_xx_buf object with
>> allocated memory region by user request, how should we do?. refcount
>> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
>> object is released videobuf2 framework don't know who is using the
>> physical memory region. so this physical memory region is released and
>> when drm driver tries to access the region or to release it also, a
>> problem would be induced.
>>
>> for this problem, I added get_shared_cnt() callback to dma-buf.h but
>> I'm not sure that this is good way. maybe there may be better way.
>> if there is any missing point, please let me know.
>
> The dma_buf object needs to hold a reference on the underlying
> (necessarily reference-counted) buffer object when the exporter creates
> the dma_buf handle. This reference should then get dropped in the
> exporters dma_buf->ops->release() function, which is only getting called
> when the last reference to the dma_buf disappears.
>
> If this doesn't work like that currently, we have a bug, and exporting the
> reference count or something similar can't fix that.
>
> Yours, Daniel
>
> PS: Please cut down the original mail when replying, otherwise it's pretty
> hard to find your response ;-)

And also the importer needs to realise it doesn't own the pages in the
sg_table, and when it's freeing its backing memory it shouldn't free
those pages. So for GEM objects we have to keep track of whether we
allocated the pages ourselves or got them from a dma-buf.

Dave.


[PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
> On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
> > Hi all,
> >
> > Meh, I've wanted to port the small set of helpers nouveau already has to
> > handle per open fd gpu virtual address spaces to core drm, so that I could
> > reuse them for i915. Just to go one small step towards unifying drivers in
> > drm/* a bit ...
> >
> > Looks like I'll have another driver to wrestle or just forget about it and
> > reinvet that wheel for i915, too.
> >
> > 
> >
> > Cheers, Daniel
> > --
> 
> I looked at nouveau before writing this code, thing is, in the end
> there is little common code especialy when you take different path on
> how you handle things (persistent or dynamic page table for instance).
> Thought couple things can still be share. Note that the whole radeon
> code is designed with the possibility of having several address space
> per process, thought there is no use for such things today we believe
> things like opencl+opengl can benefit of each having their own address
> space.

- I've realized when looking through nouveau that we likely can't share
  much more than a gem_bo->vma lookup plus a bunch of helper functions.

- Imo having more than one gpu virtual address space per fd doesn't make
  much sense. libdrm (at least for i915) is mostly just about issuing
  cmdbuffers. Hence it's probably easier to just open two fds and
  instantiate two libdrm buffer managers if you want two address spaces,
  because otherwise you have to teach libdrm that the same buffer object
  can still have different addresses (which is pretty much against the
  point of gpu virtual address spaces).

I also realize that in the dri1 days there's been way too much common code
that only gets used by one or two drivers and hence isn't really commonly
useable at all (and also not really of decent quality). So I'm all in
favour for driver-specific stuff, especially for execution and memory
management. But:

- nouveau already has gpu virtual address spaces, radeon just grew them
  with this patch and i915 is on track to get them, too: Patches to enable
  the different hw addressing modes for Sandybridge and later are ready,
  and with Ivybridge hw engineers ironed out the remaining bugs, so we can
  actually context-switch between different address spaces without hitting
  hw bugs.

- The more general picture is that with the advent of more general-purpose
  apis and usecases for gpus like opencl (or also background video
  encoding/decoding/transcoding with libva), users will want to control gpu
  resources. So I expect that we'll grow resource limits, schedulers with
  priorities and maybe also something like control groups in a few years.
  But if we don't put a bit of thought into the commonalities of things
  like gpu virtual address spaces, scheduling and similar things, I fear we
  won't be able to create a sensible common interface to allocate and
  control resources in the future. Which will result in a sub-par
  experience.

But if my google-fu doesn't fail me gpu address spaces for radeon was
posted the first time as v22 ever on a public list and merged right away,
so there's been simply no time to discuss cross-driver issues.  Which is
part of why I'm slightly miffed ;-)

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


TTM and AGP conflicts

2012-01-09 Thread Thomas Hellstrom
On 01/06/2012 04:51 PM, James Simmons wrote:
>
>>>> You can achieve what you want by either adding a new domain so you
>>>> would have system, vram, agp, pcidma and object can be bound to one
>>>> and only one. Or you can hack your own agp ttm backend that could
>>>> bind bo to agp or pci or both at the same time depending on what you
>>>> want to achieve.
>>> The question is how does one know which domain you want in tt_create.
>>> Currenty drivers are using there dev_priv but if you have have more than
>>> one option available how does one choose? Would you be okay with passing
>>> in a domain flag?
>>>
>>>
>> Well i agree that something would be usefull there so the driver know
>> which bind/unbind function it should use. Thomas i would prefer
>> passing the bo to the tt_create callback but a flag is the minimum we
>> need.
>>  
> We can discuss this after the merge widow. Jerome your patch does fix a
> regression whereas my proposal is a enhancement.
>

..Back from parental leave.

I'm not 100% sure I understand the problem correctly, but I assume the
problem is that you receive a "bind" request for pages allocated out of
the wrong DMA pool? Is that correct?

In that case we need to make both of the following happen:
1) The backend should be required to supply a fallback that can bind
anyway, by allocating the correct page type and copying.
2) I'm OK with providing a hint to tt_create. IIRC we're passing bo::mem
to tt_bind. Would it be ok to pass the same to tt_create?

/Thomas



[RFC] Future TTM DMA direction

2012-01-09 Thread Thomas Hellstrom
Hi!

When TTM was originally written, it was assumed that GPU apertures could 
address pages directly, and that the CPU could access those pages 
without explicit synchronization. The process of binding a page to a GPU 
translation table was a simple one-step operation, and we needed to 
worry about fragmentation in the GPU aperture only.

Now that we "sort of" support DMA memory there are three things I think 
are missing:

1) We can't gracefully handle coherent DMA OOMs or coherent DMA 
(Including CMA) memory fragmentation leading to failed allocations.
2) We can't handle dynamic mapping of pages into and out of dma, and 
corresponding IOMMU space shortage or fragmentation, and CPU 
synchronization.
3) We have no straightforward way of moving pages between devices.

I think a reasonable way to support this is to make binding to a 
non-fixed (system page based) TTM memory type a two-step binding 
process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE) 
instead of only (MEMORY_TYPE).

In step 1) the bo is bound to a specific DMA type. These could be for 
example:
(NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be 
allowed as well.
In this step, we perform dma_sync_for_device, or allocate dma-specific 
pages maintaining LRU lists so that if we receive a DMA memory 
allocation OOM, we can unbind bo:s bound to the same DMA type. Standard 
graphics cards would then, for example, use the NONE DMA type when run 
on bare metal or COHERENT when run on Xen. A "COHERENT" OOM condition 
would then lead to eviction of another bo. (Note that DMA eviction might 
involve data copies and be costly, but still better than failing).
Binding with the DYNAMIC memory type would mean that CPU accesses are 
disallowed, and that user-space CPU page mappings might need to be 
killed, with a corresponding sync_for_cpu if they are faulted in again 
(perhaps on a page-by-page basis). Any attempt to bo_kmap() a bo page 
bound to DYNAMIC DMA mapping should trigger a BUG.

In step 2) The bo is bound to the GPU in the same way it's done today. 
Evicting from DMA will of course also trigger an evict from GPU, but an 
evict from GPU will not trigger a DMA evict.

Making a bo "anonymous" and thus moveable between devices would then 
mean binding it to the "NONE" DMA type.

Comments, suggestions?

/Thomas









[RFC] Future TTM DMA direction

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
> Hi!
> 
> When TTM was originally written, it was assumed that GPU apertures
> could address pages directly, and that the CPU could access those
> pages without explicit synchronization. The process of binding a
> page to a GPU translation table was a simple one-step operation, and
> we needed to worry about fragmentation in the GPU aperture only.
> 
> Now that we "sort of" support DMA memory there are three things I
> think are missing:
> 
> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
> (Including CMA) memory fragmentation leading to failed allocations.
> 2) We can't handle dynamic mapping of pages into and out of dma, and
> corresponding IOMMU space shortage or fragmentation, and CPU
> synchronization.
> 3) We have no straightforward way of moving pages between devices.
> 
> I think a reasonable way to support this is to make binding to a
> non-fixed (system page based) TTM memory type a two-step binding
> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
> instead of only (MEMORY_TYPE).
> 
> In step 1) the bo is bound to a specific DMA type. These could be
> for example:
> (NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be
> allowed as well.
> In this step, we perform dma_sync_for_device, or allocate
> dma-specific pages maintaining LRU lists so that if we receive a DMA
> memory allocation OOM, we can unbind bo:s bound to the same DMA
> type. Standard graphics cards would then, for example, use the NONE
> DMA type when run on bare metal or COHERENT when run on Xen. A
> "COHERENT" OOM condition would then lead to eviction of another bo.
> (Note that DMA eviction might involve data copies and be costly, but
> still better than failing).
> Binding with the DYNAMIC memory type would mean that CPU accesses
> are disallowed, and that user-space CPU page mappings might need to
> be killed, with a corresponding sync_for_cpu if they are faulted in
> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
> 
> In step 2) The bo is bound to the GPU in the same way it's done
> today. Evicting from DMA will of course also trigger an evict from
> GPU, but an evict from GPU will not trigger a DMA evict.
> 
> Making a bo "anonymous" and thus moveable between devices would then
> mean binding it to the "NONE" DMA type.
> 
> Comments, suggestions?

Well I think we need to solve outstanding issues in the dma_buf framework
first. Currently dma_buf isn't really up to par to handle coherency
between the cpu and devices and there's also not yet any way to handle dma
address space fragmentation/exhaustion.

I fear that if you jump ahead with improving the ttm support alone we
might end up with something incompatible to the stuff dma_buf eventually
will grow, resulting in decent amounts of wasted efforts.

Cc'ed a bunch of relevant lists to foster input from people.

For a starter you seem to want much more low-level integration with the
dma api than existing users commonly need. E.g. if I understand things
correctly drivers just call dma_alloc_coherent and the platform/board code
then decides whether the device needs a contiguous allocation from cma or
whether something else is good, too (e.g. vmalloc for the cpu + iommu).
Another thing is that I think doing lru eviction in case of dma address
space exhaustion (or fragmentation) needs at least awareness of what's
going on in the upper layers. iommus are commonly shared between devices
and I presume that two ttm drivers sitting behind the same iommu and
fighting over its resources can lead to some hilarious outcomes.

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2012/1/9 Daniel Vetter :
> On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> I has test dmabuf based drm gem module for exynos and I found one problem.
>> you can refer to this test repository:
>> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
>>
>> at this repository, I added some exception codes for resource release
>> in addition to Dave's patch sets.
>>
>> let's suppose we use dmabuf based vb2 and drm gem with physically
>> continuous memory(no IOMMU) and we try to share allocated buffer
>> between them(v4l2 and drm driver).
>>
>> 1. request memory allocation through drm gem interface.
>> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
>> gem object.
>> - internally, private gem based dmabuf moudle calls drm_buf_export()
>> to register allocated gem object to fd.
>> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
>> buffer to v4l2 based device.
>> - internally, vb2 plug in module gets a buffer to the fd and then
>> calls dmabuf->ops->map_dmabuf() callback to get the sg table
>> containing physical memory info to the gem object. and then the
>> physical memory info would be copied to vb2_xx_buf object.
>> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
>> this repository:
>> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
>>
>> after that, if v4l2 driver want to release vb2_xx_buf object with
>> allocated memory region by user request, how should we do?. refcount
>> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
>> object is released videobuf2 framework don't know who is using the
>> physical memory region. so this physical memory region is released and
>> when drm driver tries to access the region or to release it also, a
>> problem would be induced.
>>
>> for this problem, I added get_shared_cnt() callback to dma-buf.h but
>> I'm not sure that this is good way. maybe there may be better way.
>> if there is any missing point, please let me know.
>
> The dma_buf object needs to hold a reference on the underlying
> (necessarily reference-counted) buffer object when the exporter creates
> the dma_buf handle. This reference should then get dropped in the
> exporters dma_buf->ops->release() function, which is only getting called
> when the last reference to the dma_buf disappears.
>

when the exporter creates the dma_buf handle (for example, gem -> fd),
I think the refcount of the gem object should be increased at that
point, and decreased again by dma_buf->ops->release(), because when the
dma_buf is created and dma_buf_export() is called, the dma_buf refers
to the gem object once. and in the case of the importer (fd -> gem),
file->f_count of the dma_buf is increased, and then when this gem
object is released by a user request such as drm close or
drm_gem_close_ioctl, dma_buf_put() should be called by
dma_buf->ops->detach() to decrease file->f_count again, because the gem
object refers to the dma_buf. for this, you can refer to my test
repository mentioned above. but the problem is that when a buffer is
released by one side, the other side can't know whether the buffer was
already released or not.
note: in case of sharing a buffer between the v4l2 and drm drivers, the
memory info is copied from vb2_xx_buf to xx_gem or from xx_gem to
vb2_xx_buf through the sg table; in this case only the memory info is
shared, not the objects themselves.

> If this doesn't work like that currently, we have a bug, and exporting the
> reference count or something similar can't fix that.
>
> Yours, Daniel
>
> PS: Please cut down the original mail when replying, otherwise it's pretty
> hard to find your response ;-)

Ok, got it. thanks. :)

> --
> Daniel Vetter
> Mail: daniel at ffwll.ch
> Mobile: +41 (0)79 365 57 48


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
> 2012/1/9 Daniel Vetter :
> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> >> I has test dmabuf based drm gem module for exynos and I found one problem.
> >> you can refer to this test repository:
> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> >>
> >> at this repository, I added some exception codes for resource release
> >> in addition to Dave's patch sets.
> >>
> >> let's suppose we use dmabuf based vb2 and drm gem with physically
> >> continuous memory(no IOMMU) and we try to share allocated buffer
> >> between them(v4l2 and drm driver).
> >>
> >> 1. request memory allocation through drm gem interface.
> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
> >> gem object.
> >> - internally, private gem based dmabuf moudle calls drm_buf_export()
> >> to register allocated gem object to fd.
> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
> >> buffer to v4l2 based device.
> >> - internally, vb2 plug in module gets a buffer to the fd and then
> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
> >> containing physical memory info to the gem object. and then the
> >> physical memory info would be copied to vb2_xx_buf object.
> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
> >> this repository:
> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> >>
> >> after that, if v4l2 driver want to release vb2_xx_buf object with
> >> allocated memory region by user request, how should we do?. refcount
> >> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
> >> object is released videobuf2 framework don't know who is using the
> >> physical memory region. so this physical memory region is released and
> >> when drm driver tries to access the region or to release it also, a
> >> problem would be induced.
> >>
> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
> >> I'm not sure that this is good way. maybe there may be better way.
> >> if there is any missing point, please let me know.
> >
> > The dma_buf object needs to hold a reference on the underlying
> > (necessarily reference-counted) buffer object when the exporter creates
> > the dma_buf handle. This reference should then get dropped in the
> > exporters dma_buf->ops->release() function, which is only getting called
> > when the last reference to the dma_buf disappears.
> >
> 
> when the exporter creates the dma_buf handle(for example, gem -> fd),
> I think the refcount of gem object should be increased at this point,
> and decreased by dma_buf->ops->release() again because when the
> dma_buf is created and dma_buf_export() is called, this dma_buf refers
> to the gem object one time. and in case of importer (fd -> gem),
> file->f_count of the dma_buf is increased and then when this gem
> object is released by user request such as drm close or
> drm_gem_close_ioctl, dma_buf_put() should be called by
> dma_buf->ops->detach() to decrease file->f_count again because the gem
> object refers to the dma_buf. for this, you can refer to my test
> repository I mentioned above. but the problem is that when a buffer is
> released by one side, another can't know whether the buffer already
> was released or not.

Nope, dma_buf_put should not be called by ->detach. The importer gets its
reference when importing the dma_buf and needs to drop that reference
itself when it's done using the buffer, by calling dma_buf_put (i.e. after
the last ->detach call). I think adding separate reference counting to
->attach and ->detach is a waste of time and only papers over buggy
importers.

Additionally, the importer does _not_ control the lifetime of a dma_buf
object and its underlying backing storage. It hence may _never_ free the
backing storage itself; that's the job of the exporter.

With that cleared up, referencing the exporters underlying buffer object
from the dma_buf will just do the right thing.

> note : in case of sharing a buffer between v4l2 and drm driver, the
> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
> vb2_xx_buf through sg table. in this case, only memory info is used to
> share, not some objects.

Hm, maybe I need to take a look at the currently proposed v4l dma_buf
patches ;-) At the moment I don't have a clear idea what exactly you're
talking about.

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[Bug 44549] "Ring buffer test failed" on Radeon 5450

2012-01-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=44549

Michel Dänzer  changed:

           What                |Removed |Added
 ----------------------------------------------
  Attachment #55245 is obsolete|0       |1

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


[Bug 44549] "Ring buffer test failed" on Radeon 5450

2012-01-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=44549

Michel Dänzer  changed:

           What                |Removed |Added
 ----------------------------------------------
  Attachment #55247 is obsolete|0       |1

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


[RFC] Future TTM DMA direction

2012-01-09 Thread Thomas Hellstrom
On 01/09/2012 11:11 AM, Daniel Vetter wrote:
> On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
>
>> Hi!
>>
>> When TTM was originally written, it was assumed that GPU apertures
>> could address pages directly, and that the CPU could access those
>> pages without explicit synchronization. The process of binding a
>> page to a GPU translation table was a simple one-step operation, and
>> we needed to worry about fragmentation in the GPU aperture only.
>>
>> Now that we "sort of" support DMA memory there are three things I
>> think are missing:
>>
>> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
>> (Including CMA) memory fragmentation leading to failed allocations.
>> 2) We can't handle dynamic mapping of pages into and out of dma, and
>> corresponding IOMMU space shortage or fragmentation, and CPU
>> synchronization.
>> 3) We have no straightforward way of moving pages between devices.
>>
>> I think a reasonable way to support this is to make binding to a
>> non-fixed (system page based) TTM memory type a two-step binding
>> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
>> instead of only (MEMORY_TYPE).
>>
>> In step 1) the bo is bound to a specific DMA type. These could be
>> for example:
>> (NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be
>> allowed as well.
>> In this step, we perform dma_sync_for_device, or allocate
>> dma-specific pages maintaining LRU lists so that if we receive a DMA
>> memory allocation OOM, we can unbind bo:s bound to the same DMA
>> type. Standard graphics cards would then, for example, use the NONE
>> DMA type when run on bare metal or COHERENT when run on Xen. A
>> "COHERENT" OOM condition would then lead to eviction of another bo.
>> (Note that DMA eviction might involve data copies and be costly, but
>> still better than failing).
>> Binding with the DYNAMIC memory type would mean that CPU accesses
>> are disallowed, and that user-space CPU page mappings might need to
>> be killed, with a corresponding sync_for_cpu if they are faulted in
>> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
>> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
>>
>> In step 2) The bo is bound to the GPU in the same way it's done
>> today. Evicting from DMA will of course also trigger an evict from
>> GPU, but an evict from GPU will not trigger a DMA evict.
>>
>> Making a bo "anonymous" and thus moveable between devices would then
>> mean binding it to the "NONE" DMA type.
>>
>> Comments, suggestions?
>>  
> Well I think we need to solve outstanding issues in the dma_buf framework
> first. Currently dma_buf isn't really up to par to handle coherency
> between the cpu and devices and there's also not yet any way to handle dma
> address space fragmentation/exhaustion.
>
> I fear that if you jump ahead with improving the ttm support alone we
> might end up with something incompatible to the stuff dma_buf eventually
> will grow, resulting in decent amounts of wasted efforts.
>
> Cc'ed a bunch of relevant lists to foster input from people.
>

Daniel,

Thanks for your input. I think this is mostly orthogonal to dma_buf, and
really a way to make TTM aware of the DMA api. That's currently done
within the TTM backends. CMA was merely included as an example that
might not be relevant.

I haven't followed dma_buf that closely lately, but if it's growing from
being just a way to share buffer objects between devices into something
that also provides low-level allocators with fragmentation prevention,
there's definitely an overlap. However, at the dma_buf meeting in
Budapest there seemed to be little or no interest in robust buffer
allocation / fragmentation prevention, although I remember bringing it
up to the point where I felt annoying :).

> For a starter you seem to want much more low-level integration with the
> dma api than existing users commonly need. E.g. if I understand things
> correctly drivers just call dma_alloc_coherent and the platform/board code
> then decides whether the device needs a contiguous allocation from cma or
> whether something else is good, too (e.g. vmalloc for the cpu + iommu).
> Another thing is that I think doing lru eviction in case of dma address
> space exhaustion (or fragmentation) needs at least awareness of what's
> going on in the upper layers. iommus are commonly shared between devices
> and I presume that two ttm drivers sitting behind the same iommu and
> fighting over its resources can lead to some hilarious outcomes.
>

A good point, I didn't think of that.

For TTM drivers sharing the same IOMMU it's certainly possible to make
such an LRU global (assuming IOMMU identity is available to the
TTM-aware drivers), but unless fragmentation prevention the way we use
it for graphics drivers (allocate - submit - fence) ends up in the IOMMU
space management code, it's impossible to make this scheme system-wide.

> Cheers, Daniel
>

Thanks,

/Thomas


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2012/1/9 Daniel Vetter :
> On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
>> 2012/1/9 Daniel Vetter :
>> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> >> I have tested the dmabuf based drm gem module for exynos and I found one problem.
>> >> you can refer to this test repository:
>> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
>> >>
>> >> at this repository, I added some exception codes for resource release
>> >> in addition to Dave's patch sets.
>> >>
>> >> let's suppose we use dmabuf based vb2 and drm gem with physically
>> >> contiguous memory (no IOMMU) and we try to share the allocated buffer
>> >> between them(v4l2 and drm driver).
>> >>
>> >> 1. request memory allocation through drm gem interface.
>> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
>> >> gem object.
>> >> - internally, private gem based dmabuf module calls drm_buf_export()
>> >> to register allocated gem object to fd.
>> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
>> >> buffer to v4l2 based device.
>> >> - internally, vb2 plug in module gets a buffer to the fd and then
>> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
>> >> containing physical memory info to the gem object. and then the
>> >> physical memory info would be copied to vb2_xx_buf object.
>> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
>> >> this repository:
>> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
>> >>
>> >> after that, if the v4l2 driver wants to release the vb2_xx_buf object
>> >> and its allocated memory region at the user's request, how should we do
>> >> it? the refcount of vb2_xx_buf is internal to the videobuf2 framework,
>> >> so when the vb2_xx_buf object is released, videobuf2 doesn't know who
>> >> else is using the physical memory region. so the physical memory region
>> >> gets released, and when the drm driver then tries to access the region,
>> >> or to release it as well, a problem is induced.
>> >>
>> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
>> >> I'm not sure that this is good way. maybe there may be better way.
>> >> if there is any missing point, please let me know.
>> >
>> > The dma_buf object needs to hold a reference on the underlying
>> > (necessarily reference-counted) buffer object when the exporter creates
>> > the dma_buf handle. This reference should then get dropped in the
>> > exporters dma_buf->ops->release() function, which is only getting called
>> > when the last reference to the dma_buf disappears.
>> >
>>
>> when the exporter creates the dma_buf handle(for example, gem -> fd),
>> I think the refcount of gem object should be increased at this point,
>> and decreased by dma_buf->ops->release() again because when the
>> dma_buf is created and dma_buf_export() is called, this dma_buf refers
>> to the gem object one time. and in case of importer (fd -> gem),
>> file->f_count of the dma_buf is increased and then when this gem
>> object is released by user request such as drm close or
>> drm_gem_close_ioctl, dma_buf_put() should be called by
>> dma_buf->ops->detach() to decrease file->f_count again because the gem
>> object refers to the dma_buf. for this, you can refer to my test
>> repository I mentioned above. but the problem is that when a buffer is
>> released by one side, another can't know whether the buffer already
>> was released or not.
>
> Nope, dma_buf_put should not be called by ->detach. The importer gets his
> reference when importing the dma_buf and needs to drop that reference
> himself when it's done using the buffer by calling dma_buf_put (i.e. after
> the last ->detach call).

I'm afraid I may be missing something here; I'm confused. Who do you
think is the importer and who is the exporter? Is the importer the side
that turns an fd into a private buffer, and the exporter the side that
turns a private buffer into an fd? If so, then yes: when the importer
needs to drop its reference (i.e. the importer wants to release the
buffer), dma_buf_put() should be called somewhere, and in my case that
function is called by drm_prime_gem_destroy(). This function is included
in Dave's patch set, and dma_buf_detach() is also called there, so I
thought that was the right place. I didn't find dma_buf_put() called
anywhere else. Could you please tell me where you think dma_buf_put()
should be called?

for this, you can refer to Dave's repository:
http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

> I think adding separate reference counting to
> ->attach and ->detach is a waste of time and only papers over buggy
> importers.
>

I mean that when an fd is turned into a private buffer, that reference
(file->f_count) is increased by dma_buf_get(), and only ->detach is used
to drop that reference.

> Additionally the importer does _not_ control the lifetime of an dma_buf
> object and it's underlying backing storage. It hence may _never_ free the
> backing storage itself, that's the job of the exporter.
>

yes,

[patch 1/2] drm/radeon: use after free in radeon_vm_bo_add()

2012-01-09 Thread Dan Carpenter
"bo_va" is dereferenced in the error message.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
b/drivers/gpu/drm/radeon/radeon_gart.c
index 3ef58ca..2944c78 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -486,10 +486,10 @@ int radeon_vm_bo_add(struct radeon_device *rdev,
 	}
 	if (bo_va->soffset >= tmp->soffset && bo_va->soffset < tmp->eoffset) {
 		/* bo and tmp overlap, invalid offset */
-		kfree(bo_va);
 		dev_err(rdev->dev, "bo %p va 0x%08X conflict with (bo %p 0x%08X 0x%08X)\n",
 			bo, (unsigned)bo_va->soffset, tmp->bo,
 			(unsigned)tmp->soffset, (unsigned)tmp->eoffset);
+		kfree(bo_va);
 		mutex_unlock(&vm->mutex);
 		return -EINVAL;
 	}


[patch 2/2] drm/radeon: double lock typo in radeon_vm_bo_rmv()

2012-01-09 Thread Dan Carpenter
The second lock should be an unlock or it causes a deadlock.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
b/drivers/gpu/drm/radeon/radeon_gart.c
index 2944c78..4a9f797 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -595,7 +595,7 @@ int radeon_vm_bo_rmv(struct radeon_device *rdev,
 	radeon_vm_bo_update_pte(rdev, vm, bo, NULL);
 	radeon_mutex_unlock(&rdev->cs_mutex);
 	list_del(&bo_va->vm_list);
-	mutex_lock(&vm->mutex);
+	mutex_unlock(&vm->mutex);

 	kfree(bo_va);
 	return 0;


[patch 1/2] drm/radeon: use after free in radeon_vm_bo_add()

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 7:44 AM, Dan Carpenter  
wrote:
> "bo_va" is dereferenced in the error message.
>
> Signed-off-by: Dan Carpenter 

Reviewed-by: Alex Deucher 

>
> diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
> b/drivers/gpu/drm/radeon/radeon_gart.c
> index 3ef58ca..2944c78 100644
> --- a/drivers/gpu/drm/radeon/radeon_gart.c
> +++ b/drivers/gpu/drm/radeon/radeon_gart.c
> @@ -486,10 +486,10 @@ int radeon_vm_bo_add(struct radeon_device *rdev,
>                }
>                if (bo_va->soffset >= tmp->soffset && bo_va->soffset < tmp->eoffset) {
>                        /* bo and tmp overlap, invalid offset */
> -                      kfree(bo_va);
>                        dev_err(rdev->dev, "bo %p va 0x%08X conflict with (bo %p 0x%08X 0x%08X)\n",
>                                bo, (unsigned)bo_va->soffset, tmp->bo,
>                                (unsigned)tmp->soffset, (unsigned)tmp->eoffset);
> +                      kfree(bo_va);
>                        mutex_unlock(&vm->mutex);
>                        return -EINVAL;
>                }
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


[patch 2/2] drm/radeon: double lock typo in radeon_vm_bo_rmv()

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 7:45 AM, Dan Carpenter  
wrote:
> The second lock should be an unlock or it causes a deadlock.
>
> Signed-off-by: Dan Carpenter 

Reviewed-by: Alex Deucher 

>
> diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
> b/drivers/gpu/drm/radeon/radeon_gart.c
> index 2944c78..4a9f797 100644
> --- a/drivers/gpu/drm/radeon/radeon_gart.c
> +++ b/drivers/gpu/drm/radeon/radeon_gart.c
> @@ -595,7 +595,7 @@ int radeon_vm_bo_rmv(struct radeon_device *rdev,
>        radeon_vm_bo_update_pte(rdev, vm, bo, NULL);
>        radeon_mutex_unlock(&rdev->cs_mutex);
>        list_del(&bo_va->vm_list);
> -      mutex_lock(&vm->mutex);
> +      mutex_unlock(&vm->mutex);
>
>        kfree(bo_va);
>        return 0;
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Rob Clark
On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae  wrote:
> note : in case of sharing a buffer between v4l2 and drm driver, the
> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
> vb2_xx_buf through sg table. in this case, only memory info is used to
> share, not some objects.

which v4l2/vb2 patches are you looking at?  In the patches I was using,
vb2 holds a reference to the 'struct dma_buf *' internally, not just
the sg_table.

BR,
-R


TTM and AGP conflicts

2012-01-09 Thread Jerome Glisse
On Mon, Jan 09, 2012 at 10:07:02AM +0100, Thomas Hellstrom wrote:
> On 01/06/2012 04:51 PM, James Simmons wrote:
> You can achieve what you want by either adding a new domain so you would 
> have
> system, vram, agp, pcidma and object can be bound to one and only one. Or 
> you
> can hack your own agp ttm backend that could bind bo to agp or pci or
> both at the same time depending on what you want to achieve.
> >>>The question is how does one know which domain you want in tt_create.
> >>>Currenty drivers are using there dev_priv but if you have have more than
> >>>one option available how does one choose? Would you be okay with passing
> >>>in a domain flag?
> >>>
> >>Well i agree that something would be usefull there so the driver know
> >>which bind/unbind function it should use. Thomas i would prefer
> >>passing the bo to the tt_create callback but a flag is the minimum we
> >>need.
> >We can discuss this after the merge window. Jerome your patch does fix a
> >regression whereas my proposal is a enhancement.
> 
> ..Back from parental leave.
> 
> I'm not 100% sure I understand the problem correctly, but I assume
> the problem is that
> you receive a "bind" request for pages allocated out of the wrong
> DMA pool? Is that correct?
> 
> In that case we need to make both the following happen:
> 1) The backend should be required to supply a fallback that can bind
> anyway by allocating the correct page type and copy.
> 2) I'm OK with providing a hint to tt_create. IIRC we're passing
> bo::mem to tt_bind. Would it be ok to pass the same to tt_create?
> 
> /Thomas
> 

The issue here is that different backends (AGP, PCI DMA, or some other
communication path between the GPU and system memory) need to use
different page allocators/backends. Thus tt_create needs to know either
about the bo or at least about bo::mem.

Cheers,
Jerome


[PATCH] drm/radeon/kms: add support for streamout

2012-01-09 Thread Jerome Glisse
On Thu, Jan 05, 2012 at 10:21:31PM -0500, Alex Deucher wrote:
> On Thu, Jan 5, 2012 at 9:30 PM, Marek Olšák  wrote:
> > People can start using transform feedback on r6xx with this.
> >
> > Strict CS checking will be implemented later.

We really need to have the checking in the same patch, otherwise someone
can claim that the non-checking behaviour of what you introduce is part of
the API and thus we can't change it. I know it's painful.

Cheers,
Jerome

> >
> > Signed-off-by: Marek Olšák 
> > ---
> >  drivers/gpu/drm/radeon/evergreen_cs.c     |  104 +++--
> >  drivers/gpu/drm/radeon/evergreend.h       |   10 +++
> >  drivers/gpu/drm/radeon/r600_cs.c          |  105 +++--
> >  drivers/gpu/drm/radeon/r600d.h            |    6 ++
> >  drivers/gpu/drm/radeon/radeon_drv.c       |    3 +-
> >  drivers/gpu/drm/radeon/reg_srcs/cayman    |   10 +++
> >  drivers/gpu/drm/radeon/reg_srcs/evergreen |   10 +++
> >  drivers/gpu/drm/radeon/reg_srcs/r600      |   10 +++
> >  8 files changed, 246 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
> > index cd4590a..3150489 100644
> > --- a/drivers/gpu/drm/radeon/evergreen_cs.c
> > +++ b/drivers/gpu/drm/radeon/evergreen_cs.c
> > @@ -60,6 +60,10 @@ struct evergreen_cs_track {
> >         u32                     cb_shader_mask;
> >         u32                     vgt_strmout_config;
> >         u32                     vgt_strmout_buffer_config;
> > +       struct radeon_bo        *vgt_strmout_bo[4];
> > +       u64                     vgt_strmout_bo_mc[4];
> > +       u32                     vgt_strmout_bo_offset[4];
> > +       u32                     vgt_strmout_size[4];
> >         u32                     db_depth_control;
> >         u32                     db_depth_view;
> >         u32                     db_depth_size;
> > @@ -159,18 +163,19 @@ static void evergreen_cs_track_init(struct evergreen_cs_track *track)
> >         track->db_s_write_offset = 0x;
> >         track->db_s_read_bo = NULL;
> >         track->db_s_write_bo = NULL;
> > +
> > +       for (i = 0; i < 4; i++) {
> > +               track->vgt_strmout_size[i] = 0;
> > +               track->vgt_strmout_bo[i] = NULL;
> > +               track->vgt_strmout_bo_offset[i] = 0x;
> > +               track->vgt_strmout_bo_mc[i] = 0x;
> > +       }
> >  }
> >
> >  static int evergreen_cs_track_check(struct radeon_cs_parser *p)
> >  {
> >         struct evergreen_cs_track *track = p->track;
> >
> > -       /* we don't support stream out buffer yet */
> > -       if (track->vgt_strmout_config || track->vgt_strmout_buffer_config) {
> > -               dev_warn(p->dev, "this kernel doesn't support SMX output buffer\n");
> > -               return -EINVAL;
> > -       }
> > -
> >         /* XXX fill in */
> >         return 0;
> >  }
> > @@ -597,6 +602,37 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
> >         case VGT_STRMOUT_BUFFER_CONFIG:
> >                 track->vgt_strmout_buffer_config = radeon_get_ib_value(p, idx);
> >                 break;
> > +       case VGT_STRMOUT_BUFFER_BASE_0:
> > +       case VGT_STRMOUT_BUFFER_BASE_1:
> > +       case VGT_STRMOUT_BUFFER_BASE_2:
> > +       case VGT_STRMOUT_BUFFER_BASE_3:
> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
> > +               if (r) {
> > +                       dev_warn(p->dev, "bad SET_CONTEXT_REG "
> > +                                        "0x%04X\n", reg);
> > +                       return -EINVAL;
> > +               }
> > +               tmp = (reg - VGT_STRMOUT_BUFFER_BASE_0) / 16;
> > +               track->vgt_strmout_bo_offset[tmp] = radeon_get_ib_value(p, idx) << 8;
> > +               ib[idx] += (u32)((reloc->lobj.gpu_offset >> 8) & 0x);
> > +               track->vgt_strmout_bo[tmp] = reloc->robj;
> > +               track->vgt_strmout_bo_mc[tmp] = reloc->lobj.gpu_offset;
> > +               break;
> > +       case VGT_STRMOUT_BUFFER_SIZE_0:
> > +       case VGT_STRMOUT_BUFFER_SIZE_1:
> > +       case VGT_STRMOUT_BUFFER_SIZE_2:
> > +       case VGT_STRMOUT_BUFFER_SIZE_3:
> > +               tmp = (reg - VGT_STRMOUT_BUFFER_SIZE_0) / 16;
> > +               track->vgt_strmout_size[tmp] = radeon_get_ib_value(p, idx);
> > +               break;
> > +       case CP_COHER_BASE:
> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
> > +               if (r) {
> > +                       dev_warn(p->dev, "missing reloc for CP_COHER_BASE "
> > +                                        "0x%04X\n", reg);
> > +                       return -EINVAL;
> > +               }
> > +               ib[idx] += (u32)((reloc->lobj.gpu_offset >> 8) & 0x);
> >         case CB_TARGET_MASK:
> >                 track->cb_target_mask = radeon_get_ib_value(p, idx);
> >                 break;
> > @@ -1451,6 +14

[PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Jerome Glisse
On Mon, Jan 09, 2012 at 09:31:16AM +0100, Daniel Vetter wrote:
> On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
> > On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
> > > Hi all,
> > >
> > > Meh, I've wanted to port the small set of helpers nouveau already has to
> > > handle per open fd gpu virtual address spaces to core drm, so that I could
> > > reuse them for i915. Just to go one small step towards unifying drivers in
> > > drm/* a bit ...
> > >
> > > Looks like I'll have another driver to wrestle or just forget about it and
> > > reinvet that wheel for i915, too.
> > >
> > > 
> > >
> > > Cheers, Daniel
> > > --
> > 
> > I looked at nouveau before writing this code. Thing is, in the end
> > there is little common code, especially when you take different paths on
> > how you handle things (persistent or dynamic page tables, for instance).
> > Though a couple of things can still be shared. Note that the whole radeon
> > code is designed with the possibility of having several address spaces
> > per process; though there is no use for such a thing today, we believe
> > things like opencl+opengl can benefit from each having their own address
> > space.
> 
> - I've realized when looking through nouveau that we likely can't share
>   much more than a gem_bo->vma lookup plus a bunch of helper functions.
> 
> - Imo having more than one gpu virtual address space per fd doesn't make
>   much sense. libdrm (at least for i915) is mostly just about issuing
>   cmdbuffers. Hence it's probably easier to just open two fds and
>   instantiate two libdrm buffer managers if you want two address spaces
>   for otherwise you have to teach libdrm that the same buffer object still
>   can have different addresses (which is pretty much against the point of
>   gpu virtual address spaces).

Radeon barely uses libdrm (only the ddx uses part of it).

> 
> I also realize that in the dri1 days there's been way too much common code
> that only gets used by one or two drivers and hence isn't really commonly
> useable at all (and also not really of decent quality). So I'm all in
> favour for driver-specific stuff, especially for execution and memory
> management. But:
> 
> - nouveau already has gpu virtual address spaces, radeon just grew them
>   with this patch and i915 is on track to get them, too: Patches to enable
>   the different hw addressing mode for Sandybridge and later are ready,
>   and with Ivybridge hw engineers kinked out the remaining bugs so we can
>   actually context-switch between different address spaces without hitting
>   hw bugs.
> 
> - The more general picture is that with the advent of more general-purpose
>   apis and usecases for gpus like opencl (or also background video
>   encoding/decoding/transcoding with libva) users will want to control gpu
>   resources. So I expect that we'll grow resource limits, schedulers with
>   priorities and maybe also something like control groups in a few years.
>   But if we don't put a bit of thought into the commonalities of things
>   like gpu virtual address spaces, scheduling and similar things I fear we
>   won't be able to create a sensible common interface to allocate and
>   control resources in the future. Which will result in a sub-par
>   experience. 

I wish we could come up with a common API for different GPUs, but the
fact is that every kernel API we have exposed so far is specific to each
GPU, apart from modesetting. Anything that uses the processing power of
the GPU uses a dedicated API.

For instance, radeon virtual addresses give control to userspace, i.e.
userspace decides the virtual address at which each bo will be placed.
Nouveau, on the contrary, does the allocation in the kernel. We chose to
put userspace in charge for a few reasons (off the top of my head):
- it allows a 1:1 mapping between the cpu address space and the gpu
  without playing with the process vm in the kernel
- it allows easy command stream replay (no need to rewrite the command
  stream because virtual addresses are different)

Scheduling is the next big thing coming up, with the latest generations
of gpus growing the capacity to preempt gpu tasks and also offering finer
control over the gpu cores. Again, here I fear that we will each grow our
own API.

My belief is that as long as we expose a low-level API to use the GPU,
there is no way we can provide a common API for things like scheduling or
virtual address space. The way you submit commands is tightly related to
scheduling and to the virtual address stuff.

> But if my google-fu doesn't fail me gpu address spaces for radeon was
> posted the first time as v22 ever on a public list and merged right away,
> so there's been simply no time to discuss cross-driver issues.  Which is
> part of why I'm slightly miffed ;-)
> 

I too wish that we could have released this earlier. I guess it was
included because we still managed to get enough eyes familiar with radeon
to look at and play with this code.

Cheers,
Jerome


[PATCH] drm/radeon/kms: add support for streamout

2012-01-09 Thread Marek Olšák
You're both right. I wanted to put this patch out so that somebody
else could try this stuff out. I'll finish it when time allows.

Marek

On Mon, Jan 9, 2012 at 4:26 PM, Jerome Glisse  wrote:
> On Thu, Jan 05, 2012 at 10:21:31PM -0500, Alex Deucher wrote:
>> On Thu, Jan 5, 2012 at 9:30 PM, Marek Olšák  wrote:
>> > People can start using transform feedback on r6xx with this.
>> >
>> > Strict CS checking will be implemented later.
>
> We really need to have checking in the same patch, otherwise someone can
> claim that the non checking property of what you introduce is part of
> the API and thus we can't change it. I know it's painful.
>
> Cheers,
> Jerome
>
>> >
>> > Signed-off-by: Marek Olšák 
>> > ---
>> > ?drivers/gpu/drm/radeon/evergreen_cs.c ? ? | ?104 
>> > +++--
>> > ?drivers/gpu/drm/radeon/evergreend.h ? ? ? | ? 10 +++
>> > ?drivers/gpu/drm/radeon/r600_cs.c ? ? ? ? ?| ?105 
>> > +++--
>> > ?drivers/gpu/drm/radeon/r600d.h ? ? ? ? ? ?| ? ?6 ++
>> > ?drivers/gpu/drm/radeon/radeon_drv.c ? ? ? | ? ?3 +-
>> > ?drivers/gpu/drm/radeon/reg_srcs/cayman ? ?| ? 10 +++
>> > ?drivers/gpu/drm/radeon/reg_srcs/evergreen | ? 10 +++
>> > ?drivers/gpu/drm/radeon/reg_srcs/r600 ? ? ?| ? 10 +++
>> > ?8 files changed, 246 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
>> > index cd4590a..3150489 100644
>> > --- a/drivers/gpu/drm/radeon/evergreen_cs.c
>> > +++ b/drivers/gpu/drm/radeon/evergreen_cs.c
>> > @@ -60,6 +60,10 @@ struct evergreen_cs_track {
>> >         u32                     cb_shader_mask;
>> >         u32                     vgt_strmout_config;
>> >         u32                     vgt_strmout_buffer_config;
>> > +       struct radeon_bo        *vgt_strmout_bo[4];
>> > +       u64                     vgt_strmout_bo_mc[4];
>> > +       u32                     vgt_strmout_bo_offset[4];
>> > +       u32                     vgt_strmout_size[4];
>> >         u32                     db_depth_control;
>> >         u32                     db_depth_view;
>> >         u32                     db_depth_size;
>> > @@ -159,18 +163,19 @@ static void evergreen_cs_track_init(struct evergreen_cs_track *track)
>> >         track->db_s_write_offset = 0xffffffff;
>> >         track->db_s_read_bo = NULL;
>> >         track->db_s_write_bo = NULL;
>> > +
>> > +       for (i = 0; i < 4; i++) {
>> > +               track->vgt_strmout_size[i] = 0;
>> > +               track->vgt_strmout_bo[i] = NULL;
>> > +               track->vgt_strmout_bo_offset[i] = 0xffffffff;
>> > +               track->vgt_strmout_bo_mc[i] = 0xffffffff;
>> > +       }
>> >  }
>> >
>> >  static int evergreen_cs_track_check(struct radeon_cs_parser *p)
>> >  {
>> >         struct evergreen_cs_track *track = p->track;
>> >
>> > -       /* we don't support stream out buffer yet */
>> > -       if (track->vgt_strmout_config || track->vgt_strmout_buffer_config) {
>> > -               dev_warn(p->dev, "this kernel doesn't support SMX output buffer\n");
>> > -               return -EINVAL;
>> > -       }
>> > -
>> >         /* XXX fill in */
>> >         return 0;
>> >  }
>> > @@ -597,6 +602,37 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
>> >         case VGT_STRMOUT_BUFFER_CONFIG:
>> >                 track->vgt_strmout_buffer_config = radeon_get_ib_value(p, idx);
>> >                 break;
>> > +       case VGT_STRMOUT_BUFFER_BASE_0:
>> > +       case VGT_STRMOUT_BUFFER_BASE_1:
>> > +       case VGT_STRMOUT_BUFFER_BASE_2:
>> > +       case VGT_STRMOUT_BUFFER_BASE_3:
>> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
>> > +               if (r) {
>> > +                       dev_warn(p->dev, "bad SET_CONTEXT_REG "
>> > +                                       "0x%04X\n", reg);
>> > +                       return -EINVAL;
>> > +               }
>> > +               tmp = (reg - VGT_STRMOUT_BUFFER_BASE_0) / 16;
>> > +               track->vgt_strmout_bo_offset[tmp] = radeon_get_ib_value(p, idx) << 8;
>> > +               ib[idx] += (u32)((reloc->lobj.gpu_offset >> 8) & 0xffffffff);
>> > +               track->vgt_strmout_bo[tmp] = reloc->robj;
>> > +               track->vgt_strmout_bo_mc[tmp] = reloc->lobj.gpu_offset;
>> > +               break;
>> > +       case VGT_STRMOUT_BUFFER_SIZE_0:
>> > +       case VGT_STRMOUT_BUFFER_SIZE_1:
>> > +       case VGT_STRMOUT_BUFFER_SIZE_2:
>> > +       case VGT_STRMOUT_BUFFER_SIZE_3:
>> > +               tmp = (reg - VGT_STRMOUT_BUFFER_SIZE_0) / 16;
>> > +               track->vgt_strmout_size[tmp] = radeon_get_ib_value(p, idx);
>> > +               break;
>> > +       case CP_COHER_BASE:
>> > +               r = evergreen_cs_packet_next_reloc(p, &reloc);
>> > +               if (r) {
>> > +                       dev_warn(p->dev, "missing reloc for CP_COHER_BASE "
>> > +                                       "0x%04

[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 09:06:56PM +0900, InKi Dae wrote:
> 2012/1/9 Daniel Vetter :
> > On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
> >> 2012/1/9 Daniel Vetter :
> >> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> >> >> I have tested a dmabuf-based drm gem module for exynos and I found one
> >> >> problem. you can refer to this test repository:
> >> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> >> >>
> >> >> at this repository, I added some exception codes for resource release
> >> >> in addition to Dave's patch sets.
> >> >>
> >> >> let's suppose we use dmabuf-based vb2 and drm gem with physically
> >> >> contiguous memory (no IOMMU) and we try to share an allocated buffer
> >> >> between them (v4l2 and drm driver).
> >> >>
> >> >> 1. request memory allocation through the drm gem interface.
> >> >> 2. request the DRM_SET_PRIME ioctl with the gem handle to get a fd for
> >> >> the gem object.
> >> >> - internally, the private gem-based dmabuf module calls drm_buf_export()
> >> >> to register the allocated gem object with a fd.
> >> >> 3. request qbuf with the fd (got from 2) and DMABUF type to set the
> >> >> buffer for the v4l2-based device.
> >> >> - internally, the vb2 plug-in module gets a buffer for the fd and then
> >> >> calls the dmabuf->ops->map_dmabuf() callback to get the sg table
> >> >> containing the physical memory info for the gem object. the
> >> >> physical memory info is then copied to the vb2_xx_buf object.
> >> >> for the DMABUF feature for v4l2 and the videobuf2 framework, you can
> >> >> refer to this repository:
> >> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> >> >>
> >> >> after that, if the v4l2 driver wants to release the vb2_xx_buf object
> >> >> with its allocated memory region at user request, what should we do?
> >> >> the refcount of vb2_xx_buf is internal to the videobuf2 framework, so
> >> >> when the vb2_xx_buf object is released the videobuf2 framework doesn't
> >> >> know who else is using the physical memory region. so this physical
> >> >> memory region is released, and when the drm driver tries to access the
> >> >> region, or to release it as well, a problem is induced.
> >> >>
> >> >> for this problem, I added a get_shared_cnt() callback to dma-buf.h, but
> >> >> I'm not sure that this is a good way. maybe there is a better way.
> >> >> if there is any missing point, please let me know.
> >> >
> >> > The dma_buf object needs to hold a reference on the underlying
> >> > (necessarily reference-counted) buffer object when the exporter creates
> >> > the dma_buf handle. This reference should then get dropped in the
> >> > exporters dma_buf->ops->release() function, which is only getting called
> >> > when the last reference to the dma_buf disappears.
> >> >
> >>
> >> when the exporter creates the dma_buf handle (for example, gem -> fd),
> >> I think the refcount of the gem object should be increased at that point,
> >> and decreased by dma_buf->ops->release() again, because when the
> >> dma_buf is created and dma_buf_export() is called, this dma_buf refers
> >> to the gem object once. and in the case of the importer (fd -> gem),
> >> file->f_count of the dma_buf is increased, and then when this gem
> >> object is released by user request such as drm close or
> >> drm_gem_close_ioctl, dma_buf_put() should be called by
> >> dma_buf->ops->detach() to decrease file->f_count again, because the gem
> >> object refers to the dma_buf. for this, you can refer to my test
> >> repository mentioned above. but the problem is that when a buffer is
> >> released by one side, the other can't know whether the buffer has
> >> already been released or not.
> >
> > Nope, dma_buf_put should not be called by ->detach. The importer gets his
> > reference when importing the dma_buf and needs to drop that reference
> > himself when it's done using the buffer by calling dma_buf_put (i.e. after
> > the last ->detach call).
> 
> I'm afraid I may be missing something; I'm confused. who do you think is
> the importer and who is the exporter? the importer is fd-goes-to-private-
> buffer and the exporter is private-buffer-goes-to-fd? if so, yes, when the
> importer needs to drop that reference (the importer wants to release
> that buffer), dma_buf_put() should be called somewhere, and in my case
> that function is called by drm_prime_gem_destroy(). this function is
> included in Dave's patch sets, and dma_buf_detach() is also called
> there. and I just thought that was the right place. I didn't find the
> place where dma_buf_put() is called anywhere else. could you please tell
> me where you think dma_buf_put() should be called?
> 
> for this, you can refer to Dave's repository:
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

I haven't really looked at Dave's latest prime patches, but he reported
some reference counting issues last time around we chatted about it on
irc. So maybe you're just right and the dma_buf_put is indeed missing from
drm_prime_gem_destroy ;-) But as I've said, haven't really re

[PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 10:44 AM, Jerome Glisse  wrote:
> On Mon, Jan 09, 2012 at 09:31:16AM +0100, Daniel Vetter wrote:
>> On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
>> > On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
>> > > Hi all,
>> > >
>> > > Meh, I've wanted to port the small set of helpers nouveau already has to
>> > > handle per open fd gpu virtual address spaces to core drm, so that I 
>> > > could
>> > > reuse them for i915. Just to go one small step towards unifying drivers 
>> > > in
>> > > drm/* a bit ...
>> > >
>> > > Looks like I'll have another driver to wrestle or just forget about it 
>> > > and
>> > > reinvent that wheel for i915, too.
>> > >
>> > > 
>> > >
>> > > Cheers, Daniel
>> > > --
>> >
>> > I looked at nouveau before writing this code; the thing is, in the end
>> > there is little common code, especially when you take different paths in
>> > how you handle things (persistent or dynamic page tables, for instance).
>> > Though a couple of things can still be shared. Note that the whole radeon
>> > code is designed with the possibility of having several address spaces
>> > per process; though there is no use for such a thing today, we believe
>> > things like opencl+opengl can benefit from each having their own address
>> > space.
>>
>> - I've realized when looking through nouveau that we likely can't share
>>   much more than a gem_bo->vma lookup plus a bunch of helper functions.
>>
>> - Imo having more than one gpu virtual address space per fd doesn't make
>>   much sense. libdrm (at least for i915) is mostly just about issuing
>>   cmdbuffers. Hence it's probably easier to just open two fds and
>>   instantiate two libdrm buffer managers if you want two address spaces,
>>   for otherwise you have to teach libdrm that the same buffer object can
>>   still have different addresses (which is pretty much against the point
>>   of gpu virtual address spaces).
>
> Radeon barely uses libdrm (only the ddx uses part of it).
>
>>
>> I also realize that in the dri1 days there was way too much common code
>> that only gets used by one or two drivers and hence isn't really commonly
>> usable at all (and also not really of decent quality). So I'm all in
>> favour of driver-specific stuff, especially for execution and memory
>> management. But:
>>
>> - nouveau already has gpu virtual address spaces, radeon just grew them
>>   with this patch and i915 is on track to get them, too: Patches to enable
>>   the different hw addressing mode for Sandybridge and later are ready,
>>   and with Ivybridge hw engineers kinked out the remaining bugs so we can
>>   actually context-switch between different address spaces without hitting
>>   hw bugs.
>>
>> - The more general picture is that with the advent of more general-purpose
>>   apis and usecases for gpus like opencl (or also background video
>>   encoding/decoding/transcoding with libva) users will want to control gpu
>>   resources. So I expect that we'll grow resource limits, schedulers with
>>   priorities and maybe also something like control groups in a few years.
>>   But if we don't put a bit of thought into the commonalities of things
>>   like gpu virtual address spaces, scheduling and similar things, I fear we
>>   won't be able to create a sensible common interface to allocate and
>>   control resources in the future. Which will result in a sub-par
>>   experience.
>
> I wish we could come up with a common API for different GPUs, but the fact
> is all the kernel APIs we have exposed so far are specific to each GPU,
> modesetting aside. Anything that uses the processing power of the GPU uses
> a dedicated API.
>
> For instance, radeon virtual addressing gives control to userspace, ie
> userspace decides the virtual address at which each bo will be placed. On
> the contrary, nouveau does the allocation in the kernel. We chose to put
> userspace in charge for a few reasons (top of my head):
> - allow a 1:1 mapping between cpu address space and gpu without
>   playing with the process vm in the kernel
> - allow easy command stream replay (no need to rewrite the command
>   stream because virtual addresses are different)
>
> Scheduling is the next big thing coming up, with the latest gen of gpus
> growing the capacity to preempt gpu tasks and also offering finer control
> over the GPU cores. Again here I fear that we will each grow our own API.
>
> My belief is that as long as we expose a low level API to use the GPU
> there is no way we can provide a common API for things like scheduling
> or virtual address space. The way you submit commands is tightly related
> to the scheduling and virtual address stuff.
>
>> But if my google-fu doesn't fail me gpu address spaces for radeon was
>> posted the first time as v22 ever on a public list and merged right away,
>> so there's been simply no time to discuss cross-driver issues. Which is
>> part of why I'm slightly miffed ;-)
>>
>
> I too wish that we could have release this earlier. I guess it was included
> because we still

[ANNOUNCE] libdrm 2.4.30

2012-01-09 Thread Eric Anholt
On Sun, 08 Jan 2012 00:48:56 -0800, Jeremy Huddleston  
wrote:
> Well that's a ppc box, so maybe endianness issues... ?

Oh, that's probably it.  But I'd say it's more a bug that Intel libdrm
is being built on ppc.


[PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 10:44:15AM -0500, Jerome Glisse wrote:
> On Mon, Jan 09, 2012 at 09:31:16AM +0100, Daniel Vetter wrote:
> > On Sun, Jan 08, 2012 at 05:56:31PM -0500, Jerome Glisse wrote:
> > > On Sun, Jan 8, 2012 at 9:05 AM, Daniel Vetter  wrote:
> > > > Hi all,
> > > >
> > > > Meh, I've wanted to port the small set of helpers nouveau already has to
> > > > handle per open fd gpu virtual address spaces to core drm, so that I 
> > > > could
> > > > reuse them for i915. Just to go one small step towards unifying drivers 
> > > > in
> > > > drm/* a bit ...
> > > >
> > > > Looks like I'll have another driver to wrestle or just forget about it 
> > > > and
> > > > reinvent that wheel for i915, too.
> > > >
> > > > 
> > > >
> > > > Cheers, Daniel
> > > > --
> > > 
> > > I looked at nouveau before writing this code; the thing is, in the end
> > > there is little common code, especially when you take different paths in
> > > how you handle things (persistent or dynamic page tables, for instance).
> > > Though a couple of things can still be shared. Note that the whole radeon
> > > code is designed with the possibility of having several address spaces
> > > per process; though there is no use for such a thing today, we believe
> > > things like opencl+opengl can benefit from each having their own address
> > > space.
> > 
> > - I've realized when looking through nouveau that we likely can't share
> >   much more than a gem_bo->vma lookup plus a bunch of helper functions.
> > 
> > - Imo having more than one gpu virtual address space per fd doesn't make
> >   much sense. libdrm (at least for i915) is mostly just about issuing
> >   cmdbuffers. Hence it's probably easier to just open two fds and
> >   instantiate two libdrm buffer managers if you want two address spaces,
> >   for otherwise you have to teach libdrm that the same buffer object can
> >   still have different addresses (which is pretty much against the point
> >   of gpu virtual address spaces).
> 
> Radeon barely uses libdrm (only the ddx uses part of it).

Ok, that explains things a bit.

> > I also realize that in the dri1 days there was way too much common code
> > that only gets used by one or two drivers and hence isn't really commonly
> > usable at all (and also not really of decent quality). So I'm all in
> > favour of driver-specific stuff, especially for execution and memory
> > management. But:
> > 
> > - nouveau already has gpu virtual address spaces, radeon just grew them
> >   with this patch and i915 is on track to get them, too: Patches to enable
> >   the different hw addressing mode for Sandybridge and later are ready,
> >   and with Ivybridge hw engineers kinked out the remaining bugs so we can
> >   actually context-switch between different address spaces without hitting
> >   hw bugs.
> > 
> > - The more general picture is that with the advent of more general-purpose
> >   apis and usecases for gpus like opencl (or also background video
> >   encoding/decoding/transcoding with libva) users will want to control gpu
> >   resources. So I expect that we'll grow resource limits, schedulers with
> >   priorities and maybe also something like control groups in a few years.
> >   But if we don't put a bit of thought into the commonalities of things
> >   like gpu virtual address spaces, scheduling and similar things, I fear we
> >   won't be able to create a sensible common interface to allocate and
> >   control resources in the future. Which will result in a sub-par
> >   experience.
> 
> I wish we could come up with a common API for different GPUs, but the fact
> is all the kernel APIs we have exposed so far are specific to each GPU,
> modesetting aside. Anything that uses the processing power of the GPU uses
> a dedicated API.
> 
> For instance, radeon virtual addressing gives control to userspace, ie
> userspace decides the virtual address at which each bo will be placed. On
> the contrary, nouveau does the allocation in the kernel. We chose to put
> userspace in charge for a few reasons (top of my head):
> - allow a 1:1 mapping between cpu address space and gpu without
>   playing with the process vm in the kernel
> - allow easy command stream replay (no need to rewrite the command
>   stream because virtual addresses are different)

This stuff here is awesome and I've really missed it from the changelog. I
think the value in having a discussion across drivers isn't always in
coming up with a common interface, but sometimes just in learning one
another's design tradeoffs.

> Scheduling is the next big thing coming up, with the latest gen of gpus
> growing the capacity to preempt gpu tasks and also offering finer control
> over the GPU cores. Again here I fear that we will each grow our own API.
> 
> My belief is that as long as we expose a low level API to use the GPU
> there is no way we can provide a common API for things like scheduling
> or virtual address space. The way you submit commands is tightly related
> to the scheduling and virtual address stuff.

Well I think 

[PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 11:07:44AM -0500, Alex Deucher wrote:
> We would have liked to as well, but with the holidays and internal
> review process and people being on vacation, and the kernel merge
> window timing it didn't work out this time.  Still as Jerome said, our
> VM setup is tightly coupled with the way our HW works.  I'm not sure
> how much code could actually be shared.

Again to clarify, I'm not so much interested in sharing code here - I
know it'll be nigh on impossible, having looked through nouveau already
and quickly glossed through your stuff. I see the value just in having
a minimal set of common idioms that we can use to build cool stuff like
resource management.

Think of gem, which is actually about next to nothing. There's the flink
stuff and a common close ioctl, and recently also a dummy create unusable
for anything else than boot splash screens. Plus the shmem backing store,
which seems to be not so much in favour with embedded chips, i.e. doesn't
really count. Ridiculous compared to all the other stuff required to draw
triangles on a gpu, but good enough to make things like dri2 and wayland
possible.

Cheers, daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


TTM and AGP conflicts

2012-01-09 Thread Konrad Rzeszutek Wilk
On Mon, Jan 09, 2012 at 10:07:02AM +0100, Thomas Hellstrom wrote:
> On 01/06/2012 04:51 PM, James Simmons wrote:
> You can achieve what you want by either adding a new domain, so you would
> have system, vram, agp, pcidma and an object can be bound to one and only
> one; or you can hack your own agp ttm backend that could bind a bo to agp
> or pci or both at the same time, depending on what you want to achieve.
> >>>The question is how does one know which domain you want in tt_create.
> >>>Currently drivers are using their dev_priv, but if you have more than
> >>>one option available how does one choose? Would you be okay with passing
> >>>in a domain flag?
> >>>
> >>Well I agree that something would be useful there so the driver knows
> >>which bind/unbind function it should use. Thomas, I would prefer
> >>passing the bo to the tt_create callback, but a flag is the minimum we
> >>need.
> >We can discuss this after the merge window. Jerome, your patch does fix a
> >regression whereas my proposal is an enhancement.
> 
> ..Back from parental leave.

Congratulations!
> 
> I'm not 100% sure I understand the problem correctly, but I assume
> the problem is that you receive a "bind" request for pages allocated
> out of the wrong DMA pool? Is that correct?
> 
> In that case we need to make both of the following happen:
> 1) The backend should be required to supply a fallback that can bind
> anyway by allocating the correct page type and copying.
> 2) I'm OK with providing a hint to tt_create. IIRC we're passing
> bo::mem to tt_bind. Would it be OK to pass the same to tt_create?
> 
> /Thomas
> 
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Alex Deucher
On Mon, Jan 9, 2012 at 11:49 AM, Daniel Vetter  wrote:
> On Mon, Jan 09, 2012 at 11:07:44AM -0500, Alex Deucher wrote:
>> We would have liked to as well, but with the holidays and internal
>> review process and people being on vacation, and the kernel merge
>> window timing, it didn't work out this time. Still, as Jerome said, our
>> VM setup is tightly coupled with the way our HW works. I'm not sure
>> how much code could actually be shared.
>
> Again to clarify, I'm not so much interested in sharing code here - I
> know it'll be nigh on impossible, having looked through nouveau already
> and quickly glossed through your stuff. I see the value just in having
> a minimal set of common idioms that we can use to build cool stuff like
> resource management.
>
> Think of gem, which is actually about next to nothing. There's the flink
> stuff and a common close ioctl, and recently also a dummy create unusable
> for anything else than boot splash screens. Plus the shmem backing store,
> which seems to be not so much in favour with embedded chips, i.e. doesn't
> really count. Ridiculous compared to all the other stuff required to draw
> triangles on a gpu, but good enough to make things like dri2 and wayland
> possible.
>

I generally agree here, but I don't think this patch prevents
something like that in the future.  Also, I think we need to actually
use GPU VM for a bit and see how everyone's implementations and plans
shake out before we can decide which idioms are common.

Alex

> Cheers, daniel
> --
> Daniel Vetter
> Mail: daniel at ffwll.ch
> Mobile: +41 (0)79 365 57 48


[PATCH 1/3] drm/radeon: GPU virtual memory support v22

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 11:56:20AM -0500, Alex Deucher wrote:
> On Mon, Jan 9, 2012 at 11:49 AM, Daniel Vetter  wrote:
> > On Mon, Jan 09, 2012 at 11:07:44AM -0500, Alex Deucher wrote:
> >> We would have liked to as well, but with the holidays and internal
> >> review process and people being on vacation, and the kernel merge
> >> window timing, it didn't work out this time. Still, as Jerome said, our
> >> VM setup is tightly coupled with the way our HW works. I'm not sure
> >> how much code could actually be shared.
> >
> > Again to clarify, I'm not so much interested in sharing code here - I
> > know it'll be nigh on impossible, having looked through nouveau already
> > and quickly glossed through your stuff. I see the value just in having
> > a minimal set of common idioms that we can use to build cool stuff like
> > resource management.
> >
> > Think of gem, which is actually about next to nothing. There's the flink
> > stuff and a common close ioctl, and recently also a dummy create unusable
> > for anything else than boot splash screens. Plus the shmem backing store,
> > which seems to be not so much in favour with embedded chips, i.e. doesn't
> > really count. Ridiculous compared to all the other stuff required to draw
> > triangles on a gpu, but good enough to make things like dri2 and wayland
> > possible.
> >
> 
> I generally agree here, but I don't think this patch prevents
> something like that in the future.  Also, I think we need to actually
> use GPU VM for a bit and see how everyone's implementations and plans
> shake out before we can decide which idioms are common.

Agreed. My slight miff has only really been that this got posted only
recently and there was no chance to have this quick discussion before it
got merged.

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Rob Clark
On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae  wrote:
> 2012/1/10 Rob Clark :
>> On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae  wrote:
>>> note : in case of sharing a buffer between v4l2 and drm driver, the
>>> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
>>> vb2_xx_buf through sg table. in this case, only memory info is used to
>>> share, not some objects.
>>
>> which v4l2/vb2 patches are you looking at? The patches I was using,
>> vb2 holds a reference to the 'struct dma_buf *' internally, not just
>> keeping the sg_table
>>
>
> yes, not keeping the sg_table. I mean... see a example below please.
>
> static void vb2_dma_contig_map_dmabuf(void *mem_priv)
> {
>     struct sg_table *sg;
>     ...
>     sg = dma_buf_map_attachment(buf->db_attach, dir);
>     ...
>     buf->dma_addr = sg_dma_address(sg->sgl);
>     ...
> }
>
> at least with no IOMMU, the memory information (containing the physical
> memory address) would be copied to the vb2_xx_buf object if drm gem
> exported its own buffer and vb2 wants to use that buffer; at this point
> the sg table is used to share that buffer. and the problem I pointed out
> is that this buffer (also the physical memory region) could be released
> by the vb2 framework (as you know, the vb2_xx_buf object and the memory
> region that buf->dma_addr points to) but the exporter (drm gem) couldn't
> know that, so some problems would be induced once drm gem tries to
> release or access that buffer. and I have tried to resolve this issue by
> adding a get_shared_cnt() callback to dma-buf.h, but I'm not sure that
> this is a good way. maybe there is a better way.

the exporter (in this case your driver's drm/gem bits) shouldn't
release that mapping / sgtable until the importer (in this case v4l2)
calls dma_buf_unmap fxn..

It would be an error if the importer did a dma_buf_put() without first
calling dma_buf_unmap_attachment() (if currently mapped) and then
dma_buf_detach() (if currently attached).  Perhaps somewhere there
should be some sanity checking debug code which could be enabled to do
a WARN_ON() if the importer does the wrong thing.  It shouldn't really
be part of the API, I don't think, but it actually does seem like a
good thing, esp. as new drivers start trying to use dmabuf, to have
some debug options which could be enabled.

It is entirely possible that something was missed on the vb2 patches,
but the way it is intended to work is like this:
https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92

where it does a detach() before the dma_buf_put(), and the vb2-contig
backend checks here that it is also unmapped():
https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251

BR,
-R

> Thanks.
>
>> BR,
>> -R