from:"Ilija Hadzic"

Breakage in "track dev_mapping in more robust and flexible way"

2012-10-25 Thread Ilija Hadzic

On Thu, Oct 25, 2012 at 11:10 AM, Thomas Hellstr?m
 wrote:
> On 10/25/12 4:41 PM, Jerome Glisse wrote:
>>
>> On Thu, Oct 25, 2012 at 04:02:25PM +0200, Thomas Hellstrom wrote:
>>>
>>> Hi,
>>>
>>> This commit
>>>
>>>  From 949c4a34afacfe800fc442afac117aba15284962 Mon Sep 17 00:00:00 2001
>>> From: Ilija Hadzic 
>>> Date: Tue, 15 May 2012 16:40:10 -0400
>>> Subject: [PATCH] drm: track dev_mapping in more robust and flexible way
>>>
>>> Setting dev_mapping (pointer to the address_space structure
>>> used for memory mappings) to the address_space of the first
>>> opener's inode and then failing if other openers come in
>>> through a different inode has a few restrictions that are
>>> eliminated by this patch.
>>>
>>> If we already have valid dev_mapping and we spot an opener
>>> with different i_node, we force its i_mapping pointer to the
>>> already established address_space structure (first opener's
>>> inode). This will make all mappings from drm device hang off
>>> the same address_space object.
>>> ...
>>>
>>> Breaks drivers using TTM, since when the X server calls into the
>>> driver open, drm's dev_mapping has not
>>> yet been setup. The setup needs to be moved before the driver's open
>>> hook is called.
>>>
>>> Typically, if a TTM-aware driver is provoked by the Xorg server to
>>> move a buffer from system to VRAM or AGP,
>>> before any other drm client is started, The user-space page table
>>> entries are not killed before the move, and left pointing
>>> into freed pages, causing system crashes and / or user-space access
>>> to arbitrary memory.
>>
>> Doesn't handle move invalidate the drm file mapping before scheduling
>> the move ?
>
> Yes, but to do that it needs a correct value of bdev::dev_mapping, which is
> now incorrectly set on the
> *second* open instead of the first open.
>

So you are implying that in the first open the assignment of dev->dev_mapping is
somehow skipped (which could happen if drm_setup returns an error) or that the
driver on which you are having problems with (nouveau I presume) needs
dev_mapping
in the firstopen hook ?

It's been a while since I did it, but if my memory serves me well I
thought I explicitly
verified that dev_mapping was correctly set in the first open (though
GPUs I use are
primarily AMD).

-- Ilija


> /Thomas
>
>
>
>> Cheers,
>> Jerome
>
>
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

Breakage in "track dev_mapping in more robust and flexible way"

2012-10-25 Thread Ilija Hadzic


Or could the driver that causes the problem be vmwgfx ? I now looked at 
the code and I notice that indeed vmwgfx sets its private dev_mapping in 
the open hook, while all other TTM-based drivers (radeon and nouveau) do 
it when they create an object.

-- Ilija

Breakage in "track dev_mapping in more robust and flexible way"

2012-10-25 Thread Ilija Hadzic


Can you give the attached patch a whirl and let me know if it fixes the 
problem?

As I indicated in my previous note, vmwgfx should be the only affected 
driver because it looks at dev_mapping in the open hook (others do it when 
they create an object, so they are not affected).

I'll probably revise it (and I'll have some general questions about 
drm_open syscall) before officially send the patch, but I wanted to get 
something quickly to you to check if it fixes your problem. I hope that 
your vmwgfx test environment is such that you can reproduce the original
problem.

thanks,

-- Ilija
-- next part --

[PATCH] drm: set dev_mapping before calling drm_open_helper

2012-10-25 Thread Ilija Hadzic

Some drivers (specifically vmwgfx) look at dev_mapping
in their open hook, so we have to set dev->dev_mapping
earlier in the process.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_fops.c |   43 +--
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 7ef1b67..50b7b47 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp)
int minor_id = iminor(inode);
struct drm_minor *minor;
int retcode = 0;
+   int need_setup = 0;
+   struct address_space *old_mapping;

minor = idr_find(&drm_minors_idr, minor_id);
if (!minor)
@@ -132,23 +134,36 @@ int drm_open(struct inode *inode, struct file *filp)
if (drm_device_is_unplugged(dev))
return -ENODEV;

+   if (!dev->open_count++)
+   need_setup = 1;
+   mutex_lock(&dev->struct_mutex);
+   old_mapping = dev->dev_mapping;
+   if (old_mapping == NULL)
+   dev->dev_mapping = &inode->i_data;
+   mutex_unlock(&dev->struct_mutex);
+
retcode = drm_open_helper(inode, filp, dev);
-   if (!retcode) {
-   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
-   if (!dev->open_count++)
-   retcode = drm_setup(dev);
-   }
-   if (!retcode) {
-   mutex_lock(&dev->struct_mutex);
-   if (dev->dev_mapping == NULL)
-   dev->dev_mapping = &inode->i_data;
-   /* ihold ensures nobody can remove inode with our i_data */
-   ihold(container_of(dev->dev_mapping, struct inode, i_data));
-   inode->i_mapping = dev->dev_mapping;
-   filp->f_mapping = dev->dev_mapping;
-   mutex_unlock(&dev->struct_mutex);
+   if (retcode)
+   goto err_undo;
+   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
+   if (need_setup) {
+   retcode = drm_setup(dev);
+   if (retcode)
+   goto err_undo;
}
+   /* ihold ensures nobody can remove inode with our i_data */
+   mutex_lock(&dev->struct_mutex);
+   ihold(container_of(dev->dev_mapping, struct inode, i_data));
+   inode->i_mapping = dev->dev_mapping;
+   filp->f_mapping = dev->dev_mapping;
+   mutex_unlock(&dev->struct_mutex);
+   return 0;

+err_undo:
+   dev->open_count--;
+   mutex_lock(&dev->struct_mutex);
+   dev->dev_mapping = old_mapping;
+   mutex_unlock(&dev->struct_mutex);
return retcode;
 }
 EXPORT_SYMBOL(drm_open);
-- 
1.7.4.1

Breakage in "track dev_mapping in more robust and flexible way"

2012-10-26 Thread Ilija Hadzic

On Fri, 26 Oct 2012, Thomas Hellstrom wrote:

> Hi,
>
> On 10/25/2012 11:27 PM, Ilija Hadzic wrote:
>> 
>> Can you give the attached patch a whirl and let me know if it fixes the 
>> problem?
>> 
>> As I indicated in my previous note, vmwgfx should be the only affected 
>> driver because it looks at dev_mapping in the open hook (others do it when 
>> they create an object, so they are not affected).
>> 
>> I'll probably revise it (and I'll have some general questions about 
>> drm_open syscall) before officially send the patch, but I wanted to get 
>> something quickly to you to check if it fixes your problem. I hope that 
>> your vmwgfx test environment is such that you can reproduce the original
>> problem.
>> 
>> thanks,
>> 
>> -- Ilija
>
> Yes, it appears like this patch fixes the problem. It'd be good to have it in 
> 3.7 (drm-fixes) with a cc to stable.
>

OK great. Thanks for testing. Before I send out an "official" patch, I 
have a few questions for those who have been around longer and can 
possibly reflect better than me on the history of drm_open syscall.

Currently, before touching dev->dev_mapping field we grab dev->struct 
mutex. This has been introduced by Dave Airlie a long time ago in 
a2c0a97b784f837300f7b0869c82ab712c600952. I tried to preserve that in all 
patches where I touched dev_open, but looking at the code I don't think 
the mutex is necessary. Namely, drm_open is only set in drm_open, and 
concurrent openers are protected with drm_global_mutex. Other places 
(drivers) where dev->dev_mapping is accessed is read-only and dev_mapping 
is written at first open when there are no file descriptors around to 
issue any other call. Also, it doesn't look to me that any driver locks 
dev->struct_mutex before accessing dev->dev_mapping anyway. So I am 
thinking of dropping the mutex completely, but I would like to hear a 
second thought.

The other issue, I noticed is that of the drm_setup() call fails, the 
open_count counter would remain incremented and I think we need to restore 
it back (or if I am missing something, would someone please enlighten me). 
This was also in the kernel all this time (and I have not noticed until 
now), so I "smuggled" that fix in the patch that I sent you. However, 
wonder if I should cut the separate patch for open_count fix.

Actually, I think that I should cut three patches: one to drop the mutex, 
one to fix the open_count and one to fix your problem with dev_mapping and 
that probably all three should CC stable. Before I do that, I'd like to 
hear opinions of others.

thanks,

Ilija

[PATCH 1/2] drm: restore open_count if drm_setup fails

2012-10-29 Thread Ilija Hadzic

If drm_setup (called at first open) fails, the whole
open call has failed, so we should not keep the
open_count incremented.

Signed-off-by: Ilija Hadzic 
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/drm_fops.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 7ef1b67..af68eca 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -135,8 +135,11 @@ int drm_open(struct inode *inode, struct file *filp)
retcode = drm_open_helper(inode, filp, dev);
if (!retcode) {
atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
-   if (!dev->open_count++)
+   if (!dev->open_count++) {
retcode = drm_setup(dev);
+   if (retcode)
+   dev->open_count--;
+   }
}
if (!retcode) {
mutex_lock(&dev->struct_mutex);
-- 
1.7.12.4

[PATCH 2/2] drm: set dev_mapping before calling drm_open_helper

2012-10-29 Thread Ilija Hadzic

Some drivers (specifically vmwgfx) look at dev_mapping
in their open hook, so we have to set dev->dev_mapping
earlier in the process.

Reference:
http://lists.freedesktop.org/archives/dri-devel/2012-October/029420.html

Signed-off-by: Ilija Hadzic 
Reported-by: Thomas Hellstrom 
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/drm_fops.c | 47 +-
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index af68eca..133b413 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp)
int minor_id = iminor(inode);
struct drm_minor *minor;
int retcode = 0;
+   int need_setup = 0;
+   struct address_space *old_mapping;

minor = idr_find(&drm_minors_idr, minor_id);
if (!minor)
@@ -132,26 +134,37 @@ int drm_open(struct inode *inode, struct file *filp)
if (drm_device_is_unplugged(dev))
return -ENODEV;

+   if (!dev->open_count++)
+   need_setup = 1;
+   mutex_lock(&dev->struct_mutex);
+   old_mapping = dev->dev_mapping;
+   if (old_mapping == NULL)
+   dev->dev_mapping = &inode->i_data;
+   /* ihold ensures nobody can remove inode with our i_data */
+   ihold(container_of(dev->dev_mapping, struct inode, i_data));
+   inode->i_mapping = dev->dev_mapping;
+   filp->f_mapping = dev->dev_mapping;
+   mutex_unlock(&dev->struct_mutex);
+
retcode = drm_open_helper(inode, filp, dev);
-   if (!retcode) {
-   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
-   if (!dev->open_count++) {
-   retcode = drm_setup(dev);
-   if (retcode)
-   dev->open_count--;
-   }
-   }
-   if (!retcode) {
-   mutex_lock(&dev->struct_mutex);
-   if (dev->dev_mapping == NULL)
-   dev->dev_mapping = &inode->i_data;
-   /* ihold ensures nobody can remove inode with our i_data */
-   ihold(container_of(dev->dev_mapping, struct inode, i_data));
-   inode->i_mapping = dev->dev_mapping;
-   filp->f_mapping = dev->dev_mapping;
-   mutex_unlock(&dev->struct_mutex);
+   if (retcode)
+   goto err_undo;
+   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
+   if (need_setup) {
+   retcode = drm_setup(dev);
+   if (retcode)
+   goto err_undo;
}
+   return 0;

+err_undo:
+   mutex_lock(&dev->struct_mutex);
+   filp->f_mapping = old_mapping;
+   inode->i_mapping = old_mapping;
+   iput(container_of(dev->dev_mapping, struct inode, i_data));
+   dev->dev_mapping = old_mapping;
+   mutex_unlock(&dev->struct_mutex);
+   dev->open_count--;
return retcode;
 }
 EXPORT_SYMBOL(drm_open);
-- 
1.7.12.4

glxgears frame rate when DPMS is in "off" state

2012-12-10 Thread Ilija Hadzic



Hi everyone,

With relatively recent versions of AMD/ATI DDX (xf86-video-ati library), I 
have noticed a behavior related to DPMS that looks incorrect to me.


Namely, if I run glxgears, the reported frame rate is equal to that of the 
monitor refresh rate, which is correct. Now if I enter DPMS "off" state, 
wait a while, and then exit it (back to DPMS "on"), I see that while in 
"off" mode the frame rate was up in the thousands. Consequently, the CPU 
utilization went up to 100% (split about 50%/50% between X and and 
glxgears process).


I have traced the problem to DDX and here are some findings. Now, thinking 
about how to fix it (elaborated later), I realize that I am not sure what 
would be the conceptually correct thing to do to fix this.


Here is how the problem happens:

* Screen is put into DPMS "off" mode.

* The application requests the buffer-swap and X eventually ends up in 
radeon_dri2_schedule_swap.


* radeon_dri2_schedule_swap tries to determine the CRTC by calling 
radeon_dri2_drawable_crtc which further leads into radeon_pick_best_crtc


* In radeon_pick_best_crtc, no CRTC is found because CRTCs are either 
unused by the affected drawable or the only right candidate CRTC is 
skipped by this check (radeon_crtc_is_enabled looks explicitly at DPMS 
state):


if (!radeon_crtc_is_enabled(crtc))
continue;

* Consequently, radeon_pick_best_crtc returns -1 to 
radeon_dri2_schedule_swap, which decides that it can't do the vblank

wait and jumps to blit_fallback label.

* blit_fallback does its thing, achieving the effect of swap, but now 
there is no pacing. It returns immediatelly and application proceeds with 
rendering the next frame without any pause.


* As a consequence, we get a glxgears and X to run at maximum speed 
allowed by the CPU and GPU combined.


Now, the reason DPMS exists is to conserve power, but it doesn't make much 
sense to conserve power through monitor shutoff if we will eat up much 
more power by thrashing the processor and the GPU.


One quick fix that came into my mind is to replace the 'if' in 
radeon_pick_best_crtc with something like this:


if (!crtc->enabled)
continue;

(whether by design or by accident, crtc in DPMS "off" state is still 
enabled as far as that flag is concerned). However, that will introduce 
the regression with regard to this bug:


https://bugs.freedesktop.org/show_bug.cgi?id=49761

(which is the reason the above check was originally added).

Another possibility would be to enforce some maximum rate per-drawable 
(using sleep for example) when radeon_dri2_schedule_swap decides to take 
the blit_fallback path. However, I don't personally like it and I have a 
gut feeling that sleeping in shedule_swap would probably do harm somewhere 
else. Also, there may be applications that would want to render an 
animated scene off-screen at maximum speed (e.g., off-line rendering) and

radeon_dri2_schedule_swap has no way of telling whether crtc is -1 because
the application wants it that way or because the associated crtc is in
power-savings mode.

Clearly, the behavior that we have now is wrong from the power-savings 
perspective (i.e., it completely defeats the purpose of DPMS), but before 
I try to hack up the fix, I would like to hear some suggestions on what 
the "right" thing to do would be.


thanks,

Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

radeon CS parser refactoring

2013-01-02 Thread Ilija Hadzic

The following set of patches refactor the CS-parser logic
in an effort to consolidate the code that is repeated across
ASIC-specific files. All patches except #8 are function-neutral,
that is they preserve the existing functionality of the driver.
Patch #8 adds one extra check for WAIT_REG_MEM packet that I
believe should be there.

I have been running with these patches for about a month on several
machines with Evergreen and NI GPUs without noticing any regressions.
I also ran quick tests on all Radeon GPUs that I had around, which
range from R300 through NI cards.

The patches have been rebased to the current state of Dave's drm-next
branch.

regards,

Ilija



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 01/12] drm/radeon: remove unecessary assignment

2013-01-02 Thread Ilija Hadzic

length_dw field was assigned twice. While at it, move user_ptr
assignment together with all other assignments to p->chunks[i]
structure to make the code more readable.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_cs.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 396baba0..4be64e0 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -203,7 +203,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void 
*data)
p->chunks[i].length_dw = user_chunk.length_dw;
p->chunks[i].kdata = NULL;
p->chunks[i].chunk_id = user_chunk.chunk_id;
-
+   p->chunks[i].user_ptr = (void __user *)(unsigned 
long)user_chunk.chunk_data;
if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) {
p->chunk_relocs_idx = i;
}
@@ -226,9 +226,6 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void 
*data)
return -EINVAL;
}
 
-   p->chunks[i].length_dw = user_chunk.length_dw;
-   p->chunks[i].user_ptr = (void __user *)(unsigned 
long)user_chunk.chunk_data;
-
cdata = (uint32_t *)(unsigned long)user_chunk.chunk_data;
if ((p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) ||
(p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS)) {
-- 
1.7.12

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 02/12] drm/radeon: remove unused prototype from radeon_cs

2013-01-02 Thread Ilija Hadzic

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_cs.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 4be64e0..8b03f1c 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -29,9 +29,6 @@
 #include "radeon_reg.h"
 #include "radeon.h"
 
-void r100_cs_dump_packet(struct radeon_cs_parser *p,
-struct radeon_cs_packet *pkt);
-
 static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 {
struct drm_device *ddev = p->rdev->ddev;
-- 
1.7.12

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 03/12] drm/radeon: fix formatting

2013-01-02 Thread Ilija Hadzic

Preparatory patch: patches to follow will touch a piece of code
that had broken indentication, so fix it before touching it.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r100.c | 45 +--
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 8ff7cac..40c0318 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -2106,31 +2106,30 @@ int r100_cs_parse(struct radeon_cs_parser *p)
}
p->idx += pkt.count + 2;
switch (pkt.type) {
-   case PACKET_TYPE0:
-   if (p->rdev->family >= CHIP_R200)
-   r = r100_cs_parse_packet0(p, &pkt,
- 
p->rdev->config.r100.reg_safe_bm,
- 
p->rdev->config.r100.reg_safe_bm_size,
- 
&r200_packet0_check);
-   else
-   r = r100_cs_parse_packet0(p, &pkt,
- 
p->rdev->config.r100.reg_safe_bm,
- 
p->rdev->config.r100.reg_safe_bm_size,
- 
&r100_packet0_check);
-   break;
-   case PACKET_TYPE2:
-   break;
-   case PACKET_TYPE3:
-   r = r100_packet3_check(p, &pkt);
-   break;
-   default:
-   DRM_ERROR("Unknown packet type %d !\n",
- pkt.type);
-   return -EINVAL;
+   case PACKET_TYPE0:
+   if (p->rdev->family >= CHIP_R200)
+   r = r100_cs_parse_packet0(p, &pkt,
+   p->rdev->config.r100.reg_safe_bm,
+   p->rdev->config.r100.reg_safe_bm_size,
+   &r200_packet0_check);
+   else
+   r = r100_cs_parse_packet0(p, &pkt,
+   p->rdev->config.r100.reg_safe_bm,
+   p->rdev->config.r100.reg_safe_bm_size,
+   &r100_packet0_check);
+   break;
+   case PACKET_TYPE2:
+   break;
+   case PACKET_TYPE3:
+   r = r100_packet3_check(p, &pkt);
+   break;
+   default:
+   DRM_ERROR("Unknown packet type %d !\n",
+ pkt.type);
+   return -EINVAL;
}
-   if (r) {
+   if (r)
return r;
-   }
} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
return 0;
 }
-- 
1.7.12

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 04/12] drm/radeon: implement common cs packet parse function

2013-01-02 Thread Ilija Hadzic

CS packet parse functions have a lot of in common across
all ASICs. Implement a common function and take care of
small differences between families inside the function.

This patch is a prep for major refactoring and consolidation
of CS parsing code.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_cs.c  | 53 +
 drivers/gpu/drm/radeon/radeon_reg.h | 11 
 2 files changed, 64 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 8b03f1c..95eb673 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -639,3 +639,56 @@ u32 radeon_get_ib_value(struct radeon_cs_parser *p, int 
idx)
idx_value = ibc->kpage[new_page][pg_offset/4];
return idx_value;
 }
+
+/**
+ * radeon_cs_packet_parse() - parse cp packet and point ib index to next packet
+ * @parser:parser structure holding parsing context.
+ * @pkt:   where to store packet information
+ *
+ * Assume that chunk_ib_index is properly set. Will return -EINVAL
+ * if packet is bigger than remaining ib size. or if packets is unknown.
+ **/
+int radeon_cs_packet_parse(struct radeon_cs_parser *p,
+  struct radeon_cs_packet *pkt,
+  unsigned idx)
+{
+   struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx];
+   struct radeon_device *rdev = p->rdev;
+   uint32_t header;
+
+   if (idx >= ib_chunk->length_dw) {
+   DRM_ERROR("Can not parse packet at %d after CS end %d !\n",
+ idx, ib_chunk->length_dw);
+   return -EINVAL;
+   }
+   header = radeon_get_ib_value(p, idx);
+   pkt->idx = idx;
+   pkt->type = RADEON_CP_PACKET_GET_TYPE(header);
+   pkt->count = RADEON_CP_PACKET_GET_COUNT(header);
+   pkt->one_reg_wr = 0;
+   switch (pkt->type) {
+   case RADEON_PACKET_TYPE0:
+   if (rdev->family < CHIP_R600) {
+   pkt->reg = R100_CP_PACKET0_GET_REG(header);
+   pkt->one_reg_wr =
+   RADEON_CP_PACKET0_GET_ONE_REG_WR(header);
+   } else
+   pkt->reg = R600_CP_PACKET0_GET_REG(header);
+   break;
+   case RADEON_PACKET_TYPE3:
+   pkt->opcode = RADEON_CP_PACKET3_GET_OPCODE(header);
+   break;
+   case RADEON_PACKET_TYPE2:
+   pkt->count = -1;
+   break;
+   default:
+   DRM_ERROR("Unknown packet type %d at %d !\n", pkt->type, idx);
+   return -EINVAL;
+   }
+   if ((pkt->count + 1 + pkt->idx) >= ib_chunk->length_dw) {
+   DRM_ERROR("Packet (%d:%d:%d) end after CS buffer (%d) !\n",
+ pkt->idx, pkt->type, pkt->count, ib_chunk->length_dw);
+   return -EINVAL;
+   }
+   return 0;
+}
diff --git a/drivers/gpu/drm/radeon/radeon_reg.h 
b/drivers/gpu/drm/radeon/radeon_reg.h
index 5d8f735..d9e4204 100644
--- a/drivers/gpu/drm/radeon/radeon_reg.h
+++ b/drivers/gpu/drm/radeon/radeon_reg.h
@@ -3706,4 +3706,15 @@
 
 #define RV530_GB_PIPE_SELECT2   0x4124
 
+#define RADEON_CP_PACKET_GET_TYPE(h) (((h) >> 30) & 3)
+#define RADEON_CP_PACKET_GET_COUNT(h) (((h) >> 16) & 0x3FFF)
+#define RADEON_CP_PACKET0_GET_ONE_REG_WR(h) (((h) >> 15) & 1)
+#define RADEON_CP_PACKET3_GET_OPCODE(h) (((h) >> 8) & 0xFF)
+#define R100_CP_PACKET0_GET_REG(h) (((h) & 0x1FFF) << 2)
+#define R600_CP_PACKET0_GET_REG(h) (((h) & 0x) << 2)
+#define RADEON_PACKET_TYPE0 0
+#define RADEON_PACKET_TYPE1 1
+#define RADEON_PACKET_TYPE2 2
+#define RADEON_PACKET_TYPE3 3
+
 #endif
-- 
1.7.12

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 05/12] drm/radeon: use common cs packet parse function

2013-01-02 Thread Ilija Hadzic

We now have a common radeon_cs_packet_parse function
that is good for all ASICs. Hook it up and eliminate
ASIC-specific versions.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 57 +++--
 drivers/gpu/drm/radeon/r100.c | 55 +++-
 drivers/gpu/drm/radeon/r300.c |  2 +-
 drivers/gpu/drm/radeon/r600_cs.c  | 59 ---
 drivers/gpu/drm/radeon/radeon.h   |  4 +++
 5 files changed, 20 insertions(+), 157 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 7a44566..1ba4ca3 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -1009,53 +1009,6 @@ static int evergreen_cs_track_check(struct 
radeon_cs_parser *p)
 }
 
 /**
- * evergreen_cs_packet_parse() - parse cp packet and point ib index to next 
packet
- * @parser:parser structure holding parsing context.
- * @pkt:   where to store packet informations
- *
- * Assume that chunk_ib_index is properly set. Will return -EINVAL
- * if packet is bigger than remaining ib size. or if packets is unknown.
- **/
-static int evergreen_cs_packet_parse(struct radeon_cs_parser *p,
- struct radeon_cs_packet *pkt,
- unsigned idx)
-{
-   struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx];
-   uint32_t header;
-
-   if (idx >= ib_chunk->length_dw) {
-   DRM_ERROR("Can not parse packet at %d after CS end %d !\n",
- idx, ib_chunk->length_dw);
-   return -EINVAL;
-   }
-   header = radeon_get_ib_value(p, idx);
-   pkt->idx = idx;
-   pkt->type = CP_PACKET_GET_TYPE(header);
-   pkt->count = CP_PACKET_GET_COUNT(header);
-   pkt->one_reg_wr = 0;
-   switch (pkt->type) {
-   case PACKET_TYPE0:
-   pkt->reg = CP_PACKET0_GET_REG(header);
-   break;
-   case PACKET_TYPE3:
-   pkt->opcode = CP_PACKET3_GET_OPCODE(header);
-   break;
-   case PACKET_TYPE2:
-   pkt->count = -1;
-   break;
-   default:
-   DRM_ERROR("Unknown packet type %d at %d !\n", pkt->type, idx);
-   return -EINVAL;
-   }
-   if ((pkt->count + 1 + pkt->idx) >= ib_chunk->length_dw) {
-   DRM_ERROR("Packet (%d:%d:%d) end after CS buffer (%d) !\n",
- pkt->idx, pkt->type, pkt->count, ib_chunk->length_dw);
-   return -EINVAL;
-   }
-   return 0;
-}
-
-/**
  * evergreen_cs_packet_next_reloc() - parse next packet which should be reloc 
packet3
  * @parser:parser structure holding parsing context.
  * @data:  pointer to relocation data
@@ -1080,7 +1033,7 @@ static int evergreen_cs_packet_next_reloc(struct 
radeon_cs_parser *p,
}
*cs_reloc = NULL;
relocs_chunk = &p->chunks[p->chunk_relocs_idx];
-   r = evergreen_cs_packet_parse(p, &p3reloc, p->idx);
+   r = radeon_cs_packet_parse(p, &p3reloc, p->idx);
if (r) {
return r;
}
@@ -1112,7 +1065,7 @@ static bool evergreen_cs_packet_next_is_pkt3_nop(struct 
radeon_cs_parser *p)
struct radeon_cs_packet p3reloc;
int r;
 
-   r = evergreen_cs_packet_parse(p, &p3reloc, p->idx);
+   r = radeon_cs_packet_parse(p, &p3reloc, p->idx);
if (r) {
return false;
}
@@ -1150,7 +1103,7 @@ static int evergreen_cs_packet_parse_vline(struct 
radeon_cs_parser *p)
ib = p->ib.ptr;
 
/* parse the WAIT_REG_MEM */
-   r = evergreen_cs_packet_parse(p, &wait_reg_mem, p->idx);
+   r = radeon_cs_packet_parse(p, &wait_reg_mem, p->idx);
if (r)
return r;
 
@@ -1183,7 +1136,7 @@ static int evergreen_cs_packet_parse_vline(struct 
radeon_cs_parser *p)
}
 
/* jump over the NOP */
-   r = evergreen_cs_packet_parse(p, &p3reloc, p->idx + wait_reg_mem.count 
+ 2);
+   r = radeon_cs_packet_parse(p, &p3reloc, p->idx + wait_reg_mem.count + 
2);
if (r)
return r;
 
@@ -2819,7 +2772,7 @@ int evergreen_cs_parse(struct radeon_cs_parser *p)
p->track = track;
}
do {
-   r = evergreen_cs_packet_parse(p, &pkt, p->idx);
+   r = radeon_cs_packet_parse(p, &pkt, p->idx);
if (r) {
kfree(p->track);
p->track = NULL;
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 40c0318..7842447 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -1370,53 +1370,6 @@ void r100_cs_dump_packet(struct

[PATCH 06/12] drm/radeon: factor out cs_next_is_pkt3_nop function

2013-01-02 Thread Ilija Hadzic

Once we factored out radeon_cs_packet_parse function,
evergreen_cs_next_is_pkt3_nop and r600_cs_next_is_pkt3_nop
functions became identical, so they can be factored out
into a common function.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 23 +--
 drivers/gpu/drm/radeon/r600_cs.c  | 30 --
 drivers/gpu/drm/radeon/radeon.h   |  2 ++
 drivers/gpu/drm/radeon/radeon_cs.c| 21 +
 drivers/gpu/drm/radeon/radeon_reg.h   |  2 ++
 5 files changed, 30 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 1ba4ca3..883b9f7 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -1055,27 +1055,6 @@ static int evergreen_cs_packet_next_reloc(struct 
radeon_cs_parser *p,
 }
 
 /**
- * evergreen_cs_packet_next_is_pkt3_nop() - test if the next packet is NOP
- * @p: structure holding the parser context.
- *
- * Check if the next packet is a relocation packet3.
- **/
-static bool evergreen_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p)
-{
-   struct radeon_cs_packet p3reloc;
-   int r;
-
-   r = radeon_cs_packet_parse(p, &p3reloc, p->idx);
-   if (r) {
-   return false;
-   }
-   if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) {
-   return false;
-   }
-   return true;
-}
-
-/**
  * evergreen_cs_packet_next_vline() - parse userspace VLINE packet
  * @parser:parser structure holding parsing context.
  *
@@ -2464,7 +2443,7 @@ static int evergreen_packet3_check(struct 
radeon_cs_parser *p,
 
if ((tex_dim == SQ_TEX_DIM_2D_MSAA || tex_dim 
== SQ_TEX_DIM_2D_ARRAY_MSAA) &&
!mip_address &&
-   !evergreen_cs_packet_next_is_pkt3_nop(p)) {
+   !radeon_cs_packet_next_is_pkt3_nop(p)) {
/* MIP_ADDRESS should point to FMASK 
for an MSAA texture.
 * It should be 0 if FMASK is disabled. 
*/
moffset = 0;
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 1717fec..e2fb76a 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -877,28 +877,6 @@ static int r600_cs_packet_next_reloc_nomm(struct 
radeon_cs_parser *p,
 }
 
 /**
- * r600_cs_packet_next_is_pkt3_nop() - test if next packet is packet3 nop for 
reloc
- * @parser:parser structure holding parsing context.
- *
- * Check next packet is relocation packet3, do bo validation and compute
- * GPU offset using the provided start.
- **/
-static int r600_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p)
-{
-   struct radeon_cs_packet p3reloc;
-   int r;
-
-   r = radeon_cs_packet_parse(p, &p3reloc, p->idx);
-   if (r) {
-   return 0;
-   }
-   if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) {
-   return 0;
-   }
-   return 1;
-}
-
-/**
  * r600_cs_packet_next_vline() - parse userspace VLINE packet
  * @parser:parser structure holding parsing context.
  *
@@ -1108,7 +1086,7 @@ static int r600_cs_check_reg(struct radeon_cs_parser *p, 
u32 reg, u32 idx)
break;
case R_028010_DB_DEPTH_INFO:
if (!(p->cs_flags & RADEON_CS_KEEP_TILING_FLAGS) &&
-   r600_cs_packet_next_is_pkt3_nop(p)) {
+   radeon_cs_packet_next_is_pkt3_nop(p)) {
r = r600_cs_packet_next_reloc(p, &reloc);
if (r) {
dev_warn(p->dev, "bad SET_CONTEXT_REG "
@@ -1209,7 +1187,7 @@ static int r600_cs_check_reg(struct radeon_cs_parser *p, 
u32 reg, u32 idx)
case R_0280B8_CB_COLOR6_INFO:
case R_0280BC_CB_COLOR7_INFO:
if (!(p->cs_flags & RADEON_CS_KEEP_TILING_FLAGS) &&
-r600_cs_packet_next_is_pkt3_nop(p)) {
+radeon_cs_packet_next_is_pkt3_nop(p)) {
r = r600_cs_packet_next_reloc(p, &reloc);
if (r) {
dev_err(p->dev, "bad SET_CONTEXT_REG 0x%04X\n", 
reg);
@@ -1273,7 +1251,7 @@ static int r600_cs_check_reg(struct radeon_cs_parser *p, 
u32 reg, u32 idx)
case R_0280F8_CB_COLOR6_FRAG:
case R_0280FC_CB_COLOR7_FRAG:
tmp = (reg - R_0280E0_CB_COLOR0_FRAG) / 4;
-   if (!r600_cs_packet_next_is_pkt3_nop(p)) {
+   if (!radeon_cs_packet_next_is_pkt3_nop(p)) {
if (!track->cb_color_base_last[tmp]) {
dev_err(

[PATCH 07/12] drm/radeon: refactor vline packet parsing function

2013-01-02 Thread Ilija Hadzic

vline packet parsing function for R600 and Evergreen+ are
the same, except that they use different registers. Factor
out the algorithm into a common function that uses register
table passed from ASIC-specific caller.

This reduces ASIC-specific function to (trivial) setup
of register table and call into the common function.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 120 +++---
 drivers/gpu/drm/radeon/r600_cs.c  |  61 +++--
 drivers/gpu/drm/radeon/radeon.h   |   4 +-
 drivers/gpu/drm/radeon/radeon_reg.h   |   2 +
 4 files changed, 70 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 883b9f7..2690532 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -1055,109 +1055,35 @@ static int evergreen_cs_packet_next_reloc(struct 
radeon_cs_parser *p,
 }
 
 /**
- * evergreen_cs_packet_next_vline() - parse userspace VLINE packet
+ * evergreen_cs_packet_parse_vline() - parse userspace VLINE packet
  * @parser:parser structure holding parsing context.
  *
- * Userspace sends a special sequence for VLINE waits.
- * PACKET0 - VLINE_START_END + value
- * PACKET3 - WAIT_REG_MEM poll vline status reg
- * RELOC (P3) - crtc_id in reloc.
- *
- * This function parses this and relocates the VLINE START END
- * and WAIT_REG_MEM packets to the correct crtc.
- * It also detects a switched off crtc and nulls out the
- * wait in that case.
+ * This is an Evergreen(+)-specific function for parsing VLINE packets.
+ * Real work is done by r600_cs_common_vline_parse function.
+ * Here we just set up ASIC-specific register table and call
+ * the common implementation function.
  */
 static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p)
 {
-   struct drm_mode_object *obj;
-   struct drm_crtc *crtc;
-   struct radeon_crtc *radeon_crtc;
-   struct radeon_cs_packet p3reloc, wait_reg_mem;
-   int crtc_id;
-   int r;
-   uint32_t header, h_idx, reg, wait_reg_mem_info;
-   volatile uint32_t *ib;
-
-   ib = p->ib.ptr;
-
-   /* parse the WAIT_REG_MEM */
-   r = radeon_cs_packet_parse(p, &wait_reg_mem, p->idx);
-   if (r)
-   return r;
-
-   /* check its a WAIT_REG_MEM */
-   if (wait_reg_mem.type != PACKET_TYPE3 ||
-   wait_reg_mem.opcode != PACKET3_WAIT_REG_MEM) {
-   DRM_ERROR("vline wait missing WAIT_REG_MEM segment\n");
-   return -EINVAL;
-   }
-
-   wait_reg_mem_info = radeon_get_ib_value(p, wait_reg_mem.idx + 1);
-   /* bit 4 is reg (0) or mem (1) */
-   if (wait_reg_mem_info & 0x10) {
-   DRM_ERROR("vline WAIT_REG_MEM waiting on MEM rather than 
REG\n");
-   return -EINVAL;
-   }
-   /* waiting for value to be equal */
-   if ((wait_reg_mem_info & 0x7) != 0x3) {
-   DRM_ERROR("vline WAIT_REG_MEM function not equal\n");
-   return -EINVAL;
-   }
-   if ((radeon_get_ib_value(p, wait_reg_mem.idx + 2) << 2) != 
EVERGREEN_VLINE_STATUS) {
-   DRM_ERROR("vline WAIT_REG_MEM bad reg\n");
-   return -EINVAL;
-   }
 
-   if (radeon_get_ib_value(p, wait_reg_mem.idx + 5) != 
EVERGREEN_VLINE_STAT) {
-   DRM_ERROR("vline WAIT_REG_MEM bad bit mask\n");
-   return -EINVAL;
-   }
-
-   /* jump over the NOP */
-   r = radeon_cs_packet_parse(p, &p3reloc, p->idx + wait_reg_mem.count + 
2);
-   if (r)
-   return r;
-
-   h_idx = p->idx - 2;
-   p->idx += wait_reg_mem.count + 2;
-   p->idx += p3reloc.count + 2;
-
-   header = radeon_get_ib_value(p, h_idx);
-   crtc_id = radeon_get_ib_value(p, h_idx + 2 + 7 + 1);
-   reg = CP_PACKET0_GET_REG(header);
-   obj = drm_mode_object_find(p->rdev->ddev, crtc_id, 
DRM_MODE_OBJECT_CRTC);
-   if (!obj) {
-   DRM_ERROR("cannot find crtc %d\n", crtc_id);
-   return -EINVAL;
-   }
-   crtc = obj_to_crtc(obj);
-   radeon_crtc = to_radeon_crtc(crtc);
-   crtc_id = radeon_crtc->crtc_id;
-
-   if (!crtc->enabled) {
-   /* if the CRTC isn't enabled - we need to nop out the 
WAIT_REG_MEM */
-   ib[h_idx + 2] = PACKET2(0);
-   ib[h_idx + 3] = PACKET2(0);
-   ib[h_idx + 4] = PACKET2(0);
-   ib[h_idx + 5] = PACKET2(0);
-   ib[h_idx + 6] = PACKET2(0);
-   ib[h_idx + 7] = PACKET2(0);
-   ib[h_idx + 8] = PACKET2(0);
-   } else {
-   switch (reg) {
-   case EVERGREEN_VLINE_START_END:
-   header &= ~R600_CP_PACKET0_REG_MASK;
-   header |= (EVERGREEN_VLINE_START_END + 
radeon_crtc->cr

[PATCH 08/12] drm/radeon: add a check to wait_reg_mem command

2013-01-02 Thread Ilija Hadzic

WAIT_REG_MEM on register does not allow the use of PFP.
Enforce this restriction when checking packets sent from
userland.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 3 +++
 drivers/gpu/drm/radeon/r600_cs.c  | 8 
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 2690532..02aeb7f 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -2101,6 +2101,9 @@ static int evergreen_packet3_check(struct 
radeon_cs_parser *p,
 
ib[idx+1] = (ib[idx+1] & 0x3) | (offset & 0xfffc);
ib[idx+2] = upper_32_bits(offset) & 0xff;
+   } else if (idx_value & 0x100) {
+   DRM_ERROR("cannot use PFP on REG wait\n");
+   return -EINVAL;
}
break;
case PACKET3_CP_DMA:
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 5d59275..d64432b 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -949,6 +949,11 @@ int r600_cs_common_vline_parse(struct radeon_cs_parser *p,
DRM_ERROR("vline WAIT_REG_MEM waiting on MEM instead of REG\n");
return -EINVAL;
}
+   /* bit 8 is me (0) or pfp (1) */
+   if (wait_reg_mem_info & 0x100) {
+   DRM_ERROR("vline WAIT_REG_MEM waiting on PFP instead of ME\n");
+   return -EINVAL;
+   }
/* waiting for value to be equal */
if ((wait_reg_mem_info & 0x7) != 0x3) {
DRM_ERROR("vline WAIT_REG_MEM function not equal\n");
@@ -1847,6 +1852,9 @@ static int r600_packet3_check(struct radeon_cs_parser *p,
 
ib[idx+1] = (ib[idx+1] & 0x3) | (offset & 0xfff0);
ib[idx+2] = upper_32_bits(offset) & 0xff;
+   } else if (idx_value & 0x100) {
+   DRM_ERROR("cannot use PFP on REG wait\n");
+   return -EINVAL;
}
break;
case PACKET3_CP_DMA:
-- 
1.7.12

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 09/12] drm/radeon: rename r100_cs_dump_packet to radeon_cs_dump_packet

2013-01-02 Thread Ilija Hadzic

This function is not limited to r100, but it can dump a
(raw) packet for any ASIC. Rename it accordingly and move
its declaration to radeon.h

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r100.c   | 52 ++---
 drivers/gpu/drm/radeon/r100_track.h |  2 --
 drivers/gpu/drm/radeon/r200.c   | 14 +-
 drivers/gpu/drm/radeon/r300.c   | 18 ++---
 drivers/gpu/drm/radeon/radeon.h |  2 ++
 drivers/gpu/drm/radeon/radeon_cs.c  | 21 +++
 6 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 7842447..cf7e359 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -1219,7 +1219,7 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p,
if (r) {
DRM_ERROR("No reloc for ib[%d]=0x%04X\n",
  idx, reg);
-   r100_cs_dump_packet(p, pkt);
+   radeon_cs_dump_packet(p, pkt);
return r;
}
 
@@ -1233,7 +1233,7 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p,
if (reloc->lobj.tiling_flags & RADEON_TILING_MICRO) {
if (reg == RADEON_SRC_PITCH_OFFSET) {
DRM_ERROR("Cannot src blit from microtiled 
surface\n");
-   r100_cs_dump_packet(p, pkt);
+   radeon_cs_dump_packet(p, pkt);
return -EINVAL;
}
tile_flags |= RADEON_DST_TILE_MICRO;
@@ -1263,7 +1263,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p,
if (c > 16) {
DRM_ERROR("Only 16 vertex buffers are allowed %d\n",
  pkt->opcode);
-   r100_cs_dump_packet(p, pkt);
+   radeon_cs_dump_packet(p, pkt);
return -EINVAL;
}
track->num_arrays = c;
@@ -1272,7 +1272,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p,
if (r) {
DRM_ERROR("No reloc for packet3 %d\n",
  pkt->opcode);
-   r100_cs_dump_packet(p, pkt);
+   radeon_cs_dump_packet(p, pkt);
return r;
}
idx_value = radeon_get_ib_value(p, idx);
@@ -1285,7 +1285,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p,
if (r) {
DRM_ERROR("No reloc for packet3 %d\n",
  pkt->opcode);
-   r100_cs_dump_packet(p, pkt);
+   radeon_cs_dump_packet(p, pkt);
return r;
}
ib[idx+2] = radeon_get_ib_value(p, idx + 2) + 
((u32)reloc->lobj.gpu_offset);
@@ -1298,7 +1298,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p,
if (r) {
DRM_ERROR("No reloc for packet3 %d\n",
  pkt->opcode);
-   r100_cs_dump_packet(p, pkt);
+   radeon_cs_dump_packet(p, pkt);
return r;
}
idx_value = radeon_get_ib_value(p, idx);
@@ -1355,20 +1355,6 @@ int r100_cs_parse_packet0(struct radeon_cs_parser *p,
return 0;
 }
 
-void r100_cs_dump_packet(struct radeon_cs_parser *p,
-struct radeon_cs_packet *pkt)
-{
-   volatile uint32_t *ib;
-   unsigned i;
-   unsigned idx;
-
-   ib = p->ib.ptr;
-   idx = pkt->idx;
-   for (i = 0; i <= (pkt->count + 1); i++, idx++) {
-   DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]);
-   }
-}
-
 /**
  * r100_cs_packet_next_vline() - parse userspace VLINE packet
  * @parser:parser structure holding parsing context.
@@ -1492,14 +1478,14 @@ int r100_cs_packet_next_reloc(struct radeon_cs_parser 
*p,
if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) {
DRM_ERROR("No packet3 for relocation for packet at %d.\n",
  p3reloc.idx);
-   r100_cs_dump_packet(p, &p3reloc);
+   radeon_cs_dump_packet(p, &p3reloc);
return -EINVAL;
}
idx = radeon_get_ib_value(p, p3reloc.idx + 1);
if (idx >= relocs_chunk->length_dw) {
DRM_ERROR("Relocs at %d after relocations chunk end %d !\n",
  idx, relocs_chunk->length_dw);
-   r100_cs_dump_packet(p, &p3reloc);
+   radeon_cs_dump_packet(p, &p3reloc);
return -EINVAL;
}
/* FIXME: we assume reloc size is 4 dwords */
@@ -1584,7 +1570,7 @@ static int r100_packet0_check(struct

[PATCH 10/12] drm/radeon: pull out common next_reloc function

2013-01-02 Thread Ilija Hadzic

next_reloc function does the same thing in all ASICs with
the exception of R600 which has a special case in legacy mode.
Pull out the common function in preparation for refactoring.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon.h|  3 +++
 drivers/gpu/drm/radeon/radeon_cs.c | 54 ++
 2 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index a0e28dc..dd8feab 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1972,6 +1972,9 @@ int radeon_cs_packet_parse(struct radeon_cs_parser *p,
 bool radeon_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p);
 void radeon_cs_dump_packet(struct radeon_cs_parser *p,
   struct radeon_cs_packet *pkt);
+int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p,
+   struct radeon_cs_reloc **cs_reloc,
+   int nomm);
 int r600_cs_common_vline_parse(struct radeon_cs_parser *p,
   uint32_t *vline_start_end,
   uint32_t *vline_status);
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 2ef7e81..5c3d773 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -734,3 +734,57 @@ void radeon_cs_dump_packet(struct radeon_cs_parser *p,
DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]);
 }
 
+/**
+ * radeon_cs_packet_next_reloc() - parse next (should be reloc) packet
+ * @parser:parser structure holding parsing context.
+ * @data:  pointer to relocation data
+ * @offset_start:  starting offset
+ * @offset_mask:   offset mask (to align start offset on)
+ * @reloc: reloc informations
+ *
+ * Check if next packet is relocation packet3, do bo validation and compute
+ * GPU offset using the provided start.
+ **/
+int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p,
+   struct radeon_cs_reloc **cs_reloc,
+   int nomm)
+{
+   struct radeon_cs_chunk *relocs_chunk;
+   struct radeon_cs_packet p3reloc;
+   unsigned idx;
+   int r;
+
+   if (p->chunk_relocs_idx == -1) {
+   DRM_ERROR("No relocation chunk !\n");
+   return -EINVAL;
+   }
+   *cs_reloc = NULL;
+   relocs_chunk = &p->chunks[p->chunk_relocs_idx];
+   r = radeon_cs_packet_parse(p, &p3reloc, p->idx);
+   if (r)
+   return r;
+   p->idx += p3reloc.count + 2;
+   if (p3reloc.type != RADEON_PACKET_TYPE3 ||
+   p3reloc.opcode != RADEON_PACKET3_NOP) {
+   DRM_ERROR("No packet3 for relocation for packet at %d.\n",
+ p3reloc.idx);
+   radeon_cs_dump_packet(p, &p3reloc);
+   return -EINVAL;
+   }
+   idx = radeon_get_ib_value(p, p3reloc.idx + 1);
+   if (idx >= relocs_chunk->length_dw) {
+   DRM_ERROR("Relocs at %d after relocations chunk end %d !\n",
+ idx, relocs_chunk->length_dw);
+   radeon_cs_dump_packet(p, &p3reloc);
+   return -EINVAL;
+   }
+   /* FIXME: we assume reloc size is 4 dwords */
+   if (nomm) {
+   *cs_reloc = p->relocs;
+   (*cs_reloc)->lobj.gpu_offset =
+   (u64)relocs_chunk->kdata[idx + 3] << 32;
+   (*cs_reloc)->lobj.gpu_offset |= relocs_chunk->kdata[idx + 0];
+   } else
+   *cs_reloc = p->relocs_ptr[(idx / 4)];
+   return 0;
+}
-- 
1.7.12

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 11/12] drm/radeon: use common next_reloc function

2013-01-02 Thread Ilija Hadzic

This patch eliminates ASIC-specific ***_cs_packet_next_reloc
functions and hooks up the new common function.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 129 +--
 drivers/gpu/drm/radeon/r100.c |  76 +++-
 drivers/gpu/drm/radeon/r100_track.h   |   2 -
 drivers/gpu/drm/radeon/r200.c |  12 +--
 drivers/gpu/drm/radeon/r300.c |  16 ++--
 drivers/gpu/drm/radeon/r600_cs.c  | 158 +++---
 6 files changed, 98 insertions(+), 295 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 02aeb7f..211b509 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -36,9 +36,6 @@
 
 int r600_dma_cs_next_reloc(struct radeon_cs_parser *p,
   struct radeon_cs_reloc **cs_reloc);
-static int evergreen_cs_packet_next_reloc(struct radeon_cs_parser *p,
- struct radeon_cs_reloc **cs_reloc);
-
 struct evergreen_cs_track {
u32 group_size;
u32 nbanks;
@@ -1009,52 +1006,6 @@ static int evergreen_cs_track_check(struct 
radeon_cs_parser *p)
 }
 
 /**
- * evergreen_cs_packet_next_reloc() - parse next packet which should be reloc 
packet3
- * @parser:parser structure holding parsing context.
- * @data:  pointer to relocation data
- * @offset_start:  starting offset
- * @offset_mask:   offset mask (to align start offset on)
- * @reloc: reloc informations
- *
- * Check next packet is relocation packet3, do bo validation and compute
- * GPU offset using the provided start.
- **/
-static int evergreen_cs_packet_next_reloc(struct radeon_cs_parser *p,
- struct radeon_cs_reloc **cs_reloc)
-{
-   struct radeon_cs_chunk *relocs_chunk;
-   struct radeon_cs_packet p3reloc;
-   unsigned idx;
-   int r;
-
-   if (p->chunk_relocs_idx == -1) {
-   DRM_ERROR("No relocation chunk !\n");
-   return -EINVAL;
-   }
-   *cs_reloc = NULL;
-   relocs_chunk = &p->chunks[p->chunk_relocs_idx];
-   r = radeon_cs_packet_parse(p, &p3reloc, p->idx);
-   if (r) {
-   return r;
-   }
-   p->idx += p3reloc.count + 2;
-   if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) {
-   DRM_ERROR("No packet3 for relocation for packet at %d.\n",
- p3reloc.idx);
-   return -EINVAL;
-   }
-   idx = radeon_get_ib_value(p, p3reloc.idx + 1);
-   if (idx >= relocs_chunk->length_dw) {
-   DRM_ERROR("Relocs at %d after relocations chunk end %d !\n",
- idx, relocs_chunk->length_dw);
-   return -EINVAL;
-   }
-   /* FIXME: we assume reloc size is 4 dwords */
-   *cs_reloc = p->relocs_ptr[(idx / 4)];
-   return 0;
-}
-
-/**
  * evergreen_cs_packet_parse_vline() - parse userspace VLINE packet
  * @parser:parser structure holding parsing context.
  *
@@ -1205,7 +1156,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser 
*p, u32 reg, u32 idx)
case SQ_LSTMP_RING_BASE:
case SQ_PSTMP_RING_BASE:
case SQ_VSTMP_RING_BASE:
-   r = evergreen_cs_packet_next_reloc(p, &reloc);
+   r = radeon_cs_packet_next_reloc(p, &reloc, 0);
if (r) {
dev_warn(p->dev, "bad SET_CONTEXT_REG "
"0x%04X\n", reg);
@@ -1234,7 +1185,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser 
*p, u32 reg, u32 idx)
case DB_Z_INFO:
track->db_z_info = radeon_get_ib_value(p, idx);
if (!(p->cs_flags & RADEON_CS_KEEP_TILING_FLAGS)) {
-   r = evergreen_cs_packet_next_reloc(p, &reloc);
+   r = radeon_cs_packet_next_reloc(p, &reloc, 0);
if (r) {
dev_warn(p->dev, "bad SET_CONTEXT_REG "
"0x%04X\n", reg);
@@ -1276,7 +1227,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser 
*p, u32 reg, u32 idx)
track->db_dirty = true;
break;
case DB_Z_READ_BASE:
-   r = evergreen_cs_packet_next_reloc(p, &reloc);
+   r = radeon_cs_packet_next_reloc(p, &reloc, 0);
if (r) {
dev_warn(p->dev, "bad SET_CONTEXT_REG "
"0x%04X\n", reg);
@@ -1288,7 +1239,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser 
*p, u32 reg, u32 idx)
track->db_dirty = true;

[PATCH 12/12] drm/radeon: consolidate redundant macros and constants

2013-01-02 Thread Ilija Hadzic

After refactoring the _cs logic, we ended up with many
macros and constants that #define the same thing.
Clean'em up.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 18 +-
 drivers/gpu/drm/radeon/evergreend.h   | 13 ++---
 drivers/gpu/drm/radeon/nid.h  | 13 ++---
 drivers/gpu/drm/radeon/r100.c |  8 
 drivers/gpu/drm/radeon/r100d.h| 11 ---
 drivers/gpu/drm/radeon/r300.c |  6 +++---
 drivers/gpu/drm/radeon/r300d.h| 11 ---
 drivers/gpu/drm/radeon/r600_cs.c  | 10 +-
 drivers/gpu/drm/radeon/r600d.h| 13 ++---
 drivers/gpu/drm/radeon/rv515d.h   | 11 ---
 drivers/gpu/drm/radeon/si.c   | 12 ++--
 drivers/gpu/drm/radeon/sid.h  | 13 ++---
 12 files changed, 35 insertions(+), 104 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 211b509..4a9760a 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -2639,12 +2639,12 @@ int evergreen_cs_parse(struct radeon_cs_parser *p)
}
p->idx += pkt.count + 2;
switch (pkt.type) {
-   case PACKET_TYPE0:
+   case RADEON_PACKET_TYPE0:
r = evergreen_cs_parse_packet0(p, &pkt);
break;
-   case PACKET_TYPE2:
+   case RADEON_PACKET_TYPE2:
break;
-   case PACKET_TYPE3:
+   case RADEON_PACKET_TYPE3:
r = evergreen_packet3_check(p, &pkt);
break;
default:
@@ -3395,19 +3395,19 @@ int evergreen_ib_parse(struct radeon_device *rdev, 
struct radeon_ib *ib)
 
do {
pkt.idx = idx;
-   pkt.type = CP_PACKET_GET_TYPE(ib->ptr[idx]);
-   pkt.count = CP_PACKET_GET_COUNT(ib->ptr[idx]);
+   pkt.type = RADEON_CP_PACKET_GET_TYPE(ib->ptr[idx]);
+   pkt.count = RADEON_CP_PACKET_GET_COUNT(ib->ptr[idx]);
pkt.one_reg_wr = 0;
switch (pkt.type) {
-   case PACKET_TYPE0:
+   case RADEON_PACKET_TYPE0:
dev_err(rdev->dev, "Packet0 not allowed!\n");
ret = -EINVAL;
break;
-   case PACKET_TYPE2:
+   case RADEON_PACKET_TYPE2:
idx += 1;
break;
-   case PACKET_TYPE3:
-   pkt.opcode = CP_PACKET3_GET_OPCODE(ib->ptr[idx]);
+   case RADEON_PACKET_TYPE3:
+   pkt.opcode = RADEON_CP_PACKET3_GET_OPCODE(ib->ptr[idx]);
ret = evergreen_vm_packet3_check(rdev, ib->ptr, &pkt);
idx += pkt.count + 2;
break;
diff --git a/drivers/gpu/drm/radeon/evergreend.h 
b/drivers/gpu/drm/radeon/evergreend.h
index cb9baaa..5ce08cf 100644
--- a/drivers/gpu/drm/radeon/evergreend.h
+++ b/drivers/gpu/drm/radeon/evergreend.h
@@ -979,16 +979,7 @@
 /*
  * PM4
  */
-#definePACKET_TYPE00
-#definePACKET_TYPE11
-#definePACKET_TYPE22
-#definePACKET_TYPE33
-
-#define CP_PACKET_GET_TYPE(h) (((h) >> 30) & 3)
-#define CP_PACKET_GET_COUNT(h) (((h) >> 16) & 0x3FFF)
-#define CP_PACKET0_GET_REG(h) (((h) & 0x) << 2)
-#define CP_PACKET3_GET_OPCODE(h) (((h) >> 8) & 0xFF)
-#define PACKET0(reg, n)((PACKET_TYPE0 << 30) | 
\
+#define PACKET0(reg, n)((RADEON_PACKET_TYPE0 << 30) |  
\
 (((reg) >> 2) & 0x) |  \
 ((n) & 0x3FFF) << 16)
 #define CP_PACKET2 0x8000
@@ -997,7 +988,7 @@
 
 #define PACKET2(v) (CP_PACKET2 | REG_SET(PACKET2_PAD, (v)))
 
-#define PACKET3(op, n) ((PACKET_TYPE3 << 30) | \
+#define PACKET3(op, n) ((RADEON_PACKET_TYPE3 << 30) |  \
 (((op) & 0xFF) << 8) | \
 ((n) & 0x3FFF) << 16)
 
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index b93186b..a3a3212 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -474,16 +474,7 @@
 /*
  * PM4
  */
-#definePACKET_TYPE00
-#definePACKET_TYPE11
-#definePACKET_TYPE22
-#definePACKET_TYPE33
-
-#define CP_PACKET_GET_TYPE(h) (((h) >> 30) & 3)
-#define CP_PACKET_GET_COUNT(h) (((h) >> 16) & 0x3FFF)
-#define CP_PACKET0_GET_REG(h) (((h) & 0x) << 2)
-#define

Re: radeon CS parser refactoring

2013-01-03 Thread Ilija Hadzic




On Fri, 4 Jan 2013, Dave Airlie wrote:


Did you run these with pre-kms userspace etc to make sure it doesn't
cause regressions there?



No, I didn't, but I can give it a quick whirl. I think I still have one or 
two machines with 6.14.x DDX that I can put in UMS mode and see what 
happens.


Haven't used UMS for at least 2-3 years. My bet is that it is more likely 
that I will hit a few unrelated UMS problems that are there even without 
my patches, but I'll let you know the result when I have it.


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: radeon CS parser refactoring

2013-01-04 Thread Ilija Hadzic



On Fri, 4 Jan 2013, Dave Airlie wrote:


Did you run these with pre-kms userspace etc to make sure it doesn't
cause regressions there?



I did some tests with UMS and shuffled a number of cards. As I feared, I 
ran into a number of unrelated problems, but in each case I have seen 
identical beahvior between the kernel with my patches and without. So as 
far as I can tell, my patches do not introduce regressions in legacy 
modes, althugh I am not sure about the coverage.


Also, in one test, I think I have hit a genuine bug in ATI DDX (explained 
below).


Below I describe what I saw with each card family. Maybe for some test 
cases I am missing some "magic" option in xorg.conf or maybe what I am 
seeing (mostly reduced feature set) is expected in UMS.


* With an NI card (TURKS, HD6570 card), Xorg just plain tells me that the
  card is supported in KMS only mode. So, I guess, that's it for that
  card.

* A test with Evergreen (CEDAR) card works in UMS mode, but I can't
  get 3D acceleration. I see these messages in Xorg log file:

[37.024] (II) RADEON(0): No DRI yet on Evergreen
.
.
[37.364] (II) AIGLX: Screen 0 is not DRI2 capable
[37.364] (II) AIGLX: Screen 0 is not DRI capable
[37.664] (II) AIGLX: Loaded and initialized swrast
[37.664] (II) GLX: Initialized DRISWRAST GL provider for screen 0

  Sounds like just a "property" of UMS to me, but I am not sure.
  Nevertheless, the behavior with and without my kernel patches is
  identical. Still, 2D copying should still be exercising the CS
  parser, so there is some test coverage here.

* A test with an R7xx card (RV730, E4690 card) results in a segfault in
  DDX. Again, this is irrespective of my kernel patches, so I believe
  that the bug has been there for a while and that it's in userland.
  The crash occurs in r600_set_render_target() function and what causes
  it is corrupted cb_conf->surface pointer. When the crash occurs it
  has a value of 0x1, which doesn't look like something that would live
  in .bss, .data or come from the heap. I didn't try other R6xx cards,
  but I suspect that they may have the same problem because they share
  the code in DDX. I don't know if UMS DDX will be maintained going
  forward, so I don't know if it makes sense to open a bug for this.
  BTW, DDX I am testing this with is 6.14.6

* A test with R300 card (Radeon X300 card) works (again identically
  with and without my patches), but again without 3D acceleration.
  So it's similar result and comment as with the Evergreen test, though
  relevant messages in Xorg log file are slightly different:

[35.630] (EE) AIGLX error: r300 does not export required DRI extension
[35.630] (EE) AIGLX: reverting to software rendering
[35.915] (II) AIGLX: Loaded and initialized swrast
[35.915] (II) GLX: Initialized DRISWRAST GL provider for screen 0

  Again, I don't know if this is just the way things are in UMS mode or if
  there is some configuration magic I need to do.

So at this point I'd say that I have not seen anything that would indicate 
a regression in legacy mode, although the limitations I have hit make the 
tests more limited that I thought they would be (and KMS I tested quite a 
bit, so I am confident there).


thanks,

Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: radeon CS parser refactoring

2013-01-07 Thread Ilija Hadzic




On Fri, 4 Jan 2013, Alex Deucher wrote:


R6xx and r7xx are really all you need to worry about in this case.
R1xx-r5xx UMS uses a different kernel interface for command submission
and evergreen and later don't have UMS drm support.  UMS r6xx/r7xx
support used the same kernel interface for command submission but the
kernel side was much simpler.


OK, I have found an old machine with RV730 GPU and a known-working UMS: 
2.6.35 kernel, 6.13.2 DDX, 7.8.2 mesa (classic), 1.9 Xorg). I changed the 
kernel to the latest drm-next and ran tests with and without my 
CS-refactoring patches. Here are the results:


* It appears that drm-next in its current state is broken with regard
  to UMS (nothing to do with my patches, it was pre-broken to begin with).
  UMS provokes the kernel into a NULL-pointer dereference oops. Good news
  is that I have tracked down the crash and I will be sending the patches
  with the fix shortly.

* There are multiple patches that contributed to the breakage of UMS.
  I didn't bother pin-pointing them all, but one that I looked
  (6a7068b4) dates back to April 2012 so there are kernels out in
  distros that crash on UMS. That probably tells us how many UMS
  users are left out there :-). BTW, the reason UMS crashes is
  because parser->rdev is NULL in UMS mode so every patch that tries
  to dereference it (and shares the code path with UMS) will cause an
  oops (it will become clear when you see the patches).

* So, having fixed the above incidental finding, I got my machine
  with ancient UMS-userspace and a shiny latest drm-next kernel (plus
  my CS-refactoring patches plus my yet-to-be-sent UMS fixes) to
  work. My test consists of bringing up Gnome (it's Gnome 2 on that
  machine), running glxgears, sphereworld (an example code from
  OpenGL Superbible book), and OpenArena. Everything seems to work.

* Going back to KMS and retesting there, things still look good.

So this (with tests I did on Friday) should cover all the cases.


Additionally, UMS requires the old
non-gallium 3D drivers.  It sounds like some other change in the ddx
broke UMS support for r6xx/r7xx.


The DDX segfault I reported on Friday was provoked by trying to run 
Gallium 3D driver on the top of UMS. Once I switched back to classic, the 
crash went away. So I guess the userspace crash was provoked by some 
obscure path that was never intended to be exercised. I don't think

it's worth investigating further.

-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

fix for crashes provoked by UMS mode

2013-01-07 Thread Ilija Hadzic

At Dave's request I ran some regression tests on my CS-refactoring
patches [1] against old UMS userspace. The tests have revealed
that the current kernel can be provoked into crashing in UMS mode
(and the problem is unrelated to refactoring of CS code; it has
been here before).

The following patches will fix the problem that I discovered
while testing (and they should probably go to drm-fixes branch).
First two patches are the fix for the acute problem, while
the third patch is a fix for an incidental finding uncovered
while debugging the problem.


[1] http://lists.freedesktop.org/archives/dri-devel/2013-January/032814.html

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 1/3] drm/radeon: fix NULL pointer dereference in UMS mode

2013-01-07 Thread Ilija Hadzic

In UMS mode parser->rdev is NULL, so dereferencing
will cause an oops.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_cs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 396baba0..45151c4 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -279,7 +279,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void 
*data)
  p->chunks[p->chunk_ib_idx].length_dw);
return -EINVAL;
}
-   if ((p->rdev->flags & RADEON_IS_AGP)) {
+   if (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) {
p->chunks[p->chunk_ib_idx].kpage[0] = 
kmalloc(PAGE_SIZE, GFP_KERNEL);
p->chunks[p->chunk_ib_idx].kpage[1] = 
kmalloc(PAGE_SIZE, GFP_KERNEL);
if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL ||
@@ -583,7 +583,8 @@ static int radeon_cs_update_pages(struct radeon_cs_parser 
*p, int pg_idx)
struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
int i;
int size = PAGE_SIZE;
-   bool copy1 = (p->rdev->flags & RADEON_IS_AGP) ? false : true;
+   bool copy1 = (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) ?
+   false : true;
 
for (i = ibc->last_copied_page + 1; i < pg_idx; i++) {
if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
-- 
1.8.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 2/3] drm/radeon: fix a bogus kfree

2013-01-07 Thread Ilija Hadzic

parser->chunks[.].kpage[.] is not always kmalloc-ed
by the parser initialization, so parser_fini should
not try to kfree it if it didn't allocate it.

This patch fixes a kernel oops that can be provoked
in UMS mode.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r600_cs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 9ea13d0..f8adb01 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -2476,8 +2476,10 @@ static void r600_cs_parser_fini(struct radeon_cs_parser 
*parser, int error)
kfree(parser->relocs);
for (i = 0; i < parser->nchunks; i++) {
kfree(parser->chunks[i].kdata);
-   kfree(parser->chunks[i].kpage[0]);
-   kfree(parser->chunks[i].kpage[1]);
+   if (parser->rdev && (parser->rdev->flags & RADEON_IS_AGP)) {
+   kfree(parser->chunks[i].kpage[0]);
+   kfree(parser->chunks[i].kpage[1]);
+   }
}
kfree(parser->chunks);
kfree(parser->chunks_array);
-- 
1.8.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 3/3] drm/radeon: fix error path in kpage allocation

2013-01-07 Thread Ilija Hadzic

Index into chunks[] array doesn't look right.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_cs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 45151c4..469661f 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -284,8 +284,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void 
*data)
p->chunks[p->chunk_ib_idx].kpage[1] = 
kmalloc(PAGE_SIZE, GFP_KERNEL);
if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL ||
p->chunks[p->chunk_ib_idx].kpage[1] == NULL) {
-   kfree(p->chunks[i].kpage[0]);
-   kfree(p->chunks[i].kpage[1]);
+   kfree(p->chunks[p->chunk_ib_idx].kpage[0]);
+   kfree(p->chunks[p->chunk_ib_idx].kpage[1]);
return -ENOMEM;
}
}
-- 
1.8.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: radeon CS parser refactoring

2013-01-07 Thread Ilija Hadzic

On Mon, Jan 7, 2013 at 7:21 PM, Marek Olšák  wrote:
>
> IIRC, the radeon gallium drivers call abort() if they encounter an
> unsupported DRM version (that is UMS).
>

I am not familiar enough to comment, but my observation was that as
soon as I backed out to classic, the segfault went away. So I assumed
that the mismatch between the UMS and Gallium was the cause.

Anyway, it's a secondary issue for this thread.

-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Patch "drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S" has been added to the 3.4-stable tree

2013-01-15 Thread Ilija Hadzic



I thought I saw a revert for that patch on the mailing list yesterday:

http://lists.freedesktop.org/archives/dri-devel/2013-January/033322.html


On Tue, 15 Jan 2013, gre...@linuxfoundation.org wrote:



This is a note to let you know that I've just added the patch titled

   drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S

to the 3.4-stable tree which can be found at:
   
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
0014-drm-Add-EDID_QUIRK_FORCE_REDUCED_BLANKING-for-ASUS-V.patch
and it can be found in the queue-3.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.



From 31d04b5fa45264da10750fc7f526d0e65f894a29 Mon Sep 17 00:00:00 2001

From: Paul Menzel 
Date: Wed, 8 Aug 2012 23:12:19 +0200
Subject: drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S

From: Paul Menzel 

commit 6f33814bd4d9cfe76033a31b1c0c76c960cd8e4b upstream.

Connecting an ASUS VW222S [1] over VGA a garbled screen is shown with
vertical stripes in the top half.

In commit bc42aabc [2]

   commit bc42aabc6a01b92b0f961d65671564e0e1cd7592
   Author: Adam Jackson 
   Date:   Wed May 23 16:26:54 2012 -0400

   drm/edid/quirks: ViewSonic VA2026w

Adam Jackson added the quirk `EDID_QUIRK_FORCE_REDUCED_BLANKING` which
is also needed for this ASUS monitor.

All log files and output from `xrandr` is included in the referenced
Bugzilla report #17629.

Please note that this monitor only has a VGA (D-Sub) connector [1].

[1] http://www.asus.com/Display/LCD_Monitors/VW222S/
[2] 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=bc42aabc6a01b92b0f961d65671564e0e1cd7592

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=17629
Signed-off-by: Paul Menzel 
Cc: 
Cc: Adam Jackson 
Cc: Ian Pilcher 
Signed-off-by: Dave Airlie 
Signed-off-by: Julien Cristau 
Signed-off-by: Greg Kroah-Hartman 
---
drivers/gpu/drm/drm_edid.c |3 +++
1 file changed, 3 insertions(+)

--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -87,6 +87,9 @@ static struct edid_quirk {
int product_id;
u32 quirks;
} edid_quirk_list[] = {
+   /* ASUS VW222S */
+   { "ACI", 0x22a2, EDID_QUIRK_FORCE_REDUCED_BLANKING },
+
/* Acer AL1706 */
{ "ACR", 44358, EDID_QUIRK_PREFER_LARGE_60 },
/* Acer F51 */


Patches currently in stable-queue which might be from 
paulepan...@users.sourceforge.net are

queue-3.4/0014-drm-Add-EDID_QUIRK_FORCE_REDUCED_BLANKING-for-ASUS-V.patch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] drm/radeon: fix NULL pointer dereference in UMS mode in radeon_cs_parser_fini()

2013-01-16 Thread Ilija Hadzic



Actually, the code path affected by your patch is not executed in UMS mode 
at all. Notice that radeon_cs_parser_fini is only called from 
radeon_cs_ioctl which is a KMS-only ioctl (see radeon_kms.c).


The equivalent of the fix you are trying to do is in
a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e (function patched by that one is 
the one used by legacy-CS ioctl), which you should go together 
with ff4bd0827764e10a428a9d39e6814c5478863f94 if you are backporting UMS 
fixes to 3.7. Both are needed to prevent kernel crashes in UMS mode.


-- Ilija

On Wed, 16 Jan 2013, Shuah Khan wrote:


Fix parser->rdev NULL pointer dereference in radeon_cs_parser_fini().
While back-porting drm/radeon: fix NULL pointer dereference in UMS mode
patch (commit-id: ff4bd0827764e10a428a9d39e6814c5478863f94) to 3,7.y, noticed
another instance of NULL pointer dereference in radeon_cs_parser_fini()
function.

Signed-off-by: Shuah Khan 
CC: sta...@vger.kernel.org 3.7
---
drivers/gpu/drm/radeon/radeon_cs.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 469661f..d1c282c 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -329,7 +329,7 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser 
*parser, int error)
kfree(parser->relocs_ptr);
for (i = 0; i < parser->nchunks; i++) {
kfree(parser->chunks[i].kdata);
-   if ((parser->rdev->flags & RADEON_IS_AGP)) {
+   if (parser->rdev && (parser->rdev->flags & RADEON_IS_AGP)) {
kfree(parser->chunks[i].kpage[0]);
kfree(parser->chunks[i].kpage[1]);
}
--
1.7.9.5



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] drm/radeon: fix NULL pointer dereference in UMS mode in radeon_cs_parser_fini()

2013-01-23 Thread Ilija Hadzic

On Wed, Jan 23, 2013 at 11:07 AM, Herton Ronaldo Krzesinski
 wrote:
> On Thu, Jan 17, 2013 at 10:09:42AM -0700, Shuah Khan wrote:
>> On Wed, 2013-01-16 at 21:06 -0600, Ilija Hadzic wrote:
>> > Actually, the code path affected by your patch is not executed in UMS mode
>> > at all. Notice that radeon_cs_parser_fini is only called from
>> > radeon_cs_ioctl which is a KMS-only ioctl (see radeon_kms.c).
>> >
>> > The equivalent of the fix you are trying to do is in
>> > a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e (function patched by that one is
>> > the one used by legacy-CS ioctl), which you should go together
>> > with ff4bd0827764e10a428a9d39e6814c5478863f94 if you are backporting UMS
>> > fixes to 3.7. Both are needed to prevent kernel crashes in UMS mode.
>> >
>> > -- Ilija
>>
>> Thanks. I will take a look at a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e.
>> I sent back-ported ff4bd0827764e10a428a9d39e6814c5478863f94 patch to
>> stable and I will back-port and send
>> a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e as well.
>
> While at it, also looks like commit
> 25d8999780f8c1f53928f4a24a09c01550423109 could also be added to stables.
> But while looking at it, while the patch itself is correct, I noted that
> there is a possibility of double free: if either of the
> p->chunks[p->chunk_ib_idx].kpage[] items are non NULL, we will free it in
> radeon_cs_parser_init and also when calling one of the fini functions
> (radeon_cs_parser_fini or r600_cs_parser_fini). So either we need to set
> kpage[n] to NULL after kfree, or just return the error.
>

Yes you are right. The error-handling path should force both kpage[]
pointers to NULL before returning -ENOMEM. That would fix the double
free.

Regarding, inclusion of 25d8999780f8c1f53928f4a24a09c01550423109 into
stable (and possible follow-up patch to fix potential double-kfree),
does this pass the "no fixes for theoretical problems in stable"
criterion ?

It is an obvious bug, but it happens so rarely that I am not surprised
that it has never happened: You need an (old) AGP card *and* you need
to run out of kmalloc memory.

-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH] drm/radeon: fix a rare case of double kfree

2013-01-23 Thread Ilija Hadzic

If one (but not both) allocations of p->chunks[].kpage[]
in radeon_cs_parser_init fail, the error path will free
the successfully allocated page, but leave a stale pointer
value in the kpage[] field. This will later cause a
double-free when radeon_cs_parser_fini is called.
This patch fixes the issue by forcing both pointers to NULL
after kfree in the error path.

The circumstances under which the problem happens are very
rare. The card must be AGP and the system must run out of
kmalloc area just at the right time so that one allocation
succeeds, while the other fails.

Signed-off-by: Ilija Hadzic 
Cc: Herton Ronaldo Krzesinski 
---
 drivers/gpu/drm/radeon/radeon_cs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 469661f..5407459 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -286,6 +286,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void 
*data)
p->chunks[p->chunk_ib_idx].kpage[1] == NULL) {
kfree(p->chunks[p->chunk_ib_idx].kpage[0]);
kfree(p->chunks[p->chunk_ib_idx].kpage[1]);
+   p->chunks[p->chunk_ib_idx].kpage[0] = NULL;
+   p->chunks[p->chunk_ib_idx].kpage[1] = NULL;
return -ENOMEM;
}
}
-- 
1.8.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

is airlied's git tree broken ?

2012-09-19 Thread Ilija Hadzic

I am getting this kind of error when I try to do 'git fetch' from
git://people.freedesktop.org/~airlied/linux.git

remote: error: Could not read ec862f894f7870430e4ff5a9249eaa94d368d220
remote: fatal: bad tree object ec862f894f7870430e4ff5a9249eaa94d368d220
remote: aborting due to possible repository corruption on the remote side.
fatal: protocol error: bad pack header

I see a similar error if I try to do a fresh clone:

Cloning into 'linux'...
remote: error: Could not read ec862f894f7870430e4ff5a9249eaa94d368d220
remote: fatal: bad tree object ec862f894f7870430e4ff5a9249eaa94d368d220
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: index-pack failed

Is there a known problem with that repository ?

thanks,

Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 0/3] [RFC] DRM Render Nodes

2012-09-28 Thread Ilija Hadzic




On Fri, 28 Sep 2012, Alex Deucher wrote:


I haven't read through your patches yet, but Dave and Ilija already
did something similar a while back:

http://lists.freedesktop.org/archives/dri-devel/2012-March/020765.html



Actually, there is a, a newer series of patches here (applied few comments 
I got after first series)


http://lists.freedesktop.org/archives/dri-devel/2012-April/021326.html

I have the v3 series (applied more comments I got after v2 series, plus 
some more cleanup) that I have never sent out, because my perception was 
that there was low interest in this work. I can send the v3 out, if there 
is a revived interest in this work.


The original work is by Dave and I did some major cleanup and built more 
on the top of it.


Kristian's patches at the first glance (I have not looked them in detail) 
appear more like prep. work (like which ioctl can a render node use and 
which can't, etc.), but my impression is that they still require the work 
cited above (Dave's original work and/or my followup work) to actually be 
able to create and use the render node.


BTW, the third patch in Kristian's series is for Intel only and we'll 
probably need equivalent patches for radeon and nouveau.


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 1/3] drm: Fix DRM_MINOR limits for control and render nodes

2012-09-28 Thread Ilija Hadzic




On Fri, 28 Sep 2012, Kristian Høgsberg wrote:


if (type == DRM_MINOR_CONTROL) {
 base += 64;
-limit = base + 127;
+limit = base + 64;
 } else if (type == DRM_MINOR_RENDER) {
 base += 128;
-limit = base + 255;
+limit = base + 64;

If my first grade arithmetics serves me well, shouldn't limit in the first 
clause be base + 63 and in the second clause, base + 127. The render 
node starts at 128 and spans to 255, so there are total of 128 render 
nodes, 64 card (legacy) nodes and 64 control nodes allowed.


Actually, this construction of limit relative to the base does not achieve 
anything. The code would be much more readable if it were something like 
this:


if (type == DRM_MINOR_CONTROL) {
base = 64;
limit = 127;
} else if (type == DRM_MINOR_RENDER) {
base = 128;
limit = 255;
}

-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 0/3] [RFC] DRM Render Nodes

2012-09-28 Thread Ilija Hadzic




On Fri, 28 Sep 2012, Daniel Vetter wrote:


On a quick look the rendernode Kristian propose and your work seem to
attack slightly different issues. Your/Dave's patch series seems to
put large efforts into (dynamically) splitting up the resources of a
drm device, including the modeset stuff.


Correct, the goal is to be able to run multiseat while sharing a GPU. 
Actually, with my variant of render nodes, I even got multiple desktops
residing in different LXC containers to share the GPU, which is kind of 
cool.



Kristians proposal here is
much more modest, with just enabling a way for to do the same for
render clients. All the modeset (and flink open stuff) would still be
only done through the legacy drm node.



OK I see. From what I can tell from the second patch, drm_get_pci_dev will 
create one (and I guess only one, right ?) render node if the underlying 
driver has DRIVER_RENDER node feature. The third patch (among other 
things) adds that feature to Intel driver.


So if I boot up a system with these patches and with Intel GPU, I will 
automagically get one more /dev/dri/renderD128 node, right ? The intent is 
that the render client opens and uses that render node. The 
/dev/dri/controlDNN node still remains an unused "orphan", right ?


So would you entertain the possibility that the render node is created 
from user space on demand using an ioctl into the control node ? If that's 
a possiblity for you, then my set of patches is a superset of what 
Kristian needs. If you just need a render client, you can create a node 
with no display resources and you would get something quite close to what 
these 3 patches try to do. unless I am missing something.


-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Breakage in "track dev_mapping in more robust and flexible way"

2012-10-25 Thread Ilija Hadzic

On Thu, Oct 25, 2012 at 11:10 AM, Thomas Hellström
 wrote:
> On 10/25/12 4:41 PM, Jerome Glisse wrote:
>>
>> On Thu, Oct 25, 2012 at 04:02:25PM +0200, Thomas Hellstrom wrote:
>>>
>>> Hi,
>>>
>>> This commit
>>>
>>>  From 949c4a34afacfe800fc442afac117aba15284962 Mon Sep 17 00:00:00 2001
>>> From: Ilija Hadzic 
>>> Date: Tue, 15 May 2012 16:40:10 -0400
>>> Subject: [PATCH] drm: track dev_mapping in more robust and flexible way
>>>
>>> Setting dev_mapping (pointer to the address_space structure
>>> used for memory mappings) to the address_space of the first
>>> opener's inode and then failing if other openers come in
>>> through a different inode has a few restrictions that are
>>> eliminated by this patch.
>>>
>>> If we already have valid dev_mapping and we spot an opener
>>> with different i_node, we force its i_mapping pointer to the
>>> already established address_space structure (first opener's
>>> inode). This will make all mappings from drm device hang off
>>> the same address_space object.
>>> ...
>>>
>>> Breaks drivers using TTM, since when the X server calls into the
>>> driver open, drm's dev_mapping has not
>>> yet been setup. The setup needs to be moved before the driver's open
>>> hook is called.
>>>
>>> Typically, if a TTM-aware driver is provoked by the Xorg server to
>>> move a buffer from system to VRAM or AGP,
>>> before any other drm client is started, The user-space page table
>>> entries are not killed before the move, and left pointing
>>> into freed pages, causing system crashes and / or user-space access
>>> to arbitrary memory.
>>
>> Doesn't handle move invalidate the drm file mapping before scheduling
>> the move ?
>
> Yes, but to do that it needs a correct value of bdev::dev_mapping, which is
> now incorrectly set on the
> *second* open instead of the first open.
>

So you are implying that in the first open the assignment of dev->dev_mapping is
somehow skipped (which could happen if drm_setup returns an error) or that the
driver on which you are having problems with (nouveau I presume) needs
dev_mapping
in the firstopen hook ?

It's been a while since I did it, but if my memory serves me well I
thought I explicitly
verified that dev_mapping was correctly set in the first open (though
GPUs I use are
primarily AMD).

-- Ilija


> /Thomas
>
>
>
>> Cheers,
>> Jerome
>
>
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Breakage in "track dev_mapping in more robust and flexible way"

2012-10-25 Thread Ilija Hadzic



Or could the driver that causes the problem be vmwgfx ? I now looked at 
the code and I notice that indeed vmwgfx sets its private dev_mapping in 
the open hook, while all other TTM-based drivers (radeon and nouveau) do 
it when they create an object.


-- Ilija


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Breakage in "track dev_mapping in more robust and flexible way"

2012-10-25 Thread Ilija Hadzic



Can you give the attached patch a whirl and let me know if it fixes the 
problem?


As I indicated in my previous note, vmwgfx should be the only affected 
driver because it looks at dev_mapping in the open hook (others do it when 
they create an object, so they are not affected).


I'll probably revise it (and I'll have some general questions about 
drm_open syscall) before officially send the patch, but I wanted to get 
something quickly to you to check if it fixes your problem. I hope that 
your vmwgfx test environment is such that you can reproduce the original

problem.

thanks,

-- Ilija
From 18a489e7415f495c7ba48cc61733d6c7d8f3fd68 Mon Sep 17 00:00:00 2001
From: Ilija Hadzic 
Date: Thu, 25 Oct 2012 15:28:05 -0400
Subject: [PATCH] drm: set dev_mapping before calling drm_open_helper

Some drivers (specifically vmwgfx) look at dev_mapping
in their open hook, so we have to set dev->dev_mapping
earlier in the process.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_fops.c |   43 +--
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 7ef1b67..50b7b47 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp)
int minor_id = iminor(inode);
struct drm_minor *minor;
int retcode = 0;
+   int need_setup = 0;
+   struct address_space *old_mapping;
 
minor = idr_find(&drm_minors_idr, minor_id);
if (!minor)
@@ -132,23 +134,36 @@ int drm_open(struct inode *inode, struct file *filp)
if (drm_device_is_unplugged(dev))
return -ENODEV;
 
+   if (!dev->open_count++)
+   need_setup = 1;
+   mutex_lock(&dev->struct_mutex);
+   old_mapping = dev->dev_mapping;
+   if (old_mapping == NULL)
+   dev->dev_mapping = &inode->i_data;
+   mutex_unlock(&dev->struct_mutex);
+
retcode = drm_open_helper(inode, filp, dev);
-   if (!retcode) {
-   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
-   if (!dev->open_count++)
-   retcode = drm_setup(dev);
-   }
-   if (!retcode) {
-   mutex_lock(&dev->struct_mutex);
-   if (dev->dev_mapping == NULL)
-   dev->dev_mapping = &inode->i_data;
-   /* ihold ensures nobody can remove inode with our i_data */
-   ihold(container_of(dev->dev_mapping, struct inode, i_data));
-   inode->i_mapping = dev->dev_mapping;
-   filp->f_mapping = dev->dev_mapping;
-   mutex_unlock(&dev->struct_mutex);
+   if (retcode)
+   goto err_undo;
+   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
+   if (need_setup) {
+   retcode = drm_setup(dev);
+   if (retcode)
+   goto err_undo;
}
+   /* ihold ensures nobody can remove inode with our i_data */
+   mutex_lock(&dev->struct_mutex);
+   ihold(container_of(dev->dev_mapping, struct inode, i_data));
+   inode->i_mapping = dev->dev_mapping;
+   filp->f_mapping = dev->dev_mapping;
+   mutex_unlock(&dev->struct_mutex);
+   return 0;
 
+err_undo:
+   dev->open_count--;
+   mutex_lock(&dev->struct_mutex);
+   dev->dev_mapping = old_mapping;
+   mutex_unlock(&dev->struct_mutex);
return retcode;
 }
 EXPORT_SYMBOL(drm_open);
-- 
1.7.4.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Breakage in "track dev_mapping in more robust and flexible way"

2012-10-26 Thread Ilija Hadzic




On Fri, 26 Oct 2012, Thomas Hellstrom wrote:


Hi,

On 10/25/2012 11:27 PM, Ilija Hadzic wrote:


Can you give the attached patch a whirl and let me know if it fixes the 
problem?


As I indicated in my previous note, vmwgfx should be the only affected 
driver because it looks at dev_mapping in the open hook (others do it when 
they create an object, so they are not affected).


I'll probably revise it (and I'll have some general questions about 
drm_open syscall) before officially send the patch, but I wanted to get 
something quickly to you to check if it fixes your problem. I hope that 
your vmwgfx test environment is such that you can reproduce the original

problem.

thanks,

-- Ilija


Yes, it appears like this patch fixes the problem. It'd be good to have it in 
3.7 (drm-fixes) with a cc to stable.




OK great. Thanks for testing. Before I send out an "official" patch, I 
have a few questions for those who have been around longer and can 
possibly reflect better than me on the history of drm_open syscall.


Currently, before touching dev->dev_mapping field we grab dev->struct 
mutex. This has been introduced by Dave Airlie a long time ago in 
a2c0a97b784f837300f7b0869c82ab712c600952. I tried to preserve that in all 
patches where I touched dev_open, but looking at the code I don't think 
the mutex is necessary. Namely, drm_open is only set in drm_open, and 
concurrent openers are protected with drm_global_mutex. Other places 
(drivers) where dev->dev_mapping is accessed is read-only and dev_mapping 
is written at first open when there are no file descriptors around to 
issue any other call. Also, it doesn't look to me that any driver locks 
dev->struct_mutex before accessing dev->dev_mapping anyway. So I am 
thinking of dropping the mutex completely, but I would like to hear a 
second thought.


The other issue, I noticed is that of the drm_setup() call fails, the 
open_count counter would remain incremented and I think we need to restore 
it back (or if I am missing something, would someone please enlighten me). 
This was also in the kernel all this time (and I have not noticed until 
now), so I "smuggled" that fix in the patch that I sent you. However, 
wonder if I should cut the separate patch for open_count fix.


Actually, I think that I should cut three patches: one to drop the mutex, 
one to fix the open_count and one to fix your problem with dev_mapping and 
that probably all three should CC stable. Before I do that, I'd like to 
hear opinions of others.


thanks,

Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 1/2] drm: restore open_count if drm_setup fails

2012-10-29 Thread Ilija Hadzic

If drm_setup (called at first open) fails, the whole
open call has failed, so we should not keep the
open_count incremented.

Signed-off-by: Ilija Hadzic 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/drm_fops.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 7ef1b67..af68eca 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -135,8 +135,11 @@ int drm_open(struct inode *inode, struct file *filp)
retcode = drm_open_helper(inode, filp, dev);
if (!retcode) {
atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
-   if (!dev->open_count++)
+   if (!dev->open_count++) {
retcode = drm_setup(dev);
+   if (retcode)
+   dev->open_count--;
+   }
}
if (!retcode) {
mutex_lock(&dev->struct_mutex);
-- 
1.7.12.4

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 2/2] drm: set dev_mapping before calling drm_open_helper

2012-10-29 Thread Ilija Hadzic

Some drivers (specifically vmwgfx) look at dev_mapping
in their open hook, so we have to set dev->dev_mapping
earlier in the process.

Reference:
http://lists.freedesktop.org/archives/dri-devel/2012-October/029420.html

Signed-off-by: Ilija Hadzic 
Reported-by: Thomas Hellstrom 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/drm_fops.c | 47 +-
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index af68eca..133b413 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp)
int minor_id = iminor(inode);
struct drm_minor *minor;
int retcode = 0;
+   int need_setup = 0;
+   struct address_space *old_mapping;
 
minor = idr_find(&drm_minors_idr, minor_id);
if (!minor)
@@ -132,26 +134,37 @@ int drm_open(struct inode *inode, struct file *filp)
if (drm_device_is_unplugged(dev))
return -ENODEV;
 
+   if (!dev->open_count++)
+   need_setup = 1;
+   mutex_lock(&dev->struct_mutex);
+   old_mapping = dev->dev_mapping;
+   if (old_mapping == NULL)
+   dev->dev_mapping = &inode->i_data;
+   /* ihold ensures nobody can remove inode with our i_data */
+   ihold(container_of(dev->dev_mapping, struct inode, i_data));
+   inode->i_mapping = dev->dev_mapping;
+   filp->f_mapping = dev->dev_mapping;
+   mutex_unlock(&dev->struct_mutex);
+
retcode = drm_open_helper(inode, filp, dev);
-   if (!retcode) {
-   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
-   if (!dev->open_count++) {
-   retcode = drm_setup(dev);
-   if (retcode)
-   dev->open_count--;
-   }
-   }
-   if (!retcode) {
-   mutex_lock(&dev->struct_mutex);
-   if (dev->dev_mapping == NULL)
-   dev->dev_mapping = &inode->i_data;
-   /* ihold ensures nobody can remove inode with our i_data */
-   ihold(container_of(dev->dev_mapping, struct inode, i_data));
-   inode->i_mapping = dev->dev_mapping;
-   filp->f_mapping = dev->dev_mapping;
-   mutex_unlock(&dev->struct_mutex);
+   if (retcode)
+   goto err_undo;
+   atomic_inc(&dev->counts[_DRM_STAT_OPENS]);
+   if (need_setup) {
+   retcode = drm_setup(dev);
+   if (retcode)
+   goto err_undo;
}
+   return 0;
 
+err_undo:
+   mutex_lock(&dev->struct_mutex);
+   filp->f_mapping = old_mapping;
+   inode->i_mapping = old_mapping;
+   iput(container_of(dev->dev_mapping, struct inode, i_data));
+   dev->dev_mapping = old_mapping;
+   mutex_unlock(&dev->struct_mutex);
+   dev->open_count--;
return retcode;
 }
 EXPORT_SYMBOL(drm_open);
-- 
1.7.12.4

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 1/2] drm/radeon: allow pcie gen2 speed on NI

2011-10-12 Thread Ilija Hadzic



Hi Dave,

A few weeks ago I sent the two patches that allow PCI Express interface to 
run at Gen 2 speed on NI parts. Links to the patches in the mailing list 
archive + review from Alex quoted below:


http://lists.freedesktop.org/archives/dri-devel/2011-September/014474.html
http://lists.freedesktop.org/archives/dri-devel/2011-September/014475.html

I saw some activity on drm-next and drm-core-next branches, but I have 
not seen these two patches merge yet. Just wondering if they are in the 
queue for merging or if they may have fell through the cracks?


thanks,

Ilija

On Tue, 20 Sep 2011, Alex Deucher wrote:


On Tue, Sep 20, 2011 at 10:22 AM, Ilija Hadzic
 wrote:

Enabling pcie gen2 speed was skipped for Northern Islands
AISCs, although it looks like it works just fine with the same
initialization sequence used for evergreen.

According to Alex D. gen2 init was skipped to prevent a crash
that has been caused by some other bug that has been
fixed in the meantime; so now it should be safe to enable it.

Signed-off-by: Ilija Hadzic 


I just double checked and BTC and cayman use the same programming
method.  Both patches:

Reviewed-by: Alex Deucher 

Thanks!

Alex



---
 drivers/gpu/drm/radeon/evergreen.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index f09bace..208b59c 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2987,8 +2987,7 @@ static int evergreen_startup(struct radeon_device *rdev)
       int r;

       /* enable pcie gen2 link */
-       if (!ASIC_IS_DCE5(rdev))
-               evergreen_pcie_gen2_enable(rdev);
+       evergreen_pcie_gen2_enable(rdev);

       if (ASIC_IS_DCE5(rdev)) {
               if (!rdev->me_fw || !rdev->pfp_fw || !rdev->rlc_fw || 
!rdev->mc_fw) {
--
1.7.6

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

drm/radeon/kms: improve performance of blit-copy

2011-10-12 Thread Ilija Hadzic


The following set of patches will improve the performance
of blit-copy functions for Radeon GPUs based on 
R600, R700, Evergreen and NI ASICs.

The foundation for improvement is the use of tiled mode access
(which for copying bo's can be used regardless of whether the
content is tiled or not), and segmenting the memory block
being copied into rectangles whose edge ratio is between 1:1
and 1:2. This maximizes the number of PCIe transactions that
use maximum payload size (typically 128 bytes) and also 
creates a memory access pattern that is more favorable for
both VRAM and host DRAM than what's currently in the kernel.

To come up with the new blit-copy code, I did a lot of 
PCIe traffic analysis with the bus analyzer and also 
had many discussions with Alex, trying to explain what's 
going on (thanks to Alex for his time).

Below (at the end of this note) are the results of some benchmarks
that I did with various GPUs (all in the same host: Intel i7 CPU,
X58 chipset, three DRAM channels). To run the tests on your machine
load the radeon module with 'benchmark=1 pcie_gen2=1' parameters.
Most significant improvement is in the upstream (VRAM to GART)
direction because that's where the PCIe transactions were fragmented 
and also where memory access pattern was such that it created a lot of 
backpressure from the host.

It is also interesting that high-end devices (e.g. Cayman) exhibit
the least improvement and were the worst to begin with. This is
because high-end devices copy more tiles in parallel which 
in turn can create bank conflicts on host memory and cause the
host to do lots of bank-close/precharge/bank-open cycles. 

As an added "bonus", I also did some code cleanup and consolidated
the repeated code into common function, so r600 and evergreen/NI
parts now share the blit-copy code. I also expanded on the
benchmark coverage, so the module now takes benckmark parameter
value between 1 and 8 and each results in running a different 
benchmark.

For details, see the commit log messages and the code.
I have been running with these patches for a few months 
(and I kept rebasing them to drm-core-next as the public 
git progressed) and I used them in a system setup that does
*many* copying of this kind (and does them frequently); I 
have not seen instabilities introduced by these patches. I also
verified the correctness of the copy using test=1 parameter
for each GPU that I had and the test passed.

I would welcome some feedback and if you run the benchmarks
with the new blit code, I would very much like to hear
what kind of improvement you are seeing.


BENCHMARK RESULTS:
==

1) VRAM to GTT 
==

Card (ASIC) VRAMBefore  After
-
5570 (Redwood)  DDR3 1600MHZ 4543912
6450 (Caicos)   DDR5 3200MHz37185090
6570 (Turks)DDR3 1800MHz 4844144
5450 (Cedar)DDR3 1600MHz36795090
5450 (Cedar)DDR2  800MHz26954639
E4690 (RV730)   DDR3 1400MHZ 4854969
E6760 (Turks)   DDR5 3200MHz 4744177
V5700 (RV730)   DDR3 MHz 4884297
2260 (RV620)DDR2 MHz 4943093
6870 (Barts)DDR5 4200MHz 4751113
6970 (Cayman)   DDR5 4200MHz 473 710

2) GTT to VRAM
==

Card (ASIC) VRAMBefore  After
-
5570 (Redwood)  DDR3 1600MHz31583360
6450 (Caicos)   DDR5 3200MHz29953393
6570 (Turks)DDR3 1800MHz30393339
5450 (Cedar)DDR3 1600MHz32463404
5450 (Cedar)DDR2  800MHz26143371
E4690 (RV730)   DDR3 1400MHz30843426
E6760 (Turks)   DDR5 3200MHz24432570
V5700 (RV730)   DDR3 MHz31873506
2260 (RV620)DDR2 MHz 5843246
6870 (Barts)DDR5 4200MHz24722601
6970 (Cayman)   DDR5 4200MHz24602737
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 1/9] drm/radeon/kms: improve evergreen blit code

2011-10-12 Thread Ilija Hadzic

start with first-cut conceptual patch from Alex Deucher
(commit info below); turn on 1D tiling
make rectangular buffer always 2:1 or 1:2 ratio
make buffer dimenstions an integer multiple of unit dimensions
make sures that integral number of pages map to the buffer
fix a few bugs that resulted in incorrect dimensions
tidy up a little bit to get rid of an ugly if/else
parametrize some "magic" constants
add protections from illegal buffer sizes etc.

>From 77e6703c37f0ad8673b9ab285589d5c26782a515 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Tue, 17 May 2011 05:08:58 -0400
Subject: [PATCH 1/2] drm/radeon/kms: simplify evergreen blit code

Covert 4k pages to multiples of 64x64x4 tiles.
This is also more efficient than a scanline based
approach from the MC's perspective.

Signed-off-by: Alex Deucher 
Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen.c  |4 +-
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |  295 +++
 drivers/gpu/drm/radeon/radeon_asic.h|4 +-
 3 files changed, 123 insertions(+), 180 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 5df39bf..5f0ecc7 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3180,14 +3180,14 @@ int evergreen_copy_blit(struct radeon_device *rdev,
 
mutex_lock(&rdev->r600_blit.mutex);
rdev->r600_blit.vb_ib = NULL;
-   r = evergreen_blit_prepare_copy(rdev, num_pages * RADEON_GPU_PAGE_SIZE);
+   r = evergreen_blit_prepare_copy(rdev, num_pages);
if (r) {
if (rdev->r600_blit.vb_ib)
radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
mutex_unlock(&rdev->r600_blit.mutex);
return r;
}
-   evergreen_kms_blit_copy(rdev, src_offset, dst_offset, num_pages * 
RADEON_GPU_PAGE_SIZE);
+   evergreen_kms_blit_copy(rdev, src_offset, dst_offset, num_pages);
evergreen_blit_done_copy(rdev, fence);
mutex_unlock(&rdev->r600_blit.mutex);
return 0;
diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c 
b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
index 2eb2518..3b24137 100644
--- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c
+++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
@@ -44,6 +44,10 @@
 #define COLOR_5_6_5   0x8
 #define COLOR_8_8_8_8 0x1a
 
+#define RECT_UNIT_H   32
+#define RECT_UNIT_W   (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H)
+#define MAX_RECT_DIM  16384
+
 /* emits 17 */
 static void
 set_render_target(struct radeon_device *rdev, int format,
@@ -56,7 +60,7 @@ set_render_target(struct radeon_device *rdev, int format,
if (h < 8)
h = 8;
 
-   cb_color_info = ((format << 2) | (1 << 24) | (1 << 8));
+   cb_color_info = ((format << 2) | (1 << 24) | (2 << 8));
pitch = (w / 8) - 1;
slice = ((w * h) / 64) - 1;
 
@@ -67,7 +71,7 @@ set_render_target(struct radeon_device *rdev, int format,
radeon_ring_write(rdev, slice);
radeon_ring_write(rdev, 0);
radeon_ring_write(rdev, cb_color_info);
-   radeon_ring_write(rdev, (1 << 4));
+   radeon_ring_write(rdev, 0);
radeon_ring_write(rdev, (w - 1) | ((h - 1) << 16));
radeon_ring_write(rdev, 0);
radeon_ring_write(rdev, 0);
@@ -179,7 +183,7 @@ set_tex_resource(struct radeon_device *rdev,
sq_tex_resource_word0 = (1 << 0); /* 2D */
sq_tex_resource_word0 |= pitch >> 3) - 1) << 6) |
  ((w - 1) << 18));
-   sq_tex_resource_word1 = ((h - 1) << 0) | (1 << 28);
+   sq_tex_resource_word1 = ((h - 1) << 0) | (2 << 28);
/* xyzw swizzles */
sq_tex_resource_word4 = (0 << 16) | (1 << 19) | (2 << 22) | (3 << 25);
 
@@ -751,30 +755,80 @@ static void evergreen_vb_ib_put(struct radeon_device 
*rdev)
radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
 }
 
-int evergreen_blit_prepare_copy(struct radeon_device *rdev, int size_bytes)
+
+/* maps the rectangle to the buffer so that satisfies the following properties:
+ * - dimensions are less or equal to the hardware limit (MAX_RECT_DIM)
+ * - rectangle consists of integer number of pages
+ * - height is an integer multiple of RECT_UNIT_H
+ * - width is an integer multiple of RECT_UNIT_W
+ * - (the above three conditions also guarantee tile-aligned size)
+ * - it is as square as possible (sides ratio never greater than 2:1)
+ * - uses maximum number of pages that fit the above constraints
+ *
+ *  input:  buffer size, pointers to width/height variables
+ *  return: number of pages that were successfully mapped to the rectangle
+ *  width/height of the rectangle
+ */
+static unsigned ever

[PATCH 2/9] drm/radeon/kms: improve r6xx blit code

2011-10-12 Thread Ilija Hadzic

start with first-cut conceptual patch from Alex Deucher
(commit info below); turn on 1D tiling
make rectangular buffer always 2:1 or 1:2 ratio
make buffer dimenstions an integer multiple of unit
dimensionsmake sures that integral number of pages map
to the buffer fix a few bugs that resulted in incorrect
dimensions tidy up a little bit to get rid of an ugly
if/else parametrize some "magic" constants
add protections from illegal buffer sizes etc.

>From 2cd7a267d6cbcdf414b7a724237aa24525c12b54 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Tue, 17 May 2011 05:09:43 -0400
Subject: [PATCH 2/2] drm/radeon/kms: simplify r6xx blit code

Covert 4k pages to multiples of 64x64x4 tiles.
This is also more efficient than a scanline based
approach from the MC's perspective.

Signed-off-by: Alex Deucher 
Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r600.c  |4 +-
 drivers/gpu/drm/radeon/r600_blit_kms.c |  276 
 drivers/gpu/drm/radeon/radeon_asic.h   |4 +-
 3 files changed, 109 insertions(+), 175 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 334aee6..9fc6844 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2363,14 +2363,14 @@ int r600_copy_blit(struct radeon_device *rdev,
 
mutex_lock(&rdev->r600_blit.mutex);
rdev->r600_blit.vb_ib = NULL;
-   r = r600_blit_prepare_copy(rdev, num_pages * RADEON_GPU_PAGE_SIZE);
+   r = r600_blit_prepare_copy(rdev, num_pages);
if (r) {
if (rdev->r600_blit.vb_ib)
radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
mutex_unlock(&rdev->r600_blit.mutex);
return r;
}
-   r600_kms_blit_copy(rdev, src_offset, dst_offset, num_pages * 
RADEON_GPU_PAGE_SIZE);
+   r600_kms_blit_copy(rdev, src_offset, dst_offset, num_pages);
r600_blit_done_copy(rdev, fence);
mutex_unlock(&rdev->r600_blit.mutex);
return 0;
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c 
b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 9aa74c3..d9994c9 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -42,6 +42,10 @@
 #define COLOR_5_6_5   0x8
 #define COLOR_8_8_8_8 0x1a
 
+#define RECT_UNIT_H   32
+#define RECT_UNIT_W   (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H)
+#define MAX_RECT_DIM  8192
+
 /* emits 21 on rv770+, 23 on r600 */
 static void
 set_render_target(struct radeon_device *rdev, int format,
@@ -600,13 +604,59 @@ static void r600_vb_ib_put(struct radeon_device *rdev)
radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
 }
 
-int r600_blit_prepare_copy(struct radeon_device *rdev, int size_bytes)
+/* FIXME: the function is very similar to evergreen_blit_create_rect, except
+   that it different predefined constants; consider commonizing */
+static unsigned r600_blit_create_rect(unsigned num_pages, int *width, int 
*height)
+{
+   unsigned max_pages;
+   unsigned pages = num_pages;
+   int w, h;
+
+   if (num_pages == 0) {
+   /* not supposed to be called with no pages, but just in case */
+   h = 0;
+   w = 0;
+   pages = 0;
+   WARN_ON(1);
+   } else {
+   int rect_order = 2;
+   h = RECT_UNIT_H;
+   while (num_pages / rect_order) {
+   h *= 2;
+   rect_order *= 4;
+   if (h >= MAX_RECT_DIM) {
+   h = MAX_RECT_DIM;
+   break;
+   }
+   }
+   max_pages = (MAX_RECT_DIM * h) / (RECT_UNIT_W * RECT_UNIT_H);
+   if (pages > max_pages)
+   pages = max_pages;
+   w = (pages * RECT_UNIT_W * RECT_UNIT_H) / h;
+   w = (w / RECT_UNIT_W) * RECT_UNIT_W;
+   pages = (w * h) / (RECT_UNIT_W * RECT_UNIT_H);
+   BUG_ON(pages == 0);
+   }
+
+
+   DRM_DEBUG("blit_rectangle: h=%d, w=%d, pages=%d\n", h, w, pages);
+
+   /* return width and height only of the caller wants it */
+   if (height)
+   *height = h;
+   if (width)
+   *width = w;
+
+   return pages;
+}
+
+
+int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages)
 {
int r;
-   int ring_size, line_size;
-   int max_size;
+   int ring_size;
/* loops of emits 64 + fence emit possible */
-   int dwords_per_loop = 76, num_loops;
+   int dwords_per_loop = 76, num_loops = 0;
 
r = r600_vb_ib_get(rdev);
if (r)
@@ -616,18 +666,12 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, 
int size_bytes)
if (rdev->family > CHIP_R600 && rdev->family < CHIP_RV770)

[PATCH 3/9] drm/radeon/kms: demystify evergreen blit code

2011-10-12 Thread Ilija Hadzic

some bits in 3D registers used by blit functions look like
magic and this is hard to follow; change them to a little bit
more meaningful pre-defined constants

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |   29 +--
 drivers/gpu/drm/radeon/evergreend.h |   42 +++
 2 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c 
b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
index 3b24137..68d0de2 100644
--- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c
+++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
@@ -60,7 +60,9 @@ set_render_target(struct radeon_device *rdev, int format,
if (h < 8)
h = 8;
 
-   cb_color_info = ((format << 2) | (1 << 24) | (2 << 8));
+   cb_color_info = CB_FORMAT(format) |
+   CB_SOURCE_FORMAT(CB_SF_EXPORT_NORM) |
+   CB_ARRAY_MODE(ARRAY_1D_TILED_THIN1);
pitch = (w / 8) - 1;
slice = ((w * h) / 64) - 1;
 
@@ -137,12 +139,16 @@ set_vtx_resource(struct radeon_device *rdev, u64 gpu_addr)
u32 sq_vtx_constant_word2, sq_vtx_constant_word3;
 
/* high addr, stride */
-   sq_vtx_constant_word2 = ((upper_32_bits(gpu_addr) & 0xff) | (16 << 8));
+   sq_vtx_constant_word2 = SQ_VTXC_BASE_ADDR_HI(upper_32_bits(gpu_addr) & 
0xff) |
+   SQ_VTXC_STRIDE(16);
 #ifdef __BIG_ENDIAN
-   sq_vtx_constant_word2 |= (2 << 30);
+   sq_vtx_constant_word2 |= SQ_VTXC_ENDIAN_SWAP(SQ_ENDIAN_8IN32);
 #endif
/* xyzw swizzles */
-   sq_vtx_constant_word3 = (0 << 3) | (1 << 6) | (2 << 9) | (3 << 12);
+   sq_vtx_constant_word3 = SQ_VTCX_SEL_X(SQ_SEL_X) |
+   SQ_VTCX_SEL_Y(SQ_SEL_Y) |
+   SQ_VTCX_SEL_Z(SQ_SEL_Z) |
+   SQ_VTCX_SEL_W(SQ_SEL_W);
 
radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 8));
radeon_ring_write(rdev, 0x580);
@@ -153,7 +159,7 @@ set_vtx_resource(struct radeon_device *rdev, u64 gpu_addr)
radeon_ring_write(rdev, 0);
radeon_ring_write(rdev, 0);
radeon_ring_write(rdev, 0);
-   radeon_ring_write(rdev, SQ_TEX_VTX_VALID_BUFFER << 30);
+   radeon_ring_write(rdev, S__SQ_CONSTANT_TYPE(SQ_TEX_VTX_VALID_BUFFER));
 
if ((rdev->family == CHIP_CEDAR) ||
(rdev->family == CHIP_PALM) ||
@@ -180,14 +186,19 @@ set_tex_resource(struct radeon_device *rdev,
if (h < 1)
h = 1;
 
-   sq_tex_resource_word0 = (1 << 0); /* 2D */
+   sq_tex_resource_word0 = TEX_DIM(SQ_TEX_DIM_2D);
sq_tex_resource_word0 |= pitch >> 3) - 1) << 6) |
  ((w - 1) << 18));
-   sq_tex_resource_word1 = ((h - 1) << 0) | (2 << 28);
+   sq_tex_resource_word1 = ((h - 1) << 0) |
+   TEX_ARRAY_MODE(ARRAY_1D_TILED_THIN1);
/* xyzw swizzles */
-   sq_tex_resource_word4 = (0 << 16) | (1 << 19) | (2 << 22) | (3 << 25);
+   sq_tex_resource_word4 = TEX_DST_SEL_X(SQ_SEL_X) |
+   TEX_DST_SEL_Y(SQ_SEL_Y) |
+   TEX_DST_SEL_Z(SQ_SEL_Z) |
+   TEX_DST_SEL_W(SQ_SEL_W);
 
-   sq_tex_resource_word7 = format | (SQ_TEX_VTX_VALID_TEXTURE << 30);
+   sq_tex_resource_word7 = format |
+   S__SQ_CONSTANT_TYPE(SQ_TEX_VTX_VALID_TEXTURE);
 
radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 8));
radeon_ring_write(rdev, 0);
diff --git a/drivers/gpu/drm/radeon/evergreend.h 
b/drivers/gpu/drm/radeon/evergreend.h
index 7363d9d..b937c49 100644
--- a/drivers/gpu/drm/radeon/evergreend.h
+++ b/drivers/gpu/drm/radeon/evergreend.h
@@ -941,11 +941,15 @@
 #defineCB_COLOR0_SLICE 0x28c68
 #defineCB_COLOR0_VIEW  0x28c6c
 #defineCB_COLOR0_INFO  0x28c70
+#  define CB_FORMAT(x) ((x) << 2)
 #   define CB_ARRAY_MODE(x) ((x) << 8)
 #   define ARRAY_LINEAR_GENERAL 0
 #   define ARRAY_LINEAR_ALIGNED 1
 #   define ARRAY_1D_TILED_THIN1 2
 #   define ARRAY_2D_TILED_THIN1 4
+#  define CB_SOURCE_FORMAT(x)  ((x) << 24)
+#  define CB_SF_EXPORT_FULL0
+#  define CB_SF_EXPORT_NORM1
 #defineCB_COLOR0_ATTRIB0x28c74
 #defineCB_COLOR0_DIM   0x28c78
 /* only CB0-7 blocks have these regs */
@@ -1107,15 +,53 @@
 #defineCB_COLOR7_CLEAR_WORD3   0x28e3c
 
 #define SQ_TEX_RESOURCE_WORD0_0

[PATCH 4/9] drm/radeon/kms: demystify r600 blit code

2011-10-12 Thread Ilija Hadzic

some 3d register bits look like magic in r600 blit functions
use predefined constants to make it more intuitive what they are

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r600_blit_kms.c |   30 +-
 drivers/gpu/drm/radeon/r600d.h |   22 ++
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c 
b/drivers/gpu/drm/radeon/r600_blit_kms.c
index d9994c9..71fec92 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -58,7 +58,9 @@ set_render_target(struct radeon_device *rdev, int format,
if (h < 8)
h = 8;
 
-   cb_color_info = ((format << 2) | (1 << 27) | (1 << 8));
+   cb_color_info = CB_FORMAT(format) |
+   CB_SOURCE_FORMAT(CB_SF_EXPORT_NORM) |
+   CB_ARRAY_MODE(ARRAY_1D_TILED_THIN1);
pitch = (w / 8) - 1;
slice = ((w * h) / 64) - 1;
 
@@ -168,9 +170,10 @@ set_vtx_resource(struct radeon_device *rdev, u64 gpu_addr)
 {
u32 sq_vtx_constant_word2;
 
-   sq_vtx_constant_word2 = ((upper_32_bits(gpu_addr) & 0xff) | (16 << 8));
+   sq_vtx_constant_word2 = SQ_VTXC_BASE_ADDR_HI(upper_32_bits(gpu_addr) & 
0xff) |
+   SQ_VTXC_STRIDE(16);
 #ifdef __BIG_ENDIAN
-   sq_vtx_constant_word2 |= (2 << 30);
+   sq_vtx_constant_word2 |=  SQ_VTXC_ENDIAN_SWAP(SQ_ENDIAN_8IN32);
 #endif
 
radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 7));
@@ -206,18 +209,19 @@ set_tex_resource(struct radeon_device *rdev,
if (h < 1)
h = 1;
 
-   sq_tex_resource_word0 = (1 << 0) | (1 << 3);
-   sq_tex_resource_word0 |= pitch >> 3) - 1) << 8) |
- ((w - 1) << 19));
+   sq_tex_resource_word0 = S_038000_DIM(V_038000_SQ_TEX_DIM_2D) |
+   S_038000_TILE_MODE(V_038000_ARRAY_1D_TILED_THIN1);
+   sq_tex_resource_word0 |= S_038000_PITCH((pitch >> 3) - 1) |
+   S_038000_TEX_WIDTH(w - 1);
 
-   sq_tex_resource_word1 = (format << 26);
-   sq_tex_resource_word1 |= ((h - 1) << 0);
+   sq_tex_resource_word1 = S_038004_DATA_FORMAT(format);
+   sq_tex_resource_word1 |= S_038004_TEX_HEIGHT(h - 1);
 
-   sq_tex_resource_word4 = ((1 << 14) |
-(0 << 16) |
-(1 << 19) |
-(2 << 22) |
-(3 << 25));
+   sq_tex_resource_word4 = S_038010_REQUEST_SIZE(1) |
+   S_038010_DST_SEL_X(SQ_SEL_X) |
+   S_038010_DST_SEL_Y(SQ_SEL_Y) |
+   S_038010_DST_SEL_Z(SQ_SEL_Z) |
+   S_038010_DST_SEL_W(SQ_SEL_W);
 
radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 7));
radeon_ring_write(rdev, 0);
diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h
index 0245ae6..bfe1b5d 100644
--- a/drivers/gpu/drm/radeon/r600d.h
+++ b/drivers/gpu/drm/radeon/r600d.h
@@ -79,6 +79,11 @@
 #define CB_COLOR0_SIZE  0x28060
 #define CB_COLOR0_VIEW  0x28080
 #define CB_COLOR0_INFO  0x280a0
+#  define CB_FORMAT(x) ((x) << 2)
+#   define CB_ARRAY_MODE(x) ((x) << 8)
+#  define CB_SOURCE_FORMAT(x)  ((x) << 27)
+#  define CB_SF_EXPORT_FULL0
+#  define CB_SF_EXPORT_NORM1
 #define CB_COLOR0_TILE  0x280c0
 #define CB_COLOR0_FRAG  0x280e0
 #define CB_COLOR0_MASK  0x28100
@@ -417,6 +422,17 @@
 #defineSQ_PGM_START_VS 0x28858
 #define SQ_PGM_RESOURCES_VS 0x28868
 #define SQ_PGM_CF_OFFSET_VS 0x288d0
+
+#define SQ_VTX_CONSTANT_WORD0_00x3
+#define SQ_VTX_CONSTANT_WORD1_00x30004
+#define SQ_VTX_CONSTANT_WORD2_00x30008
+#  define SQ_VTXC_BASE_ADDR_HI(x)  ((x) << 0)
+#  define SQ_VTXC_STRIDE(x)((x) << 8)
+#  define SQ_VTXC_ENDIAN_SWAP(x)   ((x) << 30)
+#  define SQ_ENDIAN_NONE   0
+#  define SQ_ENDIAN_8IN16  1
+#  define SQ_ENDIAN_8IN32  2
+#define SQ_VTX_CONSTANT_WORD3_00x3000c
 #defineSQ_VTX_CONSTANT_WORD6_0 0x38018
 #defineS__SQ_VTX_CONSTANT_TYPE(x)  (((x) & 
3) << 30)
 #define

[PATCH 5/9] drm/radeon/kms: cleanup benchmark code

2011-10-12 Thread Ilija Hadzic

factor out repeated code into functions
fix units in which the throughput is reported (megabytes per second
and megabits per second make sense, others are kind of confusing)
make report more amenable to awk and friends (e.g. whitespace is
always the separator, unit is separated from the number, etc)
add #defines for some hard coded constants

besides "beautification" this reorg is done in preparation
for writing more elaborate benchmarks

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_benchmark.c |  156 -
 1 files changed, 86 insertions(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_benchmark.c 
b/drivers/gpu/drm/radeon/radeon_benchmark.c
index 10191d9..6951426 100644
--- a/drivers/gpu/drm/radeon/radeon_benchmark.c
+++ b/drivers/gpu/drm/radeon/radeon_benchmark.c
@@ -26,21 +26,80 @@
 #include "radeon_reg.h"
 #include "radeon.h"
 
-void radeon_benchmark_move(struct radeon_device *rdev, unsigned bsize,
-  unsigned sdomain, unsigned ddomain)
+#define RADEON_BENCHMARK_COPY_BLIT 1
+#define RADEON_BENCHMARK_COPY_DMA  0
+
+#define RADEON_BENCHMARK_ITERATIONS 1024
+
+static int radeon_benchmark_do_move(struct radeon_device *rdev, unsigned size,
+   uint64_t saddr, uint64_t daddr,
+   int flag, int n)
+{
+   unsigned long start_jiffies;
+   unsigned long end_jiffies;
+   struct radeon_fence *fence = NULL;
+   int i, r;
+
+   start_jiffies = jiffies;
+   for (i = 0; i < n; i++) {
+   r = radeon_fence_create(rdev, &fence);
+   if (r)
+   return r;
+
+   switch (flag) {
+   case RADEON_BENCHMARK_COPY_DMA:
+   r = radeon_copy_dma(rdev, saddr, daddr,
+   size / RADEON_GPU_PAGE_SIZE,
+   fence);
+   break;
+   case RADEON_BENCHMARK_COPY_BLIT:
+   r = radeon_copy_blit(rdev, saddr, daddr,
+size / RADEON_GPU_PAGE_SIZE,
+fence);
+   break;
+   default:
+   DRM_ERROR("Unknown copy method\n");
+   r = -EINVAL;
+   }
+   if (r)
+   goto exit_do_move;
+   r = radeon_fence_wait(fence, false);
+   if (r)
+   goto exit_do_move;
+   radeon_fence_unref(&fence);
+   }
+   end_jiffies = jiffies;
+   r = jiffies_to_msecs(end_jiffies - start_jiffies);
+
+exit_do_move:
+   if (fence)
+   radeon_fence_unref(&fence);
+   return r;
+}
+
+
+static void radeon_benchmark_log_results(int n, unsigned size,
+unsigned int time,
+unsigned sdomain, unsigned ddomain,
+char *kind)
+{
+   unsigned int throughput = (n * (size >> 10)) / time;
+   DRM_INFO("radeon: %s %u bo moves of %u kB from"
+" %d to %d in %u ms, throughput: %u Mb/s or %u MB/s\n",
+kind, n, size >> 10, sdomain, ddomain, time,
+throughput * 8, throughput);
+}
+
+static void radeon_benchmark_move(struct radeon_device *rdev, unsigned size,
+ unsigned sdomain, unsigned ddomain)
 {
struct radeon_bo *dobj = NULL;
struct radeon_bo *sobj = NULL;
-   struct radeon_fence *fence = NULL;
uint64_t saddr, daddr;
-   unsigned long start_jiffies;
-   unsigned long end_jiffies;
-   unsigned long time;
-   unsigned i, n, size;
-   int r;
+   int r, n;
+   unsigned int time;
 
-   size = bsize;
-   n = 1024;
+   n = RADEON_BENCHMARK_ITERATIONS;
r = radeon_bo_create(rdev, size, PAGE_SIZE, true, sdomain, &sobj);
if (r) {
goto out_cleanup;
@@ -68,64 +127,23 @@ void radeon_benchmark_move(struct radeon_device *rdev, 
unsigned bsize,
 
/* r100 doesn't have dma engine so skip the test */
if (rdev->asic->copy_dma) {
-
-   start_jiffies = jiffies;
-   for (i = 0; i < n; i++) {
-   r = radeon_fence_create(rdev, &fence);
-   if (r) {
-   goto out_cleanup;
-   }
-
-   r = radeon_copy_dma(rdev, saddr, daddr,
-   size / RADEON_GPU_PAGE_SIZE, fence);
-
-   if (r) {
-   goto out_cleanup;
-   }
-   r = radeo

[PATCH 6/9] drm/radeon/kms: add more elaborate benchmarks

2011-10-12 Thread Ilija Hadzic

Lots of new (and hopefully useful) benchmark. Load the driver
with radeon_benchmark= and enjoy. Among tests
added are VRAM to VRAM blits and blits with buffer size sweeps.
The latter can be from GTT to VRAM, VRAM to GTT, and VRAM to VRAM
and there are two types of sweeps: powers of two and (probably
more interesting) buffers sizes that correspond to common modes.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon.h   |2 +-
 drivers/gpu/drm/radeon/radeon_benchmark.c |   91 +++--
 drivers/gpu/drm/radeon/radeon_device.c|2 +-
 3 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index ff5424e..5361dd7 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -868,7 +868,7 @@ struct radeon_pm {
 /*
  * Benchmarking
  */
-void radeon_benchmark(struct radeon_device *rdev);
+void radeon_benchmark(struct radeon_device *rdev, int test_number);
 
 
 /*
diff --git a/drivers/gpu/drm/radeon/radeon_benchmark.c 
b/drivers/gpu/drm/radeon/radeon_benchmark.c
index 6951426..5cafc90 100644
--- a/drivers/gpu/drm/radeon/radeon_benchmark.c
+++ b/drivers/gpu/drm/radeon/radeon_benchmark.c
@@ -30,6 +30,7 @@
 #define RADEON_BENCHMARK_COPY_DMA  0
 
 #define RADEON_BENCHMARK_ITERATIONS 1024
+#define RADEON_BENCHMARK_COMMON_MODES_N 17
 
 static int radeon_benchmark_do_move(struct radeon_device *rdev, unsigned size,
uint64_t saddr, uint64_t daddr,
@@ -126,7 +127,9 @@ static void radeon_benchmark_move(struct radeon_device 
*rdev, unsigned size,
}
 
/* r100 doesn't have dma engine so skip the test */
-   if (rdev->asic->copy_dma) {
+   /* also, VRAM-to-VRAM test doesn't make much sense for DMA */
+   /* skip it as well if domains are the same */
+   if ((rdev->asic->copy_dma) && (sdomain != ddomain)) {
time = radeon_benchmark_do_move(rdev, size, saddr, daddr,
RADEON_BENCHMARK_COPY_DMA, n);
if (time < 0)
@@ -167,10 +170,86 @@ out_cleanup:
}
 }
 
-void radeon_benchmark(struct radeon_device *rdev)
+void radeon_benchmark(struct radeon_device *rdev, int test_number)
 {
-   radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_GTT,
- RADEON_GEM_DOMAIN_VRAM);
-   radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_VRAM,
- RADEON_GEM_DOMAIN_GTT);
+   int i;
+   int common_modes[RADEON_BENCHMARK_COMMON_MODES_N] = {
+   640 * 480 * 4,
+   720 * 480 * 4,
+   800 * 600 * 4,
+   848 * 480 * 4,
+   1024 * 768 * 4,
+   1152 * 768 * 4,
+   1280 * 720 * 4,
+   1280 * 800 * 4,
+   1280 * 854 * 4,
+   1280 * 960 * 4,
+   1280 * 1024 * 4,
+   1440 * 900 * 4,
+   1400 * 1050 * 4,
+   1680 * 1050 * 4,
+   1600 * 1200 * 4,
+   1920 * 1080 * 4,
+   1920 * 1200 * 4
+   };
+
+   switch (test_number) {
+   case 1:
+   /* simple test, VRAM to GTT and GTT to VRAM */
+   radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_GTT,
+ RADEON_GEM_DOMAIN_VRAM);
+   radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_VRAM,
+ RADEON_GEM_DOMAIN_GTT);
+   break;
+   case 2:
+   /* simple test, VRAM to VRAM */
+   radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_VRAM,
+ RADEON_GEM_DOMAIN_VRAM);
+   break;
+   case 3:
+   /* GTT to VRAM, buffer size sweep, powers of 2 */
+   for (i = 1; i <= 65536; i <<= 1)
+   radeon_benchmark_move(rdev, i*1024,
+ RADEON_GEM_DOMAIN_GTT,
+ RADEON_GEM_DOMAIN_VRAM);
+   break;
+   case 4:
+   /* VRAM to GTT, buffer size sweep, powers of 2 */
+   for (i = 1; i <= 65536; i <<= 1)
+   radeon_benchmark_move(rdev, i*1024,
+ RADEON_GEM_DOMAIN_VRAM,
+ RADEON_GEM_DOMAIN_GTT);
+   break;
+   case 5:
+   /* VRAM to VRAM, buffer size sweep, powers of 2 */
+   for (i = 1; i <= 65536; i <<= 1)
+   radeon_benchmark_move(rdev, i*1024,
+ RADEON_GEM_DOMAIN_VRAM,
+ RADEON_GEM_DOMAIN_VRAM);
+   break;
+   case 6:
+   /* GTT to VRAM, buff

[PATCH 7/9] drm/radeon/kms: cleanup r600 blit code

2011-10-12 Thread Ilija Hadzic

reorganize the code such that only the primitives (i.e., the functions
that load the CP ring) are hardware specific; dynamically link the
primitives in a (new) pointer structure inside r600_blit at
blit initialization time so that the functions that control the blit
operations can be made common for r600 and evergreen parts

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r600_blit_kms.c |   94 +---
 drivers/gpu/drm/radeon/radeon.h|   21 +++
 2 files changed, 70 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c 
b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 71fec92..07e3df4 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -44,7 +44,6 @@
 
 #define RECT_UNIT_H   32
 #define RECT_UNIT_W   (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H)
-#define MAX_RECT_DIM  8192
 
 /* emits 21 on rv770+, 23 on r600 */
 static void
@@ -491,6 +490,27 @@ int r600_blit_init(struct radeon_device *rdev)
u32 packet2s[16];
int num_packet2s = 0;
 
+   rdev->r600_blit.primitives.set_render_target = set_render_target;
+   rdev->r600_blit.primitives.cp_set_surface_sync = cp_set_surface_sync;
+   rdev->r600_blit.primitives.set_shaders = set_shaders;
+   rdev->r600_blit.primitives.set_vtx_resource = set_vtx_resource;
+   rdev->r600_blit.primitives.set_tex_resource = set_tex_resource;
+   rdev->r600_blit.primitives.set_scissors = set_scissors;
+   rdev->r600_blit.primitives.draw_auto = draw_auto;
+   rdev->r600_blit.primitives.set_default_state = set_default_state;
+
+   rdev->r600_blit.ring_size_common = 40; /* shaders + def state */
+   rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */
+   rdev->r600_blit.ring_size_common += 5; /* done copy */
+   rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */
+
+   rdev->r600_blit.ring_size_per_loop = 76;
+   /* set_render_target emits 2 extra dwords on rv6xx */
+   if (rdev->family > CHIP_R600 && rdev->family < CHIP_RV770)
+   rdev->r600_blit.ring_size_per_loop += 2;
+
+   rdev->r600_blit.max_dim = 8192;
+
/* pin copy shader into vram if already initialized */
if (rdev->r600_blit.shader_obj)
goto done;
@@ -608,9 +628,8 @@ static void r600_vb_ib_put(struct radeon_device *rdev)
radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
 }
 
-/* FIXME: the function is very similar to evergreen_blit_create_rect, except
-   that it different predefined constants; consider commonizing */
-static unsigned r600_blit_create_rect(unsigned num_pages, int *width, int 
*height)
+static unsigned r600_blit_create_rect(unsigned num_pages,
+ int *width, int *height, int max_dim)
 {
unsigned max_pages;
unsigned pages = num_pages;
@@ -628,12 +647,12 @@ static unsigned r600_blit_create_rect(unsigned num_pages, 
int *width, int *heigh
while (num_pages / rect_order) {
h *= 2;
rect_order *= 4;
-   if (h >= MAX_RECT_DIM) {
-   h = MAX_RECT_DIM;
+   if (h >= max_dim) {
+   h = max_dim;
break;
}
}
-   max_pages = (MAX_RECT_DIM * h) / (RECT_UNIT_W * RECT_UNIT_H);
+   max_pages = (max_dim * h) / (RECT_UNIT_W * RECT_UNIT_H);
if (pages > max_pages)
pages = max_pages;
w = (pages * RECT_UNIT_W * RECT_UNIT_H) / h;
@@ -659,36 +678,29 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, 
unsigned num_pages)
 {
int r;
int ring_size;
-   /* loops of emits 64 + fence emit possible */
-   int dwords_per_loop = 76, num_loops = 0;
+   int num_loops = 0;
+   int dwords_per_loop = rdev->r600_blit.ring_size_per_loop;
 
r = r600_vb_ib_get(rdev);
if (r)
return r;
 
-   /* set_render_target emits 2 extra dwords on rv6xx */
-   if (rdev->family > CHIP_R600 && rdev->family < CHIP_RV770)
-   dwords_per_loop += 2;
-
/* num loops */
while (num_pages) {
-   num_pages -= r600_blit_create_rect(num_pages, NULL, NULL);
+   num_pages -= r600_blit_create_rect(num_pages, NULL, NULL,
+  rdev->r600_blit.max_dim);
num_loops++;
}
 
/* calculate number of loops correctly */
ring_size = num_loops * dwords_per_loop;
-   /* set default  + shaders */
-   ring_size += 40; /* shaders + def state */
-   ring_size += 10; /* fence emit for VB IB */
-   rin

[PATCH 8/9] drm/radeon/kms: blit code commoning

2011-10-12 Thread Ilija Hadzic

factor out most of evergreen blit code and use the refactored code
from r600 that is now common for both r600 and evergreen

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/evergreen.c  |   25 +---
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |  260 ++-
 drivers/gpu/drm/radeon/ni.c |4 +-
 drivers/gpu/drm/radeon/radeon_asic.c|   16 +-
 drivers/gpu/drm/radeon/radeon_asic.h|   10 -
 5 files changed, 30 insertions(+), 285 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 5f0ecc7..69dded2 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3087,7 +3087,7 @@ static int evergreen_startup(struct radeon_device *rdev)
 
r = evergreen_blit_init(rdev);
if (r) {
-   evergreen_blit_fini(rdev);
+   r600_blit_fini(rdev);
rdev->asic->copy = NULL;
dev_warn(rdev->dev, "failed blitter (%d) falling back to 
memcpy\n", r);
}
@@ -3172,27 +3172,6 @@ int evergreen_suspend(struct radeon_device *rdev)
return 0;
 }
 
-int evergreen_copy_blit(struct radeon_device *rdev,
-   uint64_t src_offset, uint64_t dst_offset,
-   unsigned num_pages, struct radeon_fence *fence)
-{
-   int r;
-
-   mutex_lock(&rdev->r600_blit.mutex);
-   rdev->r600_blit.vb_ib = NULL;
-   r = evergreen_blit_prepare_copy(rdev, num_pages);
-   if (r) {
-   if (rdev->r600_blit.vb_ib)
-   radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
-   mutex_unlock(&rdev->r600_blit.mutex);
-   return r;
-   }
-   evergreen_kms_blit_copy(rdev, src_offset, dst_offset, num_pages);
-   evergreen_blit_done_copy(rdev, fence);
-   mutex_unlock(&rdev->r600_blit.mutex);
-   return 0;
-}
-
 /* Plan is to move initialization in that function and use
  * helper function so that radeon_device_init pretty much
  * do nothing more than calling asic specific function. This
@@ -3301,7 +3280,7 @@ int evergreen_init(struct radeon_device *rdev)
 
 void evergreen_fini(struct radeon_device *rdev)
 {
-   evergreen_blit_fini(rdev);
+   r600_blit_fini(rdev);
r700_cp_fini(rdev);
r600_irq_fini(rdev);
radeon_wb_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c 
b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
index 68d0de2..dcf11bb 100644
--- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c
+++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
@@ -44,10 +44,6 @@
 #define COLOR_5_6_5   0x8
 #define COLOR_8_8_8_8 0x1a
 
-#define RECT_UNIT_H   32
-#define RECT_UNIT_W   (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H)
-#define MAX_RECT_DIM  16384
-
 /* emits 17 */
 static void
 set_render_target(struct radeon_device *rdev, int format,
@@ -599,31 +595,6 @@ set_default_state(struct radeon_device *rdev)
 
 }
 
-static inline uint32_t i2f(uint32_t input)
-{
-   u32 result, i, exponent, fraction;
-
-   if ((input & 0x3fff) == 0)
-   result = 0; /* 0 is a special case */
-   else {
-   exponent = 140; /* exponent biased by 127; */
-   fraction = (input & 0x3fff) << 10; /* cheat and only
- handle numbers below 
2^^15 */
-   for (i = 0; i < 14; i++) {
-   if (fraction & 0x80)
-   break;
-   else {
-   fraction = fraction << 1; /* keep
-shifting left 
until top bit = 1 */
-   exponent = exponent - 1;
-   }
-   }
-   result = exponent << 23 | (fraction & 0x7f); /* mask
-   off top 
bit; assumed 1 */
-   }
-   return result;
-}
-
 int evergreen_blit_init(struct radeon_device *rdev)
 {
u32 obj_size;
@@ -632,6 +603,24 @@ int evergreen_blit_init(struct radeon_device *rdev)
u32 packet2s[16];
int num_packet2s = 0;
 
+   rdev->r600_blit.primitives.set_render_target = set_render_target;
+   rdev->r600_blit.primitives.cp_set_surface_sync = cp_set_surface_sync;
+   rdev->r600_blit.primitives.set_shaders = set_shaders;
+   rdev->r600_blit.primitives.set_vtx_resource = set_vtx_resource;
+   rdev->r600_blit.primitives.set_tex_resource = set_tex_resource;
+   rdev->r600_blit.primitives.set_scissors = set_scissors;
+   rdev->r600_blit.primitives.draw_auto = draw_auto;
+   rdev->r600_blit.primitives.set_default_state = set_default_state;
+
+   rdev->r600_blit.ring_size_common =

[PATCH 9/9] drm/radeon/kms: rename a variable for consistency

2011-10-12 Thread Ilija Hadzic

blit copy functions deal with GPU pages, not CPU pages,
so rename the variables and parameters accordingly

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r600_blit_kms.c |   27 ++-
 drivers/gpu/drm/radeon/radeon_asic.h   |4 ++--
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c 
b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 07e3df4..a9e1fde 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -628,14 +628,14 @@ static void r600_vb_ib_put(struct radeon_device *rdev)
radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
 }
 
-static unsigned r600_blit_create_rect(unsigned num_pages,
+static unsigned r600_blit_create_rect(unsigned num_gpu_pages,
  int *width, int *height, int max_dim)
 {
unsigned max_pages;
-   unsigned pages = num_pages;
+   unsigned pages = num_gpu_pages;
int w, h;
 
-   if (num_pages == 0) {
+   if (num_gpu_pages == 0) {
/* not supposed to be called with no pages, but just in case */
h = 0;
w = 0;
@@ -644,7 +644,7 @@ static unsigned r600_blit_create_rect(unsigned num_pages,
} else {
int rect_order = 2;
h = RECT_UNIT_H;
-   while (num_pages / rect_order) {
+   while (num_gpu_pages / rect_order) {
h *= 2;
rect_order *= 4;
if (h >= max_dim) {
@@ -674,7 +674,7 @@ static unsigned r600_blit_create_rect(unsigned num_pages,
 }
 
 
-int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages)
+int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages)
 {
int r;
int ring_size;
@@ -686,9 +686,10 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, 
unsigned num_pages)
return r;
 
/* num loops */
-   while (num_pages) {
-   num_pages -= r600_blit_create_rect(num_pages, NULL, NULL,
-  rdev->r600_blit.max_dim);
+   while (num_gpu_pages) {
+   num_gpu_pages -=
+   r600_blit_create_rect(num_gpu_pages, NULL, NULL,
+ rdev->r600_blit.max_dim);
num_loops++;
}
 
@@ -719,21 +720,21 @@ void r600_blit_done_copy(struct radeon_device *rdev, 
struct radeon_fence *fence)
 
 void r600_kms_blit_copy(struct radeon_device *rdev,
u64 src_gpu_addr, u64 dst_gpu_addr,
-   unsigned num_pages)
+   unsigned num_gpu_pages)
 {
u64 vb_gpu_addr;
u32 *vb;
 
DRM_DEBUG("emitting copy %16llx %16llx %d %d\n",
  src_gpu_addr, dst_gpu_addr,
- num_pages, rdev->r600_blit.vb_used);
+ num_gpu_pages, rdev->r600_blit.vb_used);
vb = (u32 *)(rdev->r600_blit.vb_ib->ptr + rdev->r600_blit.vb_used);
 
-   while (num_pages) {
+   while (num_gpu_pages) {
int w, h;
unsigned size_in_bytes;
unsigned pages_per_loop =
-   r600_blit_create_rect(num_pages, &w, &h,
+   r600_blit_create_rect(num_gpu_pages, &w, &h,
  rdev->r600_blit.max_dim);
 
size_in_bytes = pages_per_loop * RADEON_GPU_PAGE_SIZE;
@@ -777,6 +778,6 @@ void r600_kms_blit_copy(struct radeon_device *rdev,
rdev->r600_blit.vb_used += 4*12;
src_gpu_addr += size_in_bytes;
dst_gpu_addr += size_in_bytes;
-   num_pages -= pages_per_loop;
+   num_gpu_pages -= pages_per_loop;
}
 }
diff --git a/drivers/gpu/drm/radeon/radeon_asic.h 
b/drivers/gpu/drm/radeon/radeon_asic.h
index c8be1d3..e040de3 100644
--- a/drivers/gpu/drm/radeon/radeon_asic.h
+++ b/drivers/gpu/drm/radeon/radeon_asic.h
@@ -364,11 +364,11 @@ void r600_hdmi_init(struct drm_encoder *encoder);
 int r600_hdmi_buffer_status_changed(struct drm_encoder *encoder);
 void r600_hdmi_update_audio_settings(struct drm_encoder *encoder);
 /* r600 blit */
-int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages);
+int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages);
 void r600_blit_done_copy(struct radeon_device *rdev, struct radeon_fence 
*fence);
 void r600_kms_blit_copy(struct radeon_device *rdev,
u64 src_gpu_addr, u64 dst_gpu_addr,
-   unsigned num_pages);
+   unsigned num_gpu_pages);
 
 /*
  * rv770,rv730,rv710,rv740
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Reply: Question on S3 on evergreen

2011-10-13 Thread Ilija Hadzic



Frank,

I have found this text particularly useful when it comes to using MIT (or 
BSD) code that resides in the GPL project (e.g. DRM in Linux)


http://www.softwarefreedom.org/resources/2007/gpl-non-gpl-collaboration.html

I think that sections 2.2 and 2.3 are the things to be careful about 
because there is a non-trivial probabilty that at least some out of many 
patches that come in from many authors were created by copying and pasting 
lines of code that may have come from some other GPL'd code (e.g. other 
parts of Linux). Nobody can guarantee that this has not happened.


P.S. The above is my personal opinion, I speak for myself, not for my 
company, I am not a lawyer.


-- Ilija

On Thu, 13 Oct 2011, Huang, FrankR wrote:


Xav, thanks for your reminder. Actually our law leam has already checked the 
license. As Dave said, the DRM kernel driver is all MIT-licensed and we will be 
free to use them. When the drm uses linux kernel function calls, we will use 
freebsd(none-GPL) equivalent to replace.
Dave, by the way, I want to ask you about some exceptions in DRM. you know in some 
files(i.e. drm_fb_helper.c), it includes MODULE_LICENSE("GPL and additional 
rights"). Does it mean it is GPL licensed? Is it free to use this file?

Thanks,
Frank


-Original Message-
From: Xavier Bestel [mailto:xavier.bes...@free.fr]
Sent: 2011-10-13 (??) 20:07
To: Dave Airlie
Cc: Huang, FrankR; dri-devel@lists.freedesktop.org
Subject: Re: Question on S3 on evergreen

On Thu, 2011-10-13 at 13:04 +0100, Dave Airlie wrote:

On Thu, Oct 13, 2011 at 12:56 PM, Xavier Bestel  wrote:

Hi,

On Thu, 2011-10-13 at 17:54 +0800, Huang, FrankR wrote:

[...] I have ported radeon_suspend_kms() and radeon_resume_kms()
functions from linux to CE.


I imagine you already have checked with your company's lawyers, but if I
understand correctly that means your drivers will be distributed under
the GPL ?


All the GPU driver code is licensed under MIT.


Oh, I thought "linux" meant "kernel", not "X11".

Xav



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: drm/radeon/kms: improve performance of blit-copy

2011-10-13 Thread Ilija Hadzic



Dave,

Alex pointed to me that the patches I sent last night under this thread 
may conflict with 003cefe0c238e683a29d2207dba945b508cd45b7 that currently 
resides on drm-fixes branch (my patches are based on drm-next or 
drm-core-next).


I'd like to make sure that the eventual merge goes smoothly:

If you merge drm-fixes before my patches, then I'll rebase my patches and 
resend them after that happens and make sure everything is resolved 
correctly.


If you merge my patches first and then follow with drm-fixes merge, two 
things should happen with 003cefe0c238e683a29d2207dba945b508cd45b7. Hunks 
related to evergreen.c file will fall out but that's expected and OK 
because my patches consolidate the blit code for r600 and evergreen into a 
common one. Then in r600.c, the hunks related to r600_blit_prepare_copy
and r600_kms_blit_copy function calls will show conflicts, which should be 
resolved such that the size argument is num_gpu_pages, not

num_gpu_pages * RADEON_GPU_PAGE_SIZE (this is because the new blit code
takes size argument in pages, not bytes). Everything else will merge 
smoothly.


For reference, pasted below is a patch that resulted after I cherry-picked 
003cefe0c238e683a29d2207dba945b508cd45b7 into drm-next augmented with my 
blit-improvement patches and resolved the conflicts correctly.


I guess the first option is less work for you (and I will be glad to 
rebase my patches if need be), but I hope that the info here is good 
enough to make the second path as easy as it can be


thanks,

Ilija


From b12516c003cb35059f16ace774ef5a21170d6d78 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Fri, 16 Sep 2011 12:04:08 -0400
Subject: [PATCH 11/14] drm/radeon/kms: Make GPU/CPU page size handling
 consistent in blit code (v3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The BO blit code inconsistenly handled the page size.  This wasn't
an issue on system with 4k pages since the GPU's page size is 4k as
well.  Switch the driver blit callbacks to take num pages in GPU
page units.

Fixes lemote mipsel systems using AMD rs780/rs880 chipsets.

v2: incorporate suggestions from Michel.

Signed-off-by: Alex Deucher 
Reviewed-by: Michel D??nzer 
Cc: sta...@kernel.org
Signed-off-by: Dave Airlie 

v3: reconcile with changes due to blit-copy improvements on drm-next
branch

substitutes the v2 patch that currently resides on drm-fixes
branch

Conflicts:

drivers/gpu/drm/radeon/evergreen.c
drivers/gpu/drm/radeon/r600.c
drivers/gpu/drm/radeon/radeon_asic.h

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/r100.c|   12 ++--
 drivers/gpu/drm/radeon/r200.c|4 ++--
 drivers/gpu/drm/radeon/r600.c|   10 ++
 drivers/gpu/drm/radeon/radeon.h  |7 ---
 drivers/gpu/drm/radeon/radeon_asic.h |6 +++---
 drivers/gpu/drm/radeon/radeon_ttm.c  |7 ++-
 6 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 5985cb0..df60803 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -724,11 +724,11 @@ void r100_fence_ring_emit(struct radeon_device *rdev,
 int r100_copy_blit(struct radeon_device *rdev,
   uint64_t src_offset,
   uint64_t dst_offset,
-  unsigned num_pages,
+  unsigned num_gpu_pages,
   struct radeon_fence *fence)
 {
uint32_t cur_pages;
-   uint32_t stride_bytes = PAGE_SIZE;
+   uint32_t stride_bytes = RADEON_GPU_PAGE_SIZE;
uint32_t pitch;
uint32_t stride_pixels;
unsigned ndw;
@@ -740,7 +740,7 @@ int r100_copy_blit(struct radeon_device *rdev,
/* radeon pitch is /64 */
pitch = stride_bytes / 64;
stride_pixels = stride_bytes / 4;
-   num_loops = DIV_ROUND_UP(num_pages, 8191);
+   num_loops = DIV_ROUND_UP(num_gpu_pages, 8191);

/* Ask for enough room for blit + flush + fence */
ndw = 64 + (10 * num_loops);
@@ -749,12 +749,12 @@ int r100_copy_blit(struct radeon_device *rdev,
DRM_ERROR("radeon: moving bo (%d) asking for %u dw.\n", r, ndw);
return -EINVAL;
}
-   while (num_pages > 0) {
-   cur_pages = num_pages;
+   while (num_gpu_pages > 0) {
+   cur_pages = num_gpu_pages;
if (cur_pages > 8191) {
cur_pages = 8191;
}
-   num_pages -= cur_pages;
+   num_gpu_pages -= cur_pages;

/* pages are in Y direction - height
   page width in X direction - width */
diff --git a/drivers/gpu/drm/radeon/r200.c b/drivers/gpu/drm/radeon/r200.c
index f240583..a1f3ba0 100644
--- a/drivers/gpu/drm/radeon/r200.c
+++ b/drivers/gpu/drm/radeon/r200.c
@@ -84,7 +84,7 @@ static int r200_get_vtx_size_0(u

Re: drm/radeon/kms: improve performance of blit-copy

2011-10-13 Thread Ilija Hadzic




On Thu, 13 Oct 2011, Roland Scheidegger wrote:


I guess it isn't possible to temporarily disable some RBEs or otherwise
reconfigure the chip that you could get the same performance for the
high-end chips?


According to the conversation I had with Alex, this *is* possible but 
requires the pipeline and cache flush. So it is unclear what the overall 
gain will be given the flush penalty.


Also, this phenomena occurs only when GTT is involved in the copy. 
VRAM-to-VRAM copy in which there is no host memory involved (for which I 
added a benchmark, but didn't report in my note yesterday), high-end 
devices are beating low-end ones big time  they better be ;-)


So if we can get RBE-reduction to work, it should be turned on only when 
one of the BOs is in GTT domain. I looked at what it would take to do 
this, and it's doable, but requires hacks at many places.


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] drm/radeon/kms: make r600-NI blit suspend code common

2011-10-14 Thread Ilija Hadzic




On Fri, 14 Oct 2011 alexdeuc...@gmail.com wrote:


From: Alex Deucher 

r600-NI shared the same blit suspend code.  Clean it up
and make it a shared function.

Signed-off-by: Alex Deucher 
Cc: Ilija Hadzic 


Thanks, this one slipped my eye in my cleanup. Correctness should be 
obvious, so as much as my review is worth

Reviewed-by: Ilija Hadzic 



---
drivers/gpu/drm/radeon/evergreen.c |   10 +-
drivers/gpu/drm/radeon/ni.c|   10 +-
drivers/gpu/drm/radeon/r600.c  |   26 --
drivers/gpu/drm/radeon/radeon.h|2 ++
drivers/gpu/drm/radeon/rv770.c |   12 ++--
5 files changed, 22 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 80c4ee3..5c515f2 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3107,21 +3107,13 @@ int evergreen_resume(struct radeon_device *rdev)

int evergreen_suspend(struct radeon_device *rdev)
{
-   int r;
-
/* FIXME: we should wait for ring to be empty */
r700_cp_stop(rdev);
rdev->cp.ready = false;
evergreen_irq_suspend(rdev);
radeon_wb_disable(rdev);
evergreen_pcie_gart_disable(rdev);
-
-   /* unpin shaders bo */
-   r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
-   if (likely(r == 0)) {
-   radeon_bo_unpin(rdev->r600_blit.shader_obj);
-   radeon_bo_unreserve(rdev->r600_blit.shader_obj);
-   }
+   r600_blit_suspend(rdev);

return 0;
}
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index cc0e911..e560da5 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1423,21 +1423,13 @@ int cayman_resume(struct radeon_device *rdev)

int cayman_suspend(struct radeon_device *rdev)
{
-   int r;
-
/* FIXME: we should wait for ring to be empty */
cayman_cp_enable(rdev, false);
rdev->cp.ready = false;
evergreen_irq_suspend(rdev);
radeon_wb_disable(rdev);
cayman_pcie_gart_disable(rdev);
-
-   /* unpin shaders bo */
-   r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
-   if (likely(r == 0)) {
-   radeon_bo_unpin(rdev->r600_blit.shader_obj);
-   radeon_bo_unreserve(rdev->r600_blit.shader_obj);
-   }
+   r600_blit_suspend(rdev);

return 0;
}
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index c2f0dbe..f7f693a 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2375,6 +2375,20 @@ int r600_copy_blit(struct radeon_device *rdev,
return 0;
}

+void r600_blit_suspend(struct radeon_device *rdev)
+{
+   int r;
+
+   /* unpin shaders bo */
+   if (rdev->r600_blit.shader_obj) {
+   r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
+   if (!r) {
+   radeon_bo_unpin(rdev->r600_blit.shader_obj);
+   radeon_bo_unreserve(rdev->r600_blit.shader_obj);
+   }
+   }
+}
+
int r600_set_surface_reg(struct radeon_device *rdev, int reg,
 uint32_t tiling_flags, uint32_t pitch,
 uint32_t offset, uint32_t obj_size)
@@ -2494,8 +2508,6 @@ int r600_resume(struct radeon_device *rdev)

int r600_suspend(struct radeon_device *rdev)
{
-   int r;
-
r600_audio_fini(rdev);
/* FIXME: we should wait for ring to be empty */
r600_cp_stop(rdev);
@@ -2503,14 +2515,8 @@ int r600_suspend(struct radeon_device *rdev)
r600_irq_suspend(rdev);
radeon_wb_disable(rdev);
r600_pcie_gart_disable(rdev);
-   /* unpin shaders bo */
-   if (rdev->r600_blit.shader_obj) {
-   r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
-   if (!r) {
-   radeon_bo_unpin(rdev->r600_blit.shader_obj);
-   radeon_bo_unreserve(rdev->r600_blit.shader_obj);
-   }
-   }
+   r600_blit_suspend(rdev);
+
return 0;
}

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 7b0cb74..93768f5 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -555,6 +555,8 @@ struct r600_blit {
struct radeon_ib *vb_ib;
};

+void r600_blit_suspend(struct radeon_device *rdev);
+
int radeon_ib_get(struct radeon_device *rdev, struct radeon_ib **ib);
void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib);
int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib);
diff --git a/drivers/gpu/drm/radeon/rv770.c b/drivers/gpu/drm/radeon/rv770.c
index b13c2ee..4901837 100644
--- a/drivers/gpu/drm/radeon/rv770.c
+++ b/drivers/gpu/drm/radeon/rv770.c
@@ -1184,8 +1184,6 @@ int rv770_resume(struct radeon_device *rdev)

int rv770_sus

drm_fb_helper question

2011-10-18 Thread Ilija Hadzic



While hacking something else, I stumbled upon two potential issues
in drm_fb_helper. Could someone who knows this better than me enlighten 
whether the problem is realistic or whether I am overseeing something ?
If the problem is real (or a potential trouble waiting to happen), I have 
cut some patches that address this and I can send them.


First, in  drm_fb_helper_init, the size of crtc_info allocated is for 
crtc_count entries, the loop that populates crtc_info at the end of the 
_init function scans the whole list of connectors. So if GPU specifies 
crtc_count that is smaller than the total number of CRTCs, the loop will 
overrun the allocated space and corrupt memory. Although no GPU driver in 
current tree does that, I could imagine a GPU using the smaller number for 
crtc_count if it does not want some CRTCs used for fbcon.


Second (similar in nature) problem is the loop at the end of 
drm_setup_crtcs function that populates modeset->connectors array.
The size of that array is fb_helper->max_conn_count (which is specified by 
the GPU driver and can be anything), while the the loop runs for 
fb_helper->connector_count iterations which can include all available 
connectors (per drm_fb_helper_single_add_all_connectors). So at least in 
theory that loop can overrun the modset->connectors array and corrupt 
memory too.


Both cases may be theoretical at this time, but they look like a trouble 
waiting to happen (e.g., future GPUs may have a mix of CRTCs and 
connectors that could evoke the described error). If those who know 
better, agree with my analysis, I'll be glad to send some patchwork.


thanks,

Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] DRM: bug: RADEON_DEBUGFS_MAX_{NUM_FILES => COMPONENTS}

2011-10-24 Thread Ilija Hadzic



Maybe you are looking at the wrong branch, but I see it in drm-next (it 
has been there since Oct 10)


http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-next&id=c245cb9e15055ed5dcf7eaf29232badb0059fdc1

On Mon, 24 Oct 2011, Michael Witten wrote:


On Fri, Oct 7, 2011 at 19:20, Michael Witten  wrote:
Date: Fri, 16 Sep 2011 20:45:30 +

The value of RADEON_DEBUGFS_MAX_NUM_FILES has been used to
specify the size of an array, each element of which looks
like this:

 struct radeon_debugfs {
         struct drm_info_list    *files;
         unsigned                num_files;
 };

Consequently, the number of debugfs files may be much greater
than RADEON_DEBUGFS_MAX_NUM_FILES, something that the current
code ignores:

 if ((_radeon_debugfs_count + nfiles) > RADEON_DEBUGFS_MAX_NUM_FILES) {
         DRM_ERROR("Reached maximum number of debugfs files.\n");
         DRM_ERROR("Report so we increase RADEON_DEBUGFS_MAX_NUM_FILES.\n");
         return -EINVAL;
 }

This commit fixes this mistake, and accordingly renames:

 RADEON_DEBUGFS_MAX_NUM_FILES

to:

 RADEON_DEBUGFS_MAX_COMPONENTS

Signed-off-by: Michael Witten 
---
 drivers/gpu/drm/radeon/radeon.h        |    2 +-
 drivers/gpu/drm/radeon/radeon_device.c |   13 -
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index c1e056b..dd7bab9 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -102,7 +102,7 @@ extern int radeon_pcie_gen2;
 #define RADEON_FENCE_JIFFIES_TIMEOUT   (HZ / 2)
 /* RADEON_IB_POOL_SIZE must be a power of 2 */
 #define RADEON_IB_POOL_SIZE            16
-#define RADEON_DEBUGFS_MAX_NUM_FILES   32
+#define RADEON_DEBUGFS_MAX_COMPONENTS  32
 #define RADEONFB_CONN_LIMIT            4
 #define RADEON_BIOS_NUM_SCRATCH                8

diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index b51e157..31b1f4b 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -981,7 +981,7 @@ struct radeon_debugfs {
       struct drm_info_list    *files;
       unsigned                num_files;
 };
-static struct radeon_debugfs _radeon_debugfs[RADEON_DEBUGFS_MAX_NUM_FILES];
+static struct radeon_debugfs _radeon_debugfs[RADEON_DEBUGFS_MAX_COMPONENTS];
 static unsigned _radeon_debugfs_count = 0;

 int radeon_debugfs_add_files(struct radeon_device *rdev,
@@ -996,14 +996,17 @@ int radeon_debugfs_add_files(struct radeon_device *rdev,
                       return 0;
               }
       }
-       if ((_radeon_debugfs_count + nfiles) > RADEON_DEBUGFS_MAX_NUM_FILES) {
-               DRM_ERROR("Reached maximum number of debugfs files.\n");
-               DRM_ERROR("Report so we increase 
RADEON_DEBUGFS_MAX_NUM_FILES.\n");
+
+       i = _radeon_debugfs_count + 1;
+       if (i > RADEON_DEBUGFS_MAX_COMPONENTS) {
+               DRM_ERROR("Reached maximum number of debugfs components.\n");
+               DRM_ERROR("Report so we increase "
+                         "RADEON_DEBUGFS_MAX_COMPONENTS.\n");
               return -EINVAL;
       }
       _radeon_debugfs[_radeon_debugfs_count].files = files;
       _radeon_debugfs[_radeon_debugfs_count].num_files = nfiles;
-       _radeon_debugfs_count++;
+       _radeon_debugfs_count = i;
 #if defined(CONFIG_DEBUG_FS)
       drm_debugfs_create_files(files, nfiles,
                                rdev->ddev->control->debugfs_root,
--
1.7.6.409.ge7a85




This patch has not yet been applied. What's wrong?

Sincerely,
Michael Witten
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

drm/fb_helper: prevent some troubles waiting to happen

2011-10-25 Thread Ilija Hadzic

The following two patches address potential problems that I 
called "troubles waiting to happen" in this note
http://lists.freedesktop.org/archives/dri-devel/2011-October/015412.html

I didn't hear anyone take on my question, so I figured I would just send
the patches. I tested the patches on a variety of AMD cards that I have.
I don't have other GPUs handy.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 1/2] drm/fb_helper: make sure crtc_count is consistent

2011-10-25 Thread Ilija Hadzic

stop adding crtcs from dev->mode_config.crtc_list
to crtc_info array if gpu driver specifies (by mistake
or with a reason) fewer crtcs in crtc_count parameter

also, correct crtc_count value if gpu driver
specifies too many crtcs

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_fb_helper.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index f7c6854..feac888 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -466,10 +466,18 @@ int drm_fb_helper_init(struct drm_device *dev,
 
i = 0;
list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
+   if (i >= crtc_count) {
+   DRM_DEBUG("crtc count set by the gpu reached\n");
+   break;
+   }
fb_helper->crtc_info[i].crtc_id = crtc->base.id;
fb_helper->crtc_info[i].mode_set.crtc = crtc;
i++;
}
+   if (i < fb_helper->crtc_count) {
+   DRM_DEBUG("crtc count known by the drm reached\n");
+   fb_helper->crtc_count = i;
+   }
fb_helper->conn_limit = max_conn_count;
return 0;
 out_free:
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 2/2] drm/fb_helper: honor the limit on number of connectors per crtc

2011-10-25 Thread Ilija Hadzic

gpu driver can specify the limit on the number of connectors
that a given crtc can use. Add a check to make sure this limit
is honored when building a list of connectors associated
with a crtc.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_fb_helper.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index feac888..19e28e9 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -1333,6 +1333,11 @@ static void drm_setup_crtcs(struct drm_fb_helper 
*fb_helper)
modeset = &fb_crtc->mode_set;
 
if (mode && fb_crtc) {
+   if (modeset->num_connectors >= fb_helper->conn_limit) {
+   DRM_DEBUG("max number of connectors reached for 
crtc %d\n",
+ fb_crtc->crtc_id);
+   break;
+   }
DRM_DEBUG_KMS("desired mode %s set on crtc %d\n",
  mode->name, 
fb_crtc->mode_set.crtc->base.id);
fb_crtc->desired_mode = mode;
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

drm: fix one flawed mutex grab and remove some spurious mutex grabs

2011-10-25 Thread Ilija Hadzic

The following three patches remove unecessary locks around ioctls
in drm module.

First two:

[PATCH 1/3] drm: no need to hold global mutex for static data
[PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex

are rather trivial and straight forward and probably do not
need much explanation. 

The third one:

[PATCH 3/3] drm: do not sleep on vblank while holding a mutex

is more serious and clears a clog that can occur if multiple
processes call drm_wait_vblank as explained in patch commit
message.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 1/3] drm: no need to hold global mutex for static data

2011-10-25 Thread Ilija Hadzic

drm_getcap and drm_version ioctls only reads static data,
there is no need to protect them with drm_global_mutex,
so make them DRM_UNLOCKED

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_drv.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 7a87e08..e159f17 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -60,14 +60,14 @@ static int drm_version(struct drm_device *dev, void *data,
 
 /** Ioctl table */
 static struct drm_ioctl_desc drm_ioctls[] = {
-   DRM_IOCTL_DEF(DRM_IOCTL_VERSION, drm_version, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_VERSION, drm_version, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0),
DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, 
DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER),
 
DRM_IOCTL_DEF(DRM_IOCTL_SET_UNIQUE, drm_setunique, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex

2011-10-25 Thread Ilija Hadzic

drm_getmap, drm_getclient and drm_getstats are all
protected with their own mutex (dev->struct_mutex)
no need to hold global mutex; make them DRM_UNLOCKED

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_drv.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index e159f17..dbabcb0 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -64,9 +64,9 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0),
DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, 
DRM_MASTER|DRM_ROOT_ONLY),
-   DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, DRM_UNLOCKED),
+   DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, DRM_UNLOCKED),
+   DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER),
 
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 3/3] drm: do not sleep on vblank while holding a mutex

2011-10-25 Thread Ilija Hadzic

holding the drm_global_mutex in drm_wait_vblank and then
going to sleep (via DRM_WAIT_ON) is a bad idea in general
because it can block other processes that may issue ioctls
that also grab drm_global_mutex. Blocking can occur until
next vblank which is in the tens of microseconds order of
magnitude.

fix it, but making drm_wait_vblank DRM_UNLOCKED call and
then grabbing the mutex inside the drm_wait_vblank function
and only for the duration of the critical section (i.e.,
from first manipulation of vblank sequence number until
putting the task in the wait queue).

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_drv.c  |2 +-
 drivers/gpu/drm/drm_irq.c  |   16 +++-
 include/drm/drm_os_linux.h |   25 +
 3 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index dbabcb0..dc0eb0b 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 
-   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED),
 
DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0),
 
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 3830e9e..d85d604 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1191,6 +1191,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
DRM_DEBUG("failed to acquire vblank counter, %d\n", ret);
return ret;
}
+   mutex_lock(&drm_global_mutex);
seq = drm_vblank_count(dev, crtc);
 
switch (vblwait->request.type & _DRM_VBLANK_TYPES_MASK) {
@@ -1200,12 +1201,15 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
case _DRM_VBLANK_ABSOLUTE:
break;
default:
+   mutex_unlock(&drm_global_mutex);
ret = -EINVAL;
goto done;
}
 
-   if (flags & _DRM_VBLANK_EVENT)
+   if (flags & _DRM_VBLANK_EVENT) {
+   mutex_unlock(&drm_global_mutex);
return drm_queue_vblank_event(dev, crtc, vblwait, file_priv);
+   }
 
if ((flags & _DRM_VBLANK_NEXTONMISS) &&
(seq - vblwait->request.sequence) <= (1<<23)) {
@@ -1215,10 +1219,12 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
  vblwait->request.sequence, crtc);
dev->last_vblank_wait[crtc] = vblwait->request.sequence;
-   DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
-   (((drm_vblank_count(dev, crtc) -
-  vblwait->request.sequence) <= (1 << 23)) ||
-!dev->irq_enabled));
+   /* DRM_WAIT_ON_LOCKED will release drm_global_mutex */
+   /* before going to sleep */
+   DRM_WAIT_ON_LOCKED(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
+  (((drm_vblank_count(dev, crtc) -
+ vblwait->request.sequence) <= (1 << 23)) ||
+   !dev->irq_enabled));
 
if (ret != -EINTR) {
struct timeval now;
diff --git a/include/drm/drm_os_linux.h b/include/drm/drm_os_linux.h
index 3933691..fc6aaf4 100644
--- a/include/drm/drm_os_linux.h
+++ b/include/drm/drm_os_linux.h
@@ -123,5 +123,30 @@ do {   
\
remove_wait_queue(&(queue), &entry);\
 } while (0)
 
+#define DRM_WAIT_ON_LOCKED( ret, queue, timeout, condition )   \
+do {   \
+   DECLARE_WAITQUEUE(entry, current);  \
+   unsigned long end = jiffies + (timeout);\
+   add_wait_queue(&(queue), &entry);   \
+   mutex_unlock(&drm_global_mutex);\
+   \
+   for (;;) {  \
+   __set_current_state(TASK_INTERRUPTIBLE);\
+   if (condition)  \
+   break;  \
+   if (time_after_eq(jiffies, end)) {  \
+   ret = -EBUSY;   \
+   break;  \
+   }   \
+   schedule_timeout((HZ/100 > 1) ? HZ/100 : 1);\
+   if (si

drm/radeon/kms: a few nits

2011-10-25 Thread Ilija Hadzic

The following three patches address various minor nits. They are all safe
and I have been running with them for several months on a wide variety
of AMD GPUs

The first patch:
[PATCH 1/3] drm/radeon/kms: use crtc-specific dpms functions in
does not change the functionality and affects the function that is used
only by r100 ASICs, but make the code more correct (strictly speaking).

The second patch:
[PATCH 2/3] drm/radeon/kms: fix the crtc number check
makes the crtc number check consistent with the way it is done
in the rest of the driver code without affecting the functionality.
It was actually sent a few months ago and reviewed, but apparently
forgotten (cf. 
http://lists.freedesktop.org/archives/dri-devel/2011-May/010935.html)

The third patch:
[PATCH 3/3] drm/radeon/kms: use num_crtc instead of hard-coded value
Prevents a possible trouble waiting to happen if AMD ever makes a GPU with
more than 6 CRTCs. I don't know what's cooking in their kitchen but
if such a GPU ever comes out, this will prevent at least one regression ;-)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 1/3] drm/radeon/kms: use crtc-specific dpms functions in prepare and commit

2011-10-25 Thread Ilija Hadzic

it's better that radeon_crtc_commit and radeon_crtc_prepare call
crtc-specific dpms functions instead of hard-coding them to
radeon_crtc_dpms.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_legacy_crtc.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c 
b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
index 41a5d48..0690a5b 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
@@ -1036,8 +1036,11 @@ static void radeon_crtc_prepare(struct drm_crtc *crtc)
* The hardware wedges sometimes if you reconfigure one CRTC
* whilst another is running (see fdo bug #24611).
*/
-   list_for_each_entry(crtci, &dev->mode_config.crtc_list, head)
-   radeon_crtc_dpms(crtci, DRM_MODE_DPMS_OFF);
+   list_for_each_entry(crtci, &dev->mode_config.crtc_list, head) {
+   struct drm_crtc_helper_funcs *crtc_funcs = 
crtci->helper_private;
+   if  (crtc_funcs->dpms)
+   crtc_funcs->dpms(crtci, DRM_MODE_DPMS_OFF);
+   }
 }
 
 static void radeon_crtc_commit(struct drm_crtc *crtc)
@@ -1049,8 +1052,11 @@ static void radeon_crtc_commit(struct drm_crtc *crtc)
* Reenable the CRTCs that should be running.
*/
list_for_each_entry(crtci, &dev->mode_config.crtc_list, head) {
-   if (crtci->enabled)
-   radeon_crtc_dpms(crtci, DRM_MODE_DPMS_ON);
+   if (crtci->enabled) {
+   struct drm_crtc_helper_funcs *crtc_funcs = 
crtci->helper_private;
+   if  (crtc_funcs->dpms)
+   crtc_funcs->dpms(crtci, DRM_MODE_DPMS_ON);
+   }
}
 }
 
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 3/3] drm/radeon/kms: use num_crtc instead of hard-coded value 6

2011-10-25 Thread Ilija Hadzic

radeon_driver_irq_preinstall_kms and radeon_driver_irq_uninstall_kms
hard code the loop to 6 which happens to be the current maximum
number of crtcs; if one day an ASIC with more crtcs comes out, this
is a trouble waiting to happen. it's better to use num_crtc instead
(for ASICs that have fewer than 6 CRTCs, this is still OK because
higher numbers won't be looked at)

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_irq_kms.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c 
b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 9ec830c..a0f9d24 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -69,7 +69,7 @@ void radeon_driver_irq_preinstall_kms(struct drm_device *dev)
rdev->irq.gui_idle = false;
for (i = 0; i < rdev->num_crtc; i++)
rdev->irq.crtc_vblank_int[i] = false;
-   for (i = 0; i < 6; i++) {
+   for (i = 0; i < rdev->num_crtc; i++) {
rdev->irq.hpd[i] = false;
rdev->irq.pflip[i] = false;
}
@@ -101,7 +101,7 @@ void radeon_driver_irq_uninstall_kms(struct drm_device *dev)
rdev->irq.gui_idle = false;
for (i = 0; i < rdev->num_crtc; i++)
rdev->irq.crtc_vblank_int[i] = false;
-   for (i = 0; i < 6; i++) {
+   for (i = 0; i < rdev->num_crtc; i++) {
rdev->irq.hpd[i] = false;
rdev->irq.pflip[i] = false;
}
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 2/3] drm/radeon/kms: fix the crtc number check

2011-10-25 Thread Ilija Hadzic

the crtc check in radeon_get_vblank_timestamp_kms should be against
the num_crtc field in radeon_device not against num_crtcs in drm_device

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon_kms.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index a749c26..540a9ee 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -347,7 +347,7 @@ int radeon_get_vblank_timestamp_kms(struct drm_device *dev, 
int crtc,
struct drm_crtc *drmcrtc;
struct radeon_device *rdev = dev->dev_private;
 
-   if (crtc < 0 || crtc >= dev->num_crtcs) {
+   if (crtc < 0 || crtc >= rdev->num_crtc) {
DRM_ERROR("Invalid crtc %d\n", crtc);
return -EINVAL;
}
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex

2011-10-26 Thread Ilija Hadzic




On Wed, 26 Oct 2011, Daniel Vetter wrote:



Just to check before I dig into reviewing this: Have you check all the
other users of these data structure that these functions touch whether
they don't accidentally rely on the global lock being taken?


I did and thought it was safe. I re-checked this morning prompted by 
your note and of course there is one (easily fixable) thing that I missed.

Here is the full "report":

drm_getclient is safe: the only thing that should be protected there is 
dev->filelist and that is all protected in other functions using 
struct_mutex.


drm_getstats is safe: actually I think there is no need for any mutex there
because the loop runs through quasi-static (set at load time only) data, 
and the actual count access is done with atomic_read()


drm_getmap has one problem which I can fix (and it's the hardest to 
follow): the structure that should be protected there is the dev->maplist. 
There are three functions that may potentially be an issue:


   drm_getsarea doesn't grab any lock before touching dev->maplist (so
   global mutex won't help here either). However, drivers seem to call
   it only at initialization time so it probably doesn't matter

   drm_master_destroy doesn't grab any lock either, but it is called
   from drm_master_put which is protected with dev->struct_mutex
   everywhere else in drm module. I also see a couple of calls
   to drm_master_destroy from vmwgfx driver that look unlocked
   to me, but drm_global_mutex won't help here either

   drm_getsareactx uses struct_mutex, but it apparently releases it
   too early (that's the problem I missed initially). If it's released
   after it's done with dev->maplist, we should be fine.

I'll redo this patch with a fix to drm_getsareactx and that should do it.
I would still appreciate another pair of eyes to make sure I didn't miss 
anything else


thanks,

-- Ilija






___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: drm/fb_helper: prevent some troubles waiting to happen

2011-10-26 Thread Ilija Hadzic




On Wed, 26 Oct 2011, Daniel Vetter wrote:



I've quickly checked current callsites of drm_fb_helper_init and I think
you can just kill the two arguments crtc_count and max_conn_count.


It is a usable functionality (i.e. allows the driver to select which 
connectors or CRTCs is fbcon allowed to bind to) and I see value in it.
I am actually doing some work (not ready for public release yet, but will 
be in some relatively near future) that makes the use of this feature.


If you (and the rest of the community) would prefer to see the use case 
first before merging these two patches, I am perfectly fine with waiting 
(I'll resend later, after I show the use case), but I would not like to 
see the functionality killed only because drivers don't use it at present.


thanks,

Ilija


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 1/3] drm/radeon/kms: use crtc-specific dpms functions in prepare and commit

2011-10-26 Thread Ilija Hadzic




On Wed, 26 Oct 2011, Michel [ISO-8859-1] D�nzer wrote:


On Die, 2011-10-25 at 22:40 -0400, Ilija Hadzic wrote:

it's better that radeon_crtc_commit and radeon_crtc_prepare call
crtc-specific dpms functions instead of hard-coding them to
radeon_crtc_dpms.


Is it really better? If it's always radeon_crtc_dpms anyway, this
obfuscates that fact (and introduces a guaranteed branch prediction
miss, but that probably doesn't matter here).



My reasoning for believing that it's better was that the functions are 
fairly generic: they loops through all CRTCs and disable/enable them all.


Although at this time these functions are only used as helpers for legacy 
crtcs (so indeed the pointer always resolves to radeon_crtc_dpms), if they 
are ever called through some other path (in the context of atombios CRTC), 
it doesn't hurt to make them ready for that.


Having said that, I don't really mind whatever the destiny of this patch 
ends up being.


(and yes, branch prediction slot penalty really don't matter here; it's a 
slow path anyway)


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH v2] drm/radeon/kms: use defined constants for crtc/hpd count instead of hard-coded value 6

2011-10-26 Thread Ilija Hadzic

radeon_driver_irq_preinstall_kms and radeon_driver_irq_uninstall_kms
hard code the loop to 6 which happens to be the current maximum
number of crtcs and hpd pins; if one day an ASIC with more crtcs
(or hpd pins) comes out, this is a trouble waiting to happen.

introduce constants for maximum CRTC count, maximum HPD pins count
and maximum HDMI blocks count (per FIXME in radeon_irq structure)
and correct the loops in radeon_driver_irq_preinstall_kms and
radeon_driver_irq_uninstall_kms

v2: take care of goofs pointed out by Alex Deucher

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/radeon/radeon.h |   19 ++-
 drivers/gpu/drm/radeon/radeon_irq_kms.c |   12 ++--
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 156b8b7..c6841fd 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -437,25 +437,26 @@ union radeon_irq_stat_regs {
struct evergreen_irq_stat_regs evergreen;
 };
 
+#define RADEON_MAX_HPD_PINS 6
+#define RADEON_MAX_CRTCS 6
+#define RADEON_MAX_HDMI_BLOCKS 2
+
 struct radeon_irq {
boolinstalled;
boolsw_int;
-   /* FIXME: use a define max crtc rather than hardcode it */
-   boolcrtc_vblank_int[6];
-   boolpflip[6];
+   boolcrtc_vblank_int[RADEON_MAX_CRTCS];
+   boolpflip[RADEON_MAX_CRTCS];
wait_queue_head_t   vblank_queue;
-   /* FIXME: use defines for max hpd/dacs */
-   boolhpd[6];
+   boolhpd[RADEON_MAX_HPD_PINS];
boolgui_idle;
boolgui_idle_acked;
wait_queue_head_t   idle_queue;
-   /* FIXME: use defines for max HDMI blocks */
-   boolhdmi[2];
+   boolhdmi[RADEON_MAX_HDMI_BLOCKS];
spinlock_t sw_lock;
int sw_refcount;
union radeon_irq_stat_regs stat_regs;
-   spinlock_t pflip_lock[6];
-   int pflip_refcount[6];
+   spinlock_t pflip_lock[RADEON_MAX_CRTCS];
+   int pflip_refcount[RADEON_MAX_CRTCS];
 };
 
 int radeon_irq_kms_init(struct radeon_device *rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c 
b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 9ec830c..93da855 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -67,10 +67,10 @@ void radeon_driver_irq_preinstall_kms(struct drm_device 
*dev)
/* Disable *all* interrupts */
rdev->irq.sw_int = false;
rdev->irq.gui_idle = false;
-   for (i = 0; i < rdev->num_crtc; i++)
-   rdev->irq.crtc_vblank_int[i] = false;
-   for (i = 0; i < 6; i++) {
+   for (i = 0; i < RADEON_MAX_HPD_PINS; i++)
rdev->irq.hpd[i] = false;
+   for (i = 0; i < RADEON_MAX_CRTCS; i++) {
+   rdev->irq.crtc_vblank_int[i] = false;
rdev->irq.pflip[i] = false;
}
radeon_irq_set(rdev);
@@ -99,10 +99,10 @@ void radeon_driver_irq_uninstall_kms(struct drm_device *dev)
/* Disable *all* interrupts */
rdev->irq.sw_int = false;
rdev->irq.gui_idle = false;
-   for (i = 0; i < rdev->num_crtc; i++)
-   rdev->irq.crtc_vblank_int[i] = false;
-   for (i = 0; i < 6; i++) {
+   for (i = 0; i < RADEON_MAX_HPD_PINS; i++)
rdev->irq.hpd[i] = false;
+   for (i = 0; i < RADEON_MAX_CRTCS; i++) {
+   rdev->irq.crtc_vblank_int[i] = false;
rdev->irq.pflip[i] = false;
}
radeon_irq_set(rdev);
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] drm/radeon/kms: use num_crtc instead of hard-coded value 6

2011-10-26 Thread Ilija Hadzic




On Wed, 26 Oct 2011, Alex Deucher wrote:



This is actually not quite right.  The number of HPD (Hot Plug Detect)
pins is not equal to the number of crtcs.  Radeons have supported 6
HPD pins long before we supported 6 crtcs (e.g., cards with more
connectors than crtcs).  The logic should probably look like:
[SNIP]


I have just sent out a patch that I think rectifies this. Hopefully, it's 
good now.


-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex

2011-10-26 Thread Ilija Hadzic




On Wed, 26 Oct 2011, Daniel Vetter wrote:



While you mess around with this code, please use the standard linux
wait_event_* infrastructure.


Right now the order of entering the queue is guaranteed to be the same as 
the order of entering the ioctl, which reflects the order of progressing 
vblank sequence. I wanted to preserve this semantic, so I need a wait 
function that puts the task into the queue and then unlocks the mutex 
(essentially a kernel equivalent of pthread_cond_wait that POSIX threads 
library implements).


The closest "approximation" of that wait_event_interruptable_locked, but 
that requires a spinlock, not mutex (and thus the rework to 
drm_wait_vblank ioctl would be more radical.



I want to stop that DRM_FOO yelling from
spreading around - the idea of the drm core being os agnostic died quite a
while ago.



Is DRM support for other OS-es officially dead ? I know it's not in best 
shape on BSD and friends, but I am not sure if I should make it worse (or 
inconsistent with the current code shape and form).


Anyway, the primary reason for implementing the DRM_WAIT_ON_LOCKED
is because I didn't have the function that enqueues the task and then 
releases the mutex.



Again, have you carefully checked whether this is safe and how the
relevant data structures (vblank counting) are proctected?



I am only moving the global lock further down in the function. The 
structures moved out of the critical section are only local vars. and 
arguments. So it's safe. Had I changed the mutex to something other than 
global one (which is the right thing to do in a long run) then a more 
careful review would be warranted.


Also note that running drm_wait_ioctl as DRM_LOCKED is a severe problem 
that we have to address quickly. So I'd like to get *some* kind of decent 
fix in this merge window and then follow up with more polishing later.


For example, try compiling and running this code (which any bozo in the 
userland can do):


#include 
#include 

main()
{
int fd;
drmVBlank vbl;

fd = open("/dev/dri/card0", 0);
printf("fd=%d\n", fd);

while(1) {
vbl.request.type =  DRM_VBLANK_RELATIVE ;
vbl.request.sequence = 60;
printf("waiting on vblank\n");
ioctl(fd, DRM_IOCTL_WAIT_VBLANK, &vbl);
}
}

Then start glxgears (on CRTC-0) and watch the hell break loose. The 
hostile code will cause any OpenGL application to hang for a full second 
before it can even enter the vblank_wait. Now because all vblank waits go 
through a common function in DDX, the whole windowing system will 
consequently go into a crawl. Run it with Compiz and things are even worse 
(the whole windowing system goes "stupid" as soon as the hostile 
application is started).




+   add_wait_queue(&(queue), &entry);   \
+   mutex_unlock(&drm_global_mutex);\


Hiding a lock-dropping in this fashion is horrible.



I explained above why am I have to release the lock inside the macro. 
That's equivalent to what you can find in the userland in POSIX threads, 
like I said. So that's not bad.


What *is* bad is that I should have passed the mutex as an argument to 
DRM_WAIT_ON_LOCKED and that I'll fix.


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex

2011-10-26 Thread Ilija Hadzic




On Wed, 26 Oct 2011, Michel [ISO-8859-1] D�nzer wrote:



Does it really need drm_global_mutex at all, as opposed to e.g.
dev->struct_mutex?




It doesn't. Actually, it would probably suffice to have a mutex that is 
one-per-CRTC, because vlbank waits on different CRTCs don't step on each 
other, but I was going for a conservative fix. I tried to avoid 
having to hunt around in other sections of code for lines that possibly 
assume that a global lock is held while touching vblank counter structures 
(one of concerns raised by Daniel which is a non-issue because I didn't 
change the mutex being used).


The "urgent" part of this patch is to make sure we don't go to sleep while 
holding the mutex. In my response to Daniel, I sent an example of a simple 
userland application that can hang up the whole windowing system. That's a 
big deal for which we have to get the fix in quickly.


After that we can follow up with more polishing (e.g. use per-CRTC mutex, 
review and potentially fix other parts of the code that deal with vblank 
and maybe assume global mutex protection).




I agree with Daniel's sentiment on this. AFAICT add_wait_queue()
protects against concurrent access to the wait queue, so I think it
would be better to just drop the mutex explicitly before calling
DRM_WAIT_ON.



The problem is that if I release the mutex just before calling 
DRM_WAIT_ON, the task can be scheduled away right at that point.

Then another task can go all the way into the DRM_WAIT_ON and enter
the queue. Then the first task wakes up and enters the queue.

Now the last task in the queue is *not* the task that updated
the dev->last_vblank_wait[crtc] value last.

If you or someone who knows better than me can tell me that this potential 
reordering is guaranteed harmless, I'll gladly drop the mutex earlier and 
then I no longer need the DRM_WAIT_ON_LOCKED macro.


However, if the preservation of the order is actually important, then it 
will be an ugly race condition to track later.


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex

2011-10-27 Thread Ilija Hadzic




On Thu, 27 Oct 2011, Alan Coopersmith wrote:



I think the idea of sharing kernel drm code is pretty much dead, yeah.


We are still trying to keep it alive, despite what some may think.



In the context of this patch (in progress), it's probably a moot topic 
because per comments that Daniel sent, most of the locks in 
drm_vblank_wait will become unecessary after I rework the patch

and I won't need to introduce (new) DRM_WAIT_ON_LOCKED any more.

I will, however, *not* convert existing DRM_WAIT_ON to a Linux-centric 
function, so I will not be the one to kill (or contribute to the killing 
of) the portability of that file. That I leave to someone else ;-).


P.S. If the "fight" starts I will side with the "keep the portability 
alive" camp. I have "emotional" ties with other (less widely used) 
operating systems. ;-)


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex

2011-10-27 Thread Ilija Hadzic




On Thu, 27 Oct 2011, Daniel Vetter wrote:



On a quick glance
- drm_vblank functions call by wait_vblank seem to do correct locking
 already using various spinlocks and atomic ops.
- linux waitqueues have their own locking
- dev->last_vblank_wait is only used for a procfs file. I don't think it
 makes much sense given that we can have more than one vblank waiter
- there's no selective wakeup going on, all waiters get woken for every
 vblank and call schedule again if it's not the right vblank



I concur.



The only really hairy thing going on is racing modeset ioctls against
vblank waiters. But modeset is protected by the dev->mode_config.mutex
and hence already races against wait_vblank with the current code, so I'm
slightly inclined to ignore this for the moment. Would be great if you
coudl check that in-depth, too.



Will leave this for some other patch at some other time; the critical path 
is to fix the hang/crawl that I explained in my previous note.




So I think the right thing to do is
- Kill dev->last_vblank_wait (in a prep patch).


Agreed. Also drm_vblank_info function can go away


- Imo we also have a few too many atomic ops going on, e.g.
 dev->vblank_refcount looks like it's atomic only because or the procfs
 file, so maybe kill that procfs file entirely.


I am not sure about that. dev->vblank_refcount looks critical to me: it is 
incremented through drm_vblank_get which is called from several different 
contexts: page-flip functions in several drivers (which can run in the 
context of multiple user processes), **_pm_** stuff in radeon driver 
(which looks like it runs in the context of kernel worker thread). Mutexes 
used at all these different places are not quite consistent. If anything, 
as the result of a later audit, some other mutexes may be rendered 
unecessary, but definitely, I would keep vblank_refcount atomic.



- Audit the vblank locking, maybe resulting in a patch with updated
 comments, if there's anything misleading or tricky going on. And it's
 always good when the locking of structe members is explained in the
 header file ...


I'll add comments that I feel make sense for this set of patches (if 
anything). Full audit, should come later at a slower pace. There are quite 
a few mutexes and locks many of which are overlapping in function and some 
are spurious. It will take more than just myself to untangle this know.



- Drop the mutex locking because it's not needed.



Agreed. Will drop.


While locking at the code I also noticed that wait_vblank calls
drm_vblank_get, but not always drm_vblank_put. The code is correct, the
missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
the patch to move that around to not trip up reviewers the next time
around?



Actually, there is something fishy here. Look at the 'else' branch under 
the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of 
the drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks 
wrong to me, but I need to first convince myself that there is not some 
other obscure place where drm_vblank_put is accounted for. If that branch 
is really missing a drm_vblank_put, then it will be easy pull the 
drm_vblank_put out. Otherwise, it will be another knot to untangle.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex

2011-10-27 Thread Ilija Hadzic




On Thu, 27 Oct 2011, Ilija Hadzic wrote:




While locking at the code I also noticed that wait_vblank calls
drm_vblank_get, but not always drm_vblank_put. The code is correct, the
missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
the patch to move that around to not trip up reviewers the next time
around?



Actually, there is something fishy here. Look at the 'else' branch under the 
'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of the 
drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks wrong 
to me, but I need to first convince myself that there is not some other 
obscure place where drm_vblank_put is accounted for. If that branch is really 
missing a drm_vblank_put, then it will be easy pull the drm_vblank_put out. 
Otherwise, it will be another knot to untangle.





OK, I checked this. Although the code is a little hard to read, it is done 
the way it is with the reason. Whenever the drm_queue_vblank_event and 
that 'else' branch is caught, the drm_vblank_put should *not* be called.
The reason is that the relevant vblank has not come in yet, so the 
reference must remain up so that the vblank interrupts are not disabled 
until the event occurs.


The seemingly missing drm_vblank_put will be accounted for in 
drm_handle_vblank_events.


So I should not move anything around. If anything, this should be 
annotated with comments to explain what's going on.


-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex

2011-10-28 Thread Ilija Hadzic




On Fri, 28 Oct 2011, Daniel Vetter wrote:


On Fri, Oct 28, 2011 at 08:59:04AM +0200, Michel Dänzer wrote:

On Don, 2011-10-27 at 22:19 -0500, Ilija Hadzic wrote:

On Thu, 27 Oct 2011, Daniel Vetter wrote:


So I think the right thing to do is
- Kill dev->last_vblank_wait (in a prep patch).


Agreed. Also drm_vblank_info function can go away


Actually, I was rather going to submit a patch to hook it up again —
AFAICT it was unhooked without any justification. It could be useful for
debugging vblank related hangs. Any issues with it, such as
last_vblank_wait not being guaranteed to really be the last one, can
always be improved later on.


I've thought a bit about the usefulness of it for debugging before
proposing to kill it and I think it can die: It's only really useful for a
complete hangs, if we have an issue we just missing a wakeup somewhere,
that's not gonna catch things. Hence I think something that allows you to
watch things while it's running is much better, i.e. either a drm debug
prinkt or a tracecall.

But if you're firmly attached to that debug file it should be pretty easy
to shove that under the protection of one of the vblank spinlocks.


I'll keep it then and figure out the best mutex/spinlock to use. It can be 
anything that exists on one-per-CRTC basis (vblank waits on different CTCs 
are not contending). The critical section is from that switch in which 
vblwait->request.sequence is incremented until it is assigned to 
dev->last_vblank_wait[crtc] (and we are only protecting that section 
against itself, executed in contexts of different PIDs).


I guess we settled now on this patch (other comments will be 
addressed in a different set of patches).


-- Ilija___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH v2] drm: do not sleep on vblank while holding a mutex

2011-10-28 Thread Ilija Hadzic

drm_wait_vblank must be DRM_UNLOCKED because otherwise it
will grab the drm_global_mutex and then go to sleep until the vblank
event it is waiting for. That can wreck havoc in the windowing system
because if one process issues this ioctl, it will block all other
processes for the duration of all vblanks between the current and the
one it is waiting for. In some cases it can block the entire windowing
system. So, make this ioctl DRM_UNLOCKED, wrap the remaining
(very short) critical section with dev->vbl_lock spinlock, and
add a comment to the code explaining what we are protecting with
the lock.

v2: incorporate comments received from Daniel Vetter and
Michel Daenzer.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_drv.c |2 +-
 drivers/gpu/drm/drm_irq.c |   14 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index e159f17..c990dab 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 
-   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED),
 
DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0),
 
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 3830e9e..c8b4da8 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1160,6 +1160,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
union drm_wait_vblank *vblwait = data;
int ret = 0;
unsigned int flags, seq, crtc, high_crtc;
+   unsigned long irqflags;
 
if ((!drm_dev_to_irq(dev)) || (!dev->irq_enabled))
return -EINVAL;
@@ -1193,6 +1194,13 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
}
seq = drm_vblank_count(dev, crtc);
 
+   /* the lock protects this section from itself executed in */
+   /* the context of another PID, ensuring that the process that */
+   /* wrote dev->last_vblank_wait is really the last process */
+   /* that was here; processes waiting on different CRTCs */
+   /* do not need to be interlocked, but rather than introducing */
+   /* a new per-crtc lock, we reuse vbl_lock for simplicity */
+   spin_lock_irqsave(&dev->vbl_lock, irqflags);
switch (vblwait->request.type & _DRM_VBLANK_TYPES_MASK) {
case _DRM_VBLANK_RELATIVE:
vblwait->request.sequence += seq;
@@ -1200,12 +1208,15 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
case _DRM_VBLANK_ABSOLUTE:
break;
default:
+   spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
ret = -EINVAL;
goto done;
}
 
-   if (flags & _DRM_VBLANK_EVENT)
+   if (flags & _DRM_VBLANK_EVENT) {
+   spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
return drm_queue_vblank_event(dev, crtc, vblwait, file_priv);
+   }
 
if ((flags & _DRM_VBLANK_NEXTONMISS) &&
(seq - vblwait->request.sequence) <= (1<<23)) {
@@ -1215,6 +1226,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
  vblwait->request.sequence, crtc);
dev->last_vblank_wait[crtc] = vblwait->request.sequence;
+   spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
(((drm_vblank_count(dev, crtc) -
   vblwait->request.sequence) <= (1 << 23)) ||
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH v2] drm: make DRM_UNLOCKED ioctls with their own mutex

2011-10-28 Thread Ilija Hadzic

drm_getclient, drm_getstats and drm_getmap (with a few minor
adjustments) do not need global mutex, so fix that and
make the said ioctls DRM_UNLOCKED. Details:

  drm_getclient: the only thing that should be protected here
  is dev->filelist and that is already protected everywhere with
  dev->struct_mutex.

  drm_getstats: there is no need for any mutex here because the
  loop runs through quasi-static (set at load time only)
  data, and the actual count access is done with atomic_read()

  drm_getmap already uses dev->struct_mutex to protect
  dev->maplist, which also used to protect the same structure
  everywhere else except at three places:
  * drm_getsarea, which doesn't grab *any* mutex before
touching dev->maplist (so no drm_global_mutex doesn't help
here either; different issue for a different patch).
However, drivers seem to call it only at
initialization time so it probably doesn't matter
  * drm_master_destroy, which is called from drm_master_put,
which in turn is protected with dev->struct_mutex
everywhere else in drm module, so we are good here too.
  * drm_getsareactx, which releases the dev->struct_mutex
too early, but this patch includes the fix for that.

v2: * incorporate comments received from Daniel Vetter
* include the (long) explanation above to make it clear what
  we are doing (and why), also at Daniel Vetter's request
* tighten up mutex grab/release locations to only
  encompass real critical sections, rather than some
  random code around them

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_context.c |5 +++--
 drivers/gpu/drm/drm_drv.c |6 +++---
 drivers/gpu/drm/drm_ioctl.c   |   15 ---
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_context.c b/drivers/gpu/drm/drm_context.c
index 6d440fb..325365f 100644
--- a/drivers/gpu/drm/drm_context.c
+++ b/drivers/gpu/drm/drm_context.c
@@ -154,8 +154,6 @@ int drm_getsareactx(struct drm_device *dev, void *data,
return -EINVAL;
}
 
-   mutex_unlock(&dev->struct_mutex);
-
request->handle = NULL;
list_for_each_entry(_entry, &dev->maplist, head) {
if (_entry->map == map) {
@@ -164,6 +162,9 @@ int drm_getsareactx(struct drm_device *dev, void *data,
break;
}
}
+
+   mutex_unlock(&dev->struct_mutex);
+
if (request->handle == NULL)
return -EINVAL;
 
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index c990dab..dc0eb0b 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -64,9 +64,9 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0),
DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, 
DRM_MASTER|DRM_ROOT_ONLY),
-   DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, DRM_UNLOCKED),
+   DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, DRM_UNLOCKED),
+   DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER),
 
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 904d7e9..956fd38 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -158,14 +158,11 @@ int drm_getmap(struct drm_device *dev, void *data,
int i;
 
idx = map->offset;
-
-   mutex_lock(&dev->struct_mutex);
-   if (idx < 0) {
-   mutex_unlock(&dev->struct_mutex);
+   if (idx < 0)
return -EINVAL;
-   }
 
i = 0;
+   mutex_lock(&dev->struct_mutex);
list_for_each(list, &dev->maplist) {
if (i == idx) {
r_list = list_entry(list, struct drm_map_list, head);
@@ -211,9 +208,9 @@ int drm_getclient(struct drm_device *dev, void *data,
int i;
 
idx = client->idx;
-   mutex_lock(&dev->struct_mutex);
-
i = 0;
+
+   mutex_lock(&dev->struct_mutex);
list_for_each_entry(pt, &dev->filelist, lhead) {
if (i++ >= idx) {
client->auth = pt->authenticated;
@@ -249,8 +246,6 @@ int drm_getstats(struct drm_device *dev, void *data,
 
memset(stats, 0, sizeof(*stats));
 
-   mutex_lock(&dev->struct_mutex);
-
for (i = 0; i < dev->counters; i++) {
if (dev->types[i] == _DRM_STAT_LOCK)
stats->data[i].value =
@@ -262,8 +25

[PATCH] drm: add some comments to drm_wait_vblank and drm_queue_vblank_event

2011-10-28 Thread Ilija Hadzic

during the review of the fix for locks problems in drm_wait_vblank,
a couple of false concerns were raised about how the drm_vblank_get
and drm_vblank_put are used in this function; it turned out that the
code is correct and that it cannot be simplified

add a few comments to explain non-obvious flows in the code,
to prevent "false alarms" in the future

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_irq.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index c8b4da8..e9dd19d 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1065,6 +1065,10 @@ out:
return ret;
 }
 
+/* must acquire vblank reference count (call drm_vblank_get) */
+/* before calling this function; the matching drm_vblank_put */
+/* will either be issued here or in drm_handle_vblank_events */
+/* after the vblank is signaled */
 static int drm_queue_vblank_event(struct drm_device *dev, int pipe,
  union drm_wait_vblank *vblwait,
  struct drm_file *file_priv)
@@ -1124,6 +1128,9 @@ static int drm_queue_vblank_event(struct drm_device *dev, 
int pipe,
trace_drm_vblank_event_delivered(current->pid, pipe,
 vblwait->request.sequence);
} else {
+   /* can't call drm_vblank_put here because interrupt */
+   /* must remain enabled until the event occurs */
+   /* drm_handle_vblank_events will do this for us */
list_add_tail(&e->base.link, &dev->vblank_event_list);
vblwait->reply.sequence = vblwait->request.sequence;
}
@@ -1215,6 +1222,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
 
if (flags & _DRM_VBLANK_EVENT) {
spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
+   /* drm_queue_vblank_event() will call drm_vblank_put() */
return drm_queue_vblank_event(dev, crtc, vblwait, file_priv);
}
 
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2] drm: do not sleep on vblank while holding a mutex

2011-10-28 Thread Ilija Hadzic




On Sat, 29 Oct 2011, Daniel Vetter wrote:

+   /* the lock protects this section from itself executed in */
+   /* the context of another PID, ensuring that the process that */
+   /* wrote dev->last_vblank_wait is really the last process */
+   /* that was here; processes waiting on different CRTCs */
+   /* do not need to be interlocked, but rather than introducing */
+   /* a new per-crtc lock, we reuse vbl_lock for simplicity */


Multiline comments are done with one pair of /* */, see CodingStyle




Will fix the multiline comments (though scripts/checkpatch.pl didn't 
complain so I figured it was OK)





I personally vote for no additional locking at all here and just drop the
global lock. Or maybe I'm blind and we need this. In that case, please
clue me up.




What I am doing with the lock, is makeing sure that the last vblank wait 
reported is really last. You said in your previous comment (which I agree 
with) that the value of having last_vblank_wait in debugfs is in case of 
hangs. In that case you want to see who really was the last one to issue 
the vblank.


Now suppose that the drm_wait_vblank is enteret in the context of two 
different PIDs and suppose that there are no locks. Let's say that the 
first process wants to wait on vblank N and the second wants to wait on 
vblank N+1. First process can reach the point just before it wants to 
write to last_vblank_wait and lose the processor there (so it has N in 
vblank.request (on its stack) calculated, but it has not written it into 
last_vblank_wait yet (because it lost the CPU right there). Then the 
second process gets the CPU, and executes and let's say that it goes all 
the way through so it writes N+1 into last_vblank_wait and then schedules 
away (it called DRM_WAIT_ON). Now the first process gets the CPU where it 
left off and overwrites N+1 in last_vblank_wait with N, which is now 
stale.


I explained in the comment (and in my note this morning), that the only 
race condition is against itself in the context of different processes.

The above is the scenario.

Race against drm_vblank_info exists in theory, but in practice doesn't 
matter because the reader of debugfs (or proc file system) runs 
asynchronously (and typically at slower pace) from processes issuing 
vblank waits (if it's a human looking at the filesystem then definitely 
much slower pace). Since processes issuing vblanks are overwriting the 
same last_vblank_value, drm_vblank_info is already doing a form of 
downsampling and debugfs is losing information, anyway, so who cares about 
the lock. In case of a hang, the value of last_vblank_wait doesn't change,
so again the lock in drm_vblank_info doesn't change anything. So adding a 
lock there (which would have to be vbl_lock to make it usable) is only a 
matter of theoretical correctness, but no practical value. So I decided 
not to touch drm_vblank_info.


drm_wait_vblank, however, needs a protection from itself as I explained 
above.




DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
(((drm_vblank_count(dev, crtc) -
   vblwait->request.sequence) <= (1 << 23)) ||


One line below there's the curios case of us rechecking dev->irq_enabled.
Without any locking in place. But afaics this is only changed by the
driver at setup and teardown when we are guaranteed (hopefully) that
there's no open fd and hence no way to call an ioctl. So checking once at
the beginning should be good enough. Whatever.


It's probably redundant statement because it won't be reached if 
irq_enabled is false and there is nothing to change it to true at 
runtime, but that's a topic for a different patch.


-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2] drm: do not sleep on vblank while holding a mutex

2011-10-29 Thread Ilija Hadzic




On Sat, 29 Oct 2011, Daniel Vetter wrote:



Ok, and here's why your locking (or any locking that drops the lock before
calling schedule) won't work: [SNIP]



You just came full circle. Recall that in my v1 patch I went all the way 
to enqueuing the process in the wait queue before dropping the lock. That 
would have guaranteed that if there is a hangup, what last_vblank_wait 
says is the last is really the last. If there is no hang, then it doesn't 
matter because last_vlank_wait is constantly overwritten (and is indeed 
stale for N-1 processes). However, that was disliked in the review and I 
didn't want to argue.


So in the interest of making progress, it looks that you would be happy if 
this patch just dropped DRM_UNLOCKED and declared that we don't care about 
(potentially theoretical) critical section related to last_vblank_wait.


If that's the case, I'll cut a relatvely trivial patch that drops 
DRM_UNLOCKED from this system call to solve the problem that I pointed 
earlier in this thread and leave all the rest of the locking discussion 
for other patches.


Would that be fine by you ?

thanks,

Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH v3] drm: do not sleep on vblank while holding a mutex

2011-10-31 Thread Ilija Hadzic

drm_wait_vblank must be DRM_UNLOCKED because otherwise it
will grab the drm_global_mutex and then go to sleep until the vblank
event it is waiting for. That can wreck havoc in the windowing system
because if one process issues this ioctl, it will block all other
processes for the duration of all vblanks between the current and the
one it is waiting for. In some cases it can block the entire windowing
system.

v2: incorporate comments received from Daniel Vetter and
Michel Daenzer.

v3: after a lengty discussion with Daniel Vetter, it was concluded
we should not worry about any locking, within drm_wait_vblank
function so this patch becomes a rather trivial removal
of drm_global_mutex from drm_wait_vblank

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_drv.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index dbabcb0..dc0eb0b 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 
-   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED),
 
DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0),
 
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH v2] drm: add some comments to drm_wait_vblank and drm_queue_vblank_event

2011-10-31 Thread Ilija Hadzic

during the review of the fix for locks problems in drm_wait_vblank,
a couple of false concerns were raised about how the drm_vblank_get
and drm_vblank_put are used in this function; it turned out that the
code is correct and that it cannot be simplified

add a few comments to explain non-obvious flows in the code,
to prevent "false alarms" in the future

v2: incorporate comments received from Daniel Vetter

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_irq.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 3830e9e..79c02da 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1124,6 +1124,7 @@ static int drm_queue_vblank_event(struct drm_device *dev, 
int pipe,
trace_drm_vblank_event_delivered(current->pid, pipe,
 vblwait->request.sequence);
} else {
+   /* drm_handle_vblank_events will call drm_vblank_put */
list_add_tail(&e->base.link, &dev->vblank_event_list);
vblwait->reply.sequence = vblwait->request.sequence;
}
@@ -1204,8 +1205,12 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
goto done;
}
 
-   if (flags & _DRM_VBLANK_EVENT)
+   if (flags & _DRM_VBLANK_EVENT) {
+   /* must hold on to the vblank ref until the event fires
+* drm_vblank_put will be called asynchronously
+*/
return drm_queue_vblank_event(dev, crtc, vblwait, file_priv);
+   }
 
if ((flags & _DRM_VBLANK_NEXTONMISS) &&
(seq - vblwait->request.sequence) <= (1<<23)) {
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v3] drm: do not sleep on vblank while holding a mutex

2011-10-31 Thread Ilija Hadzic



OK, v4 coming up and your suggestion will be copied verbatim. Hopefully 
that does it.


This patch is probably going to become a record-holder in comment/code 
lines ratio ;-)


-- Ilija


On Mon, 31 Oct 2011, Daniel Vetter wrote:


On Mon, Oct 31, 2011 at 01:10:21PM -0400, Ilija Hadzic wrote:

drm_wait_vblank must be DRM_UNLOCKED because otherwise it
will grab the drm_global_mutex and then go to sleep until the vblank
event it is waiting for. That can wreck havoc in the windowing system
because if one process issues this ioctl, it will block all other
processes for the duration of all vblanks between the current and the
one it is waiting for. In some cases it can block the entire windowing
system.

v2: incorporate comments received from Daniel Vetter and
Michel Daenzer.

v3: after a lengty discussion with Daniel Vetter, it was concluded
we should not worry about any locking, within drm_wait_vblank
function so this patch becomes a rather trivial removal
of drm_global_mutex from drm_wait_vblank


That commit message is a bit garbage. What about ...

"... it was concluded that the only think not yet protected with locks and
atomic ops is the write to dev->last_vblank_wait. It's only used in a
debug file in proc, and the current code already employs no correct
locking: the proc file only takes dev->struct_mutex, whereas
drm_wait_vblank implicitly took the drm_global_mutex. Given all this, it's
not worth bothering to try to fix this at this time."

I think it's important to correctly document the conclusion of this
discussion, because we've worried quite a bit about correct locking.

Yours, Daniel
--
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH v4] drm: do not sleep on vblank while holding a mutex

2011-10-31 Thread Ilija Hadzic

drm_wait_vblank must be DRM_UNLOCKED because otherwise it
will grab the drm_global_mutex and then go to sleep until the vblank
event it is waiting for. That can wreck havoc in the windowing system
because if one process issues this ioctl, it will block all other
processes for the duration of all vblanks between the current and the
one it is waiting for. In some cases it can block the entire windowing
system.

v2: incorporate comments received from Daniel Vetter and
Michel Daenzer.

v3/v4: after a lengty discussion with Daniel Vetter, it was concluded
   that the only thing not yet protected with locks and atomic
   ops is the write to dev->last_vblank_wait. It's only used in a
   debug file in proc, and the current code already employs no
   correct locking: the proc file only takes dev->struct_mutex,
   whereas drm_wait_vblank implicitly took the drm_global_mutex.
   Given all this, it's not worth bothering to try to fix
   the locks at this time.

Signed-off-by: Ilija Hadzic 
---
 drivers/gpu/drm/drm_drv.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index dbabcb0..dc0eb0b 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 
-   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0),
+   DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED),
 
DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0),
 
-- 
1.7.7

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[RFC] Virtual CRTCs (proposal + experimental code)

2011-11-03 Thread Ilija Hadzic

Hi everyone,

I would like to bring to the attention of dri-devel and linux-fbdev
community a set of hopefully useful and interesting patches that
I (and a few other colleagues) have been working on during the past
few months. Here, I will provide a short abstract, so that you can
decide whether this is of interest for you. At the end, I will
provide the pointers to the code and documentation.

The code is based on Dave Arilie's tree, drm-next branch and it
allows a GPU driver to have an arbitrary number of CRTCs
(configurable by user) instead of only those CRTCs that represent
real hardware.

The new CRTCs, that we call Virtual CRTCs can be attached to a
foreign device, that we call CTD device (short for Compression
Transmission and Display), and pixels can be streamed out of the GPU
to the device.

In one example, we use AMD/ATI Radeon GPU to do 3D rendering
(accelerated, of course) and we use our code to add additional
monitor heads using DisplayLink devices. In other words, we achieve
accelerated 3D rendering on a DisplayLink monitor. In another example
we funnel rendered pixels to userland by emulating a Video-for-Linux
device (and then userland can do whatever it wants with it). While
doing all this, GPU has no idea that we are doing this, the entire DRI
"thinks" that it is just dealing with a GPU that has a few "extra"
connectors and CRTCs. So everything ports without the need to modify
anything in the userland.

In general any device that can do something good with rendered pixels
can act as a CTD device, allowing a GPU to be an acceleration device
for a less capable display device or (the opposite) a frame-buffer-based
display device to be an expansion card for a GPU. Of course, for
each display device, a driver would have to be made compatible with our
new infrastructure (which is what we have done with DisplayLink driver
and also wrote one "synthetic" driver to fake out a V4L2 device as a
CTD device).

The newly introduced kernel module that we call VCRTCM (short for
Virtual CRTC Manager) handles the "traffic" between GPUs (actually
their CRTCs) and CTDs. The code makes use of DMA wherever possible
and also deals with specifics of CRTCs like modes, vblanks, page
flips, hardware cursor, etc. (just for kick, we played OpenArena
and watched Unigine Heaven demo on a DisplayLink monitor).

At this time, we would like to solicit feedback, comments, and
possibly contributions. The code is on github (pointers below)
and is based on the current state of drm-next branch from Dave's
tree. The code is very experimental, but complete and stable enough
that you can do something useful with it. We will be adding more
CTD drivers and updates to current ones in the near future and will
continue to maintain the code on github.

If the community finds this useful, we would be glad to work with
the maintainers on merging this upstream. So we would especially like
to hear what you would like to see changed to make this code acceptable
for the mainline of development.

My Github page is at https://github.com/ihadzic. To access the kernel
code type:

$ git clone git://github.com/ihadzic/linux-vcrtcm.git
$ git branch drm-next-vcrtcm origin/drm-next-vcrtcm
$ git checkout drm-next-vcrtcm

You will get all that's currently on Dave's drm-next plus our patches on
the top. We kept the development history preserved without squashing patches
(unless we had to due to merge/rebase conflicts), so you can see (and laugh
at) all our goofs and fixes to them.

To access the documentation, type:

$ git clone git://github.com/ihadzic/vcrtcm-doc.git

Then read the HOWTO.txt file. The first few sections provide some
general overview, and the sections that come later provide instructions
how to use our stuff.

Again, all comments, positive or negative, are very welcome.

-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC] Virtual CRTCs (proposal + experimental code)

2011-11-03 Thread Ilija Hadzic




On Thu, 3 Nov 2011, Roland Scheidegger wrote:



Am I right in assuming this could also be used for making muxless hybrid
GPUs work (i.e. radeon/intel igp)?


Yes, this is actually one of the scenarios on my wish list too. In the 
terminology that I defined with the introduction of VCRTCM, IGP would act

as a CTD driver and Radeon would be the GPU driver.

It should be relatively straightforward to add CTD hooks to an existing
Intel GPU driver and load the module in "CTD mode" when you want to do 
that.


Another (maybe trivial, but solution unknown to me) hurdle would be to 
"convince" your motherboard not to turn off IGP if it senses that there is 
a GPU plugged in its PCI-Express slot. I would assume that there is a way 
to do that and maybe Intel folks could tell us (or maybe it's already 
known, but I am ignorant on this).



Though it would be restricted to do
all work on one gpu and the igp would just send the data to the display
(which ultimately is not really what we want as compositing etc. should
ideally always happen on the IGP so external graphic chip can be turned
off but I don't even want to think about what needs to happen to make
that work).



In a straightforward implementation, that would be a restriction. However, 
when a CTD driver "asks" VCRTCM module to create a push-buffer for it (a 
buffer in the memory to which GPU will push the pixels from a virtual 
CRTC), what comes back are pages associated with a GEM/TTM bo in GART 
domain. I could imagine an IGP "reusing" that object for some additional 
rendering (as opposed to just linking it to its own CRTC).


These are just half-baked thoughts and I am sure that it would require 
some "specialty" in the userland and that things would get messy before 
they get better, but still not outside of the realm of possiblities.


-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC] Virtual CRTCs (proposal + experimental code)

2011-11-03 Thread Ilija Hadzic




On Thu, 3 Nov 2011, David Airlie wrote:



Well the current plan I had for this was to do it in userspace, I don't think 
the kernel
has any business doing it and I think for the simple USB case its fine but will 
fallover
when you get to the non-trivial cases where some sort of acceleration is 
required to move
pixels around. But in saying that its good you've done what something, and I'll 
try and spend
some time reviewing it.



The reason I opted for doing this in kernel is that I wanted to confine 
all the changes to a relatively small set of modules. At first this was a 
pragmatic approach, because I live out of the mainstream development tree 
and I didn't want to turn my life into an ethernal 
merging/conflict-resolution activity.


However, a more fundamental reason for it is that I didn't want to be tied 
to X. I deal with some userland applications (that unfortunately I can't 
provide much detail of  yet) that live directly on the top of libdrm.


So I set myself a goal of "full application transparency". Whatever is 
thrown at me, I wanted to be able to handle without having to touch any 
piece of application or library that the application relies on.


I think I have achieved this goal and really everything I tried just 
worked out of the box (with an exception of two bug fixes to ATI DDX

and Xorg, that are bugs with or without my work).

-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: KMS cursor BO semantics

2011-11-04 Thread Ilija Hadzic




On Fri, 4 Nov 2011, Thomas Hellstrom wrote:


Hi.

I have a question about the semantics of the DRM_IOCTL_MODE_CURSOR iotcl:

Some hardware (vmware's virtual in particular) may not be able to pick up the 
changes from a bo directly, since the cursor data is sent though the command 
stream. Hence we need a notification when the cursor image has changed.


Could we *require* that a cursor image change needs to be followed by an 
ioctl call with the flag

DRM_MODE_CURSOR_BO?

Thanks,
Thomas



FWIW, Acked-by: Ilija Hadzic 
I have a few places where I could use such an ioctl.

BTW, Thomas, in the above "since the cursor data is sent though the 
command stream", did you mean "since the cursor data is *not* sent though 
the command stream". If it was sent, through command stream, then CS ioctl 
would know when the cursor changes.


My understanding is that the cursor data are mmap-ed letting userland poke 
it at will (so the case when an "hourglass" changes into "arrow" is 
particularly hard to know that it happened).


-- Ilija
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC] Virtual CRTCs (proposal + experimental code)

2011-11-07 Thread Ilija Hadzic




On Mon, 7 Nov 2011, Dave Airlie wrote:



So I expect the way I'd like this to work, is that we have full blown drm
drivers for all the devices and then some sort of linkage layer between them,
so one driver can export crtcs from another, instead of having special case
ctd drivers.


I agree and that is actually a long-term plan from our side. CTD 
functionality should be integral part of existing drivers not the new 
driver, unless there is a new functionality that makes sense as CTD-only 
(like v4l2ctd).


In the world that I imagine, likage layer is VCRTCM. Unaccelerated 
frabebuffer devices (UDL for example, but in general anything that "lives" 
in fbdev world) can choose (based on some policy from userland) whether to 
act as CTD driver and register themselves with VCRTCM (when they want 
acceleration "assistance" from a GPU in the system) or to load as

normal fbdev devices (when they want to run on their own).



Now I can see the reason for the v4l one, but I have for example
a udl kms driver, and I'd like to have it work alongside this stuff,
and userspace
could bind the crtcs from it to another driver. I'm not sure how much work
this would be or if its just a matter of adding a CTD interface to the udl kms
device.



The only reason we wrote a new udlctd driver was  because it was quicker 
that way (we just ripped some code from udlfb driver and added our CTD 
functionality).


The plan was always to merge udlctd and udlfb at some point, but first 
I'd like to see how perceptive is the community to the concept. If the 
concept makes sense, then we'll throw in enough programming to consolidate 
the drivers. Nobody wants three competing drivers for the same device 
(udlfb, your udl kms driver and our udlctd).


Speaking of udl driver, is your udl-v2 branch competing with udlfb? 
Externally, they seem to do similar or the same thing, but one is based on 
DRM (and I guess the required DDX is xf86-video-modesetting), while the 
other is based on fbdev and the required DDX is xf8b-video-fbdev. While 
from my perspective both could be consilidated with a CTD functionality, 
but it makes me wonder in which direction is the community development 
moving: is everything from fbdev-world moving under DRM as a set of 
unaccelerated KMS drivers or is fbdev staying separate for good ?
Depending on what the trend is, one or the other udl driver (udl-v2 or 
udlfb) will make more sense to be merged with udlctd.


-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC] Virtual CRTCs (proposal + experimental code)

2011-11-23 Thread Ilija Hadzic




On Wed, 23 Nov 2011, Dave Airlie wrote:


So another question I have is how you would intend this to work from a
user POV, like how it would integrate with a desktop environment, X or
wayland, i.e. with little or no configuration.



First thing to understand is that when a virtual CRTC is created, it looks 
to the user like the GPU has an additional DisplayPort connector.
At present I "abuse" DisplayPort, but I have seen that you pushed a patch 
from VMware that adds Virtual connector, so eventually I'll switch to that 
naming. The number of virtual CRTCs is determined when the driver loads 
and that is a static configuration parameter. This does not restrict the 
user because unused virutal CRTCs are just like disconnected connectors on 
the GPU. In extreme case, a user could max out the number of virtual CRTCs 
(i.e. 32 minus #-of-physical-CRTCs), but in general the system needs to be 
booted with maximum number of anticipated CRTCs. Run-time addition and 
removal of CRTCs is not supported at this time and that would be much 
harder to implement and would affect the whole DRM module everywhere.


So now we have a system that booted up and DRM sees all of its real 
connectors as well as virtual ones (as DisplayPorts at present). If there 
is no CTD device attached to virtual CRTCs, these virtual connectors are 
disconnected as far as DRM is concerned. Now the userspace must call 
"attach/fps" ioctl to associate CTDs with CRTCs. I'll explain shortly how 
to automate that and how to eliminate the burden from the user, but for 
now, please assume that "attach/fps" gets called from userland somehow.


When the attach happens, that is a hotplug event (VCRTCM generates it) to 
DRM, just like someone plugged in the monitor. Then when XOrg starts, it
will use the DisplayPort that represents a virtual CRTC just like any 
other connector. How it will use it, will depend on what the xorg.conf 
says, but the key point is that this connector is no different from any 
other connector that the GPU provides and is thus used as an "equal 
citizen". No special configuration is necessary once attached to CTD.


If CTD is detached and new CTD attached, that is just like yanking out 
monitor cable and plugging in the new one. DRM will get all hotplug events 
and windowing system will do the same thing it would normally do with any 
other port. If RANDR is called to resize the desktop it will also work and 
X will have no idea that one of the connectors is on a virtual CRTC. I 
also have another feature, where when CTD is attached, it can ask the 
device it drives for the connection status and propagate that all the way 
back to DRM (this is useful for CTD devices that drive real monitors, like 
DisplayLink).


So now let's go back to the attach/fps ioctl. To attach a CTD device this 
ioctl must happen as a result of some policy. That can be done by having 
CTD device generate UDEV events when it loads for which one can write 
policies to determine which CTD device attaches to which virtual CRTC.
Ultimately that becomes an user configuration, but it's no different from 
what one has to do now with UDEV policies to customize the system.


Having explained this, let's take your hotplug example that you put up on 
your web page and redo it with virtual CRTCs. Here is how it would work: 
Boot up the system and tell the GPU to create a few virtual CRTCs. Bring 
up Xorg with no DisplayLink dongles in it. Now plug in the DisplayLink. 
Once the CTD driver loads as the result of the hotplug (right now UDLCTD 
is a separate driver, but as we discussed before, this is a temporary 
state and at some point its CTD function should be merged wither with 
UDLFB or with your UDL-V2) CTD function in the driver generates an UDEV 
event. The policy directs UDEV to call run the program that issues the 
ioctl to perform attach/fps. Attach/fps of the UDLCTD is now a hotplug 
event and DRM "thinks" that a new connector changed the status from 
disconnected to connected. That causes it to query the modes for the new 
connector and because it's the virtual CRTC, it lands in the virtual CRTC 
helpers in the GPU driver. Virtual CRTC helpers route it to VCRTCM, which 
further routes to it to CTD (UDLCTD in this case). CTD returns the modes 
and DRM gets them ... the rest you know better than me what happens ;-)


So this is your hotplug demo, but the difference is that the new desktop 
can use direct rendering. Also, everything that would work for a normal 
connector works here without having to do any additional tricks. RANDR 
also works seamlessly without having to do anything special. If you move 
away from Xorg, to some other system (Wayland?), it still works for as 
long as the new system knows how to deal with connectors that connect and 
disconnect.


Everything I described above is ready to go except the UDEV event from 
UDLCTD and UDEV rules to automate this. Both are straightforwar and won't 
take long to do. So very shortly, I'll be able to

Re: [RFC] Virtual CRTCs (proposal + experimental code)

2011-11-24 Thread Ilija Hadzic




On Thu, 24 Nov 2011, Dave Airlie wrote:


Okay so thats pretty much how I expected it to work, I don't think
Virtual makes sense for a displaylink attached device though,
again if you were using a real driver you would just re-use whatever
output type it uses, though I'm not sure how well that works,


That is the consequence of the fact that virtual CRTCs are created at 
startup time when attached CTD is not known, while CTDs are attached at 
runtime. So when I register the virtual CRTC and the associated connector 
I have to use something for the connector type.


Admitting that my logic is biased by my design, to me "Virtual" connector 
type is an indicative that from GPU's perspective it's a connector that 
does not physically exist and is yet to be attached to some real display 
device. At that point the properties of the attached display become known 
to the system.




Do you propogate full EDID information and all the modes or just the
supported modes? we use this in userspace to put monitor names in
GNOME display settings etc.



Right now we propagate the entire list of modes that the attached CTD 
device has queried from the connected display (monitor). Propagating full 
EDID is really easy to add. That's if the CTD is driver for some real 
display. If CTD is just a "make-believe" display whose purpose is to be 
the conduit to some other pixel-processing component (e.g. V4L2CTD), then 
at some point in the chain we have to make up the set of modes that the 
logical display accepts and in that case the EDID does not exist by 
definition.



what does xrandr output looks like for a radeon GPU with 4 vcrtcs? do
you see 4 disconnected connectors? that again isn't a pretty user
experience.



Yes it shows 4 disconnected monitors. To me that is a logical consequence 
of the design in which virtual CRTCs and associated virtual connectors are 
always there. By know, it's clear to me that you are not too thrilled 
about it, but please allow me to turn the question back to you: in your 
solution with udl-v2 driver and a dedicated DDX for it, can you do the big 
desktop that spans across GPU's local and "foreign" displays and have 
acceleration on both? If not, what would it take you to get there and how 
complex the end result will be?


I'll get to Optimus/PRIME use case later, but if we for the moment focus 
on the use case in which a dumb framebuffer device extens the number of 
displays of a rendering-capable GPU, I think that VCRTCM offers quite a 
complete and universal solution and it is completely transparent with 
regard to the application, window manager, and display server.


Radeon + DisplayLink is the specific example. But in general it's Any GPU 
+ Any fbdev. It's not just one use case, it's a whole class of use cases
that would follow the same principle and for then the VCRTC alone 
suffices.




My main problem with this is as I'll explain below it only covers some
of the use cases, and I don't want a 50% solution at this point, by
doing something like this you are making it harder to get proper
support into something like wayland as they can ignore some of the
problems, however since this doesn't solve all the other problems it
means getting to a finished solution is actually less likely to
happen.



I presume that by 50% solution you are referring to Optimus/PRIME use 
case. That case actually consists of two related, but different problems. 
First is "render on node X and display on node Y" and the second is 
"dynamically and hitlessly switch rendering between node X and Y".


I have never claimed that VCRTCs solve the second problem (I could switch 
by restarting Xorg, but I know that this is not the solution you are 
looking for). I fully understand why you want both problems solved at the 
same time. However, I don't understand why solving one first would inhibit 
solving the other.


On the other hand, the Radeon + DisplayLink tandem use case (or in general 
GPU + fbdev tandem) consists only of "render on X, display on Y" problem. 
Here, you will probably say that there one can switch between hardware and 
software rendering and that it also has both problems. That is true, but 
unlike the Optimus/PRIME use case, using fbdev as a display extension to 
GPU is still useful alone. My point is that there is a value in solving 
first one problem and then follow with the other.


I think the crux of the problem is that you are not convinced that the
VCRTCM solution for problem #1 will make solving problem #2 easier and 
maybe you are afraid that it will make it harder. If that's a fair 
statement and if having me create an existence proof for problem #2 that 
still uses VCRTCM will help bring our positions closer, I am perfectly 
willing to do so  I guess I've just signed up for some hacking ;-)


Note that for hitless GPU switching, I fully agree that support must be in 
userspace (you have to swap out paths in Mesa and DDX before even getting 
to kernel), but like I said, that i

Re: [RFC] Virtual CRTCs (proposal + experimental code)

2011-11-24 Thread Ilija Hadzic





So I think we do have enough people interested in this and should be able
to cobble together something that does The Right Thing.



We indeed have a non-trivial set of people interested in the same set of 
problems and each of us has partial and maybe competing solution. I want 
to make it clear that my, maybe disruptive and different from the 
plan-of-record, proposal should not be viewed as destructive or 
distracting. I am just offering to the community what I think is useful.


If this discussion sparks some joint effort that will bring us to the 
solution that everyone is happy with, even if no line of my code is found 
useful, I am perfectly fine with that (and I'll join the effort).


So at this point I think I should put out my back-of-the-napkin 
desiderata. That will hopefully shed some light on where I am coming from 
with VCRTCM proposal.


I want to be able to pull pixels out of the GPU and redirect them to an 
arbitrary device that can do something useful with them. This should not 
be limited to shooting photons into human eyeballs. I want to be able to 
run my applications without having to run X. I'd like the solution to be 
transparent to the application; that is, if I can write an application 
that can render something to a full screen, I want to redirect that 
"screen" to wherever I want without having to rewrite, recompile or relink 
the application. Actually, I want to do that redirection at runtime. I'd 
like to support all of the above in a way that it can also help solve more 
imminent shotcomings of Linux graphics system (Optimus, DisplayLink, etc. 
... cf. previous E-mails in this thread). I'd like it to work with 
multiple render nodes on the same GPU (something like Dave's multiseat 
work, in which both GPU and its display resources are virtual).


The logical consequence of this is that the render node and the display 
node should at some point become logically separate (different driver 
modules) even if they are physically on the same GPU. They are really two 
different subsystems that just happen to reside on the same circuit 
board, so it makes sense to separate them.


I don't think what I am saying is anything unique and what I said probably 
overlaps in good part with what others also want from the graphics 
subsystem. I can see the role of VCRTCM in all of the above, but I am 
open-minded. If we end up with a solution that has nothing to do with 
VCRTCM, I have no emotional ties with my code (and code of my colleagues 
that worked with me so far).


-- Ilija


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

gma500 on drm-next branch, compile problems

2011-11-27 Thread Ilija Hadzic

Dave & Alan:

Maybe I am goofing up something on my end, but gma500 driver on drm-next branch
won't compile for me. I have to apply the two patches that follow this
note to make it work.

The first is a trivial oversight, but the second makes me wonder whether
a stale driver was merged.

-- Ilija

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

1 2 3 4 5 >

1 - 100 of 445 matches

Mail list logo