Breakage in "track dev_mapping in more robust and flexible way"
On Thu, Oct 25, 2012 at 11:10 AM, Thomas Hellstr?m wrote: > On 10/25/12 4:41 PM, Jerome Glisse wrote: >> >> On Thu, Oct 25, 2012 at 04:02:25PM +0200, Thomas Hellstrom wrote: >>> >>> Hi, >>> >>> This commit >>> >>> From 949c4a34afacfe800fc442afac117aba15284962 Mon Sep 17 00:00:00 2001 >>> From: Ilija Hadzic >>> Date: Tue, 15 May 2012 16:40:10 -0400 >>> Subject: [PATCH] drm: track dev_mapping in more robust and flexible way >>> >>> Setting dev_mapping (pointer to the address_space structure >>> used for memory mappings) to the address_space of the first >>> opener's inode and then failing if other openers come in >>> through a different inode has a few restrictions that are >>> eliminated by this patch. >>> >>> If we already have valid dev_mapping and we spot an opener >>> with different i_node, we force its i_mapping pointer to the >>> already established address_space structure (first opener's >>> inode). This will make all mappings from drm device hang off >>> the same address_space object. >>> ... >>> >>> Breaks drivers using TTM, since when the X server calls into the >>> driver open, drm's dev_mapping has not >>> yet been setup. The setup needs to be moved before the driver's open >>> hook is called. >>> >>> Typically, if a TTM-aware driver is provoked by the Xorg server to >>> move a buffer from system to VRAM or AGP, >>> before any other drm client is started, The user-space page table >>> entries are not killed before the move, and left pointing >>> into freed pages, causing system crashes and / or user-space access >>> to arbitrary memory. >> >> Doesn't handle move invalidate the drm file mapping before scheduling >> the move ? > > Yes, but to do that it needs a correct value of bdev::dev_mapping, which is > now incorrectly set on the > *second* open instead of the first open. > So you are implying that in the first open the assignment of dev->dev_mapping is somehow skipped (which could happen if drm_setup returns an error) or that the driver on which you are having problems with (nouveau I presume) needs dev_mapping in the firstopen hook ? It's been a while since I did it, but if my memory serves me well I thought I explicitly verified that dev_mapping was correctly set in the first open (though GPUs I use are primarily AMD). -- Ilija > /Thomas > > > >> Cheers, >> Jerome > > > ___ > dri-devel mailing list > dri-devel at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
Breakage in "track dev_mapping in more robust and flexible way"
Or could the driver that causes the problem be vmwgfx ? I now looked at the code and I notice that indeed vmwgfx sets its private dev_mapping in the open hook, while all other TTM-based drivers (radeon and nouveau) do it when they create an object. -- Ilija
Breakage in "track dev_mapping in more robust and flexible way"
Can you give the attached patch a whirl and let me know if it fixes the problem? As I indicated in my previous note, vmwgfx should be the only affected driver because it looks at dev_mapping in the open hook (others do it when they create an object, so they are not affected). I'll probably revise it (and I'll have some general questions about drm_open syscall) before officially send the patch, but I wanted to get something quickly to you to check if it fixes your problem. I hope that your vmwgfx test environment is such that you can reproduce the original problem. thanks, -- Ilija -- next part --
[PATCH] drm: set dev_mapping before calling drm_open_helper
Some drivers (specifically vmwgfx) look at dev_mapping in their open hook, so we have to set dev->dev_mapping earlier in the process. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_fops.c | 43 +-- 1 files changed, 29 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index 7ef1b67..50b7b47 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp) int minor_id = iminor(inode); struct drm_minor *minor; int retcode = 0; + int need_setup = 0; + struct address_space *old_mapping; minor = idr_find(&drm_minors_idr, minor_id); if (!minor) @@ -132,23 +134,36 @@ int drm_open(struct inode *inode, struct file *filp) if (drm_device_is_unplugged(dev)) return -ENODEV; + if (!dev->open_count++) + need_setup = 1; + mutex_lock(&dev->struct_mutex); + old_mapping = dev->dev_mapping; + if (old_mapping == NULL) + dev->dev_mapping = &inode->i_data; + mutex_unlock(&dev->struct_mutex); + retcode = drm_open_helper(inode, filp, dev); - if (!retcode) { - atomic_inc(&dev->counts[_DRM_STAT_OPENS]); - if (!dev->open_count++) - retcode = drm_setup(dev); - } - if (!retcode) { - mutex_lock(&dev->struct_mutex); - if (dev->dev_mapping == NULL) - dev->dev_mapping = &inode->i_data; - /* ihold ensures nobody can remove inode with our i_data */ - ihold(container_of(dev->dev_mapping, struct inode, i_data)); - inode->i_mapping = dev->dev_mapping; - filp->f_mapping = dev->dev_mapping; - mutex_unlock(&dev->struct_mutex); + if (retcode) + goto err_undo; + atomic_inc(&dev->counts[_DRM_STAT_OPENS]); + if (need_setup) { + retcode = drm_setup(dev); + if (retcode) + goto err_undo; } + /* ihold ensures nobody can remove inode with our i_data */ + mutex_lock(&dev->struct_mutex); + ihold(container_of(dev->dev_mapping, struct inode, i_data)); + inode->i_mapping = dev->dev_mapping; + filp->f_mapping = dev->dev_mapping; + mutex_unlock(&dev->struct_mutex); + return 0; +err_undo: + dev->open_count--; + mutex_lock(&dev->struct_mutex); + dev->dev_mapping = old_mapping; + mutex_unlock(&dev->struct_mutex); return retcode; } EXPORT_SYMBOL(drm_open); -- 1.7.4.1
Breakage in "track dev_mapping in more robust and flexible way"
On Fri, 26 Oct 2012, Thomas Hellstrom wrote: > Hi, > > On 10/25/2012 11:27 PM, Ilija Hadzic wrote: >> >> Can you give the attached patch a whirl and let me know if it fixes the >> problem? >> >> As I indicated in my previous note, vmwgfx should be the only affected >> driver because it looks at dev_mapping in the open hook (others do it when >> they create an object, so they are not affected). >> >> I'll probably revise it (and I'll have some general questions about >> drm_open syscall) before officially send the patch, but I wanted to get >> something quickly to you to check if it fixes your problem. I hope that >> your vmwgfx test environment is such that you can reproduce the original >> problem. >> >> thanks, >> >> -- Ilija > > Yes, it appears like this patch fixes the problem. It'd be good to have it in > 3.7 (drm-fixes) with a cc to stable. > OK great. Thanks for testing. Before I send out an "official" patch, I have a few questions for those who have been around longer and can possibly reflect better than me on the history of drm_open syscall. Currently, before touching dev->dev_mapping field we grab dev->struct mutex. This has been introduced by Dave Airlie a long time ago in a2c0a97b784f837300f7b0869c82ab712c600952. I tried to preserve that in all patches where I touched dev_open, but looking at the code I don't think the mutex is necessary. Namely, drm_open is only set in drm_open, and concurrent openers are protected with drm_global_mutex. Other places (drivers) where dev->dev_mapping is accessed is read-only and dev_mapping is written at first open when there are no file descriptors around to issue any other call. Also, it doesn't look to me that any driver locks dev->struct_mutex before accessing dev->dev_mapping anyway. So I am thinking of dropping the mutex completely, but I would like to hear a second thought. The other issue, I noticed is that of the drm_setup() call fails, the open_count counter would remain incremented and I think we need to restore it back (or if I am missing something, would someone please enlighten me). This was also in the kernel all this time (and I have not noticed until now), so I "smuggled" that fix in the patch that I sent you. However, wonder if I should cut the separate patch for open_count fix. Actually, I think that I should cut three patches: one to drop the mutex, one to fix the open_count and one to fix your problem with dev_mapping and that probably all three should CC stable. Before I do that, I'd like to hear opinions of others. thanks, Ilija
[PATCH 1/2] drm: restore open_count if drm_setup fails
If drm_setup (called at first open) fails, the whole open call has failed, so we should not keep the open_count incremented. Signed-off-by: Ilija Hadzic Cc: stable at vger.kernel.org --- drivers/gpu/drm/drm_fops.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index 7ef1b67..af68eca 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -135,8 +135,11 @@ int drm_open(struct inode *inode, struct file *filp) retcode = drm_open_helper(inode, filp, dev); if (!retcode) { atomic_inc(&dev->counts[_DRM_STAT_OPENS]); - if (!dev->open_count++) + if (!dev->open_count++) { retcode = drm_setup(dev); + if (retcode) + dev->open_count--; + } } if (!retcode) { mutex_lock(&dev->struct_mutex); -- 1.7.12.4
[PATCH 2/2] drm: set dev_mapping before calling drm_open_helper
Some drivers (specifically vmwgfx) look at dev_mapping in their open hook, so we have to set dev->dev_mapping earlier in the process. Reference: http://lists.freedesktop.org/archives/dri-devel/2012-October/029420.html Signed-off-by: Ilija Hadzic Reported-by: Thomas Hellstrom Cc: stable at vger.kernel.org --- drivers/gpu/drm/drm_fops.c | 47 +- 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index af68eca..133b413 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp) int minor_id = iminor(inode); struct drm_minor *minor; int retcode = 0; + int need_setup = 0; + struct address_space *old_mapping; minor = idr_find(&drm_minors_idr, minor_id); if (!minor) @@ -132,26 +134,37 @@ int drm_open(struct inode *inode, struct file *filp) if (drm_device_is_unplugged(dev)) return -ENODEV; + if (!dev->open_count++) + need_setup = 1; + mutex_lock(&dev->struct_mutex); + old_mapping = dev->dev_mapping; + if (old_mapping == NULL) + dev->dev_mapping = &inode->i_data; + /* ihold ensures nobody can remove inode with our i_data */ + ihold(container_of(dev->dev_mapping, struct inode, i_data)); + inode->i_mapping = dev->dev_mapping; + filp->f_mapping = dev->dev_mapping; + mutex_unlock(&dev->struct_mutex); + retcode = drm_open_helper(inode, filp, dev); - if (!retcode) { - atomic_inc(&dev->counts[_DRM_STAT_OPENS]); - if (!dev->open_count++) { - retcode = drm_setup(dev); - if (retcode) - dev->open_count--; - } - } - if (!retcode) { - mutex_lock(&dev->struct_mutex); - if (dev->dev_mapping == NULL) - dev->dev_mapping = &inode->i_data; - /* ihold ensures nobody can remove inode with our i_data */ - ihold(container_of(dev->dev_mapping, struct inode, i_data)); - inode->i_mapping = dev->dev_mapping; - filp->f_mapping = dev->dev_mapping; - mutex_unlock(&dev->struct_mutex); + if (retcode) + goto err_undo; + atomic_inc(&dev->counts[_DRM_STAT_OPENS]); + if (need_setup) { + retcode = drm_setup(dev); + if (retcode) + goto err_undo; } + return 0; +err_undo: + mutex_lock(&dev->struct_mutex); + filp->f_mapping = old_mapping; + inode->i_mapping = old_mapping; + iput(container_of(dev->dev_mapping, struct inode, i_data)); + dev->dev_mapping = old_mapping; + mutex_unlock(&dev->struct_mutex); + dev->open_count--; return retcode; } EXPORT_SYMBOL(drm_open); -- 1.7.12.4
glxgears frame rate when DPMS is in "off" state
Hi everyone, With relatively recent versions of AMD/ATI DDX (xf86-video-ati library), I have noticed a behavior related to DPMS that looks incorrect to me. Namely, if I run glxgears, the reported frame rate is equal to that of the monitor refresh rate, which is correct. Now if I enter DPMS "off" state, wait a while, and then exit it (back to DPMS "on"), I see that while in "off" mode the frame rate was up in the thousands. Consequently, the CPU utilization went up to 100% (split about 50%/50% between X and and glxgears process). I have traced the problem to DDX and here are some findings. Now, thinking about how to fix it (elaborated later), I realize that I am not sure what would be the conceptually correct thing to do to fix this. Here is how the problem happens: * Screen is put into DPMS "off" mode. * The application requests the buffer-swap and X eventually ends up in radeon_dri2_schedule_swap. * radeon_dri2_schedule_swap tries to determine the CRTC by calling radeon_dri2_drawable_crtc which further leads into radeon_pick_best_crtc * In radeon_pick_best_crtc, no CRTC is found because CRTCs are either unused by the affected drawable or the only right candidate CRTC is skipped by this check (radeon_crtc_is_enabled looks explicitly at DPMS state): if (!radeon_crtc_is_enabled(crtc)) continue; * Consequently, radeon_pick_best_crtc returns -1 to radeon_dri2_schedule_swap, which decides that it can't do the vblank wait and jumps to blit_fallback label. * blit_fallback does its thing, achieving the effect of swap, but now there is no pacing. It returns immediatelly and application proceeds with rendering the next frame without any pause. * As a consequence, we get a glxgears and X to run at maximum speed allowed by the CPU and GPU combined. Now, the reason DPMS exists is to conserve power, but it doesn't make much sense to conserve power through monitor shutoff if we will eat up much more power by thrashing the processor and the GPU. One quick fix that came into my mind is to replace the 'if' in radeon_pick_best_crtc with something like this: if (!crtc->enabled) continue; (whether by design or by accident, crtc in DPMS "off" state is still enabled as far as that flag is concerned). However, that will introduce the regression with regard to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=49761 (which is the reason the above check was originally added). Another possibility would be to enforce some maximum rate per-drawable (using sleep for example) when radeon_dri2_schedule_swap decides to take the blit_fallback path. However, I don't personally like it and I have a gut feeling that sleeping in shedule_swap would probably do harm somewhere else. Also, there may be applications that would want to render an animated scene off-screen at maximum speed (e.g., off-line rendering) and radeon_dri2_schedule_swap has no way of telling whether crtc is -1 because the application wants it that way or because the associated crtc is in power-savings mode. Clearly, the behavior that we have now is wrong from the power-savings perspective (i.e., it completely defeats the purpose of DPMS), but before I try to hack up the fix, I would like to hear some suggestions on what the "right" thing to do would be. thanks, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
radeon CS parser refactoring
The following set of patches refactor the CS-parser logic in an effort to consolidate the code that is repeated across ASIC-specific files. All patches except #8 are function-neutral, that is they preserve the existing functionality of the driver. Patch #8 adds one extra check for WAIT_REG_MEM packet that I believe should be there. I have been running with these patches for about a month on several machines with Evergreen and NI GPUs without noticing any regressions. I also ran quick tests on all Radeon GPUs that I had around, which range from R300 through NI cards. The patches have been rebased to the current state of Dave's drm-next branch. regards, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 01/12] drm/radeon: remove unecessary assignment
length_dw field was assigned twice. While at it, move user_ptr assignment together with all other assignments to p->chunks[i] structure to make the code more readable. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_cs.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 396baba0..4be64e0 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -203,7 +203,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) p->chunks[i].length_dw = user_chunk.length_dw; p->chunks[i].kdata = NULL; p->chunks[i].chunk_id = user_chunk.chunk_id; - + p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data; if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) { p->chunk_relocs_idx = i; } @@ -226,9 +226,6 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) return -EINVAL; } - p->chunks[i].length_dw = user_chunk.length_dw; - p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data; - cdata = (uint32_t *)(unsigned long)user_chunk.chunk_data; if ((p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) || (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS)) { -- 1.7.12 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 02/12] drm/radeon: remove unused prototype from radeon_cs
Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_cs.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 4be64e0..8b03f1c 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -29,9 +29,6 @@ #include "radeon_reg.h" #include "radeon.h" -void r100_cs_dump_packet(struct radeon_cs_parser *p, -struct radeon_cs_packet *pkt); - static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) { struct drm_device *ddev = p->rdev->ddev; -- 1.7.12 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 03/12] drm/radeon: fix formatting
Preparatory patch: patches to follow will touch a piece of code that had broken indentication, so fix it before touching it. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r100.c | 45 +-- 1 file changed, 22 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 8ff7cac..40c0318 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -2106,31 +2106,30 @@ int r100_cs_parse(struct radeon_cs_parser *p) } p->idx += pkt.count + 2; switch (pkt.type) { - case PACKET_TYPE0: - if (p->rdev->family >= CHIP_R200) - r = r100_cs_parse_packet0(p, &pkt, - p->rdev->config.r100.reg_safe_bm, - p->rdev->config.r100.reg_safe_bm_size, - &r200_packet0_check); - else - r = r100_cs_parse_packet0(p, &pkt, - p->rdev->config.r100.reg_safe_bm, - p->rdev->config.r100.reg_safe_bm_size, - &r100_packet0_check); - break; - case PACKET_TYPE2: - break; - case PACKET_TYPE3: - r = r100_packet3_check(p, &pkt); - break; - default: - DRM_ERROR("Unknown packet type %d !\n", - pkt.type); - return -EINVAL; + case PACKET_TYPE0: + if (p->rdev->family >= CHIP_R200) + r = r100_cs_parse_packet0(p, &pkt, + p->rdev->config.r100.reg_safe_bm, + p->rdev->config.r100.reg_safe_bm_size, + &r200_packet0_check); + else + r = r100_cs_parse_packet0(p, &pkt, + p->rdev->config.r100.reg_safe_bm, + p->rdev->config.r100.reg_safe_bm_size, + &r100_packet0_check); + break; + case PACKET_TYPE2: + break; + case PACKET_TYPE3: + r = r100_packet3_check(p, &pkt); + break; + default: + DRM_ERROR("Unknown packet type %d !\n", + pkt.type); + return -EINVAL; } - if (r) { + if (r) return r; - } } while (p->idx < p->chunks[p->chunk_ib_idx].length_dw); return 0; } -- 1.7.12 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 04/12] drm/radeon: implement common cs packet parse function
CS packet parse functions have a lot of in common across all ASICs. Implement a common function and take care of small differences between families inside the function. This patch is a prep for major refactoring and consolidation of CS parsing code. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_cs.c | 53 + drivers/gpu/drm/radeon/radeon_reg.h | 11 2 files changed, 64 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 8b03f1c..95eb673 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -639,3 +639,56 @@ u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx) idx_value = ibc->kpage[new_page][pg_offset/4]; return idx_value; } + +/** + * radeon_cs_packet_parse() - parse cp packet and point ib index to next packet + * @parser:parser structure holding parsing context. + * @pkt: where to store packet information + * + * Assume that chunk_ib_index is properly set. Will return -EINVAL + * if packet is bigger than remaining ib size. or if packets is unknown. + **/ +int radeon_cs_packet_parse(struct radeon_cs_parser *p, + struct radeon_cs_packet *pkt, + unsigned idx) +{ + struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx]; + struct radeon_device *rdev = p->rdev; + uint32_t header; + + if (idx >= ib_chunk->length_dw) { + DRM_ERROR("Can not parse packet at %d after CS end %d !\n", + idx, ib_chunk->length_dw); + return -EINVAL; + } + header = radeon_get_ib_value(p, idx); + pkt->idx = idx; + pkt->type = RADEON_CP_PACKET_GET_TYPE(header); + pkt->count = RADEON_CP_PACKET_GET_COUNT(header); + pkt->one_reg_wr = 0; + switch (pkt->type) { + case RADEON_PACKET_TYPE0: + if (rdev->family < CHIP_R600) { + pkt->reg = R100_CP_PACKET0_GET_REG(header); + pkt->one_reg_wr = + RADEON_CP_PACKET0_GET_ONE_REG_WR(header); + } else + pkt->reg = R600_CP_PACKET0_GET_REG(header); + break; + case RADEON_PACKET_TYPE3: + pkt->opcode = RADEON_CP_PACKET3_GET_OPCODE(header); + break; + case RADEON_PACKET_TYPE2: + pkt->count = -1; + break; + default: + DRM_ERROR("Unknown packet type %d at %d !\n", pkt->type, idx); + return -EINVAL; + } + if ((pkt->count + 1 + pkt->idx) >= ib_chunk->length_dw) { + DRM_ERROR("Packet (%d:%d:%d) end after CS buffer (%d) !\n", + pkt->idx, pkt->type, pkt->count, ib_chunk->length_dw); + return -EINVAL; + } + return 0; +} diff --git a/drivers/gpu/drm/radeon/radeon_reg.h b/drivers/gpu/drm/radeon/radeon_reg.h index 5d8f735..d9e4204 100644 --- a/drivers/gpu/drm/radeon/radeon_reg.h +++ b/drivers/gpu/drm/radeon/radeon_reg.h @@ -3706,4 +3706,15 @@ #define RV530_GB_PIPE_SELECT2 0x4124 +#define RADEON_CP_PACKET_GET_TYPE(h) (((h) >> 30) & 3) +#define RADEON_CP_PACKET_GET_COUNT(h) (((h) >> 16) & 0x3FFF) +#define RADEON_CP_PACKET0_GET_ONE_REG_WR(h) (((h) >> 15) & 1) +#define RADEON_CP_PACKET3_GET_OPCODE(h) (((h) >> 8) & 0xFF) +#define R100_CP_PACKET0_GET_REG(h) (((h) & 0x1FFF) << 2) +#define R600_CP_PACKET0_GET_REG(h) (((h) & 0x) << 2) +#define RADEON_PACKET_TYPE0 0 +#define RADEON_PACKET_TYPE1 1 +#define RADEON_PACKET_TYPE2 2 +#define RADEON_PACKET_TYPE3 3 + #endif -- 1.7.12 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 05/12] drm/radeon: use common cs packet parse function
We now have a common radeon_cs_packet_parse function that is good for all ASICs. Hook it up and eliminate ASIC-specific versions. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen_cs.c | 57 +++-- drivers/gpu/drm/radeon/r100.c | 55 +++- drivers/gpu/drm/radeon/r300.c | 2 +- drivers/gpu/drm/radeon/r600_cs.c | 59 --- drivers/gpu/drm/radeon/radeon.h | 4 +++ 5 files changed, 20 insertions(+), 157 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 7a44566..1ba4ca3 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1009,53 +1009,6 @@ static int evergreen_cs_track_check(struct radeon_cs_parser *p) } /** - * evergreen_cs_packet_parse() - parse cp packet and point ib index to next packet - * @parser:parser structure holding parsing context. - * @pkt: where to store packet informations - * - * Assume that chunk_ib_index is properly set. Will return -EINVAL - * if packet is bigger than remaining ib size. or if packets is unknown. - **/ -static int evergreen_cs_packet_parse(struct radeon_cs_parser *p, - struct radeon_cs_packet *pkt, - unsigned idx) -{ - struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx]; - uint32_t header; - - if (idx >= ib_chunk->length_dw) { - DRM_ERROR("Can not parse packet at %d after CS end %d !\n", - idx, ib_chunk->length_dw); - return -EINVAL; - } - header = radeon_get_ib_value(p, idx); - pkt->idx = idx; - pkt->type = CP_PACKET_GET_TYPE(header); - pkt->count = CP_PACKET_GET_COUNT(header); - pkt->one_reg_wr = 0; - switch (pkt->type) { - case PACKET_TYPE0: - pkt->reg = CP_PACKET0_GET_REG(header); - break; - case PACKET_TYPE3: - pkt->opcode = CP_PACKET3_GET_OPCODE(header); - break; - case PACKET_TYPE2: - pkt->count = -1; - break; - default: - DRM_ERROR("Unknown packet type %d at %d !\n", pkt->type, idx); - return -EINVAL; - } - if ((pkt->count + 1 + pkt->idx) >= ib_chunk->length_dw) { - DRM_ERROR("Packet (%d:%d:%d) end after CS buffer (%d) !\n", - pkt->idx, pkt->type, pkt->count, ib_chunk->length_dw); - return -EINVAL; - } - return 0; -} - -/** * evergreen_cs_packet_next_reloc() - parse next packet which should be reloc packet3 * @parser:parser structure holding parsing context. * @data: pointer to relocation data @@ -1080,7 +1033,7 @@ static int evergreen_cs_packet_next_reloc(struct radeon_cs_parser *p, } *cs_reloc = NULL; relocs_chunk = &p->chunks[p->chunk_relocs_idx]; - r = evergreen_cs_packet_parse(p, &p3reloc, p->idx); + r = radeon_cs_packet_parse(p, &p3reloc, p->idx); if (r) { return r; } @@ -1112,7 +1065,7 @@ static bool evergreen_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p) struct radeon_cs_packet p3reloc; int r; - r = evergreen_cs_packet_parse(p, &p3reloc, p->idx); + r = radeon_cs_packet_parse(p, &p3reloc, p->idx); if (r) { return false; } @@ -1150,7 +1103,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) ib = p->ib.ptr; /* parse the WAIT_REG_MEM */ - r = evergreen_cs_packet_parse(p, &wait_reg_mem, p->idx); + r = radeon_cs_packet_parse(p, &wait_reg_mem, p->idx); if (r) return r; @@ -1183,7 +1136,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) } /* jump over the NOP */ - r = evergreen_cs_packet_parse(p, &p3reloc, p->idx + wait_reg_mem.count + 2); + r = radeon_cs_packet_parse(p, &p3reloc, p->idx + wait_reg_mem.count + 2); if (r) return r; @@ -2819,7 +2772,7 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) p->track = track; } do { - r = evergreen_cs_packet_parse(p, &pkt, p->idx); + r = radeon_cs_packet_parse(p, &pkt, p->idx); if (r) { kfree(p->track); p->track = NULL; diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 40c0318..7842447 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -1370,53 +1370,6 @@ void r100_cs_dump_packet(struct
[PATCH 06/12] drm/radeon: factor out cs_next_is_pkt3_nop function
Once we factored out radeon_cs_packet_parse function, evergreen_cs_next_is_pkt3_nop and r600_cs_next_is_pkt3_nop functions became identical, so they can be factored out into a common function. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen_cs.c | 23 +-- drivers/gpu/drm/radeon/r600_cs.c | 30 -- drivers/gpu/drm/radeon/radeon.h | 2 ++ drivers/gpu/drm/radeon/radeon_cs.c| 21 + drivers/gpu/drm/radeon/radeon_reg.h | 2 ++ 5 files changed, 30 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 1ba4ca3..883b9f7 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1055,27 +1055,6 @@ static int evergreen_cs_packet_next_reloc(struct radeon_cs_parser *p, } /** - * evergreen_cs_packet_next_is_pkt3_nop() - test if the next packet is NOP - * @p: structure holding the parser context. - * - * Check if the next packet is a relocation packet3. - **/ -static bool evergreen_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p) -{ - struct radeon_cs_packet p3reloc; - int r; - - r = radeon_cs_packet_parse(p, &p3reloc, p->idx); - if (r) { - return false; - } - if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) { - return false; - } - return true; -} - -/** * evergreen_cs_packet_next_vline() - parse userspace VLINE packet * @parser:parser structure holding parsing context. * @@ -2464,7 +2443,7 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, if ((tex_dim == SQ_TEX_DIM_2D_MSAA || tex_dim == SQ_TEX_DIM_2D_ARRAY_MSAA) && !mip_address && - !evergreen_cs_packet_next_is_pkt3_nop(p)) { + !radeon_cs_packet_next_is_pkt3_nop(p)) { /* MIP_ADDRESS should point to FMASK for an MSAA texture. * It should be 0 if FMASK is disabled. */ moffset = 0; diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c index 1717fec..e2fb76a 100644 --- a/drivers/gpu/drm/radeon/r600_cs.c +++ b/drivers/gpu/drm/radeon/r600_cs.c @@ -877,28 +877,6 @@ static int r600_cs_packet_next_reloc_nomm(struct radeon_cs_parser *p, } /** - * r600_cs_packet_next_is_pkt3_nop() - test if next packet is packet3 nop for reloc - * @parser:parser structure holding parsing context. - * - * Check next packet is relocation packet3, do bo validation and compute - * GPU offset using the provided start. - **/ -static int r600_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p) -{ - struct radeon_cs_packet p3reloc; - int r; - - r = radeon_cs_packet_parse(p, &p3reloc, p->idx); - if (r) { - return 0; - } - if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) { - return 0; - } - return 1; -} - -/** * r600_cs_packet_next_vline() - parse userspace VLINE packet * @parser:parser structure holding parsing context. * @@ -1108,7 +1086,7 @@ static int r600_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) break; case R_028010_DB_DEPTH_INFO: if (!(p->cs_flags & RADEON_CS_KEEP_TILING_FLAGS) && - r600_cs_packet_next_is_pkt3_nop(p)) { + radeon_cs_packet_next_is_pkt3_nop(p)) { r = r600_cs_packet_next_reloc(p, &reloc); if (r) { dev_warn(p->dev, "bad SET_CONTEXT_REG " @@ -1209,7 +1187,7 @@ static int r600_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) case R_0280B8_CB_COLOR6_INFO: case R_0280BC_CB_COLOR7_INFO: if (!(p->cs_flags & RADEON_CS_KEEP_TILING_FLAGS) && -r600_cs_packet_next_is_pkt3_nop(p)) { +radeon_cs_packet_next_is_pkt3_nop(p)) { r = r600_cs_packet_next_reloc(p, &reloc); if (r) { dev_err(p->dev, "bad SET_CONTEXT_REG 0x%04X\n", reg); @@ -1273,7 +1251,7 @@ static int r600_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) case R_0280F8_CB_COLOR6_FRAG: case R_0280FC_CB_COLOR7_FRAG: tmp = (reg - R_0280E0_CB_COLOR0_FRAG) / 4; - if (!r600_cs_packet_next_is_pkt3_nop(p)) { + if (!radeon_cs_packet_next_is_pkt3_nop(p)) { if (!track->cb_color_base_last[tmp]) { dev_err(
[PATCH 07/12] drm/radeon: refactor vline packet parsing function
vline packet parsing function for R600 and Evergreen+ are the same, except that they use different registers. Factor out the algorithm into a common function that uses register table passed from ASIC-specific caller. This reduces ASIC-specific function to (trivial) setup of register table and call into the common function. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen_cs.c | 120 +++--- drivers/gpu/drm/radeon/r600_cs.c | 61 +++-- drivers/gpu/drm/radeon/radeon.h | 4 +- drivers/gpu/drm/radeon/radeon_reg.h | 2 + 4 files changed, 70 insertions(+), 117 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 883b9f7..2690532 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1055,109 +1055,35 @@ static int evergreen_cs_packet_next_reloc(struct radeon_cs_parser *p, } /** - * evergreen_cs_packet_next_vline() - parse userspace VLINE packet + * evergreen_cs_packet_parse_vline() - parse userspace VLINE packet * @parser:parser structure holding parsing context. * - * Userspace sends a special sequence for VLINE waits. - * PACKET0 - VLINE_START_END + value - * PACKET3 - WAIT_REG_MEM poll vline status reg - * RELOC (P3) - crtc_id in reloc. - * - * This function parses this and relocates the VLINE START END - * and WAIT_REG_MEM packets to the correct crtc. - * It also detects a switched off crtc and nulls out the - * wait in that case. + * This is an Evergreen(+)-specific function for parsing VLINE packets. + * Real work is done by r600_cs_common_vline_parse function. + * Here we just set up ASIC-specific register table and call + * the common implementation function. */ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) { - struct drm_mode_object *obj; - struct drm_crtc *crtc; - struct radeon_crtc *radeon_crtc; - struct radeon_cs_packet p3reloc, wait_reg_mem; - int crtc_id; - int r; - uint32_t header, h_idx, reg, wait_reg_mem_info; - volatile uint32_t *ib; - - ib = p->ib.ptr; - - /* parse the WAIT_REG_MEM */ - r = radeon_cs_packet_parse(p, &wait_reg_mem, p->idx); - if (r) - return r; - - /* check its a WAIT_REG_MEM */ - if (wait_reg_mem.type != PACKET_TYPE3 || - wait_reg_mem.opcode != PACKET3_WAIT_REG_MEM) { - DRM_ERROR("vline wait missing WAIT_REG_MEM segment\n"); - return -EINVAL; - } - - wait_reg_mem_info = radeon_get_ib_value(p, wait_reg_mem.idx + 1); - /* bit 4 is reg (0) or mem (1) */ - if (wait_reg_mem_info & 0x10) { - DRM_ERROR("vline WAIT_REG_MEM waiting on MEM rather than REG\n"); - return -EINVAL; - } - /* waiting for value to be equal */ - if ((wait_reg_mem_info & 0x7) != 0x3) { - DRM_ERROR("vline WAIT_REG_MEM function not equal\n"); - return -EINVAL; - } - if ((radeon_get_ib_value(p, wait_reg_mem.idx + 2) << 2) != EVERGREEN_VLINE_STATUS) { - DRM_ERROR("vline WAIT_REG_MEM bad reg\n"); - return -EINVAL; - } - if (radeon_get_ib_value(p, wait_reg_mem.idx + 5) != EVERGREEN_VLINE_STAT) { - DRM_ERROR("vline WAIT_REG_MEM bad bit mask\n"); - return -EINVAL; - } - - /* jump over the NOP */ - r = radeon_cs_packet_parse(p, &p3reloc, p->idx + wait_reg_mem.count + 2); - if (r) - return r; - - h_idx = p->idx - 2; - p->idx += wait_reg_mem.count + 2; - p->idx += p3reloc.count + 2; - - header = radeon_get_ib_value(p, h_idx); - crtc_id = radeon_get_ib_value(p, h_idx + 2 + 7 + 1); - reg = CP_PACKET0_GET_REG(header); - obj = drm_mode_object_find(p->rdev->ddev, crtc_id, DRM_MODE_OBJECT_CRTC); - if (!obj) { - DRM_ERROR("cannot find crtc %d\n", crtc_id); - return -EINVAL; - } - crtc = obj_to_crtc(obj); - radeon_crtc = to_radeon_crtc(crtc); - crtc_id = radeon_crtc->crtc_id; - - if (!crtc->enabled) { - /* if the CRTC isn't enabled - we need to nop out the WAIT_REG_MEM */ - ib[h_idx + 2] = PACKET2(0); - ib[h_idx + 3] = PACKET2(0); - ib[h_idx + 4] = PACKET2(0); - ib[h_idx + 5] = PACKET2(0); - ib[h_idx + 6] = PACKET2(0); - ib[h_idx + 7] = PACKET2(0); - ib[h_idx + 8] = PACKET2(0); - } else { - switch (reg) { - case EVERGREEN_VLINE_START_END: - header &= ~R600_CP_PACKET0_REG_MASK; - header |= (EVERGREEN_VLINE_START_END + radeon_crtc->cr
[PATCH 08/12] drm/radeon: add a check to wait_reg_mem command
WAIT_REG_MEM on register does not allow the use of PFP. Enforce this restriction when checking packets sent from userland. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen_cs.c | 3 +++ drivers/gpu/drm/radeon/r600_cs.c | 8 2 files changed, 11 insertions(+) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 2690532..02aeb7f 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -2101,6 +2101,9 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, ib[idx+1] = (ib[idx+1] & 0x3) | (offset & 0xfffc); ib[idx+2] = upper_32_bits(offset) & 0xff; + } else if (idx_value & 0x100) { + DRM_ERROR("cannot use PFP on REG wait\n"); + return -EINVAL; } break; case PACKET3_CP_DMA: diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c index 5d59275..d64432b 100644 --- a/drivers/gpu/drm/radeon/r600_cs.c +++ b/drivers/gpu/drm/radeon/r600_cs.c @@ -949,6 +949,11 @@ int r600_cs_common_vline_parse(struct radeon_cs_parser *p, DRM_ERROR("vline WAIT_REG_MEM waiting on MEM instead of REG\n"); return -EINVAL; } + /* bit 8 is me (0) or pfp (1) */ + if (wait_reg_mem_info & 0x100) { + DRM_ERROR("vline WAIT_REG_MEM waiting on PFP instead of ME\n"); + return -EINVAL; + } /* waiting for value to be equal */ if ((wait_reg_mem_info & 0x7) != 0x3) { DRM_ERROR("vline WAIT_REG_MEM function not equal\n"); @@ -1847,6 +1852,9 @@ static int r600_packet3_check(struct radeon_cs_parser *p, ib[idx+1] = (ib[idx+1] & 0x3) | (offset & 0xfff0); ib[idx+2] = upper_32_bits(offset) & 0xff; + } else if (idx_value & 0x100) { + DRM_ERROR("cannot use PFP on REG wait\n"); + return -EINVAL; } break; case PACKET3_CP_DMA: -- 1.7.12 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 09/12] drm/radeon: rename r100_cs_dump_packet to radeon_cs_dump_packet
This function is not limited to r100, but it can dump a (raw) packet for any ASIC. Rename it accordingly and move its declaration to radeon.h Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r100.c | 52 ++--- drivers/gpu/drm/radeon/r100_track.h | 2 -- drivers/gpu/drm/radeon/r200.c | 14 +- drivers/gpu/drm/radeon/r300.c | 18 ++--- drivers/gpu/drm/radeon/radeon.h | 2 ++ drivers/gpu/drm/radeon/radeon_cs.c | 21 +++ 6 files changed, 58 insertions(+), 51 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 7842447..cf7e359 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -1219,7 +1219,7 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p, if (r) { DRM_ERROR("No reloc for ib[%d]=0x%04X\n", idx, reg); - r100_cs_dump_packet(p, pkt); + radeon_cs_dump_packet(p, pkt); return r; } @@ -1233,7 +1233,7 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p, if (reloc->lobj.tiling_flags & RADEON_TILING_MICRO) { if (reg == RADEON_SRC_PITCH_OFFSET) { DRM_ERROR("Cannot src blit from microtiled surface\n"); - r100_cs_dump_packet(p, pkt); + radeon_cs_dump_packet(p, pkt); return -EINVAL; } tile_flags |= RADEON_DST_TILE_MICRO; @@ -1263,7 +1263,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, if (c > 16) { DRM_ERROR("Only 16 vertex buffers are allowed %d\n", pkt->opcode); - r100_cs_dump_packet(p, pkt); + radeon_cs_dump_packet(p, pkt); return -EINVAL; } track->num_arrays = c; @@ -1272,7 +1272,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, if (r) { DRM_ERROR("No reloc for packet3 %d\n", pkt->opcode); - r100_cs_dump_packet(p, pkt); + radeon_cs_dump_packet(p, pkt); return r; } idx_value = radeon_get_ib_value(p, idx); @@ -1285,7 +1285,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, if (r) { DRM_ERROR("No reloc for packet3 %d\n", pkt->opcode); - r100_cs_dump_packet(p, pkt); + radeon_cs_dump_packet(p, pkt); return r; } ib[idx+2] = radeon_get_ib_value(p, idx + 2) + ((u32)reloc->lobj.gpu_offset); @@ -1298,7 +1298,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, if (r) { DRM_ERROR("No reloc for packet3 %d\n", pkt->opcode); - r100_cs_dump_packet(p, pkt); + radeon_cs_dump_packet(p, pkt); return r; } idx_value = radeon_get_ib_value(p, idx); @@ -1355,20 +1355,6 @@ int r100_cs_parse_packet0(struct radeon_cs_parser *p, return 0; } -void r100_cs_dump_packet(struct radeon_cs_parser *p, -struct radeon_cs_packet *pkt) -{ - volatile uint32_t *ib; - unsigned i; - unsigned idx; - - ib = p->ib.ptr; - idx = pkt->idx; - for (i = 0; i <= (pkt->count + 1); i++, idx++) { - DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]); - } -} - /** * r100_cs_packet_next_vline() - parse userspace VLINE packet * @parser:parser structure holding parsing context. @@ -1492,14 +1478,14 @@ int r100_cs_packet_next_reloc(struct radeon_cs_parser *p, if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) { DRM_ERROR("No packet3 for relocation for packet at %d.\n", p3reloc.idx); - r100_cs_dump_packet(p, &p3reloc); + radeon_cs_dump_packet(p, &p3reloc); return -EINVAL; } idx = radeon_get_ib_value(p, p3reloc.idx + 1); if (idx >= relocs_chunk->length_dw) { DRM_ERROR("Relocs at %d after relocations chunk end %d !\n", idx, relocs_chunk->length_dw); - r100_cs_dump_packet(p, &p3reloc); + radeon_cs_dump_packet(p, &p3reloc); return -EINVAL; } /* FIXME: we assume reloc size is 4 dwords */ @@ -1584,7 +1570,7 @@ static int r100_packet0_check(struct
[PATCH 10/12] drm/radeon: pull out common next_reloc function
next_reloc function does the same thing in all ASICs with the exception of R600 which has a special case in legacy mode. Pull out the common function in preparation for refactoring. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon.h| 3 +++ drivers/gpu/drm/radeon/radeon_cs.c | 54 ++ 2 files changed, 57 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index a0e28dc..dd8feab 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -1972,6 +1972,9 @@ int radeon_cs_packet_parse(struct radeon_cs_parser *p, bool radeon_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p); void radeon_cs_dump_packet(struct radeon_cs_parser *p, struct radeon_cs_packet *pkt); +int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p, + struct radeon_cs_reloc **cs_reloc, + int nomm); int r600_cs_common_vline_parse(struct radeon_cs_parser *p, uint32_t *vline_start_end, uint32_t *vline_status); diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 2ef7e81..5c3d773 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -734,3 +734,57 @@ void radeon_cs_dump_packet(struct radeon_cs_parser *p, DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]); } +/** + * radeon_cs_packet_next_reloc() - parse next (should be reloc) packet + * @parser:parser structure holding parsing context. + * @data: pointer to relocation data + * @offset_start: starting offset + * @offset_mask: offset mask (to align start offset on) + * @reloc: reloc informations + * + * Check if next packet is relocation packet3, do bo validation and compute + * GPU offset using the provided start. + **/ +int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p, + struct radeon_cs_reloc **cs_reloc, + int nomm) +{ + struct radeon_cs_chunk *relocs_chunk; + struct radeon_cs_packet p3reloc; + unsigned idx; + int r; + + if (p->chunk_relocs_idx == -1) { + DRM_ERROR("No relocation chunk !\n"); + return -EINVAL; + } + *cs_reloc = NULL; + relocs_chunk = &p->chunks[p->chunk_relocs_idx]; + r = radeon_cs_packet_parse(p, &p3reloc, p->idx); + if (r) + return r; + p->idx += p3reloc.count + 2; + if (p3reloc.type != RADEON_PACKET_TYPE3 || + p3reloc.opcode != RADEON_PACKET3_NOP) { + DRM_ERROR("No packet3 for relocation for packet at %d.\n", + p3reloc.idx); + radeon_cs_dump_packet(p, &p3reloc); + return -EINVAL; + } + idx = radeon_get_ib_value(p, p3reloc.idx + 1); + if (idx >= relocs_chunk->length_dw) { + DRM_ERROR("Relocs at %d after relocations chunk end %d !\n", + idx, relocs_chunk->length_dw); + radeon_cs_dump_packet(p, &p3reloc); + return -EINVAL; + } + /* FIXME: we assume reloc size is 4 dwords */ + if (nomm) { + *cs_reloc = p->relocs; + (*cs_reloc)->lobj.gpu_offset = + (u64)relocs_chunk->kdata[idx + 3] << 32; + (*cs_reloc)->lobj.gpu_offset |= relocs_chunk->kdata[idx + 0]; + } else + *cs_reloc = p->relocs_ptr[(idx / 4)]; + return 0; +} -- 1.7.12 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 11/12] drm/radeon: use common next_reloc function
This patch eliminates ASIC-specific ***_cs_packet_next_reloc functions and hooks up the new common function. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen_cs.c | 129 +-- drivers/gpu/drm/radeon/r100.c | 76 +++- drivers/gpu/drm/radeon/r100_track.h | 2 - drivers/gpu/drm/radeon/r200.c | 12 +-- drivers/gpu/drm/radeon/r300.c | 16 ++-- drivers/gpu/drm/radeon/r600_cs.c | 158 +++--- 6 files changed, 98 insertions(+), 295 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 02aeb7f..211b509 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -36,9 +36,6 @@ int r600_dma_cs_next_reloc(struct radeon_cs_parser *p, struct radeon_cs_reloc **cs_reloc); -static int evergreen_cs_packet_next_reloc(struct radeon_cs_parser *p, - struct radeon_cs_reloc **cs_reloc); - struct evergreen_cs_track { u32 group_size; u32 nbanks; @@ -1009,52 +1006,6 @@ static int evergreen_cs_track_check(struct radeon_cs_parser *p) } /** - * evergreen_cs_packet_next_reloc() - parse next packet which should be reloc packet3 - * @parser:parser structure holding parsing context. - * @data: pointer to relocation data - * @offset_start: starting offset - * @offset_mask: offset mask (to align start offset on) - * @reloc: reloc informations - * - * Check next packet is relocation packet3, do bo validation and compute - * GPU offset using the provided start. - **/ -static int evergreen_cs_packet_next_reloc(struct radeon_cs_parser *p, - struct radeon_cs_reloc **cs_reloc) -{ - struct radeon_cs_chunk *relocs_chunk; - struct radeon_cs_packet p3reloc; - unsigned idx; - int r; - - if (p->chunk_relocs_idx == -1) { - DRM_ERROR("No relocation chunk !\n"); - return -EINVAL; - } - *cs_reloc = NULL; - relocs_chunk = &p->chunks[p->chunk_relocs_idx]; - r = radeon_cs_packet_parse(p, &p3reloc, p->idx); - if (r) { - return r; - } - p->idx += p3reloc.count + 2; - if (p3reloc.type != PACKET_TYPE3 || p3reloc.opcode != PACKET3_NOP) { - DRM_ERROR("No packet3 for relocation for packet at %d.\n", - p3reloc.idx); - return -EINVAL; - } - idx = radeon_get_ib_value(p, p3reloc.idx + 1); - if (idx >= relocs_chunk->length_dw) { - DRM_ERROR("Relocs at %d after relocations chunk end %d !\n", - idx, relocs_chunk->length_dw); - return -EINVAL; - } - /* FIXME: we assume reloc size is 4 dwords */ - *cs_reloc = p->relocs_ptr[(idx / 4)]; - return 0; -} - -/** * evergreen_cs_packet_parse_vline() - parse userspace VLINE packet * @parser:parser structure holding parsing context. * @@ -1205,7 +1156,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) case SQ_LSTMP_RING_BASE: case SQ_PSTMP_RING_BASE: case SQ_VSTMP_RING_BASE: - r = evergreen_cs_packet_next_reloc(p, &reloc); + r = radeon_cs_packet_next_reloc(p, &reloc, 0); if (r) { dev_warn(p->dev, "bad SET_CONTEXT_REG " "0x%04X\n", reg); @@ -1234,7 +1185,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) case DB_Z_INFO: track->db_z_info = radeon_get_ib_value(p, idx); if (!(p->cs_flags & RADEON_CS_KEEP_TILING_FLAGS)) { - r = evergreen_cs_packet_next_reloc(p, &reloc); + r = radeon_cs_packet_next_reloc(p, &reloc, 0); if (r) { dev_warn(p->dev, "bad SET_CONTEXT_REG " "0x%04X\n", reg); @@ -1276,7 +1227,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) track->db_dirty = true; break; case DB_Z_READ_BASE: - r = evergreen_cs_packet_next_reloc(p, &reloc); + r = radeon_cs_packet_next_reloc(p, &reloc, 0); if (r) { dev_warn(p->dev, "bad SET_CONTEXT_REG " "0x%04X\n", reg); @@ -1288,7 +1239,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) track->db_dirty = true;
[PATCH 12/12] drm/radeon: consolidate redundant macros and constants
After refactoring the _cs logic, we ended up with many macros and constants that #define the same thing. Clean'em up. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen_cs.c | 18 +- drivers/gpu/drm/radeon/evergreend.h | 13 ++--- drivers/gpu/drm/radeon/nid.h | 13 ++--- drivers/gpu/drm/radeon/r100.c | 8 drivers/gpu/drm/radeon/r100d.h| 11 --- drivers/gpu/drm/radeon/r300.c | 6 +++--- drivers/gpu/drm/radeon/r300d.h| 11 --- drivers/gpu/drm/radeon/r600_cs.c | 10 +- drivers/gpu/drm/radeon/r600d.h| 13 ++--- drivers/gpu/drm/radeon/rv515d.h | 11 --- drivers/gpu/drm/radeon/si.c | 12 ++-- drivers/gpu/drm/radeon/sid.h | 13 ++--- 12 files changed, 35 insertions(+), 104 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 211b509..4a9760a 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -2639,12 +2639,12 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) } p->idx += pkt.count + 2; switch (pkt.type) { - case PACKET_TYPE0: + case RADEON_PACKET_TYPE0: r = evergreen_cs_parse_packet0(p, &pkt); break; - case PACKET_TYPE2: + case RADEON_PACKET_TYPE2: break; - case PACKET_TYPE3: + case RADEON_PACKET_TYPE3: r = evergreen_packet3_check(p, &pkt); break; default: @@ -3395,19 +3395,19 @@ int evergreen_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib) do { pkt.idx = idx; - pkt.type = CP_PACKET_GET_TYPE(ib->ptr[idx]); - pkt.count = CP_PACKET_GET_COUNT(ib->ptr[idx]); + pkt.type = RADEON_CP_PACKET_GET_TYPE(ib->ptr[idx]); + pkt.count = RADEON_CP_PACKET_GET_COUNT(ib->ptr[idx]); pkt.one_reg_wr = 0; switch (pkt.type) { - case PACKET_TYPE0: + case RADEON_PACKET_TYPE0: dev_err(rdev->dev, "Packet0 not allowed!\n"); ret = -EINVAL; break; - case PACKET_TYPE2: + case RADEON_PACKET_TYPE2: idx += 1; break; - case PACKET_TYPE3: - pkt.opcode = CP_PACKET3_GET_OPCODE(ib->ptr[idx]); + case RADEON_PACKET_TYPE3: + pkt.opcode = RADEON_CP_PACKET3_GET_OPCODE(ib->ptr[idx]); ret = evergreen_vm_packet3_check(rdev, ib->ptr, &pkt); idx += pkt.count + 2; break; diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h index cb9baaa..5ce08cf 100644 --- a/drivers/gpu/drm/radeon/evergreend.h +++ b/drivers/gpu/drm/radeon/evergreend.h @@ -979,16 +979,7 @@ /* * PM4 */ -#definePACKET_TYPE00 -#definePACKET_TYPE11 -#definePACKET_TYPE22 -#definePACKET_TYPE33 - -#define CP_PACKET_GET_TYPE(h) (((h) >> 30) & 3) -#define CP_PACKET_GET_COUNT(h) (((h) >> 16) & 0x3FFF) -#define CP_PACKET0_GET_REG(h) (((h) & 0x) << 2) -#define CP_PACKET3_GET_OPCODE(h) (((h) >> 8) & 0xFF) -#define PACKET0(reg, n)((PACKET_TYPE0 << 30) | \ +#define PACKET0(reg, n)((RADEON_PACKET_TYPE0 << 30) | \ (((reg) >> 2) & 0x) | \ ((n) & 0x3FFF) << 16) #define CP_PACKET2 0x8000 @@ -997,7 +988,7 @@ #define PACKET2(v) (CP_PACKET2 | REG_SET(PACKET2_PAD, (v))) -#define PACKET3(op, n) ((PACKET_TYPE3 << 30) | \ +#define PACKET3(op, n) ((RADEON_PACKET_TYPE3 << 30) | \ (((op) & 0xFF) << 8) | \ ((n) & 0x3FFF) << 16) diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h index b93186b..a3a3212 100644 --- a/drivers/gpu/drm/radeon/nid.h +++ b/drivers/gpu/drm/radeon/nid.h @@ -474,16 +474,7 @@ /* * PM4 */ -#definePACKET_TYPE00 -#definePACKET_TYPE11 -#definePACKET_TYPE22 -#definePACKET_TYPE33 - -#define CP_PACKET_GET_TYPE(h) (((h) >> 30) & 3) -#define CP_PACKET_GET_COUNT(h) (((h) >> 16) & 0x3FFF) -#define CP_PACKET0_GET_REG(h) (((h) & 0x) << 2) -#define
Re: radeon CS parser refactoring
On Fri, 4 Jan 2013, Dave Airlie wrote: Did you run these with pre-kms userspace etc to make sure it doesn't cause regressions there? No, I didn't, but I can give it a quick whirl. I think I still have one or two machines with 6.14.x DDX that I can put in UMS mode and see what happens. Haven't used UMS for at least 2-3 years. My bet is that it is more likely that I will hit a few unrelated UMS problems that are there even without my patches, but I'll let you know the result when I have it. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: radeon CS parser refactoring
On Fri, 4 Jan 2013, Dave Airlie wrote: Did you run these with pre-kms userspace etc to make sure it doesn't cause regressions there? I did some tests with UMS and shuffled a number of cards. As I feared, I ran into a number of unrelated problems, but in each case I have seen identical beahvior between the kernel with my patches and without. So as far as I can tell, my patches do not introduce regressions in legacy modes, althugh I am not sure about the coverage. Also, in one test, I think I have hit a genuine bug in ATI DDX (explained below). Below I describe what I saw with each card family. Maybe for some test cases I am missing some "magic" option in xorg.conf or maybe what I am seeing (mostly reduced feature set) is expected in UMS. * With an NI card (TURKS, HD6570 card), Xorg just plain tells me that the card is supported in KMS only mode. So, I guess, that's it for that card. * A test with Evergreen (CEDAR) card works in UMS mode, but I can't get 3D acceleration. I see these messages in Xorg log file: [37.024] (II) RADEON(0): No DRI yet on Evergreen . . [37.364] (II) AIGLX: Screen 0 is not DRI2 capable [37.364] (II) AIGLX: Screen 0 is not DRI capable [37.664] (II) AIGLX: Loaded and initialized swrast [37.664] (II) GLX: Initialized DRISWRAST GL provider for screen 0 Sounds like just a "property" of UMS to me, but I am not sure. Nevertheless, the behavior with and without my kernel patches is identical. Still, 2D copying should still be exercising the CS parser, so there is some test coverage here. * A test with an R7xx card (RV730, E4690 card) results in a segfault in DDX. Again, this is irrespective of my kernel patches, so I believe that the bug has been there for a while and that it's in userland. The crash occurs in r600_set_render_target() function and what causes it is corrupted cb_conf->surface pointer. When the crash occurs it has a value of 0x1, which doesn't look like something that would live in .bss, .data or come from the heap. I didn't try other R6xx cards, but I suspect that they may have the same problem because they share the code in DDX. I don't know if UMS DDX will be maintained going forward, so I don't know if it makes sense to open a bug for this. BTW, DDX I am testing this with is 6.14.6 * A test with R300 card (Radeon X300 card) works (again identically with and without my patches), but again without 3D acceleration. So it's similar result and comment as with the Evergreen test, though relevant messages in Xorg log file are slightly different: [35.630] (EE) AIGLX error: r300 does not export required DRI extension [35.630] (EE) AIGLX: reverting to software rendering [35.915] (II) AIGLX: Loaded and initialized swrast [35.915] (II) GLX: Initialized DRISWRAST GL provider for screen 0 Again, I don't know if this is just the way things are in UMS mode or if there is some configuration magic I need to do. So at this point I'd say that I have not seen anything that would indicate a regression in legacy mode, although the limitations I have hit make the tests more limited that I thought they would be (and KMS I tested quite a bit, so I am confident there). thanks, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: radeon CS parser refactoring
On Fri, 4 Jan 2013, Alex Deucher wrote: R6xx and r7xx are really all you need to worry about in this case. R1xx-r5xx UMS uses a different kernel interface for command submission and evergreen and later don't have UMS drm support. UMS r6xx/r7xx support used the same kernel interface for command submission but the kernel side was much simpler. OK, I have found an old machine with RV730 GPU and a known-working UMS: 2.6.35 kernel, 6.13.2 DDX, 7.8.2 mesa (classic), 1.9 Xorg). I changed the kernel to the latest drm-next and ran tests with and without my CS-refactoring patches. Here are the results: * It appears that drm-next in its current state is broken with regard to UMS (nothing to do with my patches, it was pre-broken to begin with). UMS provokes the kernel into a NULL-pointer dereference oops. Good news is that I have tracked down the crash and I will be sending the patches with the fix shortly. * There are multiple patches that contributed to the breakage of UMS. I didn't bother pin-pointing them all, but one that I looked (6a7068b4) dates back to April 2012 so there are kernels out in distros that crash on UMS. That probably tells us how many UMS users are left out there :-). BTW, the reason UMS crashes is because parser->rdev is NULL in UMS mode so every patch that tries to dereference it (and shares the code path with UMS) will cause an oops (it will become clear when you see the patches). * So, having fixed the above incidental finding, I got my machine with ancient UMS-userspace and a shiny latest drm-next kernel (plus my CS-refactoring patches plus my yet-to-be-sent UMS fixes) to work. My test consists of bringing up Gnome (it's Gnome 2 on that machine), running glxgears, sphereworld (an example code from OpenGL Superbible book), and OpenArena. Everything seems to work. * Going back to KMS and retesting there, things still look good. So this (with tests I did on Friday) should cover all the cases. Additionally, UMS requires the old non-gallium 3D drivers. It sounds like some other change in the ddx broke UMS support for r6xx/r7xx. The DDX segfault I reported on Friday was provoked by trying to run Gallium 3D driver on the top of UMS. Once I switched back to classic, the crash went away. So I guess the userspace crash was provoked by some obscure path that was never intended to be exercised. I don't think it's worth investigating further. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
fix for crashes provoked by UMS mode
At Dave's request I ran some regression tests on my CS-refactoring patches [1] against old UMS userspace. The tests have revealed that the current kernel can be provoked into crashing in UMS mode (and the problem is unrelated to refactoring of CS code; it has been here before). The following patches will fix the problem that I discovered while testing (and they should probably go to drm-fixes branch). First two patches are the fix for the acute problem, while the third patch is a fix for an incidental finding uncovered while debugging the problem. [1] http://lists.freedesktop.org/archives/dri-devel/2013-January/032814.html ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/3] drm/radeon: fix NULL pointer dereference in UMS mode
In UMS mode parser->rdev is NULL, so dereferencing will cause an oops. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_cs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 396baba0..45151c4 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -279,7 +279,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) p->chunks[p->chunk_ib_idx].length_dw); return -EINVAL; } - if ((p->rdev->flags & RADEON_IS_AGP)) { + if (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) { p->chunks[p->chunk_ib_idx].kpage[0] = kmalloc(PAGE_SIZE, GFP_KERNEL); p->chunks[p->chunk_ib_idx].kpage[1] = kmalloc(PAGE_SIZE, GFP_KERNEL); if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL || @@ -583,7 +583,8 @@ static int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx) struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx]; int i; int size = PAGE_SIZE; - bool copy1 = (p->rdev->flags & RADEON_IS_AGP) ? false : true; + bool copy1 = (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) ? + false : true; for (i = ibc->last_copied_page + 1; i < pg_idx; i++) { if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)), -- 1.8.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/3] drm/radeon: fix a bogus kfree
parser->chunks[.].kpage[.] is not always kmalloc-ed by the parser initialization, so parser_fini should not try to kfree it if it didn't allocate it. This patch fixes a kernel oops that can be provoked in UMS mode. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r600_cs.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c index 9ea13d0..f8adb01 100644 --- a/drivers/gpu/drm/radeon/r600_cs.c +++ b/drivers/gpu/drm/radeon/r600_cs.c @@ -2476,8 +2476,10 @@ static void r600_cs_parser_fini(struct radeon_cs_parser *parser, int error) kfree(parser->relocs); for (i = 0; i < parser->nchunks; i++) { kfree(parser->chunks[i].kdata); - kfree(parser->chunks[i].kpage[0]); - kfree(parser->chunks[i].kpage[1]); + if (parser->rdev && (parser->rdev->flags & RADEON_IS_AGP)) { + kfree(parser->chunks[i].kpage[0]); + kfree(parser->chunks[i].kpage[1]); + } } kfree(parser->chunks); kfree(parser->chunks_array); -- 1.8.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 3/3] drm/radeon: fix error path in kpage allocation
Index into chunks[] array doesn't look right. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_cs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 45151c4..469661f 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -284,8 +284,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) p->chunks[p->chunk_ib_idx].kpage[1] = kmalloc(PAGE_SIZE, GFP_KERNEL); if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL || p->chunks[p->chunk_ib_idx].kpage[1] == NULL) { - kfree(p->chunks[i].kpage[0]); - kfree(p->chunks[i].kpage[1]); + kfree(p->chunks[p->chunk_ib_idx].kpage[0]); + kfree(p->chunks[p->chunk_ib_idx].kpage[1]); return -ENOMEM; } } -- 1.8.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: radeon CS parser refactoring
On Mon, Jan 7, 2013 at 7:21 PM, Marek Olšák wrote: > > IIRC, the radeon gallium drivers call abort() if they encounter an > unsupported DRM version (that is UMS). > I am not familiar enough to comment, but my observation was that as soon as I backed out to classic, the segfault went away. So I assumed that the mismatch between the UMS and Gallium was the cause. Anyway, it's a secondary issue for this thread. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Patch "drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S" has been added to the 3.4-stable tree
I thought I saw a revert for that patch on the mailing list yesterday: http://lists.freedesktop.org/archives/dri-devel/2013-January/033322.html On Tue, 15 Jan 2013, gre...@linuxfoundation.org wrote: This is a note to let you know that I've just added the patch titled drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S to the 3.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: 0014-drm-Add-EDID_QUIRK_FORCE_REDUCED_BLANKING-for-ASUS-V.patch and it can be found in the queue-3.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. From 31d04b5fa45264da10750fc7f526d0e65f894a29 Mon Sep 17 00:00:00 2001 From: Paul Menzel Date: Wed, 8 Aug 2012 23:12:19 +0200 Subject: drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S From: Paul Menzel commit 6f33814bd4d9cfe76033a31b1c0c76c960cd8e4b upstream. Connecting an ASUS VW222S [1] over VGA a garbled screen is shown with vertical stripes in the top half. In commit bc42aabc [2] commit bc42aabc6a01b92b0f961d65671564e0e1cd7592 Author: Adam Jackson Date: Wed May 23 16:26:54 2012 -0400 drm/edid/quirks: ViewSonic VA2026w Adam Jackson added the quirk `EDID_QUIRK_FORCE_REDUCED_BLANKING` which is also needed for this ASUS monitor. All log files and output from `xrandr` is included in the referenced Bugzilla report #17629. Please note that this monitor only has a VGA (D-Sub) connector [1]. [1] http://www.asus.com/Display/LCD_Monitors/VW222S/ [2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=bc42aabc6a01b92b0f961d65671564e0e1cd7592 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=17629 Signed-off-by: Paul Menzel Cc: Cc: Adam Jackson Cc: Ian Pilcher Signed-off-by: Dave Airlie Signed-off-by: Julien Cristau Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_edid.c |3 +++ 1 file changed, 3 insertions(+) --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -87,6 +87,9 @@ static struct edid_quirk { int product_id; u32 quirks; } edid_quirk_list[] = { + /* ASUS VW222S */ + { "ACI", 0x22a2, EDID_QUIRK_FORCE_REDUCED_BLANKING }, + /* Acer AL1706 */ { "ACR", 44358, EDID_QUIRK_PREFER_LARGE_60 }, /* Acer F51 */ Patches currently in stable-queue which might be from paulepan...@users.sourceforge.net are queue-3.4/0014-drm-Add-EDID_QUIRK_FORCE_REDUCED_BLANKING-for-ASUS-V.patch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix NULL pointer dereference in UMS mode in radeon_cs_parser_fini()
Actually, the code path affected by your patch is not executed in UMS mode at all. Notice that radeon_cs_parser_fini is only called from radeon_cs_ioctl which is a KMS-only ioctl (see radeon_kms.c). The equivalent of the fix you are trying to do is in a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e (function patched by that one is the one used by legacy-CS ioctl), which you should go together with ff4bd0827764e10a428a9d39e6814c5478863f94 if you are backporting UMS fixes to 3.7. Both are needed to prevent kernel crashes in UMS mode. -- Ilija On Wed, 16 Jan 2013, Shuah Khan wrote: Fix parser->rdev NULL pointer dereference in radeon_cs_parser_fini(). While back-porting drm/radeon: fix NULL pointer dereference in UMS mode patch (commit-id: ff4bd0827764e10a428a9d39e6814c5478863f94) to 3,7.y, noticed another instance of NULL pointer dereference in radeon_cs_parser_fini() function. Signed-off-by: Shuah Khan CC: sta...@vger.kernel.org 3.7 --- drivers/gpu/drm/radeon/radeon_cs.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 469661f..d1c282c 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -329,7 +329,7 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error) kfree(parser->relocs_ptr); for (i = 0; i < parser->nchunks; i++) { kfree(parser->chunks[i].kdata); - if ((parser->rdev->flags & RADEON_IS_AGP)) { + if (parser->rdev && (parser->rdev->flags & RADEON_IS_AGP)) { kfree(parser->chunks[i].kpage[0]); kfree(parser->chunks[i].kpage[1]); } -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: fix NULL pointer dereference in UMS mode in radeon_cs_parser_fini()
On Wed, Jan 23, 2013 at 11:07 AM, Herton Ronaldo Krzesinski wrote: > On Thu, Jan 17, 2013 at 10:09:42AM -0700, Shuah Khan wrote: >> On Wed, 2013-01-16 at 21:06 -0600, Ilija Hadzic wrote: >> > Actually, the code path affected by your patch is not executed in UMS mode >> > at all. Notice that radeon_cs_parser_fini is only called from >> > radeon_cs_ioctl which is a KMS-only ioctl (see radeon_kms.c). >> > >> > The equivalent of the fix you are trying to do is in >> > a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e (function patched by that one is >> > the one used by legacy-CS ioctl), which you should go together >> > with ff4bd0827764e10a428a9d39e6814c5478863f94 if you are backporting UMS >> > fixes to 3.7. Both are needed to prevent kernel crashes in UMS mode. >> > >> > -- Ilija >> >> Thanks. I will take a look at a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e. >> I sent back-ported ff4bd0827764e10a428a9d39e6814c5478863f94 patch to >> stable and I will back-port and send >> a6b7e1a02b77ab8fe8775d20a88c53d8ba55482e as well. > > While at it, also looks like commit > 25d8999780f8c1f53928f4a24a09c01550423109 could also be added to stables. > But while looking at it, while the patch itself is correct, I noted that > there is a possibility of double free: if either of the > p->chunks[p->chunk_ib_idx].kpage[] items are non NULL, we will free it in > radeon_cs_parser_init and also when calling one of the fini functions > (radeon_cs_parser_fini or r600_cs_parser_fini). So either we need to set > kpage[n] to NULL after kfree, or just return the error. > Yes you are right. The error-handling path should force both kpage[] pointers to NULL before returning -ENOMEM. That would fix the double free. Regarding, inclusion of 25d8999780f8c1f53928f4a24a09c01550423109 into stable (and possible follow-up patch to fix potential double-kfree), does this pass the "no fixes for theoretical problems in stable" criterion ? It is an obvious bug, but it happens so rarely that I am not surprised that it has never happened: You need an (old) AGP card *and* you need to run out of kmalloc memory. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: fix a rare case of double kfree
If one (but not both) allocations of p->chunks[].kpage[] in radeon_cs_parser_init fail, the error path will free the successfully allocated page, but leave a stale pointer value in the kpage[] field. This will later cause a double-free when radeon_cs_parser_fini is called. This patch fixes the issue by forcing both pointers to NULL after kfree in the error path. The circumstances under which the problem happens are very rare. The card must be AGP and the system must run out of kmalloc area just at the right time so that one allocation succeeds, while the other fails. Signed-off-by: Ilija Hadzic Cc: Herton Ronaldo Krzesinski --- drivers/gpu/drm/radeon/radeon_cs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 469661f..5407459 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -286,6 +286,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) p->chunks[p->chunk_ib_idx].kpage[1] == NULL) { kfree(p->chunks[p->chunk_ib_idx].kpage[0]); kfree(p->chunks[p->chunk_ib_idx].kpage[1]); + p->chunks[p->chunk_ib_idx].kpage[0] = NULL; + p->chunks[p->chunk_ib_idx].kpage[1] = NULL; return -ENOMEM; } } -- 1.8.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
is airlied's git tree broken ?
I am getting this kind of error when I try to do 'git fetch' from git://people.freedesktop.org/~airlied/linux.git remote: error: Could not read ec862f894f7870430e4ff5a9249eaa94d368d220 remote: fatal: bad tree object ec862f894f7870430e4ff5a9249eaa94d368d220 remote: aborting due to possible repository corruption on the remote side. fatal: protocol error: bad pack header I see a similar error if I try to do a fresh clone: Cloning into 'linux'... remote: error: Could not read ec862f894f7870430e4ff5a9249eaa94d368d220 remote: fatal: bad tree object ec862f894f7870430e4ff5a9249eaa94d368d220 remote: aborting due to possible repository corruption on the remote side. fatal: early EOF fatal: index-pack failed Is there a known problem with that repository ? thanks, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 0/3] [RFC] DRM Render Nodes
On Fri, 28 Sep 2012, Alex Deucher wrote: I haven't read through your patches yet, but Dave and Ilija already did something similar a while back: http://lists.freedesktop.org/archives/dri-devel/2012-March/020765.html Actually, there is a, a newer series of patches here (applied few comments I got after first series) http://lists.freedesktop.org/archives/dri-devel/2012-April/021326.html I have the v3 series (applied more comments I got after v2 series, plus some more cleanup) that I have never sent out, because my perception was that there was low interest in this work. I can send the v3 out, if there is a revived interest in this work. The original work is by Dave and I did some major cleanup and built more on the top of it. Kristian's patches at the first glance (I have not looked them in detail) appear more like prep. work (like which ioctl can a render node use and which can't, etc.), but my impression is that they still require the work cited above (Dave's original work and/or my followup work) to actually be able to create and use the render node. BTW, the third patch in Kristian's series is for Intel only and we'll probably need equivalent patches for radeon and nouveau. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/3] drm: Fix DRM_MINOR limits for control and render nodes
On Fri, 28 Sep 2012, Kristian Høgsberg wrote: if (type == DRM_MINOR_CONTROL) { base += 64; -limit = base + 127; +limit = base + 64; } else if (type == DRM_MINOR_RENDER) { base += 128; -limit = base + 255; +limit = base + 64; If my first grade arithmetics serves me well, shouldn't limit in the first clause be base + 63 and in the second clause, base + 127. The render node starts at 128 and spans to 255, so there are total of 128 render nodes, 64 card (legacy) nodes and 64 control nodes allowed. Actually, this construction of limit relative to the base does not achieve anything. The code would be much more readable if it were something like this: if (type == DRM_MINOR_CONTROL) { base = 64; limit = 127; } else if (type == DRM_MINOR_RENDER) { base = 128; limit = 255; } -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 0/3] [RFC] DRM Render Nodes
On Fri, 28 Sep 2012, Daniel Vetter wrote: On a quick look the rendernode Kristian propose and your work seem to attack slightly different issues. Your/Dave's patch series seems to put large efforts into (dynamically) splitting up the resources of a drm device, including the modeset stuff. Correct, the goal is to be able to run multiseat while sharing a GPU. Actually, with my variant of render nodes, I even got multiple desktops residing in different LXC containers to share the GPU, which is kind of cool. Kristians proposal here is much more modest, with just enabling a way for to do the same for render clients. All the modeset (and flink open stuff) would still be only done through the legacy drm node. OK I see. From what I can tell from the second patch, drm_get_pci_dev will create one (and I guess only one, right ?) render node if the underlying driver has DRIVER_RENDER node feature. The third patch (among other things) adds that feature to Intel driver. So if I boot up a system with these patches and with Intel GPU, I will automagically get one more /dev/dri/renderD128 node, right ? The intent is that the render client opens and uses that render node. The /dev/dri/controlDNN node still remains an unused "orphan", right ? So would you entertain the possibility that the render node is created from user space on demand using an ioctl into the control node ? If that's a possiblity for you, then my set of patches is a superset of what Kristian needs. If you just need a render client, you can create a node with no display resources and you would get something quite close to what these 3 patches try to do. unless I am missing something. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Breakage in "track dev_mapping in more robust and flexible way"
On Thu, Oct 25, 2012 at 11:10 AM, Thomas Hellström wrote: > On 10/25/12 4:41 PM, Jerome Glisse wrote: >> >> On Thu, Oct 25, 2012 at 04:02:25PM +0200, Thomas Hellstrom wrote: >>> >>> Hi, >>> >>> This commit >>> >>> From 949c4a34afacfe800fc442afac117aba15284962 Mon Sep 17 00:00:00 2001 >>> From: Ilija Hadzic >>> Date: Tue, 15 May 2012 16:40:10 -0400 >>> Subject: [PATCH] drm: track dev_mapping in more robust and flexible way >>> >>> Setting dev_mapping (pointer to the address_space structure >>> used for memory mappings) to the address_space of the first >>> opener's inode and then failing if other openers come in >>> through a different inode has a few restrictions that are >>> eliminated by this patch. >>> >>> If we already have valid dev_mapping and we spot an opener >>> with different i_node, we force its i_mapping pointer to the >>> already established address_space structure (first opener's >>> inode). This will make all mappings from drm device hang off >>> the same address_space object. >>> ... >>> >>> Breaks drivers using TTM, since when the X server calls into the >>> driver open, drm's dev_mapping has not >>> yet been setup. The setup needs to be moved before the driver's open >>> hook is called. >>> >>> Typically, if a TTM-aware driver is provoked by the Xorg server to >>> move a buffer from system to VRAM or AGP, >>> before any other drm client is started, The user-space page table >>> entries are not killed before the move, and left pointing >>> into freed pages, causing system crashes and / or user-space access >>> to arbitrary memory. >> >> Doesn't handle move invalidate the drm file mapping before scheduling >> the move ? > > Yes, but to do that it needs a correct value of bdev::dev_mapping, which is > now incorrectly set on the > *second* open instead of the first open. > So you are implying that in the first open the assignment of dev->dev_mapping is somehow skipped (which could happen if drm_setup returns an error) or that the driver on which you are having problems with (nouveau I presume) needs dev_mapping in the firstopen hook ? It's been a while since I did it, but if my memory serves me well I thought I explicitly verified that dev_mapping was correctly set in the first open (though GPUs I use are primarily AMD). -- Ilija > /Thomas > > > >> Cheers, >> Jerome > > > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Breakage in "track dev_mapping in more robust and flexible way"
Or could the driver that causes the problem be vmwgfx ? I now looked at the code and I notice that indeed vmwgfx sets its private dev_mapping in the open hook, while all other TTM-based drivers (radeon and nouveau) do it when they create an object. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Breakage in "track dev_mapping in more robust and flexible way"
Can you give the attached patch a whirl and let me know if it fixes the problem? As I indicated in my previous note, vmwgfx should be the only affected driver because it looks at dev_mapping in the open hook (others do it when they create an object, so they are not affected). I'll probably revise it (and I'll have some general questions about drm_open syscall) before officially send the patch, but I wanted to get something quickly to you to check if it fixes your problem. I hope that your vmwgfx test environment is such that you can reproduce the original problem. thanks, -- Ilija From 18a489e7415f495c7ba48cc61733d6c7d8f3fd68 Mon Sep 17 00:00:00 2001 From: Ilija Hadzic Date: Thu, 25 Oct 2012 15:28:05 -0400 Subject: [PATCH] drm: set dev_mapping before calling drm_open_helper Some drivers (specifically vmwgfx) look at dev_mapping in their open hook, so we have to set dev->dev_mapping earlier in the process. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_fops.c | 43 +-- 1 files changed, 29 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index 7ef1b67..50b7b47 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp) int minor_id = iminor(inode); struct drm_minor *minor; int retcode = 0; + int need_setup = 0; + struct address_space *old_mapping; minor = idr_find(&drm_minors_idr, minor_id); if (!minor) @@ -132,23 +134,36 @@ int drm_open(struct inode *inode, struct file *filp) if (drm_device_is_unplugged(dev)) return -ENODEV; + if (!dev->open_count++) + need_setup = 1; + mutex_lock(&dev->struct_mutex); + old_mapping = dev->dev_mapping; + if (old_mapping == NULL) + dev->dev_mapping = &inode->i_data; + mutex_unlock(&dev->struct_mutex); + retcode = drm_open_helper(inode, filp, dev); - if (!retcode) { - atomic_inc(&dev->counts[_DRM_STAT_OPENS]); - if (!dev->open_count++) - retcode = drm_setup(dev); - } - if (!retcode) { - mutex_lock(&dev->struct_mutex); - if (dev->dev_mapping == NULL) - dev->dev_mapping = &inode->i_data; - /* ihold ensures nobody can remove inode with our i_data */ - ihold(container_of(dev->dev_mapping, struct inode, i_data)); - inode->i_mapping = dev->dev_mapping; - filp->f_mapping = dev->dev_mapping; - mutex_unlock(&dev->struct_mutex); + if (retcode) + goto err_undo; + atomic_inc(&dev->counts[_DRM_STAT_OPENS]); + if (need_setup) { + retcode = drm_setup(dev); + if (retcode) + goto err_undo; } + /* ihold ensures nobody can remove inode with our i_data */ + mutex_lock(&dev->struct_mutex); + ihold(container_of(dev->dev_mapping, struct inode, i_data)); + inode->i_mapping = dev->dev_mapping; + filp->f_mapping = dev->dev_mapping; + mutex_unlock(&dev->struct_mutex); + return 0; +err_undo: + dev->open_count--; + mutex_lock(&dev->struct_mutex); + dev->dev_mapping = old_mapping; + mutex_unlock(&dev->struct_mutex); return retcode; } EXPORT_SYMBOL(drm_open); -- 1.7.4.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Breakage in "track dev_mapping in more robust and flexible way"
On Fri, 26 Oct 2012, Thomas Hellstrom wrote: Hi, On 10/25/2012 11:27 PM, Ilija Hadzic wrote: Can you give the attached patch a whirl and let me know if it fixes the problem? As I indicated in my previous note, vmwgfx should be the only affected driver because it looks at dev_mapping in the open hook (others do it when they create an object, so they are not affected). I'll probably revise it (and I'll have some general questions about drm_open syscall) before officially send the patch, but I wanted to get something quickly to you to check if it fixes your problem. I hope that your vmwgfx test environment is such that you can reproduce the original problem. thanks, -- Ilija Yes, it appears like this patch fixes the problem. It'd be good to have it in 3.7 (drm-fixes) with a cc to stable. OK great. Thanks for testing. Before I send out an "official" patch, I have a few questions for those who have been around longer and can possibly reflect better than me on the history of drm_open syscall. Currently, before touching dev->dev_mapping field we grab dev->struct mutex. This has been introduced by Dave Airlie a long time ago in a2c0a97b784f837300f7b0869c82ab712c600952. I tried to preserve that in all patches where I touched dev_open, but looking at the code I don't think the mutex is necessary. Namely, drm_open is only set in drm_open, and concurrent openers are protected with drm_global_mutex. Other places (drivers) where dev->dev_mapping is accessed is read-only and dev_mapping is written at first open when there are no file descriptors around to issue any other call. Also, it doesn't look to me that any driver locks dev->struct_mutex before accessing dev->dev_mapping anyway. So I am thinking of dropping the mutex completely, but I would like to hear a second thought. The other issue, I noticed is that of the drm_setup() call fails, the open_count counter would remain incremented and I think we need to restore it back (or if I am missing something, would someone please enlighten me). This was also in the kernel all this time (and I have not noticed until now), so I "smuggled" that fix in the patch that I sent you. However, wonder if I should cut the separate patch for open_count fix. Actually, I think that I should cut three patches: one to drop the mutex, one to fix the open_count and one to fix your problem with dev_mapping and that probably all three should CC stable. Before I do that, I'd like to hear opinions of others. thanks, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/2] drm: restore open_count if drm_setup fails
If drm_setup (called at first open) fails, the whole open call has failed, so we should not keep the open_count incremented. Signed-off-by: Ilija Hadzic Cc: sta...@vger.kernel.org --- drivers/gpu/drm/drm_fops.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index 7ef1b67..af68eca 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -135,8 +135,11 @@ int drm_open(struct inode *inode, struct file *filp) retcode = drm_open_helper(inode, filp, dev); if (!retcode) { atomic_inc(&dev->counts[_DRM_STAT_OPENS]); - if (!dev->open_count++) + if (!dev->open_count++) { retcode = drm_setup(dev); + if (retcode) + dev->open_count--; + } } if (!retcode) { mutex_lock(&dev->struct_mutex); -- 1.7.12.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/2] drm: set dev_mapping before calling drm_open_helper
Some drivers (specifically vmwgfx) look at dev_mapping in their open hook, so we have to set dev->dev_mapping earlier in the process. Reference: http://lists.freedesktop.org/archives/dri-devel/2012-October/029420.html Signed-off-by: Ilija Hadzic Reported-by: Thomas Hellstrom Cc: sta...@vger.kernel.org --- drivers/gpu/drm/drm_fops.c | 47 +- 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index af68eca..133b413 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -121,6 +121,8 @@ int drm_open(struct inode *inode, struct file *filp) int minor_id = iminor(inode); struct drm_minor *minor; int retcode = 0; + int need_setup = 0; + struct address_space *old_mapping; minor = idr_find(&drm_minors_idr, minor_id); if (!minor) @@ -132,26 +134,37 @@ int drm_open(struct inode *inode, struct file *filp) if (drm_device_is_unplugged(dev)) return -ENODEV; + if (!dev->open_count++) + need_setup = 1; + mutex_lock(&dev->struct_mutex); + old_mapping = dev->dev_mapping; + if (old_mapping == NULL) + dev->dev_mapping = &inode->i_data; + /* ihold ensures nobody can remove inode with our i_data */ + ihold(container_of(dev->dev_mapping, struct inode, i_data)); + inode->i_mapping = dev->dev_mapping; + filp->f_mapping = dev->dev_mapping; + mutex_unlock(&dev->struct_mutex); + retcode = drm_open_helper(inode, filp, dev); - if (!retcode) { - atomic_inc(&dev->counts[_DRM_STAT_OPENS]); - if (!dev->open_count++) { - retcode = drm_setup(dev); - if (retcode) - dev->open_count--; - } - } - if (!retcode) { - mutex_lock(&dev->struct_mutex); - if (dev->dev_mapping == NULL) - dev->dev_mapping = &inode->i_data; - /* ihold ensures nobody can remove inode with our i_data */ - ihold(container_of(dev->dev_mapping, struct inode, i_data)); - inode->i_mapping = dev->dev_mapping; - filp->f_mapping = dev->dev_mapping; - mutex_unlock(&dev->struct_mutex); + if (retcode) + goto err_undo; + atomic_inc(&dev->counts[_DRM_STAT_OPENS]); + if (need_setup) { + retcode = drm_setup(dev); + if (retcode) + goto err_undo; } + return 0; +err_undo: + mutex_lock(&dev->struct_mutex); + filp->f_mapping = old_mapping; + inode->i_mapping = old_mapping; + iput(container_of(dev->dev_mapping, struct inode, i_data)); + dev->dev_mapping = old_mapping; + mutex_unlock(&dev->struct_mutex); + dev->open_count--; return retcode; } EXPORT_SYMBOL(drm_open); -- 1.7.12.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/2] drm/radeon: allow pcie gen2 speed on NI
Hi Dave, A few weeks ago I sent the two patches that allow PCI Express interface to run at Gen 2 speed on NI parts. Links to the patches in the mailing list archive + review from Alex quoted below: http://lists.freedesktop.org/archives/dri-devel/2011-September/014474.html http://lists.freedesktop.org/archives/dri-devel/2011-September/014475.html I saw some activity on drm-next and drm-core-next branches, but I have not seen these two patches merge yet. Just wondering if they are in the queue for merging or if they may have fell through the cracks? thanks, Ilija On Tue, 20 Sep 2011, Alex Deucher wrote: On Tue, Sep 20, 2011 at 10:22 AM, Ilija Hadzic wrote: Enabling pcie gen2 speed was skipped for Northern Islands AISCs, although it looks like it works just fine with the same initialization sequence used for evergreen. According to Alex D. gen2 init was skipped to prevent a crash that has been caused by some other bug that has been fixed in the meantime; so now it should be safe to enable it. Signed-off-by: Ilija Hadzic I just double checked and BTC and cayman use the same programming method. Both patches: Reviewed-by: Alex Deucher Thanks! Alex --- Â drivers/gpu/drm/radeon/evergreen.c | Â Â 3 +-- Â 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index f09bace..208b59c 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -2987,8 +2987,7 @@ static int evergreen_startup(struct radeon_device *rdev) Â Â Â Â int r; Â Â Â Â /* enable pcie gen2 link */ - Â Â Â if (!ASIC_IS_DCE5(rdev)) - Â Â Â Â Â Â Â evergreen_pcie_gen2_enable(rdev); + Â Â Â evergreen_pcie_gen2_enable(rdev); Â Â Â Â if (ASIC_IS_DCE5(rdev)) { Â Â Â Â Â Â Â Â if (!rdev->me_fw || !rdev->pfp_fw || !rdev->rlc_fw || !rdev->mc_fw) { -- 1.7.6 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
drm/radeon/kms: improve performance of blit-copy
The following set of patches will improve the performance of blit-copy functions for Radeon GPUs based on R600, R700, Evergreen and NI ASICs. The foundation for improvement is the use of tiled mode access (which for copying bo's can be used regardless of whether the content is tiled or not), and segmenting the memory block being copied into rectangles whose edge ratio is between 1:1 and 1:2. This maximizes the number of PCIe transactions that use maximum payload size (typically 128 bytes) and also creates a memory access pattern that is more favorable for both VRAM and host DRAM than what's currently in the kernel. To come up with the new blit-copy code, I did a lot of PCIe traffic analysis with the bus analyzer and also had many discussions with Alex, trying to explain what's going on (thanks to Alex for his time). Below (at the end of this note) are the results of some benchmarks that I did with various GPUs (all in the same host: Intel i7 CPU, X58 chipset, three DRAM channels). To run the tests on your machine load the radeon module with 'benchmark=1 pcie_gen2=1' parameters. Most significant improvement is in the upstream (VRAM to GART) direction because that's where the PCIe transactions were fragmented and also where memory access pattern was such that it created a lot of backpressure from the host. It is also interesting that high-end devices (e.g. Cayman) exhibit the least improvement and were the worst to begin with. This is because high-end devices copy more tiles in parallel which in turn can create bank conflicts on host memory and cause the host to do lots of bank-close/precharge/bank-open cycles. As an added "bonus", I also did some code cleanup and consolidated the repeated code into common function, so r600 and evergreen/NI parts now share the blit-copy code. I also expanded on the benchmark coverage, so the module now takes benckmark parameter value between 1 and 8 and each results in running a different benchmark. For details, see the commit log messages and the code. I have been running with these patches for a few months (and I kept rebasing them to drm-core-next as the public git progressed) and I used them in a system setup that does *many* copying of this kind (and does them frequently); I have not seen instabilities introduced by these patches. I also verified the correctness of the copy using test=1 parameter for each GPU that I had and the test passed. I would welcome some feedback and if you run the benchmarks with the new blit code, I would very much like to hear what kind of improvement you are seeing. BENCHMARK RESULTS: == 1) VRAM to GTT == Card (ASIC) VRAMBefore After - 5570 (Redwood) DDR3 1600MHZ 4543912 6450 (Caicos) DDR5 3200MHz37185090 6570 (Turks)DDR3 1800MHz 4844144 5450 (Cedar)DDR3 1600MHz36795090 5450 (Cedar)DDR2 800MHz26954639 E4690 (RV730) DDR3 1400MHZ 4854969 E6760 (Turks) DDR5 3200MHz 4744177 V5700 (RV730) DDR3 MHz 4884297 2260 (RV620)DDR2 MHz 4943093 6870 (Barts)DDR5 4200MHz 4751113 6970 (Cayman) DDR5 4200MHz 473 710 2) GTT to VRAM == Card (ASIC) VRAMBefore After - 5570 (Redwood) DDR3 1600MHz31583360 6450 (Caicos) DDR5 3200MHz29953393 6570 (Turks)DDR3 1800MHz30393339 5450 (Cedar)DDR3 1600MHz32463404 5450 (Cedar)DDR2 800MHz26143371 E4690 (RV730) DDR3 1400MHz30843426 E6760 (Turks) DDR5 3200MHz24432570 V5700 (RV730) DDR3 MHz31873506 2260 (RV620)DDR2 MHz 5843246 6870 (Barts)DDR5 4200MHz24722601 6970 (Cayman) DDR5 4200MHz24602737 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/9] drm/radeon/kms: improve evergreen blit code
start with first-cut conceptual patch from Alex Deucher (commit info below); turn on 1D tiling make rectangular buffer always 2:1 or 1:2 ratio make buffer dimenstions an integer multiple of unit dimensions make sures that integral number of pages map to the buffer fix a few bugs that resulted in incorrect dimensions tidy up a little bit to get rid of an ugly if/else parametrize some "magic" constants add protections from illegal buffer sizes etc. >From 77e6703c37f0ad8673b9ab285589d5c26782a515 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Tue, 17 May 2011 05:08:58 -0400 Subject: [PATCH 1/2] drm/radeon/kms: simplify evergreen blit code Covert 4k pages to multiples of 64x64x4 tiles. This is also more efficient than a scanline based approach from the MC's perspective. Signed-off-by: Alex Deucher Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen.c |4 +- drivers/gpu/drm/radeon/evergreen_blit_kms.c | 295 +++ drivers/gpu/drm/radeon/radeon_asic.h|4 +- 3 files changed, 123 insertions(+), 180 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 5df39bf..5f0ecc7 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3180,14 +3180,14 @@ int evergreen_copy_blit(struct radeon_device *rdev, mutex_lock(&rdev->r600_blit.mutex); rdev->r600_blit.vb_ib = NULL; - r = evergreen_blit_prepare_copy(rdev, num_pages * RADEON_GPU_PAGE_SIZE); + r = evergreen_blit_prepare_copy(rdev, num_pages); if (r) { if (rdev->r600_blit.vb_ib) radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); mutex_unlock(&rdev->r600_blit.mutex); return r; } - evergreen_kms_blit_copy(rdev, src_offset, dst_offset, num_pages * RADEON_GPU_PAGE_SIZE); + evergreen_kms_blit_copy(rdev, src_offset, dst_offset, num_pages); evergreen_blit_done_copy(rdev, fence); mutex_unlock(&rdev->r600_blit.mutex); return 0; diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 2eb2518..3b24137 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -44,6 +44,10 @@ #define COLOR_5_6_5 0x8 #define COLOR_8_8_8_8 0x1a +#define RECT_UNIT_H 32 +#define RECT_UNIT_W (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H) +#define MAX_RECT_DIM 16384 + /* emits 17 */ static void set_render_target(struct radeon_device *rdev, int format, @@ -56,7 +60,7 @@ set_render_target(struct radeon_device *rdev, int format, if (h < 8) h = 8; - cb_color_info = ((format << 2) | (1 << 24) | (1 << 8)); + cb_color_info = ((format << 2) | (1 << 24) | (2 << 8)); pitch = (w / 8) - 1; slice = ((w * h) / 64) - 1; @@ -67,7 +71,7 @@ set_render_target(struct radeon_device *rdev, int format, radeon_ring_write(rdev, slice); radeon_ring_write(rdev, 0); radeon_ring_write(rdev, cb_color_info); - radeon_ring_write(rdev, (1 << 4)); + radeon_ring_write(rdev, 0); radeon_ring_write(rdev, (w - 1) | ((h - 1) << 16)); radeon_ring_write(rdev, 0); radeon_ring_write(rdev, 0); @@ -179,7 +183,7 @@ set_tex_resource(struct radeon_device *rdev, sq_tex_resource_word0 = (1 << 0); /* 2D */ sq_tex_resource_word0 |= pitch >> 3) - 1) << 6) | ((w - 1) << 18)); - sq_tex_resource_word1 = ((h - 1) << 0) | (1 << 28); + sq_tex_resource_word1 = ((h - 1) << 0) | (2 << 28); /* xyzw swizzles */ sq_tex_resource_word4 = (0 << 16) | (1 << 19) | (2 << 22) | (3 << 25); @@ -751,30 +755,80 @@ static void evergreen_vb_ib_put(struct radeon_device *rdev) radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); } -int evergreen_blit_prepare_copy(struct radeon_device *rdev, int size_bytes) + +/* maps the rectangle to the buffer so that satisfies the following properties: + * - dimensions are less or equal to the hardware limit (MAX_RECT_DIM) + * - rectangle consists of integer number of pages + * - height is an integer multiple of RECT_UNIT_H + * - width is an integer multiple of RECT_UNIT_W + * - (the above three conditions also guarantee tile-aligned size) + * - it is as square as possible (sides ratio never greater than 2:1) + * - uses maximum number of pages that fit the above constraints + * + * input: buffer size, pointers to width/height variables + * return: number of pages that were successfully mapped to the rectangle + * width/height of the rectangle + */ +static unsigned ever
[PATCH 2/9] drm/radeon/kms: improve r6xx blit code
start with first-cut conceptual patch from Alex Deucher (commit info below); turn on 1D tiling make rectangular buffer always 2:1 or 1:2 ratio make buffer dimenstions an integer multiple of unit dimensionsmake sures that integral number of pages map to the buffer fix a few bugs that resulted in incorrect dimensions tidy up a little bit to get rid of an ugly if/else parametrize some "magic" constants add protections from illegal buffer sizes etc. >From 2cd7a267d6cbcdf414b7a724237aa24525c12b54 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Tue, 17 May 2011 05:09:43 -0400 Subject: [PATCH 2/2] drm/radeon/kms: simplify r6xx blit code Covert 4k pages to multiples of 64x64x4 tiles. This is also more efficient than a scanline based approach from the MC's perspective. Signed-off-by: Alex Deucher Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r600.c |4 +- drivers/gpu/drm/radeon/r600_blit_kms.c | 276 drivers/gpu/drm/radeon/radeon_asic.h |4 +- 3 files changed, 109 insertions(+), 175 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 334aee6..9fc6844 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2363,14 +2363,14 @@ int r600_copy_blit(struct radeon_device *rdev, mutex_lock(&rdev->r600_blit.mutex); rdev->r600_blit.vb_ib = NULL; - r = r600_blit_prepare_copy(rdev, num_pages * RADEON_GPU_PAGE_SIZE); + r = r600_blit_prepare_copy(rdev, num_pages); if (r) { if (rdev->r600_blit.vb_ib) radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); mutex_unlock(&rdev->r600_blit.mutex); return r; } - r600_kms_blit_copy(rdev, src_offset, dst_offset, num_pages * RADEON_GPU_PAGE_SIZE); + r600_kms_blit_copy(rdev, src_offset, dst_offset, num_pages); r600_blit_done_copy(rdev, fence); mutex_unlock(&rdev->r600_blit.mutex); return 0; diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index 9aa74c3..d9994c9 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -42,6 +42,10 @@ #define COLOR_5_6_5 0x8 #define COLOR_8_8_8_8 0x1a +#define RECT_UNIT_H 32 +#define RECT_UNIT_W (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H) +#define MAX_RECT_DIM 8192 + /* emits 21 on rv770+, 23 on r600 */ static void set_render_target(struct radeon_device *rdev, int format, @@ -600,13 +604,59 @@ static void r600_vb_ib_put(struct radeon_device *rdev) radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); } -int r600_blit_prepare_copy(struct radeon_device *rdev, int size_bytes) +/* FIXME: the function is very similar to evergreen_blit_create_rect, except + that it different predefined constants; consider commonizing */ +static unsigned r600_blit_create_rect(unsigned num_pages, int *width, int *height) +{ + unsigned max_pages; + unsigned pages = num_pages; + int w, h; + + if (num_pages == 0) { + /* not supposed to be called with no pages, but just in case */ + h = 0; + w = 0; + pages = 0; + WARN_ON(1); + } else { + int rect_order = 2; + h = RECT_UNIT_H; + while (num_pages / rect_order) { + h *= 2; + rect_order *= 4; + if (h >= MAX_RECT_DIM) { + h = MAX_RECT_DIM; + break; + } + } + max_pages = (MAX_RECT_DIM * h) / (RECT_UNIT_W * RECT_UNIT_H); + if (pages > max_pages) + pages = max_pages; + w = (pages * RECT_UNIT_W * RECT_UNIT_H) / h; + w = (w / RECT_UNIT_W) * RECT_UNIT_W; + pages = (w * h) / (RECT_UNIT_W * RECT_UNIT_H); + BUG_ON(pages == 0); + } + + + DRM_DEBUG("blit_rectangle: h=%d, w=%d, pages=%d\n", h, w, pages); + + /* return width and height only of the caller wants it */ + if (height) + *height = h; + if (width) + *width = w; + + return pages; +} + + +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages) { int r; - int ring_size, line_size; - int max_size; + int ring_size; /* loops of emits 64 + fence emit possible */ - int dwords_per_loop = 76, num_loops; + int dwords_per_loop = 76, num_loops = 0; r = r600_vb_ib_get(rdev); if (r) @@ -616,18 +666,12 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, int size_bytes) if (rdev->family > CHIP_R600 && rdev->family < CHIP_RV770)
[PATCH 3/9] drm/radeon/kms: demystify evergreen blit code
some bits in 3D registers used by blit functions look like magic and this is hard to follow; change them to a little bit more meaningful pre-defined constants Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen_blit_kms.c | 29 +-- drivers/gpu/drm/radeon/evergreend.h | 42 +++ 2 files changed, 62 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 3b24137..68d0de2 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -60,7 +60,9 @@ set_render_target(struct radeon_device *rdev, int format, if (h < 8) h = 8; - cb_color_info = ((format << 2) | (1 << 24) | (2 << 8)); + cb_color_info = CB_FORMAT(format) | + CB_SOURCE_FORMAT(CB_SF_EXPORT_NORM) | + CB_ARRAY_MODE(ARRAY_1D_TILED_THIN1); pitch = (w / 8) - 1; slice = ((w * h) / 64) - 1; @@ -137,12 +139,16 @@ set_vtx_resource(struct radeon_device *rdev, u64 gpu_addr) u32 sq_vtx_constant_word2, sq_vtx_constant_word3; /* high addr, stride */ - sq_vtx_constant_word2 = ((upper_32_bits(gpu_addr) & 0xff) | (16 << 8)); + sq_vtx_constant_word2 = SQ_VTXC_BASE_ADDR_HI(upper_32_bits(gpu_addr) & 0xff) | + SQ_VTXC_STRIDE(16); #ifdef __BIG_ENDIAN - sq_vtx_constant_word2 |= (2 << 30); + sq_vtx_constant_word2 |= SQ_VTXC_ENDIAN_SWAP(SQ_ENDIAN_8IN32); #endif /* xyzw swizzles */ - sq_vtx_constant_word3 = (0 << 3) | (1 << 6) | (2 << 9) | (3 << 12); + sq_vtx_constant_word3 = SQ_VTCX_SEL_X(SQ_SEL_X) | + SQ_VTCX_SEL_Y(SQ_SEL_Y) | + SQ_VTCX_SEL_Z(SQ_SEL_Z) | + SQ_VTCX_SEL_W(SQ_SEL_W); radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 8)); radeon_ring_write(rdev, 0x580); @@ -153,7 +159,7 @@ set_vtx_resource(struct radeon_device *rdev, u64 gpu_addr) radeon_ring_write(rdev, 0); radeon_ring_write(rdev, 0); radeon_ring_write(rdev, 0); - radeon_ring_write(rdev, SQ_TEX_VTX_VALID_BUFFER << 30); + radeon_ring_write(rdev, S__SQ_CONSTANT_TYPE(SQ_TEX_VTX_VALID_BUFFER)); if ((rdev->family == CHIP_CEDAR) || (rdev->family == CHIP_PALM) || @@ -180,14 +186,19 @@ set_tex_resource(struct radeon_device *rdev, if (h < 1) h = 1; - sq_tex_resource_word0 = (1 << 0); /* 2D */ + sq_tex_resource_word0 = TEX_DIM(SQ_TEX_DIM_2D); sq_tex_resource_word0 |= pitch >> 3) - 1) << 6) | ((w - 1) << 18)); - sq_tex_resource_word1 = ((h - 1) << 0) | (2 << 28); + sq_tex_resource_word1 = ((h - 1) << 0) | + TEX_ARRAY_MODE(ARRAY_1D_TILED_THIN1); /* xyzw swizzles */ - sq_tex_resource_word4 = (0 << 16) | (1 << 19) | (2 << 22) | (3 << 25); + sq_tex_resource_word4 = TEX_DST_SEL_X(SQ_SEL_X) | + TEX_DST_SEL_Y(SQ_SEL_Y) | + TEX_DST_SEL_Z(SQ_SEL_Z) | + TEX_DST_SEL_W(SQ_SEL_W); - sq_tex_resource_word7 = format | (SQ_TEX_VTX_VALID_TEXTURE << 30); + sq_tex_resource_word7 = format | + S__SQ_CONSTANT_TYPE(SQ_TEX_VTX_VALID_TEXTURE); radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 8)); radeon_ring_write(rdev, 0); diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h index 7363d9d..b937c49 100644 --- a/drivers/gpu/drm/radeon/evergreend.h +++ b/drivers/gpu/drm/radeon/evergreend.h @@ -941,11 +941,15 @@ #defineCB_COLOR0_SLICE 0x28c68 #defineCB_COLOR0_VIEW 0x28c6c #defineCB_COLOR0_INFO 0x28c70 +# define CB_FORMAT(x) ((x) << 2) # define CB_ARRAY_MODE(x) ((x) << 8) # define ARRAY_LINEAR_GENERAL 0 # define ARRAY_LINEAR_ALIGNED 1 # define ARRAY_1D_TILED_THIN1 2 # define ARRAY_2D_TILED_THIN1 4 +# define CB_SOURCE_FORMAT(x) ((x) << 24) +# define CB_SF_EXPORT_FULL0 +# define CB_SF_EXPORT_NORM1 #defineCB_COLOR0_ATTRIB0x28c74 #defineCB_COLOR0_DIM 0x28c78 /* only CB0-7 blocks have these regs */ @@ -1107,15 +,53 @@ #defineCB_COLOR7_CLEAR_WORD3 0x28e3c #define SQ_TEX_RESOURCE_WORD0_0
[PATCH 4/9] drm/radeon/kms: demystify r600 blit code
some 3d register bits look like magic in r600 blit functions use predefined constants to make it more intuitive what they are Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r600_blit_kms.c | 30 +- drivers/gpu/drm/radeon/r600d.h | 22 ++ 2 files changed, 39 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index d9994c9..71fec92 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -58,7 +58,9 @@ set_render_target(struct radeon_device *rdev, int format, if (h < 8) h = 8; - cb_color_info = ((format << 2) | (1 << 27) | (1 << 8)); + cb_color_info = CB_FORMAT(format) | + CB_SOURCE_FORMAT(CB_SF_EXPORT_NORM) | + CB_ARRAY_MODE(ARRAY_1D_TILED_THIN1); pitch = (w / 8) - 1; slice = ((w * h) / 64) - 1; @@ -168,9 +170,10 @@ set_vtx_resource(struct radeon_device *rdev, u64 gpu_addr) { u32 sq_vtx_constant_word2; - sq_vtx_constant_word2 = ((upper_32_bits(gpu_addr) & 0xff) | (16 << 8)); + sq_vtx_constant_word2 = SQ_VTXC_BASE_ADDR_HI(upper_32_bits(gpu_addr) & 0xff) | + SQ_VTXC_STRIDE(16); #ifdef __BIG_ENDIAN - sq_vtx_constant_word2 |= (2 << 30); + sq_vtx_constant_word2 |= SQ_VTXC_ENDIAN_SWAP(SQ_ENDIAN_8IN32); #endif radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 7)); @@ -206,18 +209,19 @@ set_tex_resource(struct radeon_device *rdev, if (h < 1) h = 1; - sq_tex_resource_word0 = (1 << 0) | (1 << 3); - sq_tex_resource_word0 |= pitch >> 3) - 1) << 8) | - ((w - 1) << 19)); + sq_tex_resource_word0 = S_038000_DIM(V_038000_SQ_TEX_DIM_2D) | + S_038000_TILE_MODE(V_038000_ARRAY_1D_TILED_THIN1); + sq_tex_resource_word0 |= S_038000_PITCH((pitch >> 3) - 1) | + S_038000_TEX_WIDTH(w - 1); - sq_tex_resource_word1 = (format << 26); - sq_tex_resource_word1 |= ((h - 1) << 0); + sq_tex_resource_word1 = S_038004_DATA_FORMAT(format); + sq_tex_resource_word1 |= S_038004_TEX_HEIGHT(h - 1); - sq_tex_resource_word4 = ((1 << 14) | -(0 << 16) | -(1 << 19) | -(2 << 22) | -(3 << 25)); + sq_tex_resource_word4 = S_038010_REQUEST_SIZE(1) | + S_038010_DST_SEL_X(SQ_SEL_X) | + S_038010_DST_SEL_Y(SQ_SEL_Y) | + S_038010_DST_SEL_Z(SQ_SEL_Z) | + S_038010_DST_SEL_W(SQ_SEL_W); radeon_ring_write(rdev, PACKET3(PACKET3_SET_RESOURCE, 7)); radeon_ring_write(rdev, 0); diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h index 0245ae6..bfe1b5d 100644 --- a/drivers/gpu/drm/radeon/r600d.h +++ b/drivers/gpu/drm/radeon/r600d.h @@ -79,6 +79,11 @@ #define CB_COLOR0_SIZE 0x28060 #define CB_COLOR0_VIEW 0x28080 #define CB_COLOR0_INFO 0x280a0 +# define CB_FORMAT(x) ((x) << 2) +# define CB_ARRAY_MODE(x) ((x) << 8) +# define CB_SOURCE_FORMAT(x) ((x) << 27) +# define CB_SF_EXPORT_FULL0 +# define CB_SF_EXPORT_NORM1 #define CB_COLOR0_TILE 0x280c0 #define CB_COLOR0_FRAG 0x280e0 #define CB_COLOR0_MASK 0x28100 @@ -417,6 +422,17 @@ #defineSQ_PGM_START_VS 0x28858 #define SQ_PGM_RESOURCES_VS 0x28868 #define SQ_PGM_CF_OFFSET_VS 0x288d0 + +#define SQ_VTX_CONSTANT_WORD0_00x3 +#define SQ_VTX_CONSTANT_WORD1_00x30004 +#define SQ_VTX_CONSTANT_WORD2_00x30008 +# define SQ_VTXC_BASE_ADDR_HI(x) ((x) << 0) +# define SQ_VTXC_STRIDE(x)((x) << 8) +# define SQ_VTXC_ENDIAN_SWAP(x) ((x) << 30) +# define SQ_ENDIAN_NONE 0 +# define SQ_ENDIAN_8IN16 1 +# define SQ_ENDIAN_8IN32 2 +#define SQ_VTX_CONSTANT_WORD3_00x3000c #defineSQ_VTX_CONSTANT_WORD6_0 0x38018 #defineS__SQ_VTX_CONSTANT_TYPE(x) (((x) & 3) << 30) #define
[PATCH 5/9] drm/radeon/kms: cleanup benchmark code
factor out repeated code into functions fix units in which the throughput is reported (megabytes per second and megabits per second make sense, others are kind of confusing) make report more amenable to awk and friends (e.g. whitespace is always the separator, unit is separated from the number, etc) add #defines for some hard coded constants besides "beautification" this reorg is done in preparation for writing more elaborate benchmarks Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_benchmark.c | 156 - 1 files changed, 86 insertions(+), 70 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_benchmark.c b/drivers/gpu/drm/radeon/radeon_benchmark.c index 10191d9..6951426 100644 --- a/drivers/gpu/drm/radeon/radeon_benchmark.c +++ b/drivers/gpu/drm/radeon/radeon_benchmark.c @@ -26,21 +26,80 @@ #include "radeon_reg.h" #include "radeon.h" -void radeon_benchmark_move(struct radeon_device *rdev, unsigned bsize, - unsigned sdomain, unsigned ddomain) +#define RADEON_BENCHMARK_COPY_BLIT 1 +#define RADEON_BENCHMARK_COPY_DMA 0 + +#define RADEON_BENCHMARK_ITERATIONS 1024 + +static int radeon_benchmark_do_move(struct radeon_device *rdev, unsigned size, + uint64_t saddr, uint64_t daddr, + int flag, int n) +{ + unsigned long start_jiffies; + unsigned long end_jiffies; + struct radeon_fence *fence = NULL; + int i, r; + + start_jiffies = jiffies; + for (i = 0; i < n; i++) { + r = radeon_fence_create(rdev, &fence); + if (r) + return r; + + switch (flag) { + case RADEON_BENCHMARK_COPY_DMA: + r = radeon_copy_dma(rdev, saddr, daddr, + size / RADEON_GPU_PAGE_SIZE, + fence); + break; + case RADEON_BENCHMARK_COPY_BLIT: + r = radeon_copy_blit(rdev, saddr, daddr, +size / RADEON_GPU_PAGE_SIZE, +fence); + break; + default: + DRM_ERROR("Unknown copy method\n"); + r = -EINVAL; + } + if (r) + goto exit_do_move; + r = radeon_fence_wait(fence, false); + if (r) + goto exit_do_move; + radeon_fence_unref(&fence); + } + end_jiffies = jiffies; + r = jiffies_to_msecs(end_jiffies - start_jiffies); + +exit_do_move: + if (fence) + radeon_fence_unref(&fence); + return r; +} + + +static void radeon_benchmark_log_results(int n, unsigned size, +unsigned int time, +unsigned sdomain, unsigned ddomain, +char *kind) +{ + unsigned int throughput = (n * (size >> 10)) / time; + DRM_INFO("radeon: %s %u bo moves of %u kB from" +" %d to %d in %u ms, throughput: %u Mb/s or %u MB/s\n", +kind, n, size >> 10, sdomain, ddomain, time, +throughput * 8, throughput); +} + +static void radeon_benchmark_move(struct radeon_device *rdev, unsigned size, + unsigned sdomain, unsigned ddomain) { struct radeon_bo *dobj = NULL; struct radeon_bo *sobj = NULL; - struct radeon_fence *fence = NULL; uint64_t saddr, daddr; - unsigned long start_jiffies; - unsigned long end_jiffies; - unsigned long time; - unsigned i, n, size; - int r; + int r, n; + unsigned int time; - size = bsize; - n = 1024; + n = RADEON_BENCHMARK_ITERATIONS; r = radeon_bo_create(rdev, size, PAGE_SIZE, true, sdomain, &sobj); if (r) { goto out_cleanup; @@ -68,64 +127,23 @@ void radeon_benchmark_move(struct radeon_device *rdev, unsigned bsize, /* r100 doesn't have dma engine so skip the test */ if (rdev->asic->copy_dma) { - - start_jiffies = jiffies; - for (i = 0; i < n; i++) { - r = radeon_fence_create(rdev, &fence); - if (r) { - goto out_cleanup; - } - - r = radeon_copy_dma(rdev, saddr, daddr, - size / RADEON_GPU_PAGE_SIZE, fence); - - if (r) { - goto out_cleanup; - } - r = radeo
[PATCH 6/9] drm/radeon/kms: add more elaborate benchmarks
Lots of new (and hopefully useful) benchmark. Load the driver with radeon_benchmark= and enjoy. Among tests added are VRAM to VRAM blits and blits with buffer size sweeps. The latter can be from GTT to VRAM, VRAM to GTT, and VRAM to VRAM and there are two types of sweeps: powers of two and (probably more interesting) buffers sizes that correspond to common modes. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon.h |2 +- drivers/gpu/drm/radeon/radeon_benchmark.c | 91 +++-- drivers/gpu/drm/radeon/radeon_device.c|2 +- 3 files changed, 87 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index ff5424e..5361dd7 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -868,7 +868,7 @@ struct radeon_pm { /* * Benchmarking */ -void radeon_benchmark(struct radeon_device *rdev); +void radeon_benchmark(struct radeon_device *rdev, int test_number); /* diff --git a/drivers/gpu/drm/radeon/radeon_benchmark.c b/drivers/gpu/drm/radeon/radeon_benchmark.c index 6951426..5cafc90 100644 --- a/drivers/gpu/drm/radeon/radeon_benchmark.c +++ b/drivers/gpu/drm/radeon/radeon_benchmark.c @@ -30,6 +30,7 @@ #define RADEON_BENCHMARK_COPY_DMA 0 #define RADEON_BENCHMARK_ITERATIONS 1024 +#define RADEON_BENCHMARK_COMMON_MODES_N 17 static int radeon_benchmark_do_move(struct radeon_device *rdev, unsigned size, uint64_t saddr, uint64_t daddr, @@ -126,7 +127,9 @@ static void radeon_benchmark_move(struct radeon_device *rdev, unsigned size, } /* r100 doesn't have dma engine so skip the test */ - if (rdev->asic->copy_dma) { + /* also, VRAM-to-VRAM test doesn't make much sense for DMA */ + /* skip it as well if domains are the same */ + if ((rdev->asic->copy_dma) && (sdomain != ddomain)) { time = radeon_benchmark_do_move(rdev, size, saddr, daddr, RADEON_BENCHMARK_COPY_DMA, n); if (time < 0) @@ -167,10 +170,86 @@ out_cleanup: } } -void radeon_benchmark(struct radeon_device *rdev) +void radeon_benchmark(struct radeon_device *rdev, int test_number) { - radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_GTT, - RADEON_GEM_DOMAIN_VRAM); - radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_VRAM, - RADEON_GEM_DOMAIN_GTT); + int i; + int common_modes[RADEON_BENCHMARK_COMMON_MODES_N] = { + 640 * 480 * 4, + 720 * 480 * 4, + 800 * 600 * 4, + 848 * 480 * 4, + 1024 * 768 * 4, + 1152 * 768 * 4, + 1280 * 720 * 4, + 1280 * 800 * 4, + 1280 * 854 * 4, + 1280 * 960 * 4, + 1280 * 1024 * 4, + 1440 * 900 * 4, + 1400 * 1050 * 4, + 1680 * 1050 * 4, + 1600 * 1200 * 4, + 1920 * 1080 * 4, + 1920 * 1200 * 4 + }; + + switch (test_number) { + case 1: + /* simple test, VRAM to GTT and GTT to VRAM */ + radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_GTT, + RADEON_GEM_DOMAIN_VRAM); + radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_VRAM, + RADEON_GEM_DOMAIN_GTT); + break; + case 2: + /* simple test, VRAM to VRAM */ + radeon_benchmark_move(rdev, 1024*1024, RADEON_GEM_DOMAIN_VRAM, + RADEON_GEM_DOMAIN_VRAM); + break; + case 3: + /* GTT to VRAM, buffer size sweep, powers of 2 */ + for (i = 1; i <= 65536; i <<= 1) + radeon_benchmark_move(rdev, i*1024, + RADEON_GEM_DOMAIN_GTT, + RADEON_GEM_DOMAIN_VRAM); + break; + case 4: + /* VRAM to GTT, buffer size sweep, powers of 2 */ + for (i = 1; i <= 65536; i <<= 1) + radeon_benchmark_move(rdev, i*1024, + RADEON_GEM_DOMAIN_VRAM, + RADEON_GEM_DOMAIN_GTT); + break; + case 5: + /* VRAM to VRAM, buffer size sweep, powers of 2 */ + for (i = 1; i <= 65536; i <<= 1) + radeon_benchmark_move(rdev, i*1024, + RADEON_GEM_DOMAIN_VRAM, + RADEON_GEM_DOMAIN_VRAM); + break; + case 6: + /* GTT to VRAM, buff
[PATCH 7/9] drm/radeon/kms: cleanup r600 blit code
reorganize the code such that only the primitives (i.e., the functions that load the CP ring) are hardware specific; dynamically link the primitives in a (new) pointer structure inside r600_blit at blit initialization time so that the functions that control the blit operations can be made common for r600 and evergreen parts Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r600_blit_kms.c | 94 +--- drivers/gpu/drm/radeon/radeon.h| 21 +++ 2 files changed, 70 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index 71fec92..07e3df4 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -44,7 +44,6 @@ #define RECT_UNIT_H 32 #define RECT_UNIT_W (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H) -#define MAX_RECT_DIM 8192 /* emits 21 on rv770+, 23 on r600 */ static void @@ -491,6 +490,27 @@ int r600_blit_init(struct radeon_device *rdev) u32 packet2s[16]; int num_packet2s = 0; + rdev->r600_blit.primitives.set_render_target = set_render_target; + rdev->r600_blit.primitives.cp_set_surface_sync = cp_set_surface_sync; + rdev->r600_blit.primitives.set_shaders = set_shaders; + rdev->r600_blit.primitives.set_vtx_resource = set_vtx_resource; + rdev->r600_blit.primitives.set_tex_resource = set_tex_resource; + rdev->r600_blit.primitives.set_scissors = set_scissors; + rdev->r600_blit.primitives.draw_auto = draw_auto; + rdev->r600_blit.primitives.set_default_state = set_default_state; + + rdev->r600_blit.ring_size_common = 40; /* shaders + def state */ + rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */ + rdev->r600_blit.ring_size_common += 5; /* done copy */ + rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */ + + rdev->r600_blit.ring_size_per_loop = 76; + /* set_render_target emits 2 extra dwords on rv6xx */ + if (rdev->family > CHIP_R600 && rdev->family < CHIP_RV770) + rdev->r600_blit.ring_size_per_loop += 2; + + rdev->r600_blit.max_dim = 8192; + /* pin copy shader into vram if already initialized */ if (rdev->r600_blit.shader_obj) goto done; @@ -608,9 +628,8 @@ static void r600_vb_ib_put(struct radeon_device *rdev) radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); } -/* FIXME: the function is very similar to evergreen_blit_create_rect, except - that it different predefined constants; consider commonizing */ -static unsigned r600_blit_create_rect(unsigned num_pages, int *width, int *height) +static unsigned r600_blit_create_rect(unsigned num_pages, + int *width, int *height, int max_dim) { unsigned max_pages; unsigned pages = num_pages; @@ -628,12 +647,12 @@ static unsigned r600_blit_create_rect(unsigned num_pages, int *width, int *heigh while (num_pages / rect_order) { h *= 2; rect_order *= 4; - if (h >= MAX_RECT_DIM) { - h = MAX_RECT_DIM; + if (h >= max_dim) { + h = max_dim; break; } } - max_pages = (MAX_RECT_DIM * h) / (RECT_UNIT_W * RECT_UNIT_H); + max_pages = (max_dim * h) / (RECT_UNIT_W * RECT_UNIT_H); if (pages > max_pages) pages = max_pages; w = (pages * RECT_UNIT_W * RECT_UNIT_H) / h; @@ -659,36 +678,29 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages) { int r; int ring_size; - /* loops of emits 64 + fence emit possible */ - int dwords_per_loop = 76, num_loops = 0; + int num_loops = 0; + int dwords_per_loop = rdev->r600_blit.ring_size_per_loop; r = r600_vb_ib_get(rdev); if (r) return r; - /* set_render_target emits 2 extra dwords on rv6xx */ - if (rdev->family > CHIP_R600 && rdev->family < CHIP_RV770) - dwords_per_loop += 2; - /* num loops */ while (num_pages) { - num_pages -= r600_blit_create_rect(num_pages, NULL, NULL); + num_pages -= r600_blit_create_rect(num_pages, NULL, NULL, + rdev->r600_blit.max_dim); num_loops++; } /* calculate number of loops correctly */ ring_size = num_loops * dwords_per_loop; - /* set default + shaders */ - ring_size += 40; /* shaders + def state */ - ring_size += 10; /* fence emit for VB IB */ - rin
[PATCH 8/9] drm/radeon/kms: blit code commoning
factor out most of evergreen blit code and use the refactored code from r600 that is now common for both r600 and evergreen Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen.c | 25 +--- drivers/gpu/drm/radeon/evergreen_blit_kms.c | 260 ++- drivers/gpu/drm/radeon/ni.c |4 +- drivers/gpu/drm/radeon/radeon_asic.c| 16 +- drivers/gpu/drm/radeon/radeon_asic.h| 10 - 5 files changed, 30 insertions(+), 285 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 5f0ecc7..69dded2 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3087,7 +3087,7 @@ static int evergreen_startup(struct radeon_device *rdev) r = evergreen_blit_init(rdev); if (r) { - evergreen_blit_fini(rdev); + r600_blit_fini(rdev); rdev->asic->copy = NULL; dev_warn(rdev->dev, "failed blitter (%d) falling back to memcpy\n", r); } @@ -3172,27 +3172,6 @@ int evergreen_suspend(struct radeon_device *rdev) return 0; } -int evergreen_copy_blit(struct radeon_device *rdev, - uint64_t src_offset, uint64_t dst_offset, - unsigned num_pages, struct radeon_fence *fence) -{ - int r; - - mutex_lock(&rdev->r600_blit.mutex); - rdev->r600_blit.vb_ib = NULL; - r = evergreen_blit_prepare_copy(rdev, num_pages); - if (r) { - if (rdev->r600_blit.vb_ib) - radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); - mutex_unlock(&rdev->r600_blit.mutex); - return r; - } - evergreen_kms_blit_copy(rdev, src_offset, dst_offset, num_pages); - evergreen_blit_done_copy(rdev, fence); - mutex_unlock(&rdev->r600_blit.mutex); - return 0; -} - /* Plan is to move initialization in that function and use * helper function so that radeon_device_init pretty much * do nothing more than calling asic specific function. This @@ -3301,7 +3280,7 @@ int evergreen_init(struct radeon_device *rdev) void evergreen_fini(struct radeon_device *rdev) { - evergreen_blit_fini(rdev); + r600_blit_fini(rdev); r700_cp_fini(rdev); r600_irq_fini(rdev); radeon_wb_fini(rdev); diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 68d0de2..dcf11bb 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -44,10 +44,6 @@ #define COLOR_5_6_5 0x8 #define COLOR_8_8_8_8 0x1a -#define RECT_UNIT_H 32 -#define RECT_UNIT_W (RADEON_GPU_PAGE_SIZE / 4 / RECT_UNIT_H) -#define MAX_RECT_DIM 16384 - /* emits 17 */ static void set_render_target(struct radeon_device *rdev, int format, @@ -599,31 +595,6 @@ set_default_state(struct radeon_device *rdev) } -static inline uint32_t i2f(uint32_t input) -{ - u32 result, i, exponent, fraction; - - if ((input & 0x3fff) == 0) - result = 0; /* 0 is a special case */ - else { - exponent = 140; /* exponent biased by 127; */ - fraction = (input & 0x3fff) << 10; /* cheat and only - handle numbers below 2^^15 */ - for (i = 0; i < 14; i++) { - if (fraction & 0x80) - break; - else { - fraction = fraction << 1; /* keep -shifting left until top bit = 1 */ - exponent = exponent - 1; - } - } - result = exponent << 23 | (fraction & 0x7f); /* mask - off top bit; assumed 1 */ - } - return result; -} - int evergreen_blit_init(struct radeon_device *rdev) { u32 obj_size; @@ -632,6 +603,24 @@ int evergreen_blit_init(struct radeon_device *rdev) u32 packet2s[16]; int num_packet2s = 0; + rdev->r600_blit.primitives.set_render_target = set_render_target; + rdev->r600_blit.primitives.cp_set_surface_sync = cp_set_surface_sync; + rdev->r600_blit.primitives.set_shaders = set_shaders; + rdev->r600_blit.primitives.set_vtx_resource = set_vtx_resource; + rdev->r600_blit.primitives.set_tex_resource = set_tex_resource; + rdev->r600_blit.primitives.set_scissors = set_scissors; + rdev->r600_blit.primitives.draw_auto = draw_auto; + rdev->r600_blit.primitives.set_default_state = set_default_state; + + rdev->r600_blit.ring_size_common =
[PATCH 9/9] drm/radeon/kms: rename a variable for consistency
blit copy functions deal with GPU pages, not CPU pages, so rename the variables and parameters accordingly Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r600_blit_kms.c | 27 ++- drivers/gpu/drm/radeon/radeon_asic.h |4 ++-- 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index 07e3df4..a9e1fde 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -628,14 +628,14 @@ static void r600_vb_ib_put(struct radeon_device *rdev) radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); } -static unsigned r600_blit_create_rect(unsigned num_pages, +static unsigned r600_blit_create_rect(unsigned num_gpu_pages, int *width, int *height, int max_dim) { unsigned max_pages; - unsigned pages = num_pages; + unsigned pages = num_gpu_pages; int w, h; - if (num_pages == 0) { + if (num_gpu_pages == 0) { /* not supposed to be called with no pages, but just in case */ h = 0; w = 0; @@ -644,7 +644,7 @@ static unsigned r600_blit_create_rect(unsigned num_pages, } else { int rect_order = 2; h = RECT_UNIT_H; - while (num_pages / rect_order) { + while (num_gpu_pages / rect_order) { h *= 2; rect_order *= 4; if (h >= max_dim) { @@ -674,7 +674,7 @@ static unsigned r600_blit_create_rect(unsigned num_pages, } -int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages) +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) { int r; int ring_size; @@ -686,9 +686,10 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages) return r; /* num loops */ - while (num_pages) { - num_pages -= r600_blit_create_rect(num_pages, NULL, NULL, - rdev->r600_blit.max_dim); + while (num_gpu_pages) { + num_gpu_pages -= + r600_blit_create_rect(num_gpu_pages, NULL, NULL, + rdev->r600_blit.max_dim); num_loops++; } @@ -719,21 +720,21 @@ void r600_blit_done_copy(struct radeon_device *rdev, struct radeon_fence *fence) void r600_kms_blit_copy(struct radeon_device *rdev, u64 src_gpu_addr, u64 dst_gpu_addr, - unsigned num_pages) + unsigned num_gpu_pages) { u64 vb_gpu_addr; u32 *vb; DRM_DEBUG("emitting copy %16llx %16llx %d %d\n", src_gpu_addr, dst_gpu_addr, - num_pages, rdev->r600_blit.vb_used); + num_gpu_pages, rdev->r600_blit.vb_used); vb = (u32 *)(rdev->r600_blit.vb_ib->ptr + rdev->r600_blit.vb_used); - while (num_pages) { + while (num_gpu_pages) { int w, h; unsigned size_in_bytes; unsigned pages_per_loop = - r600_blit_create_rect(num_pages, &w, &h, + r600_blit_create_rect(num_gpu_pages, &w, &h, rdev->r600_blit.max_dim); size_in_bytes = pages_per_loop * RADEON_GPU_PAGE_SIZE; @@ -777,6 +778,6 @@ void r600_kms_blit_copy(struct radeon_device *rdev, rdev->r600_blit.vb_used += 4*12; src_gpu_addr += size_in_bytes; dst_gpu_addr += size_in_bytes; - num_pages -= pages_per_loop; + num_gpu_pages -= pages_per_loop; } } diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h index c8be1d3..e040de3 100644 --- a/drivers/gpu/drm/radeon/radeon_asic.h +++ b/drivers/gpu/drm/radeon/radeon_asic.h @@ -364,11 +364,11 @@ void r600_hdmi_init(struct drm_encoder *encoder); int r600_hdmi_buffer_status_changed(struct drm_encoder *encoder); void r600_hdmi_update_audio_settings(struct drm_encoder *encoder); /* r600 blit */ -int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_pages); +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages); void r600_blit_done_copy(struct radeon_device *rdev, struct radeon_fence *fence); void r600_kms_blit_copy(struct radeon_device *rdev, u64 src_gpu_addr, u64 dst_gpu_addr, - unsigned num_pages); + unsigned num_gpu_pages); /* * rv770,rv730,rv710,rv740 -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Reply: Question on S3 on evergreen
Frank, I have found this text particularly useful when it comes to using MIT (or BSD) code that resides in the GPL project (e.g. DRM in Linux) http://www.softwarefreedom.org/resources/2007/gpl-non-gpl-collaboration.html I think that sections 2.2 and 2.3 are the things to be careful about because there is a non-trivial probabilty that at least some out of many patches that come in from many authors were created by copying and pasting lines of code that may have come from some other GPL'd code (e.g. other parts of Linux). Nobody can guarantee that this has not happened. P.S. The above is my personal opinion, I speak for myself, not for my company, I am not a lawyer. -- Ilija On Thu, 13 Oct 2011, Huang, FrankR wrote: Xav, thanks for your reminder. Actually our law leam has already checked the license. As Dave said, the DRM kernel driver is all MIT-licensed and we will be free to use them. When the drm uses linux kernel function calls, we will use freebsd(none-GPL) equivalent to replace. Dave, by the way, I want to ask you about some exceptions in DRM. you know in some files(i.e. drm_fb_helper.c), it includes MODULE_LICENSE("GPL and additional rights"). Does it mean it is GPL licensed? Is it free to use this file? Thanks, Frank -Original Message- From: Xavier Bestel [mailto:xavier.bes...@free.fr] Sent: 2011-10-13 (??) 20:07 To: Dave Airlie Cc: Huang, FrankR; dri-devel@lists.freedesktop.org Subject: Re: Question on S3 on evergreen On Thu, 2011-10-13 at 13:04 +0100, Dave Airlie wrote: On Thu, Oct 13, 2011 at 12:56 PM, Xavier Bestel wrote: Hi, On Thu, 2011-10-13 at 17:54 +0800, Huang, FrankR wrote: [...] I have ported radeon_suspend_kms() and radeon_resume_kms() functions from linux to CE. I imagine you already have checked with your company's lawyers, but if I understand correctly that means your drivers will be distributed under the GPL ? All the GPU driver code is licensed under MIT. Oh, I thought "linux" meant "kernel", not "X11". Xav ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm/radeon/kms: improve performance of blit-copy
Dave, Alex pointed to me that the patches I sent last night under this thread may conflict with 003cefe0c238e683a29d2207dba945b508cd45b7 that currently resides on drm-fixes branch (my patches are based on drm-next or drm-core-next). I'd like to make sure that the eventual merge goes smoothly: If you merge drm-fixes before my patches, then I'll rebase my patches and resend them after that happens and make sure everything is resolved correctly. If you merge my patches first and then follow with drm-fixes merge, two things should happen with 003cefe0c238e683a29d2207dba945b508cd45b7. Hunks related to evergreen.c file will fall out but that's expected and OK because my patches consolidate the blit code for r600 and evergreen into a common one. Then in r600.c, the hunks related to r600_blit_prepare_copy and r600_kms_blit_copy function calls will show conflicts, which should be resolved such that the size argument is num_gpu_pages, not num_gpu_pages * RADEON_GPU_PAGE_SIZE (this is because the new blit code takes size argument in pages, not bytes). Everything else will merge smoothly. For reference, pasted below is a patch that resulted after I cherry-picked 003cefe0c238e683a29d2207dba945b508cd45b7 into drm-next augmented with my blit-improvement patches and resolved the conflicts correctly. I guess the first option is less work for you (and I will be glad to rebase my patches if need be), but I hope that the info here is good enough to make the second path as easy as it can be thanks, Ilija From b12516c003cb35059f16ace774ef5a21170d6d78 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Fri, 16 Sep 2011 12:04:08 -0400 Subject: [PATCH 11/14] drm/radeon/kms: Make GPU/CPU page size handling consistent in blit code (v3) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The BO blit code inconsistenly handled the page size. This wasn't an issue on system with 4k pages since the GPU's page size is 4k as well. Switch the driver blit callbacks to take num pages in GPU page units. Fixes lemote mipsel systems using AMD rs780/rs880 chipsets. v2: incorporate suggestions from Michel. Signed-off-by: Alex Deucher Reviewed-by: Michel D??nzer Cc: sta...@kernel.org Signed-off-by: Dave Airlie v3: reconcile with changes due to blit-copy improvements on drm-next branch substitutes the v2 patch that currently resides on drm-fixes branch Conflicts: drivers/gpu/drm/radeon/evergreen.c drivers/gpu/drm/radeon/r600.c drivers/gpu/drm/radeon/radeon_asic.h Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/r100.c| 12 ++-- drivers/gpu/drm/radeon/r200.c|4 ++-- drivers/gpu/drm/radeon/r600.c| 10 ++ drivers/gpu/drm/radeon/radeon.h |7 --- drivers/gpu/drm/radeon/radeon_asic.h |6 +++--- drivers/gpu/drm/radeon/radeon_ttm.c |7 ++- 6 files changed, 27 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 5985cb0..df60803 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -724,11 +724,11 @@ void r100_fence_ring_emit(struct radeon_device *rdev, int r100_copy_blit(struct radeon_device *rdev, uint64_t src_offset, uint64_t dst_offset, - unsigned num_pages, + unsigned num_gpu_pages, struct radeon_fence *fence) { uint32_t cur_pages; - uint32_t stride_bytes = PAGE_SIZE; + uint32_t stride_bytes = RADEON_GPU_PAGE_SIZE; uint32_t pitch; uint32_t stride_pixels; unsigned ndw; @@ -740,7 +740,7 @@ int r100_copy_blit(struct radeon_device *rdev, /* radeon pitch is /64 */ pitch = stride_bytes / 64; stride_pixels = stride_bytes / 4; - num_loops = DIV_ROUND_UP(num_pages, 8191); + num_loops = DIV_ROUND_UP(num_gpu_pages, 8191); /* Ask for enough room for blit + flush + fence */ ndw = 64 + (10 * num_loops); @@ -749,12 +749,12 @@ int r100_copy_blit(struct radeon_device *rdev, DRM_ERROR("radeon: moving bo (%d) asking for %u dw.\n", r, ndw); return -EINVAL; } - while (num_pages > 0) { - cur_pages = num_pages; + while (num_gpu_pages > 0) { + cur_pages = num_gpu_pages; if (cur_pages > 8191) { cur_pages = 8191; } - num_pages -= cur_pages; + num_gpu_pages -= cur_pages; /* pages are in Y direction - height page width in X direction - width */ diff --git a/drivers/gpu/drm/radeon/r200.c b/drivers/gpu/drm/radeon/r200.c index f240583..a1f3ba0 100644 --- a/drivers/gpu/drm/radeon/r200.c +++ b/drivers/gpu/drm/radeon/r200.c @@ -84,7 +84,7 @@ static int r200_get_vtx_size_0(u
Re: drm/radeon/kms: improve performance of blit-copy
On Thu, 13 Oct 2011, Roland Scheidegger wrote: I guess it isn't possible to temporarily disable some RBEs or otherwise reconfigure the chip that you could get the same performance for the high-end chips? According to the conversation I had with Alex, this *is* possible but requires the pipeline and cache flush. So it is unclear what the overall gain will be given the flush penalty. Also, this phenomena occurs only when GTT is involved in the copy. VRAM-to-VRAM copy in which there is no host memory involved (for which I added a benchmark, but didn't report in my note yesterday), high-end devices are beating low-end ones big time they better be ;-) So if we can get RBE-reduction to work, it should be turned on only when one of the BOs is in GTT domain. I looked at what it would take to do this, and it's doable, but requires hacks at many places. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon/kms: make r600-NI blit suspend code common
On Fri, 14 Oct 2011 alexdeuc...@gmail.com wrote: From: Alex Deucher r600-NI shared the same blit suspend code. Clean it up and make it a shared function. Signed-off-by: Alex Deucher Cc: Ilija Hadzic Thanks, this one slipped my eye in my cleanup. Correctness should be obvious, so as much as my review is worth Reviewed-by: Ilija Hadzic --- drivers/gpu/drm/radeon/evergreen.c | 10 +- drivers/gpu/drm/radeon/ni.c| 10 +- drivers/gpu/drm/radeon/r600.c | 26 -- drivers/gpu/drm/radeon/radeon.h|2 ++ drivers/gpu/drm/radeon/rv770.c | 12 ++-- 5 files changed, 22 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 80c4ee3..5c515f2 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3107,21 +3107,13 @@ int evergreen_resume(struct radeon_device *rdev) int evergreen_suspend(struct radeon_device *rdev) { - int r; - /* FIXME: we should wait for ring to be empty */ r700_cp_stop(rdev); rdev->cp.ready = false; evergreen_irq_suspend(rdev); radeon_wb_disable(rdev); evergreen_pcie_gart_disable(rdev); - - /* unpin shaders bo */ - r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false); - if (likely(r == 0)) { - radeon_bo_unpin(rdev->r600_blit.shader_obj); - radeon_bo_unreserve(rdev->r600_blit.shader_obj); - } + r600_blit_suspend(rdev); return 0; } diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index cc0e911..e560da5 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1423,21 +1423,13 @@ int cayman_resume(struct radeon_device *rdev) int cayman_suspend(struct radeon_device *rdev) { - int r; - /* FIXME: we should wait for ring to be empty */ cayman_cp_enable(rdev, false); rdev->cp.ready = false; evergreen_irq_suspend(rdev); radeon_wb_disable(rdev); cayman_pcie_gart_disable(rdev); - - /* unpin shaders bo */ - r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false); - if (likely(r == 0)) { - radeon_bo_unpin(rdev->r600_blit.shader_obj); - radeon_bo_unreserve(rdev->r600_blit.shader_obj); - } + r600_blit_suspend(rdev); return 0; } diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index c2f0dbe..f7f693a 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2375,6 +2375,20 @@ int r600_copy_blit(struct radeon_device *rdev, return 0; } +void r600_blit_suspend(struct radeon_device *rdev) +{ + int r; + + /* unpin shaders bo */ + if (rdev->r600_blit.shader_obj) { + r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false); + if (!r) { + radeon_bo_unpin(rdev->r600_blit.shader_obj); + radeon_bo_unreserve(rdev->r600_blit.shader_obj); + } + } +} + int r600_set_surface_reg(struct radeon_device *rdev, int reg, uint32_t tiling_flags, uint32_t pitch, uint32_t offset, uint32_t obj_size) @@ -2494,8 +2508,6 @@ int r600_resume(struct radeon_device *rdev) int r600_suspend(struct radeon_device *rdev) { - int r; - r600_audio_fini(rdev); /* FIXME: we should wait for ring to be empty */ r600_cp_stop(rdev); @@ -2503,14 +2515,8 @@ int r600_suspend(struct radeon_device *rdev) r600_irq_suspend(rdev); radeon_wb_disable(rdev); r600_pcie_gart_disable(rdev); - /* unpin shaders bo */ - if (rdev->r600_blit.shader_obj) { - r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false); - if (!r) { - radeon_bo_unpin(rdev->r600_blit.shader_obj); - radeon_bo_unreserve(rdev->r600_blit.shader_obj); - } - } + r600_blit_suspend(rdev); + return 0; } diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 7b0cb74..93768f5 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -555,6 +555,8 @@ struct r600_blit { struct radeon_ib *vb_ib; }; +void r600_blit_suspend(struct radeon_device *rdev); + int radeon_ib_get(struct radeon_device *rdev, struct radeon_ib **ib); void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib); int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib); diff --git a/drivers/gpu/drm/radeon/rv770.c b/drivers/gpu/drm/radeon/rv770.c index b13c2ee..4901837 100644 --- a/drivers/gpu/drm/radeon/rv770.c +++ b/drivers/gpu/drm/radeon/rv770.c @@ -1184,8 +1184,6 @@ int rv770_resume(struct radeon_device *rdev) int rv770_sus
drm_fb_helper question
While hacking something else, I stumbled upon two potential issues in drm_fb_helper. Could someone who knows this better than me enlighten whether the problem is realistic or whether I am overseeing something ? If the problem is real (or a potential trouble waiting to happen), I have cut some patches that address this and I can send them. First, in drm_fb_helper_init, the size of crtc_info allocated is for crtc_count entries, the loop that populates crtc_info at the end of the _init function scans the whole list of connectors. So if GPU specifies crtc_count that is smaller than the total number of CRTCs, the loop will overrun the allocated space and corrupt memory. Although no GPU driver in current tree does that, I could imagine a GPU using the smaller number for crtc_count if it does not want some CRTCs used for fbcon. Second (similar in nature) problem is the loop at the end of drm_setup_crtcs function that populates modeset->connectors array. The size of that array is fb_helper->max_conn_count (which is specified by the GPU driver and can be anything), while the the loop runs for fb_helper->connector_count iterations which can include all available connectors (per drm_fb_helper_single_add_all_connectors). So at least in theory that loop can overrun the modset->connectors array and corrupt memory too. Both cases may be theoretical at this time, but they look like a trouble waiting to happen (e.g., future GPUs may have a mix of CRTCs and connectors that could evoke the described error). If those who know better, agree with my analysis, I'll be glad to send some patchwork. thanks, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] DRM: bug: RADEON_DEBUGFS_MAX_{NUM_FILES => COMPONENTS}
Maybe you are looking at the wrong branch, but I see it in drm-next (it has been there since Oct 10) http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-next&id=c245cb9e15055ed5dcf7eaf29232badb0059fdc1 On Mon, 24 Oct 2011, Michael Witten wrote: On Fri, Oct 7, 2011 at 19:20, Michael Witten wrote: Date: Fri, 16 Sep 2011 20:45:30 + The value of RADEON_DEBUGFS_MAX_NUM_FILES has been used to specify the size of an array, each element of which looks like this:  struct radeon_debugfs {      struct drm_info_list   *files;      unsigned         num_files;  }; Consequently, the number of debugfs files may be much greater than RADEON_DEBUGFS_MAX_NUM_FILES, something that the current code ignores:  if ((_radeon_debugfs_count + nfiles) > RADEON_DEBUGFS_MAX_NUM_FILES) {      DRM_ERROR("Reached maximum number of debugfs files.\n");      DRM_ERROR("Report so we increase RADEON_DEBUGFS_MAX_NUM_FILES.\n");      return -EINVAL;  } This commit fixes this mistake, and accordingly renames:  RADEON_DEBUGFS_MAX_NUM_FILES to:  RADEON_DEBUGFS_MAX_COMPONENTS Signed-off-by: Michael Witten ---  drivers/gpu/drm/radeon/radeon.h     |   2 +-  drivers/gpu/drm/radeon/radeon_device.c |  13 -  2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index c1e056b..dd7bab9 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -102,7 +102,7 @@ extern int radeon_pcie_gen2;  #define RADEON_FENCE_JIFFIES_TIMEOUT  (HZ / 2)  /* RADEON_IB_POOL_SIZE must be a power of 2 */  #define RADEON_IB_POOL_SIZE       16 -#define RADEON_DEBUGFS_MAX_NUM_FILES  32 +#define RADEON_DEBUGFS_MAX_COMPONENTS  32  #define RADEONFB_CONN_LIMIT       4  #define RADEON_BIOS_NUM_SCRATCH         8 diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index b51e157..31b1f4b 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -981,7 +981,7 @@ struct radeon_debugfs {     struct drm_info_list   *files;     unsigned         num_files;  }; -static struct radeon_debugfs _radeon_debugfs[RADEON_DEBUGFS_MAX_NUM_FILES]; +static struct radeon_debugfs _radeon_debugfs[RADEON_DEBUGFS_MAX_COMPONENTS];  static unsigned _radeon_debugfs_count = 0;  int radeon_debugfs_add_files(struct radeon_device *rdev, @@ -996,14 +996,17 @@ int radeon_debugfs_add_files(struct radeon_device *rdev,             return 0;         }     } -    if ((_radeon_debugfs_count + nfiles) > RADEON_DEBUGFS_MAX_NUM_FILES) { -        DRM_ERROR("Reached maximum number of debugfs files.\n"); -        DRM_ERROR("Report so we increase RADEON_DEBUGFS_MAX_NUM_FILES.\n"); + +    i = _radeon_debugfs_count + 1; +    if (i > RADEON_DEBUGFS_MAX_COMPONENTS) { +        DRM_ERROR("Reached maximum number of debugfs components.\n"); +        DRM_ERROR("Report so we increase " +             "RADEON_DEBUGFS_MAX_COMPONENTS.\n");         return -EINVAL;     }     _radeon_debugfs[_radeon_debugfs_count].files = files;     _radeon_debugfs[_radeon_debugfs_count].num_files = nfiles; -    _radeon_debugfs_count++; +    _radeon_debugfs_count = i;  #if defined(CONFIG_DEBUG_FS)     drm_debugfs_create_files(files, nfiles,                 rdev->ddev->control->debugfs_root, -- 1.7.6.409.ge7a85 This patch has not yet been applied. What's wrong? Sincerely, Michael Witten ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
drm/fb_helper: prevent some troubles waiting to happen
The following two patches address potential problems that I called "troubles waiting to happen" in this note http://lists.freedesktop.org/archives/dri-devel/2011-October/015412.html I didn't hear anyone take on my question, so I figured I would just send the patches. I tested the patches on a variety of AMD cards that I have. I don't have other GPUs handy. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/2] drm/fb_helper: make sure crtc_count is consistent
stop adding crtcs from dev->mode_config.crtc_list to crtc_info array if gpu driver specifies (by mistake or with a reason) fewer crtcs in crtc_count parameter also, correct crtc_count value if gpu driver specifies too many crtcs Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_fb_helper.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index f7c6854..feac888 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -466,10 +466,18 @@ int drm_fb_helper_init(struct drm_device *dev, i = 0; list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) { + if (i >= crtc_count) { + DRM_DEBUG("crtc count set by the gpu reached\n"); + break; + } fb_helper->crtc_info[i].crtc_id = crtc->base.id; fb_helper->crtc_info[i].mode_set.crtc = crtc; i++; } + if (i < fb_helper->crtc_count) { + DRM_DEBUG("crtc count known by the drm reached\n"); + fb_helper->crtc_count = i; + } fb_helper->conn_limit = max_conn_count; return 0; out_free: -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/2] drm/fb_helper: honor the limit on number of connectors per crtc
gpu driver can specify the limit on the number of connectors that a given crtc can use. Add a check to make sure this limit is honored when building a list of connectors associated with a crtc. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_fb_helper.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index feac888..19e28e9 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1333,6 +1333,11 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper) modeset = &fb_crtc->mode_set; if (mode && fb_crtc) { + if (modeset->num_connectors >= fb_helper->conn_limit) { + DRM_DEBUG("max number of connectors reached for crtc %d\n", + fb_crtc->crtc_id); + break; + } DRM_DEBUG_KMS("desired mode %s set on crtc %d\n", mode->name, fb_crtc->mode_set.crtc->base.id); fb_crtc->desired_mode = mode; -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
drm: fix one flawed mutex grab and remove some spurious mutex grabs
The following three patches remove unecessary locks around ioctls in drm module. First two: [PATCH 1/3] drm: no need to hold global mutex for static data [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex are rather trivial and straight forward and probably do not need much explanation. The third one: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex is more serious and clears a clog that can occur if multiple processes call drm_wait_vblank as explained in patch commit message. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/3] drm: no need to hold global mutex for static data
drm_getcap and drm_version ioctls only reads static data, there is no need to protect them with drm_global_mutex, so make them DRM_UNLOCKED Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_drv.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 7a87e08..e159f17 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -60,14 +60,14 @@ static int drm_version(struct drm_device *dev, void *data, /** Ioctl table */ static struct drm_ioctl_desc drm_ioctls[] = { - DRM_IOCTL_DEF(DRM_IOCTL_VERSION, drm_version, 0), + DRM_IOCTL_DEF(DRM_IOCTL_VERSION, drm_version, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0), DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0), DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0), DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0), DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0), - DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, 0), + DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER), DRM_IOCTL_DEF(DRM_IOCTL_SET_UNIQUE, drm_setunique, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex
drm_getmap, drm_getclient and drm_getstats are all protected with their own mutex (dev->struct_mutex) no need to hold global mutex; make them DRM_UNLOCKED Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_drv.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index e159f17..dbabcb0 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -64,9 +64,9 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0), DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0), DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0), - DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0), - DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0), + DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, DRM_UNLOCKED), + DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, DRM_UNLOCKED), + DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER), -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 3/3] drm: do not sleep on vblank while holding a mutex
holding the drm_global_mutex in drm_wait_vblank and then going to sleep (via DRM_WAIT_ON) is a bad idea in general because it can block other processes that may issue ioctls that also grab drm_global_mutex. Blocking can occur until next vblank which is in the tens of microseconds order of magnitude. fix it, but making drm_wait_vblank DRM_UNLOCKED call and then grabbing the mutex inside the drm_wait_vblank function and only for the duration of the critical section (i.e., from first manipulation of vblank sequence number until putting the task in the wait queue). Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_drv.c |2 +- drivers/gpu/drm/drm_irq.c | 16 +++- include/drm/drm_os_linux.h | 25 + 3 files changed, 37 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index dbabcb0..dc0eb0b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0), + DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0), diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 3830e9e..d85d604 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1191,6 +1191,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data, DRM_DEBUG("failed to acquire vblank counter, %d\n", ret); return ret; } + mutex_lock(&drm_global_mutex); seq = drm_vblank_count(dev, crtc); switch (vblwait->request.type & _DRM_VBLANK_TYPES_MASK) { @@ -1200,12 +1201,15 @@ int drm_wait_vblank(struct drm_device *dev, void *data, case _DRM_VBLANK_ABSOLUTE: break; default: + mutex_unlock(&drm_global_mutex); ret = -EINVAL; goto done; } - if (flags & _DRM_VBLANK_EVENT) + if (flags & _DRM_VBLANK_EVENT) { + mutex_unlock(&drm_global_mutex); return drm_queue_vblank_event(dev, crtc, vblwait, file_priv); + } if ((flags & _DRM_VBLANK_NEXTONMISS) && (seq - vblwait->request.sequence) <= (1<<23)) { @@ -1215,10 +1219,12 @@ int drm_wait_vblank(struct drm_device *dev, void *data, DRM_DEBUG("waiting on vblank count %d, crtc %d\n", vblwait->request.sequence, crtc); dev->last_vblank_wait[crtc] = vblwait->request.sequence; - DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ, - (((drm_vblank_count(dev, crtc) - - vblwait->request.sequence) <= (1 << 23)) || -!dev->irq_enabled)); + /* DRM_WAIT_ON_LOCKED will release drm_global_mutex */ + /* before going to sleep */ + DRM_WAIT_ON_LOCKED(ret, dev->vbl_queue[crtc], 3 * DRM_HZ, + (((drm_vblank_count(dev, crtc) - + vblwait->request.sequence) <= (1 << 23)) || + !dev->irq_enabled)); if (ret != -EINTR) { struct timeval now; diff --git a/include/drm/drm_os_linux.h b/include/drm/drm_os_linux.h index 3933691..fc6aaf4 100644 --- a/include/drm/drm_os_linux.h +++ b/include/drm/drm_os_linux.h @@ -123,5 +123,30 @@ do { \ remove_wait_queue(&(queue), &entry);\ } while (0) +#define DRM_WAIT_ON_LOCKED( ret, queue, timeout, condition ) \ +do { \ + DECLARE_WAITQUEUE(entry, current); \ + unsigned long end = jiffies + (timeout);\ + add_wait_queue(&(queue), &entry); \ + mutex_unlock(&drm_global_mutex);\ + \ + for (;;) { \ + __set_current_state(TASK_INTERRUPTIBLE);\ + if (condition) \ + break; \ + if (time_after_eq(jiffies, end)) { \ + ret = -EBUSY; \ + break; \ + } \ + schedule_timeout((HZ/100 > 1) ? HZ/100 : 1);\ + if (si
drm/radeon/kms: a few nits
The following three patches address various minor nits. They are all safe and I have been running with them for several months on a wide variety of AMD GPUs The first patch: [PATCH 1/3] drm/radeon/kms: use crtc-specific dpms functions in does not change the functionality and affects the function that is used only by r100 ASICs, but make the code more correct (strictly speaking). The second patch: [PATCH 2/3] drm/radeon/kms: fix the crtc number check makes the crtc number check consistent with the way it is done in the rest of the driver code without affecting the functionality. It was actually sent a few months ago and reviewed, but apparently forgotten (cf. http://lists.freedesktop.org/archives/dri-devel/2011-May/010935.html) The third patch: [PATCH 3/3] drm/radeon/kms: use num_crtc instead of hard-coded value Prevents a possible trouble waiting to happen if AMD ever makes a GPU with more than 6 CRTCs. I don't know what's cooking in their kitchen but if such a GPU ever comes out, this will prevent at least one regression ;-) ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/3] drm/radeon/kms: use crtc-specific dpms functions in prepare and commit
it's better that radeon_crtc_commit and radeon_crtc_prepare call crtc-specific dpms functions instead of hard-coding them to radeon_crtc_dpms. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_legacy_crtc.c | 14 ++ 1 files changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c index 41a5d48..0690a5b 100644 --- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c +++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c @@ -1036,8 +1036,11 @@ static void radeon_crtc_prepare(struct drm_crtc *crtc) * The hardware wedges sometimes if you reconfigure one CRTC * whilst another is running (see fdo bug #24611). */ - list_for_each_entry(crtci, &dev->mode_config.crtc_list, head) - radeon_crtc_dpms(crtci, DRM_MODE_DPMS_OFF); + list_for_each_entry(crtci, &dev->mode_config.crtc_list, head) { + struct drm_crtc_helper_funcs *crtc_funcs = crtci->helper_private; + if (crtc_funcs->dpms) + crtc_funcs->dpms(crtci, DRM_MODE_DPMS_OFF); + } } static void radeon_crtc_commit(struct drm_crtc *crtc) @@ -1049,8 +1052,11 @@ static void radeon_crtc_commit(struct drm_crtc *crtc) * Reenable the CRTCs that should be running. */ list_for_each_entry(crtci, &dev->mode_config.crtc_list, head) { - if (crtci->enabled) - radeon_crtc_dpms(crtci, DRM_MODE_DPMS_ON); + if (crtci->enabled) { + struct drm_crtc_helper_funcs *crtc_funcs = crtci->helper_private; + if (crtc_funcs->dpms) + crtc_funcs->dpms(crtci, DRM_MODE_DPMS_ON); + } } } -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 3/3] drm/radeon/kms: use num_crtc instead of hard-coded value 6
radeon_driver_irq_preinstall_kms and radeon_driver_irq_uninstall_kms hard code the loop to 6 which happens to be the current maximum number of crtcs; if one day an ASIC with more crtcs comes out, this is a trouble waiting to happen. it's better to use num_crtc instead (for ASICs that have fewer than 6 CRTCs, this is still OK because higher numbers won't be looked at) Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_irq_kms.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c index 9ec830c..a0f9d24 100644 --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c @@ -69,7 +69,7 @@ void radeon_driver_irq_preinstall_kms(struct drm_device *dev) rdev->irq.gui_idle = false; for (i = 0; i < rdev->num_crtc; i++) rdev->irq.crtc_vblank_int[i] = false; - for (i = 0; i < 6; i++) { + for (i = 0; i < rdev->num_crtc; i++) { rdev->irq.hpd[i] = false; rdev->irq.pflip[i] = false; } @@ -101,7 +101,7 @@ void radeon_driver_irq_uninstall_kms(struct drm_device *dev) rdev->irq.gui_idle = false; for (i = 0; i < rdev->num_crtc; i++) rdev->irq.crtc_vblank_int[i] = false; - for (i = 0; i < 6; i++) { + for (i = 0; i < rdev->num_crtc; i++) { rdev->irq.hpd[i] = false; rdev->irq.pflip[i] = false; } -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/3] drm/radeon/kms: fix the crtc number check
the crtc check in radeon_get_vblank_timestamp_kms should be against the num_crtc field in radeon_device not against num_crtcs in drm_device Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon_kms.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c index a749c26..540a9ee 100644 --- a/drivers/gpu/drm/radeon/radeon_kms.c +++ b/drivers/gpu/drm/radeon/radeon_kms.c @@ -347,7 +347,7 @@ int radeon_get_vblank_timestamp_kms(struct drm_device *dev, int crtc, struct drm_crtc *drmcrtc; struct radeon_device *rdev = dev->dev_private; - if (crtc < 0 || crtc >= dev->num_crtcs) { + if (crtc < 0 || crtc >= rdev->num_crtc) { DRM_ERROR("Invalid crtc %d\n", crtc); return -EINVAL; } -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex
On Wed, 26 Oct 2011, Daniel Vetter wrote: Just to check before I dig into reviewing this: Have you check all the other users of these data structure that these functions touch whether they don't accidentally rely on the global lock being taken? I did and thought it was safe. I re-checked this morning prompted by your note and of course there is one (easily fixable) thing that I missed. Here is the full "report": drm_getclient is safe: the only thing that should be protected there is dev->filelist and that is all protected in other functions using struct_mutex. drm_getstats is safe: actually I think there is no need for any mutex there because the loop runs through quasi-static (set at load time only) data, and the actual count access is done with atomic_read() drm_getmap has one problem which I can fix (and it's the hardest to follow): the structure that should be protected there is the dev->maplist. There are three functions that may potentially be an issue: drm_getsarea doesn't grab any lock before touching dev->maplist (so global mutex won't help here either). However, drivers seem to call it only at initialization time so it probably doesn't matter drm_master_destroy doesn't grab any lock either, but it is called from drm_master_put which is protected with dev->struct_mutex everywhere else in drm module. I also see a couple of calls to drm_master_destroy from vmwgfx driver that look unlocked to me, but drm_global_mutex won't help here either drm_getsareactx uses struct_mutex, but it apparently releases it too early (that's the problem I missed initially). If it's released after it's done with dev->maplist, we should be fine. I'll redo this patch with a fix to drm_getsareactx and that should do it. I would still appreciate another pair of eyes to make sure I didn't miss anything else thanks, -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm/fb_helper: prevent some troubles waiting to happen
On Wed, 26 Oct 2011, Daniel Vetter wrote: I've quickly checked current callsites of drm_fb_helper_init and I think you can just kill the two arguments crtc_count and max_conn_count. It is a usable functionality (i.e. allows the driver to select which connectors or CRTCs is fbcon allowed to bind to) and I see value in it. I am actually doing some work (not ready for public release yet, but will be in some relatively near future) that makes the use of this feature. If you (and the rest of the community) would prefer to see the use case first before merging these two patches, I am perfectly fine with waiting (I'll resend later, after I show the use case), but I would not like to see the functionality killed only because drivers don't use it at present. thanks, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/3] drm/radeon/kms: use crtc-specific dpms functions in prepare and commit
On Wed, 26 Oct 2011, Michel [ISO-8859-1] D�nzer wrote: On Die, 2011-10-25 at 22:40 -0400, Ilija Hadzic wrote: it's better that radeon_crtc_commit and radeon_crtc_prepare call crtc-specific dpms functions instead of hard-coding them to radeon_crtc_dpms. Is it really better? If it's always radeon_crtc_dpms anyway, this obfuscates that fact (and introduces a guaranteed branch prediction miss, but that probably doesn't matter here). My reasoning for believing that it's better was that the functions are fairly generic: they loops through all CRTCs and disable/enable them all. Although at this time these functions are only used as helpers for legacy crtcs (so indeed the pointer always resolves to radeon_crtc_dpms), if they are ever called through some other path (in the context of atombios CRTC), it doesn't hurt to make them ready for that. Having said that, I don't really mind whatever the destiny of this patch ends up being. (and yes, branch prediction slot penalty really don't matter here; it's a slow path anyway) -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2] drm/radeon/kms: use defined constants for crtc/hpd count instead of hard-coded value 6
radeon_driver_irq_preinstall_kms and radeon_driver_irq_uninstall_kms hard code the loop to 6 which happens to be the current maximum number of crtcs and hpd pins; if one day an ASIC with more crtcs (or hpd pins) comes out, this is a trouble waiting to happen. introduce constants for maximum CRTC count, maximum HPD pins count and maximum HDMI blocks count (per FIXME in radeon_irq structure) and correct the loops in radeon_driver_irq_preinstall_kms and radeon_driver_irq_uninstall_kms v2: take care of goofs pointed out by Alex Deucher Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/radeon/radeon.h | 19 ++- drivers/gpu/drm/radeon/radeon_irq_kms.c | 12 ++-- 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 156b8b7..c6841fd 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -437,25 +437,26 @@ union radeon_irq_stat_regs { struct evergreen_irq_stat_regs evergreen; }; +#define RADEON_MAX_HPD_PINS 6 +#define RADEON_MAX_CRTCS 6 +#define RADEON_MAX_HDMI_BLOCKS 2 + struct radeon_irq { boolinstalled; boolsw_int; - /* FIXME: use a define max crtc rather than hardcode it */ - boolcrtc_vblank_int[6]; - boolpflip[6]; + boolcrtc_vblank_int[RADEON_MAX_CRTCS]; + boolpflip[RADEON_MAX_CRTCS]; wait_queue_head_t vblank_queue; - /* FIXME: use defines for max hpd/dacs */ - boolhpd[6]; + boolhpd[RADEON_MAX_HPD_PINS]; boolgui_idle; boolgui_idle_acked; wait_queue_head_t idle_queue; - /* FIXME: use defines for max HDMI blocks */ - boolhdmi[2]; + boolhdmi[RADEON_MAX_HDMI_BLOCKS]; spinlock_t sw_lock; int sw_refcount; union radeon_irq_stat_regs stat_regs; - spinlock_t pflip_lock[6]; - int pflip_refcount[6]; + spinlock_t pflip_lock[RADEON_MAX_CRTCS]; + int pflip_refcount[RADEON_MAX_CRTCS]; }; int radeon_irq_kms_init(struct radeon_device *rdev); diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c index 9ec830c..93da855 100644 --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c @@ -67,10 +67,10 @@ void radeon_driver_irq_preinstall_kms(struct drm_device *dev) /* Disable *all* interrupts */ rdev->irq.sw_int = false; rdev->irq.gui_idle = false; - for (i = 0; i < rdev->num_crtc; i++) - rdev->irq.crtc_vblank_int[i] = false; - for (i = 0; i < 6; i++) { + for (i = 0; i < RADEON_MAX_HPD_PINS; i++) rdev->irq.hpd[i] = false; + for (i = 0; i < RADEON_MAX_CRTCS; i++) { + rdev->irq.crtc_vblank_int[i] = false; rdev->irq.pflip[i] = false; } radeon_irq_set(rdev); @@ -99,10 +99,10 @@ void radeon_driver_irq_uninstall_kms(struct drm_device *dev) /* Disable *all* interrupts */ rdev->irq.sw_int = false; rdev->irq.gui_idle = false; - for (i = 0; i < rdev->num_crtc; i++) - rdev->irq.crtc_vblank_int[i] = false; - for (i = 0; i < 6; i++) { + for (i = 0; i < RADEON_MAX_HPD_PINS; i++) rdev->irq.hpd[i] = false; + for (i = 0; i < RADEON_MAX_CRTCS; i++) { + rdev->irq.crtc_vblank_int[i] = false; rdev->irq.pflip[i] = false; } radeon_irq_set(rdev); -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] drm/radeon/kms: use num_crtc instead of hard-coded value 6
On Wed, 26 Oct 2011, Alex Deucher wrote: This is actually not quite right. The number of HPD (Hot Plug Detect) pins is not equal to the number of crtcs. Radeons have supported 6 HPD pins long before we supported 6 crtcs (e.g., cards with more connectors than crtcs). The logic should probably look like: [SNIP] I have just sent out a patch that I think rectifies this. Hopefully, it's good now. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
On Wed, 26 Oct 2011, Daniel Vetter wrote: While you mess around with this code, please use the standard linux wait_event_* infrastructure. Right now the order of entering the queue is guaranteed to be the same as the order of entering the ioctl, which reflects the order of progressing vblank sequence. I wanted to preserve this semantic, so I need a wait function that puts the task into the queue and then unlocks the mutex (essentially a kernel equivalent of pthread_cond_wait that POSIX threads library implements). The closest "approximation" of that wait_event_interruptable_locked, but that requires a spinlock, not mutex (and thus the rework to drm_wait_vblank ioctl would be more radical. I want to stop that DRM_FOO yelling from spreading around - the idea of the drm core being os agnostic died quite a while ago. Is DRM support for other OS-es officially dead ? I know it's not in best shape on BSD and friends, but I am not sure if I should make it worse (or inconsistent with the current code shape and form). Anyway, the primary reason for implementing the DRM_WAIT_ON_LOCKED is because I didn't have the function that enqueues the task and then releases the mutex. Again, have you carefully checked whether this is safe and how the relevant data structures (vblank counting) are proctected? I am only moving the global lock further down in the function. The structures moved out of the critical section are only local vars. and arguments. So it's safe. Had I changed the mutex to something other than global one (which is the right thing to do in a long run) then a more careful review would be warranted. Also note that running drm_wait_ioctl as DRM_LOCKED is a severe problem that we have to address quickly. So I'd like to get *some* kind of decent fix in this merge window and then follow up with more polishing later. For example, try compiling and running this code (which any bozo in the userland can do): #include #include main() { int fd; drmVBlank vbl; fd = open("/dev/dri/card0", 0); printf("fd=%d\n", fd); while(1) { vbl.request.type = DRM_VBLANK_RELATIVE ; vbl.request.sequence = 60; printf("waiting on vblank\n"); ioctl(fd, DRM_IOCTL_WAIT_VBLANK, &vbl); } } Then start glxgears (on CRTC-0) and watch the hell break loose. The hostile code will cause any OpenGL application to hang for a full second before it can even enter the vblank_wait. Now because all vblank waits go through a common function in DDX, the whole windowing system will consequently go into a crawl. Run it with Compiz and things are even worse (the whole windowing system goes "stupid" as soon as the hostile application is started). + add_wait_queue(&(queue), &entry); \ + mutex_unlock(&drm_global_mutex);\ Hiding a lock-dropping in this fashion is horrible. I explained above why am I have to release the lock inside the macro. That's equivalent to what you can find in the userland in POSIX threads, like I said. So that's not bad. What *is* bad is that I should have passed the mutex as an argument to DRM_WAIT_ON_LOCKED and that I'll fix. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
On Wed, 26 Oct 2011, Michel [ISO-8859-1] D�nzer wrote: Does it really need drm_global_mutex at all, as opposed to e.g. dev->struct_mutex? It doesn't. Actually, it would probably suffice to have a mutex that is one-per-CRTC, because vlbank waits on different CRTCs don't step on each other, but I was going for a conservative fix. I tried to avoid having to hunt around in other sections of code for lines that possibly assume that a global lock is held while touching vblank counter structures (one of concerns raised by Daniel which is a non-issue because I didn't change the mutex being used). The "urgent" part of this patch is to make sure we don't go to sleep while holding the mutex. In my response to Daniel, I sent an example of a simple userland application that can hang up the whole windowing system. That's a big deal for which we have to get the fix in quickly. After that we can follow up with more polishing (e.g. use per-CRTC mutex, review and potentially fix other parts of the code that deal with vblank and maybe assume global mutex protection). I agree with Daniel's sentiment on this. AFAICT add_wait_queue() protects against concurrent access to the wait queue, so I think it would be better to just drop the mutex explicitly before calling DRM_WAIT_ON. The problem is that if I release the mutex just before calling DRM_WAIT_ON, the task can be scheduled away right at that point. Then another task can go all the way into the DRM_WAIT_ON and enter the queue. Then the first task wakes up and enters the queue. Now the last task in the queue is *not* the task that updated the dev->last_vblank_wait[crtc] value last. If you or someone who knows better than me can tell me that this potential reordering is guaranteed harmless, I'll gladly drop the mutex earlier and then I no longer need the DRM_WAIT_ON_LOCKED macro. However, if the preservation of the order is actually important, then it will be an ugly race condition to track later. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
On Thu, 27 Oct 2011, Alan Coopersmith wrote: I think the idea of sharing kernel drm code is pretty much dead, yeah. We are still trying to keep it alive, despite what some may think. In the context of this patch (in progress), it's probably a moot topic because per comments that Daniel sent, most of the locks in drm_vblank_wait will become unecessary after I rework the patch and I won't need to introduce (new) DRM_WAIT_ON_LOCKED any more. I will, however, *not* convert existing DRM_WAIT_ON to a Linux-centric function, so I will not be the one to kill (or contribute to the killing of) the portability of that file. That I leave to someone else ;-). P.S. If the "fight" starts I will side with the "keep the portability alive" camp. I have "emotional" ties with other (less widely used) operating systems. ;-) -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
On Thu, 27 Oct 2011, Daniel Vetter wrote: On a quick glance - drm_vblank functions call by wait_vblank seem to do correct locking already using various spinlocks and atomic ops. - linux waitqueues have their own locking - dev->last_vblank_wait is only used for a procfs file. I don't think it makes much sense given that we can have more than one vblank waiter - there's no selective wakeup going on, all waiters get woken for every vblank and call schedule again if it's not the right vblank I concur. The only really hairy thing going on is racing modeset ioctls against vblank waiters. But modeset is protected by the dev->mode_config.mutex and hence already races against wait_vblank with the current code, so I'm slightly inclined to ignore this for the moment. Would be great if you coudl check that in-depth, too. Will leave this for some other patch at some other time; the critical path is to fix the hang/crawl that I explained in my previous note. So I think the right thing to do is - Kill dev->last_vblank_wait (in a prep patch). Agreed. Also drm_vblank_info function can go away - Imo we also have a few too many atomic ops going on, e.g. dev->vblank_refcount looks like it's atomic only because or the procfs file, so maybe kill that procfs file entirely. I am not sure about that. dev->vblank_refcount looks critical to me: it is incremented through drm_vblank_get which is called from several different contexts: page-flip functions in several drivers (which can run in the context of multiple user processes), **_pm_** stuff in radeon driver (which looks like it runs in the context of kernel worker thread). Mutexes used at all these different places are not quite consistent. If anything, as the result of a later audit, some other mutexes may be rendered unecessary, but definitely, I would keep vblank_refcount atomic. - Audit the vblank locking, maybe resulting in a patch with updated comments, if there's anything misleading or tricky going on. And it's always good when the locking of structe members is explained in the header file ... I'll add comments that I feel make sense for this set of patches (if anything). Full audit, should come later at a slower pace. There are quite a few mutexes and locks many of which are overlapping in function and some are spurious. It will take more than just myself to untangle this know. - Drop the mutex locking because it's not needed. Agreed. Will drop. While locking at the code I also noticed that wait_vblank calls drm_vblank_get, but not always drm_vblank_put. The code is correct, the missing vblank_put is hidden in drm_queue_vblank_event. Can you also write the patch to move that around to not trip up reviewers the next time around? Actually, there is something fishy here. Look at the 'else' branch under the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of the drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks wrong to me, but I need to first convince myself that there is not some other obscure place where drm_vblank_put is accounted for. If that branch is really missing a drm_vblank_put, then it will be easy pull the drm_vblank_put out. Otherwise, it will be another knot to untangle. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
On Thu, 27 Oct 2011, Ilija Hadzic wrote: While locking at the code I also noticed that wait_vblank calls drm_vblank_get, but not always drm_vblank_put. The code is correct, the missing vblank_put is hidden in drm_queue_vblank_event. Can you also write the patch to move that around to not trip up reviewers the next time around? Actually, there is something fishy here. Look at the 'else' branch under the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of the drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks wrong to me, but I need to first convince myself that there is not some other obscure place where drm_vblank_put is accounted for. If that branch is really missing a drm_vblank_put, then it will be easy pull the drm_vblank_put out. Otherwise, it will be another knot to untangle. OK, I checked this. Although the code is a little hard to read, it is done the way it is with the reason. Whenever the drm_queue_vblank_event and that 'else' branch is caught, the drm_vblank_put should *not* be called. The reason is that the relevant vblank has not come in yet, so the reference must remain up so that the vblank interrupts are not disabled until the event occurs. The seemingly missing drm_vblank_put will be accounted for in drm_handle_vblank_events. So I should not move anything around. If anything, this should be annotated with comments to explain what's going on. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
On Fri, 28 Oct 2011, Daniel Vetter wrote: On Fri, Oct 28, 2011 at 08:59:04AM +0200, Michel Dänzer wrote: On Don, 2011-10-27 at 22:19 -0500, Ilija Hadzic wrote: On Thu, 27 Oct 2011, Daniel Vetter wrote: So I think the right thing to do is - Kill dev->last_vblank_wait (in a prep patch). Agreed. Also drm_vblank_info function can go away Actually, I was rather going to submit a patch to hook it up again — AFAICT it was unhooked without any justification. It could be useful for debugging vblank related hangs. Any issues with it, such as last_vblank_wait not being guaranteed to really be the last one, can always be improved later on. I've thought a bit about the usefulness of it for debugging before proposing to kill it and I think it can die: It's only really useful for a complete hangs, if we have an issue we just missing a wakeup somewhere, that's not gonna catch things. Hence I think something that allows you to watch things while it's running is much better, i.e. either a drm debug prinkt or a tracecall. But if you're firmly attached to that debug file it should be pretty easy to shove that under the protection of one of the vblank spinlocks. I'll keep it then and figure out the best mutex/spinlock to use. It can be anything that exists on one-per-CRTC basis (vblank waits on different CTCs are not contending). The critical section is from that switch in which vblwait->request.sequence is incremented until it is assigned to dev->last_vblank_wait[crtc] (and we are only protecting that section against itself, executed in contexts of different PIDs). I guess we settled now on this patch (other comments will be addressed in a different set of patches). -- Ilija___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2] drm: do not sleep on vblank while holding a mutex
drm_wait_vblank must be DRM_UNLOCKED because otherwise it will grab the drm_global_mutex and then go to sleep until the vblank event it is waiting for. That can wreck havoc in the windowing system because if one process issues this ioctl, it will block all other processes for the duration of all vblanks between the current and the one it is waiting for. In some cases it can block the entire windowing system. So, make this ioctl DRM_UNLOCKED, wrap the remaining (very short) critical section with dev->vbl_lock spinlock, and add a comment to the code explaining what we are protecting with the lock. v2: incorporate comments received from Daniel Vetter and Michel Daenzer. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_drv.c |2 +- drivers/gpu/drm/drm_irq.c | 14 +- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index e159f17..c990dab 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0), + DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0), diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 3830e9e..c8b4da8 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1160,6 +1160,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data, union drm_wait_vblank *vblwait = data; int ret = 0; unsigned int flags, seq, crtc, high_crtc; + unsigned long irqflags; if ((!drm_dev_to_irq(dev)) || (!dev->irq_enabled)) return -EINVAL; @@ -1193,6 +1194,13 @@ int drm_wait_vblank(struct drm_device *dev, void *data, } seq = drm_vblank_count(dev, crtc); + /* the lock protects this section from itself executed in */ + /* the context of another PID, ensuring that the process that */ + /* wrote dev->last_vblank_wait is really the last process */ + /* that was here; processes waiting on different CRTCs */ + /* do not need to be interlocked, but rather than introducing */ + /* a new per-crtc lock, we reuse vbl_lock for simplicity */ + spin_lock_irqsave(&dev->vbl_lock, irqflags); switch (vblwait->request.type & _DRM_VBLANK_TYPES_MASK) { case _DRM_VBLANK_RELATIVE: vblwait->request.sequence += seq; @@ -1200,12 +1208,15 @@ int drm_wait_vblank(struct drm_device *dev, void *data, case _DRM_VBLANK_ABSOLUTE: break; default: + spin_unlock_irqrestore(&dev->vbl_lock, irqflags); ret = -EINVAL; goto done; } - if (flags & _DRM_VBLANK_EVENT) + if (flags & _DRM_VBLANK_EVENT) { + spin_unlock_irqrestore(&dev->vbl_lock, irqflags); return drm_queue_vblank_event(dev, crtc, vblwait, file_priv); + } if ((flags & _DRM_VBLANK_NEXTONMISS) && (seq - vblwait->request.sequence) <= (1<<23)) { @@ -1215,6 +1226,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data, DRM_DEBUG("waiting on vblank count %d, crtc %d\n", vblwait->request.sequence, crtc); dev->last_vblank_wait[crtc] = vblwait->request.sequence; + spin_unlock_irqrestore(&dev->vbl_lock, irqflags); DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ, (((drm_vblank_count(dev, crtc) - vblwait->request.sequence) <= (1 << 23)) || -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2] drm: make DRM_UNLOCKED ioctls with their own mutex
drm_getclient, drm_getstats and drm_getmap (with a few minor adjustments) do not need global mutex, so fix that and make the said ioctls DRM_UNLOCKED. Details: drm_getclient: the only thing that should be protected here is dev->filelist and that is already protected everywhere with dev->struct_mutex. drm_getstats: there is no need for any mutex here because the loop runs through quasi-static (set at load time only) data, and the actual count access is done with atomic_read() drm_getmap already uses dev->struct_mutex to protect dev->maplist, which also used to protect the same structure everywhere else except at three places: * drm_getsarea, which doesn't grab *any* mutex before touching dev->maplist (so no drm_global_mutex doesn't help here either; different issue for a different patch). However, drivers seem to call it only at initialization time so it probably doesn't matter * drm_master_destroy, which is called from drm_master_put, which in turn is protected with dev->struct_mutex everywhere else in drm module, so we are good here too. * drm_getsareactx, which releases the dev->struct_mutex too early, but this patch includes the fix for that. v2: * incorporate comments received from Daniel Vetter * include the (long) explanation above to make it clear what we are doing (and why), also at Daniel Vetter's request * tighten up mutex grab/release locations to only encompass real critical sections, rather than some random code around them Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_context.c |5 +++-- drivers/gpu/drm/drm_drv.c |6 +++--- drivers/gpu/drm/drm_ioctl.c | 15 --- 3 files changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/drm_context.c b/drivers/gpu/drm/drm_context.c index 6d440fb..325365f 100644 --- a/drivers/gpu/drm/drm_context.c +++ b/drivers/gpu/drm/drm_context.c @@ -154,8 +154,6 @@ int drm_getsareactx(struct drm_device *dev, void *data, return -EINVAL; } - mutex_unlock(&dev->struct_mutex); - request->handle = NULL; list_for_each_entry(_entry, &dev->maplist, head) { if (_entry->map == map) { @@ -164,6 +162,9 @@ int drm_getsareactx(struct drm_device *dev, void *data, break; } } + + mutex_unlock(&dev->struct_mutex); + if (request->handle == NULL) return -EINVAL; diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index c990dab..dc0eb0b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -64,9 +64,9 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0), DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0), DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0), - DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0), - DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0), + DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, DRM_UNLOCKED), + DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, DRM_UNLOCKED), + DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER), diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 904d7e9..956fd38 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -158,14 +158,11 @@ int drm_getmap(struct drm_device *dev, void *data, int i; idx = map->offset; - - mutex_lock(&dev->struct_mutex); - if (idx < 0) { - mutex_unlock(&dev->struct_mutex); + if (idx < 0) return -EINVAL; - } i = 0; + mutex_lock(&dev->struct_mutex); list_for_each(list, &dev->maplist) { if (i == idx) { r_list = list_entry(list, struct drm_map_list, head); @@ -211,9 +208,9 @@ int drm_getclient(struct drm_device *dev, void *data, int i; idx = client->idx; - mutex_lock(&dev->struct_mutex); - i = 0; + + mutex_lock(&dev->struct_mutex); list_for_each_entry(pt, &dev->filelist, lhead) { if (i++ >= idx) { client->auth = pt->authenticated; @@ -249,8 +246,6 @@ int drm_getstats(struct drm_device *dev, void *data, memset(stats, 0, sizeof(*stats)); - mutex_lock(&dev->struct_mutex); - for (i = 0; i < dev->counters; i++) { if (dev->types[i] == _DRM_STAT_LOCK) stats->data[i].value = @@ -262,8 +25
[PATCH] drm: add some comments to drm_wait_vblank and drm_queue_vblank_event
during the review of the fix for locks problems in drm_wait_vblank, a couple of false concerns were raised about how the drm_vblank_get and drm_vblank_put are used in this function; it turned out that the code is correct and that it cannot be simplified add a few comments to explain non-obvious flows in the code, to prevent "false alarms" in the future Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_irq.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8b4da8..e9dd19d 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1065,6 +1065,10 @@ out: return ret; } +/* must acquire vblank reference count (call drm_vblank_get) */ +/* before calling this function; the matching drm_vblank_put */ +/* will either be issued here or in drm_handle_vblank_events */ +/* after the vblank is signaled */ static int drm_queue_vblank_event(struct drm_device *dev, int pipe, union drm_wait_vblank *vblwait, struct drm_file *file_priv) @@ -1124,6 +1128,9 @@ static int drm_queue_vblank_event(struct drm_device *dev, int pipe, trace_drm_vblank_event_delivered(current->pid, pipe, vblwait->request.sequence); } else { + /* can't call drm_vblank_put here because interrupt */ + /* must remain enabled until the event occurs */ + /* drm_handle_vblank_events will do this for us */ list_add_tail(&e->base.link, &dev->vblank_event_list); vblwait->reply.sequence = vblwait->request.sequence; } @@ -1215,6 +1222,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data, if (flags & _DRM_VBLANK_EVENT) { spin_unlock_irqrestore(&dev->vbl_lock, irqflags); + /* drm_queue_vblank_event() will call drm_vblank_put() */ return drm_queue_vblank_event(dev, crtc, vblwait, file_priv); } -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2] drm: do not sleep on vblank while holding a mutex
On Sat, 29 Oct 2011, Daniel Vetter wrote: + /* the lock protects this section from itself executed in */ + /* the context of another PID, ensuring that the process that */ + /* wrote dev->last_vblank_wait is really the last process */ + /* that was here; processes waiting on different CRTCs */ + /* do not need to be interlocked, but rather than introducing */ + /* a new per-crtc lock, we reuse vbl_lock for simplicity */ Multiline comments are done with one pair of /* */, see CodingStyle Will fix the multiline comments (though scripts/checkpatch.pl didn't complain so I figured it was OK) I personally vote for no additional locking at all here and just drop the global lock. Or maybe I'm blind and we need this. In that case, please clue me up. What I am doing with the lock, is makeing sure that the last vblank wait reported is really last. You said in your previous comment (which I agree with) that the value of having last_vblank_wait in debugfs is in case of hangs. In that case you want to see who really was the last one to issue the vblank. Now suppose that the drm_wait_vblank is enteret in the context of two different PIDs and suppose that there are no locks. Let's say that the first process wants to wait on vblank N and the second wants to wait on vblank N+1. First process can reach the point just before it wants to write to last_vblank_wait and lose the processor there (so it has N in vblank.request (on its stack) calculated, but it has not written it into last_vblank_wait yet (because it lost the CPU right there). Then the second process gets the CPU, and executes and let's say that it goes all the way through so it writes N+1 into last_vblank_wait and then schedules away (it called DRM_WAIT_ON). Now the first process gets the CPU where it left off and overwrites N+1 in last_vblank_wait with N, which is now stale. I explained in the comment (and in my note this morning), that the only race condition is against itself in the context of different processes. The above is the scenario. Race against drm_vblank_info exists in theory, but in practice doesn't matter because the reader of debugfs (or proc file system) runs asynchronously (and typically at slower pace) from processes issuing vblank waits (if it's a human looking at the filesystem then definitely much slower pace). Since processes issuing vblanks are overwriting the same last_vblank_value, drm_vblank_info is already doing a form of downsampling and debugfs is losing information, anyway, so who cares about the lock. In case of a hang, the value of last_vblank_wait doesn't change, so again the lock in drm_vblank_info doesn't change anything. So adding a lock there (which would have to be vbl_lock to make it usable) is only a matter of theoretical correctness, but no practical value. So I decided not to touch drm_vblank_info. drm_wait_vblank, however, needs a protection from itself as I explained above. DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ, (((drm_vblank_count(dev, crtc) - vblwait->request.sequence) <= (1 << 23)) || One line below there's the curios case of us rechecking dev->irq_enabled. Without any locking in place. But afaics this is only changed by the driver at setup and teardown when we are guaranteed (hopefully) that there's no open fd and hence no way to call an ioctl. So checking once at the beginning should be good enough. Whatever. It's probably redundant statement because it won't be reached if irq_enabled is false and there is nothing to change it to true at runtime, but that's a topic for a different patch. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2] drm: do not sleep on vblank while holding a mutex
On Sat, 29 Oct 2011, Daniel Vetter wrote: Ok, and here's why your locking (or any locking that drops the lock before calling schedule) won't work: [SNIP] You just came full circle. Recall that in my v1 patch I went all the way to enqueuing the process in the wait queue before dropping the lock. That would have guaranteed that if there is a hangup, what last_vblank_wait says is the last is really the last. If there is no hang, then it doesn't matter because last_vlank_wait is constantly overwritten (and is indeed stale for N-1 processes). However, that was disliked in the review and I didn't want to argue. So in the interest of making progress, it looks that you would be happy if this patch just dropped DRM_UNLOCKED and declared that we don't care about (potentially theoretical) critical section related to last_vblank_wait. If that's the case, I'll cut a relatvely trivial patch that drops DRM_UNLOCKED from this system call to solve the problem that I pointed earlier in this thread and leave all the rest of the locking discussion for other patches. Would that be fine by you ? thanks, Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v3] drm: do not sleep on vblank while holding a mutex
drm_wait_vblank must be DRM_UNLOCKED because otherwise it will grab the drm_global_mutex and then go to sleep until the vblank event it is waiting for. That can wreck havoc in the windowing system because if one process issues this ioctl, it will block all other processes for the duration of all vblanks between the current and the one it is waiting for. In some cases it can block the entire windowing system. v2: incorporate comments received from Daniel Vetter and Michel Daenzer. v3: after a lengty discussion with Daniel Vetter, it was concluded we should not worry about any locking, within drm_wait_vblank function so this patch becomes a rather trivial removal of drm_global_mutex from drm_wait_vblank Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_drv.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index dbabcb0..dc0eb0b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0), + DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0), -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2] drm: add some comments to drm_wait_vblank and drm_queue_vblank_event
during the review of the fix for locks problems in drm_wait_vblank, a couple of false concerns were raised about how the drm_vblank_get and drm_vblank_put are used in this function; it turned out that the code is correct and that it cannot be simplified add a few comments to explain non-obvious flows in the code, to prevent "false alarms" in the future v2: incorporate comments received from Daniel Vetter Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_irq.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 3830e9e..79c02da 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1124,6 +1124,7 @@ static int drm_queue_vblank_event(struct drm_device *dev, int pipe, trace_drm_vblank_event_delivered(current->pid, pipe, vblwait->request.sequence); } else { + /* drm_handle_vblank_events will call drm_vblank_put */ list_add_tail(&e->base.link, &dev->vblank_event_list); vblwait->reply.sequence = vblwait->request.sequence; } @@ -1204,8 +1205,12 @@ int drm_wait_vblank(struct drm_device *dev, void *data, goto done; } - if (flags & _DRM_VBLANK_EVENT) + if (flags & _DRM_VBLANK_EVENT) { + /* must hold on to the vblank ref until the event fires +* drm_vblank_put will be called asynchronously +*/ return drm_queue_vblank_event(dev, crtc, vblwait, file_priv); + } if ((flags & _DRM_VBLANK_NEXTONMISS) && (seq - vblwait->request.sequence) <= (1<<23)) { -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v3] drm: do not sleep on vblank while holding a mutex
OK, v4 coming up and your suggestion will be copied verbatim. Hopefully that does it. This patch is probably going to become a record-holder in comment/code lines ratio ;-) -- Ilija On Mon, 31 Oct 2011, Daniel Vetter wrote: On Mon, Oct 31, 2011 at 01:10:21PM -0400, Ilija Hadzic wrote: drm_wait_vblank must be DRM_UNLOCKED because otherwise it will grab the drm_global_mutex and then go to sleep until the vblank event it is waiting for. That can wreck havoc in the windowing system because if one process issues this ioctl, it will block all other processes for the duration of all vblanks between the current and the one it is waiting for. In some cases it can block the entire windowing system. v2: incorporate comments received from Daniel Vetter and Michel Daenzer. v3: after a lengty discussion with Daniel Vetter, it was concluded we should not worry about any locking, within drm_wait_vblank function so this patch becomes a rather trivial removal of drm_global_mutex from drm_wait_vblank That commit message is a bit garbage. What about ... "... it was concluded that the only think not yet protected with locks and atomic ops is the write to dev->last_vblank_wait. It's only used in a debug file in proc, and the current code already employs no correct locking: the proc file only takes dev->struct_mutex, whereas drm_wait_vblank implicitly took the drm_global_mutex. Given all this, it's not worth bothering to try to fix this at this time." I think it's important to correctly document the conclusion of this discussion, because we've worried quite a bit about correct locking. Yours, Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v4] drm: do not sleep on vblank while holding a mutex
drm_wait_vblank must be DRM_UNLOCKED because otherwise it will grab the drm_global_mutex and then go to sleep until the vblank event it is waiting for. That can wreck havoc in the windowing system because if one process issues this ioctl, it will block all other processes for the duration of all vblanks between the current and the one it is waiting for. In some cases it can block the entire windowing system. v2: incorporate comments received from Daniel Vetter and Michel Daenzer. v3/v4: after a lengty discussion with Daniel Vetter, it was concluded that the only thing not yet protected with locks and atomic ops is the write to dev->last_vblank_wait. It's only used in a debug file in proc, and the current code already employs no correct locking: the proc file only takes dev->struct_mutex, whereas drm_wait_vblank implicitly took the drm_global_mutex. Given all this, it's not worth bothering to try to fix the locks at this time. Signed-off-by: Ilija Hadzic --- drivers/gpu/drm/drm_drv.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index dbabcb0..dc0eb0b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0), + DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED), DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0), -- 1.7.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[RFC] Virtual CRTCs (proposal + experimental code)
Hi everyone, I would like to bring to the attention of dri-devel and linux-fbdev community a set of hopefully useful and interesting patches that I (and a few other colleagues) have been working on during the past few months. Here, I will provide a short abstract, so that you can decide whether this is of interest for you. At the end, I will provide the pointers to the code and documentation. The code is based on Dave Arilie's tree, drm-next branch and it allows a GPU driver to have an arbitrary number of CRTCs (configurable by user) instead of only those CRTCs that represent real hardware. The new CRTCs, that we call Virtual CRTCs can be attached to a foreign device, that we call CTD device (short for Compression Transmission and Display), and pixels can be streamed out of the GPU to the device. In one example, we use AMD/ATI Radeon GPU to do 3D rendering (accelerated, of course) and we use our code to add additional monitor heads using DisplayLink devices. In other words, we achieve accelerated 3D rendering on a DisplayLink monitor. In another example we funnel rendered pixels to userland by emulating a Video-for-Linux device (and then userland can do whatever it wants with it). While doing all this, GPU has no idea that we are doing this, the entire DRI "thinks" that it is just dealing with a GPU that has a few "extra" connectors and CRTCs. So everything ports without the need to modify anything in the userland. In general any device that can do something good with rendered pixels can act as a CTD device, allowing a GPU to be an acceleration device for a less capable display device or (the opposite) a frame-buffer-based display device to be an expansion card for a GPU. Of course, for each display device, a driver would have to be made compatible with our new infrastructure (which is what we have done with DisplayLink driver and also wrote one "synthetic" driver to fake out a V4L2 device as a CTD device). The newly introduced kernel module that we call VCRTCM (short for Virtual CRTC Manager) handles the "traffic" between GPUs (actually their CRTCs) and CTDs. The code makes use of DMA wherever possible and also deals with specifics of CRTCs like modes, vblanks, page flips, hardware cursor, etc. (just for kick, we played OpenArena and watched Unigine Heaven demo on a DisplayLink monitor). At this time, we would like to solicit feedback, comments, and possibly contributions. The code is on github (pointers below) and is based on the current state of drm-next branch from Dave's tree. The code is very experimental, but complete and stable enough that you can do something useful with it. We will be adding more CTD drivers and updates to current ones in the near future and will continue to maintain the code on github. If the community finds this useful, we would be glad to work with the maintainers on merging this upstream. So we would especially like to hear what you would like to see changed to make this code acceptable for the mainline of development. My Github page is at https://github.com/ihadzic. To access the kernel code type: $ git clone git://github.com/ihadzic/linux-vcrtcm.git $ git branch drm-next-vcrtcm origin/drm-next-vcrtcm $ git checkout drm-next-vcrtcm You will get all that's currently on Dave's drm-next plus our patches on the top. We kept the development history preserved without squashing patches (unless we had to due to merge/rebase conflicts), so you can see (and laugh at) all our goofs and fixes to them. To access the documentation, type: $ git clone git://github.com/ihadzic/vcrtcm-doc.git Then read the HOWTO.txt file. The first few sections provide some general overview, and the sections that come later provide instructions how to use our stuff. Again, all comments, positive or negative, are very welcome. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC] Virtual CRTCs (proposal + experimental code)
On Thu, 3 Nov 2011, Roland Scheidegger wrote: Am I right in assuming this could also be used for making muxless hybrid GPUs work (i.e. radeon/intel igp)? Yes, this is actually one of the scenarios on my wish list too. In the terminology that I defined with the introduction of VCRTCM, IGP would act as a CTD driver and Radeon would be the GPU driver. It should be relatively straightforward to add CTD hooks to an existing Intel GPU driver and load the module in "CTD mode" when you want to do that. Another (maybe trivial, but solution unknown to me) hurdle would be to "convince" your motherboard not to turn off IGP if it senses that there is a GPU plugged in its PCI-Express slot. I would assume that there is a way to do that and maybe Intel folks could tell us (or maybe it's already known, but I am ignorant on this). Though it would be restricted to do all work on one gpu and the igp would just send the data to the display (which ultimately is not really what we want as compositing etc. should ideally always happen on the IGP so external graphic chip can be turned off but I don't even want to think about what needs to happen to make that work). In a straightforward implementation, that would be a restriction. However, when a CTD driver "asks" VCRTCM module to create a push-buffer for it (a buffer in the memory to which GPU will push the pixels from a virtual CRTC), what comes back are pages associated with a GEM/TTM bo in GART domain. I could imagine an IGP "reusing" that object for some additional rendering (as opposed to just linking it to its own CRTC). These are just half-baked thoughts and I am sure that it would require some "specialty" in the userland and that things would get messy before they get better, but still not outside of the realm of possiblities. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC] Virtual CRTCs (proposal + experimental code)
On Thu, 3 Nov 2011, David Airlie wrote: Well the current plan I had for this was to do it in userspace, I don't think the kernel has any business doing it and I think for the simple USB case its fine but will fallover when you get to the non-trivial cases where some sort of acceleration is required to move pixels around. But in saying that its good you've done what something, and I'll try and spend some time reviewing it. The reason I opted for doing this in kernel is that I wanted to confine all the changes to a relatively small set of modules. At first this was a pragmatic approach, because I live out of the mainstream development tree and I didn't want to turn my life into an ethernal merging/conflict-resolution activity. However, a more fundamental reason for it is that I didn't want to be tied to X. I deal with some userland applications (that unfortunately I can't provide much detail of yet) that live directly on the top of libdrm. So I set myself a goal of "full application transparency". Whatever is thrown at me, I wanted to be able to handle without having to touch any piece of application or library that the application relies on. I think I have achieved this goal and really everything I tried just worked out of the box (with an exception of two bug fixes to ATI DDX and Xorg, that are bugs with or without my work). -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: KMS cursor BO semantics
On Fri, 4 Nov 2011, Thomas Hellstrom wrote: Hi. I have a question about the semantics of the DRM_IOCTL_MODE_CURSOR iotcl: Some hardware (vmware's virtual in particular) may not be able to pick up the changes from a bo directly, since the cursor data is sent though the command stream. Hence we need a notification when the cursor image has changed. Could we *require* that a cursor image change needs to be followed by an ioctl call with the flag DRM_MODE_CURSOR_BO? Thanks, Thomas FWIW, Acked-by: Ilija Hadzic I have a few places where I could use such an ioctl. BTW, Thomas, in the above "since the cursor data is sent though the command stream", did you mean "since the cursor data is *not* sent though the command stream". If it was sent, through command stream, then CS ioctl would know when the cursor changes. My understanding is that the cursor data are mmap-ed letting userland poke it at will (so the case when an "hourglass" changes into "arrow" is particularly hard to know that it happened). -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC] Virtual CRTCs (proposal + experimental code)
On Mon, 7 Nov 2011, Dave Airlie wrote: So I expect the way I'd like this to work, is that we have full blown drm drivers for all the devices and then some sort of linkage layer between them, so one driver can export crtcs from another, instead of having special case ctd drivers. I agree and that is actually a long-term plan from our side. CTD functionality should be integral part of existing drivers not the new driver, unless there is a new functionality that makes sense as CTD-only (like v4l2ctd). In the world that I imagine, likage layer is VCRTCM. Unaccelerated frabebuffer devices (UDL for example, but in general anything that "lives" in fbdev world) can choose (based on some policy from userland) whether to act as CTD driver and register themselves with VCRTCM (when they want acceleration "assistance" from a GPU in the system) or to load as normal fbdev devices (when they want to run on their own). Now I can see the reason for the v4l one, but I have for example a udl kms driver, and I'd like to have it work alongside this stuff, and userspace could bind the crtcs from it to another driver. I'm not sure how much work this would be or if its just a matter of adding a CTD interface to the udl kms device. The only reason we wrote a new udlctd driver was because it was quicker that way (we just ripped some code from udlfb driver and added our CTD functionality). The plan was always to merge udlctd and udlfb at some point, but first I'd like to see how perceptive is the community to the concept. If the concept makes sense, then we'll throw in enough programming to consolidate the drivers. Nobody wants three competing drivers for the same device (udlfb, your udl kms driver and our udlctd). Speaking of udl driver, is your udl-v2 branch competing with udlfb? Externally, they seem to do similar or the same thing, but one is based on DRM (and I guess the required DDX is xf86-video-modesetting), while the other is based on fbdev and the required DDX is xf8b-video-fbdev. While from my perspective both could be consilidated with a CTD functionality, but it makes me wonder in which direction is the community development moving: is everything from fbdev-world moving under DRM as a set of unaccelerated KMS drivers or is fbdev staying separate for good ? Depending on what the trend is, one or the other udl driver (udl-v2 or udlfb) will make more sense to be merged with udlctd. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC] Virtual CRTCs (proposal + experimental code)
On Wed, 23 Nov 2011, Dave Airlie wrote: So another question I have is how you would intend this to work from a user POV, like how it would integrate with a desktop environment, X or wayland, i.e. with little or no configuration. First thing to understand is that when a virtual CRTC is created, it looks to the user like the GPU has an additional DisplayPort connector. At present I "abuse" DisplayPort, but I have seen that you pushed a patch from VMware that adds Virtual connector, so eventually I'll switch to that naming. The number of virtual CRTCs is determined when the driver loads and that is a static configuration parameter. This does not restrict the user because unused virutal CRTCs are just like disconnected connectors on the GPU. In extreme case, a user could max out the number of virtual CRTCs (i.e. 32 minus #-of-physical-CRTCs), but in general the system needs to be booted with maximum number of anticipated CRTCs. Run-time addition and removal of CRTCs is not supported at this time and that would be much harder to implement and would affect the whole DRM module everywhere. So now we have a system that booted up and DRM sees all of its real connectors as well as virtual ones (as DisplayPorts at present). If there is no CTD device attached to virtual CRTCs, these virtual connectors are disconnected as far as DRM is concerned. Now the userspace must call "attach/fps" ioctl to associate CTDs with CRTCs. I'll explain shortly how to automate that and how to eliminate the burden from the user, but for now, please assume that "attach/fps" gets called from userland somehow. When the attach happens, that is a hotplug event (VCRTCM generates it) to DRM, just like someone plugged in the monitor. Then when XOrg starts, it will use the DisplayPort that represents a virtual CRTC just like any other connector. How it will use it, will depend on what the xorg.conf says, but the key point is that this connector is no different from any other connector that the GPU provides and is thus used as an "equal citizen". No special configuration is necessary once attached to CTD. If CTD is detached and new CTD attached, that is just like yanking out monitor cable and plugging in the new one. DRM will get all hotplug events and windowing system will do the same thing it would normally do with any other port. If RANDR is called to resize the desktop it will also work and X will have no idea that one of the connectors is on a virtual CRTC. I also have another feature, where when CTD is attached, it can ask the device it drives for the connection status and propagate that all the way back to DRM (this is useful for CTD devices that drive real monitors, like DisplayLink). So now let's go back to the attach/fps ioctl. To attach a CTD device this ioctl must happen as a result of some policy. That can be done by having CTD device generate UDEV events when it loads for which one can write policies to determine which CTD device attaches to which virtual CRTC. Ultimately that becomes an user configuration, but it's no different from what one has to do now with UDEV policies to customize the system. Having explained this, let's take your hotplug example that you put up on your web page and redo it with virtual CRTCs. Here is how it would work: Boot up the system and tell the GPU to create a few virtual CRTCs. Bring up Xorg with no DisplayLink dongles in it. Now plug in the DisplayLink. Once the CTD driver loads as the result of the hotplug (right now UDLCTD is a separate driver, but as we discussed before, this is a temporary state and at some point its CTD function should be merged wither with UDLFB or with your UDL-V2) CTD function in the driver generates an UDEV event. The policy directs UDEV to call run the program that issues the ioctl to perform attach/fps. Attach/fps of the UDLCTD is now a hotplug event and DRM "thinks" that a new connector changed the status from disconnected to connected. That causes it to query the modes for the new connector and because it's the virtual CRTC, it lands in the virtual CRTC helpers in the GPU driver. Virtual CRTC helpers route it to VCRTCM, which further routes to it to CTD (UDLCTD in this case). CTD returns the modes and DRM gets them ... the rest you know better than me what happens ;-) So this is your hotplug demo, but the difference is that the new desktop can use direct rendering. Also, everything that would work for a normal connector works here without having to do any additional tricks. RANDR also works seamlessly without having to do anything special. If you move away from Xorg, to some other system (Wayland?), it still works for as long as the new system knows how to deal with connectors that connect and disconnect. Everything I described above is ready to go except the UDEV event from UDLCTD and UDEV rules to automate this. Both are straightforwar and won't take long to do. So very shortly, I'll be able to
Re: [RFC] Virtual CRTCs (proposal + experimental code)
On Thu, 24 Nov 2011, Dave Airlie wrote: Okay so thats pretty much how I expected it to work, I don't think Virtual makes sense for a displaylink attached device though, again if you were using a real driver you would just re-use whatever output type it uses, though I'm not sure how well that works, That is the consequence of the fact that virtual CRTCs are created at startup time when attached CTD is not known, while CTDs are attached at runtime. So when I register the virtual CRTC and the associated connector I have to use something for the connector type. Admitting that my logic is biased by my design, to me "Virtual" connector type is an indicative that from GPU's perspective it's a connector that does not physically exist and is yet to be attached to some real display device. At that point the properties of the attached display become known to the system. Do you propogate full EDID information and all the modes or just the supported modes? we use this in userspace to put monitor names in GNOME display settings etc. Right now we propagate the entire list of modes that the attached CTD device has queried from the connected display (monitor). Propagating full EDID is really easy to add. That's if the CTD is driver for some real display. If CTD is just a "make-believe" display whose purpose is to be the conduit to some other pixel-processing component (e.g. V4L2CTD), then at some point in the chain we have to make up the set of modes that the logical display accepts and in that case the EDID does not exist by definition. what does xrandr output looks like for a radeon GPU with 4 vcrtcs? do you see 4 disconnected connectors? that again isn't a pretty user experience. Yes it shows 4 disconnected monitors. To me that is a logical consequence of the design in which virtual CRTCs and associated virtual connectors are always there. By know, it's clear to me that you are not too thrilled about it, but please allow me to turn the question back to you: in your solution with udl-v2 driver and a dedicated DDX for it, can you do the big desktop that spans across GPU's local and "foreign" displays and have acceleration on both? If not, what would it take you to get there and how complex the end result will be? I'll get to Optimus/PRIME use case later, but if we for the moment focus on the use case in which a dumb framebuffer device extens the number of displays of a rendering-capable GPU, I think that VCRTCM offers quite a complete and universal solution and it is completely transparent with regard to the application, window manager, and display server. Radeon + DisplayLink is the specific example. But in general it's Any GPU + Any fbdev. It's not just one use case, it's a whole class of use cases that would follow the same principle and for then the VCRTC alone suffices. My main problem with this is as I'll explain below it only covers some of the use cases, and I don't want a 50% solution at this point, by doing something like this you are making it harder to get proper support into something like wayland as they can ignore some of the problems, however since this doesn't solve all the other problems it means getting to a finished solution is actually less likely to happen. I presume that by 50% solution you are referring to Optimus/PRIME use case. That case actually consists of two related, but different problems. First is "render on node X and display on node Y" and the second is "dynamically and hitlessly switch rendering between node X and Y". I have never claimed that VCRTCs solve the second problem (I could switch by restarting Xorg, but I know that this is not the solution you are looking for). I fully understand why you want both problems solved at the same time. However, I don't understand why solving one first would inhibit solving the other. On the other hand, the Radeon + DisplayLink tandem use case (or in general GPU + fbdev tandem) consists only of "render on X, display on Y" problem. Here, you will probably say that there one can switch between hardware and software rendering and that it also has both problems. That is true, but unlike the Optimus/PRIME use case, using fbdev as a display extension to GPU is still useful alone. My point is that there is a value in solving first one problem and then follow with the other. I think the crux of the problem is that you are not convinced that the VCRTCM solution for problem #1 will make solving problem #2 easier and maybe you are afraid that it will make it harder. If that's a fair statement and if having me create an existence proof for problem #2 that still uses VCRTCM will help bring our positions closer, I am perfectly willing to do so I guess I've just signed up for some hacking ;-) Note that for hitless GPU switching, I fully agree that support must be in userspace (you have to swap out paths in Mesa and DDX before even getting to kernel), but like I said, that i
Re: [RFC] Virtual CRTCs (proposal + experimental code)
So I think we do have enough people interested in this and should be able to cobble together something that does The Right Thing. We indeed have a non-trivial set of people interested in the same set of problems and each of us has partial and maybe competing solution. I want to make it clear that my, maybe disruptive and different from the plan-of-record, proposal should not be viewed as destructive or distracting. I am just offering to the community what I think is useful. If this discussion sparks some joint effort that will bring us to the solution that everyone is happy with, even if no line of my code is found useful, I am perfectly fine with that (and I'll join the effort). So at this point I think I should put out my back-of-the-napkin desiderata. That will hopefully shed some light on where I am coming from with VCRTCM proposal. I want to be able to pull pixels out of the GPU and redirect them to an arbitrary device that can do something useful with them. This should not be limited to shooting photons into human eyeballs. I want to be able to run my applications without having to run X. I'd like the solution to be transparent to the application; that is, if I can write an application that can render something to a full screen, I want to redirect that "screen" to wherever I want without having to rewrite, recompile or relink the application. Actually, I want to do that redirection at runtime. I'd like to support all of the above in a way that it can also help solve more imminent shotcomings of Linux graphics system (Optimus, DisplayLink, etc. ... cf. previous E-mails in this thread). I'd like it to work with multiple render nodes on the same GPU (something like Dave's multiseat work, in which both GPU and its display resources are virtual). The logical consequence of this is that the render node and the display node should at some point become logically separate (different driver modules) even if they are physically on the same GPU. They are really two different subsystems that just happen to reside on the same circuit board, so it makes sense to separate them. I don't think what I am saying is anything unique and what I said probably overlaps in good part with what others also want from the graphics subsystem. I can see the role of VCRTCM in all of the above, but I am open-minded. If we end up with a solution that has nothing to do with VCRTCM, I have no emotional ties with my code (and code of my colleagues that worked with me so far). -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
gma500 on drm-next branch, compile problems
Dave & Alan: Maybe I am goofing up something on my end, but gma500 driver on drm-next branch won't compile for me. I have to apply the two patches that follow this note to make it work. The first is a trivial oversight, but the second makes me wonder whether a stale driver was merged. -- Ilija ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel