[PATCH] drm/radeon: avoid kernel segfault in vce when gpu fails to resume

2017-02-06 Thread j . glisse
From: Jérôme Glisse 

When the GPU fails to resume, we cannot trust that values we write to GPU
memory will post, and we might get garbage (more like 0x on
x86) when reading them back. This triggers an out-of-range memory access
in the kernel inside the vce resume code path.

This patch uses the canonical value to compute the offset instead of
reading the value back from GPU memory.

Signed-off-by: Jérôme Glisse 
---
 drivers/gpu/drm/radeon/vce_v1_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/vce_v1_0.c b/drivers/gpu/drm/radeon/vce_v1_0.c
index a01efe3..f541a4b 100644
--- a/drivers/gpu/drm/radeon/vce_v1_0.c
+++ b/drivers/gpu/drm/radeon/vce_v1_0.c
@@ -196,7 +196,7 @@ int vce_v1_0_load_fw(struct radeon_device *rdev, uint32_t *data)
memset(&data[5], 0, 44);
memcpy(&data[16], &sign[1], rdev->vce_fw->size - sizeof(*sign));
 
-   data += le32_to_cpu(data[4]) / 4;
+   data += (le32_to_cpu(sign->len) + 64) / 4;
data[0] = sign->val[i].sigval[0];
data[1] = sign->val[i].sigval[1];
data[2] = sign->val[i].sigval[2];
-- 
2.9.3



[PATCH] radeon: fix pll/crtc mapping on dce2 and dce3 hardware

2012-11-27 Thread j . glisse
From: Jerome Glisse 

This fixes the black screen on resume issue that some people are
experiencing. There is a bug in the atombios code regarding the
pll/crtc mapping: the atombios code reverses the logic for the
pll and crtc mapping.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/atombios_crtc.c | 54 +-
 1 file changed, 20 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index 3bce029..7c1f080 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -1696,42 +1696,28 @@ static int radeon_atom_pick_pll(struct drm_crtc *crtc)
return ATOM_PPLL2;
DRM_ERROR("unable to allocate a PPLL\n");
return ATOM_PPLL_INVALID;
-   } else if (ASIC_IS_AVIVO(rdev)) {
-   /* in DP mode, the DP ref clock can come from either PPLL
-* depending on the asic:
-* DCE3: PPLL1 or PPLL2
-*/
-   if (ENCODER_MODE_IS_DP(atombios_get_encoder_mode(radeon_crtc->encoder))) {
-   /* use the same PPLL for all DP monitors */
-   pll = radeon_get_shared_dp_ppll(crtc);
-   if (pll != ATOM_PPLL_INVALID)
-   return pll;
-   } else {
-   /* use the same PPLL for all monitors with the same clock */
-   pll = radeon_get_shared_nondp_ppll(crtc);
-   if (pll != ATOM_PPLL_INVALID)
-   return pll;
-   }
-   /* all other cases */
-   pll_in_use = radeon_get_pll_use_mask(crtc);
-   /* the order shouldn't matter here, but we probably
-* need this until we have atomic modeset
-*/
-   if (rdev->flags & RADEON_IS_IGP) {
-   if (!(pll_in_use & (1 << ATOM_PPLL1)))
-   return ATOM_PPLL1;
-   if (!(pll_in_use & (1 << ATOM_PPLL2)))
-   return ATOM_PPLL2;
-   } else {
-   if (!(pll_in_use & (1 << ATOM_PPLL2)))
-   return ATOM_PPLL2;
-   if (!(pll_in_use & (1 << ATOM_PPLL1)))
-   return ATOM_PPLL1;
-   }
-   DRM_ERROR("unable to allocate a PPLL\n");
-   return ATOM_PPLL_INVALID;
} else {
/* on pre-R5xx asics, the crtc to pll mapping is hardcoded */
+   /* some atombios code (observed on some DCE2/DCE3) has a bug:
+    * the matching between pll and crtc is done through
+    * PCLK_CRTC[1|2]_CNTL (0x480/0x484), but the atombios code uses
+    * the pll (1 or 2) to select which register to write. ie if using
+    * pll1 it will use PCLK_CRTC1_CNTL (0x480) and if using pll2
+    * it will use PCLK_CRTC2_CNTL (0x484); it then uses the crtc id
+    * to choose which value to write. That is the reverse of the
+    * register logic. So the only cases that work are when the pll id
+    * is the same as the crtc id, or when both plls and crtcs are
+    * enabled and both use the same clock.
+    *
+    * So just return the crtc id as if crtc and pll were hard linked
+    * together, even if they aren't.
+    */
+   if (radeon_crtc->crtc_id > 1) {
+           /* crtcs other than crtc1 and crtc2 can only be used for
+            * DP, and those don't need a valid pll to work.
+*/
+   return ATOM_PPLL_INVALID;
+   }
return radeon_crtc->crtc_id;
}
 }
-- 
1.7.11.7



[PATCH 1/2] radeon: fix pll/crtc mapping on dce2 and dce3 hardware v2

2012-11-27 Thread j . glisse
From: Jerome Glisse 

This fixes the black screen on resume issue that some people are
experiencing. There is a bug in the atombios code regarding the
pll/crtc mapping: the atombios code reverses the logic for the
pll and crtc mapping.

v2: DCE3 and DCE2 only have 2 crtcs

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/atombios_crtc.c | 48 ++
 1 file changed, 14 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index 3bce029..24d932f 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -1696,42 +1696,22 @@ static int radeon_atom_pick_pll(struct drm_crtc *crtc)
return ATOM_PPLL2;
DRM_ERROR("unable to allocate a PPLL\n");
return ATOM_PPLL_INVALID;
-   } else if (ASIC_IS_AVIVO(rdev)) {
-   /* in DP mode, the DP ref clock can come from either PPLL
-* depending on the asic:
-* DCE3: PPLL1 or PPLL2
-*/
-   if (ENCODER_MODE_IS_DP(atombios_get_encoder_mode(radeon_crtc->encoder))) {
-   /* use the same PPLL for all DP monitors */
-   pll = radeon_get_shared_dp_ppll(crtc);
-   if (pll != ATOM_PPLL_INVALID)
-   return pll;
-   } else {
-   /* use the same PPLL for all monitors with the same clock */
-   pll = radeon_get_shared_nondp_ppll(crtc);
-   if (pll != ATOM_PPLL_INVALID)
-   return pll;
-   }
-   /* all other cases */
-   pll_in_use = radeon_get_pll_use_mask(crtc);
-   /* the order shouldn't matter here, but we probably
-* need this until we have atomic modeset
-*/
-   if (rdev->flags & RADEON_IS_IGP) {
-   if (!(pll_in_use & (1 << ATOM_PPLL1)))
-   return ATOM_PPLL1;
-   if (!(pll_in_use & (1 << ATOM_PPLL2)))
-   return ATOM_PPLL2;
-   } else {
-   if (!(pll_in_use & (1 << ATOM_PPLL2)))
-   return ATOM_PPLL2;
-   if (!(pll_in_use & (1 << ATOM_PPLL1)))
-   return ATOM_PPLL1;
-   }
-   DRM_ERROR("unable to allocate a PPLL\n");
-   return ATOM_PPLL_INVALID;
} else {
/* on pre-R5xx asics, the crtc to pll mapping is hardcoded */
+   /* some atombios code (observed on some DCE2/DCE3) has a bug:
+    * the matching between pll and crtc is done through
+    * PCLK_CRTC[1|2]_CNTL (0x480/0x484), but the atombios code uses
+    * the pll (1 or 2) to select which register to write. ie if using
+    * pll1 it will use PCLK_CRTC1_CNTL (0x480) and if using pll2
+    * it will use PCLK_CRTC2_CNTL (0x484); it then uses the crtc id
+    * to choose which value to write. That is the reverse of the
+    * register logic. So the only cases that work are when the pll id
+    * is the same as the crtc id, or when both plls and crtcs are
+    * enabled and both use the same clock.
+    *
+    * So just return the crtc id as if crtc and pll were hard linked
+    * together, even if they aren't.
+    */
return radeon_crtc->crtc_id;
}
 }
-- 
1.7.11.7



[PATCH 2/2] drm/radeon: fix deadlock when bo is associated to different handle

2012-11-27 Thread j . glisse
From: Jerome Glisse 

There is a rare case, which seems to only happen across a suspend/resume
cycle, where a bo is associated with several different handles. This
leads to a deadlock in the ttm buffer reservation path. It can only
happen with flinked (globally exported) objects. Userspace should not
reopen a globally exported object multiple times.

However, the kernel should handle this corner case gracefully and not
keep rejecting the userspace command stream. That is the object of
this patch.

Fixes a suspend/resume issue where users see the following message:
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!
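
For illustration, the duplicate-handle situation can be produced from
userspace roughly as follows; this is a hypothetical snippet (the helper
name is made up), using only the standard libdrm drmIoctl() and the GEM
open ioctl:

/* Hypothetical sketch: opening the same flink name twice can yield two
 * distinct GEM handles that refer to a single bo. */
#include <stdint.h>
#include <xf86drm.h>

static int open_flink_name_twice(int fd, uint32_t name,
                                 uint32_t *h1, uint32_t *h2)
{
    struct drm_gem_open args = { .name = name };

    if (drmIoctl(fd, DRM_IOCTL_GEM_OPEN, &args))
        return -1;
    *h1 = args.handle;
    if (drmIoctl(fd, DRM_IOCTL_GEM_OPEN, &args))
        return -1;
    *h2 = args.handle; /* may differ from *h1 for the very same bo */
    return 0;
}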

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_cs.c | 53 ++
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 41672cc..064e64d 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -54,39 +54,48 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
return -ENOMEM;
}
for (i = 0; i < p->nrelocs; i++) {
-   struct drm_radeon_cs_reloc *r;
-
+   struct drm_radeon_cs_reloc *reloc;
+
+   /* One bo can be associated with several different handles.
+    * This only happens for flinked bos that are opened several times.
+    *
+    * FIXME:
+    * Maybe we should consider an alternative to idr for gem
+    * objects to ensure a 1:1 unique mapping between handle and
+    * gem object.
+*/
duplicate = false;
-   r = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
+   reloc = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
+   p->relocs[i].handle = 0;
+   p->relocs[i].flags = reloc->flags;
+   p->relocs[i].gobj = drm_gem_object_lookup(ddev,
+ p->filp,
+ reloc->handle);
+   if (p->relocs[i].gobj == NULL) {
+   DRM_ERROR("gem object lookup failed 0x%x\n",
+ reloc->handle);
+   return -ENOENT;
+   }
+   p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
+   p->relocs[i].lobj.bo = p->relocs[i].robj;
+   p->relocs[i].lobj.wdomain = reloc->write_domain;
+   p->relocs[i].lobj.rdomain = reloc->read_domains;
+   p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
+
for (j = 0; j < i; j++) {
-   if (r->handle == p->relocs[j].handle) {
+   if (p->relocs[i].lobj.bo == p->relocs[j].lobj.bo) {
p->relocs_ptr[i] = &p->relocs[j];
duplicate = true;
break;
}
}
+
if (!duplicate) {
-   p->relocs[i].gobj = drm_gem_object_lookup(ddev,
- p->filp,
- r->handle);
-   if (p->relocs[i].gobj == NULL) {
-   DRM_ERROR("gem object lookup failed 0x%x\n",
- r->handle);
-   return -ENOENT;
-   }
p->relocs_ptr[i] = &p->relocs[i];
-   p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
-   p->relocs[i].lobj.bo = p->relocs[i].robj;
-   p->relocs[i].lobj.wdomain = r->write_domain;
-   p->relocs[i].lobj.rdomain = r->read_domains;
-   p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
-   p->relocs[i].handle = r->handle;
-   p->relocs[i].flags = r->flags;
+   p->relocs[i].handle = reloc->handle;
radeon_bo_list_add_object(&p->relocs[i].lobj,
  &p->validated);
-
-   } else
-   p->relocs[i].handle = 0;
+   }
}
return radeon_bo_list_validate(&p->validated);
 }
-- 
1.7.11.7



[PATCH] drm/radeon: track global bo name and always return the same

2012-11-27 Thread j . glisse
From: Jerome Glisse 

To avoid the kernel rejecting the cs if we return a different global name
for the same bo, keep track of the global name and always return the same
one. Seems to fix an issue with suspend/resume failing and repeatedly
printing the following message:
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!

There might still be a way for a rogue program to trigger this issue.

Signed-off-by: Jerome Glisse 
---
 radeon/radeon_bo_gem.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/radeon/radeon_bo_gem.c b/radeon/radeon_bo_gem.c
index 265f177..fca0aaf 100644
--- a/radeon/radeon_bo_gem.c
+++ b/radeon/radeon_bo_gem.c
@@ -47,11 +47,11 @@
 #include "radeon_bo_gem.h"
 #include 
 struct radeon_bo_gem {
-struct radeon_bo_int base;
-uint32_tname;
-int map_count;
-atomic_treloc_in_cs;
-void *priv_ptr;
+struct radeon_bo_intbase;
+uint32_tname;
+int map_count;
+atomic_treloc_in_cs;
+void*priv_ptr;
 };
 
 struct bo_manager_gem {
@@ -320,15 +320,21 @@ void *radeon_gem_get_reloc_in_cs(struct radeon_bo *bo)
 
 int radeon_gem_get_kernel_name(struct radeon_bo *bo, uint32_t *name)
 {
+struct radeon_bo_gem *bo_gem = (struct radeon_bo_gem*)bo;
 struct radeon_bo_int *boi = (struct radeon_bo_int *)bo;
 struct drm_gem_flink flink;
 int r;
 
+if (bo_gem->name) {
+*name = bo_gem->name;
+return 0;
+}
 flink.handle = bo->handle;
 r = drmIoctl(boi->bom->fd, DRM_IOCTL_GEM_FLINK, &flink);
 if (r) {
 return r;
 }
+bo_gem->name = flink.name;
 *name = flink.name;
 return 0;
 }
-- 
1.7.11.7



[PATCH] drm/ttm: do not try to preserve caching state

2012-11-28 Thread j . glisse
From: Jerome Glisse 

It makes no sense to preserve the caching state, especially when
moving from vram to system. It burdens the page allocator with matching
the vram caching (often WC), which just burns CPU cycles for no good
reason.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..39dcc58 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -896,19 +896,12 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 }
 
 static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man,
- uint32_t cur_placement,
  uint32_t proposed_placement)
 {
uint32_t caching = proposed_placement & TTM_PL_MASK_CACHING;
uint32_t result = proposed_placement & ~TTM_PL_MASK_CACHING;
 
-   /**
-* Keep current caching if possible.
-*/
-
-   if ((cur_placement & caching) != 0)
-   result |= (cur_placement & caching);
-   else if ((man->default_caching & caching) != 0)
+   if ((man->default_caching & caching) != 0)
result |= man->default_caching;
else if ((TTM_PL_FLAG_CACHED & caching) != 0)
result |= TTM_PL_FLAG_CACHED;
@@ -978,8 +971,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (!type_ok)
continue;
 
-   cur_flags = ttm_bo_select_caching(man, bo->mem.placement,
- cur_flags);
+   cur_flags = ttm_bo_select_caching(man, cur_flags);
/*
 * Use the access and other non-mapping-related flag bits from
 * the memory placement flags to the current flags
@@ -1023,8 +1015,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
&cur_flags))
continue;
 
-   cur_flags = ttm_bo_select_caching(man, bo->mem.placement,
- cur_flags);
+   cur_flags = ttm_bo_select_caching(man, cur_flags);
/*
 * Use the access and other non-mapping-related flag bits from
 * the memory placement flags to the current flags
-- 
1.7.11.7



[RFC] drm/ttm: add minimum residency constraint for bo eviction

2012-11-28 Thread j . glisse
So I spent the day looking at ttm and eviction. The first patch I sent
earlier is, I believe, something that should be merged. This patch,
however, is more about discussing whether other people are interested in
a similar mechanism being shared among drivers through ttm. I could
otherwise just move its logic into the radeon driver.

So the idea of this patch is that we don't want to constantly move
objects in and out of certain memory pools, mostly VRAM. It therefore
adds a minimum residency time, and no object that has been in the given
pool for less than this residency time can be moved out. This closely
addresses the regression we are having with radeon since the gallium
driver change, and probably improves some other workloads.

Statistics I gathered on xonotic/realquake showed that we can have as
much as 1GB moving in each direction (VRAM to system and system to VRAM)
over a second. So we are obviously not saturating the PCIE bandwidth.
Profiling shows that 80-90% of the cost of this eviction is in memory
allocation/deallocation for the system memory (lots of irqlock
contention, and mostly the kernel spending time allocating pages; think
256,000 or more pages per second allocated/deallocated).

I used this WIP patch to gather statistics and play with various
combinations:
http://people.freedesktop.org/~glisse/0001-TTM-EVICT-WIP.patch

Some numbers with xonotic:
17.369fps stock 3.7 kernel
27.883fps 3.7 kernel + do-not-preserve-caching patch ~ +60%
49.292fps 3.7 kernel + WIP with 500ms residency for all pools and no bo
  wait for eviction
49.258fps 3.7 kernel + WIP with 500ms residency for all pools and bo wait
48.213fps 3.7 kernel always allowing GTT placement (basically reverting
  the gallium patch effect)

Another design I am thinking of is changing the way radeon handles its
memory: stop trying to revalidate objects to a different memory pool at
each cs, and instead keep a vram lru list, probably per process, and move
bos out of vram according to this lru and some heuristic. Radeon would
then only move bos into vram when there is room.

Another improvement I am thinking of is to reuse the GTT memory of
objects that are moved in for the objects that are evicted, as the
statistics I gathered showed that it is often a close amount that moves
in and out. But this would require true dma, as it would mean scheduling
in/out moves at page granularity or in groups of pages (write 4 pages
from vram to scratch, 4 pages into sys; write 4 pages of a system memory
bo to vram; write 4 pages of vram to the just-moved 4 pages of system
memory ...).
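
The core of the idea reduces to a residency gate. A minimal sketch,
assuming (as in the patch posted next) that bo->jiffies is stamped
whenever a bo moves into a pool; the helper name is illustrative:

/* Sketch only: a bo may be evicted from a pool only once it has been
 * resident for at least the pool's minimum residency time. */
static bool ttm_bo_residency_met(struct ttm_buffer_object *bo,
                                 struct ttm_mem_type_manager *man)
{
    return jiffies_to_msecs(jiffies - bo->jiffies) >=
           man->minimum_residency_time_ms;
}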

Cheers,
Jerome



[PATCH] drm/ttm: add minimum residency constraint for bo eviction

2012-11-28 Thread j . glisse
From: Jerome Glisse 

This patch adds a minimum residency time, configurable for each memory
pool (VRAM, GTT, ...). The intention is to avoid having so much memory
eviction from VRAM that the GPU pretty much spends all its time moving
things in and out.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
 drivers/gpu/drm/ttm/ttm_bo.c| 7 +++
 include/drm/ttm/ttm_bo_api.h| 1 +
 include/drm/ttm/ttm_bo_driver.h | 1 +
 4 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5ebe1b3..88722c4 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -129,11 +129,13 @@ static int radeon_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
switch (type) {
case TTM_PL_SYSTEM:
/* System memory */
+   man->minimum_residency_time_ms = 0;
man->flags = TTM_MEMTYPE_FLAG_MAPPABLE;
man->available_caching = TTM_PL_MASK_CACHING;
man->default_caching = TTM_PL_FLAG_CACHED;
break;
case TTM_PL_TT:
+   man->minimum_residency_time_ms = 0;
man->func = &ttm_bo_manager_func;
man->gpu_offset = rdev->mc.gtt_start;
man->available_caching = TTM_PL_MASK_CACHING;
@@ -156,6 +158,7 @@ static int radeon_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
break;
case TTM_PL_VRAM:
/* "On-card" video ram */
+   man->minimum_residency_time_ms = 500;
man->func = &ttm_bo_manager_func;
man->gpu_offset = rdev->mc.vram_start;
man->flags = TTM_MEMTYPE_FLAG_FIXED |
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 39dcc58..40476121 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -452,6 +452,7 @@ moved:
bo->cur_placement = bo->mem.placement;
} else
bo->offset = 0;
+   bo->jiffies = jiffies;
 
return 0;
 
@@ -810,6 +811,12 @@ retry:
}
 
bo = list_first_entry(&man->lru, struct ttm_buffer_object, lru);
+
+   if (jiffies_to_msecs(jiffies - bo->jiffies) < man->minimum_residency_time_ms) {
+   spin_unlock(&glob->lru_lock);
+   return -EBUSY;
+   }
+
kref_get(&bo->list_kref);
 
if (!list_empty(&bo->ddestroy)) {
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index e8028ad..9e12313 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -275,6 +275,7 @@ struct ttm_buffer_object {
 
unsigned long offset;
uint32_t cur_placement;
+   unsigned long jiffies;
 
struct sg_table *sg;
 };
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index d803b92..7f60a18e6 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -280,6 +280,7 @@ struct ttm_mem_type_manager {
struct mutex io_reserve_mutex;
bool use_io_reserve_lru;
bool io_reserve_fastpath;
+   unsigned long minimum_residency_time_ms;
 
/*
 * Protected by @io_reserve_mutex:
-- 
1.7.11.7



[PATCH] drm/radeon: use cached memory when evicting for vram on non agp

2012-11-28 Thread j . glisse
From: Jerome Glisse 

Force the use of cached memory when evicting from vram on non-agp
hardware, and force write combine on agp hardware. This is to ensure
the minimum of cache type changes when allocating memory, improving
memory eviction especially on pci/pcie hardware.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_object.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index b91118c..3f9f3bb 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -88,10 +88,20 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
if (domain & RADEON_GEM_DOMAIN_VRAM)
rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED |
TTM_PL_FLAG_VRAM;
-   if (domain & RADEON_GEM_DOMAIN_GTT)
-   rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_TT;
-   if (domain & RADEON_GEM_DOMAIN_CPU)
-   rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM;
+   if (domain & RADEON_GEM_DOMAIN_GTT) {
+   if (rbo->rdev->flags & RADEON_IS_AGP) {
+   rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_TT;
+   } else {
+   rbo->placements[c++] = TTM_PL_FLAG_CACHED | TTM_PL_FLAG_TT;
+   }
+   }
+   if (domain & RADEON_GEM_DOMAIN_CPU) {
+   if (rbo->rdev->flags & RADEON_IS_AGP) {
+   rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_TT;
+   } else {
+   rbo->placements[c++] = TTM_PL_FLAG_CACHED | TTM_PL_FLAG_TT;
+   }
+   }
if (!c)
rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM;
rbo->placement.num_placement = c;
-- 
1.7.11.7



[PATCH] drm/radeon: fix rare segfault after gpu lockup on r7xx

2012-11-29 Thread j . glisse
From: Jerome Glisse 

If the GPU reset fails, the gart table pointer might be NULL; avoid a
kernel segfault in this rare event.
Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/r600.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index cda280d..0e3a68a 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -843,7 +843,9 @@ void r600_pcie_gart_tlb_flush(struct radeon_device *rdev)
 * method for them.
 */
WREG32(HDP_DEBUG1, 0);
-   tmp = readl((void __iomem *)ptr);
+   if (ptr) {
+   tmp = readl((void __iomem *)ptr);
+   }
} else
WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);
 
-- 
1.7.11.7



[RFC] improve memory placement for radeon

2012-11-29 Thread j . glisse
So as a followup here are 2 patches. The first one just stops trying to
move objects at each cs ioctl; I believe it could be included in 3.7 as
it improves performance (especially with vram changes from userspace).

The second one implements a vram eviction policy. It's a simple one:
buffers used for write operations are more important than buffers used
for read operations. A buffer gets evicted from vram only if it hasn't
been used in the last 50ms (so in the last few frames), and only if
there is a buffer that has been recently used and that could be moved
into vram. This is mostly where I believe discussion should happen:
what kind of heuristic would work better than that.

So without the first patch and with mesa master, xonotic high is at
17fps; with the first patch it goes to 40fps; with the second patch it
goes to 48fps.
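
A rough sketch of that eviction decision, with illustrative names (not
taken from the patches):

#define RADEON_VRAM_IDLE_MS 50

/* Evict a vram bo only when it has been idle for at least 50ms and a
 * more recently used buffer is waiting to take its place. */
static bool radeon_should_evict(unsigned long last_use_jiffies,
                                bool recent_waiter)
{
    return recent_waiter &&
           jiffies_to_msecs(jiffies - last_use_jiffies) >= RADEON_VRAM_IDLE_MS;
}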

Cheers,
Jerome



[PATCH 1/2] drm/radeon: do not move bo to different placement at each cs

2012-11-29 Thread j . glisse
From: Jerome Glisse 

The bo creation placement is where the bo will be. Instead of trying
to move the bo at each command stream, leave this work to another
worker thread that will use a more advanced heuristic.
Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h|  1 +
 drivers/gpu/drm/radeon/radeon_object.c | 17 -
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8c42d54..0a2664c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -313,6 +313,7 @@ struct radeon_bo {
struct list_headlist;
/* Protected by tbo.reserved */
u32 placements[3];
+   u32 busy_placements[3];
struct ttm_placementplacement;
struct ttm_buffer_objecttbo;
struct ttm_bo_kmap_obj  kmap;
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 3f9f3bb..e25ae20 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -84,7 +84,6 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
rbo->placement.fpfn = 0;
rbo->placement.lpfn = 0;
rbo->placement.placement = rbo->placements;
-   rbo->placement.busy_placement = rbo->placements;
if (domain & RADEON_GEM_DOMAIN_VRAM)
rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED |
TTM_PL_FLAG_VRAM;
@@ -105,6 +104,14 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
if (!c)
rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM;
rbo->placement.num_placement = c;
+
+   c = 0;
+   rbo->placement.busy_placement = rbo->busy_placements;
+   if (rbo->rdev->flags & RADEON_IS_AGP) {
+   rbo->busy_placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_TT;
+   } else {
+   rbo->busy_placements[c++] = TTM_PL_FLAG_CACHED | TTM_PL_FLAG_TT;
+   }
rbo->placement.num_busy_placement = c;
 }
 
@@ -360,17 +367,9 @@ int radeon_bo_list_validate(struct list_head *head)
list_for_each_entry(lobj, head, tv.head) {
bo = lobj->bo;
if (!bo->pin_count) {
-   domain = lobj->wdomain ? lobj->wdomain : lobj->rdomain;
-   
-   retry:
-   radeon_ttm_placement_from_domain(bo, domain);
r = ttm_bo_validate(&bo->tbo, &bo->placement,
true, false, false);
if (unlikely(r)) {
-   if (r != -ERESTARTSYS && domain == RADEON_GEM_DOMAIN_VRAM) {
-   domain |= RADEON_GEM_DOMAIN_GTT;
-   goto retry;
-   }
return r;
}
}
-- 
1.7.11.7



[PATCH 2/2] drm/radeon: buffer memory placement work thread WIP

2012-11-29 Thread j . glisse
From: Jerome Glisse 

Use a delayed work thread to move buffers out of vram if they haven't
been used over some period of time. This makes room for buffers that
are actively used.
Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h|  13 ++
 drivers/gpu/drm/radeon/radeon_cs.c |   2 +-
 drivers/gpu/drm/radeon/radeon_device.c |   8 ++
 drivers/gpu/drm/radeon/radeon_object.c | 241 -
 drivers/gpu/drm/radeon/radeon_object.h |   3 +-
 5 files changed, 262 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 0a2664c..a2e92da 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -102,6 +102,8 @@ extern int radeon_lockup_timeout;
  */
 #define RADEON_MAX_USEC_TIMEOUT10  /* 100 ms */
 #define RADEON_FENCE_JIFFIES_TIMEOUT   (HZ / 2)
+#define RADEON_PLACEMENT_WORK_MS   500
+#define RADEON_PLACEMENT_MAX_EVICTION  8
 /* RADEON_IB_POOL_SIZE must be a power of 2 */
 #define RADEON_IB_POOL_SIZE16
 #define RADEON_DEBUGFS_MAX_COMPONENTS  32
@@ -311,6 +313,10 @@ struct radeon_bo_va {
 struct radeon_bo {
/* Protected by gem.mutex */
struct list_headlist;
+   /* Protected by rdev->placement_mutex */
+   struct list_headplist;
+   struct list_head*head;
+   unsigned long   last_use_jiffies;
/* Protected by tbo.reserved */
u32 placements[3];
u32 busy_placements[3];
@@ -1523,6 +1529,13 @@ struct radeon_device {
struct drm_device   *ddev;
struct pci_dev  *pdev;
struct rw_semaphore exclusive_lock;
+   struct mutexplacement_mutex;
+   struct list_headwvram_in_list;
+   struct list_headrvram_in_list;
+   struct list_headwvram_out_list;
+   struct list_headrvram_out_list;
+   struct delayed_work placement_work;
+   unsigned long   vram_in_size;
/* ASIC */
union radeon_asic_configconfig;
enum radeon_family  family;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 41672cc..e9e90bc 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -88,7 +88,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
} else
p->relocs[i].handle = 0;
}
-   return radeon_bo_list_validate(&p->validated);
+   return radeon_bo_list_validate(p->rdev, &p->validated);
 }
 
 static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e2f5f88..0c4c874 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1001,6 +1001,14 @@ int radeon_device_init(struct radeon_device *rdev,
init_rwsem(&rdev->pm.mclk_lock);
init_rwsem(&rdev->exclusive_lock);
init_waitqueue_head(&rdev->irq.vblank_queue);
+
+   mutex_init(&rdev->placement_mutex);
+   INIT_LIST_HEAD(&rdev->wvram_in_list);
+   INIT_LIST_HEAD(&rdev->rvram_in_list);
+   INIT_LIST_HEAD(&rdev->wvram_out_list);
+   INIT_LIST_HEAD(&rdev->rvram_out_list);
+   INIT_DELAYED_WORK(&rdev->placement_work, radeon_placement_work_handler);
+
r = radeon_gem_init(rdev);
if (r)
return r;
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index e25ae20..f2bcc5f 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -64,6 +64,10 @@ static void radeon_ttm_bo_destroy(struct ttm_buffer_object *tbo)
mutex_lock(&bo->rdev->gem.mutex);
list_del_init(&bo->list);
mutex_unlock(&bo->rdev->gem.mutex);
+   mutex_lock(&bo->rdev->placement_mutex);
+   list_del_init(&bo->plist);
+   bo->head = NULL;
+   mutex_unlock(&bo->rdev->placement_mutex);
radeon_bo_clear_surface_reg(bo);
radeon_bo_clear_va(bo);
drm_gem_object_release(&bo->gem_base);
@@ -153,6 +157,8 @@ int radeon_bo_create(struct radeon_device *rdev,
bo->surface_reg = -1;
INIT_LIST_HEAD(&bo->list);
INIT_LIST_HEAD(&bo->va);
+   INIT_LIST_HEAD(&bo->plist);
+   bo->head = NULL;
radeon_ttm_placement_from_domain(bo, domain);
/* Kernel allocation are uninterruptible */
down_read(&rdev->pm.mclk_lock);
@@ -263,8 +269,14 @@ int radeon_bo_pin_restricted(struct radeon_bo *bo, u32 domain, u64 max_offset,
if (gpu_addr != NULL)
*

[PATCH] drm/radeon: fix amd afusion gpu setup aka sumo v2

2012-12-11 Thread j . glisse
From: Jerome Glisse 

Set the proper number of tile pipes, which should be a multiple of
pipes depending on the number of SE engines.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=56405
https://bugs.freedesktop.org/show_bug.cgi?id=56720

v2: Don't change sumo2

Signed-off-by: Jerome Glisse 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/radeon/evergreen.c  | 8 
 drivers/gpu/drm/radeon/evergreend.h | 2 ++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 14313ad..b957de1 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1819,7 +1819,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev)
case CHIP_SUMO:
rdev->config.evergreen.num_ses = 1;
rdev->config.evergreen.max_pipes = 4;
-   rdev->config.evergreen.max_tile_pipes = 2;
+   rdev->config.evergreen.max_tile_pipes = 4;
if (rdev->pdev->device == 0x9648)
rdev->config.evergreen.max_simds = 3;
else if ((rdev->pdev->device == 0x9647) ||
@@ -1842,7 +1842,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev)
rdev->config.evergreen.sc_prim_fifo_size = 0x40;
rdev->config.evergreen.sc_hiz_tile_fifo_size = 0x30;
rdev->config.evergreen.sc_earlyz_tile_fifo_size = 0x130;
-   gb_addr_config = REDWOOD_GB_ADDR_CONFIG_GOLDEN;
+   gb_addr_config = SUMO_GB_ADDR_CONFIG_GOLDEN;
break;
case CHIP_SUMO2:
rdev->config.evergreen.num_ses = 1;
@@ -1864,7 +1864,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev)
rdev->config.evergreen.sc_prim_fifo_size = 0x40;
rdev->config.evergreen.sc_hiz_tile_fifo_size = 0x30;
rdev->config.evergreen.sc_earlyz_tile_fifo_size = 0x130;
-   gb_addr_config = REDWOOD_GB_ADDR_CONFIG_GOLDEN;
+   gb_addr_config = SUMO2_GB_ADDR_CONFIG_GOLDEN;
break;
case CHIP_BARTS:
rdev->config.evergreen.num_ses = 2;
@@ -1912,7 +1912,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev)
break;
case CHIP_CAICOS:
rdev->config.evergreen.num_ses = 1;
-   rdev->config.evergreen.max_pipes = 4;
+   rdev->config.evergreen.max_pipes = 2;
rdev->config.evergreen.max_tile_pipes = 2;
rdev->config.evergreen.max_simds = 2;
rdev->config.evergreen.max_backends = 1 * rdev->config.evergreen.num_ses;
diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h
index df542f1..52c89c9 100644
--- a/drivers/gpu/drm/radeon/evergreend.h
+++ b/drivers/gpu/drm/radeon/evergreend.h
@@ -45,6 +45,8 @@
 #define TURKS_GB_ADDR_CONFIG_GOLDEN  0x02010002
 #define CEDAR_GB_ADDR_CONFIG_GOLDEN  0x02010001
 #define CAICOS_GB_ADDR_CONFIG_GOLDEN 0x02010001
+#define SUMO_GB_ADDR_CONFIG_GOLDEN   0x02010002
+#define SUMO2_GB_ADDR_CONFIG_GOLDEN  0x02010002
 
 /* Registers */
 
-- 
1.7.11.7



[PATCH] drm/radeon: fix fence driver for dma ring when wb is disabled

2012-12-12 Thread j . glisse
From: Jerome Glisse 

The dma ring can't write to registers, so it has to write its fence
value to memory. This ensures that it doesn't try to use a scratch
register for the dma ring fence driver.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/r600.c | 3 ++-
 drivers/gpu/drm/radeon/radeon_fence.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index a76eca1..2aaf147 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2533,11 +2533,12 @@ void r600_dma_fence_ring_emit(struct radeon_device *rdev,
 {
struct radeon_ring *ring = &rdev->ring[fence->ring];
u64 addr = rdev->fence_drv[fence->ring].gpu_addr;
+
/* write the fence */
radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_FENCE, 0, 0, 0));
radeon_ring_write(ring, addr & 0xfffc);
radeon_ring_write(ring, (upper_32_bits(addr) & 0xff));
-   radeon_ring_write(ring, fence->seq);
+   radeon_ring_write(ring, lower_32_bits(fence->seq));
/* generate an interrupt */
radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_TRAP, 0, 0, 0));
 }
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..410a975 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -772,7 +772,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
int r;
 
radeon_scratch_free(rdev, rdev->fence_drv[ring].scratch_reg);
-   if (rdev->wb.use_event) {
+   if (rdev->wb.use_event || !radeon_ring_supports_scratch_reg(rdev, &rdev->ring[ring])) {
rdev->fence_drv[ring].scratch_reg = 0;
index = R600_WB_EVENT_OFFSET + ring * 4;
} else {
-- 
1.8.0



[PATCH] drm/radeon: fix htile buffer size computation for command stream checker

2012-12-13 Thread j . glisse
From: Jerome Glisse 

Fix the size computation of the htile buffer.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 17 +--
 drivers/gpu/drm/radeon/r600_cs.c  | 92 ---
 drivers/gpu/drm/radeon/radeon_drv.c   |  3 +-
 3 files changed, 35 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 62c2271..fc7e613 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -507,20 +507,28 @@ static int evergreen_cs_track_validate_htile(struct radeon_cs_parser *p,
/* height is npipes htiles aligned == npipes * 8 pixel aligned */
nby = round_up(nby, track->npipes * 8);
} else {
+   /* always assume 8x8 htile */
+   /* align is htile align * 8, htile align vary according to
+* number of pipe and tile width and nby
+*/
switch (track->npipes) {
case 8:
+   /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/
nbx = round_up(nbx, 64 * 8);
nby = round_up(nby, 64 * 8);
break;
case 4:
+   /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/
nbx = round_up(nbx, 64 * 8);
nby = round_up(nby, 32 * 8);
break;
case 2:
+   /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/
nbx = round_up(nbx, 32 * 8);
nby = round_up(nby, 32 * 8);
break;
case 1:
+   /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/
nbx = round_up(nbx, 32 * 8);
nby = round_up(nby, 16 * 8);
break;
@@ -531,9 +539,10 @@ static int evergreen_cs_track_validate_htile(struct radeon_cs_parser *p,
}
}
/* compute number of htile */
-   nbx = nbx / 8;
-   nby = nby / 8;
-   size = nbx * nby * 4;
+   nbx = nbx >> 3;
+   nby = nby >> 3;
+   /* size must be aligned on npipes * 2K boundary */
+   size = roundup(nbx * nby * 4, track->npipes * (2 << 10));
size += track->htile_offset;
 
if (size > radeon_bo_size(track->htile_bo)) {
@@ -1790,6 +1799,8 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
case DB_HTILE_SURFACE:
/* 8x8 only */
track->htile_surface = radeon_get_ib_value(p, idx);
+   /* force 8x8 htile width and height */
+   ib[idx] |= 3;
track->db_dirty = true;
break;
case CB_IMMED0_BASE:
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 5d6e7f9..0b4d833 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -657,87 +657,30 @@ static int r600_cs_track_validate_db(struct radeon_cs_parser *p)
/* nby is npipes htiles aligned == npipes * 8 pixel aligned */
nby = round_up(nby, track->npipes * 8);
} else {
-   /* htile widht & nby (8 or 4) make 2 bits number */
-   tmp = track->htile_surface & 3;
+   /* always assume 8x8 htile */
/* align is htile align * 8, htile align vary according to
 * number of pipe and tile width and nby
 */
switch (track->npipes) {
case 8:
-   switch (tmp) {
-   case 3: /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/
-   nbx = round_up(nbx, 64 * 8);
-   nby = round_up(nby, 64 * 8);
-   break;
-   case 2: /* HTILE_WIDTH = 4 & HTILE_HEIGHT = 8*/
-   case 1: /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 4*/
-   nbx = round_up(nbx, 64 * 8);
-   nby = round_up(nby, 32 * 8);
-   break;
-   case 0: /* HTILE_WIDTH = 4 & HTILE_HEIGHT = 4*/
-   nbx = round_up(nbx, 32 * 8);
-   nby = round_up(nby, 32 * 8);
-   break;
-   default:
-   return -EINVAL;
-   }
+   /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/
+   nbx = round_up(nbx, 64 * 8);
+   nby = round_up(nby, 64 * 8);
 

[PATCH] drm/radeon: resume fence driver to last sync sequence on lockup

2012-12-14 Thread j . glisse
From: Jerome Glisse 

After a lockup we need to resume fences to the last sync sequence, not
the last received sequence, so that all threads waiting on a command
stream that locked up resume. Otherwise GPU reset will be ineffective
in most cases.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..38233e7 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -787,7 +787,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
}
rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
-   radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring);
+   radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring);
rdev->fence_drv[ring].initialized = true;
dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
-- 
1.7.11.7



[PATCH] drm/radeon: restore modeset late in GPU reset path

2012-12-14 Thread j . glisse
From: Jerome Glisse 

The modeset path seems to sometimes conflict with the memory
management, leading to a kernel deadlock. This moves the modesetting
reset after the GPU acceleration reset.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e2f5f88..ffd5534 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1337,7 +1337,6 @@ retry:
}
 
radeon_restore_bios_scratch_regs(rdev);
-   drm_helper_resume_force_mode(rdev->ddev);
 
if (!r) {
for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -1362,6 +1361,8 @@ retry:
}
}
 
+   drm_helper_resume_force_mode(rdev->ddev);
+
ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
if (r) {
/* bad news, how to tell it to userspace ? */
-- 
1.7.11.7



[PATCH] drm/radeon: don't leave fence blocked process on failed GPU reset

2012-12-17 Thread j . glisse
From: Jerome Glisse 

Force all fences to signal if the GPU reset failed, so that no process
gets stuck waiting on a fence.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h|  1 +
 drivers/gpu/drm/radeon/radeon_device.c |  1 +
 drivers/gpu/drm/radeon/radeon_fence.c  | 19 +++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 5d68346..9c7625c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -225,6 +225,7 @@ struct radeon_fence {
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
 int radeon_fence_driver_init(struct radeon_device *rdev);
 void radeon_fence_driver_fini(struct radeon_device *rdev);
+void radeon_fence_driver_force_completion(struct radeon_device *rdev);
 int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence **fence, int ring);
 void radeon_fence_process(struct radeon_device *rdev, int ring);
 bool radeon_fence_signaled(struct radeon_fence *fence);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e2f5f88..774fae7 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1357,6 +1357,7 @@ retry:
}
}
} else {
+   radeon_fence_driver_force_completion(rdev);
for (i = 0; i < RADEON_NUM_RINGS; ++i) {
kfree(ring_data[i]);
}
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..bf7b20e 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -868,6 +868,25 @@ void radeon_fence_driver_fini(struct radeon_device *rdev)
mutex_unlock(&rdev->ring_lock);
 }
 
+/**
+ * radeon_fence_driver_force_completion - force all fence waiters to complete
+ *
+ * @rdev: radeon device pointer
+ *
+ * In case of GPU reset failure, make sure no process keeps waiting on a
+ * fence that will never complete.
+ */
+void radeon_fence_driver_force_completion(struct radeon_device *rdev)
+{
+   int ring;
+
+   for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
+   if (!rdev->fence_drv[ring].initialized)
+   continue;
+   radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring);
+   }
+}
+
 
 /*
  * Fence debugfs
-- 
1.7.11.7



[PATCH] drm/radeon: avoid deadlock in pm path when waiting for fence

2012-12-17 Thread j . glisse
From: Jerome Glisse 

radeon_fence_wait_empty_locked should not trigger a GPU reset, as no
place it is called from would benefit from such a thing, and it
actually leads to a kernel deadlock when the reset is triggered from
the pm codepath. Instead, force ring completion where it makes sense,
or return early elsewhere.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h|  2 +-
 drivers/gpu/drm/radeon/radeon_device.c | 13 +++--
 drivers/gpu/drm/radeon/radeon_fence.c  | 30 ++
 drivers/gpu/drm/radeon/radeon_pm.c | 15 ---
 4 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 9c7625c..071b2d7 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -231,7 +231,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring);
 bool radeon_fence_signaled(struct radeon_fence *fence);
 int radeon_fence_wait(struct radeon_fence *fence, bool interruptible);
 int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring);
-void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring);
+int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring);
 int radeon_fence_wait_any(struct radeon_device *rdev,
  struct radeon_fence **fences,
  bool intr);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 774fae7..53a9223 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1163,6 +1163,7 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state)
struct drm_crtc *crtc;
struct drm_connector *connector;
int i, r;
+   bool force_completion = false;
 
if (dev == NULL || dev->dev_private == NULL) {
return -ENODEV;
@@ -1205,8 +1206,16 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state)
 
mutex_lock(&rdev->ring_lock);
/* wait for gpu to finish processing current batch */
-   for (i = 0; i < RADEON_NUM_RINGS; i++)
-   radeon_fence_wait_empty_locked(rdev, i);
+   for (i = 0; i < RADEON_NUM_RINGS; i++) {
+   r = radeon_fence_wait_empty_locked(rdev, i);
+   if (r) {
+   /* delay GPU reset to resume */
+   force_completion = true;
+   }
+   }
+   if (force_completion) {
+   radeon_fence_driver_force_completion(rdev);
+   }
mutex_unlock(&rdev->ring_lock);
 
radeon_save_bios_scratch_regs(rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index bf7b20e..28c09b6 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -609,26 +609,20 @@ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring)
  * Returns 0 if the fences have passed, error for all other cases.
  * Caller must hold ring lock.
  */
-void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring)
+int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring)
 {
uint64_t seq = rdev->fence_drv[ring].sync_seq[ring];
+   int r;
 
-   while(1) {
-   int r;
-   r = radeon_fence_wait_seq(rdev, seq, ring, false, false);
+   r = radeon_fence_wait_seq(rdev, seq, ring, false, false);
+   if (r) {
if (r == -EDEADLK) {
-   mutex_unlock(&rdev->ring_lock);
-   r = radeon_gpu_reset(rdev);
-   mutex_lock(&rdev->ring_lock);
-   if (!r)
-   continue;
-   }
-   if (r) {
-   dev_err(rdev->dev, "error waiting for ring to become"
-   " idle (%d)\n", r);
+   return -EDEADLK;
}
-   return;
+   dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
+   ring, r);
}
+   return 0;
 }
 
 /**
@@ -854,13 +848,17 @@ int radeon_fence_driver_init(struct radeon_device *rdev)
  */
 void radeon_fence_driver_fini(struct radeon_device *rdev)
 {
-   int ring;
+   int ring, r;
 
mutex_lock(&rdev->ring_lock);
for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
if (!rdev->fence_drv[ring].initialized)
continue;
-   radeon_fence_wait_empty_locked(rdev, ring);
+   r = radeon_fence_wait_empty_locked(rdev, ring);
+   if (r) {
+   /* no need to trigger GPU reset as we are unloading */
+   radeon_fence_driver_force_completion(rdev);
+   }
wake_up_all(&rdev->fence_queue);
radeon_scratch

[PATCH] drm/radeon: add support for MEM_WRITE packet

2012-12-19 Thread j . glisse
From: Jerome Glisse 

To make it easier to debug some lockups from userspace, add support
for the MEM_WRITE packet.
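
For context, a hypothetical sketch of how userspace could emit such a
packet (the helper is made up; the layout matches the checker below: a
type-3 header with count 3, a qword-aligned address, then a 64-bit value):

#include <stdint.h>

/* Standard r600 PM4 type-3 header; opcode value from r600d.h. */
#define PACKET3(op, n) ((3u << 30) | (((op) & 0xff) << 8) | (((n) & 0x3fff) << 16))
#define PACKET3_MEM_WRITE 0x3d

/* Append a MEM_WRITE that stores a 64-bit marker at "addr" inside a
 * relocated bo; handy for spotting where a command stream stops. */
static unsigned emit_mem_write(uint32_t *ib, unsigned n,
                               uint64_t addr, uint64_t marker)
{
    ib[n++] = PACKET3(PACKET3_MEM_WRITE, 3);
    ib[n++] = (uint32_t)addr;               /* must be qword aligned */
    ib[n++] = (uint32_t)(addr >> 32) & 0xff;
    ib[n++] = (uint32_t)marker;
    ib[n++] = (uint32_t)(marker >> 32);
    return n;
}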

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 29 +
 drivers/gpu/drm/radeon/r600_cs.c  | 29 +
 drivers/gpu/drm/radeon/radeon_drv.c   |  3 ++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 74c6b42..5cea852 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -2654,6 +2654,35 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p,
ib[idx+4] = upper_32_bits(offset) & 0xff;
}
break;
+   case PACKET3_MEM_WRITE:
+   {
+   u64 offset;
+
+   if (pkt->count != 3) {
+   DRM_ERROR("bad MEM_WRITE (invalid count)\n");
+   return -EINVAL;
+   }
+   r = evergreen_cs_packet_next_reloc(p, &reloc);
+   if (r) {
+   DRM_ERROR("bad MEM_WRITE (missing reloc)\n");
+   return -EINVAL;
+   }
+   offset = radeon_get_ib_value(p, idx+0);
+   offset += ((u64)(radeon_get_ib_value(p, idx+1) & 0xff)) << 32UL;
+   if (offset & 0x7) {
+   DRM_ERROR("bad MEM_WRITE (address not qwords 
aligned)\n");
+   return -EINVAL;
+   }
+   if ((offset + 8) > radeon_bo_size(reloc->robj)) {
+   DRM_ERROR("bad MEM_WRITE bo too small: 0x%llx, 0x%lx\n",
+ offset + 8, radeon_bo_size(reloc->robj));
+   return -EINVAL;
+   }
+   offset += reloc->lobj.gpu_offset;
+   ib[idx+0] = offset;
+   ib[idx+1] = upper_32_bits(offset) & 0xff;
+   break;
+   }
case PACKET3_COPY_DW:
if (pkt->count != 4) {
DRM_ERROR("bad COPY_DW (invalid count)\n");
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 0be768b..9ea13d0 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -2294,6 +2294,35 @@ static int r600_packet3_check(struct radeon_cs_parser *p,
ib[idx+4] = upper_32_bits(offset) & 0xff;
}
break;
+   case PACKET3_MEM_WRITE:
+   {
+   u64 offset;
+
+   if (pkt->count != 3) {
+   DRM_ERROR("bad MEM_WRITE (invalid count)\n");
+   return -EINVAL;
+   }
+   r = r600_cs_packet_next_reloc(p, &reloc);
+   if (r) {
+   DRM_ERROR("bad MEM_WRITE (missing reloc)\n");
+   return -EINVAL;
+   }
+   offset = radeon_get_ib_value(p, idx+0);
+   offset += ((u64)(radeon_get_ib_value(p, idx+1) & 0xff)) << 32UL;
+   if (offset & 0x7) {
+   DRM_ERROR("bad MEM_WRITE (address not qwords 
aligned)\n");
+   return -EINVAL;
+   }
+   if ((offset + 8) > radeon_bo_size(reloc->robj)) {
+   DRM_ERROR("bad MEM_WRITE bo too small: 0x%llx, 0x%lx\n",
+ offset + 8, radeon_bo_size(reloc->robj));
+   return -EINVAL;
+   }
+   offset += reloc->lobj.gpu_offset;
+   ib[idx+0] = offset;
+   ib[idx+1] = upper_32_bits(offset) & 0xff;
+   break;
+   }
case PACKET3_COPY_DW:
if (pkt->count != 4) {
DRM_ERROR("bad COPY_DW (invalid count)\n");
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 9b1a727..ff75934 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -68,9 +68,10 @@
  *   2.25.0 - eg+: new info request for num SE and num SH
  *   2.26.0 - r600-eg: fix htile size computation
  *   2.27.0 - r600-SI: Add CS ioctl support for async DMA
+ *   2.28.0 - r600-eg: Add MEM_WRITE packet support
  */
 #define KMS_DRIVER_MAJOR   2
-#define KMS_DRIVER_MINOR   27
+#define KMS_DRIVER_MINOR   28
 #define KMS_DRIVER_PATCHLEVEL  0
 int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
 int radeon_driver_unload_kms(struct drm_device *dev);
-- 
1.7.11.7



[PATCH 1/2] drm/radeon: add debugfs file for dma rings

2013-01-02 Thread j . glisse
From: Jerome Glisse 

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_ring.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index ebd6956..9410e43 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -794,11 +794,15 @@ static int radeon_debugfs_ring_info(struct seq_file *m, void *data)
 static int radeon_ring_type_gfx_index = RADEON_RING_TYPE_GFX_INDEX;
 static int cayman_ring_type_cp1_index = CAYMAN_RING_TYPE_CP1_INDEX;
 static int cayman_ring_type_cp2_index = CAYMAN_RING_TYPE_CP2_INDEX;
+static int radeon_ring_type_dma1_index = R600_RING_TYPE_DMA_INDEX;
+static int radeon_ring_type_dma2_index = CAYMAN_RING_TYPE_DMA1_INDEX;
 
 static struct drm_info_list radeon_debugfs_ring_info_list[] = {
{"radeon_ring_gfx", radeon_debugfs_ring_info, 0, 
&radeon_ring_type_gfx_index},
{"radeon_ring_cp1", radeon_debugfs_ring_info, 0, 
&cayman_ring_type_cp1_index},
{"radeon_ring_cp2", radeon_debugfs_ring_info, 0, 
&cayman_ring_type_cp2_index},
+   {"radeon_ring_dma1", radeon_debugfs_ring_info, 0, 
&radeon_ring_type_dma1_index},
+   {"radeon_ring_dma2", radeon_debugfs_ring_info, 0, 
&radeon_ring_type_dma2_index},
 };
 
 static int radeon_debugfs_sa_info(struct seq_file *m, void *data)
-- 
1.7.11.7



[PATCH 2/2] drm/radeon: print dma status reg on lockup

2013-01-02 Thread j . glisse
From: Jerome Glisse 

To help debug DMA-related lockups.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen.c  | 4 
 drivers/gpu/drm/radeon/evergreend.h | 3 +++
 drivers/gpu/drm/radeon/ni.c | 4 
 drivers/gpu/drm/radeon/nid.h| 1 -
 drivers/gpu/drm/radeon/r600.c   | 4 
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index f95d7fc..6dc9ee7 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2331,6 +2331,8 @@ static int evergreen_gpu_soft_reset(struct radeon_device *rdev)
RREG32(CP_BUSY_STAT));
dev_info(rdev->dev, "  R_008680_CP_STAT  = 0x%08X\n",
RREG32(CP_STAT));
+   dev_info(rdev->dev, "  R_00D034_DMA_STATUS_REG   = 0x%08X\n",
+   RREG32(DMA_STATUS_REG));
evergreen_mc_stop(rdev, &save);
if (evergreen_mc_wait_for_idle(rdev)) {
dev_warn(rdev->dev, "Wait for MC idle timedout !\n");
@@ -2376,6 +2378,8 @@ static int evergreen_gpu_soft_reset(struct radeon_device *rdev)
RREG32(CP_BUSY_STAT));
dev_info(rdev->dev, "  R_008680_CP_STAT  = 0x%08X\n",
RREG32(CP_STAT));
+   dev_info(rdev->dev, "  R_00D034_DMA_STATUS_REG   = 0x%08X\n",
+   RREG32(DMA_STATUS_REG));
evergreen_mc_resume(rdev, &save);
return 0;
 }
diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h
index cb9baaa..f82f98a 100644
--- a/drivers/gpu/drm/radeon/evergreend.h
+++ b/drivers/gpu/drm/radeon/evergreend.h
@@ -2027,4 +2027,7 @@
 /* cayman packet3 addition */
 #defineCAYMAN_PACKET3_DEALLOC_STATE0x14
 
+/* DMA regs common on r6xx/r7xx/evergreen/ni */
+#define DMA_STATUS_REG0xd034
+
 #endif
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 7bdbcb0..6dae387 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1331,6 +1331,8 @@ static int cayman_gpu_soft_reset(struct radeon_device *rdev)
RREG32(CP_BUSY_STAT));
dev_info(rdev->dev, "  R_008680_CP_STAT  = 0x%08X\n",
RREG32(CP_STAT));
+   dev_info(rdev->dev, "  R_00D034_DMA_STATUS_REG   = 0x%08X\n",
+   RREG32(DMA_STATUS_REG));
dev_info(rdev->dev, "  VM_CONTEXT0_PROTECTION_FAULT_ADDR   0x%08X\n",
 RREG32(0x14F8));
dev_info(rdev->dev, "  VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x%08X\n",
@@ -1387,6 +1389,8 @@ static int cayman_gpu_soft_reset(struct radeon_device *rdev)
RREG32(CP_BUSY_STAT));
dev_info(rdev->dev, "  R_008680_CP_STAT  = 0x%08X\n",
RREG32(CP_STAT));
+   dev_info(rdev->dev, "  R_00D034_DMA_STATUS_REG   = 0x%08X\n",
+   RREG32(DMA_STATUS_REG));
evergreen_mc_resume(rdev, &save);
return 0;
 }
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index b93186b..22a62c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -675,4 +675,3 @@
 #defineDMA_PACKET_NOP0xf
 
 #endif
-
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 2aaf147..4605551 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1297,6 +1297,8 @@ static int r600_gpu_soft_reset(struct radeon_device *rdev)
RREG32(CP_BUSY_STAT));
dev_info(rdev->dev, "  R_008680_CP_STAT  = 0x%08X\n",
RREG32(CP_STAT));
+   dev_info(rdev->dev, "  R_00D034_DMA_STATUS_REG   = 0x%08X\n",
+   RREG32(DMA_STATUS_REG));
rv515_mc_stop(rdev, &save);
if (r600_mc_wait_for_idle(rdev)) {
dev_warn(rdev->dev, "Wait for MC idle timedout !\n");
@@ -1348,6 +1350,8 @@ static int r600_gpu_soft_reset(struct radeon_device *rdev)
RREG32(CP_BUSY_STAT));
dev_info(rdev->dev, "  R_008680_CP_STAT  = 0x%08X\n",
RREG32(CP_STAT));
+   dev_info(rdev->dev, "  R_00D034_DMA_STATUS_REG   = 0x%08X\n",
+   RREG32(DMA_STATUS_REG));
rv515_mc_resume(rdev, &save);
return 0;
 }
-- 
1.7.11.7
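
A note on the pattern above: each soft-reset path dumps a set of status registers via paired RREG32()/dev_info() calls. A minimal sketch of that dump pattern factored into a helper (the helper name is hypothetical, not part of the patch):

    /* Hypothetical helper showing the dump pattern the patch extends:
     * read a 32-bit status register over MMIO and log it. */
    static void radeon_dump_status_reg(struct radeon_device *rdev,
                                       const char *name, u32 offset)
    {
            dev_info(rdev->dev, "  %s = 0x%08X\n", name, RREG32(offset));
    }

The patch itself keeps the open-coded calls to stay consistent with the surrounding dumps.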



[PATCH 1/2] drm/radeon: improve ring debugfs printing

2013-01-02 Thread j . glisse
From: Jerome Glisse 

Print the 32 dwords before the last known rptr, as the problem most
likely comes from a previous command. Also make a small cosmetic change
to the printing.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_ring.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c 
b/drivers/gpu/drm/radeon/radeon_ring.c
index 9410e43..141f2b6 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -770,22 +770,28 @@ static int radeon_debugfs_ring_info(struct seq_file *m, 
void *data)
int ridx = *(int*)node->info_ent->data;
struct radeon_ring *ring = &rdev->ring[ridx];
unsigned count, i, j;
+   u32 tmp;
 
radeon_ring_free_size(rdev, ring);
count = (ring->ring_size / 4) - ring->ring_free_dw;
-   seq_printf(m, "wptr(0x%04x): 0x%08x\n", ring->wptr_reg, 
RREG32(ring->wptr_reg));
-   seq_printf(m, "rptr(0x%04x): 0x%08x\n", ring->rptr_reg, 
RREG32(ring->rptr_reg));
+   tmp = RREG32(ring->wptr_reg) >> ring->ptr_reg_shift;
+   seq_printf(m, "wptr(0x%04x): 0x%08x [%5d]\n", ring->wptr_reg, tmp, tmp);
+   tmp = RREG32(ring->rptr_reg) >> ring->ptr_reg_shift;
+   seq_printf(m, "rptr(0x%04x): 0x%08x [%5d]\n", ring->rptr_reg, tmp, tmp);
if (ring->rptr_save_reg) {
seq_printf(m, "rptr next(0x%04x): 0x%08x\n", 
ring->rptr_save_reg,
   RREG32(ring->rptr_save_reg));
}
-   seq_printf(m, "driver's copy of the wptr: 0x%08x\n", ring->wptr);
-   seq_printf(m, "driver's copy of the rptr: 0x%08x\n", ring->rptr);
+   seq_printf(m, "driver's copy of the wptr: 0x%08x [%5d]\n", ring->wptr, 
ring->wptr);
+   seq_printf(m, "driver's copy of the rptr: 0x%08x [%5d]\n", ring->rptr, 
ring->rptr);
seq_printf(m, "%u free dwords in ring\n", ring->ring_free_dw);
seq_printf(m, "%u dwords in ring\n", count);
-   i = ring->rptr;
-   for (j = 0; j <= count; j++) {
-   seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
+   /* print 32 dw before current rptr as often it's the last executed
+* packet that is the root issue
+*/
+   i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;
+   for (j = 0; j <= (count + 32); j++) {
+   seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
i = (i + 1) & ring->ptr_mask;
}
return 0;
-- 
1.7.11.7
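
A note on the index arithmetic above: the ring is a power-of-two circular buffer, so stepping back 32 dwords from the rptr has to wrap modulo the ring size. A minimal sketch of the idea, using the names from the patch:

    /* ptr_mask is ring_size_in_dwords - 1, so the AND implements the
     * modulo; adding ptr_mask + 1 (i.e. the ring size) before
     * subtracting keeps the unsigned arithmetic from underflowing. */
    unsigned start = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;

The loop bound grows by 32 as well, so the dump still ends at the same place it did before.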



[PATCH 2/2] drm/radeon: reset dma engine on gpu reset

2013-01-02 Thread j . glisse
From: Jerome Glisse 

Try to reset the DMA engine when performing a GPU reset, hopefully
bringing the GPU DMA engine back to a sane state.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen.c  | 30 +-
 drivers/gpu/drm/radeon/evergreend.h | 10 +-
 drivers/gpu/drm/radeon/ni.c | 30 +-
 drivers/gpu/drm/radeon/nid.h|  2 +-
 drivers/gpu/drm/radeon/r600.c   | 28 ++--
 5 files changed, 74 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 6dc9ee7..f92f6bb 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2309,19 +2309,19 @@ bool evergreen_gpu_is_lockup(struct radeon_device 
*rdev, struct radeon_ring *rin
 static int evergreen_gpu_soft_reset(struct radeon_device *rdev)
 {
struct evergreen_mc_save save;
-   u32 grbm_reset = 0;
+   u32 grbm_reset = 0, tmp;
 
if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
return 0;
 
dev_info(rdev->dev, "GPU softreset \n");
-   dev_info(rdev->dev, "  GRBM_STATUS=0x%08X\n",
+   dev_info(rdev->dev, "  GRBM_STATUS   = 0x%08X\n",
RREG32(GRBM_STATUS));
-   dev_info(rdev->dev, "  GRBM_STATUS_SE0=0x%08X\n",
+   dev_info(rdev->dev, "  GRBM_STATUS_SE0   = 0x%08X\n",
RREG32(GRBM_STATUS_SE0));
-   dev_info(rdev->dev, "  GRBM_STATUS_SE1=0x%08X\n",
+   dev_info(rdev->dev, "  GRBM_STATUS_SE1   = 0x%08X\n",
RREG32(GRBM_STATUS_SE1));
-   dev_info(rdev->dev, "  SRBM_STATUS=0x%08X\n",
+   dev_info(rdev->dev, "  SRBM_STATUS   = 0x%08X\n",
RREG32(SRBM_STATUS));
dev_info(rdev->dev, "  R_008674_CP_STALLED_STAT1 = 0x%08X\n",
RREG32(CP_STALLED_STAT1));
@@ -2337,9 +2337,21 @@ static int evergreen_gpu_soft_reset(struct radeon_device 
*rdev)
if (evergreen_mc_wait_for_idle(rdev)) {
dev_warn(rdev->dev, "Wait for MC idle timedout !\n");
}
+
/* Disable CP parsing/prefetching */
WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
 
+   /* Disable DMA */
+   tmp = RREG32(DMA_RB_CNTL);
+   tmp &= ~DMA_RB_ENABLE;
+   WREG32(DMA_RB_CNTL, tmp);
+
+   /* Reset dma */
+   WREG32(SRBM_SOFT_RESET, SOFT_RESET_DMA);
+   RREG32(SRBM_SOFT_RESET);
+   udelay(50);
+   WREG32(SRBM_SOFT_RESET, 0);
+
/* reset all the gfx blocks */
grbm_reset = (SOFT_RESET_CP |
  SOFT_RESET_CB |
@@ -2362,13 +2374,13 @@ static int evergreen_gpu_soft_reset(struct 
radeon_device *rdev)
(void)RREG32(GRBM_SOFT_RESET);
/* Wait a little for things to settle down */
udelay(50);
-   dev_info(rdev->dev, "  GRBM_STATUS=0x%08X\n",
+   dev_info(rdev->dev, "  GRBM_STATUS   = 0x%08X\n",
RREG32(GRBM_STATUS));
-   dev_info(rdev->dev, "  GRBM_STATUS_SE0=0x%08X\n",
+   dev_info(rdev->dev, "  GRBM_STATUS_SE0   = 0x%08X\n",
RREG32(GRBM_STATUS_SE0));
-   dev_info(rdev->dev, "  GRBM_STATUS_SE1=0x%08X\n",
+   dev_info(rdev->dev, "  GRBM_STATUS_SE1   = 0x%08X\n",
RREG32(GRBM_STATUS_SE1));
-   dev_info(rdev->dev, "  SRBM_STATUS=0x%08X\n",
+   dev_info(rdev->dev, "  SRBM_STATUS   = 0x%08X\n",
RREG32(SRBM_STATUS));
dev_info(rdev->dev, "  R_008674_CP_STALLED_STAT1 = 0x%08X\n",
RREG32(CP_STALLED_STAT1));
diff --git a/drivers/gpu/drm/radeon/evergreend.h 
b/drivers/gpu/drm/radeon/evergreend.h
index f82f98a..5786a32 100644
--- a/drivers/gpu/drm/radeon/evergreend.h
+++ b/drivers/gpu/drm/radeon/evergreend.h
@@ -742,8 +742,9 @@
 #define SOFT_RESET_ROM  (1 << 14)
 #define SOFT_RESET_SEM  (1 << 15)
 #define SOFT_RESET_VMC  (1 << 17)
+#define SOFT_RESET_DMA  (1 << 20)
 #define SOFT_RESET_TST  (1 << 21)
-#define SOFT_RESET_REGBB (1 << 22)
+#define SOFT_RESET_REGBB (1 << 22)
 #define SOFT_RESET_ORB  (1 << 23)
 
 /* display watermarks */
@@ -2028,6 +2029,13 @@
 #define CAYMAN_PACKET3_DEALLOC_STATE    0x14
 
 /* DMA regs common on r6xx/r7xx/evergreen/ni */
+#define DMA_RB_CNTL   0xd000
+#   define DMA_RB_ENABLE  (1 << 0)
+#   define DMA_RB_SIZE(x) ((x) << 1) /* log2 */
+#   define DMA_RB_SWAP_ENABLE (1 << 9) /* 8IN32 */
+#   define DMA_RPTR_WRITEBACK_ENABLE  (1 << 12)
+#   define DMA_RPTR_
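
The reset sequence added above follows a common quiesce-then-reset pattern: stop the engine's ring buffer, pulse the soft-reset bit, and give the hardware time to settle. Condensed from the evergreen hunk:

    /* Disable the DMA ring buffer before resetting the engine */
    tmp = RREG32(DMA_RB_CNTL);
    tmp &= ~DMA_RB_ENABLE;
    WREG32(DMA_RB_CNTL, tmp);

    /* Pulse the DMA soft-reset bit; the read back posts the write */
    WREG32(SRBM_SOFT_RESET, SOFT_RESET_DMA);
    RREG32(SRBM_SOFT_RESET);
    udelay(50);
    WREG32(SRBM_SOFT_RESET, 0);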

[PATCH] radeon/kms: force rn50 chip to always report connected on analog output

2013-01-08 Thread j . glisse
From: Jerome Glisse 

Those RN50 chips are often connected to console-remoting hardware, and
load detection often fails on them. Just don't try load detection and
report the output as connected.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_legacy_encoders.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c 
b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
index f5ba224..62cd512 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
@@ -640,6 +640,14 @@ static enum drm_connector_status 
radeon_legacy_primary_dac_detect(struct drm_enc
enum drm_connector_status found = connector_status_disconnected;
bool color = true;
 
+   /* just don't bother on RN50; those chips are often connected to
+* remoting console hw and load detection often fails on them. So to
+* make everyone happy report the encoder as always connected.
+*/
+   if (ASIC_IS_RN50(rdev)) {
+   return connector_status_connected;
+   }
+
/* save the regs we need */
vclk_ecp_cntl = RREG32_PLL(RADEON_VCLK_ECP_CNTL);
crtc_ext_cntl = RREG32(RADEON_CRTC_EXT_CNTL);
-- 
1.7.11.7



[PATCH 1/2] radeon/kms: fix dma relocation checking

2013-01-09 Thread j . glisse
From: Jerome Glisse 

We were checking the index against the size of the relocation buffer
instead of against the number of relocation entries. This fixes a
kernel segfault when userspace submits an ill-formed command
stream/relocation buffer pair.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/r600_cs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 9ea13d0..f91919e 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -2561,16 +2561,16 @@ int r600_dma_cs_next_reloc(struct radeon_cs_parser *p,
struct radeon_cs_chunk *relocs_chunk;
unsigned idx;
 
+   *cs_reloc = NULL;
if (p->chunk_relocs_idx == -1) {
DRM_ERROR("No relocation chunk !\n");
return -EINVAL;
}
-   *cs_reloc = NULL;
relocs_chunk = &p->chunks[p->chunk_relocs_idx];
idx = p->dma_reloc_idx;
-   if (idx >= relocs_chunk->length_dw) {
+   if (idx >= p->nrelocs) {
DRM_ERROR("Relocs at %d after relocations chunk end %d !\n",
- idx, relocs_chunk->length_dw);
+ idx, p->nrelocs);
return -EINVAL;
}
*cs_reloc = p->relocs_ptr[idx];
-- 
1.7.11.7
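
The bug class here is a units mismatch: length_dw counts dwords in the chunk, while the index counts relocation entries. A minimal sketch of the corrected check, using the names from the patch:

    /* idx indexes relocation entries, so compare it against the number
     * of entries (p->nrelocs), not the chunk size in dwords */
    if (idx >= p->nrelocs) {
            DRM_ERROR("Relocs at %d after relocations chunk end %d !\n",
                      idx, p->nrelocs);
            return -EINVAL;
    }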



[PATCH 2/2] radeon/kms: cleanup async dma packet checking

2013-01-09 Thread j . glisse
From: Jerome Glisse 

This simplifies and cleans up the async DMA packet checking.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen.c|  16 +-
 drivers/gpu/drm/radeon/evergreen_cs.c | 807 +-
 drivers/gpu/drm/radeon/evergreend.h   |  29 +-
 3 files changed, 417 insertions(+), 435 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index f92f6bb..28f8d4f 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3223,14 +3223,14 @@ void evergreen_dma_fence_ring_emit(struct radeon_device 
*rdev,
struct radeon_ring *ring = &rdev->ring[fence->ring];
u64 addr = rdev->fence_drv[fence->ring].gpu_addr;
/* write the fence */
-   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_FENCE, 0, 0, 0));
+   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_FENCE, 0, 0));
radeon_ring_write(ring, addr & 0xfffffffc);
radeon_ring_write(ring, (upper_32_bits(addr) & 0xff));
radeon_ring_write(ring, fence->seq);
/* generate an interrupt */
-   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_TRAP, 0, 0, 0));
+   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_TRAP, 0, 0));
/* flush HDP */
-   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0));
+   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0));
radeon_ring_write(ring, (0xf << 16) | HDP_MEM_COHERENCY_FLUSH_CNTL);
radeon_ring_write(ring, 1);
 }
@@ -3253,7 +3253,7 @@ void evergreen_dma_ring_ib_execute(struct radeon_device 
*rdev,
while ((next_rptr & 7) != 5)
next_rptr++;
next_rptr += 3;
-   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_WRITE, 0, 0, 1));
+   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_WRITE, 0, 1));
radeon_ring_write(ring, ring->next_rptr_gpu_addr & 0xfffffffc);
radeon_ring_write(ring, upper_32_bits(ring->next_rptr_gpu_addr) 
& 0xff);
radeon_ring_write(ring, next_rptr);
@@ -3263,8 +3263,8 @@ void evergreen_dma_ring_ib_execute(struct radeon_device 
*rdev,
 * Pad as necessary with NOPs.
 */
while ((ring->wptr & 7) != 5)
-   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_NOP, 0, 0, 0));
-   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_INDIRECT_BUFFER, 0, 0, 
0));
+   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_NOP, 0, 0));
+   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_INDIRECT_BUFFER, 0, 0));
radeon_ring_write(ring, (ib->gpu_addr & 0xFFFFFFE0));
radeon_ring_write(ring, (ib->length_dw << 12) | 
(upper_32_bits(ib->gpu_addr) & 0xFF));
 
@@ -3323,7 +3323,7 @@ int evergreen_copy_dma(struct radeon_device *rdev,
if (cur_size_in_dw > 0xFFFFF)
cur_size_in_dw = 0xFFFFF;
size_in_dw -= cur_size_in_dw;
-   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_COPY, 0, 0, 
cur_size_in_dw));
+   radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_COPY, 0, 
cur_size_in_dw));
radeon_ring_write(ring, dst_offset & 0xfffffffc);
radeon_ring_write(ring, src_offset & 0xfffffffc);
radeon_ring_write(ring, upper_32_bits(dst_offset) & 0xff);
@@ -3431,7 +3431,7 @@ static int evergreen_startup(struct radeon_device *rdev)
ring = &rdev->ring[R600_RING_TYPE_DMA_INDEX];
r = radeon_ring_init(rdev, ring, ring->ring_size, 
R600_WB_DMA_RPTR_OFFSET,
 DMA_RB_RPTR, DMA_RB_WPTR,
-2, 0x3fffc, DMA_PACKET(DMA_PACKET_NOP, 0, 0, 0));
+2, 0x3fffc, DMA_PACKET(DMA_PACKET_NOP, 0, 0));
if (r)
return r;
 
diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 7a44566..32c07bb 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -2858,16 +2858,6 @@ int evergreen_cs_parse(struct radeon_cs_parser *p)
return 0;
 }
 
-/*
- *  DMA
- */
-
-#define GET_DMA_CMD(h) (((h) & 0xf0000000) >> 28)
-#define GET_DMA_COUNT(h) ((h) & 0x000fffff)
-#define GET_DMA_T(h) (((h) & 0x00800000) >> 23)
-#define GET_DMA_NEW(h) (((h) & 0x04000000) >> 26)
-#define GET_DMA_MISC(h) (((h) & 0x0700000) >> 20)
-
 /**
  * evergreen_dma_cs_parse() - parse the DMA IB
  * @p: parser structure holding parsing context.
@@ -2881,9 +2871,9 @@ int evergreen_dma_cs_parse(struct radeon_cs_parser *p)
 {
struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx];
struct radeon_cs_reloc *src_reloc, *dst_reloc, *dst2_reloc;
-   u32 header, cmd, count, tiled, new_cmd, misc;
+   u32 header, cmd, count, sub_cmd;
volatile u32 *ib = p->ib.ptr;
-   u32 idx, idx_value;
+   u32 idx;
u64 src_offset, dst_offset, dst2_offset;
int r;
 
@@ -
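
For context, the GET_DMA_* macros removed above decode fixed bit fields out of a 32-bit DMA packet header. A short worked example of the mask-and-shift pattern, using the evergreen field layout from the removed macros:

    u32 header = ib[idx];
    u32 cmd    = (header & 0xf0000000) >> 28; /* packet opcode, bits 31:28 */
    u32 count  =  header & 0x000fffff;        /* dword count, bits 19:0 */

The cleanup replaces the separate tiled/new/misc flags with a single sub_cmd value decoded the same way.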

[PATCH] drm/radeon: improve semaphore debugging on lockup

2013-01-11 Thread j . glisse
From: Jerome Glisse 

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h   | 2 ++
 drivers/gpu/drm/radeon/radeon_ring.c  | 2 ++
 drivers/gpu/drm/radeon/radeon_semaphore.c | 4 
 3 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 9b9422c..f0bb8d5 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -649,6 +649,8 @@ struct radeon_ring {
u32 ptr_reg_mask;
u32 nop;
u32 idx;
+   u64 last_semaphore_signal_addr;
+   u64 last_semaphore_wait_addr;
 };
 
 /*
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c 
b/drivers/gpu/drm/radeon/radeon_ring.c
index 141f2b6..2430d80 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -784,6 +784,8 @@ static int radeon_debugfs_ring_info(struct seq_file *m, 
void *data)
}
seq_printf(m, "driver's copy of the wptr: 0x%08x [%5d]\n", ring->wptr, 
ring->wptr);
seq_printf(m, "driver's copy of the rptr: 0x%08x [%5d]\n", ring->rptr, 
ring->rptr);
+   seq_printf(m, "last semaphore signal addr : 0x%016llx\n", 
ring->last_semaphore_signal_addr);
+   seq_printf(m, "last semaphore wait addr   : 0x%016llx\n", 
ring->last_semaphore_wait_addr);
seq_printf(m, "%u free dwords in ring\n", ring->ring_free_dw);
seq_printf(m, "%u dwords in ring\n", count);
/* print 32 dw before current rptr as often it's the last executed
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c 
b/drivers/gpu/drm/radeon/radeon_semaphore.c
index 97f3ece..8dcc20f 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -95,6 +95,10 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev,
/* we assume caller has already allocated space on waiters ring */
radeon_semaphore_emit_wait(rdev, waiter, semaphore);
 
+   /* for debugging lockup only, used by debugfs debug files */
+   rdev->ring[signaler].last_semaphore_signal_addr = semaphore->gpu_addr;
+   rdev->ring[waiter].last_semaphore_wait_addr = semaphore->gpu_addr;
+
return 0;
 }
 
-- 
1.7.11.7



[PATCH] drm/radeon: fix cursor corruption on aruba and newer

2013-01-21 Thread j . glisse
From: Jerome Glisse 

Aruba and newer GPUs do not need the AVIVO cursor workaround; quite
the opposite, the workaround leads to corruption there.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_cursor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cursor.c 
b/drivers/gpu/drm/radeon/radeon_cursor.c
index ad6df62..30f71cc 100644
--- a/drivers/gpu/drm/radeon/radeon_cursor.c
+++ b/drivers/gpu/drm/radeon/radeon_cursor.c
@@ -241,7 +241,7 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
y = 0;
}
 
-   if (ASIC_IS_AVIVO(rdev)) {
+   if (ASIC_IS_AVIVO(rdev) && (rdev->family < CHIP_ARUBA)) {
int i = 0;
struct drm_crtc *crtc_p;
 
-- 
1.7.11.7



[PATCH] drm/radeon: avoid turning off spread spectrum for used pll

2012-08-17 Thread j . glisse
From: Jerome Glisse 

If spread spectrum is enabled and in use for a given PLL, we should
not turn it off, as that would turn off the display for any CRTC that
uses the PLL (this behavior was observed on Chelsea eDP).

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/atombios_crtc.c |   25 +
 1 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c 
b/drivers/gpu/drm/radeon/atombios_crtc.c
index c6fcb5b..cb18813 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -444,11 +444,28 @@ union atom_enable_ss {
 static void atombios_crtc_program_ss(struct radeon_device *rdev,
 int enable,
 int pll_id,
+int crtc_id,
 struct radeon_atom_ss *ss)
 {
+   unsigned i;
int index = GetIndexIntoMasterTable(COMMAND, 
EnableSpreadSpectrumOnPPLL);
union atom_enable_ss args;
 
+   if (!enable) {
+   for (i = 0; i < 6; i++) {
+   if (rdev->mode_info.crtcs[i] &&
+   rdev->mode_info.crtcs[i]->enabled &&
+   i != crtc_id &&
+   pll_id == rdev->mode_info.crtcs[i]->pll_id) {
+   /* another crtc is using this pll; don't turn
+* off spread spectrum as it might turn off
+* the display on an active crtc
+*/
+   return;
+   }
+   }
+   }
+
memset(&args, 0, sizeof(args));
 
if (ASIC_IS_DCE5(rdev)) {
@@ -1028,7 +1045,7 @@ static void atombios_crtc_set_pll(struct drm_crtc *crtc, 
struct drm_display_mode
radeon_compute_pll_legacy(pll, adjusted_clock, &pll_clock, 
&fb_div, &frac_fb_div,
  &ref_div, &post_div);
 
-   atombios_crtc_program_ss(rdev, ATOM_DISABLE, radeon_crtc->pll_id, &ss);
+   atombios_crtc_program_ss(rdev, ATOM_DISABLE, radeon_crtc->pll_id, 
radeon_crtc->crtc_id, &ss);
 
atombios_crtc_program_pll(crtc, radeon_crtc->crtc_id, 
radeon_crtc->pll_id,
  encoder_mode, radeon_encoder->encoder_id, 
mode->clock,
@@ -1051,7 +1068,7 @@ static void atombios_crtc_set_pll(struct drm_crtc *crtc, 
struct drm_display_mode
ss.step = step_size;
}
 
-   atombios_crtc_program_ss(rdev, ATOM_ENABLE, 
radeon_crtc->pll_id, &ss);
+   atombios_crtc_program_ss(rdev, ATOM_ENABLE, 
radeon_crtc->pll_id, radeon_crtc->crtc_id, &ss);
}
 }
 
@@ -1572,11 +1589,11 @@ void radeon_atom_disp_eng_pll_init(struct radeon_device 
*rdev)
   
ASIC_INTERNAL_SS_ON_DCPLL,
   
rdev->clock.default_dispclk);
if (ss_enabled)
-   atombios_crtc_program_ss(rdev, ATOM_DISABLE, 
ATOM_DCPLL, &ss);
+   atombios_crtc_program_ss(rdev, ATOM_DISABLE, 
ATOM_DCPLL, -1, &ss);
/* XXX: DCE5, make sure voltage, dispclk is high enough */
atombios_crtc_set_disp_eng_pll(rdev, 
rdev->clock.default_dispclk);
if (ss_enabled)
-   atombios_crtc_program_ss(rdev, ATOM_ENABLE, ATOM_DCPLL, 
&ss);
+   atombios_crtc_program_ss(rdev, ATOM_ENABLE, ATOM_DCPLL, 
-1, &ss);
}
 
 }
-- 
1.7.1
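
The guard added above is a shared-resource check: before disabling spread spectrum on a PLL, scan the other active CRTCs for one that still uses it. Condensed from the hunk:

    /* if any other enabled crtc shares this pll, leave spread spectrum
     * alone: disabling it could blank that crtc's display */
    for (i = 0; i < 6; i++) {
            if (rdev->mode_info.crtcs[i] &&
                rdev->mode_info.crtcs[i]->enabled &&
                i != crtc_id &&
                pll_id == rdev->mode_info.crtcs[i]->pll_id)
                    return;
    }

Passing crtc_id = -1, as the display engine PLL init does, makes the i != crtc_id test always true, so the scan covers every enabled CRTC.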



[PATCH] drm/radeon: force dma32 on rs400, rs690, rs740 IGP

2012-08-28 Thread j . glisse
From: Jerome Glisse 

It seems some of those IGPs dislike non-DMA32 pages.

https://bugzilla.redhat.com/show_bug.cgi?id=785375

Signed-off-by: Jerome Glisse 
Cc: 
---
 drivers/gpu/drm/radeon/radeon_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 066c98b..8867400 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -774,7 +774,7 @@ int radeon_device_init(struct radeon_device *rdev,
if (rdev->flags & RADEON_IS_AGP)
rdev->need_dma32 = true;
if ((rdev->flags & RADEON_IS_PCI) &&
-   (rdev->family < CHIP_RS400))
+   (rdev->family <= CHIP_RS740))
rdev->need_dma32 = true;
 
dma_bits = rdev->need_dma32 ? 32 : 40;
-- 
1.7.11.2



[PATCH] drm/radeon: force dma32 to fix regression rs4xx,rs6xx,rs740

2012-08-28 Thread j . glisse
From: Jerome Glisse 

It seems some of those IGPs dislike non-DMA32 pages despite what the
documentation says. Fix the regression since we allowed non-DMA32
pages. It seems to only affect some revisions of those IGP chips, and
as we don't know which ones, just force DMA32 for all of them.

https://bugzilla.redhat.com/show_bug.cgi?id=785375

Signed-off-by: Jerome Glisse 
Cc: 
---
 drivers/gpu/drm/radeon/radeon_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 066c98b..8867400 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -774,7 +774,7 @@ int radeon_device_init(struct radeon_device *rdev,
if (rdev->flags & RADEON_IS_AGP)
rdev->need_dma32 = true;
if ((rdev->flags & RADEON_IS_PCI) &&
-   (rdev->family < CHIP_RS400))
+   (rdev->family <= CHIP_RS740))
rdev->need_dma32 = true;
 
dma_bits = rdev->need_dma32 ? 32 : 40;
-- 
1.7.11.2
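
The need_dma32 flag takes effect a few lines below the hunk, where the DMA mask is negotiated. A minimal sketch of that flow, based on the surrounding radeon_device_init code (the exact fallback handling may differ by kernel version):

    dma_bits = rdev->need_dma32 ? 32 : 40;
    r = pci_set_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
    if (r) {
            /* fall back to 32-bit DMA if the 40-bit mask is refused */
            rdev->need_dma32 = true;
            dma_bits = 32;
            printk(KERN_WARNING "radeon: No suitable DMA available.\n");
    }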



[PATCH] drm/ttm: Pass the buffer object on backend creation

2011-10-10 Thread j . glisse
From: Jerome Glisse 

When there are multiple page tables for GART, the driver wants to know
which buffer object is being bound/unbound. This allows the driver to
bind/unbind a buffer object to/from several different GARTs.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   |3 ++-
 drivers/gpu/drm/radeon/radeon_ttm.c|   11 +++
 drivers/gpu/drm/ttm/ttm_bo.c   |4 ++--
 drivers/gpu/drm/ttm/ttm_tt.c   |9 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 ++-
 include/drm/ttm/ttm_bo_driver.h|5 -
 6 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 890d50e..9f65371 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -344,7 +344,8 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, 
u32 val)
 }
 
 static struct ttm_backend *
-nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev)
+nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev,
+   struct ttm_buffer_object *bo)
 {
struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev);
struct drm_device *dev = dev_priv->dev;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0b5468b..0bad266 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -114,10 +114,12 @@ static void radeon_ttm_global_fini(struct radeon_device 
*rdev)
}
 }
 
-struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev);
+struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev,
+ struct ttm_buffer_object *bo);
 
 static struct ttm_backend*
-radeon_create_ttm_backend_entry(struct ttm_bo_device *bdev)
+radeon_create_ttm_backend_entry(struct ttm_bo_device *bdev,
+   struct ttm_buffer_object *bo)
 {
struct radeon_device *rdev;
 
@@ -128,7 +130,7 @@ radeon_create_ttm_backend_entry(struct ttm_bo_device *bdev)
} else
 #endif
{
-   return radeon_ttm_backend_create(rdev);
+   return radeon_ttm_backend_create(rdev, bo);
}
 }
 
@@ -778,7 +780,8 @@ static struct ttm_backend_func radeon_backend_func = {
.destroy = &radeon_ttm_backend_destroy,
 };
 
-struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev)
+struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev,
+ struct ttm_buffer_object *bo)
 {
struct radeon_ttm_backend *gtt;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ef06194..fe957e7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -337,13 +337,13 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, 
bool zero_alloc)
if (zero_alloc)
page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC;
case ttm_bo_type_kernel:
-   bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
+   bo->ttm = ttm_tt_create(bdev, bo, bo->num_pages << PAGE_SHIFT,
page_flags, glob->dummy_read_page);
if (unlikely(bo->ttm == NULL))
ret = -ENOMEM;
break;
case ttm_bo_type_user:
-   bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
+   bo->ttm = ttm_tt_create(bdev, bo, bo->num_pages << PAGE_SHIFT,
page_flags | TTM_PAGE_FLAG_USER,
glob->dummy_read_page);
if (unlikely(bo->ttm == NULL)) {
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 58c271e..202e16e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -379,8 +379,11 @@ int ttm_tt_set_user(struct ttm_tt *ttm,
return 0;
 }
 
-struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size,
-uint32_t page_flags, struct page *dummy_read_page)
+struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev,
+struct ttm_buffer_object *bo,
+unsigned long size,
+uint32_t page_flags,
+struct page *dummy_read_page)
 {
struct ttm_bo_driver *bo_driver = bdev->driver;
struct ttm_tt *ttm;
@@ -407,7 +410,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
printk(KERN_ERR TTM_PFX "Failed allocating page table\n");
return NULL;
}
-   ttm->be = bo_driver->create_ttm_backend_entry(bdev);
+   ttm->be = bo_driver->create_ttm_backend_entry(bdev, bo);
if (!ttm->be) {
ttm_tt_destroy(ttm);
printk(KERN_ERR 

[PATCH] drm/radeon/kms: consolidate GART code, fix memory fault after GPU lockup

2011-10-13 Thread j . glisse
From: Jerome Glisse 

After a GPU lockup the VRAM GART table is unpinned and thus its pointer
becomes invalid. This patch moves the unpin code to a common helper
function and sets the pointer to NULL so that the page update code can
check whether it should update the GPU page table or not. That way a bo
still bound to the GART can be unbound properly (pci_unmap_page for all
its pages) without touching the GPU page table.

Signed-off-by: Jerome Glisse 
cc: sta...@kernel.org
---
 drivers/gpu/drm/radeon/evergreen.c   |   12 +-
 drivers/gpu/drm/radeon/ni.c  |   13 +--
 drivers/gpu/drm/radeon/r100.c|6 ++-
 drivers/gpu/drm/radeon/r300.c|   16 ++--
 drivers/gpu/drm/radeon/r600.c|   17 +++--
 drivers/gpu/drm/radeon/radeon.h  |   22 +++-
 drivers/gpu/drm/radeon/radeon_gart.c |   66 -
 drivers/gpu/drm/radeon/rs400.c   |5 ++-
 drivers/gpu/drm/radeon/rs600.c   |   16 ++--
 drivers/gpu/drm/radeon/rv770.c   |   13 ++-
 10 files changed, 72 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index c4ffa14f..fe5cf3e 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -893,7 +893,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev)
u32 tmp;
int r;
 
-   if (rdev->gart.table.vram.robj == NULL) {
+   if (rdev->gart.robj == NULL) {
dev_err(rdev->dev, "No VRAM object for PCIE GART.\n");
return -EINVAL;
}
@@ -942,7 +942,6 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev)
 void evergreen_pcie_gart_disable(struct radeon_device *rdev)
 {
u32 tmp;
-   int r;
 
/* Disable all tables */
WREG32(VM_CONTEXT0_CNTL, 0);
@@ -962,14 +961,7 @@ void evergreen_pcie_gart_disable(struct radeon_device 
*rdev)
WREG32(MC_VM_MB_L1_TLB1_CNTL, tmp);
WREG32(MC_VM_MB_L1_TLB2_CNTL, tmp);
WREG32(MC_VM_MB_L1_TLB3_CNTL, tmp);
-   if (rdev->gart.table.vram.robj) {
-   r = radeon_bo_reserve(rdev->gart.table.vram.robj, false);
-   if (likely(r == 0)) {
-   radeon_bo_kunmap(rdev->gart.table.vram.robj);
-   radeon_bo_unpin(rdev->gart.table.vram.robj);
-   radeon_bo_unreserve(rdev->gart.table.vram.robj);
-   }
-   }
+   radeon_gart_table_vram_unpin(rdev);
 }
 
 void evergreen_pcie_gart_fini(struct radeon_device *rdev)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 8c79ca9..529aaee 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -931,7 +931,7 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev)
 {
int r;
 
-   if (rdev->gart.table.vram.robj == NULL) {
+   if (rdev->gart.robj == NULL) {
dev_err(rdev->dev, "No VRAM object for PCIE GART.\n");
return -EINVAL;
}
@@ -973,8 +973,6 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev)
 
 void cayman_pcie_gart_disable(struct radeon_device *rdev)
 {
-   int r;
-
/* Disable all tables */
WREG32(VM_CONTEXT0_CNTL, 0);
WREG32(VM_CONTEXT1_CNTL, 0);
@@ -990,14 +988,7 @@ void cayman_pcie_gart_disable(struct radeon_device *rdev)
WREG32(VM_L2_CNTL2, 0);
WREG32(VM_L2_CNTL3, L2_CACHE_BIGK_ASSOCIATIVITY |
   L2_CACHE_BIGK_FRAGMENT_SIZE(6));
-   if (rdev->gart.table.vram.robj) {
-   r = radeon_bo_reserve(rdev->gart.table.vram.robj, false);
-   if (likely(r == 0)) {
-   radeon_bo_kunmap(rdev->gart.table.vram.robj);
-   radeon_bo_unpin(rdev->gart.table.vram.robj);
-   radeon_bo_unreserve(rdev->gart.table.vram.robj);
-   }
-   }
+   radeon_gart_table_vram_unpin(rdev);
 }
 
 void cayman_pcie_gart_fini(struct radeon_device *rdev)
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 7fcdbbb..8ad6769 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -474,7 +474,7 @@ int r100_pci_gart_init(struct radeon_device *rdev)
 {
int r;
 
-   if (rdev->gart.table.ram.ptr) {
+   if (rdev->gart.ptr) {
WARN(1, "R100 PCI GART already initialized\n");
return 0;
}
@@ -530,10 +530,12 @@ void r100_pci_gart_disable(struct radeon_device *rdev)
 
 int r100_pci_gart_set_page(struct radeon_device *rdev, int i, uint64_t addr)
 {
+   u32 *gtt = rdev->gart.ptr;
+
if (i < 0 || i > rdev->gart.num_gpu_pages) {
return -EINVAL;
}
-   rdev->gart.table.ram.ptr[i] = cpu_to_le32(lower_32_bits(addr));
+   gtt[i] = cpu_to_le32(lower_32_bits(addr));
return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index 55a7f19..6c62d88 100644
--- a/drivers
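
The key idea is the NULL guard the commit message describes: after radeon_gart_table_vram_unpin() the CPU mapping is gone, so page-table updates must check the pointer first. A minimal sketch of the pattern (the function name here is illustrative; the real guard sits in the radeon_gart.c helpers):

    static void example_gart_set_page(struct radeon_device *rdev, int i,
                                      uint64_t addr)
    {
            u32 *gtt = rdev->gart.ptr;

            /* after a lockup the table is unpinned and rdev->gart.ptr
             * is NULL; skip the write, the table is rebuilt on resume */
            if (!gtt)
                    return;
            gtt[i] = cpu_to_le32(lower_32_bits(addr));
    }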

[PATCH] drm/radeon: avoid bouncing connector status btw disconnected & unknown

2011-10-24 Thread j . glisse
From: Jerome Glisse 

Since the force handling rework of d0d0a225e6ad43314c9aa7ea081f76adc5098ad4
we could end up bouncing the connector status between disconnected and
unknown. When the connector status changes, a call to output_poll_changed
happens, which in turn asks for detect again, but with force set.

So set the load detect flag whenever we report the connector as connected
or unknown; this avoids bouncing between disconnected and unknown.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon_connectors.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c 
b/drivers/gpu/drm/radeon/radeon_connectors.c
index dec6cbe..ff6a2e0 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -764,7 +764,7 @@ radeon_vga_detect(struct drm_connector *connector, bool 
force)
if (radeon_connector->dac_load_detect && encoder) {
encoder_funcs = encoder->helper_private;
ret = encoder_funcs->detect(encoder, connector);
-   if (ret == connector_status_connected)
+   if (ret != connector_status_disconnected)
radeon_connector->detected_by_load = true;
}
}
@@ -1005,8 +1005,9 @@ radeon_dvi_detect(struct drm_connector *connector, bool 
force)
ret = encoder_funcs->detect(encoder, 
connector);
if (ret == connector_status_connected) {
radeon_connector->use_digital = 
false;
-   
radeon_connector->detected_by_load = true;
}
+   if (ret != 
connector_status_disconnected)
+   
radeon_connector->detected_by_load = true;
}
break;
}
-- 
1.7.1



[PATCH] drm/radeon: flush read cache for gtt with fence on r6xx and newer GPU V2

2011-10-26 Thread j . glisse
From: Jerome Glisse 

Cayman seems to be particularly sensitive to the read caches returning
old data after a bind/unbind to GTT. Flush the read caches for the GTT
range with each fence on all new hardware. Should fix several rendering
glitches, like the ones tracked in the bugs below.

V2 flush whole address space

https://bugs.freedesktop.org/show_bug.cgi?id=40221
https://bugs.freedesktop.org/show_bug.cgi?id=38022
https://bugzilla.redhat.com/show_bug.cgi?id=738790

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |4 ++--
 drivers/gpu/drm/radeon/r600.c   |   12 
 drivers/gpu/drm/radeon/r600_blit_kms.c  |4 ++--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c 
b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
index dcf11bb..e9aeeed 100644
--- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c
+++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
@@ -613,9 +613,9 @@ int evergreen_blit_init(struct radeon_device *rdev)
rdev->r600_blit.primitives.set_default_state = set_default_state;
 
rdev->r600_blit.ring_size_common = 55; /* shaders + def state */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */
rdev->r600_blit.ring_size_common += 5; /* done copy */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */
 
rdev->r600_blit.ring_size_per_loop = 74;
 
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 12470b0..983808a 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2331,6 +2331,12 @@ void r600_fence_ring_emit(struct radeon_device *rdev,
if (rdev->wb.use_event) {
u64 addr = rdev->wb.gpu_addr + R600_WB_EVENT_OFFSET +
(u64)(rdev->fence_drv.scratch_reg - 
rdev->scratch.reg_base);
+   /* flush read cache over gart */
+   radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3));
+   radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA | 
PACKET3_VC_ACTION_ENA);
+   radeon_ring_write(rdev, 0xffffffff);
+   radeon_ring_write(rdev, 0);
+   radeon_ring_write(rdev, 10); /* poll interval */
/* EVENT_WRITE_EOP - flush caches, send int */
radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
radeon_ring_write(rdev, 
EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT_TS) | EVENT_INDEX(5));
@@ -2339,6 +2345,12 @@ void r600_fence_ring_emit(struct radeon_device *rdev,
radeon_ring_write(rdev, fence->seq);
radeon_ring_write(rdev, 0);
} else {
+   /* flush read cache over gart */
+   radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3));
+   radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA | 
PACKET3_VC_ACTION_ENA);
+   radeon_ring_write(rdev, 0xffffffff);
+   radeon_ring_write(rdev, 0);
+   radeon_ring_write(rdev, 10); /* poll interval */
radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE, 0));
radeon_ring_write(rdev, EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT) | 
EVENT_INDEX(0));
/* wait for 3D idle clean */
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c 
b/drivers/gpu/drm/radeon/r600_blit_kms.c
index c4cf130..36e62f2 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -500,9 +500,9 @@ int r600_blit_init(struct radeon_device *rdev)
rdev->r600_blit.primitives.set_default_state = set_default_state;
 
rdev->r600_blit.ring_size_common = 40; /* shaders + def state */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */
rdev->r600_blit.ring_size_common += 5; /* done copy */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */
 
rdev->r600_blit.ring_size_per_loop = 76;
/* set_render_target emits 2 extra dwords on rv6xx */
-- 
1.7.1
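
The flush itself is the five-dword SURFACE_SYNC packet visible in the hunk; the ring_size_common bumps account for the extra space each fence now needs. As a sketch of the emission:

    /* SURFACE_SYNC over the whole address space: invalidate the texture
     * (TC) and vertex (VC) read caches before the fence signals */
    radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3));
    radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA | PACKET3_VC_ACTION_ENA);
    radeon_ring_write(rdev, 0xffffffff); /* CP_COHER_SIZE: whole space */
    radeon_ring_write(rdev, 0);          /* CP_COHER_BASE */
    radeon_ring_write(rdev, 10);         /* poll interval */

V3 below extends the same packet with PACKET3_SH_ACTION_ENA to also flush the shader read cache.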



[PATCH] drm/radeon: flush read cache for gtt with fence on r6xx and newer GPU V3

2011-10-26 Thread j . glisse
From: Jerome Glisse 

Cayman seems to be particularly sensitive to the read caches returning
old data after a bind/unbind to GTT. Flush the read caches for the GTT
range with each fence on all new hardware. Should fix several rendering
glitches, like the ones tracked in the bugs below.

V2 flush whole address space
V3 also flush shader read cache

https://bugs.freedesktop.org/show_bug.cgi?id=40221
https://bugs.freedesktop.org/show_bug.cgi?id=38022
https://bugzilla.redhat.com/show_bug.cgi?id=738790

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |4 ++--
 drivers/gpu/drm/radeon/r600.c   |   16 
 drivers/gpu/drm/radeon/r600_blit_kms.c  |4 ++--
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c 
b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
index dcf11bb..e9aeeed 100644
--- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c
+++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
@@ -613,9 +613,9 @@ int evergreen_blit_init(struct radeon_device *rdev)
rdev->r600_blit.primitives.set_default_state = set_default_state;
 
rdev->r600_blit.ring_size_common = 55; /* shaders + def state */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */
rdev->r600_blit.ring_size_common += 5; /* done copy */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */
 
rdev->r600_blit.ring_size_per_loop = 74;
 
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 12470b0..1f007ad 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2331,6 +2331,14 @@ void r600_fence_ring_emit(struct radeon_device *rdev,
if (rdev->wb.use_event) {
u64 addr = rdev->wb.gpu_addr + R600_WB_EVENT_OFFSET +
(u64)(rdev->fence_drv.scratch_reg - 
rdev->scratch.reg_base);
+   /* flush read cache over gart */
+   radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3));
+   radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA |
+   PACKET3_VC_ACTION_ENA |
+   PACKET3_SH_ACTION_ENA);
+   radeon_ring_write(rdev, 0xffffffff);
+   radeon_ring_write(rdev, 0);
+   radeon_ring_write(rdev, 10); /* poll interval */
/* EVENT_WRITE_EOP - flush caches, send int */
radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
radeon_ring_write(rdev, 
EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT_TS) | EVENT_INDEX(5));
@@ -2339,6 +2347,14 @@ void r600_fence_ring_emit(struct radeon_device *rdev,
radeon_ring_write(rdev, fence->seq);
radeon_ring_write(rdev, 0);
} else {
+   /* flush read cache over gart */
+   radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3));
+   radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA |
+   PACKET3_VC_ACTION_ENA |
+   PACKET3_SH_ACTION_ENA);
+   radeon_ring_write(rdev, 0xffffffff);
+   radeon_ring_write(rdev, 0);
+   radeon_ring_write(rdev, 10); /* poll interval */
radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE, 0));
radeon_ring_write(rdev, EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT) | 
EVENT_INDEX(0));
/* wait for 3D idle clean */
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c 
b/drivers/gpu/drm/radeon/r600_blit_kms.c
index c4cf130..36e62f2 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -500,9 +500,9 @@ int r600_blit_init(struct radeon_device *rdev)
rdev->r600_blit.primitives.set_default_state = set_default_state;
 
rdev->r600_blit.ring_size_common = 40; /* shaders + def state */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */
rdev->r600_blit.ring_size_common += 5; /* done copy */
-   rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */
+   rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */
 
rdev->r600_blit.ring_size_per_loop = 76;
/* set_render_target emits 2 extra dwords on rv6xx */
-- 
1.7.1



[PATCH] drm/radeon: set hpd polarity at init time so hotplug detect works

2011-10-28 Thread j . glisse
From: Jerome Glisse 

Polarity needs to be set according to the connector status (connected
or disconnected). Set it up at module init so the first hotplug works
reliably no matter what the initial set of connectors is.

Signed-off-by: Jerome Glisse 
cc: sta...@kernel.org
---
 drivers/gpu/drm/radeon/radeon_connectors.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c 
b/drivers/gpu/drm/radeon/radeon_connectors.c
index dec6cbe..bfdd48b 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -1789,6 +1789,7 @@ radeon_add_atom_connector(struct drm_device *dev,
connector->polled = DRM_CONNECTOR_POLL_CONNECT;
} else
connector->polled = DRM_CONNECTOR_POLL_HPD;
+   radeon_hpd_set_polarity(rdev, radeon_connector->hpd.hpd);
 
connector->display_info.subpixel_order = subpixel_order;
drm_sysfs_connector_add(connector);
-- 
1.7.6.4



[RFC] ttm merge ttm_backend & ttm_t V2

2011-11-02 Thread j . glisse
Hi,

So attached is the last batch of patches. I split out the ttm put page
fix, and I fixed a bug in the page allocation when the clear flag
wasn't set. I tested them on a bunch of radeons and everything seems
fine (several GL apps, firefox, compositor ...). I will do more
testing on AGP and nouveau tomorrow.

The last patch adds callbacks for populating and unpopulating (better
name welcome if any) a ttm_tt, allowing the driver to choose between
different strategies. The idea is that Konrad's DMA allocator would
provide helper functions the driver can call.

I chose to allocate all pages at once because ttm_tt objects are meant
to be bound, and thus to be fully populated, during their lifetime
(vmwgfx might be different in this regard). It simplifies code in
several places. I didn't see any performance impact in the few GL
benchmarks I ran.

Konrad, I am planning on rebasing the last 4 patches of your patchset
on top of this. They will likely shrink in size a bit.

Cheers,
Jerome Glisse
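
As a sketch of what such populate/unpopulate hooks could look like on the driver side (hypothetical signatures; the final form depends on review, and ttm_pool_populate()/ttm_pool_unpopulate() stand in for whatever helpers the chosen page allocator exports):

    /* the driver decides how the ttm_tt pages are allocated and freed,
     * e.g. by deferring to a DMA-aware pool allocator */
    static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
    {
            if (ttm->state != tt_unpopulated)
                    return 0;
            return ttm_pool_populate(ttm);
    }

    static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
    {
            ttm_pool_unpopulate(ttm);
    }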



[PATCH 1/8] drm/ttm: remove userspace backed ttm object support

2011-11-02 Thread j . glisse
From: Jerome Glisse 

This was never used by any of the drivers; properly using userspace
pages for a bo would need more code (vma interaction mostly). Remove
this dead code in preparation for the ttm_tt & backend merge.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_bo.c|   22 
 drivers/gpu/drm/ttm/ttm_tt.c|  105 +--
 include/drm/ttm/ttm_bo_api.h|5 --
 include/drm/ttm/ttm_bo_driver.h |   24 -
 4 files changed, 1 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 617b646..4bde335 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, 
bool zero_alloc)
if (unlikely(bo->ttm == NULL))
ret = -ENOMEM;
break;
-   case ttm_bo_type_user:
-   bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
-   page_flags | TTM_PAGE_FLAG_USER,
-   glob->dummy_read_page);
-   if (unlikely(bo->ttm == NULL)) {
-   ret = -ENOMEM;
-   break;
-   }
-
-   ret = ttm_tt_set_user(bo->ttm, current,
- bo->buffer_start, bo->num_pages);
-   if (unlikely(ret != 0)) {
-   ttm_tt_destroy(bo->ttm);
-   bo->ttm = NULL;
-   }
-   break;
default:
printk(KERN_ERR TTM_PFX "Illegal buffer object type\n");
ret = -EINVAL;
@@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct 
ttm_mem_type_manager *man,
 }
 
 static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man,
-bool disallow_fixed,
 uint32_t mem_type,
 uint32_t proposed_placement,
 uint32_t *masked_placement)
 {
uint32_t cur_flags = ttm_bo_type_flags(mem_type);
 
-   if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed)
-   return false;
-
if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0)
return false;
 
@@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
man = &bdev->man[mem_type];
 
type_ok = ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->placement[i],
&cur_flags);
@@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (!man->has_type)
continue;
if (!ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->busy_placement[i],
&cur_flags))
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 58c271e..82a1161 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm)
ttm->dma_address = NULL;
 }
 
-static void ttm_tt_free_user_pages(struct ttm_tt *ttm)
-{
-   int write;
-   int dirty;
-   struct page *page;
-   int i;
-   struct ttm_backend *be = ttm->be;
-
-   BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER));
-   write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0);
-   dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0);
-
-   if (be)
-   be->func->clear(be);
-
-   for (i = 0; i < ttm->num_pages; ++i) {
-   page = ttm->pages[i];
-   if (page == NULL)
-   continue;
-
-   if (page == ttm->dummy_read_page) {
-   BUG_ON(write);
-   continue;
-   }
-
-   if (write && dirty && !PageReserved(page))
-   set_page_dirty_lock(page);
-
-   ttm->pages[i] = NULL;
-   ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE);
-   put_page(page);
-   }
-   ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
-}
-
 static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index)
 {
struct page *p;
@@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm)
}
 
if (likely(ttm->pages != NULL)) {
-   if (ttm->page_flags & TTM_PAGE_FLAG_USER)
-  

[PATCH 2/8] drm/ttm: remove split btw highmen and lowmem page

2011-11-02 Thread j . glisse
From: Jerome Glisse 

The split between highmem and lowmem pages was rendered useless by
the pool code. Remove it. Note that further cleanup would change the
ttm page allocation helper to actually take an array instead of
relying on a list; this could drastically reduce the number of
function calls in the common case of allocating a whole buffer.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_tt.c|   11 ++-
 include/drm/ttm/ttm_bo_driver.h |7 ---
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 82a1161..8b7a6d0 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int 
index)
struct ttm_mem_global *mem_glob = ttm->glob->mem_glob;
int ret;
 
-   while (NULL == (p = ttm->pages[index])) {
+   if (NULL == (p = ttm->pages[index])) {
 
INIT_LIST_HEAD(&h);
 
@@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
if (unlikely(ret != 0))
goto out_err;
 
-   if (PageHighMem(p))
-   ttm->pages[--ttm->first_himem_page] = p;
-   else
-   ttm->pages[++ttm->last_lomem_page] = p;
+   ttm->pages[index] = p;
}
return p;
 out_err:
@@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm)
ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state,
  ttm->dma_address);
ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
 }
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
@@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
 
ttm->glob = bdev->glob;
ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
ttm->caching_state = tt_cached;
ttm->page_flags = page_flags;
 
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 37527d6..9da182b 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -136,11 +136,6 @@ enum ttm_caching_state {
  * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL
  * pointer.
  * @pages: Array of pages backing the data.
- * @first_himem_page: Himem pages are put last in the page array, which
- * enables us to run caching attribute changes on only the first part
- * of the page array containing lomem pages. This is the index of the
- * first himem page.
- * @last_lomem_page: Index of the last lomem page in the page array.
  * @num_pages: Number of pages in the page array.
  * @bdev: Pointer to the current struct ttm_bo_device.
  * @be: Pointer to the ttm backend.
@@ -157,8 +152,6 @@ enum ttm_caching_state {
 struct ttm_tt {
struct page *dummy_read_page;
struct page **pages;
-   long first_himem_page;
-   long last_lomem_page;
uint32_t page_flags;
unsigned long num_pages;
struct ttm_bo_global *glob;
-- 
1.7.1



[PATCH 3/8] drm/ttm: remove unused backend flags field

2011-11-02 Thread j . glisse
From: Jerome Glisse 

This field is not used by any of the drivers, just drop it.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/radeon/radeon_ttm.c |1 -
 include/drm/ttm/ttm_bo_driver.h |2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0b5468b..97c76ae 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct 
radeon_device *rdev)
return NULL;
}
gtt->backend.bdev = &rdev->mman.bdev;
-   gtt->backend.flags = 0;
gtt->backend.func = &radeon_backend_func;
gtt->rdev = rdev;
gtt->pages = NULL;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 9da182b..6d17140 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -106,7 +106,6 @@ struct ttm_backend_func {
  * struct ttm_backend
  *
  * @bdev: Pointer to a struct ttm_bo_device.
- * @flags: For driver use.
  * @func: Pointer to a struct ttm_backend_func that describes
  * the backend methods.
  *
@@ -114,7 +113,6 @@ struct ttm_backend_func {
 
 struct ttm_backend {
struct ttm_bo_device *bdev;
-   uint32_t flags;
struct ttm_backend_func *func;
 };
 
-- 
1.7.1



[PATCH 4/8] drm/ttm: use ttm put pages function to properly restore cache attribute

2011-11-02 Thread j . glisse
From: Jerome Glisse 

On failure we need to make sure the pages we free have the wb cache
attribute. Do this by calling the proper ttm page helper function.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/ttm/ttm_tt.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8b7a6d0..3fb4c6d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
}
return p;
 out_err:
-   put_page(p);
+   INIT_LIST_HEAD(&h);
+   list_add(&p->lru, &h);
+   ttm_put_pages(&h, 1, ttm->page_flags,
+ ttm->caching_state, &ttm->dma_address[index]);
return NULL;
 }
 
-- 
1.7.1



[PATCH 5/8] drm/ttm: convert page allocation to use page ptr array instead of list V2

2011-11-02 Thread j . glisse
From: Jerome Glisse 

Use the ttm_tt page ptr array for page allocation, and move the
list-to-array unwinding into the page allocation functions.

V2: split the ttm put page fix into a separate patch; properly fill
the pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/ttm/ttm_memory.c |   44 +
 drivers/gpu/drm/ttm/ttm_page_alloc.c |   70 +++---
 drivers/gpu/drm/ttm/ttm_tt.c |   61 ++
 include/drm/ttm/ttm_memory.h |   11 +++--
 include/drm/ttm/ttm_page_alloc.h |   17 
 5 files changed, 101 insertions(+), 102 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index e70ddd8..3a3a58b 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -543,41 +543,53 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob, 
uint64_t memory,
 }
 EXPORT_SYMBOL(ttm_mem_global_alloc);
 
-int ttm_mem_global_alloc_page(struct ttm_mem_global *glob,
- struct page *page,
- bool no_wait, bool interruptible)
+int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob,
+  struct page **pages,
+  unsigned npages,
+  bool no_wait, bool interruptible)
 {
 
struct ttm_mem_zone *zone = NULL;
+   unsigned i;
+   int r;
 
/**
 * Page allocations may be registed in a single zone
 * only if highmem or !dma32.
 */
-
+   for (i = 0; i < npages; i++) {
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
-   zone = glob->zone_highmem;
+   if (PageHighMem(pages[i]) && glob->zone_highmem != NULL)
+   zone = glob->zone_highmem;
 #else
-   if (glob->zone_dma32 && page_to_pfn(page) > 0x00100000UL)
-   zone = glob->zone_kernel;
+   if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x00100000UL)
+   zone = glob->zone_kernel;
 #endif
-   return ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait,
-interruptible);
+   r = ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait,
+ interruptible);
+   if (r) {
+   return r;
+   }
+   }
+   return 0;
 }
 
-void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page)
+void ttm_mem_global_free_pages(struct ttm_mem_global *glob,
+  struct page **pages, unsigned npages)
 {
struct ttm_mem_zone *zone = NULL;
+   unsigned i;
 
+   for (i = 0; i < npages; i++) {
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
-   zone = glob->zone_highmem;
+   if (PageHighMem(pages[i]) && glob->zone_highmem != NULL)
+   zone = glob->zone_highmem;
 #else
-   if (glob->zone_dma32 && page_to_pfn(page) > 0x00100000UL)
-   zone = glob->zone_kernel;
+   if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x00100000UL)
+   zone = glob->zone_kernel;
 #endif
-   ttm_mem_global_free_zone(glob, zone, PAGE_SIZE);
+   ttm_mem_global_free_zone(glob, zone, PAGE_SIZE);
+   }
 }
 
 
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 727e93d..e94ff12 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool 
*pool,
  * @return count of pages still required to fulfill the request.
  */
 static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool,
-   struct list_head *pages, int ttm_flags,
-   enum ttm_caching_state cstate, unsigned count)
+   struct list_head *pages,
+   int ttm_flags,
+   enum ttm_caching_state cstate,
+   unsigned count)
 {
unsigned long irq_flags;
struct list_head *p;
@@ -664,13 +666,14 @@ out:
  * On success pages list will hold count number of correctly
  * cached pages.
  */
-int ttm_get_pages(struct list_head *pages, int flags,
- enum ttm_caching_state cstate, unsigned count,
- dma_addr_t *dma_address)
+int ttm_get_pages(struct page **pages, unsigned npages, int flags,
+ enum ttm_caching_state cstate, dma_addr_t *dma_address)
 {
struct ttm_page_pool *pool = ttm_get_pool(flags, cstate);
struct page *p = NULL;
+   struct list_head plist;
gfp_t gfp_flags = GFP_USER;
+   unsigned count = 0;
int r;
 
/*

[PATCH 6/8] drm/ttm: test for dma_address array allocation failure

2011-11-02 Thread j . glisse
From: Jerome Glisse 

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/ttm/ttm_tt.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 2dd45ca..58ea7dc 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -298,7 +298,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
ttm->dummy_read_page = dummy_read_page;
 
ttm_tt_alloc_page_directory(ttm);
-   if (!ttm->pages) {
+   if (!ttm->pages || !ttm->dma_address) {
ttm_tt_destroy(ttm);
printk(KERN_ERR TTM_PFX "Failed allocating page table\n");
return NULL;
-- 
1.7.1



[PATCH 7/8] drm/ttm: merge ttm_backend and ttm_tt

2011-11-02 Thread j . glisse
From: Jerome Glisse 

ttm_backend will only ever exist together with a ttm_tt, and a ttm_tt
is only of interesting use when bound to a backend. Thus, to avoid code
& data duplication between the two, merge them.
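
A minimal sketch of the merged layout (the struct and bind function
here are illustrative, not part of the patch; the real nouveau version
is in the nouveau_sgdma.c hunk below). A driver now embeds the ttm_tt
as the first member of its private struct and casts back to it in the
backend hooks:

struct my_gart_tt {
	struct ttm_tt ttm;	/* must be first so the cast below is valid */
	u64 offset;		/* driver-private state lives alongside it */
};

static int my_gart_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
{
	/* one object now carries both page state and backend state */
	struct my_gart_tt *gtt = (struct my_gart_tt *)ttm;

	gtt->offset = (u64)mem->start << PAGE_SHIFT;
	/* program ttm->pages / ttm->dma_address into the GART here */
	return 0;
}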

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c|   14 ++-
 drivers/gpu/drm/nouveau/nouveau_drv.h   |5 +-
 drivers/gpu/drm/nouveau/nouveau_sgdma.c |  188 --
 drivers/gpu/drm/radeon/radeon_ttm.c |  222 ---
 drivers/gpu/drm/ttm/ttm_agp_backend.c   |   88 +
 drivers/gpu/drm/ttm/ttm_bo.c|9 +-
 drivers/gpu/drm/ttm/ttm_tt.c|   59 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |   66 +++--
 include/drm/ttm/ttm_bo_driver.h |  104 ++-
 9 files changed, 295 insertions(+), 460 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7226f41..b060fa4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, 
u32 val)
*mem = val;
 }
 
-static struct ttm_backend *
-nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev)
+static struct ttm_tt *
+nouveau_ttm_tt_create(struct ttm_bo_device *bdev,
+ unsigned long size, uint32_t page_flags,
+ struct page *dummy_read_page)
 {
struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev);
struct drm_device *dev = dev_priv->dev;
@@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device 
*bdev)
switch (dev_priv->gart_info.type) {
 #if __OS_HAS_AGP
case NOUVEAU_GART_AGP:
-   return ttm_agp_backend_init(bdev, dev->agp->bridge);
+   return ttm_agp_tt_create(bdev, dev->agp->bridge,
+size, page_flags, dummy_read_page);
 #endif
case NOUVEAU_GART_PDMA:
case NOUVEAU_GART_HW:
-   return nouveau_sgdma_init_ttm(dev);
+   return nouveau_sgdma_create_ttm(bdev, size, page_flags,
+   dummy_read_page);
default:
NV_ERROR(dev, "Unknown GART type %d\n",
 dev_priv->gart_info.type);
@@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 }
 
 struct ttm_bo_driver nouveau_bo_driver = {
-   .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry,
+   .ttm_tt_create = &nouveau_ttm_tt_create,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 29837da..0c53e39 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *);
 extern void nouveau_sgdma_takedown(struct drm_device *);
 extern uint32_t nouveau_sgdma_get_physical(struct drm_device *,
   uint32_t offset);
-extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *);
+extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev,
+  unsigned long size,
+  uint32_t page_flags,
+  struct page *dummy_read_page);
 
 /* nouveau_debugfs.c */
 #if defined(CONFIG_DRM_NOUVEAU_DEBUG)
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index b75258a..bc2ab90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -8,44 +8,23 @@
 #define NV_CTXDMA_PAGE_MASK  (NV_CTXDMA_PAGE_SIZE - 1)
 
 struct nouveau_sgdma_be {
-   struct ttm_backend backend;
+   struct ttm_tt ttm;
struct drm_device *dev;
-
-   dma_addr_t *pages;
-   unsigned nr_pages;
-   bool unmap_pages;
-
u64 offset;
-   bool bound;
 };
 
 static int
-nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages,
-  struct page **pages, struct page *dummy_read_page,
-  dma_addr_t *dma_addrs)
+nouveau_sgdma_dma_map(struct ttm_tt *ttm)
 {
-   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be;
+   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
struct drm_device *dev = nvbe->dev;
int i;
 
-   NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages);
-
-   nvbe->pages = dma_addrs;
-   nvbe->nr_pages = num_pages;
-   nvbe->unmap_pages = true;
-
-   /* this code path isn't called and is incorrect anyways */
-   if (0) { /* dma_addrs[0] != DMA_ERROR_CODE) { */
-   nvbe->unmap_pages = fa

[PATCH 8/8] drm/ttm: introduce callback for ttm_tt populate & unpopulate

2011-11-02 Thread j . glisse
From: Jerome Glisse 

Move the page allocation and freeing to driver callbacks and
provide ttm helper functions for those.

The most intrusive change is that we now only fully populate an
object; this simplifies some of the code designed around the
page-fault design.
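
The resulting invariant, sketched here for illustration (it mirrors
the ttm_bo_util.c hunks below): any path about to touch ttm->pages
first asks the driver to populate the whole object.

	if (ttm->state == tt_unpopulated) {
		ret = ttm->bdev->driver->ttm_tt_populate(ttm);
		if (ret)
			return ret;
	}
	/* from here on, ttm->pages[i] is valid for the whole object */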

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   |3 +
 drivers/gpu/drm/radeon/radeon_ttm.c|2 +
 drivers/gpu/drm/ttm/ttm_bo_util.c  |   31 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c|   13 ++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c   |   42 ++
 drivers/gpu/drm/ttm/ttm_tt.c   |   97 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 +
 include/drm/ttm/ttm_bo_driver.h|   41 --
 include/drm/ttm/ttm_page_alloc.h   |   18 ++
 9 files changed, 125 insertions(+), 125 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b060fa4..7e5ca3f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -28,6 +28,7 @@
  */
 
 #include "drmP.h"
+#include "ttm/ttm_page_alloc.h"
 
 #include "nouveau_drm.h"
 #include "nouveau_drv.h"
@@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 53ff62b..490afce 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device 
*bdev,
 
 static struct ttm_bo_driver radeon_bo_driver = {
.ttm_tt_create = &radeon_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = &radeon_invalidate_caches,
.init_mem_type = &radeon_init_mem_type,
.evict_flags = &radeon_evict_flags,
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 082fcae..60f204d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void 
*src,
unsigned long page,
pgprot_t prot)
 {
-   struct page *d = ttm_tt_get_page(ttm, page);
+   struct page *d = ttm->pages[page];
void *dst;
 
if (!d)
@@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void 
*dst,
unsigned long page,
pgprot_t prot)
 {
-   struct page *s = ttm_tt_get_page(ttm, page);
+   struct page *s = ttm->pages[page];
void *src;
 
if (!s)
@@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
if (old_iomap == NULL && ttm == NULL)
goto out2;
 
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   goto out1;
+   }
+
add = 0;
dir = 1;
 
@@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 {
struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot;
struct ttm_tt *ttm = bo->ttm;
-   struct page *d;
-   int i;
+   int ret;
 
BUG_ON(!ttm);
+
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   return ret;
+   }
+
if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) {
/*
 * We're mapping a single page, and the desired
@@ -513,18 +525,9 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 */
 
map->bo_kmap_type = ttm_bo_map_kmap;
-   map->page = ttm_tt_get_page(ttm, start_page);
+   map->page = ttm->pages[start_page];
map->virtual = kmap(map->page);
} else {
-   /*
-* Populate the part we're mapping;
-*/
-   for (i = start_page; i < start_page + num_pages; ++i) {
-   d = ttm_tt_get_page(ttm, i);
-   if (!d)
-   return -ENOMEM;
-   }
-
/*
 * We need to use vmap to get the desired page protection
 * or to make the buffer object look contiguous.
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo

[PATCH] drm/radeon/kms: consolidate GART code, fix segfault after GPU lockup V2

2011-11-03 Thread j . glisse
From: Jerome Glisse 

After a GPU lockup the VRAM GART table is unpinned and thus its pointer
becomes invalid. This patch moves the unpin code to a common helper
function and sets the pointer to NULL so that the page update code can
check whether it should update the GPU page table or not. That way a bo
still bound to the GART can be unbound properly (pci_unmap_page for all
its pages) while there is no need to update the GPU page table.

V2: move the test for a NULL GART out of the loop, a small optimization
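
A sketch of the common helper (the body mirrors the duplicated code
removed from evergreen.c and ni.c below; clearing rdev->gart.ptr is
what lets the page-update code test for a valid table):

void radeon_gart_table_vram_unpin(struct radeon_device *rdev)
{
	int r;

	if (rdev->gart.robj == NULL)
		return;
	r = radeon_bo_reserve(rdev->gart.robj, false);
	if (likely(r == 0)) {
		radeon_bo_kunmap(rdev->gart.robj);
		radeon_bo_unpin(rdev->gart.robj);
		radeon_bo_unreserve(rdev->gart.robj);
		rdev->gart.ptr = NULL;
	}
}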

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen.c   |   12 +-
 drivers/gpu/drm/radeon/ni.c  |   13 +-
 drivers/gpu/drm/radeon/r100.c|6 ++-
 drivers/gpu/drm/radeon/r300.c|   16 ++--
 drivers/gpu/drm/radeon/r600.c|   17 ++--
 drivers/gpu/drm/radeon/radeon.h  |   22 ++
 drivers/gpu/drm/radeon/radeon_gart.c |   71 -
 drivers/gpu/drm/radeon/rs400.c   |5 +-
 drivers/gpu/drm/radeon/rs600.c   |   16 ++--
 drivers/gpu/drm/radeon/rv770.c   |   13 +-
 10 files changed, 75 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index ed406e8..ebd2092 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -893,7 +893,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev)
u32 tmp;
int r;
 
-   if (rdev->gart.table.vram.robj == NULL) {
+   if (rdev->gart.robj == NULL) {
dev_err(rdev->dev, "No VRAM object for PCIE GART.\n");
return -EINVAL;
}
@@ -945,7 +945,6 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev)
 void evergreen_pcie_gart_disable(struct radeon_device *rdev)
 {
u32 tmp;
-   int r;
 
/* Disable all tables */
WREG32(VM_CONTEXT0_CNTL, 0);
@@ -965,14 +964,7 @@ void evergreen_pcie_gart_disable(struct radeon_device 
*rdev)
WREG32(MC_VM_MB_L1_TLB1_CNTL, tmp);
WREG32(MC_VM_MB_L1_TLB2_CNTL, tmp);
WREG32(MC_VM_MB_L1_TLB3_CNTL, tmp);
-   if (rdev->gart.table.vram.robj) {
-   r = radeon_bo_reserve(rdev->gart.table.vram.robj, false);
-   if (likely(r == 0)) {
-   radeon_bo_kunmap(rdev->gart.table.vram.robj);
-   radeon_bo_unpin(rdev->gart.table.vram.robj);
-   radeon_bo_unreserve(rdev->gart.table.vram.robj);
-   }
-   }
+   radeon_gart_table_vram_unpin(rdev);
 }
 
 void evergreen_pcie_gart_fini(struct radeon_device *rdev)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 556b7bc..927af99 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -932,7 +932,7 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev)
 {
int r;
 
-   if (rdev->gart.table.vram.robj == NULL) {
+   if (rdev->gart.robj == NULL) {
dev_err(rdev->dev, "No VRAM object for PCIE GART.\n");
return -EINVAL;
}
@@ -977,8 +977,6 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev)
 
 void cayman_pcie_gart_disable(struct radeon_device *rdev)
 {
-   int r;
-
/* Disable all tables */
WREG32(VM_CONTEXT0_CNTL, 0);
WREG32(VM_CONTEXT1_CNTL, 0);
@@ -994,14 +992,7 @@ void cayman_pcie_gart_disable(struct radeon_device *rdev)
WREG32(VM_L2_CNTL2, 0);
WREG32(VM_L2_CNTL3, L2_CACHE_BIGK_ASSOCIATIVITY |
   L2_CACHE_BIGK_FRAGMENT_SIZE(6));
-   if (rdev->gart.table.vram.robj) {
-   r = radeon_bo_reserve(rdev->gart.table.vram.robj, false);
-   if (likely(r == 0)) {
-   radeon_bo_kunmap(rdev->gart.table.vram.robj);
-   radeon_bo_unpin(rdev->gart.table.vram.robj);
-   radeon_bo_unreserve(rdev->gart.table.vram.robj);
-   }
-   }
+   radeon_gart_table_vram_unpin(rdev);
 }
 
 void cayman_pcie_gart_fini(struct radeon_device *rdev)
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 8f8b8fa..00d2fa9 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -576,7 +576,7 @@ int r100_pci_gart_init(struct radeon_device *rdev)
 {
int r;
 
-   if (rdev->gart.table.ram.ptr) {
+   if (rdev->gart.ptr) {
WARN(1, "R100 PCI GART already initialized\n");
return 0;
}
@@ -635,10 +635,12 @@ void r100_pci_gart_disable(struct radeon_device *rdev)
 
 int r100_pci_gart_set_page(struct radeon_device *rdev, int i, uint64_t addr)
 {
+   u32 *gtt = rdev->gart.ptr;
+
if (i < 0 || i > rdev->gart.num_gpu_pages) {
return -EINVAL;
}
-   rdev->gart.table.ram.ptr[i] = cpu_to_le32(lower_32_bits(addr));
+   gtt[i] = cpu_to_le32(lower_32_bits(addr));
return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
inde

ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator

2011-11-03 Thread j . glisse
Hi,

So updated patchset, only patch 5 seen change since last set.
Last 3 patch are from your patchset, modified on top of mine.

Konrad, I added your dma pool allocator on top of that and added
support for it to radeon. All in all it's slightly smaller than your
patchset.

The biggest change is the use of a list_head in ttm_tt to keep the
dma_page list inside the ttm_tt object, allowing faster and much
simpler deallocation of pages.
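
Roughly, the shape of the idea (a sketch; these field names are
illustrative, not necessarily the exact ones in the patches):

struct dma_page {
	struct list_head page_list;	/* chained into a list in the ttm_tt */
	void *vaddr;			/* CPU address from the DMA API */
	dma_addr_t dma;			/* bus address handed to the GPU */
	struct page *p;
};

Teardown then becomes a single walk of the ttm_tt's list instead of
per-page pool lookups.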

I only briefly tested this code; it seems ok so far. Did you test
booting the kernel with swiotlb=force and with your patchset? Because
here it doesn't work. I still don't understand why swiotlb wants to
create a bounce page when the supplied page fits the constraint. I need
to dig into kernel history to see if there are any good reasons for
that.

Otherwise I believe this whole patchset makes things cleaner and
simpler for ttm.

Cheers,
Jerome Glisse



[PATCH 01/11] drm/ttm: remove userspace backed ttm object support

2011-11-03 Thread j . glisse
From: Jerome Glisse 

This was never used by any of the drivers; properly using userspace
pages for a bo would need more code (vma interaction mostly). Remove
this dead code in preparation for the ttm_tt & backend merge.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_bo.c|   22 
 drivers/gpu/drm/ttm/ttm_tt.c|  105 +--
 include/drm/ttm/ttm_bo_api.h|5 --
 include/drm/ttm/ttm_bo_driver.h |   24 -
 4 files changed, 1 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 617b646..4bde335 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, 
bool zero_alloc)
if (unlikely(bo->ttm == NULL))
ret = -ENOMEM;
break;
-   case ttm_bo_type_user:
-   bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
-   page_flags | TTM_PAGE_FLAG_USER,
-   glob->dummy_read_page);
-   if (unlikely(bo->ttm == NULL)) {
-   ret = -ENOMEM;
-   break;
-   }
-
-   ret = ttm_tt_set_user(bo->ttm, current,
- bo->buffer_start, bo->num_pages);
-   if (unlikely(ret != 0)) {
-   ttm_tt_destroy(bo->ttm);
-   bo->ttm = NULL;
-   }
-   break;
default:
printk(KERN_ERR TTM_PFX "Illegal buffer object type\n");
ret = -EINVAL;
@@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct 
ttm_mem_type_manager *man,
 }
 
 static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man,
-bool disallow_fixed,
 uint32_t mem_type,
 uint32_t proposed_placement,
 uint32_t *masked_placement)
 {
uint32_t cur_flags = ttm_bo_type_flags(mem_type);
 
-   if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed)
-   return false;
-
if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0)
return false;
 
@@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
man = &bdev->man[mem_type];
 
type_ok = ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->placement[i],
&cur_flags);
@@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (!man->has_type)
continue;
if (!ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->busy_placement[i],
&cur_flags))
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 58c271e..82a1161 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm)
ttm->dma_address = NULL;
 }
 
-static void ttm_tt_free_user_pages(struct ttm_tt *ttm)
-{
-   int write;
-   int dirty;
-   struct page *page;
-   int i;
-   struct ttm_backend *be = ttm->be;
-
-   BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER));
-   write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0);
-   dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0);
-
-   if (be)
-   be->func->clear(be);
-
-   for (i = 0; i < ttm->num_pages; ++i) {
-   page = ttm->pages[i];
-   if (page == NULL)
-   continue;
-
-   if (page == ttm->dummy_read_page) {
-   BUG_ON(write);
-   continue;
-   }
-
-   if (write && dirty && !PageReserved(page))
-   set_page_dirty_lock(page);
-
-   ttm->pages[i] = NULL;
-   ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE);
-   put_page(page);
-   }
-   ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
-}
-
 static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index)
 {
struct page *p;
@@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm)
}
 
if (likely(ttm->pages != NULL)) {
-   if (ttm->page_flags & TTM_PAGE_FLAG_USER)
-  

[PATCH 02/11] drm/ttm: remove split btw highmen and lowmem page

2011-11-03 Thread j . glisse
From: Jerome Glisse 

The split between highmem and lowmem pages was rendered useless by the
pool code. Remove it. Note: a further cleanup would change the ttm page
allocation helpers to actually take an array instead of relying on a
list; this could drastically reduce the number of function calls in the
common case of allocating a whole buffer.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_tt.c|   11 ++-
 include/drm/ttm/ttm_bo_driver.h |7 ---
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 82a1161..8b7a6d0 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int 
index)
struct ttm_mem_global *mem_glob = ttm->glob->mem_glob;
int ret;
 
-   while (NULL == (p = ttm->pages[index])) {
+   if (NULL == (p = ttm->pages[index])) {
 
INIT_LIST_HEAD(&h);
 
@@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
if (unlikely(ret != 0))
goto out_err;
 
-   if (PageHighMem(p))
-   ttm->pages[--ttm->first_himem_page] = p;
-   else
-   ttm->pages[++ttm->last_lomem_page] = p;
+   ttm->pages[index] = p;
}
return p;
 out_err:
@@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm)
ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state,
  ttm->dma_address);
ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
 }
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
@@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
 
ttm->glob = bdev->glob;
ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
ttm->caching_state = tt_cached;
ttm->page_flags = page_flags;
 
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 37527d6..9da182b 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -136,11 +136,6 @@ enum ttm_caching_state {
  * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL
  * pointer.
  * @pages: Array of pages backing the data.
- * @first_himem_page: Himem pages are put last in the page array, which
- * enables us to run caching attribute changes on only the first part
- * of the page array containing lomem pages. This is the index of the
- * first himem page.
- * @last_lomem_page: Index of the last lomem page in the page array.
  * @num_pages: Number of pages in the page array.
  * @bdev: Pointer to the current struct ttm_bo_device.
  * @be: Pointer to the ttm backend.
@@ -157,8 +152,6 @@ enum ttm_caching_state {
 struct ttm_tt {
struct page *dummy_read_page;
struct page **pages;
-   long first_himem_page;
-   long last_lomem_page;
uint32_t page_flags;
unsigned long num_pages;
struct ttm_bo_global *glob;
-- 
1.7.7.1



[PATCH 03/11] drm/ttm: remove unused backend flags field

2011-11-03 Thread j . glisse
From: Jerome Glisse 

This field is not used by any of the drivers, so just drop it.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/radeon/radeon_ttm.c |1 -
 include/drm/ttm/ttm_bo_driver.h |2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0b5468b..97c76ae 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct 
radeon_device *rdev)
return NULL;
}
gtt->backend.bdev = &rdev->mman.bdev;
-   gtt->backend.flags = 0;
gtt->backend.func = &radeon_backend_func;
gtt->rdev = rdev;
gtt->pages = NULL;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 9da182b..6d17140 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -106,7 +106,6 @@ struct ttm_backend_func {
  * struct ttm_backend
  *
  * @bdev: Pointer to a struct ttm_bo_device.
- * @flags: For driver use.
  * @func: Pointer to a struct ttm_backend_func that describes
  * the backend methods.
  *
@@ -114,7 +113,6 @@ struct ttm_backend_func {
 
 struct ttm_backend {
struct ttm_bo_device *bdev;
-   uint32_t flags;
struct ttm_backend_func *func;
 };
 
-- 
1.7.7.1



[PATCH 04/11] drm/ttm: use ttm put pages function to properly restore cache attribute

2011-11-03 Thread j . glisse
From: Jerome Glisse 

On failure we need to make sure the page we free has the wb cache
attribute. Do this by calling the proper ttm page helper function.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_tt.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8b7a6d0..3fb4c6d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
}
return p;
 out_err:
-   put_page(p);
+   INIT_LIST_HEAD(&h);
+   list_add(&p->lru, &h);
+   ttm_put_pages(&h, 1, ttm->page_flags,
+ ttm->caching_state, &ttm->dma_address[index]);
return NULL;
 }
 
-- 
1.7.7.1



[PATCH 05/11] drm/ttm: convert page allocation to use page ptr array instead of list V3

2011-11-03 Thread j . glisse
From: Jerome Glisse 

Use the ttm_tt page ptr array for page allocation, and move the
list-to-array unwinding into the page allocation functions.

V2: split the ttm_put_pages fix into a separate patch; properly fill
the pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set
V3: added back the page_count() == 1 check when freeing a page
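
The new call shape, sketched from the ttm_get_pages() prototype in the
hunk below (error handling elided): the caller hands in the ttm_tt page
array directly, with no list juggling.

	r = ttm_get_pages(ttm->pages, ttm->num_pages, ttm->page_flags,
			  ttm->caching_state, ttm->dma_address);
	if (r)
		return r;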

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_memory.c |   44 +++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c |   90 --
 drivers/gpu/drm/ttm/ttm_tt.c |   61 ---
 include/drm/ttm/ttm_memory.h |   11 ++--
 include/drm/ttm/ttm_page_alloc.h |   17 +++---
 5 files changed, 115 insertions(+), 108 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index e70ddd8..3a3a58b 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -543,41 +543,53 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob, 
uint64_t memory,
 }
 EXPORT_SYMBOL(ttm_mem_global_alloc);
 
-int ttm_mem_global_alloc_page(struct ttm_mem_global *glob,
- struct page *page,
- bool no_wait, bool interruptible)
+int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob,
+  struct page **pages,
+  unsigned npages,
+  bool no_wait, bool interruptible)
 {
 
struct ttm_mem_zone *zone = NULL;
+   unsigned i;
+   int r;
 
/**
 * Page allocations may be registed in a single zone
 * only if highmem or !dma32.
 */
-
+   for (i = 0; i < npages; i++) {
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
-   zone = glob->zone_highmem;
+   if (PageHighMem(pages[i]) && glob->zone_highmem != NULL)
+   zone = glob->zone_highmem;
 #else
-   if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL)
-   zone = glob->zone_kernel;
+   if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL)
+   zone = glob->zone_kernel;
 #endif
-   return ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait,
-interruptible);
+   r = ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait,
+ interruptible);
+   if (r) {
+   return r;
+   }
+   }
+   return 0;
 }
 
-void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page)
+void ttm_mem_global_free_pages(struct ttm_mem_global *glob,
+  struct page **pages, unsigned npages)
 {
struct ttm_mem_zone *zone = NULL;
+   unsigned i;
 
+   for (i = 0; i < npages; i++) {
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
-   zone = glob->zone_highmem;
+   if (PageHighMem(pages[i]) && glob->zone_highmem != NULL)
+   zone = glob->zone_highmem;
 #else
-   if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL)
-   zone = glob->zone_kernel;
+   if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL)
+   zone = glob->zone_kernel;
 #endif
-   ttm_mem_global_free_zone(glob, zone, PAGE_SIZE);
+   ttm_mem_global_free_zone(glob, zone, PAGE_SIZE);
+   }
 }
 
 
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 727e93d..c4f18b9 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool 
*pool,
  * @return count of pages still required to fulfill the request.
  */
 static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool,
-   struct list_head *pages, int ttm_flags,
-   enum ttm_caching_state cstate, unsigned count)
+   struct list_head *pages,
+   int ttm_flags,
+   enum ttm_caching_state cstate,
+   unsigned count)
 {
unsigned long irq_flags;
struct list_head *p;
@@ -664,13 +666,14 @@ out:
  * On success pages list will hold count number of correctly
  * cached pages.
  */
-int ttm_get_pages(struct list_head *pages, int flags,
- enum ttm_caching_state cstate, unsigned count,
- dma_addr_t *dma_address)
+int ttm_get_pages(struct page **pages, unsigned npages, int flags,
+ enum ttm_caching_state cstate, dma_addr_t *dma_address)
 {
struct ttm_page_pool *pool = ttm_get_pool(flags, cstate);
struct page *p = NULL;
+   struct list_head plist;
gfp_t g

[PATCH 06/11] drm/ttm: test for dma_address array allocation failure

2011-11-03 Thread j . glisse
From: Jerome Glisse 

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_tt.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 2dd45ca..58ea7dc 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -298,7 +298,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
ttm->dummy_read_page = dummy_read_page;
 
ttm_tt_alloc_page_directory(ttm);
-   if (!ttm->pages) {
+   if (!ttm->pages || !ttm->dma_address) {
ttm_tt_destroy(ttm);
printk(KERN_ERR TTM_PFX "Failed allocating page table\n");
return NULL;
-- 
1.7.7.1



[PATCH 07/11] drm/ttm: merge ttm_backend and ttm_tt

2011-11-03 Thread j . glisse
From: Jerome Glisse 

ttm_backend will only ever exist together with a ttm_tt, and a ttm_tt
is only of interesting use when bound to a backend. Thus, to avoid code
& data duplication between the two, merge them.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c|   14 ++-
 drivers/gpu/drm/nouveau/nouveau_drv.h   |5 +-
 drivers/gpu/drm/nouveau/nouveau_sgdma.c |  188 --
 drivers/gpu/drm/radeon/radeon_ttm.c |  222 ---
 drivers/gpu/drm/ttm/ttm_agp_backend.c   |   88 +
 drivers/gpu/drm/ttm/ttm_bo.c|9 +-
 drivers/gpu/drm/ttm/ttm_tt.c|   59 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |   66 +++--
 include/drm/ttm/ttm_bo_driver.h |  104 ++-
 9 files changed, 295 insertions(+), 460 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7226f41..b060fa4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, 
u32 val)
*mem = val;
 }
 
-static struct ttm_backend *
-nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev)
+static struct ttm_tt *
+nouveau_ttm_tt_create(struct ttm_bo_device *bdev,
+ unsigned long size, uint32_t page_flags,
+ struct page *dummy_read_page)
 {
struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev);
struct drm_device *dev = dev_priv->dev;
@@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device 
*bdev)
switch (dev_priv->gart_info.type) {
 #if __OS_HAS_AGP
case NOUVEAU_GART_AGP:
-   return ttm_agp_backend_init(bdev, dev->agp->bridge);
+   return ttm_agp_tt_create(bdev, dev->agp->bridge,
+size, page_flags, dummy_read_page);
 #endif
case NOUVEAU_GART_PDMA:
case NOUVEAU_GART_HW:
-   return nouveau_sgdma_init_ttm(dev);
+   return nouveau_sgdma_create_ttm(bdev, size, page_flags,
+   dummy_read_page);
default:
NV_ERROR(dev, "Unknown GART type %d\n",
 dev_priv->gart_info.type);
@@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 }
 
 struct ttm_bo_driver nouveau_bo_driver = {
-   .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry,
+   .ttm_tt_create = &nouveau_ttm_tt_create,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 29837da..0c53e39 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *);
 extern void nouveau_sgdma_takedown(struct drm_device *);
 extern uint32_t nouveau_sgdma_get_physical(struct drm_device *,
   uint32_t offset);
-extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *);
+extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev,
+  unsigned long size,
+  uint32_t page_flags,
+  struct page *dummy_read_page);
 
 /* nouveau_debugfs.c */
 #if defined(CONFIG_DRM_NOUVEAU_DEBUG)
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index b75258a..bc2ab90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -8,44 +8,23 @@
 #define NV_CTXDMA_PAGE_MASK  (NV_CTXDMA_PAGE_SIZE - 1)
 
 struct nouveau_sgdma_be {
-   struct ttm_backend backend;
+   struct ttm_tt ttm;
struct drm_device *dev;
-
-   dma_addr_t *pages;
-   unsigned nr_pages;
-   bool unmap_pages;
-
u64 offset;
-   bool bound;
 };
 
 static int
-nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages,
-  struct page **pages, struct page *dummy_read_page,
-  dma_addr_t *dma_addrs)
+nouveau_sgdma_dma_map(struct ttm_tt *ttm)
 {
-   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be;
+   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
struct drm_device *dev = nvbe->dev;
int i;
 
-   NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages);
-
-   nvbe->pages = dma_addrs;
-   nvbe->nr_pages = num_pages;
-   nvbe->unmap_pages = true;
-
-   /* this code path isn't called and is incorrect anyways */
-   if (0) { /* dma_addrs[0] != DMA_ERROR_CODE) { */
-   nvbe->unmap_pages = fa

[PATCH 08/11] drm/ttm: introduce callback for ttm_tt populate & unpopulate

2011-11-03 Thread j . glisse
From: Jerome Glisse 

Move the page allocation and freeing to driver callbacks and
provide ttm helper functions for those.

The most intrusive change is that we now only fully populate an
object; this simplifies some of the code designed around the
page-fault design.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   |3 +
 drivers/gpu/drm/radeon/radeon_ttm.c|2 +
 drivers/gpu/drm/ttm/ttm_bo_util.c  |   31 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c|   13 ++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c   |   42 ++
 drivers/gpu/drm/ttm/ttm_tt.c   |   97 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 +
 include/drm/ttm/ttm_bo_driver.h|   41 --
 include/drm/ttm/ttm_page_alloc.h   |   18 ++
 9 files changed, 125 insertions(+), 125 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b060fa4..7e5ca3f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -28,6 +28,7 @@
  */
 
 #include "drmP.h"
+#include "ttm/ttm_page_alloc.h"
 
 #include "nouveau_drm.h"
 #include "nouveau_drv.h"
@@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 53ff62b..490afce 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device 
*bdev,
 
 static struct ttm_bo_driver radeon_bo_driver = {
.ttm_tt_create = &radeon_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = &radeon_invalidate_caches,
.init_mem_type = &radeon_init_mem_type,
.evict_flags = &radeon_evict_flags,
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 082fcae..60f204d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void 
*src,
unsigned long page,
pgprot_t prot)
 {
-   struct page *d = ttm_tt_get_page(ttm, page);
+   struct page *d = ttm->pages[page];
void *dst;
 
if (!d)
@@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void 
*dst,
unsigned long page,
pgprot_t prot)
 {
-   struct page *s = ttm_tt_get_page(ttm, page);
+   struct page *s = ttm->pages[page];
void *src;
 
if (!s)
@@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
if (old_iomap == NULL && ttm == NULL)
goto out2;
 
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   goto out1;
+   }
+
add = 0;
dir = 1;
 
@@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 {
struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot;
struct ttm_tt *ttm = bo->ttm;
-   struct page *d;
-   int i;
+   int ret;
 
BUG_ON(!ttm);
+
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   return ret;
+   }
+
if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) {
/*
 * We're mapping a single page, and the desired
@@ -513,18 +525,9 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 */
 
map->bo_kmap_type = ttm_bo_map_kmap;
-   map->page = ttm_tt_get_page(ttm, start_page);
+   map->page = ttm->pages[start_page];
map->virtual = kmap(map->page);
} else {
-   /*
-* Populate the part we're mapping;
-*/
-   for (i = start_page; i < start_page + num_pages; ++i) {
-   d = ttm_tt_get_page(ttm, i);
-   if (!d)
-   return -ENOMEM;
-   }
-
/*
 * We need to use vmap to get the desired page protection
 * or to make the buffer object look contiguous.
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo

[PATCH 09/11] ttm: Provide DMA aware TTM page pool code.

2011-11-03 Thread j . glisse
From: Konrad Rzeszutek Wilk 

In the TTM world the pages for the graphics drivers are kept in three
different pools: write-combined, uncached, and cached (write-back). When
the pages are used by the graphics driver, the graphics adapter programs
them in via its built-in MMU (or AGP). The programming requires the
virtual address (from the graphics adapter's perspective) and the
physical address (either System RAM or the memory on the card), which is
obtained using the pci_map_* calls (which do the virtual-to-physical, or
bus address, translation). During the graphics application's "life"
those pages can be shuffled around, swapped out to disk, or moved from
VRAM to System RAM and vice versa. This all works with the existing TTM
pool code - except when we want to use the software IOTLB (SWIOTLB) code
to "map" the physical addresses to the graphics adapter MMU. We end up
programming the bounce buffer's physical address instead of the TTM pool
memory's and get a non-working driver. There are two solutions:
1) use the DMA API to allocate pages that are screened by the DMA API, or
2) use the pci_sync_* calls to copy the pages from the bounce buffer and back.

This patch fixes the issue by allocating pages using the DMA API. The
second is a viable option, but it has performance drawbacks and
potential correctness issues - think of the write-cached page being
bounced (SWIOTLB->TTM): WC is set on the TTM page and the copy from the
SWIOTLB does not make it to the TTM page until the page has been
recycled in the pool (and used by another application).

The bounce buffer does not get activated often - only in cases where we
have a 32-bit capable card and we want to use a page that is allocated
above the 4GB limit. The bounce buffer offers the solution of copying
the contents of that page to a location below 4GB and then back when the
operation has been completed (or vice versa). This is done by using the
'pci_sync_*' calls.
Note: if you look carefully enough in the existing TTM page pool code
you will notice the GFP_DMA32 flag is used, which should guarantee that
the provided page is under 4GB. That certainly is the case, except it
gets ignored in two cases:
 - If the user specifies 'swiotlb=force', which bounces _every_ page.
 - If the user is running a Xen PV Linux guest (which uses the SWIOTLB,
   and the underlying PFNs aren't necessarily under 4GB).

To avoid this extra copying, the other option is to allocate the pages
using the DMA API so that there is no need to map the page and perform
the expensive 'pci_sync_*' calls.

For this, the DMA API capable TTM pool requires the 'struct device' to
properly call the DMA API. It also has to track the virtual and bus
address of the page being handed out, in case it ends up being swapped
out or de-allocated, to make sure it is de-allocated using the proper
'struct device'.

Implementation-wise the code keeps two lists: one that is attached to
the 'struct device' (via the dev->dma_pools list) and a global one to be
used when the 'struct device' is unavailable (think shrinker code). The
global list can iterate over all of the 'struct device's and their
associated dma_pools. The list in dev->dma_pools can only iterate that
device's dma_pools.
[Diagram: two pools associated with the device (WC and UC), and the
parallel list containing the 'struct dev' and 'struct dma_pool'
entries.]

The maximum number of dma pools a device can have is six: write-combined,
uncached, and cached, plus the DMA32 variants: write-combined dma32,
uncached dma32, and cached dma32.

Currently this code only gets activated when any variant of the SWIOTLB IOMMU
code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV
with PCI devices).
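
A minimal sketch of the allocation step under these assumptions (the
'pool' fields are illustrative and error handling is elided). Because
the page comes from dma_alloc_coherent() against the right 'struct
device', its bus address is valid as-is and SWIOTLB never has to
bounce it:

	struct dma_page *d_page;

	d_page = kmalloc(sizeof(*d_page), GFP_KERNEL);
	d_page->vaddr = dma_alloc_coherent(pool->dev, PAGE_SIZE,
					   &d_page->dma, GFP_KERNEL);
	d_page->p = virt_to_page(d_page->vaddr);
	list_add(&d_page->page_list, &pool->free_pages);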

Tested-by: Michel Dänzer 
[v1: Using swiotlb_nr_tbl instead of swiotlb_enabled]
[v2: Major overhaul - added 'inuse_list' to separate used from in-use
pages and reordered the lists to get better performance.]
[v3: Added comments and some logic based on review; added Jerome's tag]
[v4: rebase on top of ttm_tt & ttm_backend merge]
Reviewed-by: Jerome Glisse 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/Makefile |4 +
 drivers/gpu/drm/ttm/ttm_memory.c |2 +
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1212 ++
 include/drm/ttm/ttm_bo_driver.h  |2 +
 include/drm/ttm/ttm_page_alloc.h 

[PATCH 10/11] swiotlb: Expose swiotlb_nr_tlb function to modules

2011-11-03 Thread j . glisse
From: Konrad Rzeszutek Wilk 

This provides a mechanism to detect whether SWIOTLB is enabled or not.
We also fix the spelling - it was swioltb instead of swiotlb.
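
Sketch of the intended use from a module such as the TTM DMA pool (a
zero return means SWIOTLB is not active, since io_tlb_nslabs stays 0
when it was never initialized or has been freed):

	if (!swiotlb_nr_tbl())
		return 0;	/* no SWIOTLB, skip the DMA pool */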

CC: FUJITA Tomonori 
[v1: Ripped out swiotlb_enabled]
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/xen/swiotlb-xen.c |2 +-
 include/linux/swiotlb.h   |2 +-
 lib/swiotlb.c |5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c984768..c50fb0b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose)
char *m = NULL;
unsigned int repeat = 3;
 
-   nr_tbl = swioltb_nr_tbl();
+   nr_tbl = swiotlb_nr_tbl();
if (nr_tbl)
xen_io_tlb_nslabs = nr_tbl;
else {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 445702c..e872526 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -24,7 +24,7 @@ extern int swiotlb_force;
 
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
verbose);
-extern unsigned long swioltb_nr_tbl(void);
+extern unsigned long swiotlb_nr_tbl(void);
 
 /*
  * Enumeration for sync targets
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 99093b3..058935e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str)
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 
-unsigned long swioltb_nr_tbl(void)
+unsigned long swiotlb_nr_tbl(void)
 {
return io_tlb_nslabs;
 }
-
+EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
  volatile void *address)
@@ -321,6 +321,7 @@ void __init swiotlb_free(void)
free_bootmem_late(__pa(io_tlb_start),
  PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
+   io_tlb_nslabs = 0;
 }
 
 static int is_swiotlb_buffer(phys_addr_t paddr)
-- 
1.7.7.1



[PATCH 11/11] drm/radeon/kms: Enable the TTM DMA pool if swiotlb is on

2011-11-03 Thread j . glisse
From: Konrad Rzeszutek Wilk 

With the exception that we do not handle the AGP case, we only deal
with PCIe cards such as the ATI ES1000 or HD3200 that have been
detected to only do DMA up to 32 bits.

CC: Dave Airlie 
CC: Alex Deucher 
Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h|1 -
 drivers/gpu/drm/radeon/radeon_device.c |5 ++
 drivers/gpu/drm/radeon/radeon_gart.c   |   29 +---
 drivers/gpu/drm/radeon/radeon_ttm.c|   83 +--
 4 files changed, 83 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index e3170c7..63257ba 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -332,7 +332,6 @@ struct radeon_gart {
union radeon_gart_table table;
struct page **pages;
dma_addr_t  *pages_addr;
-   bool*ttm_alloced;
boolready;
 };
 
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index c33bc91..11f6481 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -767,6 +767,11 @@ int radeon_device_init(struct radeon_device *rdev,
rdev->need_dma32 = true;
printk(KERN_WARNING "radeon: No suitable DMA available.\n");
}
+   r = pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
+   if (r) {
+   pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
+   printk(KERN_WARNING "radeon: No coherent DMA available.\n");
+   }
 
/* Registers mapping */
/* TODO: block userspace mapping of io register */
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
b/drivers/gpu/drm/radeon/radeon_gart.c
index fdc3a9a..18f496c 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -149,9 +149,6 @@ void radeon_gart_unbind(struct radeon_device *rdev, 
unsigned offset,
p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
for (i = 0; i < pages; i++, p++) {
if (rdev->gart.pages[p]) {
-   if (!rdev->gart.ttm_alloced[p])
-   pci_unmap_page(rdev->pdev, 
rdev->gart.pages_addr[p],
-   PAGE_SIZE, 
PCI_DMA_BIDIRECTIONAL);
rdev->gart.pages[p] = NULL;
rdev->gart.pages_addr[p] = rdev->dummy_page.addr;
page_base = rdev->gart.pages_addr[p];
@@ -181,23 +178,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned 
offset,
p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
 
for (i = 0; i < pages; i++, p++) {
-   /* we reverted the patch using dma_addr in TTM for now but this
-* code stops building on alpha so just comment it out for now 
*/
-   if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */
-   rdev->gart.ttm_alloced[p] = true;
-   rdev->gart.pages_addr[p] = dma_addr[i];
-   } else {
-   /* we need to support large memory configurations */
-   /* assume that unbind have already been call on the 
range */
-   rdev->gart.pages_addr[p] = pci_map_page(rdev->pdev, 
pagelist[i],
-   0, PAGE_SIZE,
-   PCI_DMA_BIDIRECTIONAL);
-   if (pci_dma_mapping_error(rdev->pdev, 
rdev->gart.pages_addr[p])) {
-   /* FIXME: failed to map page (return -ENOMEM?) 
*/
-   radeon_gart_unbind(rdev, offset, pages);
-   return -ENOMEM;
-   }
-   }
+   rdev->gart.pages_addr[p] = dma_addr[i];
rdev->gart.pages[p] = pagelist[i];
page_base = rdev->gart.pages_addr[p];
for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
@@ -259,12 +240,6 @@ int radeon_gart_init(struct radeon_device *rdev)
radeon_gart_fini(rdev);
return -ENOMEM;
}
-   rdev->gart.ttm_alloced = kzalloc(sizeof(bool) *
-rdev->gart.num_cpu_pages, GFP_KERNEL);
-   if (rdev->gart.ttm_alloced == NULL) {
-   radeon_gart_fini(rdev);
-   return -ENOMEM;
-   }
/* set GART entry to point to the dummy page by default */
for (i = 0; i < rdev->gart.num_cpu_pages; i++) {
rdev->gart.pages_addr[i] = rdev->dummy_page.addr;
@@ -281,10 +256,8 @@ void radeon_gart_fini(struct radeon_device *rdev)
rdev->gart.ready = false;
kfree(rdev->gart.pages);
kfree(rdev->gart.pag

ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator [FULL]

2011-11-07 Thread j . glisse
Ok, so here is the full patchset, including nouveau support. Ben, could
you review it? (If the changes to nouveau in patch 7 are correct, then
the other changes to nouveau are more than likely 100% correct :))

It has been tested on R7XX, EVERGREEN, CAICOS, and CAYMAN with SWIOTLB,
and also on NV50. I still need to test an AGP configuration, but I am
quite confident that there is no regression for PCIE/PCI.

Cheers,
Jerome



[PATCH 01/12] drm/ttm: remove userspace backed ttm object support

2011-11-07 Thread j . glisse
From: Jerome Glisse 

This was never used by any of the drivers; properly using userspace
pages for a bo would need more code (vma interaction mostly). Remove
this dead code in preparation for the ttm_tt & backend merge.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_bo.c|   22 
 drivers/gpu/drm/ttm/ttm_tt.c|  105 +--
 include/drm/ttm/ttm_bo_api.h|5 --
 include/drm/ttm/ttm_bo_driver.h |   24 -
 4 files changed, 1 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 617b646..4bde335 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, 
bool zero_alloc)
if (unlikely(bo->ttm == NULL))
ret = -ENOMEM;
break;
-   case ttm_bo_type_user:
-   bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
-   page_flags | TTM_PAGE_FLAG_USER,
-   glob->dummy_read_page);
-   if (unlikely(bo->ttm == NULL)) {
-   ret = -ENOMEM;
-   break;
-   }
-
-   ret = ttm_tt_set_user(bo->ttm, current,
- bo->buffer_start, bo->num_pages);
-   if (unlikely(ret != 0)) {
-   ttm_tt_destroy(bo->ttm);
-   bo->ttm = NULL;
-   }
-   break;
default:
printk(KERN_ERR TTM_PFX "Illegal buffer object type\n");
ret = -EINVAL;
@@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct 
ttm_mem_type_manager *man,
 }
 
 static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man,
-bool disallow_fixed,
 uint32_t mem_type,
 uint32_t proposed_placement,
 uint32_t *masked_placement)
 {
uint32_t cur_flags = ttm_bo_type_flags(mem_type);
 
-   if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed)
-   return false;
-
if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0)
return false;
 
@@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
man = &bdev->man[mem_type];
 
type_ok = ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->placement[i],
&cur_flags);
@@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (!man->has_type)
continue;
if (!ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->busy_placement[i],
&cur_flags))
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 58c271e..82a1161 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm)
ttm->dma_address = NULL;
 }
 
-static void ttm_tt_free_user_pages(struct ttm_tt *ttm)
-{
-   int write;
-   int dirty;
-   struct page *page;
-   int i;
-   struct ttm_backend *be = ttm->be;
-
-   BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER));
-   write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0);
-   dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0);
-
-   if (be)
-   be->func->clear(be);
-
-   for (i = 0; i < ttm->num_pages; ++i) {
-   page = ttm->pages[i];
-   if (page == NULL)
-   continue;
-
-   if (page == ttm->dummy_read_page) {
-   BUG_ON(write);
-   continue;
-   }
-
-   if (write && dirty && !PageReserved(page))
-   set_page_dirty_lock(page);
-
-   ttm->pages[i] = NULL;
-   ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE);
-   put_page(page);
-   }
-   ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
-}
-
 static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index)
 {
struct page *p;
@@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm)
}
 
if (likely(ttm->pages != NULL)) {
-   if (ttm->page_flags & TTM_PAGE_FLAG_USER)
-  

[PATCH 02/12] drm/ttm: remove split btw highmen and lowmem page

2011-11-07 Thread j . glisse
From: Jerome Glisse 

The split between highmem and lowmem pages was rendered useless by the
pool code. Remove it. Note: a further cleanup would change the ttm page
allocation helpers to actually take an array instead of relying on a
list; this could drastically reduce the number of function calls in the
common case of allocating a whole buffer.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_tt.c|   11 ++-
 include/drm/ttm/ttm_bo_driver.h |7 ---
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 82a1161..8b7a6d0 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int 
index)
struct ttm_mem_global *mem_glob = ttm->glob->mem_glob;
int ret;
 
-   while (NULL == (p = ttm->pages[index])) {
+   if (NULL == (p = ttm->pages[index])) {
 
INIT_LIST_HEAD(&h);
 
@@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
if (unlikely(ret != 0))
goto out_err;
 
-   if (PageHighMem(p))
-   ttm->pages[--ttm->first_himem_page] = p;
-   else
-   ttm->pages[++ttm->last_lomem_page] = p;
+   ttm->pages[index] = p;
}
return p;
 out_err:
@@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm)
ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state,
  ttm->dma_address);
ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
 }
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
@@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
 
ttm->glob = bdev->glob;
ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
ttm->caching_state = tt_cached;
ttm->page_flags = page_flags;
 
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 37527d6..9da182b 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -136,11 +136,6 @@ enum ttm_caching_state {
  * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL
  * pointer.
  * @pages: Array of pages backing the data.
- * @first_himem_page: Himem pages are put last in the page array, which
- * enables us to run caching attribute changes on only the first part
- * of the page array containing lomem pages. This is the index of the
- * first himem page.
- * @last_lomem_page: Index of the last lomem page in the page array.
  * @num_pages: Number of pages in the page array.
  * @bdev: Pointer to the current struct ttm_bo_device.
  * @be: Pointer to the ttm backend.
@@ -157,8 +152,6 @@ enum ttm_caching_state {
 struct ttm_tt {
struct page *dummy_read_page;
struct page **pages;
-   long first_himem_page;
-   long last_lomem_page;
uint32_t page_flags;
unsigned long num_pages;
struct ttm_bo_global *glob;
-- 
1.7.7.1



[PATCH 03/12] drm/ttm: remove unused backend flags field

2011-11-07 Thread j . glisse
From: Jerome Glisse 

This field is not used by any of the drivers, so just drop it.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/radeon/radeon_ttm.c |1 -
 include/drm/ttm/ttm_bo_driver.h |2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0b5468b..97c76ae 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct 
radeon_device *rdev)
return NULL;
}
gtt->backend.bdev = &rdev->mman.bdev;
-   gtt->backend.flags = 0;
gtt->backend.func = &radeon_backend_func;
gtt->rdev = rdev;
gtt->pages = NULL;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 9da182b..6d17140 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -106,7 +106,6 @@ struct ttm_backend_func {
  * struct ttm_backend
  *
  * @bdev: Pointer to a struct ttm_bo_device.
- * @flags: For driver use.
  * @func: Pointer to a struct ttm_backend_func that describes
  * the backend methods.
  *
@@ -114,7 +113,6 @@ struct ttm_backend_func {
 
 struct ttm_backend {
struct ttm_bo_device *bdev;
-   uint32_t flags;
struct ttm_backend_func *func;
 };
 
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 04/12] drm/ttm: use ttm put pages function to properly restore cache attribute

2011-11-07 Thread j . glisse
From: Jerome Glisse 

On failure we need to make sure the page we free has the wb cache
attribute. Do this by calling the proper ttm page helper function.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_tt.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8b7a6d0..3fb4c6d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
}
return p;
 out_err:
-   put_page(p);
+   INIT_LIST_HEAD(&h);
+   list_add(&p->lru, &h);
+   ttm_put_pages(&h, 1, ttm->page_flags,
+ ttm->caching_state, &ttm->dma_address[index]);
return NULL;
 }
 
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 05/12] drm/ttm: convert page allocation to use page ptr array instead of list V3

2011-11-07 Thread j . glisse
From: Jerome Glisse 

Use the ttm_tt page pointer array for page allocation, and move the
list-to-array unwinding into the page allocation functions.

V2 split the fix to use the ttm put pages helper into a separate patch;
   properly fill the pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set
V3 added back the page_count() == 1 check when freeing a page

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_memory.c |   44 +++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c |   90 --
 drivers/gpu/drm/ttm/ttm_tt.c |   61 ---
 include/drm/ttm/ttm_memory.h |   11 ++--
 include/drm/ttm/ttm_page_alloc.h |   17 +++---
 5 files changed, 115 insertions(+), 108 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index e70ddd8..3a3a58b 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -543,41 +543,53 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob, 
uint64_t memory,
 }
 EXPORT_SYMBOL(ttm_mem_global_alloc);
 
-int ttm_mem_global_alloc_page(struct ttm_mem_global *glob,
- struct page *page,
- bool no_wait, bool interruptible)
+int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob,
+  struct page **pages,
+  unsigned npages,
+  bool no_wait, bool interruptible)
 {
 
struct ttm_mem_zone *zone = NULL;
+   unsigned i;
+   int r;
 
/**
 * Page allocations may be registed in a single zone
 * only if highmem or !dma32.
 */
-
+   for (i = 0; i < npages; i++) {
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
-   zone = glob->zone_highmem;
+   if (PageHighMem(pages[i]) && glob->zone_highmem != NULL)
+   zone = glob->zone_highmem;
 #else
-   if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL)
-   zone = glob->zone_kernel;
+   if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL)
+   zone = glob->zone_kernel;
 #endif
-   return ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait,
-interruptible);
+   r = ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait,
+ interruptible);
+   if (r) {
+   return r;
+   }
+   }
+   return 0;
 }
 
-void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page)
+void ttm_mem_global_free_pages(struct ttm_mem_global *glob,
+  struct page **pages, unsigned npages)
 {
struct ttm_mem_zone *zone = NULL;
+   unsigned i;
 
+   for (i = 0; i < npages; i++) {
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
-   zone = glob->zone_highmem;
+   if (PageHighMem(pages[i]) && glob->zone_highmem != NULL)
+   zone = glob->zone_highmem;
 #else
-   if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL)
-   zone = glob->zone_kernel;
+   if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL)
+   zone = glob->zone_kernel;
 #endif
-   ttm_mem_global_free_zone(glob, zone, PAGE_SIZE);
+   ttm_mem_global_free_zone(glob, zone, PAGE_SIZE);
+   }
 }
 
 
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 727e93d..c4f18b9 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool 
*pool,
  * @return count of pages still required to fulfill the request.
  */
 static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool,
-   struct list_head *pages, int ttm_flags,
-   enum ttm_caching_state cstate, unsigned count)
+   struct list_head *pages,
+   int ttm_flags,
+   enum ttm_caching_state cstate,
+   unsigned count)
 {
unsigned long irq_flags;
struct list_head *p;
@@ -664,13 +666,14 @@ out:
  * On success pages list will hold count number of correctly
  * cached pages.
  */
-int ttm_get_pages(struct list_head *pages, int flags,
- enum ttm_caching_state cstate, unsigned count,
- dma_addr_t *dma_address)
+int ttm_get_pages(struct page **pages, unsigned npages, int flags,
+ enum ttm_caching_state cstate, dma_addr_t *dma_address)
 {
struct ttm_page_pool *pool = ttm_get_pool(flags, cstate);
struct page *p = NULL;
+   struct list_head plist;
gfp_t g

[PATCH 06/12] drm/ttm: test for dma_address array allocation failure

2011-11-07 Thread j . glisse
From: Jerome Glisse 

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/ttm_tt.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 2dd45ca..58ea7dc 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -298,7 +298,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
ttm->dummy_read_page = dummy_read_page;
 
ttm_tt_alloc_page_directory(ttm);
-   if (!ttm->pages) {
+   if (!ttm->pages || !ttm->dma_address) {
ttm_tt_destroy(ttm);
printk(KERN_ERR TTM_PFX "Failed allocating page table\n");
return NULL;
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 07/12] drm/ttm: merge ttm_backend and ttm_tt

2011-11-07 Thread j . glisse
From: Jerome Glisse 

A ttm_backend will only ever exist together with a ttm_tt, and a
ttm_tt is only of interesting use when bound to a backend. Thus, to
avoid code & data duplication between the two, merge them.
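
The shape after the merge, taken from the nouveau conversion in the
diff below: the driver structure embeds the ttm_tt directly instead of
carrying a separate ttm_backend, and the helpers cast between the two.

    /* From the nouveau conversion below: ttm_tt is embedded as the first
     * member so a ttm_tt pointer can be cast back to the driver type. */
    struct nouveau_sgdma_be {
            struct ttm_tt ttm;
            struct drm_device *dev;
            u64 offset;
    };

    /* inside the backend helpers: */
    struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;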

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c|   14 ++-
 drivers/gpu/drm/nouveau/nouveau_drv.h   |5 +-
 drivers/gpu/drm/nouveau/nouveau_sgdma.c |  188 --
 drivers/gpu/drm/radeon/radeon_ttm.c |  222 ---
 drivers/gpu/drm/ttm/ttm_agp_backend.c   |   88 +
 drivers/gpu/drm/ttm/ttm_bo.c|9 +-
 drivers/gpu/drm/ttm/ttm_tt.c|   59 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |   66 +++--
 include/drm/ttm/ttm_bo_driver.h |  104 ++-
 9 files changed, 295 insertions(+), 460 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7226f41..b060fa4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, 
u32 val)
*mem = val;
 }
 
-static struct ttm_backend *
-nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev)
+static struct ttm_tt *
+nouveau_ttm_tt_create(struct ttm_bo_device *bdev,
+ unsigned long size, uint32_t page_flags,
+ struct page *dummy_read_page)
 {
struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev);
struct drm_device *dev = dev_priv->dev;
@@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device 
*bdev)
switch (dev_priv->gart_info.type) {
 #if __OS_HAS_AGP
case NOUVEAU_GART_AGP:
-   return ttm_agp_backend_init(bdev, dev->agp->bridge);
+   return ttm_agp_tt_create(bdev, dev->agp->bridge,
+size, page_flags, dummy_read_page);
 #endif
case NOUVEAU_GART_PDMA:
case NOUVEAU_GART_HW:
-   return nouveau_sgdma_init_ttm(dev);
+   return nouveau_sgdma_create_ttm(bdev, size, page_flags,
+   dummy_read_page);
default:
NV_ERROR(dev, "Unknown GART type %d\n",
 dev_priv->gart_info.type);
@@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 }
 
 struct ttm_bo_driver nouveau_bo_driver = {
-   .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry,
+   .ttm_tt_create = &nouveau_ttm_tt_create,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 29837da..0c53e39 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *);
 extern void nouveau_sgdma_takedown(struct drm_device *);
 extern uint32_t nouveau_sgdma_get_physical(struct drm_device *,
   uint32_t offset);
-extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *);
+extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev,
+  unsigned long size,
+  uint32_t page_flags,
+  struct page *dummy_read_page);
 
 /* nouveau_debugfs.c */
 #if defined(CONFIG_DRM_NOUVEAU_DEBUG)
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index b75258a..bc2ab90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -8,44 +8,23 @@
 #define NV_CTXDMA_PAGE_MASK  (NV_CTXDMA_PAGE_SIZE - 1)
 
 struct nouveau_sgdma_be {
-   struct ttm_backend backend;
+   struct ttm_tt ttm;
struct drm_device *dev;
-
-   dma_addr_t *pages;
-   unsigned nr_pages;
-   bool unmap_pages;
-
u64 offset;
-   bool bound;
 };
 
 static int
-nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages,
-  struct page **pages, struct page *dummy_read_page,
-  dma_addr_t *dma_addrs)
+nouveau_sgdma_dma_map(struct ttm_tt *ttm)
 {
-   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be;
+   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
struct drm_device *dev = nvbe->dev;
int i;
 
-   NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages);
-
-   nvbe->pages = dma_addrs;
-   nvbe->nr_pages = num_pages;
-   nvbe->unmap_pages = true;
-
-   /* this code path isn't called and is incorrect anyways */
-   if (0) { /* dma_addrs[0] != DMA_ERROR_CODE) { */
- 

[PATCH 08/12] drm/ttm: introduce callback for ttm_tt populate & unpopulate

2011-11-07 Thread j . glisse
From: Jerome Glisse 

Move the page allocation and freeing into driver callbacks and provide
ttm helper functions for those.

The most intrusive change is the fact that we now only fully populate
an object; this simplifies some of the code designed around the
page-fault design.
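
The caller-side pattern this introduces throughout the TTM core:

    /* Before touching the page array, TTM core now asks the driver to
     * fully populate the object (see the ttm_bo_util.c hunks below). */
    if (ttm->state == tt_unpopulated) {
            ret = ttm->bdev->driver->ttm_tt_populate(ttm);
            if (ret)
                    return ret;
    }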

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   |3 +
 drivers/gpu/drm/radeon/radeon_ttm.c|2 +
 drivers/gpu/drm/ttm/ttm_bo_util.c  |   31 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c|   13 ++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c   |   42 ++
 drivers/gpu/drm/ttm/ttm_tt.c   |   97 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 +
 include/drm/ttm/ttm_bo_driver.h|   41 --
 include/drm/ttm/ttm_page_alloc.h   |   18 ++
 9 files changed, 125 insertions(+), 125 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b060fa4..7e5ca3f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -28,6 +28,7 @@
  */
 
 #include "drmP.h"
+#include "ttm/ttm_page_alloc.h"
 
 #include "nouveau_drm.h"
 #include "nouveau_drv.h"
@@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 53ff62b..490afce 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device 
*bdev,
 
 static struct ttm_bo_driver radeon_bo_driver = {
.ttm_tt_create = &radeon_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = &radeon_invalidate_caches,
.init_mem_type = &radeon_init_mem_type,
.evict_flags = &radeon_evict_flags,
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 082fcae..60f204d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void 
*src,
unsigned long page,
pgprot_t prot)
 {
-   struct page *d = ttm_tt_get_page(ttm, page);
+   struct page *d = ttm->pages[page];
void *dst;
 
if (!d)
@@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void 
*dst,
unsigned long page,
pgprot_t prot)
 {
-   struct page *s = ttm_tt_get_page(ttm, page);
+   struct page *s = ttm->pages[page];
void *src;
 
if (!s)
@@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
if (old_iomap == NULL && ttm == NULL)
goto out2;
 
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   goto out1;
+   }
+
add = 0;
dir = 1;
 
@@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 {
struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot;
struct ttm_tt *ttm = bo->ttm;
-   struct page *d;
-   int i;
+   int ret;
 
BUG_ON(!ttm);
+
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   return ret;
+   }
+
if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) {
/*
 * We're mapping a single page, and the desired
@@ -513,18 +525,9 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 */
 
map->bo_kmap_type = ttm_bo_map_kmap;
-   map->page = ttm_tt_get_page(ttm, start_page);
+   map->page = ttm->pages[start_page];
map->virtual = kmap(map->page);
} else {
-   /*
-* Populate the part we're mapping;
-*/
-   for (i = start_page; i < start_page + num_pages; ++i) {
-   d = ttm_tt_get_page(ttm, i);
-   if (!d)
-   return -ENOMEM;
-   }
-
/*
 * We need to use vmap to get the desired page protection
 * or to make the buffer object look contiguous.
diff --git a/drivers/gpu/drm/ttm/ttm_

[PATCH 09/12] ttm: Provide DMA aware TTM page pool code.

2011-11-07 Thread j . glisse
From: Konrad Rzeszutek Wilk 

In the TTM world the pages for the graphics drivers are kept in three
different pools: write combined, uncached, and cached (write-back). When
the pages are used by the graphics driver, the graphics adapter, via its
built-in MMU (or AGP), programs these pages in. The programming requires
the virtual address (from the graphics adapter's perspective) and the
physical address (either system RAM or the memory on the card), which is
obtained using the pci_map_* calls (which do the virtual-to-physical, or
bus address, translation). During the graphics application's life those
pages can be shuffled around, swapped out to disk, moved from the VRAM
to system RAM, or vice-versa. This all works with the existing TTM pool
code - except when we want to use the software IOTLB (SWIOTLB) code to
"map" the physical addresses to the graphics adapter MMU. We end up
programming the bounce buffer's physical address instead of the TTM pool
memory's and get a non-working driver.
There are two solutions:
1) using the DMA API to allocate pages that are screened by the DMA API, or
2) using the pci_sync_* calls to copy the pages from the bounce-buffer and back.

This patch fixes the issue by allocating pages using the DMA API. The
second is a viable option - but it has performance drawbacks and
potential correctness issues - think of a write-cached page being
bounced (SWIOTLB->TTM): WC is set on the TTM page and the copy from the
SWIOTLB does not make it to the TTM page until the page has been
recycled in the pool (and used by another application).
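
A minimal sketch of what option 1 amounts to (illustrative only, not
the actual pool implementation): pages obtained through the DMA API
come with a device-usable bus address, so no bounce buffer is needed.

    /* Illustrative only: the DMA API hands back both a kernel virtual
     * address and a bus address the device can use directly. */
    dma_addr_t dma_handle;
    void *vaddr = dma_alloc_coherent(dev, PAGE_SIZE, &dma_handle, GFP_KERNEL);

    if (vaddr)
            dma_free_coherent(dev, PAGE_SIZE, vaddr, dma_handle);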

The bounce buffer does not get activated often - only in cases where we
have a 32-bit capable card and we want to use a page that is allocated
above the 4GB limit. The bounce buffer offers the solution of copying
the contents of that 4GB page to a location below 4GB and then back when
the operation has been completed (or vice-versa). This is done using the
'pci_sync_*' calls.
Note: if you look carefully enough in the existing TTM page pool code
you will notice the GFP_DMA32 flag is used - which should guarantee that
the provided page is under 4GB. That is certainly the case, except it
gets ignored in two situations:
 - if the user specifies 'swiotlb=force', which bounces _every_ page;
 - if the user is running a Xen PV Linux guest (which uses the SWIOTLB,
   and the underlying PFNs aren't necessarily under 4GB).

To avoid this extra copying, the other option is to allocate the pages
using the DMA API, so that there is no need to map the page and perform
the expensive 'pci_sync_*' calls.

This DMA-API-capable TTM pool requires a 'struct device' to properly
call the DMA API. It also has to track the virtual and bus address of
the page being handed out, in case it ends up being swapped out or
de-allocated - to make sure it is de-allocated using the proper
'struct device'.

Implementation-wise the code keeps two lists: one that is attached to
the 'struct device' (via the dev->dma_pools list) and a global one to
be used when the 'struct device' is unavailable (think shrinker code).
The global list can iterate over all of the 'struct device's and their
associated dma_pools; the list in dev->dma_pools can only iterate the
device's own dma_pools.
[ASCII diagram elided: two pools associated with the device (WC and
uncached), and the parallel global list containing the 'struct dev'
and 'struct dma_pool' entries.]
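
A rough sketch of that arrangement (struct and field names here are
illustrative, not the exact definitions in ttm_page_alloc_dma.c):

    /* Illustrative sketch: one dma pool per caching type per device,
     * linked both on dev->dma_pools and, via a parallel entry, on a
     * global list the shrinker can walk without a 'struct device'. */
    struct ttm_dma_pool {
            struct device *dev;
            struct list_head free_list;     /* pages owned by this pool */
    };

    struct device_pools {
            struct list_head pools;         /* entry on the global list */
            struct device *dev;
            struct ttm_dma_pool *pool;
    };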

The maximum number of dma pools a device can have is six: write-combined,
uncached, and cached, plus their DMA32 variants: write-combined dma32,
uncached dma32, and cached dma32.

Currently this code only gets activated when any variant of the SWIOTLB IOMMU
code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV
with PCI devices).

Tested-by: Michel Dänzer 
[v1: Using swiotlb_nr_tbl instead of swiotlb_enabled]
[v2: Major overhaul - added 'inuse_list' to separate used from in-use
and reordered the lists to get better performance.]
[v3: Added comments/and some logic based on review, Added Jerome tag]
[v4: rebase on top of ttm_tt & ttm_backend merge]
Reviewed-by: Jerome Glisse 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/Makefile |4 +
 drivers/gpu/drm/ttm/ttm_memory.c |2 +
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1212 ++
 include/drm/ttm/ttm_bo_driver.h  |2 +
 include/drm/ttm/ttm_page_alloc.h 

[PATCH 10/12] swiotlb: Expose swiotlb_nr_tbl function to modules

2011-11-07 Thread j . glisse
From: Konrad Rzeszutek Wilk 

Expose swiotlb_nr_tbl() so that modules can detect whether SWIOTLB is
enabled or not. We also fix the spelling - it was swioltb instead of
swiotlb.
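
Later patches in this series use it exactly this way; e.g. the nouveau
populate path in patch 12/12:

    /* From patch 12/12 below: a non-zero slab count means the SWIOTLB is
     * active, so the DMA-aware TTM pool should be used. */
    if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl())
            return ttm_dma_populate(ttm, dev->dev);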

CC: FUJITA Tomonori 
[v1: Ripped out swiotlb_enabled]
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/xen/swiotlb-xen.c |2 +-
 include/linux/swiotlb.h   |2 +-
 lib/swiotlb.c |5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c984768..c50fb0b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose)
char *m = NULL;
unsigned int repeat = 3;
 
-   nr_tbl = swioltb_nr_tbl();
+   nr_tbl = swiotlb_nr_tbl();
if (nr_tbl)
xen_io_tlb_nslabs = nr_tbl;
else {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 445702c..e872526 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -24,7 +24,7 @@ extern int swiotlb_force;
 
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
verbose);
-extern unsigned long swioltb_nr_tbl(void);
+extern unsigned long swiotlb_nr_tbl(void);
 
 /*
  * Enumeration for sync targets
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 99093b3..058935e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str)
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 
-unsigned long swioltb_nr_tbl(void)
+unsigned long swiotlb_nr_tbl(void)
 {
return io_tlb_nslabs;
 }
-
+EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
  volatile void *address)
@@ -321,6 +321,7 @@ void __init swiotlb_free(void)
free_bootmem_late(__pa(io_tlb_start),
  PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
+   io_tlb_nslabs = 0;
 }
 
 static int is_swiotlb_buffer(phys_addr_t paddr)
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 11/12] drm/radeon/kms: Enable the TTM DMA pool if swiotlb is on

2011-11-07 Thread j . glisse
From: Konrad Rzeszutek Wilk 

This is with the exception that we do not handle the AGP case. We only
deal with PCIe cards, such as the ATI ES1000 or HD3200, that have been
detected to only do DMA up to 32 bits.

CC: Dave Airlie 
CC: Alex Deucher 
Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h|1 -
 drivers/gpu/drm/radeon/radeon_device.c |5 ++
 drivers/gpu/drm/radeon/radeon_gart.c   |   29 +---
 drivers/gpu/drm/radeon/radeon_ttm.c|   83 +--
 4 files changed, 83 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index e3170c7..63257ba 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -332,7 +332,6 @@ struct radeon_gart {
union radeon_gart_table table;
struct page **pages;
dma_addr_t  *pages_addr;
-   bool*ttm_alloced;
boolready;
 };
 
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index c33bc91..11f6481 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -767,6 +767,11 @@ int radeon_device_init(struct radeon_device *rdev,
rdev->need_dma32 = true;
printk(KERN_WARNING "radeon: No suitable DMA available.\n");
}
+   r = pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
+   if (r) {
+   pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
+   printk(KERN_WARNING "radeon: No coherent DMA available.\n");
+   }
 
/* Registers mapping */
/* TODO: block userspace mapping of io register */
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
b/drivers/gpu/drm/radeon/radeon_gart.c
index fdc3a9a..18f496c 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -149,9 +149,6 @@ void radeon_gart_unbind(struct radeon_device *rdev, 
unsigned offset,
p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
for (i = 0; i < pages; i++, p++) {
if (rdev->gart.pages[p]) {
-   if (!rdev->gart.ttm_alloced[p])
-   pci_unmap_page(rdev->pdev, 
rdev->gart.pages_addr[p],
-   PAGE_SIZE, 
PCI_DMA_BIDIRECTIONAL);
rdev->gart.pages[p] = NULL;
rdev->gart.pages_addr[p] = rdev->dummy_page.addr;
page_base = rdev->gart.pages_addr[p];
@@ -181,23 +178,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned 
offset,
p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
 
for (i = 0; i < pages; i++, p++) {
-   /* we reverted the patch using dma_addr in TTM for now but this
-* code stops building on alpha so just comment it out for now 
*/
-   if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */
-   rdev->gart.ttm_alloced[p] = true;
-   rdev->gart.pages_addr[p] = dma_addr[i];
-   } else {
-   /* we need to support large memory configurations */
-   /* assume that unbind have already been call on the 
range */
-   rdev->gart.pages_addr[p] = pci_map_page(rdev->pdev, 
pagelist[i],
-   0, PAGE_SIZE,
-   PCI_DMA_BIDIRECTIONAL);
-   if (pci_dma_mapping_error(rdev->pdev, 
rdev->gart.pages_addr[p])) {
-   /* FIXME: failed to map page (return -ENOMEM?) 
*/
-   radeon_gart_unbind(rdev, offset, pages);
-   return -ENOMEM;
-   }
-   }
+   rdev->gart.pages_addr[p] = dma_addr[i];
rdev->gart.pages[p] = pagelist[i];
page_base = rdev->gart.pages_addr[p];
for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
@@ -259,12 +240,6 @@ int radeon_gart_init(struct radeon_device *rdev)
radeon_gart_fini(rdev);
return -ENOMEM;
}
-   rdev->gart.ttm_alloced = kzalloc(sizeof(bool) *
-rdev->gart.num_cpu_pages, GFP_KERNEL);
-   if (rdev->gart.ttm_alloced == NULL) {
-   radeon_gart_fini(rdev);
-   return -ENOMEM;
-   }
/* set GART entry to point to the dummy page by default */
for (i = 0; i < rdev->gart.num_cpu_pages; i++) {
rdev->gart.pages_addr[i] = rdev->dummy_page.addr;
@@ -281,10 +256,8 @@ void radeon_gart_fini(struct radeon_device *rdev)
rdev->gart.ready = false;
kfree(rdev->gart.pages);
kfree(rdev->gart.pag

[PATCH 12/12] nouveau/ttm/dma: Enable the TTM DMA pool if device can only do 32-bit DMA.

2011-11-07 Thread j . glisse
From: Konrad Rzeszutek Wilk 

If the card is capable of more than 32-bit DMA, then we use the default
TTM page pool code, which allocates from anywhere in memory.

Note: If the 'ttm.no_dma' parameter is set, the override is ignored
and the default TTM pool is used.

CC: Ben Skeggs 
CC: Francisco Jerez 
CC: Dave Airlie 
Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |   73 -
 drivers/gpu/drm/nouveau/nouveau_debugfs.c |1 +
 drivers/gpu/drm/nouveau/nouveau_sgdma.c   |   60 +---
 3 files changed, 73 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7e5ca3f..36234a7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1049,10 +1049,79 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
nouveau_fence_unref(&old_fence);
 }
 
+static int
+nouveau_ttm_tt_populate(struct ttm_tt *ttm)
+{
+   struct drm_nouveau_private *dev_priv;
+   struct drm_device *dev;
+   unsigned i;
+   int r;
+
+   if (ttm->state != tt_unpopulated)
+   return 0;
+
+   dev_priv = nouveau_bdev(ttm->bdev);
+   dev = dev_priv->dev;
+
+#ifdef CONFIG_SWIOTLB
+   if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) {
+   return ttm_dma_populate(ttm, dev->dev);
+   }
+#endif
+
+   r = ttm_page_alloc_ttm_tt_populate(ttm);
+   if (r) {
+   return r;
+   }
+
+   for (i = 0; i < ttm->num_pages; i++) {
+   ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i],
+  0, PAGE_SIZE,
+  PCI_DMA_BIDIRECTIONAL);
+   if (pci_dma_mapping_error(dev->pdev, ttm->dma_address[i])) {
+   while (--i) {
+   pci_unmap_page(dev->pdev, ttm->dma_address[i],
+  PAGE_SIZE, 
PCI_DMA_BIDIRECTIONAL);
+   ttm->dma_address[i] = 0;
+   }
+   ttm_page_alloc_ttm_tt_unpopulate(ttm);
+   return -EFAULT;
+   }
+   }
+   return 0;
+}
+
+static void
+nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
+{
+   struct drm_nouveau_private *dev_priv;
+   struct drm_device *dev;
+   unsigned i;
+
+   dev_priv = nouveau_bdev(ttm->bdev);
+   dev = dev_priv->dev;
+
+#ifdef CONFIG_SWIOTLB
+   if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) {
+   ttm_dma_unpopulate(ttm, dev->dev);
+   return;
+   }
+#endif
+
+   for (i = 0; i < ttm->num_pages; i++) {
+   if (ttm->dma_address[i]) {
+   pci_unmap_page(dev->pdev, ttm->dma_address[i],
+  PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+   }
+   }
+
+   ttm_page_alloc_ttm_tt_unpopulate(ttm);
+}
+
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
-   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
-   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
+   .ttm_tt_populate = &nouveau_ttm_tt_populate,
+   .ttm_tt_unpopulate = &nouveau_ttm_tt_unpopulate,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c 
b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
index 8e15923..f52c2db 100644
--- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
+++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
@@ -178,6 +178,7 @@ static struct drm_info_list nouveau_debugfs_list[] = {
{ "memory", nouveau_debugfs_memory_info, 0, NULL },
{ "vbios.rom", nouveau_debugfs_vbios_image, 0, NULL },
{ "ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL },
+   { "ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL },
 };
 #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list)
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index bc2ab90..ee1eb7c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -13,41 +13,6 @@ struct nouveau_sgdma_be {
u64 offset;
 };
 
-static int
-nouveau_sgdma_dma_map(struct ttm_tt *ttm)
-{
-   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
-   struct drm_device *dev = nvbe->dev;
-   int i;
-
-   for (i = 0; i < ttm->num_pages; i++) {
-   ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i],
-  0, PAGE_SIZE,
-  PCI_DMA_BIDIRECTIONAL);
- 

ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator

2011-11-09 Thread j . glisse
So I did an overhaul of ttm_memory; I believe the simplification I did
makes sense. See patch 5 for a longer explanation.

Thomas, with the ttm_memory change the allocation of pages won't happen
if the accounting reports that we are going over the limit and the bo
shrinker failed to free any memory to make room.

The handling of the dma32 zone is done as a post pass of ttm memory
accounting.

Regarding the pagefault comment I removed: it doesn't make sense anymore
because now we populate the whole page table in one shot. So there is no
more prefaulting of a few pages, but a full prefaulting. Though I can
add a comment stating that if you like.

For the ttm_tt_dma struct to hold page-allocator-specific information,
I think it can be done as a follow-up patch, but if you prefer to have
that in this patchset, let me know and I will respin with such changes.

I am in the process of retesting this whole series, and especially the
whole memory accounting.

Cheers,
Jerome
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 01/13] drm/ttm: remove userspace backed ttm object support

2011-11-09 Thread j . glisse
From: Jerome Glisse 

This was never used by any of the drivers; properly using userspace
pages for a bo would need more code (vma interaction, mostly). Remove
this dead code in preparation for the ttm_tt & backend merge.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_bo.c|   22 
 drivers/gpu/drm/ttm/ttm_tt.c|  105 +--
 include/drm/ttm/ttm_bo_api.h|5 --
 include/drm/ttm/ttm_bo_driver.h |   24 -
 4 files changed, 1 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 617b646..4bde335 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, 
bool zero_alloc)
if (unlikely(bo->ttm == NULL))
ret = -ENOMEM;
break;
-   case ttm_bo_type_user:
-   bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
-   page_flags | TTM_PAGE_FLAG_USER,
-   glob->dummy_read_page);
-   if (unlikely(bo->ttm == NULL)) {
-   ret = -ENOMEM;
-   break;
-   }
-
-   ret = ttm_tt_set_user(bo->ttm, current,
- bo->buffer_start, bo->num_pages);
-   if (unlikely(ret != 0)) {
-   ttm_tt_destroy(bo->ttm);
-   bo->ttm = NULL;
-   }
-   break;
default:
printk(KERN_ERR TTM_PFX "Illegal buffer object type\n");
ret = -EINVAL;
@@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct 
ttm_mem_type_manager *man,
 }
 
 static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man,
-bool disallow_fixed,
 uint32_t mem_type,
 uint32_t proposed_placement,
 uint32_t *masked_placement)
 {
uint32_t cur_flags = ttm_bo_type_flags(mem_type);
 
-   if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed)
-   return false;
-
if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0)
return false;
 
@@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
man = &bdev->man[mem_type];
 
type_ok = ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->placement[i],
&cur_flags);
@@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (!man->has_type)
continue;
if (!ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->busy_placement[i],
&cur_flags))
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 58c271e..82a1161 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm)
ttm->dma_address = NULL;
 }
 
-static void ttm_tt_free_user_pages(struct ttm_tt *ttm)
-{
-   int write;
-   int dirty;
-   struct page *page;
-   int i;
-   struct ttm_backend *be = ttm->be;
-
-   BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER));
-   write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0);
-   dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0);
-
-   if (be)
-   be->func->clear(be);
-
-   for (i = 0; i < ttm->num_pages; ++i) {
-   page = ttm->pages[i];
-   if (page == NULL)
-   continue;
-
-   if (page == ttm->dummy_read_page) {
-   BUG_ON(write);
-   continue;
-   }
-
-   if (write && dirty && !PageReserved(page))
-   set_page_dirty_lock(page);
-
-   ttm->pages[i] = NULL;
-   ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE);
-   put_page(page);
-   }
-   ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
-}
-
 static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index)
 {
struct page *p;
@@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm)
}
 
if (likely(ttm->pages != NULL)) {
-   if (ttm->page_flags & TTM_PAGE

[PATCH 02/13] drm/ttm: remove split between highmem and lowmem pages

2011-11-09 Thread j . glisse
From: Jerome Glisse 

The split between highmem and lowmem pages was rendered useless by the
pool code, so remove it. Note: further cleanup could change the ttm
page allocation helpers to actually take an array instead of relying
on a list; this could drastically reduce the number of function calls
in the common case of allocating a whole buffer.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_tt.c|   11 ++-
 include/drm/ttm/ttm_bo_driver.h |7 ---
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 82a1161..8b7a6d0 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int 
index)
struct ttm_mem_global *mem_glob = ttm->glob->mem_glob;
int ret;
 
-   while (NULL == (p = ttm->pages[index])) {
+   if (NULL == (p = ttm->pages[index])) {
 
INIT_LIST_HEAD(&h);
 
@@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
if (unlikely(ret != 0))
goto out_err;
 
-   if (PageHighMem(p))
-   ttm->pages[--ttm->first_himem_page] = p;
-   else
-   ttm->pages[++ttm->last_lomem_page] = p;
+   ttm->pages[index] = p;
}
return p;
 out_err:
@@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm)
ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state,
  ttm->dma_address);
ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
 }
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
@@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
 
ttm->glob = bdev->glob;
ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
ttm->caching_state = tt_cached;
ttm->page_flags = page_flags;
 
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 37527d6..9da182b 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -136,11 +136,6 @@ enum ttm_caching_state {
  * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL
  * pointer.
  * @pages: Array of pages backing the data.
- * @first_himem_page: Himem pages are put last in the page array, which
- * enables us to run caching attribute changes on only the first part
- * of the page array containing lomem pages. This is the index of the
- * first himem page.
- * @last_lomem_page: Index of the last lomem page in the page array.
  * @num_pages: Number of pages in the page array.
  * @bdev: Pointer to the current struct ttm_bo_device.
  * @be: Pointer to the ttm backend.
@@ -157,8 +152,6 @@ enum ttm_caching_state {
 struct ttm_tt {
struct page *dummy_read_page;
struct page **pages;
-   long first_himem_page;
-   long last_lomem_page;
uint32_t page_flags;
unsigned long num_pages;
struct ttm_bo_global *glob;
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 03/13] drm/ttm: remove unused backend flags field

2011-11-09 Thread j . glisse
From: Jerome Glisse 

This field is not use by any of the driver just drop it.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/radeon/radeon_ttm.c |1 -
 include/drm/ttm/ttm_bo_driver.h |2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0b5468b..97c76ae 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct 
radeon_device *rdev)
return NULL;
}
gtt->backend.bdev = &rdev->mman.bdev;
-   gtt->backend.flags = 0;
gtt->backend.func = &radeon_backend_func;
gtt->rdev = rdev;
gtt->pages = NULL;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 9da182b..6d17140 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -106,7 +106,6 @@ struct ttm_backend_func {
  * struct ttm_backend
  *
  * @bdev: Pointer to a struct ttm_bo_device.
- * @flags: For driver use.
  * @func: Pointer to a struct ttm_backend_func that describes
  * the backend methods.
  *
@@ -114,7 +113,6 @@ struct ttm_backend_func {
 
 struct ttm_backend {
struct ttm_bo_device *bdev;
-   uint32_t flags;
struct ttm_backend_func *func;
 };
 
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 04/13] drm/ttm: use ttm put pages function to properly restore cache attribute

2011-11-09 Thread j . glisse
From: Jerome Glisse 

On failure we need to make sure the page we free has the wb cache
attribute. Do this by calling the proper ttm page helper function.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_tt.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8b7a6d0..3fb4c6d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
}
return p;
 out_err:
-   put_page(p);
+   INIT_LIST_HEAD(&h);
+   list_add(&p->lru, &h);
+   ttm_put_pages(&h, 1, ttm->page_flags,
+ ttm->caching_state, &ttm->dma_address[index]);
return NULL;
 }
 
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 05/13] drm/ttm: overhaul memory accounting

2011-11-09 Thread j . glisse
From: Jerome Glisse 

This is an overhaul of the ttm memory accounting. It tries to keep the
same global behavior while removing the whole zone concept. It keeps a
distinction for dma32 so that we make sure ttm doesn't starve the
dma32 zone.

There are four thresholds for memory allocation:
- max_mem is the maximum memory the whole ttm infrastructure is
  going to allow allocations for (with the exception of the system
  process, see below)
- emer_mem is the maximum memory allowed for the system process; this
  limit is greater than max_mem
- swap_limit is the threshold at which point ttm will start trying to
  swap objects out, because ttm is getting close to the max_mem limit
- swap_dma32_limit is the threshold at which point ttm will start
  swapping objects out to try to reduce the pressure on the dma32
  zone. Note that we don't specifically target objects to swap; it
  might very well free more memory from highmem than from dma32

Accounting is done through used_mem & used_dma32_mem, whose sum gives
the total amount of memory actually accounted for by ttm.

The idea is that an allocation will fail if (used_mem + used_dma32_mem)
> max_mem and swapping fails to make enough room.

used_dma32_mem can be updated at a later stage, allowing accounting
tests to be performed before allocating a whole batch of pages.
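
A condensed, illustrative sketch of the resulting rule (the field names
are the ones this patch introduces; 'from_system_process' stands in for
the real capability check):

    uint64_t used = glob->used_mem + glob->used_dma32_mem;
    uint64_t limit = from_system_process ? glob->emer_mem : glob->max_mem;

    if (used + size > limit)
            return -ENOMEM;                 /* swapping failed to make room */
    if (used + size > glob->swap_limit)
            ttm_check_swapping(glob);       /* nudge ttm into swapping */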

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/ttm/ttm_bo.c |2 +-
 drivers/gpu/drm/ttm/ttm_memory.c |  517 +-
 drivers/gpu/drm/ttm/ttm_object.c |3 +-
 drivers/gpu/drm/ttm/ttm_tt.c |2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c|8 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |8 +-
 include/drm/ttm/ttm_memory.h |   23 +-
 7 files changed, 168 insertions(+), 395 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4bde335..92712798 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1252,7 +1252,7 @@ int ttm_bo_create(struct ttm_bo_device *bdev,
 
size_t acc_size =
ttm_bo_size(bdev->glob, (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
-   ret = ttm_mem_global_alloc(mem_glob, acc_size, false, false);
+   ret = ttm_mem_global_alloc(mem_glob, acc_size, false);
if (unlikely(ret != 0))
return ret;
 
diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index e70ddd8..b550baf 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -35,21 +35,10 @@
 #include 
 #include 
 
-#define TTM_MEMORY_ALLOC_RETRIES 4
-
-struct ttm_mem_zone {
-   struct kobject kobj;
-   struct ttm_mem_global *glob;
-   const char *name;
-   uint64_t zone_mem;
-   uint64_t emer_mem;
-   uint64_t max_mem;
-   uint64_t swap_limit;
-   uint64_t used_mem;
-};
+#define TTM_MEMORY_RETRIES 4
 
 static struct attribute ttm_mem_sys = {
-   .name = "zone_memory",
+   .name = "memory",
.mode = S_IRUGO
 };
 static struct attribute ttm_mem_emer = {
@@ -64,140 +53,141 @@ static struct attribute ttm_mem_swap = {
.name = "swap_limit",
.mode = S_IRUGO | S_IWUSR
 };
+static struct attribute ttm_mem_dma32_swap = {
+   .name = "swap_dma32_limit",
+   .mode = S_IRUGO | S_IWUSR
+};
 static struct attribute ttm_mem_used = {
.name = "used_memory",
.mode = S_IRUGO
 };
+static struct attribute ttm_mem_dma32_used = {
+   .name = "used_dma32_memory",
+   .mode = S_IRUGO
+};
 
-static void ttm_mem_zone_kobj_release(struct kobject *kobj)
-{
-   struct ttm_mem_zone *zone =
-   container_of(kobj, struct ttm_mem_zone, kobj);
-
-   printk(KERN_INFO TTM_PFX
-  "Zone %7s: Used memory at exit: %llu kiB.\n",
-  zone->name, (unsigned long long) zone->used_mem >> 10);
-   kfree(zone);
-}
-
-static ssize_t ttm_mem_zone_show(struct kobject *kobj,
-struct attribute *attr,
-char *buffer)
+static ssize_t ttm_mem_global_show(struct kobject *kobj,
+  struct attribute *attr,
+  char *buffer)
 {
-   struct ttm_mem_zone *zone =
-   container_of(kobj, struct ttm_mem_zone, kobj);
-   uint64_t val = 0;
+   struct ttm_mem_global *glob =
+   container_of(kobj, struct ttm_mem_global, kobj);
+   unsigned long val = 0;
 
-   spin_lock(&zone->glob->lock);
+   spin_lock(&glob->lock);
if (attr == &ttm_mem_sys)
-   val = zone->zone_mem;
+   val = glob->mem;
else if (attr == &ttm_mem_emer)
-   val = zone->emer_mem;
+   val = glob->emer_mem;
else if (attr == &ttm_mem_max)
-   val = zone->max_mem;
+   val = glob->max_mem;
else if (attr == &ttm_mem_swap)
-   val = zone->swap_limit;
+   val = glob->swap_limit;
else if (attr ==

[PATCH 06/13] drm/ttm: convert page allocation to use page ptr array instead of list V4

2011-11-09 Thread j . glisse
From: Jerome Glisse 

Use the ttm_tt page pointer array for page allocation, and move the
list-to-array unwinding into the page allocation functions.

V2 split the fix to use the ttm put pages helper into a separate patch;
   properly fill the pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set
V3 added back the page_count() == 1 check when freeing a page
V4 rebase on top of the memory accounting overhaul

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/ttm/ttm_memory.c |   47 +++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c |   90 --
 drivers/gpu/drm/ttm/ttm_tt.c |   68 --
 include/drm/ttm/ttm_memory.h |   13 +++--
 include/drm/ttm/ttm_page_alloc.h |   17 +++---
 5 files changed, 120 insertions(+), 115 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index b550baf..98f6899 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -326,32 +326,45 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob,
 }
 EXPORT_SYMBOL(ttm_mem_global_alloc);
 
-int ttm_mem_global_alloc_page(struct ttm_mem_global *glob,
- struct page *page,
- bool no_wait)
+int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob,
+  unsigned npages,
+  bool no_wait)
 {
-
-   if (ttm_mem_global_alloc(glob, PAGE_SIZE, no_wait))
+   if (ttm_mem_global_alloc(glob, PAGE_SIZE * npages, no_wait))
return -ENOMEM;
+   ttm_check_swapping(glob);
+   return 0;
+}
+
+void ttm_mem_global_account_pages(struct ttm_mem_global *glob,
+ struct page **pages,
+ unsigned npages)
+{
+   unsigned i;
 
/* check if page is dma32 */
-   if (page_to_pfn(page) > 0x0010UL) {
-   spin_lock(&glob->lock);
-   glob->used_mem -= PAGE_SIZE;
-   glob->used_dma32_mem += PAGE_SIZE;
-   spin_unlock(&glob->lock);
+   spin_lock(&glob->lock);
+   for (i = 0; i < npages; i++) {
+   if (page_to_pfn(pages[i]) > 0x0010UL) {
+   glob->used_mem -= PAGE_SIZE;
+   glob->used_dma32_mem += PAGE_SIZE;
+   }
}
-   ttm_check_swapping(glob);
-   return 0;
+   spin_unlock(&glob->lock);
 }
 
-void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page)
+void ttm_mem_global_free_pages(struct ttm_mem_global *glob,
+  struct page **pages, unsigned npages)
 {
+   unsigned i;
+
spin_lock(&glob->lock);
-   if (page_to_pfn(page) > 0x0010UL) {
-   glob->used_dma32_mem -= PAGE_SIZE;
-   } else {
-   glob->used_mem -= PAGE_SIZE;
+   for (i = 0; i < npages; i++) {
+   if (page_to_pfn(pages[i]) > 0x0010UL) {
+   glob->used_dma32_mem -= PAGE_SIZE;
+   } else {
+   glob->used_mem -= PAGE_SIZE;
+   }
}
spin_unlock(&glob->lock);
 }
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 727e93d..c4f18b9 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool 
*pool,
  * @return count of pages still required to fulfill the request.
  */
 static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool,
-   struct list_head *pages, int ttm_flags,
-   enum ttm_caching_state cstate, unsigned count)
+   struct list_head *pages,
+   int ttm_flags,
+   enum ttm_caching_state cstate,
+   unsigned count)
 {
unsigned long irq_flags;
struct list_head *p;
@@ -664,13 +666,14 @@ out:
  * On success pages list will hold count number of correctly
  * cached pages.
  */
-int ttm_get_pages(struct list_head *pages, int flags,
- enum ttm_caching_state cstate, unsigned count,
- dma_addr_t *dma_address)
+int ttm_get_pages(struct page **pages, unsigned npages, int flags,
+ enum ttm_caching_state cstate, dma_addr_t *dma_address)
 {
struct ttm_page_pool *pool = ttm_get_pool(flags, cstate);
struct page *p = NULL;
+   struct list_head plist;
gfp_t gfp_flags = GFP_USER;
+   unsigned count = 0;
int r;
 
/* set zero flag for page allocation if required */
@@ -684,94 +687,107 @@ int ttm_get_pages(struct list_head *pages, int flags,
else
gfp_flags |= GFP_HIGHUSER;
 
-   for (r = 0; r < count; ++r) {
-   p = alloc_page(gfp_flags);
-   if (!p) {
-

[PATCH 07/13] drm/ttm: test for dma_address array allocation failure

2011-11-09 Thread j . glisse
From: Jerome Glisse 

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_tt.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 303bbba..2dab08b 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -293,7 +293,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
ttm->dummy_read_page = dummy_read_page;
 
ttm_tt_alloc_page_directory(ttm);
-   if (!ttm->pages) {
+   if (!ttm->pages || !ttm->dma_address) {
ttm_tt_destroy(ttm);
printk(KERN_ERR TTM_PFX "Failed allocating page table\n");
return NULL;
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 08/13] drm/ttm: merge ttm_backend and ttm_tt V2

2011-11-09 Thread j . glisse
From: Jerome Glisse 

A ttm_backend will only ever exist together with a ttm_tt, and a
ttm_tt is only of interesting use when bound to a backend. Thus, to
avoid code & data duplication between the two, merge them.

V2 rebase on top of the memory accounting overhaul

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c|   14 ++-
 drivers/gpu/drm/nouveau/nouveau_drv.h   |5 +-
 drivers/gpu/drm/nouveau/nouveau_sgdma.c |  188 --
 drivers/gpu/drm/radeon/radeon_ttm.c |  222 ---
 drivers/gpu/drm/ttm/ttm_agp_backend.c   |   88 +
 drivers/gpu/drm/ttm/ttm_bo.c|9 +-
 drivers/gpu/drm/ttm/ttm_tt.c|   60 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |   66 +++--
 include/drm/ttm/ttm_bo_driver.h |  104 ++-
 9 files changed, 295 insertions(+), 461 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7226f41..b060fa4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, 
u32 val)
*mem = val;
 }
 
-static struct ttm_backend *
-nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev)
+static struct ttm_tt *
+nouveau_ttm_tt_create(struct ttm_bo_device *bdev,
+ unsigned long size, uint32_t page_flags,
+ struct page *dummy_read_page)
 {
struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev);
struct drm_device *dev = dev_priv->dev;
@@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device 
*bdev)
switch (dev_priv->gart_info.type) {
 #if __OS_HAS_AGP
case NOUVEAU_GART_AGP:
-   return ttm_agp_backend_init(bdev, dev->agp->bridge);
+   return ttm_agp_tt_create(bdev, dev->agp->bridge,
+size, page_flags, dummy_read_page);
 #endif
case NOUVEAU_GART_PDMA:
case NOUVEAU_GART_HW:
-   return nouveau_sgdma_init_ttm(dev);
+   return nouveau_sgdma_create_ttm(bdev, size, page_flags,
+   dummy_read_page);
default:
NV_ERROR(dev, "Unknown GART type %d\n",
 dev_priv->gart_info.type);
@@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 }
 
 struct ttm_bo_driver nouveau_bo_driver = {
-   .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry,
+   .ttm_tt_create = &nouveau_ttm_tt_create,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 29837da..0c53e39 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *);
 extern void nouveau_sgdma_takedown(struct drm_device *);
 extern uint32_t nouveau_sgdma_get_physical(struct drm_device *,
   uint32_t offset);
-extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *);
+extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev,
+  unsigned long size,
+  uint32_t page_flags,
+  struct page *dummy_read_page);
 
 /* nouveau_debugfs.c */
 #if defined(CONFIG_DRM_NOUVEAU_DEBUG)
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index b75258a..bc2ab90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -8,44 +8,23 @@
 #define NV_CTXDMA_PAGE_MASK  (NV_CTXDMA_PAGE_SIZE - 1)
 
 struct nouveau_sgdma_be {
-   struct ttm_backend backend;
+   struct ttm_tt ttm;
struct drm_device *dev;
-
-   dma_addr_t *pages;
-   unsigned nr_pages;
-   bool unmap_pages;
-
u64 offset;
-   bool bound;
 };
 
 static int
-nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages,
-  struct page **pages, struct page *dummy_read_page,
-  dma_addr_t *dma_addrs)
+nouveau_sgdma_dma_map(struct ttm_tt *ttm)
 {
-   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be;
+   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
struct drm_device *dev = nvbe->dev;
int i;
 
-   NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages);
-
-   nvbe->pages = dma_addrs;
-   nvbe->nr_pages = num_pages;
-   nvbe->unmap_pages = true;
-
-   /* this code path isn't called and is incorrect anyways */
-   if 

[PATCH 09/13] drm/ttm: introduce callback for ttm_tt populate & unpopulate V2

2011-11-09 Thread j . glisse
From: Jerome Glisse 

Move the page allocation and freeing into driver callbacks and provide
ttm helper functions for those.

The most intrusive change is the fact that we now only fully populate
an object; this simplifies some of the code designed around the
page-fault design.

V2 Rebase on top of memory accounting overhaul

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   |3 +
 drivers/gpu/drm/radeon/radeon_ttm.c|2 +
 drivers/gpu/drm/ttm/ttm_bo_util.c  |   31 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c|   13 +++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c   |   45 +
 drivers/gpu/drm/ttm/ttm_tt.c   |   86 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 +
 include/drm/ttm/ttm_bo_driver.h|   41 +---
 include/drm/ttm/ttm_page_alloc.h   |   18 +++
 9 files changed, 123 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b060fa4..7e5ca3f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -28,6 +28,7 @@
  */
 
 #include "drmP.h"
+#include "ttm/ttm_page_alloc.h"
 
 #include "nouveau_drm.h"
 #include "nouveau_drv.h"
@@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 53ff62b..490afce 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device 
*bdev,
 
 static struct ttm_bo_driver radeon_bo_driver = {
.ttm_tt_create = &radeon_ttm_tt_create,
+   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
+   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
.invalidate_caches = &radeon_invalidate_caches,
.init_mem_type = &radeon_init_mem_type,
.evict_flags = &radeon_evict_flags,
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 082fcae..60f204d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void 
*src,
unsigned long page,
pgprot_t prot)
 {
-   struct page *d = ttm_tt_get_page(ttm, page);
+   struct page *d = ttm->pages[page];
void *dst;
 
if (!d)
@@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void 
*dst,
unsigned long page,
pgprot_t prot)
 {
-   struct page *s = ttm_tt_get_page(ttm, page);
+   struct page *s = ttm->pages[page];
void *src;
 
if (!s)
@@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
if (old_iomap == NULL && ttm == NULL)
goto out2;
 
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   goto out1;
+   }
+
add = 0;
dir = 1;
 
@@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 {
struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot;
struct ttm_tt *ttm = bo->ttm;
-   struct page *d;
-   int i;
+   int ret;
 
BUG_ON(!ttm);
+
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   return ret;
+   }
+
if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) {
/*
 * We're mapping a single page, and the desired
@@ -513,18 +525,9 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 */
 
map->bo_kmap_type = ttm_bo_map_kmap;
-   map->page = ttm_tt_get_page(ttm, start_page);
+   map->page = ttm->pages[start_page];
map->virtual = kmap(map->page);
} else {
-   /*
-* Populate the part we're mapping;
-*/
-   for (i = start_page; i < start_page + num_pages; ++i) {
-   d = ttm_tt_get_page(ttm, i);
-   if (!d)
-   return -ENOMEM;
-   }
-
/*
 * We need to use vmap to get the desired page protection
 * or to make the buffer objec

[PATCH 11/13] swiotlb: Expose swiotlb_nr_tlb function to modules

2011-11-09 Thread j . glisse
From: Konrad Rzeszutek Wilk 

As a mechanism for modules to detect whether SWIOTLB is enabled or
not. We also fix the spelling: it was swioltb instead of swiotlb.
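
A hedged sketch of the intended use from a module (the example_* name
is illustrative): a non-zero slab count means the software IOTLB was
initialized and bouncing can occur.

#include <linux/swiotlb.h>

static bool example_swiotlb_active(void)
{
        return swiotlb_nr_tbl() != 0;   /* 0 => SWIOTLB never initialized */
}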

CC: FUJITA Tomonori 
[v1: Ripped out swiotlb_enabled]
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/xen/swiotlb-xen.c |2 +-
 include/linux/swiotlb.h   |2 +-
 lib/swiotlb.c |5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c984768..c50fb0b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose)
char *m = NULL;
unsigned int repeat = 3;
 
-   nr_tbl = swioltb_nr_tbl();
+   nr_tbl = swiotlb_nr_tbl();
if (nr_tbl)
xen_io_tlb_nslabs = nr_tbl;
else {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 445702c..e872526 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -24,7 +24,7 @@ extern int swiotlb_force;
 
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
verbose);
-extern unsigned long swioltb_nr_tbl(void);
+extern unsigned long swiotlb_nr_tbl(void);
 
 /*
  * Enumeration for sync targets
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 99093b3..058935e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str)
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 
-unsigned long swioltb_nr_tbl(void)
+unsigned long swiotlb_nr_tbl(void)
 {
return io_tlb_nslabs;
 }
-
+EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
  volatile void *address)
@@ -321,6 +321,7 @@ void __init swiotlb_free(void)
free_bootmem_late(__pa(io_tlb_start),
  PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
+   io_tlb_nslabs = 0;
 }
 
 static int is_swiotlb_buffer(phys_addr_t paddr)
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 12/13] drm/radeon/kms: Enable the TTM DMA pool if swiotlb is on V2

2011-11-09 Thread j . glisse
From: Konrad Rzeszutek Wilk 

Note that we do not handle the AGP case; we only deal with PCIe
cards, such as the ATI ES1000 or HD3200, that have been detected
as only doing DMA up to 32 bits.

V2 force dma32 if we fail to set a bigger DMA mask
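
A minimal sketch of the V2 fallback pattern, using the PCI DMA API of
this era (the example_* name and the caller-supplied 'dma_bits' are
illustrative): if the wide streaming mask fails, drop to 32 bits and
keep the coherent mask consistent with it, as the radeon hunk below
does during device init.

#include <linux/pci.h>
#include <linux/dma-mapping.h>

static void example_set_dma_masks(struct pci_dev *pdev, int dma_bits)
{
        if (pci_set_dma_mask(pdev, DMA_BIT_MASK(dma_bits)))
                dma_bits = 32;  /* no suitable wide DMA, force dma32 */

        if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(dma_bits)))
                pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
}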

CC: Dave Airlie 
CC: Alex Deucher 
Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h|1 -
 drivers/gpu/drm/radeon/radeon_device.c |6 ++
 drivers/gpu/drm/radeon/radeon_gart.c   |   29 +---
 drivers/gpu/drm/radeon/radeon_ttm.c|   83 +--
 4 files changed, 84 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index e3170c7..63257ba 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -332,7 +332,6 @@ struct radeon_gart {
union radeon_gart_table table;
struct page **pages;
dma_addr_t  *pages_addr;
-   bool*ttm_alloced;
boolready;
 };
 
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index c33bc91..7c31321 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -765,8 +765,14 @@ int radeon_device_init(struct radeon_device *rdev,
r = pci_set_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
if (r) {
rdev->need_dma32 = true;
+   dma_bits = 32;
printk(KERN_WARNING "radeon: No suitable DMA available.\n");
}
+   r = pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
+   if (r) {
+   pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
+   printk(KERN_WARNING "radeon: No coherent DMA available.\n");
+   }
 
/* Registers mapping */
/* TODO: block userspace mapping of io register */
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c 
b/drivers/gpu/drm/radeon/radeon_gart.c
index fdc3a9a..18f496c 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -149,9 +149,6 @@ void radeon_gart_unbind(struct radeon_device *rdev, 
unsigned offset,
p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
for (i = 0; i < pages; i++, p++) {
if (rdev->gart.pages[p]) {
-   if (!rdev->gart.ttm_alloced[p])
-   pci_unmap_page(rdev->pdev, 
rdev->gart.pages_addr[p],
-   PAGE_SIZE, 
PCI_DMA_BIDIRECTIONAL);
rdev->gart.pages[p] = NULL;
rdev->gart.pages_addr[p] = rdev->dummy_page.addr;
page_base = rdev->gart.pages_addr[p];
@@ -181,23 +178,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned 
offset,
p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
 
for (i = 0; i < pages; i++, p++) {
-   /* we reverted the patch using dma_addr in TTM for now but this
-* code stops building on alpha so just comment it out for now 
*/
-   if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */
-   rdev->gart.ttm_alloced[p] = true;
-   rdev->gart.pages_addr[p] = dma_addr[i];
-   } else {
-   /* we need to support large memory configurations */
-   /* assume that unbind have already been call on the 
range */
-   rdev->gart.pages_addr[p] = pci_map_page(rdev->pdev, 
pagelist[i],
-   0, PAGE_SIZE,
-   PCI_DMA_BIDIRECTIONAL);
-   if (pci_dma_mapping_error(rdev->pdev, 
rdev->gart.pages_addr[p])) {
-   /* FIXME: failed to map page (return -ENOMEM?) 
*/
-   radeon_gart_unbind(rdev, offset, pages);
-   return -ENOMEM;
-   }
-   }
+   rdev->gart.pages_addr[p] = dma_addr[i];
rdev->gart.pages[p] = pagelist[i];
page_base = rdev->gart.pages_addr[p];
for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
@@ -259,12 +240,6 @@ int radeon_gart_init(struct radeon_device *rdev)
radeon_gart_fini(rdev);
return -ENOMEM;
}
-   rdev->gart.ttm_alloced = kzalloc(sizeof(bool) *
-rdev->gart.num_cpu_pages, GFP_KERNEL);
-   if (rdev->gart.ttm_alloced == NULL) {
-   radeon_gart_fini(rdev);
-   return -ENOMEM;
-   }
/* set GART entry to point to the dummy page by default */
for (i = 0; i < rdev->gart.num_cpu_pages; i++) {
rdev->gart.pages_addr[i] = rdev->dummy_page.addr;
@@

[PATCH 13/13] drm/nouveau: enable the TTM DMA pool on 32-bit DMA only device V2

2011-11-09 Thread j . glisse
From: Konrad Rzeszutek Wilk 

If the card is capable of more than 32-bit DMA, then use the default
TTM page pool code, which allocates from anywhere in memory.

Note: if the 'ttm.no_dma' parameter is set, the override is ignored
and the default TTM pool is used.

V2 use pci_set_consistent_dma_mask
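
The core of the change, distilled into a hedged sketch (the example_*
name is illustrative; the real hunks below live in nouveau's populate
and unpopulate hooks): the DMA pool is only used when the device is
32-bit limited and the SWIOTLB is active.

#include <linux/dma-mapping.h>
#include <linux/swiotlb.h>
#include "ttm/ttm_page_alloc.h"

static int example_populate(struct ttm_tt *ttm, struct device *dev)
{
#ifdef CONFIG_SWIOTLB
        if ((dma_get_mask(dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl())
                return ttm_dma_populate(ttm, dev);
#endif
        return ttm_page_alloc_ttm_tt_populate(ttm);
}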

CC: Ben Skeggs 
CC: Francisco Jerez 
CC: Dave Airlie 
Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |   73 -
 drivers/gpu/drm/nouveau/nouveau_debugfs.c |1 +
 drivers/gpu/drm/nouveau/nouveau_mem.c |6 ++
 drivers/gpu/drm/nouveau/nouveau_sgdma.c   |   60 +---
 4 files changed, 79 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7e5ca3f..36234a7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1049,10 +1049,79 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
nouveau_fence_unref(&old_fence);
 }
 
+static int
+nouveau_ttm_tt_populate(struct ttm_tt *ttm)
+{
+   struct drm_nouveau_private *dev_priv;
+   struct drm_device *dev;
+   unsigned i;
+   int r;
+
+   if (ttm->state != tt_unpopulated)
+   return 0;
+
+   dev_priv = nouveau_bdev(ttm->bdev);
+   dev = dev_priv->dev;
+
+#ifdef CONFIG_SWIOTLB
+   if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) {
+   return ttm_dma_populate(ttm, dev->dev);
+   }
+#endif
+
+   r = ttm_page_alloc_ttm_tt_populate(ttm);
+   if (r) {
+   return r;
+   }
+
+   for (i = 0; i < ttm->num_pages; i++) {
+   ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i],
+  0, PAGE_SIZE,
+  PCI_DMA_BIDIRECTIONAL);
+   if (pci_dma_mapping_error(dev->pdev, ttm->dma_address[i])) {
+   while (--i) {
+   pci_unmap_page(dev->pdev, ttm->dma_address[i],
+  PAGE_SIZE, 
PCI_DMA_BIDIRECTIONAL);
+   ttm->dma_address[i] = 0;
+   }
+   ttm_page_alloc_ttm_tt_unpopulate(ttm);
+   return -EFAULT;
+   }
+   }
+   return 0;
+}
+
+static void
+nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
+{
+   struct drm_nouveau_private *dev_priv;
+   struct drm_device *dev;
+   unsigned i;
+
+   dev_priv = nouveau_bdev(ttm->bdev);
+   dev = dev_priv->dev;
+
+#ifdef CONFIG_SWIOTLB
+   if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) {
+   ttm_dma_unpopulate(ttm, dev->dev);
+   return;
+   }
+#endif
+
+   for (i = 0; i < ttm->num_pages; i++) {
+   if (ttm->dma_address[i]) {
+   pci_unmap_page(dev->pdev, ttm->dma_address[i],
+  PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+   }
+   }
+
+   ttm_page_alloc_ttm_tt_unpopulate(ttm);
+}
+
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
-   .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate,
-   .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate,
+   .ttm_tt_populate = &nouveau_ttm_tt_populate,
+   .ttm_tt_unpopulate = &nouveau_ttm_tt_unpopulate,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c 
b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
index 8e15923..f52c2db 100644
--- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
+++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
@@ -178,6 +178,7 @@ static struct drm_info_list nouveau_debugfs_list[] = {
{ "memory", nouveau_debugfs_memory_info, 0, NULL },
{ "vbios.rom", nouveau_debugfs_vbios_image, 0, NULL },
{ "ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL },
+   { "ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL },
 };
 #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list)
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_mem.c 
b/drivers/gpu/drm/nouveau/nouveau_mem.c
index 36bec48..37fcaa2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_mem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_mem.c
@@ -407,6 +407,12 @@ nouveau_mem_vram_init(struct drm_device *dev)
ret = pci_set_dma_mask(dev->pdev, DMA_BIT_MASK(dma_bits));
if (ret)
return ret;
+   ret = pci_set_consistent_dma_mask(dev->pdev, DMA_BIT_MASK(dma_bits));
+   if (ret) {
+   /* Reset to default value. */
+   pci_set_consistent_dma_mask(dev->pdev, DMA_BIT_MASK(32));
+   }
+
 
ret = nouveau_ttm_global_init(dev_priv);

Isolate dma information from ttm_tt

2011-11-09 Thread j . glisse
This applies on top of the ttm_tt & backend merge patchset.

Cheers,
Jerome

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 14/14] drm/ttm: isolate dma data from ttm_tt

2011-11-09 Thread j . glisse
From: Jerome Glisse 

Move the DMA data to a superset ttm_dma_tt structure which inherits
from ttm_tt. This allows drivers that don't use the DMA functionality
to avoid wasting memory on it.
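
A sketch of the layout trick this relies on, assuming the embedded
ttm_tt member of ttm_dma_tt is named 'ttm' (the example_* names are
illustrative): because ttm_tt sits first, a ttm_tt pointer can be
safely downcast to the DMA superset, which is what the (void *) casts
in the hunks below do.

struct example_be {
        struct ttm_dma_tt ttm;  /* first member: (void *)be == &be->ttm.ttm */
        u64 offset;
};

static inline struct ttm_dma_tt *example_to_dma_tt(struct ttm_tt *ttm)
{
        /* equivalent to the patch's (void *)ttm cast, but type-checked */
        return container_of(ttm, struct ttm_dma_tt, ttm);
}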

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c |   18 +
 drivers/gpu/drm/nouveau/nouveau_sgdma.c  |   22 +++
 drivers/gpu/drm/radeon/radeon_ttm.c  |   43 +++---
 drivers/gpu/drm/ttm/ttm_page_alloc.c |   10 +++---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |   38 +++-
 drivers/gpu/drm/ttm/ttm_tt.c |   58 -
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c   |2 +
 include/drm/ttm/ttm_bo_driver.h  |   31 +++-
 include/drm/ttm/ttm_page_alloc.h |   12 ++
 9 files changed, 155 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 36234a7..df3f19c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1052,6 +1052,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 static int
 nouveau_ttm_tt_populate(struct ttm_tt *ttm)
 {
+   struct ttm_dma_tt *ttm_dma = (void *)ttm;
struct drm_nouveau_private *dev_priv;
struct drm_device *dev;
unsigned i;
@@ -1065,7 +1066,7 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
 
 #ifdef CONFIG_SWIOTLB
if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) {
-   return ttm_dma_populate(ttm, dev->dev);
+   return ttm_dma_populate((void *)ttm, dev->dev);
}
 #endif
 
@@ -1075,14 +1076,14 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
}
 
for (i = 0; i < ttm->num_pages; i++) {
-   ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i],
+   ttm_dma->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i],
   0, PAGE_SIZE,
   PCI_DMA_BIDIRECTIONAL);
-   if (pci_dma_mapping_error(dev->pdev, ttm->dma_address[i])) {
+   if (pci_dma_mapping_error(dev->pdev, ttm_dma->dma_address[i])) {
while (--i) {
-   pci_unmap_page(dev->pdev, ttm->dma_address[i],
+   pci_unmap_page(dev->pdev, 
ttm_dma->dma_address[i],
   PAGE_SIZE, 
PCI_DMA_BIDIRECTIONAL);
-   ttm->dma_address[i] = 0;
+   ttm_dma->dma_address[i] = 0;
}
ttm_page_alloc_ttm_tt_unpopulate(ttm);
return -EFAULT;
@@ -1094,6 +1095,7 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
 static void
 nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 {
+   struct ttm_dma_tt *ttm_dma = (void *)ttm;
struct drm_nouveau_private *dev_priv;
struct drm_device *dev;
unsigned i;
@@ -1103,14 +1105,14 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 
 #ifdef CONFIG_SWIOTLB
if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) {
-   ttm_dma_unpopulate(ttm, dev->dev);
+   ttm_dma_unpopulate((void *)ttm, dev->dev);
return;
}
 #endif
 
for (i = 0; i < ttm->num_pages; i++) {
-   if (ttm->dma_address[i]) {
-   pci_unmap_page(dev->pdev, ttm->dma_address[i],
+   if (ttm_dma->dma_address[i]) {
+   pci_unmap_page(dev->pdev, ttm_dma->dma_address[i],
   PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
}
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index ee1eb7c..47f245e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -8,7 +8,10 @@
 #define NV_CTXDMA_PAGE_MASK  (NV_CTXDMA_PAGE_SIZE - 1)
 
 struct nouveau_sgdma_be {
-   struct ttm_tt ttm;
+   /* this has to be the first field so populate/unpopulate in
+* nouveau_bo.c works properly, otherwise have to move them here
+*/
+   struct ttm_dma_tt ttm;
struct drm_device *dev;
u64 offset;
 };
@@ -20,6 +23,7 @@ nouveau_sgdma_destroy(struct ttm_tt *ttm)
 
if (ttm) {
NV_DEBUG(nvbe->dev, "\n");
+   ttm_dma_tt_fini(&nvbe->ttm);
kfree(nvbe);
}
 }
@@ -38,7 +42,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
nvbe->offset = mem->start << PAGE_SHIFT;
pte = (nvbe->offset >> NV_CTXDMA_PAGE_SHIFT) + 2;
for (i = 0; i < ttm->num_pages; i++) {
-   dma_addr_t dma_offset = ttm->dma_address[i];
+   dma_addr_t dma_offset = nvbe->ttm.dma_address[i];
uint32_t offset_l = lower_32_bits(dma_offset);
 
 

ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator V4

2011-11-10 Thread j . glisse
So I squeezed it all together to avoid any memory accounting mess;
it seems to work OK so far.

Cheers,
Jerome

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 01/13] swiotlb: Expose swiotlb_nr_tlb function to modules

2011-11-10 Thread j . glisse
From: Konrad Rzeszutek Wilk 

As a mechanism for modules to detect whether SWIOTLB is enabled or
not. We also fix the spelling: it was swioltb instead of swiotlb.

CC: FUJITA Tomonori 
[v1: Ripped out swiotlb_enabled]
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/xen/swiotlb-xen.c |2 +-
 include/linux/swiotlb.h   |2 +-
 lib/swiotlb.c |5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c984768..c50fb0b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose)
char *m = NULL;
unsigned int repeat = 3;
 
-   nr_tbl = swioltb_nr_tbl();
+   nr_tbl = swiotlb_nr_tbl();
if (nr_tbl)
xen_io_tlb_nslabs = nr_tbl;
else {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 445702c..e872526 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -24,7 +24,7 @@ extern int swiotlb_force;
 
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
verbose);
-extern unsigned long swioltb_nr_tbl(void);
+extern unsigned long swiotlb_nr_tbl(void);
 
 /*
  * Enumeration for sync targets
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 99093b3..058935e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str)
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 
-unsigned long swioltb_nr_tbl(void)
+unsigned long swiotlb_nr_tbl(void)
 {
return io_tlb_nslabs;
 }
-
+EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
  volatile void *address)
@@ -321,6 +321,7 @@ void __init swiotlb_free(void)
free_bootmem_late(__pa(io_tlb_start),
  PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
+   io_tlb_nslabs = 0;
 }
 
 static int is_swiotlb_buffer(phys_addr_t paddr)
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 02/13] drm/ttm: remove userspace backed ttm object support

2011-11-10 Thread j . glisse
From: Jerome Glisse 

This was never used by any of the drivers; properly backing a bo with
userspace pages would need more code (vma interaction mostly). Remove
this dead code in preparation for the ttm_tt & backend merge.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_bo.c|   22 
 drivers/gpu/drm/ttm/ttm_tt.c|  105 +--
 include/drm/ttm/ttm_bo_api.h|5 --
 include/drm/ttm/ttm_bo_driver.h |   24 -
 4 files changed, 1 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 617b646..4bde335 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, 
bool zero_alloc)
if (unlikely(bo->ttm == NULL))
ret = -ENOMEM;
break;
-   case ttm_bo_type_user:
-   bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
-   page_flags | TTM_PAGE_FLAG_USER,
-   glob->dummy_read_page);
-   if (unlikely(bo->ttm == NULL)) {
-   ret = -ENOMEM;
-   break;
-   }
-
-   ret = ttm_tt_set_user(bo->ttm, current,
- bo->buffer_start, bo->num_pages);
-   if (unlikely(ret != 0)) {
-   ttm_tt_destroy(bo->ttm);
-   bo->ttm = NULL;
-   }
-   break;
default:
printk(KERN_ERR TTM_PFX "Illegal buffer object type\n");
ret = -EINVAL;
@@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct 
ttm_mem_type_manager *man,
 }
 
 static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man,
-bool disallow_fixed,
 uint32_t mem_type,
 uint32_t proposed_placement,
 uint32_t *masked_placement)
 {
uint32_t cur_flags = ttm_bo_type_flags(mem_type);
 
-   if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed)
-   return false;
-
if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0)
return false;
 
@@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
man = &bdev->man[mem_type];
 
type_ok = ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->placement[i],
&cur_flags);
@@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (!man->has_type)
continue;
if (!ttm_bo_mt_compatible(man,
-   bo->type == ttm_bo_type_user,
mem_type,
placement->busy_placement[i],
&cur_flags))
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 58c271e..82a1161 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm)
ttm->dma_address = NULL;
 }
 
-static void ttm_tt_free_user_pages(struct ttm_tt *ttm)
-{
-   int write;
-   int dirty;
-   struct page *page;
-   int i;
-   struct ttm_backend *be = ttm->be;
-
-   BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER));
-   write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0);
-   dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0);
-
-   if (be)
-   be->func->clear(be);
-
-   for (i = 0; i < ttm->num_pages; ++i) {
-   page = ttm->pages[i];
-   if (page == NULL)
-   continue;
-
-   if (page == ttm->dummy_read_page) {
-   BUG_ON(write);
-   continue;
-   }
-
-   if (write && dirty && !PageReserved(page))
-   set_page_dirty_lock(page);
-
-   ttm->pages[i] = NULL;
-   ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE);
-   put_page(page);
-   }
-   ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
-}
-
 static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index)
 {
struct page *p;
@@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm)
}
 
if (likely(ttm->pages != NULL)) {
-   if (ttm->page_flags & TTM_PAGE

[PATCH 03/13] drm/ttm: remove split btw highmen and lowmem page

2011-11-10 Thread j . glisse
From: Jerome Glisse 

The split between highmem and lowmem pages was rendered useless by
the pool code, so remove it. Note that a further cleanup would change
the TTM page allocation helper to actually take an array instead of
relying on a list; this could drastically reduce the number of
function calls in the common case of allocating a whole buffer.
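
A hedged sketch of that cleanup direction (the example_* name is
illustrative): filling the caller's page array directly instead of
threading pages through a list_head, one helper call per buffer
instead of per page.

#include <linux/gfp.h>

static int example_alloc_pages(struct page **pages, unsigned num_pages,
                               gfp_t gfp_flags)
{
        unsigned i;

        for (i = 0; i < num_pages; ++i) {
                pages[i] = alloc_page(gfp_flags);
                if (!pages[i])
                        return -ENOMEM; /* caller unwinds pages[0..i) */
        }
        return 0;
}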

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_tt.c|   11 ++-
 include/drm/ttm/ttm_bo_driver.h |7 ---
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 82a1161..8b7a6d0 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int 
index)
struct ttm_mem_global *mem_glob = ttm->glob->mem_glob;
int ret;
 
-   while (NULL == (p = ttm->pages[index])) {
+   if (NULL == (p = ttm->pages[index])) {
 
INIT_LIST_HEAD(&h);
 
@@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
if (unlikely(ret != 0))
goto out_err;
 
-   if (PageHighMem(p))
-   ttm->pages[--ttm->first_himem_page] = p;
-   else
-   ttm->pages[++ttm->last_lomem_page] = p;
+   ttm->pages[index] = p;
}
return p;
 out_err:
@@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm)
ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state,
  ttm->dma_address);
ttm->state = tt_unpopulated;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
 }
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
@@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
 
ttm->glob = bdev->glob;
ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   ttm->first_himem_page = ttm->num_pages;
-   ttm->last_lomem_page = -1;
ttm->caching_state = tt_cached;
ttm->page_flags = page_flags;
 
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 37527d6..9da182b 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -136,11 +136,6 @@ enum ttm_caching_state {
  * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL
  * pointer.
  * @pages: Array of pages backing the data.
- * @first_himem_page: Himem pages are put last in the page array, which
- * enables us to run caching attribute changes on only the first part
- * of the page array containing lomem pages. This is the index of the
- * first himem page.
- * @last_lomem_page: Index of the last lomem page in the page array.
  * @num_pages: Number of pages in the page array.
  * @bdev: Pointer to the current struct ttm_bo_device.
  * @be: Pointer to the ttm backend.
@@ -157,8 +152,6 @@ enum ttm_caching_state {
 struct ttm_tt {
struct page *dummy_read_page;
struct page **pages;
-   long first_himem_page;
-   long last_lomem_page;
uint32_t page_flags;
unsigned long num_pages;
struct ttm_bo_global *glob;
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 04/13] drm/ttm: remove unused backend flags field

2011-11-10 Thread j . glisse
From: Jerome Glisse 

This field is not used by any of the drivers, so just drop it.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/radeon/radeon_ttm.c |1 -
 include/drm/ttm/ttm_bo_driver.h |2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0b5468b..97c76ae 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct 
radeon_device *rdev)
return NULL;
}
gtt->backend.bdev = &rdev->mman.bdev;
-   gtt->backend.flags = 0;
gtt->backend.func = &radeon_backend_func;
gtt->rdev = rdev;
gtt->pages = NULL;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 9da182b..6d17140 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -106,7 +106,6 @@ struct ttm_backend_func {
  * struct ttm_backend
  *
  * @bdev: Pointer to a struct ttm_bo_device.
- * @flags: For driver use.
  * @func: Pointer to a struct ttm_backend_func that describes
  * the backend methods.
  *
@@ -114,7 +113,6 @@ struct ttm_backend_func {
 
 struct ttm_backend {
struct ttm_bo_device *bdev;
-   uint32_t flags;
struct ttm_backend_func *func;
 };
 
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 05/13] drm/ttm: use ttm put pages function to properly restore cache attribute

2011-11-10 Thread j . glisse
From: Jerome Glisse 

On failure we need to make sure the pages we free have the write-back
(wb) cache attribute. Do this by calling the proper TTM page helper
function.

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_tt.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8b7a6d0..3fb4c6d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, 
int index)
}
return p;
 out_err:
-   put_page(p);
+   INIT_LIST_HEAD(&h);
+   list_add(&p->lru, &h);
+   ttm_put_pages(&h, 1, ttm->page_flags,
+ ttm->caching_state, &ttm->dma_address[index]);
return NULL;
 }
 
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 06/13] drm/ttm: test for dma_address array allocation failure

2011-11-10 Thread j . glisse
From: Jerome Glisse 

Signed-off-by: Jerome Glisse 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/ttm/ttm_tt.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 3fb4c6d..aceecb5 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -319,7 +319,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, 
unsigned long size,
ttm->dummy_read_page = dummy_read_page;
 
ttm_tt_alloc_page_directory(ttm);
-   if (!ttm->pages) {
+   if (!ttm->pages || !ttm->dma_address) {
ttm_tt_destroy(ttm);
printk(KERN_ERR TTM_PFX "Failed allocating page table\n");
return NULL;
-- 
1.7.7.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 07/13] drm/ttm: page allocation use page array instead of list

2011-11-10 Thread j . glisse
From: Jerome Glisse 

Use the ttm_tt pages array for page allocations, and move the list
unwinding into the page allocation functions.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c |   85 +-
 drivers/gpu/drm/ttm/ttm_tt.c |   36 +++
 include/drm/ttm/ttm_page_alloc.h |8 ++--
 3 files changed, 63 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 727e93d..0f3e6d2 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool 
*pool,
  * @return count of pages still required to fulfill the request.
  */
 static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool,
-   struct list_head *pages, int ttm_flags,
-   enum ttm_caching_state cstate, unsigned count)
+   struct list_head *pages,
+   int ttm_flags,
+   enum ttm_caching_state cstate,
+   unsigned count)
 {
unsigned long irq_flags;
struct list_head *p;
@@ -664,13 +666,15 @@ out:
  * On success pages list will hold count number of correctly
  * cached pages.
  */
-int ttm_get_pages(struct list_head *pages, int flags,
- enum ttm_caching_state cstate, unsigned count,
+int ttm_get_pages(struct page **pages, int flags,
+ enum ttm_caching_state cstate, unsigned npages,
  dma_addr_t *dma_address)
 {
struct ttm_page_pool *pool = ttm_get_pool(flags, cstate);
+   struct list_head plist;
struct page *p = NULL;
gfp_t gfp_flags = GFP_USER;
+   unsigned count;
int r;
 
/* set zero flag for page allocation if required */
@@ -684,7 +688,7 @@ int ttm_get_pages(struct list_head *pages, int flags,
else
gfp_flags |= GFP_HIGHUSER;
 
-   for (r = 0; r < count; ++r) {
+   for (r = 0; r < npages; ++r) {
p = alloc_page(gfp_flags);
if (!p) {
 
@@ -693,85 +697,100 @@ int ttm_get_pages(struct list_head *pages, int flags,
return -ENOMEM;
}
 
-   list_add(&p->lru, pages);
+   pages[r] = p;
}
return 0;
}
 
-
/* combine zero flag to pool flags */
gfp_flags |= pool->gfp_flags;
 
/* First we take pages from the pool */
-   count = ttm_page_pool_get_pages(pool, pages, flags, cstate, count);
+   INIT_LIST_HEAD(&plist);
+   npages = ttm_page_pool_get_pages(pool, &plist, flags, cstate, npages);
+   count = 0;
+   list_for_each_entry(p, &plist, lru) {
+   pages[count++] = p;
+   }
 
/* clear the pages coming from the pool if requested */
if (flags & TTM_PAGE_FLAG_ZERO_ALLOC) {
-   list_for_each_entry(p, pages, lru) {
+   list_for_each_entry(p, &plist, lru) {
clear_page(page_address(p));
}
}
 
/* If pool didn't have enough pages allocate new one. */
-   if (count > 0) {
+   if (npages > 0) {
/* ttm_alloc_new_pages doesn't reference pool so we can run
 * multiple requests in parallel.
 **/
-   r = ttm_alloc_new_pages(pages, gfp_flags, flags, cstate, count);
+   INIT_LIST_HEAD(&plist);
+   r = ttm_alloc_new_pages(&plist, gfp_flags, flags, cstate, 
npages);
+   list_for_each_entry(p, &plist, lru) {
+   pages[count++] = p;
+   }
if (r) {
/* If there is any pages in the list put them back to
 * the pool. */
printk(KERN_ERR TTM_PFX
   "Failed to allocate extra pages "
   "for large request.");
-   ttm_put_pages(pages, 0, flags, cstate, NULL);
+   ttm_put_pages(pages, count, flags, cstate, NULL);
return r;
}
}
 
-
return 0;
 }
 
 /* Put all pages in pages list to correct pool to wait for reuse */
-void ttm_put_pages(struct list_head *pages, unsigned page_count, int flags,
+void ttm_put_pages(struct page **pages, unsigned npages, int flags,
   enum ttm_caching_state cstate, dma_addr_t *dma_address)
 {
unsigned long irq_flags;
struct ttm_page_pool *pool = ttm_get_pool(flags, cstate);
-   struct page *p, *tmp;
+   unsigned i;
 
if (pool == NULL) {
/* No pool for this memory type so free the pages */
-
-   list_for_each

[PATCH 09/13] drm/ttm: introduce callback for ttm_tt populate & unpopulate V4

2011-11-10 Thread j . glisse
From: Jerome Glisse 

Move the page allocation and freeing into driver callbacks and
provide TTM helper functions for those.

The most intrusive change is the fact that we now only ever fully
populate an object; this simplifies some of the code designed around
the old page-fault-driven partial population.

V2 Rebase on top of memory accounting overhaul
V3 New rebase on top of more memory accounting changes
V4 Rebase on top of no memory accounting changes (where/when is my
   delorean when i need it ?)

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   |3 +
 drivers/gpu/drm/radeon/radeon_ttm.c|2 +
 drivers/gpu/drm/ttm/ttm_bo_util.c  |   31 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c|9 +++-
 drivers/gpu/drm/ttm/ttm_page_alloc.c   |   57 
 drivers/gpu/drm/ttm/ttm_tt.c   |   91 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 +
 include/drm/ttm/ttm_bo_driver.h|   41 --
 include/drm/ttm/ttm_page_alloc.h   |   18 ++
 9 files changed, 135 insertions(+), 120 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b060fa4..f19ac42 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -28,6 +28,7 @@
  */
 
 #include "drmP.h"
+#include "ttm/ttm_page_alloc.h"
 
 #include "nouveau_drm.h"
 #include "nouveau_drv.h"
@@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
+   .ttm_tt_populate = &ttm_pool_populate,
+   .ttm_tt_unpopulate = &ttm_pool_unpopulate,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 53ff62b..13d5996 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device 
*bdev,
 
 static struct ttm_bo_driver radeon_bo_driver = {
.ttm_tt_create = &radeon_ttm_tt_create,
+   .ttm_tt_populate = &ttm_pool_populate,
+   .ttm_tt_unpopulate = &ttm_pool_unpopulate,
.invalidate_caches = &radeon_invalidate_caches,
.init_mem_type = &radeon_init_mem_type,
.evict_flags = &radeon_evict_flags,
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 082fcae..60f204d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void 
*src,
unsigned long page,
pgprot_t prot)
 {
-   struct page *d = ttm_tt_get_page(ttm, page);
+   struct page *d = ttm->pages[page];
void *dst;
 
if (!d)
@@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void 
*dst,
unsigned long page,
pgprot_t prot)
 {
-   struct page *s = ttm_tt_get_page(ttm, page);
+   struct page *s = ttm->pages[page];
void *src;
 
if (!s)
@@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
if (old_iomap == NULL && ttm == NULL)
goto out2;
 
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   goto out1;
+   }
+
add = 0;
dir = 1;
 
@@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 {
struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot;
struct ttm_tt *ttm = bo->ttm;
-   struct page *d;
-   int i;
+   int ret;
 
BUG_ON(!ttm);
+
+   if (ttm->state == tt_unpopulated) {
+   ret = ttm->bdev->driver->ttm_tt_populate(ttm);
+   if (ret)
+   return ret;
+   }
+
if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) {
/*
 * We're mapping a single page, and the desired
@@ -513,18 +525,9 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 */
 
map->bo_kmap_type = ttm_bo_map_kmap;
-   map->page = ttm_tt_get_page(ttm, start_page);
+   map->page = ttm->pages[start_page];
map->virtual = kmap(map->page);
} else {
-   /*
-* Populate the part we're mapping;
-*/
-   for (i = start_page; i < start_page + num_pages; ++i) {
-   d = ttm_tt_get_page(ttm, i);
-   if (!d)
-   return -ENOMEM;
-   }
-
/*
 * We need to use vmap to get the desired pag

[PATCH 08/13] drm/ttm: merge ttm_backend and ttm_tt V4

2011-11-10 Thread j . glisse
From: Jerome Glisse 

A ttm_backend only ever exists together with a ttm_tt, and a ttm_tt
is only of interest when bound to a backend. Thus, to avoid code and
data duplication between the two, merge them.

V2 Rebase on top of memory accounting overhaul
V3 Rebase on top of more memory accounting changes
V4 Rebase on top of no memory accounting changes (where/when is my
   delorean when i need it ?)
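
A hedged sketch of the merged layout (the example_* name is
illustrative; the real hunks follow): the function table moves into
ttm_tt itself, and a driver backend simply embeds the ttm_tt as its
first member so pointer casts between the two remain valid.

struct example_sgdma_be {
        struct ttm_tt ttm;      /* must be first: TTM hands out &be->ttm */
        struct drm_device *dev;
        u64 offset;
};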

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c|   14 ++-
 drivers/gpu/drm/nouveau/nouveau_drv.h   |5 +-
 drivers/gpu/drm/nouveau/nouveau_sgdma.c |  188 --
 drivers/gpu/drm/radeon/radeon_ttm.c |  222 ---
 drivers/gpu/drm/ttm/ttm_agp_backend.c   |   88 +
 drivers/gpu/drm/ttm/ttm_bo.c|9 +-
 drivers/gpu/drm/ttm/ttm_tt.c|   59 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |   66 +++--
 include/drm/ttm/ttm_bo_driver.h |  104 ++-
 9 files changed, 295 insertions(+), 460 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7226f41..b060fa4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, 
u32 val)
*mem = val;
 }
 
-static struct ttm_backend *
-nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev)
+static struct ttm_tt *
+nouveau_ttm_tt_create(struct ttm_bo_device *bdev,
+ unsigned long size, uint32_t page_flags,
+ struct page *dummy_read_page)
 {
struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev);
struct drm_device *dev = dev_priv->dev;
@@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device 
*bdev)
switch (dev_priv->gart_info.type) {
 #if __OS_HAS_AGP
case NOUVEAU_GART_AGP:
-   return ttm_agp_backend_init(bdev, dev->agp->bridge);
+   return ttm_agp_tt_create(bdev, dev->agp->bridge,
+size, page_flags, dummy_read_page);
 #endif
case NOUVEAU_GART_PDMA:
case NOUVEAU_GART_HW:
-   return nouveau_sgdma_init_ttm(dev);
+   return nouveau_sgdma_create_ttm(bdev, size, page_flags,
+   dummy_read_page);
default:
NV_ERROR(dev, "Unknown GART type %d\n",
 dev_priv->gart_info.type);
@@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct 
nouveau_fence *fence)
 }
 
 struct ttm_bo_driver nouveau_bo_driver = {
-   .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry,
+   .ttm_tt_create = &nouveau_ttm_tt_create,
.invalidate_caches = nouveau_bo_invalidate_caches,
.init_mem_type = nouveau_bo_init_mem_type,
.evict_flags = nouveau_bo_evict_flags,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 29837da..0c53e39 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *);
 extern void nouveau_sgdma_takedown(struct drm_device *);
 extern uint32_t nouveau_sgdma_get_physical(struct drm_device *,
   uint32_t offset);
-extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *);
+extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev,
+  unsigned long size,
+  uint32_t page_flags,
+  struct page *dummy_read_page);
 
 /* nouveau_debugfs.c */
 #if defined(CONFIG_DRM_NOUVEAU_DEBUG)
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index b75258a..bc2ab90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -8,44 +8,23 @@
 #define NV_CTXDMA_PAGE_MASK  (NV_CTXDMA_PAGE_SIZE - 1)
 
 struct nouveau_sgdma_be {
-   struct ttm_backend backend;
+   struct ttm_tt ttm;
struct drm_device *dev;
-
-   dma_addr_t *pages;
-   unsigned nr_pages;
-   bool unmap_pages;
-
u64 offset;
-   bool bound;
 };
 
 static int
-nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages,
-  struct page **pages, struct page *dummy_read_page,
-  dma_addr_t *dma_addrs)
+nouveau_sgdma_dma_map(struct ttm_tt *ttm)
 {
-   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be;
+   struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
struct drm_device *dev = nvbe->dev;
int i;
 
-   NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages);
-
-   nvbe->pages = dma_addrs;
-   nvbe->nr_pages = num_pages;
-

[PATCH 10/13] drm/ttm: provide dma aware ttm page pool code V7

2011-11-10 Thread j . glisse
From: Konrad Rzeszutek Wilk 

In the TTM world the pages for the graphics drivers are kept in three
different pools: write-combined, uncached, and cached (write-back). When the
pages are used by the graphics driver, the graphics adapter programs them in
via its built-in MMU (or AGP). The programming requires the virtual address
(from the graphics adapter's perspective) and the physical address (either
System RAM or the memory on the card), which is obtained using the pci_map_*
calls (which do the virtual-to-physical, or bus address, translation). During
the graphics application's "life" those pages can be shuffled around, swapped
out to disk, or moved from VRAM to System RAM and vice-versa. This all works
with the existing TTM pool code - except when we want to use the software
IOTLB (SWIOTLB) code to "map" the physical addresses into the graphics
adapter's MMU: we end up programming the bounce buffer's physical address
instead of the TTM pool memory's, and get a non-working driver.
There are two solutions:
1) using the DMA API to allocate pages that are screened by the DMA API, or
2) using the pci_sync_* calls to copy the pages from the bounce-buffer and back.

This patch fixes the issue by allocating pages using the DMA API. The second
option is viable, but it has performance drawbacks and potential correctness
issues: think of a write-combined page being bounced (SWIOTLB->TTM) - WC is
set on the TTM page, but the copy from the SWIOTLB does not make it to the
TTM page until the page has been recycled in the pool (and used by another
application).

The bounce buffer does not get activated often - only in cases where we have
a 32-bit-capable card and want to use a page that is allocated above the 4GB
limit. The bounce buffer offers the solution of copying the contents of that
above-4GB page to a location below 4GB and back again when the operation has
completed (or vice-versa). This is done using the 'pci_sync_*' calls.
Note: if you look carefully enough at the existing TTM page pool code you
will notice the GFP_DMA32 flag is used, which should guarantee that the
provided page is under 4GB. That is certainly the case, except it gets
ignored in two situations:
 - if the user specifies 'swiotlb=force', which bounces _every_ page;
 - if the user is running a Xen PV Linux guest (which uses the SWIOTLB, and
   the underlying PFNs aren't necessarily under 4GB).

To avoid this extra copying, the other option is to allocate the pages using
the DMA API, so that there is no need to map the page and perform the
expensive 'pci_sync_*' calls.

This DMA-API-capable TTM pool requires the 'struct device' in order to
properly call the DMA API. It also has to track the virtual and bus address
of each page being handed out, in case the page ends up being swapped out or
de-allocated, to make sure it is de-allocated using the proper 'struct
device'.

Implementation-wise the code keeps two lists: one attached to the 'struct
device' (via the dev->dma_pools list) and a global one to be used when the
'struct device' is unavailable (think shrinker code). The global list can
iterate over all of the 'struct device's and their associated dma_pools; the
list in dev->dma_pools can only iterate that device's dma_pools.

[ASCII diagram, garbled in this archive: two pools associated with the
device (WC and UC), and the parallel list containing the 'struct dev' and
'struct dma_pool' entries]

The maximum number of DMA pools a device can have is six: write-combined,
uncached, and cached, plus the DMA32 variants of each: write-combined dma32,
uncached dma32, and cached dma32.

Currently this code only gets activated when any variant of the SWIOTLB IOMMU
code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV
with PCI devices).
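
A hedged sketch of the pool's core allocation idea (the names are
illustrative, not the patch's): allocate through the DMA API so the
device-visible address needs no bouncing, and remember both the CPU
and bus addresses so the page can later be freed against the proper
'struct device'.

#include <linux/slab.h>
#include <linux/dma-mapping.h>

struct example_dma_page {
        struct list_head page_list;     /* linkage inside its pool */
        void *vaddr;                    /* CPU virtual address */
        dma_addr_t dma;                 /* bus address the GPU is programmed with */
};

static struct example_dma_page *example_dma_page_alloc(struct device *dev)
{
        struct example_dma_page *d_page;

        d_page = kmalloc(sizeof(*d_page), GFP_KERNEL);
        if (!d_page)
                return NULL;

        d_page->vaddr = dma_alloc_coherent(dev, PAGE_SIZE, &d_page->dma,
                                           GFP_KERNEL);
        if (!d_page->vaddr) {
                kfree(d_page);
                return NULL;
        }
        return d_page;
}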

Tested-by: Michel Dänzer 
[v1: Using swiotlb_nr_tbl instead of swiotlb_enabled]
[v2: Major overhaul - added 'inuse_list' to separate used from in-use and
reordered the lists to get better performance.]
[v3: Added comments/and some logic based on review, Added Jerome tag]
[v4: rebase on top of ttm_tt & ttm_backend merge]
[v5: rebase on top of ttm memory accounting overhaul]
[v6: New rebase on top of more memory accounting changes]
[v7: well rebase on top of no memory accounting changes]
Reviewed-by: Jerome Glisse 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/gpu/drm/ttm/Makefile |4 +
 drivers/gpu/drm/ttm/ttm_memory.c |2 +
 
