[PATCH] drm/radeon: avoid kernel segfault in vce when gpu fails to resume
From: Jérôme Glisse

When the GPU fails to resume we cannot trust that values we write to GPU memory will post, and we might get garbage (more like 0x on x86) when reading them back. This triggers an out-of-range memory access in the kernel inside the vce resume code path.

This patch uses the canonical value to compute the offset instead of reading the value back from GPU memory.

Signed-off-by: Jérôme Glisse
---
 drivers/gpu/drm/radeon/vce_v1_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/vce_v1_0.c b/drivers/gpu/drm/radeon/vce_v1_0.c
index a01efe3..f541a4b 100644
--- a/drivers/gpu/drm/radeon/vce_v1_0.c
+++ b/drivers/gpu/drm/radeon/vce_v1_0.c
@@ -196,7 +196,7 @@ int vce_v1_0_load_fw(struct radeon_device *rdev, uint32_t *data) memset(&data[5], 0, 44); memcpy(&data[16], &sign[1], rdev->vce_fw->size - sizeof(*sign)); - data += le32_to_cpu(data[4]) / 4; + data += (le32_to_cpu(sign->len) + 64) / 4; data[0] = sign->val[i].sigval[0]; data[1] = sign->val[i].sigval[1]; data[2] = sign->val[i].sigval[2];
--
2.9.3
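As an aside, here is a minimal standalone C sketch (hypothetical sizes and values, not the driver code) of why an offset derived from a GPU-memory readback goes out of range when a hung bus returns all-ones, while the canonical CPU-side value stays in bounds:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint32_t fw_dwords = 1024;	/* hypothetical firmware buffer size */
	uint32_t sign_len = 64;			/* hypothetical sign->len, CPU-owned */
	uint32_t readback = 0xffffffff;		/* what a hung bus typically returns */

	/* old approach: trust a value read back from GPU memory */
	uint64_t bad_off = (uint64_t)readback / 4;
	/* new approach: compute from the canonical CPU-side value */
	uint64_t good_off = ((uint64_t)sign_len + 64) / 4;

	printf("readback offset : %llu dwords (buffer is %u)\n",
	       (unsigned long long)bad_off, fw_dwords);
	printf("canonical offset: %llu dwords\n", (unsigned long long)good_off);
	return 0;
}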
[PATCH] radeon: fix pll/crtc mapping on dce2 and dce3 hardware
From: Jerome Glisse

This fixes a black screen on resume issue that some people are experiencing. There is a bug in the atombios code regarding the pll/crtc mapping: the atombios code reverses the logic for the pll and crtc mapping.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/atombios_crtc.c | 54 +-
 1 file changed, 20 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index 3bce029..7c1f080 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -1696,42 +1696,28 @@ static int radeon_atom_pick_pll(struct drm_crtc *crtc) return ATOM_PPLL2; DRM_ERROR("unable to allocate a PPLL\n"); return ATOM_PPLL_INVALID; - } else if (ASIC_IS_AVIVO(rdev)) { - /* in DP mode, the DP ref clock can come from either PPLL -* depending on the asic: -* DCE3: PPLL1 or PPLL2 -*/ - if (ENCODER_MODE_IS_DP(atombios_get_encoder_mode(radeon_crtc->encoder))) { - /* use the same PPLL for all DP monitors */ - pll = radeon_get_shared_dp_ppll(crtc); - if (pll != ATOM_PPLL_INVALID) - return pll; - } else { - /* use the same PPLL for all monitors with the same clock */ - pll = radeon_get_shared_nondp_ppll(crtc); - if (pll != ATOM_PPLL_INVALID) - return pll; - } - /* all other cases */ - pll_in_use = radeon_get_pll_use_mask(crtc); - /* the order shouldn't matter here, but we probably -* need this until we have atomic modeset -*/ - if (rdev->flags & RADEON_IS_IGP) { - if (!(pll_in_use & (1 << ATOM_PPLL1))) - return ATOM_PPLL1; - if (!(pll_in_use & (1 << ATOM_PPLL2))) - return ATOM_PPLL2; - } else { - if (!(pll_in_use & (1 << ATOM_PPLL2))) - return ATOM_PPLL2; - if (!(pll_in_use & (1 << ATOM_PPLL1))) - return ATOM_PPLL1; - } - DRM_ERROR("unable to allocate a PPLL\n"); - return ATOM_PPLL_INVALID; } else { /* on pre-R5xx asics, the crtc to pll mapping is hardcoded */ + /* some atombios (observed in some DCE2/DCE3) code have a bug, +* the matching btw pll and crtc is done through +* PCLK_CRTC[1|2]_CNTL (0x480/0x484) but atombios code use the +* pll (1 or 2) to select which register to write. ie if using +* pll1 it will use PCLK_CRTC1_CNTL (0x480) and if using pll2 +* it will use PCLK_CRTC2_CNTL (0x484), it then use crtc id to +* choose which value to write. Which is reverse order from +* register logic. So only case that works is when pllid is +* same as crtcid or when both pll and crtc are enabled and +* both use same clock. +* +* So just return crtc id as if crtc and pll were hard linked +* together even if they aren't +*/ + if (radeon_crtc->crtc_id > 1) { + /* crtc other than crtc1 and crtc2 can only be use for +* DP those doesn't need a valid pll to work. +*/ + return ATOM_PPLL_INVALID; + } return radeon_crtc->crtc_id; } }
--
1.7.11.7
[PATCH 1/2] radeon: fix pll/crtc mapping on dce2 and dce3 hardware v2
From: Jerome Glisse

This fixes a black screen on resume issue that some people are experiencing. There is a bug in the atombios code regarding the pll/crtc mapping: the atombios code reverses the logic for the pll and crtc mapping.

v2: DCE3 and DCE2 only have 2 crtcs

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/atombios_crtc.c | 48 ++
 1 file changed, 14 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index 3bce029..24d932f 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -1696,42 +1696,22 @@ static int radeon_atom_pick_pll(struct drm_crtc *crtc) return ATOM_PPLL2; DRM_ERROR("unable to allocate a PPLL\n"); return ATOM_PPLL_INVALID; - } else if (ASIC_IS_AVIVO(rdev)) { - /* in DP mode, the DP ref clock can come from either PPLL -* depending on the asic: -* DCE3: PPLL1 or PPLL2 -*/ - if (ENCODER_MODE_IS_DP(atombios_get_encoder_mode(radeon_crtc->encoder))) { - /* use the same PPLL for all DP monitors */ - pll = radeon_get_shared_dp_ppll(crtc); - if (pll != ATOM_PPLL_INVALID) - return pll; - } else { - /* use the same PPLL for all monitors with the same clock */ - pll = radeon_get_shared_nondp_ppll(crtc); - if (pll != ATOM_PPLL_INVALID) - return pll; - } - /* all other cases */ - pll_in_use = radeon_get_pll_use_mask(crtc); - /* the order shouldn't matter here, but we probably -* need this until we have atomic modeset -*/ - if (rdev->flags & RADEON_IS_IGP) { - if (!(pll_in_use & (1 << ATOM_PPLL1))) - return ATOM_PPLL1; - if (!(pll_in_use & (1 << ATOM_PPLL2))) - return ATOM_PPLL2; - } else { - if (!(pll_in_use & (1 << ATOM_PPLL2))) - return ATOM_PPLL2; - if (!(pll_in_use & (1 << ATOM_PPLL1))) - return ATOM_PPLL1; - } - DRM_ERROR("unable to allocate a PPLL\n"); - return ATOM_PPLL_INVALID; } else { /* on pre-R5xx asics, the crtc to pll mapping is hardcoded */ + /* some atombios (observed in some DCE2/DCE3) code have a bug, +* the matching btw pll and crtc is done through +* PCLK_CRTC[1|2]_CNTL (0x480/0x484) but atombios code use the +* pll (1 or 2) to select which register to write. ie if using +* pll1 it will use PCLK_CRTC1_CNTL (0x480) and if using pll2 +* it will use PCLK_CRTC2_CNTL (0x484), it then use crtc id to +* choose which value to write. Which is reverse order from +* register logic. So only case that works is when pllid is +* same as crtcid or when both pll and crtc are enabled and +* both use same clock. +* +* So just return crtc id as if crtc and pll were hard linked +* together even if they aren't +*/ return radeon_crtc->crtc_id; } }
--
1.7.11.7
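To make the quirk concrete, here is a small standalone C model (register addresses from the comment above, everything else hypothetical) of the reversed selection: the buggy table picks the register by pll id while the hardware pairs each register with a crtc, so only pll_id == crtc_id programs the intended register:

#include <stdio.h>

#define PCLK_CRTC1_CNTL 0x480
#define PCLK_CRTC2_CNTL 0x484

/* what the buggy atombios table does: register chosen by pll id */
static int reg_for(int pll_id)
{
	return pll_id == 0 ? PCLK_CRTC1_CNTL : PCLK_CRTC2_CNTL;
}

int main(void)
{
	for (int crtc = 0; crtc < 2; crtc++)
		for (int pll = 0; pll < 2; pll++)
			printf("crtc%d + pll%d -> writes 0x%03x (hw expects 0x%03x)%s\n",
			       crtc + 1, pll + 1, reg_for(pll),
			       crtc == 0 ? PCLK_CRTC1_CNTL : PCLK_CRTC2_CNTL,
			       pll == crtc ? "" : "  <-- mismatch");
	return 0;
}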
[PATCH 2/2] drm/radeon: fix deadlock when bo is associated with different handles
From: Jerome Glisse

There is a rare case, which seems to only happen across a suspend/resume cycle, where a bo is associated with several different handles. This leads to a deadlock in the ttm buffer reservation path. It can only happen with flinked (globally exported) objects. Userspace should not reopen a globally exported object multiple times, but the kernel should handle this corner case gracefully and not keep rejecting the userspace command stream. That is what this patch does.

Fixes a suspend/resume issue where users see the following message:
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_cs.c | 53 ++
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 41672cc..064e64d 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -54,39 +54,48 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) return -ENOMEM; } for (i = 0; i < p->nrelocs; i++) { - struct drm_radeon_cs_reloc *r; - + struct drm_radeon_cs_reloc *reloc; + + /* One bo could be associated with several different handle. +* Only happen for flinked bo that are open several time. +* +* FIXME: +* Maybe we should consider an alternative to idr for gem +* object to insure a 1:1 uniq mapping btw handle and gem +* object. +*/ duplicate = false; - r = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4]; + reloc = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4]; + p->relocs[i].handle = 0; + p->relocs[i].flags = reloc->flags; + p->relocs[i].gobj = drm_gem_object_lookup(ddev, + p->filp, + reloc->handle); + if (p->relocs[i].gobj == NULL) { + DRM_ERROR("gem object lookup failed 0x%x\n", + reloc->handle); + return -ENOENT; + } + p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj); + p->relocs[i].lobj.bo = p->relocs[i].robj; + p->relocs[i].lobj.wdomain = reloc->write_domain; + p->relocs[i].lobj.rdomain = reloc->read_domains; + p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo; + for (j = 0; j < i; j++) { - if (r->handle == p->relocs[j].handle) { + if (p->relocs[i].lobj.bo == p->relocs[j].lobj.bo) { p->relocs_ptr[i] = &p->relocs[j]; duplicate = true; break; } } + if (!duplicate) { - p->relocs[i].gobj = drm_gem_object_lookup(ddev, - p->filp, - r->handle); - if (p->relocs[i].gobj == NULL) { - DRM_ERROR("gem object lookup failed 0x%x\n", - r->handle); - return -ENOENT; - } p->relocs_ptr[i] = &p->relocs[i]; - p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj); - p->relocs[i].lobj.bo = p->relocs[i].robj; - p->relocs[i].lobj.wdomain = r->write_domain; - p->relocs[i].lobj.rdomain = r->read_domains; - p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo; - p->relocs[i].handle = r->handle; - p->relocs[i].flags = r->flags; + p->relocs[i].handle = reloc->handle; radeon_bo_list_add_object(&p->relocs[i].lobj, &p->validated); - - } else - p->relocs[i].handle = 0; + } } return radeon_bo_list_validate(&p->validated); }
--
1.7.11.7
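A minimal standalone C sketch of the dedup strategy (simplified stand-in types, not the kernel structures): compare the underlying object pointer rather than the handle, so two handles that resolve to the same flinked bo are treated as one reservation:

#include <stdio.h>

struct bo { int id; };

struct reloc {
	unsigned handle;
	struct bo *bo;
};

/* hypothetical: two handles aliasing one flinked bo */
static struct bo shared = { .id = 1 };

static struct bo *lookup(unsigned handle)
{
	(void)handle;
	return &shared;	/* both handles resolve to the same object */
}

int main(void)
{
	struct reloc r[2] = { { .handle = 3 }, { .handle = 7 } };

	for (int i = 0; i < 2; i++) {
		r[i].bo = lookup(r[i].handle);
		int dup = 0;
		for (int j = 0; j < i; j++)
			if (r[i].bo == r[j].bo)	/* pointer compare, not handle */
				dup = 1;
		printf("reloc %d (handle %u): %s\n", i, r[i].handle,
		       dup ? "duplicate, reuse reservation" : "validate");
	}
	return 0;
}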
[PATCH] drm/radeon: track global bo name and always return the same
From: Jerome Glisse

To avoid the kernel rejecting the cs when we return a different global name for the same bo, keep track of the global name and always return the same one. Seems to fix a suspend/resume issue where the following message is printed repeatedly:
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!

There might still be a way for a rogue program to trigger this issue.

Signed-off-by: Jerome Glisse
---
 radeon/radeon_bo_gem.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/radeon/radeon_bo_gem.c b/radeon/radeon_bo_gem.c
index 265f177..fca0aaf 100644
--- a/radeon/radeon_bo_gem.c
+++ b/radeon/radeon_bo_gem.c
@@ -47,11 +47,11 @@ #include "radeon_bo_gem.h" #include struct radeon_bo_gem { -struct radeon_bo_int base; -uint32_tname; -int map_count; -atomic_treloc_in_cs; -void *priv_ptr; +struct radeon_bo_intbase; +uint32_tname; +int map_count; +atomic_treloc_in_cs; +void*priv_ptr; }; struct bo_manager_gem {
@@ -320,15 +320,21 @@ void *radeon_gem_get_reloc_in_cs(struct radeon_bo *bo) int radeon_gem_get_kernel_name(struct radeon_bo *bo, uint32_t *name) { +struct radeon_bo_gem *bo_gem = (struct radeon_bo_gem*)bo; struct radeon_bo_int *boi = (struct radeon_bo_int *)bo; struct drm_gem_flink flink; int r; +if (bo_gem->name) { +*name = bo_gem->name; +return 0; +} flink.handle = bo->handle; r = drmIoctl(boi->bom->fd, DRM_IOCTL_GEM_FLINK, &flink); if (r) { return r; } +bo_gem->name = flink.name; *name = flink.name; return 0; }
--
1.7.11.7
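A standalone C sketch of the caching pattern (flink() here is a stand-in for the DRM_IOCTL_GEM_FLINK call, all names invented): remember the first global name handed out so every later query returns the same value:

#include <stdint.h>
#include <stdio.h>

struct bo_gem {
	uint32_t name;	/* 0 = not flinked yet */
};

static uint32_t next_name = 100;

/* pretend ioctl: would hand out a new name on every call */
static int flink(uint32_t *out)
{
	*out = next_name++;
	return 0;
}

static int get_name(struct bo_gem *bo, uint32_t *name)
{
	if (bo->name) {		/* cached: always return the same name */
		*name = bo->name;
		return 0;
	}
	int r = flink(name);
	if (r)
		return r;
	bo->name = *name;
	return 0;
}

int main(void)
{
	struct bo_gem bo = {0};
	uint32_t a, b;
	get_name(&bo, &a);
	get_name(&bo, &b);
	printf("first=%u second=%u (stable)\n", a, b);
	return 0;
}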
[PATCH] drm/ttm: do not try to preserve caching state
From: Jerome Glisse

It makes no sense to preserve the caching state, especially when moving from vram to system memory. It burdens the page allocator with matching the vram caching (often WC), which just burns CPU cycles for no good reason.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/ttm/ttm_bo.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..39dcc58 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -896,19 +896,12 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, } static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man, - uint32_t cur_placement, uint32_t proposed_placement) { uint32_t caching = proposed_placement & TTM_PL_MASK_CACHING; uint32_t result = proposed_placement & ~TTM_PL_MASK_CACHING; - /** -* Keep current caching if possible. -*/ - - if ((cur_placement & caching) != 0) - result |= (cur_placement & caching); - else if ((man->default_caching & caching) != 0) + if ((man->default_caching & caching) != 0) result |= man->default_caching; else if ((TTM_PL_FLAG_CACHED & caching) != 0) result |= TTM_PL_FLAG_CACHED;
@@ -978,8 +971,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (!type_ok) continue; - cur_flags = ttm_bo_select_caching(man, bo->mem.placement, - cur_flags); + cur_flags = ttm_bo_select_caching(man, cur_flags); /* * Use the access and other non-mapping-related flag bits from * the memory placement flags to the current flags */
@@ -1023,8 +1015,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, &cur_flags)) continue; - cur_flags = ttm_bo_select_caching(man, bo->mem.placement, - cur_flags); + cur_flags = ttm_bo_select_caching(man, cur_flags); /* * Use the access and other non-mapping-related flag bits from * the memory placement flags to the current flags
--
1.7.11.7
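A standalone C sketch of the simplified selection (illustrative flag values, not the real TTM constants): the manager's default caching wins whenever the proposed placement allows it, with cached as the fallback:

#include <stdint.h>
#include <stdio.h>

#define PL_FLAG_CACHED   (1u << 0)
#define PL_FLAG_UNCACHED (1u << 1)
#define PL_FLAG_WC       (1u << 2)
#define PL_MASK_CACHING  (PL_FLAG_CACHED | PL_FLAG_UNCACHED | PL_FLAG_WC)

static uint32_t select_caching(uint32_t default_caching, uint32_t proposed)
{
	uint32_t caching = proposed & PL_MASK_CACHING;
	uint32_t result = proposed & ~PL_MASK_CACHING;

	if (default_caching & caching)		/* no more "keep current caching" */
		result |= default_caching;
	else if (PL_FLAG_CACHED & caching)
		result |= PL_FLAG_CACHED;
	else
		result |= PL_FLAG_UNCACHED;
	return result;
}

int main(void)
{
	/* system pool defaults to cached; a WC|CACHED proposal now lands cached
	 * even if the bo used to be WC while in vram */
	printf("0x%x\n", select_caching(PL_FLAG_CACHED,
					PL_FLAG_WC | PL_FLAG_CACHED));
	return 0;
}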
[RFC] drm/ttm: add minimum residency constraint for bo eviction
So I spent the day looking at ttm and eviction. The first patch I sent earlier is, I believe, something that should be merged. This patch however is more about discussing whether other people are interested in a similar mechanism being shared among drivers through ttm. I could otherwise just move its logic into the radeon driver.

The idea of this patch is that we don't want to constantly move objects in and out of certain memory pools, mostly VRAM. So it adds a minimum residency time, and no object that has been in the given pool for less than this residency time can be moved out. It closely solves a regression we are having with radeon since the gallium driver change and probably improves some other workloads.

Statistics I gathered on xonotic/realquake showed that we can have as much as 1GB moving in each direction (VRAM to system and system to vram) over a second. So we are obviously not saturating the PCIE bandwidth. Profiling shows that 80-90% of the cost of this eviction is in memory allocation/deallocation for the system memory (lots of irqlock contention, the kernel mostly spending its time allocating pages; think 256 000 or more pages per second allocated/deallocated).

I used this WIP patch to gather statistics and play with various combinations: http://people.freedesktop.org/~glisse/0001-TTM-EVICT-WIP.patch

Some numbers with xonotic:
17.369fps stock 3.7 kernel
27.883fps 3.7 kernel + do-not-preserve-caching patch (~ +60%)
49.292fps 3.7 kernel + WIP with 500ms residency for all pools and no bo wait for eviction
49.258fps 3.7 kernel + WIP with 500ms residency for all pools and bo wait
48.213fps 3.7 kernel always allowing GTT placement (basically reverting the gallium patch effect)

Another design I am thinking of is changing the way radeon handles its memory and to stop trying to revalidate objects to a different memory pool at each cs. Instead I think we should keep a vram lru list, probably per process, and move bos out of vram according to this lru following some heuristic. Radeon would then only move bos into vram when there is room.

Another improvement I am thinking of is to reuse the GTT memory of objects that are moved in for the objects that are evicted, as the statistics I gathered showed that it's often a close amount that moves in and out. But this would require true dma, as it would mean scheduling in/out moves at page granularity or in groups of pages (write 4 pages from vram to scratch, 4 pages into sys, write 4 pages of a system memory bo to vram, write 4 pages of vram to the just-moved 4 pages of system memory ...).

Cheers, Jerome
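A standalone C sketch of the residency test (millisecond timestamps standing in for jiffies, all values invented): a buffer only becomes eligible for eviction once it has sat in the pool at least the pool's minimum residency time:

#include <stdbool.h>
#include <stdio.h>

struct pool { unsigned long min_residency_ms; };
struct bo   { unsigned long placed_ms; };	/* when it entered the pool */

static bool can_evict(const struct pool *p, const struct bo *bo,
		      unsigned long now_ms)
{
	return now_ms - bo->placed_ms >= p->min_residency_ms;
}

int main(void)
{
	struct pool vram = { .min_residency_ms = 500 };
	struct bo fresh = { .placed_ms = 900 }, old = { .placed_ms = 100 };
	unsigned long now = 1000;

	printf("fresh bo evictable: %d\n", can_evict(&vram, &fresh, now)); /* 0 */
	printf("old bo evictable:   %d\n", can_evict(&vram, &old, now));   /* 1 */
	return 0;
}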
[PATCH] drm/ttm: add minimum residency constraint for bo eviction
From: Jerome Glisse

This patch adds a minimum residency time, configurable for each memory pool (VRAM, GTT, ...). The intention is to avoid a lot of memory eviction from VRAM, up to the point where the GPU spends pretty much all its time moving things in and out.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
 drivers/gpu/drm/ttm/ttm_bo.c | 7 +++
 include/drm/ttm/ttm_bo_api.h | 1 +
 include/drm/ttm/ttm_bo_driver.h | 1 +
 4 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5ebe1b3..88722c4 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -129,11 +129,13 @@ static int radeon_init_mem_type(struct ttm_bo_device *bdev, uint32_t type, switch (type) { case TTM_PL_SYSTEM: /* System memory */ + man->minimum_residency_time_ms = 0; man->flags = TTM_MEMTYPE_FLAG_MAPPABLE; man->available_caching = TTM_PL_MASK_CACHING; man->default_caching = TTM_PL_FLAG_CACHED; break; case TTM_PL_TT: + man->minimum_residency_time_ms = 0; man->func = &ttm_bo_manager_func; man->gpu_offset = rdev->mc.gtt_start; man->available_caching = TTM_PL_MASK_CACHING;
@@ -156,6 +158,7 @@ static int radeon_init_mem_type(struct ttm_bo_device *bdev, uint32_t type, break; case TTM_PL_VRAM: /* "On-card" video ram */ + man->minimum_residency_time_ms = 500; man->func = &ttm_bo_manager_func; man->gpu_offset = rdev->mc.vram_start; man->flags = TTM_MEMTYPE_FLAG_FIXED |
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 39dcc58..40476121 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -452,6 +452,7 @@ moved: bo->cur_placement = bo->mem.placement; } else bo->offset = 0; + bo->jiffies = jiffies; return 0;
@@ -810,6 +811,12 @@ retry: } bo = list_first_entry(&man->lru, struct ttm_buffer_object, lru); + + if (time_after(jiffies, bo->jiffies) && jiffies_to_msecs(jiffies - bo->jiffies) >= man->minimum_residency_time_ms) { + spin_unlock(&glob->lru_lock); + return -EBUSY; + } + kref_get(&bo->list_kref); if (!list_empty(&bo->ddestroy)) {
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index e8028ad..9e12313 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -275,6 +275,7 @@ struct ttm_buffer_object { unsigned long offset; uint32_t cur_placement; + unsigned long jiffies; struct sg_table *sg; };
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index d803b92..7f60a18e6 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -280,6 +280,7 @@ struct ttm_mem_type_manager { struct mutex io_reserve_mutex; bool use_io_reserve_lru; bool io_reserve_fastpath; + unsigned long minimum_residency_time_ms; /* * Protected by @io_reserve_mutex:
--
1.7.11.7
[PATCH] drm/radeon: use cached memory when evicting from vram on non agp
From: Jerome Glisse

Force the use of cached memory when evicting from vram on non-agp hardware, and force write-combine on agp hardware. This is to ensure a minimum of cache-type changes when allocating memory, improving memory eviction especially on pci/pcie hardware.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_object.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index b91118c..3f9f3bb 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -88,10 +88,20 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) if (domain & RADEON_GEM_DOMAIN_VRAM) rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM; - if (domain & RADEON_GEM_DOMAIN_GTT) - rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_TT; - if (domain & RADEON_GEM_DOMAIN_CPU) - rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM; + if (domain & RADEON_GEM_DOMAIN_GTT) { + if (rbo->rdev->flags & RADEON_IS_AGP) { + rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_TT; + } else { + rbo->placements[c++] = TTM_PL_FLAG_CACHED | TTM_PL_FLAG_TT; + } + } + if (domain & RADEON_GEM_DOMAIN_CPU) { + if (rbo->rdev->flags & RADEON_IS_AGP) { + rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_TT; + } else { + rbo->placements[c++] = TTM_PL_FLAG_CACHED | TTM_PL_FLAG_TT; + } + } if (!c) rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM; rbo->placement.num_placement = c;
--
1.7.11.7
[PATCH] drm/radeon: fix rare segfault after gpu lockup on r7xx
From: Jerome Glisse

If the GPU reset fails, the gart table pointer might be NULL. Avoid a kernel segfault in this rare event.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/r600.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index cda280d..0e3a68a 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -843,7 +843,9 @@ void r600_pcie_gart_tlb_flush(struct radeon_device *rdev) * method for them. */ WREG32(HDP_DEBUG1, 0); - tmp = readl((void __iomem *)ptr); + if (ptr) { + tmp = readl((void __iomem *)ptr); + } } else WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);
--
1.7.11.7
[RFC] improve memory placement for radeon
As a followup here are 2 patches. The first one just stops trying to move objects at each cs ioctl; I believe it could be included in 3.7 as it improves performance (especially with the vram change from userspace). The second one implements a vram eviction policy. It's a simple one: buffers used for write operations are more important than buffers used for read operations. Buffers get evicted from vram only if they haven't been used in the last 50ms (so in the last few frames), and only if there are buffers that have been recently used and that could be moved into vram. This is mostly where I believe discussion should happen: what kind of heuristic would work better than that?

Without the first patch and with mesa master, xonotic high is at 17fps; with the first patch it goes to 40fps; with the second patch it goes to 48fps.

Cheers, Jerome
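A standalone C model of the heuristic sketched above (simplified so that read-only buffers idle for at least 50ms are the first eviction candidates; all names and values invented):

#include <stdbool.h>
#include <stdio.h>

struct bo {
	const char *name;
	bool written;			/* used as a write target */
	unsigned long last_use_ms;
};

static bool evict_candidate(const struct bo *bo, unsigned long now_ms)
{
	if (now_ms - bo->last_use_ms < 50)	/* still hot: keep in vram */
		return false;
	return !bo->written;			/* readers go before writers */
}

int main(void)
{
	struct bo bos[] = {
		{ "scanout(write)", true,  0 },
		{ "texture(read)",  false, 0 },
		{ "recent(read)",   false, 990 },
	};
	for (unsigned i = 0; i < sizeof(bos) / sizeof(bos[0]); i++)
		printf("%-16s evict=%d\n", bos[i].name,
		       evict_candidate(&bos[i], 1000));
	return 0;
}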
[PATCH 1/2] drm/radeon: do not move bo to different placement at each cs
From: Jerome Glisse

The bo creation placement is where the bo will stay. Instead of trying to move the bo at each command stream, leave this work to another worker thread that will use more advanced heuristics.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h | 1 +
 drivers/gpu/drm/radeon/radeon_object.c | 17 -
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8c42d54..0a2664c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -313,6 +313,7 @@ struct radeon_bo { struct list_headlist; /* Protected by tbo.reserved */ u32 placements[3]; + u32 busy_placements[3]; struct ttm_placementplacement; struct ttm_buffer_objecttbo; struct ttm_bo_kmap_obj kmap;
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 3f9f3bb..e25ae20 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -84,7 +84,6 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) rbo->placement.fpfn = 0; rbo->placement.lpfn = 0; rbo->placement.placement = rbo->placements; - rbo->placement.busy_placement = rbo->placements; if (domain & RADEON_GEM_DOMAIN_VRAM) rbo->placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM;
@@ -105,6 +104,14 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) if (!c) rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM; rbo->placement.num_placement = c; + + c = 0; + rbo->placement.busy_placement = rbo->busy_placements; + if (rbo->rdev->flags & RADEON_IS_AGP) { + rbo->busy_placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_TT; + } else { + rbo->busy_placements[c++] = TTM_PL_FLAG_CACHED | TTM_PL_FLAG_TT; + } rbo->placement.num_busy_placement = c; }
@@ -360,17 +367,9 @@ int radeon_bo_list_validate(struct list_head *head) list_for_each_entry(lobj, head, tv.head) { bo = lobj->bo; if (!bo->pin_count) { - domain = lobj->wdomain ? lobj->wdomain : lobj->rdomain; - - retry: - radeon_ttm_placement_from_domain(bo, domain); r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false, false); if (unlikely(r)) { - if (r != -ERESTARTSYS && domain == RADEON_GEM_DOMAIN_VRAM) { - domain |= RADEON_GEM_DOMAIN_GTT; - goto retry; - } return r; } }
--
1.7.11.7
[PATCH 2/2] drm/radeon: buffer memory placement work thread WIP
From: Jerome Glisse

Use a delayed work thread to move buffers out of vram if they haven't been used over some period of time. This makes room for buffers that are actively used.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h | 13 ++
 drivers/gpu/drm/radeon/radeon_cs.c | 2 +-
 drivers/gpu/drm/radeon/radeon_device.c | 8 ++
 drivers/gpu/drm/radeon/radeon_object.c | 241 -
 drivers/gpu/drm/radeon/radeon_object.h | 3 +-
 5 files changed, 262 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 0a2664c..a2e92da 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -102,6 +102,8 @@ extern int radeon_lockup_timeout; */ #define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ #define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) +#define RADEON_PLACEMENT_WORK_MS 500 +#define RADEON_PLACEMENT_MAX_EVICTION 8 /* RADEON_IB_POOL_SIZE must be a power of 2 */ #define RADEON_IB_POOL_SIZE16 #define RADEON_DEBUGFS_MAX_COMPONENTS 32
@@ -311,6 +313,10 @@ struct radeon_bo_va { struct radeon_bo { /* Protected by gem.mutex */ struct list_headlist; + /* Protected by rdev->placement_mutex */ + struct list_headplist; + struct list_head*head; + unsigned long last_use_jiffies; /* Protected by tbo.reserved */ u32 placements[3]; u32 busy_placements[3];
@@ -1523,6 +1529,13 @@ struct radeon_device { struct drm_device *ddev; struct pci_dev *pdev; struct rw_semaphore exclusive_lock; + struct mutexplacement_mutex; + struct list_headwvram_in_list; + struct list_headrvram_in_list; + struct list_headwvram_out_list; + struct list_headrvram_out_list; + struct delayed_work placement_work; + unsigned long vram_in_size; /* ASIC */ union radeon_asic_configconfig; enum radeon_family family;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 41672cc..e9e90bc 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -88,7 +88,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) } else p->relocs[i].handle = 0; } - return radeon_bo_list_validate(&p->validated); + return radeon_bo_list_validate(p->rdev, &p->validated); } static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e2f5f88..0c4c874 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1001,6 +1001,14 @@ int radeon_device_init(struct radeon_device *rdev, init_rwsem(&rdev->pm.mclk_lock); init_rwsem(&rdev->exclusive_lock); init_waitqueue_head(&rdev->irq.vblank_queue); + + mutex_init(&rdev->placement_mutex); + INIT_LIST_HEAD(&rdev->wvram_in_list); + INIT_LIST_HEAD(&rdev->rvram_in_list); + INIT_LIST_HEAD(&rdev->wvram_out_list); + INIT_LIST_HEAD(&rdev->rvram_out_list); + INIT_DELAYED_WORK(&rdev->placement_work, radeon_placement_work_handler); + r = radeon_gem_init(rdev); if (r) return r;
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index e25ae20..f2bcc5f 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -64,6 +64,10 @@ static void radeon_ttm_bo_destroy(struct ttm_buffer_object *tbo) mutex_lock(&bo->rdev->gem.mutex); list_del_init(&bo->list); mutex_unlock(&bo->rdev->gem.mutex); + mutex_lock(&bo->rdev->placement_mutex); + list_del_init(&bo->plist); + bo->head = NULL; + mutex_unlock(&bo->rdev->placement_mutex); radeon_bo_clear_surface_reg(bo); radeon_bo_clear_va(bo); drm_gem_object_release(&bo->gem_base);
@@ -153,6 +157,8 @@ int radeon_bo_create(struct radeon_device *rdev, bo->surface_reg = -1; INIT_LIST_HEAD(&bo->list); INIT_LIST_HEAD(&bo->va); + INIT_LIST_HEAD(&bo->plist); + bo->head = NULL; radeon_ttm_placement_from_domain(bo, domain); /* Kernel allocation are uninterruptible */ down_read(&rdev->pm.mclk_lock);
@@ -263,8 +269,14 @@ int radeon_bo_pin_restricted(struct radeon_bo *bo, u32 domain, u64 max_offset, if (gpu_addr != NULL) *
[PATCH] drm/radeon: fix amd fusion gpu setup aka sumo v2
From: Jerome Glisse

Set the proper number of tile pipes, which should be a multiple of the pipe count depending on the number of se engines.

Fix:
https://bugs.freedesktop.org/show_bug.cgi?id=56405
https://bugs.freedesktop.org/show_bug.cgi?id=56720

v2: Don't change sumo2

Signed-off-by: Jerome Glisse
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/radeon/evergreen.c | 8
 drivers/gpu/drm/radeon/evergreend.h | 2 ++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 14313ad..b957de1 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1819,7 +1819,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev) case CHIP_SUMO: rdev->config.evergreen.num_ses = 1; rdev->config.evergreen.max_pipes = 4; - rdev->config.evergreen.max_tile_pipes = 2; + rdev->config.evergreen.max_tile_pipes = 4; if (rdev->pdev->device == 0x9648) rdev->config.evergreen.max_simds = 3; else if ((rdev->pdev->device == 0x9647) ||
@@ -1842,7 +1842,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev) rdev->config.evergreen.sc_prim_fifo_size = 0x40; rdev->config.evergreen.sc_hiz_tile_fifo_size = 0x30; rdev->config.evergreen.sc_earlyz_tile_fifo_size = 0x130; - gb_addr_config = REDWOOD_GB_ADDR_CONFIG_GOLDEN; + gb_addr_config = SUMO_GB_ADDR_CONFIG_GOLDEN; break; case CHIP_SUMO2: rdev->config.evergreen.num_ses = 1;
@@ -1864,7 +1864,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev) rdev->config.evergreen.sc_prim_fifo_size = 0x40; rdev->config.evergreen.sc_hiz_tile_fifo_size = 0x30; rdev->config.evergreen.sc_earlyz_tile_fifo_size = 0x130; - gb_addr_config = REDWOOD_GB_ADDR_CONFIG_GOLDEN; + gb_addr_config = SUMO2_GB_ADDR_CONFIG_GOLDEN; break; case CHIP_BARTS: rdev->config.evergreen.num_ses = 2;
@@ -1912,7 +1912,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev) break; case CHIP_CAICOS: rdev->config.evergreen.num_ses = 1; - rdev->config.evergreen.max_pipes = 4; + rdev->config.evergreen.max_pipes = 2; rdev->config.evergreen.max_tile_pipes = 2; rdev->config.evergreen.max_simds = 2; rdev->config.evergreen.max_backends = 1 * rdev->config.evergreen.num_ses;
diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h
index df542f1..52c89c9 100644
--- a/drivers/gpu/drm/radeon/evergreend.h
+++ b/drivers/gpu/drm/radeon/evergreend.h
@@ -45,6 +45,8 @@ #define TURKS_GB_ADDR_CONFIG_GOLDEN 0x02010002 #define CEDAR_GB_ADDR_CONFIG_GOLDEN 0x02010001 #define CAICOS_GB_ADDR_CONFIG_GOLDEN 0x02010001 +#define SUMO_GB_ADDR_CONFIG_GOLDEN 0x02010002 +#define SUMO2_GB_ADDR_CONFIG_GOLDEN 0x02010002 /* Registers */
--
1.7.11.7
[PATCH] drm/radeon: fix fence driver for dma ring when wb is disabled
From: Jerome Glisse

The dma ring can't write to registers, thus it has to write its fence value to memory. This ensures that it doesn't try to use a scratch register for the dma ring fence driver.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/r600.c | 3 ++-
 drivers/gpu/drm/radeon/radeon_fence.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index a76eca1..2aaf147 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2533,11 +2533,12 @@ void r600_dma_fence_ring_emit(struct radeon_device *rdev, { struct radeon_ring *ring = &rdev->ring[fence->ring]; u64 addr = rdev->fence_drv[fence->ring].gpu_addr; + /* write the fence */ radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_FENCE, 0, 0, 0)); radeon_ring_write(ring, addr & 0xfffc); radeon_ring_write(ring, (upper_32_bits(addr) & 0xff)); - radeon_ring_write(ring, fence->seq); + radeon_ring_write(ring, lower_32_bits(fence->seq)); /* generate an interrupt */ radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_TRAP, 0, 0, 0)); }
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..410a975 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -772,7 +772,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring) int r; radeon_scratch_free(rdev, rdev->fence_drv[ring].scratch_reg); - if (rdev->wb.use_event) { + if (rdev->wb.use_event || !radeon_ring_supports_scratch_reg(rdev, &rdev->ring[ring])) { rdev->fence_drv[ring].scratch_reg = 0; index = R600_WB_EVENT_OFFSET + ring * 4; } else {
--
1.8.0
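A standalone C sketch of why truncating to the low 32 bits is safe (a simplified model of the driver's 64-bit sequence reconstruction, not the actual radeon_fence_process code):

#include <stdint.h>
#include <stdio.h>

static uint32_t fence_mem;	/* stands in for the writeback slot */

static void emit_fence(uint64_t seq)
{
	fence_mem = (uint32_t)seq;	/* lower_32_bits(seq) */
}

static uint64_t read_seq(uint64_t last_seq)
{
	/* extend the 32-bit value using the last known 64-bit sequence */
	uint64_t seq = (last_seq & ~0xffffffffULL) | fence_mem;
	if (seq < last_seq)
		seq += 0x100000000ULL;	/* the 32-bit counter wrapped */
	return seq;
}

int main(void)
{
	emit_fence(0x100000005ULL);
	printf("recovered seq: 0x%llx\n",
	       (unsigned long long)read_seq(0x100000001ULL));
	return 0;
}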
[PATCH] drm/radeon: fix htile buffer size computation for command stream checker
From: Jerome Glisse

Fix the size computation of the htile buffer.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 17 +-
 drivers/gpu/drm/radeon/r600_cs.c | 92 ---
 drivers/gpu/drm/radeon/radeon_drv.c | 3 +-
 3 files changed, 35 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 62c2271..fc7e613 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -507,20 +507,28 @@ static int evergreen_cs_track_validate_htile(struct radeon_cs_parser *p, /* height is npipes htiles aligned == npipes * 8 pixel aligned */ nby = round_up(nby, track->npipes * 8); } else { + /* always assume 8x8 htile */ + /* align is htile align * 8, htile align vary according to +* number of pipe and tile width and nby +*/ switch (track->npipes) { case 8: + /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/ nbx = round_up(nbx, 64 * 8); nby = round_up(nby, 64 * 8); break; case 4: + /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/ nbx = round_up(nbx, 64 * 8); nby = round_up(nby, 32 * 8); break; case 2: + /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/ nbx = round_up(nbx, 32 * 8); nby = round_up(nby, 32 * 8); break; case 1: + /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/ nbx = round_up(nbx, 32 * 8); nby = round_up(nby, 16 * 8); break;
@@ -531,9 +539,10 @@ static int evergreen_cs_track_validate_htile(struct radeon_cs_parser *p, } } /* compute number of htile */ - nbx = nbx / 8; - nby = nby / 8; - size = nbx * nby * 4; + nbx = nbx >> 3; + nby = nby >> 3; + /* size must be aligned on npipes * 2K boundary */ + size = roundup(nbx * nby * 4, track->npipes * (2 << 10)); size += track->htile_offset; if (size > radeon_bo_size(track->htile_bo)) {
@@ -1790,6 +1799,8 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) case DB_HTILE_SURFACE: /* 8x8 only */ track->htile_surface = radeon_get_ib_value(p, idx); + /* force 8x8 htile width and height */ + ib[idx] |= 3; track->db_dirty = true; break; case CB_IMMED0_BASE:
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 5d6e7f9..0b4d833 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -657,87 +657,30 @@ static int r600_cs_track_validate_db(struct radeon_cs_parser *p) /* nby is npipes htiles aligned == npipes * 8 pixel aligned */ nby = round_up(nby, track->npipes * 8); } else { - /* htile widht & nby (8 or 4) make 2 bits number */ - tmp = track->htile_surface & 3; + /* always assume 8x8 htile */ /* align is htile align * 8, htile align vary according to * number of pipe and tile width and nby */ switch (track->npipes) { case 8: - switch (tmp) { - case 3: /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/ - nbx = round_up(nbx, 64 * 8); - nby = round_up(nby, 64 * 8); - break; - case 2: /* HTILE_WIDTH = 4 & HTILE_HEIGHT = 8*/ - case 1: /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 4*/ - nbx = round_up(nbx, 64 * 8); - nby = round_up(nby, 32 * 8); - break; - case 0: /* HTILE_WIDTH = 4 & HTILE_HEIGHT = 4*/ - nbx = round_up(nbx, 32 * 8); - nby = round_up(nby, 32 * 8); - break; - default: - return -EINVAL; - } + /* HTILE_WIDTH = 8 & HTILE_HEIGHT = 8*/ + nbx = round_up(nbx, 64 * 8); + nby = round_up(nby, 64 * 8);
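A worked example of the corrected size computation as a standalone C program (4-pipe case with an invented surface size; the alignments are taken from the hunks above):

#include <stdio.h>

static unsigned round_up(unsigned v, unsigned a)
{
	return (v + a - 1) / a * a;
}

int main(void)
{
	unsigned npipes = 4, nbx = 1920, nby = 1080;

	/* 4 pipes: width aligned to 64*8, height to 32*8 (8x8 htile case) */
	nbx = round_up(nbx, 64 * 8);
	nby = round_up(nby, 32 * 8);
	nbx >>= 3;		/* pixels -> htiles */
	nby >>= 3;
	/* size must be aligned on an npipes * 2K boundary */
	unsigned size = round_up(nbx * nby * 4, npipes * 2048);
	printf("htile buffer: %u bytes\n", size);
	return 0;
}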
[PATCH] drm/radeon: resume fence driver to last sync sequence on lockup
From: Jerome Glisse

After a lockup we need to resume the fence driver to the last sync sequence, not the last received sequence, so that all threads waiting on command streams that locked up resume. Otherwise the GPU reset will be ineffective in most cases.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..38233e7 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -787,7 +787,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring) } rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4]; rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index; - radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring); + radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring); rdev->fence_drv[ring].initialized = true; dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n", ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
--
1.7.11.7
[PATCH] drm/radeon: restore modeset late in GPU reset path
From: Jerome Glisse

The modeset path seems to conflict sometimes with the memory management, leading to a kernel deadlock. This moves the modesetting reset after the GPU acceleration reset.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e2f5f88..ffd5534 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1337,7 +1337,6 @@ retry: } radeon_restore_bios_scratch_regs(rdev); - drm_helper_resume_force_mode(rdev->ddev); if (!r) { for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -1362,6 +1361,8 @@ retry: } } + drm_helper_resume_force_mode(rdev->ddev); + ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); if (r) { /* bad news, how to tell it to userspace ? */
--
1.7.11.7
[PATCH] drm/radeon: don't leave fence blocked process on failed GPU reset
From: Jerome Glisse

Force all fences to signal if the GPU reset failed, so that no process gets stuck waiting on a fence.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h | 1 +
 drivers/gpu/drm/radeon/radeon_device.c | 1 +
 drivers/gpu/drm/radeon/radeon_fence.c | 19 +++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 5d68346..9c7625c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -225,6 +225,7 @@ struct radeon_fence { int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); int radeon_fence_driver_init(struct radeon_device *rdev); void radeon_fence_driver_fini(struct radeon_device *rdev); +void radeon_fence_driver_force_completion(struct radeon_device *rdev); int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence **fence, int ring); void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e2f5f88..774fae7 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1357,6 +1357,7 @@ retry: } } } else { + radeon_fence_driver_force_completion(rdev); for (i = 0; i < RADEON_NUM_RINGS; ++i) { kfree(ring_data[i]); }
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..bf7b20e 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -868,6 +868,25 @@ void radeon_fence_driver_fini(struct radeon_device *rdev) mutex_unlock(&rdev->ring_lock); } +/** + * radeon_fence_driver_force_completion - force all fence waiter to complete + * + * @rdev: radeon device pointer + * + * In case of GPU reset failure make sure no process keep waiting on fence + * that will never complete. + */ +void radeon_fence_driver_force_completion(struct radeon_device *rdev) +{ + int ring; + + for (ring = 0; ring < RADEON_NUM_RINGS; ring++) { + if (!rdev->fence_drv[ring].initialized) + continue; + radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring); + } +} + /* * Fence debugfs
--
1.7.11.7
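A standalone C sketch of forced completion (single-process model, invented values): stamping each ring's fence location with the highest emitted sequence makes every pending wait test pass:

#include <stdint.h>
#include <stdio.h>

#define NUM_RINGS 3

static uint64_t emitted[NUM_RINGS] = { 10, 42, 7 };	/* sync_seq per ring */
static uint64_t fence_mem[NUM_RINGS] = { 8, 40, 7 };	/* last signaled */

static void force_completion(void)
{
	for (int ring = 0; ring < NUM_RINGS; ring++)
		fence_mem[ring] = emitted[ring];	/* nothing left to wait on */
}

static int signaled(int ring, uint64_t seq)
{
	return fence_mem[ring] >= seq;
}

int main(void)
{
	printf("before: ring0 seq10 signaled=%d\n", signaled(0, 10));
	force_completion();
	printf("after:  ring0 seq10 signaled=%d\n", signaled(0, 10));
	return 0;
}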
[PATCH] drm/radeon: avoid deadlock in pm path when waiting for fence
From: Jerome Glisse

radeon_fence_wait_empty_locked should not trigger a GPU reset, as no place it's called from would benefit from such a thing, and it actually leads to a kernel deadlock when the reset is triggered from the pm codepath. Instead, force ring completion in places where it makes sense, or return early in others.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h | 2 +-
 drivers/gpu/drm/radeon/radeon_device.c | 13 +++--
 drivers/gpu/drm/radeon/radeon_fence.c | 30 ++
 drivers/gpu/drm/radeon/radeon_pm.c | 15 ---
 4 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 9c7625c..071b2d7 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -231,7 +231,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); -void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_any(struct radeon_device *rdev, struct radeon_fence **fences, bool intr);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 774fae7..53a9223 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1163,6 +1163,7 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state) struct drm_crtc *crtc; struct drm_connector *connector; int i, r; + bool force_completion = false; if (dev == NULL || dev->dev_private == NULL) { return -ENODEV; }
@@ -1205,8 +1206,16 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state) mutex_lock(&rdev->ring_lock); /* wait for gpu to finish processing current batch */ - for (i = 0; i < RADEON_NUM_RINGS; i++) - radeon_fence_wait_empty_locked(rdev, i); + for (i = 0; i < RADEON_NUM_RINGS; i++) { + r = radeon_fence_wait_empty_locked(rdev, i); + if (r) { + /* delay GPU reset to resume */ + force_completion = true; + } + } + if (force_completion) { + radeon_fence_driver_force_completion(rdev); + } mutex_unlock(&rdev->ring_lock); radeon_save_bios_scratch_regs(rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index bf7b20e..28c09b6 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -609,26 +609,20 @@ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring) * Returns 0 if the fences have passed, error for all other cases. * Caller must hold ring lock. */ -void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) { uint64_t seq = rdev->fence_drv[ring].sync_seq[ring]; + int r; - while(1) { - int r; - r = radeon_fence_wait_seq(rdev, seq, ring, false, false); + r = radeon_fence_wait_seq(rdev, seq, ring, false, false); + if (r) { if (r == -EDEADLK) { - mutex_unlock(&rdev->ring_lock); - r = radeon_gpu_reset(rdev); - mutex_lock(&rdev->ring_lock); - if (!r) - continue; - } - if (r) { - dev_err(rdev->dev, "error waiting for ring to become" - " idle (%d)\n", r); + return -EDEADLK; } - return; + dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n", + ring, r); } + return 0; }
@@ -854,13 +848,17 @@ int radeon_fence_driver_init(struct radeon_device *rdev) */ void radeon_fence_driver_fini(struct radeon_device *rdev) { - int ring; + int ring, r; mutex_lock(&rdev->ring_lock); for (ring = 0; ring < RADEON_NUM_RINGS; ring++) { if (!rdev->fence_drv[ring].initialized) continue; - radeon_fence_wait_empty_locked(rdev, ring); + r = radeon_fence_wait_empty_locked(rdev, ring); + if (r) { + /* no need to trigger GPU reset as we are unloading */ + radeon_fence_driver_force_completion(rdev); + } wake_up_all(&rdev->fence_queue); radeon_scratch
[PATCH] drm/radeon: add support for MEM_WRITE packet
From: Jerome Glisse

To make it easier to debug some lockups from userspace, add support for the MEM_WRITE packet.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 29 +
 drivers/gpu/drm/radeon/r600_cs.c | 29 +
 drivers/gpu/drm/radeon/radeon_drv.c | 3 ++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 74c6b42..5cea852 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -2654,6 +2654,35 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, ib[idx+4] = upper_32_bits(offset) & 0xff; } break; + case PACKET3_MEM_WRITE: + { + u64 offset; + + if (pkt->count != 3) { + DRM_ERROR("bad MEM_WRITE (invalid count)\n"); + return -EINVAL; + } + r = evergreen_cs_packet_next_reloc(p, &reloc); + if (r) { + DRM_ERROR("bad MEM_WRITE (missing reloc)\n"); + return -EINVAL; + } + offset = radeon_get_ib_value(p, idx+0); + offset += ((u64)(radeon_get_ib_value(p, idx+1) & 0xff)) << 32UL; + if (offset & 0x7) { + DRM_ERROR("bad MEM_WRITE (address not qwords aligned)\n"); + return -EINVAL; + } + if ((offset + 8) > radeon_bo_size(reloc->robj)) { + DRM_ERROR("bad MEM_WRITE bo too small: 0x%llx, 0x%lx\n", + offset + 8, radeon_bo_size(reloc->robj)); + return -EINVAL; + } + offset += reloc->lobj.gpu_offset; + ib[idx+0] = offset; + ib[idx+1] = upper_32_bits(offset) & 0xff; + break; + } case PACKET3_COPY_DW: if (pkt->count != 4) { DRM_ERROR("bad COPY_DW (invalid count)\n");
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 0be768b..9ea13d0 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -2294,6 +2294,35 @@ static int r600_packet3_check(struct radeon_cs_parser *p, ib[idx+4] = upper_32_bits(offset) & 0xff; } break; + case PACKET3_MEM_WRITE: + { + u64 offset; + + if (pkt->count != 3) { + DRM_ERROR("bad MEM_WRITE (invalid count)\n"); + return -EINVAL; + } + r = r600_cs_packet_next_reloc(p, &reloc); + if (r) { + DRM_ERROR("bad MEM_WRITE (missing reloc)\n"); + return -EINVAL; + } + offset = radeon_get_ib_value(p, idx+0); + offset += ((u64)(radeon_get_ib_value(p, idx+1) & 0xff)) << 32UL; + if (offset & 0x7) { + DRM_ERROR("bad MEM_WRITE (address not qwords aligned)\n"); + return -EINVAL; + } + if ((offset + 8) > radeon_bo_size(reloc->robj)) { + DRM_ERROR("bad MEM_WRITE bo too small: 0x%llx, 0x%lx\n", + offset + 8, radeon_bo_size(reloc->robj)); + return -EINVAL; + } + offset += reloc->lobj.gpu_offset; + ib[idx+0] = offset; + ib[idx+1] = upper_32_bits(offset) & 0xff; + break; + } case PACKET3_COPY_DW: if (pkt->count != 4) { DRM_ERROR("bad COPY_DW (invalid count)\n");
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 9b1a727..ff75934 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -68,9 +68,10 @@ * 2.25.0 - eg+: new info request for num SE and num SH * 2.26.0 - r600-eg: fix htile size computation * 2.27.0 - r600-SI: Add CS ioctl support for async DMA + * 2.28.0 - r600-eg: Add MEM_WRITE packet support */ #define KMS_DRIVER_MAJOR 2 -#define KMS_DRIVER_MINOR 27 +#define KMS_DRIVER_MINOR 28 #define KMS_DRIVER_PATCHLEVEL 0 int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags); int radeon_driver_unload_kms(struct drm_device *dev);
--
1.7.11.7
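A standalone C sketch of the checks the parser applies to the packet (the same alignment and bounds tests as the hunks above, in a self-contained form with invented sizes):

#include <stdint.h>
#include <stdio.h>

static int check_mem_write(uint32_t lo, uint32_t hi, uint64_t bo_size)
{
	/* reassemble the 64-bit address from the two ib dwords */
	uint64_t offset = (uint64_t)lo + (((uint64_t)(hi & 0xff)) << 32);

	if (offset & 0x7) {
		fprintf(stderr, "address not qword aligned\n");
		return -1;
	}
	if (offset + 8 > bo_size) {	/* the packet writes 8 bytes */
		fprintf(stderr, "write past end of buffer\n");
		return -1;
	}
	return 0;
}

int main(void)
{
	printf("aligned, in bounds: %d\n", check_mem_write(0x100, 0, 4096));
	printf("misaligned:         %d\n", check_mem_write(0x101, 0, 4096));
	printf("out of bounds:      %d\n", check_mem_write(4096, 0, 4096));
	return 0;
}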
[PATCH 1/2] drm/radeon: add debugfs file for dma rings
From: Jerome Glisse

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_ring.c | 4
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index ebd6956..9410e43 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -794,11 +794,15 @@ static int radeon_debugfs_ring_info(struct seq_file *m, void *data) static int radeon_ring_type_gfx_index = RADEON_RING_TYPE_GFX_INDEX; static int cayman_ring_type_cp1_index = CAYMAN_RING_TYPE_CP1_INDEX; static int cayman_ring_type_cp2_index = CAYMAN_RING_TYPE_CP2_INDEX; +static int radeon_ring_type_dma1_index = R600_RING_TYPE_DMA_INDEX; +static int radeon_ring_type_dma2_index = CAYMAN_RING_TYPE_DMA1_INDEX; static struct drm_info_list radeon_debugfs_ring_info_list[] = { {"radeon_ring_gfx", radeon_debugfs_ring_info, 0, &radeon_ring_type_gfx_index}, {"radeon_ring_cp1", radeon_debugfs_ring_info, 0, &cayman_ring_type_cp1_index}, {"radeon_ring_cp2", radeon_debugfs_ring_info, 0, &cayman_ring_type_cp2_index}, + {"radeon_ring_dma1", radeon_debugfs_ring_info, 0, &radeon_ring_type_dma1_index}, + {"radeon_ring_dma2", radeon_debugfs_ring_info, 0, &radeon_ring_type_dma2_index}, }; static int radeon_debugfs_sa_info(struct seq_file *m, void *data)
--
1.7.11.7
[PATCH 2/2] drm/radeon: print dma status reg on lockup
From: Jerome Glisse

To help debug dma related lockups.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/evergreen.c | 4
 drivers/gpu/drm/radeon/evergreend.h | 3 +++
 drivers/gpu/drm/radeon/ni.c | 4
 drivers/gpu/drm/radeon/nid.h | 1 -
 drivers/gpu/drm/radeon/r600.c | 4
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index f95d7fc..6dc9ee7 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2331,6 +2331,8 @@ static int evergreen_gpu_soft_reset(struct radeon_device *rdev) RREG32(CP_BUSY_STAT)); dev_info(rdev->dev, " R_008680_CP_STAT = 0x%08X\n", RREG32(CP_STAT)); + dev_info(rdev->dev, " R_00D034_DMA_STATUS_REG = 0x%08X\n", + RREG32(DMA_STATUS_REG)); evergreen_mc_stop(rdev, &save); if (evergreen_mc_wait_for_idle(rdev)) { dev_warn(rdev->dev, "Wait for MC idle timedout !\n"); }
@@ -2376,6 +2378,8 @@ static int evergreen_gpu_soft_reset(struct radeon_device *rdev) RREG32(CP_BUSY_STAT)); dev_info(rdev->dev, " R_008680_CP_STAT = 0x%08X\n", RREG32(CP_STAT)); + dev_info(rdev->dev, " R_00D034_DMA_STATUS_REG = 0x%08X\n", + RREG32(DMA_STATUS_REG)); evergreen_mc_resume(rdev, &save); return 0; }
diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h
index cb9baaa..f82f98a 100644
--- a/drivers/gpu/drm/radeon/evergreend.h
+++ b/drivers/gpu/drm/radeon/evergreend.h
@@ -2027,4 +2027,7 @@ /* cayman packet3 addition */ #defineCAYMAN_PACKET3_DEALLOC_STATE0x14 +/* DMA regs common on r6xx/r7xx/evergreen/ni */ +#define DMA_STATUS_REG0xd034 + #endif
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 7bdbcb0..6dae387 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1331,6 +1331,8 @@ static int cayman_gpu_soft_reset(struct radeon_device *rdev) RREG32(CP_BUSY_STAT)); dev_info(rdev->dev, " R_008680_CP_STAT = 0x%08X\n", RREG32(CP_STAT)); + dev_info(rdev->dev, " R_00D034_DMA_STATUS_REG = 0x%08X\n", + RREG32(DMA_STATUS_REG)); dev_info(rdev->dev, " VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x%08X\n", RREG32(0x14F8)); dev_info(rdev->dev, " VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x%08X\n",
@@ -1387,6 +1389,8 @@ static int cayman_gpu_soft_reset(struct radeon_device *rdev) RREG32(CP_BUSY_STAT)); dev_info(rdev->dev, " R_008680_CP_STAT = 0x%08X\n", RREG32(CP_STAT)); + dev_info(rdev->dev, " R_00D034_DMA_STATUS_REG = 0x%08X\n", + RREG32(DMA_STATUS_REG)); evergreen_mc_resume(rdev, &save); return 0; }
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index b93186b..22a62c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -675,4 +675,3 @@ #defineDMA_PACKET_NOP0xf #endif -
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 2aaf147..4605551 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1297,6 +1297,8 @@ static int r600_gpu_soft_reset(struct radeon_device *rdev) RREG32(CP_BUSY_STAT)); dev_info(rdev->dev, " R_008680_CP_STAT = 0x%08X\n", RREG32(CP_STAT)); + dev_info(rdev->dev, " R_00D034_DMA_STATUS_REG = 0x%08X\n", + RREG32(DMA_STATUS_REG)); rv515_mc_stop(rdev, &save); if (r600_mc_wait_for_idle(rdev)) { dev_warn(rdev->dev, "Wait for MC idle timedout !\n"); }
@@ -1348,6 +1350,8 @@ static int r600_gpu_soft_reset(struct radeon_device *rdev) RREG32(CP_BUSY_STAT)); dev_info(rdev->dev, " R_008680_CP_STAT = 0x%08X\n", RREG32(CP_STAT)); + dev_info(rdev->dev, " R_00D034_DMA_STATUS_REG = 0x%08X\n", + RREG32(DMA_STATUS_REG)); rv515_mc_resume(rdev, &save); return 0; }
--
1.7.11.7
[PATCH 1/2] drm/radeon: improve ring debugfs printing
From: Jerome Glisse

Print 32 dwords before the last known rptr, as the problem most likely comes from a previous command. Also small cosmetic changes to the printing.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_ring.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 9410e43..141f2b6 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -770,22 +770,28 @@ static int radeon_debugfs_ring_info(struct seq_file *m, void *data) int ridx = *(int*)node->info_ent->data; struct radeon_ring *ring = &rdev->ring[ridx]; unsigned count, i, j; + u32 tmp; radeon_ring_free_size(rdev, ring); count = (ring->ring_size / 4) - ring->ring_free_dw; - seq_printf(m, "wptr(0x%04x): 0x%08x\n", ring->wptr_reg, RREG32(ring->wptr_reg)); - seq_printf(m, "rptr(0x%04x): 0x%08x\n", ring->rptr_reg, RREG32(ring->rptr_reg)); + tmp = RREG32(ring->wptr_reg) >> ring->ptr_reg_shift; + seq_printf(m, "wptr(0x%04x): 0x%08x [%5d]\n", ring->wptr_reg, tmp, tmp); + tmp = RREG32(ring->rptr_reg) >> ring->ptr_reg_shift; + seq_printf(m, "rptr(0x%04x): 0x%08x [%5d]\n", ring->rptr_reg, tmp, tmp); if (ring->rptr_save_reg) { seq_printf(m, "rptr next(0x%04x): 0x%08x\n", ring->rptr_save_reg, RREG32(ring->rptr_save_reg)); } - seq_printf(m, "driver's copy of the wptr: 0x%08x\n", ring->wptr); - seq_printf(m, "driver's copy of the rptr: 0x%08x\n", ring->rptr); + seq_printf(m, "driver's copy of the wptr: 0x%08x [%5d]\n", ring->wptr, ring->wptr); + seq_printf(m, "driver's copy of the rptr: 0x%08x [%5d]\n", ring->rptr, ring->rptr); seq_printf(m, "%u free dwords in ring\n", ring->ring_free_dw); seq_printf(m, "%u dwords in ring\n", count); - i = ring->rptr; - for (j = 0; j <= count; j++) { - seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]); + /* print 32 dw before current rptr as often it's the last executed +* packet that is the root issue +*/ + i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask; + for (j = 0; j <= (count + 32); j++) { + seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]); i = (i + 1) & ring->ptr_mask; } return 0;
--
1.7.11.7
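A standalone C sketch of the wraparound arithmetic (tiny invented ring): starting the dump 32 dwords before rptr, masked into the ring, keeps the likely-offending packet in view:

#include <stdio.h>

int main(void)
{
	unsigned ring_size = 64;		/* dwords, power of two */
	unsigned ptr_mask = ring_size - 1;
	unsigned rptr = 5, count = 10;

	/* (rptr + mask + 1 - 32) & mask == (rptr - 32) mod ring_size,
	 * without ever going negative on an unsigned value */
	unsigned i = (rptr + ptr_mask + 1 - 32) & ptr_mask;

	for (unsigned j = 0; j <= count + 32; j++) {
		printf("r[%5u]\n", i);
		i = (i + 1) & ptr_mask;
	}
	return 0;
}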
[PATCH 2/2] drm/radeon: reset dma engine on gpu reset
From: Jerome Glisse This try to reset the dma engine when performing gpu reset. Hopefully bringing back the gpu dma engine in sane state. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen.c | 30 +- drivers/gpu/drm/radeon/evergreend.h | 10 +- drivers/gpu/drm/radeon/ni.c | 30 +- drivers/gpu/drm/radeon/nid.h| 2 +- drivers/gpu/drm/radeon/r600.c | 28 ++-- 5 files changed, 74 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 6dc9ee7..f92f6bb 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -2309,19 +2309,19 @@ bool evergreen_gpu_is_lockup(struct radeon_device *rdev, struct radeon_ring *rin static int evergreen_gpu_soft_reset(struct radeon_device *rdev) { struct evergreen_mc_save save; - u32 grbm_reset = 0; + u32 grbm_reset = 0, tmp; if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE)) return 0; dev_info(rdev->dev, "GPU softreset \n"); - dev_info(rdev->dev, " GRBM_STATUS=0x%08X\n", + dev_info(rdev->dev, " GRBM_STATUS = 0x%08X\n", RREG32(GRBM_STATUS)); - dev_info(rdev->dev, " GRBM_STATUS_SE0=0x%08X\n", + dev_info(rdev->dev, " GRBM_STATUS_SE0 = 0x%08X\n", RREG32(GRBM_STATUS_SE0)); - dev_info(rdev->dev, " GRBM_STATUS_SE1=0x%08X\n", + dev_info(rdev->dev, " GRBM_STATUS_SE1 = 0x%08X\n", RREG32(GRBM_STATUS_SE1)); - dev_info(rdev->dev, " SRBM_STATUS=0x%08X\n", + dev_info(rdev->dev, " SRBM_STATUS = 0x%08X\n", RREG32(SRBM_STATUS)); dev_info(rdev->dev, " R_008674_CP_STALLED_STAT1 = 0x%08X\n", RREG32(CP_STALLED_STAT1)); @@ -2337,9 +2337,21 @@ static int evergreen_gpu_soft_reset(struct radeon_device *rdev) if (evergreen_mc_wait_for_idle(rdev)) { dev_warn(rdev->dev, "Wait for MC idle timedout !\n"); } + /* Disable CP parsing/prefetching */ WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT); + /* Disable DMA */ + tmp = RREG32(DMA_RB_CNTL); + tmp &= ~DMA_RB_ENABLE; + WREG32(DMA_RB_CNTL, tmp); + + /* Reset dma */ + WREG32(SRBM_SOFT_RESET, SOFT_RESET_DMA); + RREG32(SRBM_SOFT_RESET); + udelay(50); + WREG32(SRBM_SOFT_RESET, 0); + /* reset all the gfx blocks */ grbm_reset = (SOFT_RESET_CP | SOFT_RESET_CB | @@ -2362,13 +2374,13 @@ static int evergreen_gpu_soft_reset(struct radeon_device *rdev) (void)RREG32(GRBM_SOFT_RESET); /* Wait a little for things to settle down */ udelay(50); - dev_info(rdev->dev, " GRBM_STATUS=0x%08X\n", + dev_info(rdev->dev, " GRBM_STATUS = 0x%08X\n", RREG32(GRBM_STATUS)); - dev_info(rdev->dev, " GRBM_STATUS_SE0=0x%08X\n", + dev_info(rdev->dev, " GRBM_STATUS_SE0 = 0x%08X\n", RREG32(GRBM_STATUS_SE0)); - dev_info(rdev->dev, " GRBM_STATUS_SE1=0x%08X\n", + dev_info(rdev->dev, " GRBM_STATUS_SE1 = 0x%08X\n", RREG32(GRBM_STATUS_SE1)); - dev_info(rdev->dev, " SRBM_STATUS=0x%08X\n", + dev_info(rdev->dev, " SRBM_STATUS = 0x%08X\n", RREG32(SRBM_STATUS)); dev_info(rdev->dev, " R_008674_CP_STALLED_STAT1 = 0x%08X\n", RREG32(CP_STALLED_STAT1)); diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h index f82f98a..5786a32 100644 --- a/drivers/gpu/drm/radeon/evergreend.h +++ b/drivers/gpu/drm/radeon/evergreend.h @@ -742,8 +742,9 @@ #defineSOFT_RESET_ROM (1 << 14) #defineSOFT_RESET_SEM (1 << 15) #defineSOFT_RESET_VMC (1 << 17) +#defineSOFT_RESET_DMA (1 << 20) #defineSOFT_RESET_TST (1 << 21) -#defineSOFT_RESET_REGBB(1 << 22) +#defineSOFT_RESET_REGBB(1 << 22) #defineSOFT_RESET_ORB (1 << 23) /* display watermarks */ @@ -2028,6 +2029,13 @@ #defineCAYMAN_PACKET3_DEALLOC_STATE0x14 /* DMA regs common on r6xx/r7xx/evergreen/ni */ +#define DMA_RB_CNTL 0xd000 +# define 
DMA_RB_ENABLE (1 << 0) +# define DMA_RB_SIZE(x) ((x) << 1) /* log2 */ +# define DMA_RB_SWAP_ENABLE (1 << 9) /* 8IN32 */ +# define DMA_RPTR_WRITEBACK_ENABLE (1 << 12) +# define DMA_RPTR_
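The message is truncated above, but the per-ASIC reset recipe it adds is uniform: stop the DMA ring buffer so nothing is fetched mid-reset, then pulse the SRBM soft-reset bit with a posting read and a short settle delay. Condensed as a standalone sketch (the helper name is made up; the register accesses mirror the evergreen hunk above):

    /* sketch: quiesce and soft-reset the async DMA engine */
    static void example_dma_soft_reset(struct radeon_device *rdev)
    {
            u32 tmp;

            /* disable the DMA ring buffer first */
            tmp = RREG32(DMA_RB_CNTL);
            tmp &= ~DMA_RB_ENABLE;
            WREG32(DMA_RB_CNTL, tmp);

            /* assert soft reset; the read-back posts the write */
            WREG32(SRBM_SOFT_RESET, SOFT_RESET_DMA);
            RREG32(SRBM_SOFT_RESET);
            udelay(50);                     /* let the block settle */
            WREG32(SRBM_SOFT_RESET, 0);     /* deassert */
    }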
[PATCH] radeon/kms: force rn50 chip to always report connected on analog output
From: Jerome Glisse

Those RN50 chips are often connected to console-remoting hardware, and load
detection often fails on them. Just don't attempt load detection; report the
output as connected instead.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_legacy_encoders.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
index f5ba224..62cd512 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
@@ -640,6 +640,14 @@ static enum drm_connector_status radeon_legacy_primary_dac_detect(struct drm_enc
 	enum drm_connector_status found = connector_status_disconnected;
 	bool color = true;
 
+	/* just don't bother on RN50: those chips are often connected to
+	 * remoting console hw and load detection often fails on them. So to
+	 * make everyone happy report the encoder as always connected.
+	 */
+	if (ASIC_IS_RN50(rdev)) {
+		return connector_status_connected;
+	}
+
 	/* save the regs we need */
 	vclk_ecp_cntl = RREG32_PLL(RADEON_VCLK_ECP_CNTL);
 	crtc_ext_cntl = RREG32(RADEON_CRTC_EXT_CNTL);
-- 
1.7.11.7
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/2] radeon/kms: fix dma relocation checking
From: Jerome Glisse

We were checking the index against the size (in dwords) of the relocation
chunk instead of against the number of relocation entries. This fixes a
kernel segfault when userspace submits an ill-formed command
stream/relocation buffer pair.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/r600_cs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 9ea13d0..f91919e 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -2561,16 +2561,16 @@ int r600_dma_cs_next_reloc(struct radeon_cs_parser *p,
 	struct radeon_cs_chunk *relocs_chunk;
 	unsigned idx;
 
+	*cs_reloc = NULL;
 	if (p->chunk_relocs_idx == -1) {
 		DRM_ERROR("No relocation chunk !\n");
 		return -EINVAL;
 	}
-	*cs_reloc = NULL;
 	relocs_chunk = &p->chunks[p->chunk_relocs_idx];
 	idx = p->dma_reloc_idx;
-	if (idx >= relocs_chunk->length_dw) {
+	if (idx >= p->nrelocs) {
 		DRM_ERROR("Relocs at %d after relocations chunk end %d !\n",
-			  idx, relocs_chunk->length_dw);
+			  idx, p->nrelocs);
 		return -EINVAL;
 	}
 	*cs_reloc = p->relocs_ptr[idx];
-- 
1.7.11.7
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
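To see why the old bound was too loose: p->relocs_ptr is indexed per relocation entry, while length_dw counts dwords in the chunk, and each entry spans several dwords (four in the legacy layout; treat that exact width as an assumption). A two-entry example shows the overrun the old check let through:

    /* sketch: two reloc entries, four dwords each (assumed layout) */
    unsigned nrelocs   = 2;           /* valid indices: 0 and 1 */
    unsigned length_dw = nrelocs * 4; /* = 8 dwords in the chunk */
    unsigned idx       = 5;           /* hostile stream asks for reloc #5 */

    /* old check: idx >= length_dw -> 5 >= 8 is false, access allowed,
     * so p->relocs_ptr[5] reads past the 2-entry array and faults.
     * new check: idx >= p->nrelocs -> 5 >= 2 is true, stream rejected. */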
[PATCH 2/2] radeon/kms: cleanup async dma packet checking
From: Jerome Glisse This simplify and cleanup the async dma checking. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen.c| 16 +- drivers/gpu/drm/radeon/evergreen_cs.c | 807 +- drivers/gpu/drm/radeon/evergreend.h | 29 +- 3 files changed, 417 insertions(+), 435 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index f92f6bb..28f8d4f 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3223,14 +3223,14 @@ void evergreen_dma_fence_ring_emit(struct radeon_device *rdev, struct radeon_ring *ring = &rdev->ring[fence->ring]; u64 addr = rdev->fence_drv[fence->ring].gpu_addr; /* write the fence */ - radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_FENCE, 0, 0, 0)); + radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_FENCE, 0, 0)); radeon_ring_write(ring, addr & 0xfffc); radeon_ring_write(ring, (upper_32_bits(addr) & 0xff)); radeon_ring_write(ring, fence->seq); /* generate an interrupt */ - radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_TRAP, 0, 0, 0)); + radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_TRAP, 0, 0)); /* flush HDP */ - radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0)); + radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0)); radeon_ring_write(ring, (0xf << 16) | HDP_MEM_COHERENCY_FLUSH_CNTL); radeon_ring_write(ring, 1); } @@ -3253,7 +3253,7 @@ void evergreen_dma_ring_ib_execute(struct radeon_device *rdev, while ((next_rptr & 7) != 5) next_rptr++; next_rptr += 3; - radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_WRITE, 0, 0, 1)); + radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_WRITE, 0, 1)); radeon_ring_write(ring, ring->next_rptr_gpu_addr & 0xfffc); radeon_ring_write(ring, upper_32_bits(ring->next_rptr_gpu_addr) & 0xff); radeon_ring_write(ring, next_rptr); @@ -3263,8 +3263,8 @@ void evergreen_dma_ring_ib_execute(struct radeon_device *rdev, * Pad as necessary with NOPs. 
*/ while ((ring->wptr & 7) != 5) - radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_NOP, 0, 0, 0)); - radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_INDIRECT_BUFFER, 0, 0, 0)); + radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_NOP, 0, 0)); + radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_INDIRECT_BUFFER, 0, 0)); radeon_ring_write(ring, (ib->gpu_addr & 0xFFE0)); radeon_ring_write(ring, (ib->length_dw << 12) | (upper_32_bits(ib->gpu_addr) & 0xFF)); @@ -3323,7 +3323,7 @@ int evergreen_copy_dma(struct radeon_device *rdev, if (cur_size_in_dw > 0xF) cur_size_in_dw = 0xF; size_in_dw -= cur_size_in_dw; - radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_COPY, 0, 0, cur_size_in_dw)); + radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_COPY, 0, cur_size_in_dw)); radeon_ring_write(ring, dst_offset & 0xfffc); radeon_ring_write(ring, src_offset & 0xfffc); radeon_ring_write(ring, upper_32_bits(dst_offset) & 0xff); @@ -3431,7 +3431,7 @@ static int evergreen_startup(struct radeon_device *rdev) ring = &rdev->ring[R600_RING_TYPE_DMA_INDEX]; r = radeon_ring_init(rdev, ring, ring->ring_size, R600_WB_DMA_RPTR_OFFSET, DMA_RB_RPTR, DMA_RB_WPTR, -2, 0x3fffc, DMA_PACKET(DMA_PACKET_NOP, 0, 0, 0)); +2, 0x3fffc, DMA_PACKET(DMA_PACKET_NOP, 0, 0)); if (r) return r; diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 7a44566..32c07bb 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -2858,16 +2858,6 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) return 0; } -/* - * DMA - */ - -#define GET_DMA_CMD(h) (((h) & 0xf000) >> 28) -#define GET_DMA_COUNT(h) ((h) & 0x000f) -#define GET_DMA_T(h) (((h) & 0x0080) >> 23) -#define GET_DMA_NEW(h) (((h) & 0x0400) >> 26) -#define GET_DMA_MISC(h) (((h) & 0x070) >> 20) - /** * evergreen_dma_cs_parse() - parse the DMA IB * @p: parser structure holding parsing context. @@ -2881,9 +2871,9 @@ int evergreen_dma_cs_parse(struct radeon_cs_parser *p) { struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx]; struct radeon_cs_reloc *src_reloc, *dst_reloc, *dst2_reloc; - u32 header, cmd, count, tiled, new_cmd, misc; + u32 header, cmd, count, sub_cmd; volatile u32 *ib = p->ib.ptr; - u32 idx, idx_value; + u32 idx; u64 src_offset, dst_offset, dst2_offset; int r; @@ -
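The diff is cut off above, but the visible churn already tells the story: DMA_PACKET() loses a field, with the old separate t/new/misc bits folded into one sub-command value. Reconstructing the new encoding from the removed GET_DMA_* masks gives roughly the following (the exact shifts are an inference from those masks, not quoted from the truncated hunk):

    /* sketch: 3-field async DMA packet header (inferred field placement) */
    #define DMA_PACKET(cmd, sub_cmd, n)  ((((cmd) & 0xF) << 28) |      \
                                          (((sub_cmd) & 0xFF) << 20) | \
                                          ((n) & 0xFFFFF))

    /* e.g. the NOP used for ring padding encodes as header 0xF0000000 */

This is what lets the parser switch on a single sub_cmd value instead of recombining tiled/new_cmd/misc flags at every packet.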
[PATCH] drm/radeon: improve semaphore debugging on lockup
From: Jerome Glisse Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h | 2 ++ drivers/gpu/drm/radeon/radeon_ring.c | 2 ++ drivers/gpu/drm/radeon/radeon_semaphore.c | 4 3 files changed, 8 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 9b9422c..f0bb8d5 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -649,6 +649,8 @@ struct radeon_ring { u32 ptr_reg_mask; u32 nop; u32 idx; + u64 last_semaphore_signal_addr; + u64 last_semaphore_wait_addr; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 141f2b6..2430d80 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -784,6 +784,8 @@ static int radeon_debugfs_ring_info(struct seq_file *m, void *data) } seq_printf(m, "driver's copy of the wptr: 0x%08x [%5d]\n", ring->wptr, ring->wptr); seq_printf(m, "driver's copy of the rptr: 0x%08x [%5d]\n", ring->rptr, ring->rptr); + seq_printf(m, "last semaphore signal addr : 0x%016llx\n", ring->last_semaphore_signal_addr); + seq_printf(m, "last semaphore wait addr : 0x%016llx\n", ring->last_semaphore_wait_addr); seq_printf(m, "%u free dwords in ring\n", ring->ring_free_dw); seq_printf(m, "%u dwords in ring\n", count); /* print 8 dw before current rptr as often it's the last executed diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c index 97f3ece..8dcc20f 100644 --- a/drivers/gpu/drm/radeon/radeon_semaphore.c +++ b/drivers/gpu/drm/radeon/radeon_semaphore.c @@ -95,6 +95,10 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev, /* we assume caller has already allocated space on waiters ring */ radeon_semaphore_emit_wait(rdev, waiter, semaphore); + /* for debugging lockup only, used by sysfs debug files */ + rdev->ring[signaler].last_semaphore_signal_addr = semaphore->gpu_addr; + rdev->ring[waiter].last_semaphore_wait_addr = semaphore->gpu_addr; + return 0; } -- 1.7.11.7 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
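The payoff of these two fields comes when cross-checking rings after a lockup: a ring whose last waited-on semaphore address matches no other ring's last signaled address is the one stuck on an unsatisfied wait. A sketch of that triage over the new fields (loop bounds and message are illustrative):

    /* sketch: match semaphore waits against signals across all rings */
    unsigned i, j;

    for (i = 0; i < RADEON_NUM_RINGS; i++) {
            u64 wait = rdev->ring[i].last_semaphore_wait_addr;
            bool matched = false;

            if (!wait)
                    continue;
            for (j = 0; j < RADEON_NUM_RINGS; j++)
                    if (rdev->ring[j].last_semaphore_signal_addr == wait)
                            matched = true;
            if (!matched)
                    DRM_INFO("ring %u waits on a never-signaled semaphore\n", i);
    }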
[PATCH] drm/radeon: fix cursor corruption on aruba and newer
From: Jerome Glisse

Aruba and newer GPUs do not need the avivo cursor workaround; quite the
opposite, the workaround leads to corruption there.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_cursor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cursor.c b/drivers/gpu/drm/radeon/radeon_cursor.c
index ad6df62..30f71cc 100644
--- a/drivers/gpu/drm/radeon/radeon_cursor.c
+++ b/drivers/gpu/drm/radeon/radeon_cursor.c
@@ -241,7 +241,7 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
 		y = 0;
 	}
 
-	if (ASIC_IS_AVIVO(rdev)) {
+	if (ASIC_IS_AVIVO(rdev) && (rdev->family < CHIP_ARUBA)) {
 		int i = 0;
 		struct drm_crtc *crtc_p;
-- 
1.7.11.7
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: avoid turning off spread spectrum for used pll
From: Jerome Glisse If spread spectrum is enabled and in use for a given pll we should not turn it off as it will lead to turning off display for crtc that use the pll (this behavior was observed on chelsea edp). Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/atombios_crtc.c | 25 + 1 files changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c index c6fcb5b..cb18813 100644 --- a/drivers/gpu/drm/radeon/atombios_crtc.c +++ b/drivers/gpu/drm/radeon/atombios_crtc.c @@ -444,11 +444,28 @@ union atom_enable_ss { static void atombios_crtc_program_ss(struct radeon_device *rdev, int enable, int pll_id, +int crtc_id, struct radeon_atom_ss *ss) { + unsigned i; int index = GetIndexIntoMasterTable(COMMAND, EnableSpreadSpectrumOnPPLL); union atom_enable_ss args; + if (!enable) { + for (i = 0; i < 6; i++) { + if (rdev->mode_info.crtcs[i] && + rdev->mode_info.crtcs[i]->enabled && + i != crtc_id && + pll_id == rdev->mode_info.crtcs[i]->pll_id) { + /* one other crtc is using this pll don't turn +* off spread spectrum as it might turn off +* display on active crtc +*/ + return; + } + } + } + memset(&args, 0, sizeof(args)); if (ASIC_IS_DCE5(rdev)) { @@ -1028,7 +1045,7 @@ static void atombios_crtc_set_pll(struct drm_crtc *crtc, struct drm_display_mode radeon_compute_pll_legacy(pll, adjusted_clock, &pll_clock, &fb_div, &frac_fb_div, &ref_div, &post_div); - atombios_crtc_program_ss(rdev, ATOM_DISABLE, radeon_crtc->pll_id, &ss); + atombios_crtc_program_ss(rdev, ATOM_DISABLE, radeon_crtc->pll_id, radeon_crtc->crtc_id, &ss); atombios_crtc_program_pll(crtc, radeon_crtc->crtc_id, radeon_crtc->pll_id, encoder_mode, radeon_encoder->encoder_id, mode->clock, @@ -1051,7 +1068,7 @@ static void atombios_crtc_set_pll(struct drm_crtc *crtc, struct drm_display_mode ss.step = step_size; } - atombios_crtc_program_ss(rdev, ATOM_ENABLE, radeon_crtc->pll_id, &ss); + atombios_crtc_program_ss(rdev, ATOM_ENABLE, radeon_crtc->pll_id, radeon_crtc->crtc_id, &ss); } } @@ -1572,11 +1589,11 @@ void radeon_atom_disp_eng_pll_init(struct radeon_device *rdev) ASIC_INTERNAL_SS_ON_DCPLL, rdev->clock.default_dispclk); if (ss_enabled) - atombios_crtc_program_ss(rdev, ATOM_DISABLE, ATOM_DCPLL, &ss); + atombios_crtc_program_ss(rdev, ATOM_DISABLE, ATOM_DCPLL, -1, &ss); /* XXX: DCE5, make sure voltage, dispclk is high enough */ atombios_crtc_set_disp_eng_pll(rdev, rdev->clock.default_dispclk); if (ss_enabled) - atombios_crtc_program_ss(rdev, ATOM_ENABLE, ATOM_DCPLL, &ss); + atombios_crtc_program_ss(rdev, ATOM_ENABLE, ATOM_DCPLL, -1, &ss); } } -- 1.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
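One subtlety worth spelling out: the DCPLL call sites pass crtc_id = -1 because no CRTC owns the display PLL, so the "i != crtc_id" test never excludes anything and the guard degenerates to "is any enabled CRTC using this PLL". The guard itself, lifted out of the hunk above as a predicate (a restatement of the loop in the diff, not new behavior):

    /* sketch: keep SS on if another enabled crtc shares this pll */
    static bool example_pll_in_use_elsewhere(struct radeon_device *rdev,
                                             int pll_id, int crtc_id)
    {
            int i;

            for (i = 0; i < 6; i++) {
                    struct radeon_crtc *c = rdev->mode_info.crtcs[i];

                    if (c && c->enabled && i != crtc_id && c->pll_id == pll_id)
                            return true; /* leave spread spectrum enabled */
            }
            return false;
    }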
[PATCH] drm/radeon: force dma32 on rs400, rs690, rs740 IGP
From: Jerome Glisse

It seems some of those IGPs dislike non-dma32 pages.

https://bugzilla.redhat.com/show_bug.cgi?id=785375

Signed-off-by: Jerome Glisse
Cc:
---
 drivers/gpu/drm/radeon/radeon_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 066c98b..8867400 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -774,7 +774,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	if (rdev->flags & RADEON_IS_AGP)
 		rdev->need_dma32 = true;
 	if ((rdev->flags & RADEON_IS_PCI) &&
-	    (rdev->family < CHIP_RS400))
+	    (rdev->family <= CHIP_RS740))
 		rdev->need_dma32 = true;
 
 	dma_bits = rdev->need_dma32 ? 32 : 40;
-- 
1.7.11.2
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: force dma32 to fix regression rs4xx,rs6xx,rs740
From: Jerome Glisse

It seems some of those IGPs dislike non-dma32 pages despite what the
documentation says. This fixes a regression introduced when we started
allowing non-dma32 pages. It seems to affect only some revisions of those
IGP chips; as we don't know which ones, just force dma32 for all of them.

https://bugzilla.redhat.com/show_bug.cgi?id=785375

Signed-off-by: Jerome Glisse
Cc:
---
 drivers/gpu/drm/radeon/radeon_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 066c98b..8867400 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -774,7 +774,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	if (rdev->flags & RADEON_IS_AGP)
 		rdev->need_dma32 = true;
 	if ((rdev->flags & RADEON_IS_PCI) &&
-	    (rdev->family < CHIP_RS400))
+	    (rdev->family <= CHIP_RS740))
 		rdev->need_dma32 = true;
 
 	dma_bits = rdev->need_dma32 ? 32 : 40;
-- 
1.7.11.2
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
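For context on what flipping need_dma32 buys: a few lines below this hunk, radeon_device_init() turns the flag into the PCI DMA mask, so every GART-bound page for these IGPs is then allocated from the low 4GB. Roughly (the fallback detail is paraphrased from the surrounding code, not part of this diff):

    /* sketch: need_dma32 selects the DMA addressing window */
    dma_bits = rdev->need_dma32 ? 32 : 40;
    r = pci_set_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
    if (r) {
            rdev->need_dma32 = true; /* fall back to 32-bit addressing */
            printk(KERN_WARNING "radeon: No suitable DMA available.\n");
    }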
[PATCH] drm/ttm: Pass the buffer object on backend creation
From: Jerome Glisse In case of multiple page table for GART, driver want to know which buffer object is being bind/unbind. This allow driver to bind/unbind buffer object from several different GART. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c |3 ++- drivers/gpu/drm/radeon/radeon_ttm.c| 11 +++ drivers/gpu/drm/ttm/ttm_bo.c |4 ++-- drivers/gpu/drm/ttm/ttm_tt.c |9 ++--- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 ++- include/drm/ttm/ttm_bo_driver.h|5 - 6 files changed, 23 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 890d50e..9f65371 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -344,7 +344,8 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) } static struct ttm_backend * -nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) +nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev, + struct ttm_buffer_object *bo) { struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev; diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 0b5468b..0bad266 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -114,10 +114,12 @@ static void radeon_ttm_global_fini(struct radeon_device *rdev) } } -struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev); +struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev, + struct ttm_buffer_object *bo); static struct ttm_backend* -radeon_create_ttm_backend_entry(struct ttm_bo_device *bdev) +radeon_create_ttm_backend_entry(struct ttm_bo_device *bdev, + struct ttm_buffer_object *bo) { struct radeon_device *rdev; @@ -128,7 +130,7 @@ radeon_create_ttm_backend_entry(struct ttm_bo_device *bdev) } else #endif { - return radeon_ttm_backend_create(rdev); + return radeon_ttm_backend_create(rdev, bo); } } @@ -778,7 +780,8 @@ static struct ttm_backend_func radeon_backend_func = { .destroy = &radeon_ttm_backend_destroy, }; -struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev) +struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev, + struct ttm_buffer_object *bo) { struct radeon_ttm_backend *gtt; diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index ef06194..fe957e7 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -337,13 +337,13 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (zero_alloc) page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC; case ttm_bo_type_kernel: - bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, + bo->ttm = ttm_tt_create(bdev, bo, bo->num_pages << PAGE_SHIFT, page_flags, glob->dummy_read_page); if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; break; case ttm_bo_type_user: - bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, + bo->ttm = ttm_tt_create(bdev, bo, bo->num_pages << PAGE_SHIFT, page_flags | TTM_PAGE_FLAG_USER, glob->dummy_read_page); if (unlikely(bo->ttm == NULL)) { diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58c271e..202e16e 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -379,8 +379,11 @@ int ttm_tt_set_user(struct ttm_tt *ttm, return 0; } -struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, -uint32_t page_flags, struct page *dummy_read_page) +struct ttm_tt *ttm_tt_create(struct ttm_bo_device 
*bdev, +struct ttm_buffer_object *bo, +unsigned long size, +uint32_t page_flags, +struct page *dummy_read_page) { struct ttm_bo_driver *bo_driver = bdev->driver; struct ttm_tt *ttm; @@ -407,7 +410,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, printk(KERN_ERR TTM_PFX "Failed allocating page table\n"); return NULL; } - ttm->be = bo_driver->create_ttm_backend_entry(bdev); + ttm->be = bo_driver->create_ttm_backend_entry(bdev, bo); if (!ttm->be) { ttm_tt_destroy(ttm); printk(KERN_ERR
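The message is truncated above, but the point of threading the bo through is visible in the signature change alone: a driver with more than one GART can now pick the page table per buffer object instead of assuming a single global one. A purely hypothetical use (every name below except the ttm types is made up):

    /* sketch: per-bo GART selection enabled by the new bo argument */
    static struct ttm_backend *
    example_create_ttm_backend_entry(struct ttm_bo_device *bdev,
                                     struct ttm_buffer_object *bo)
    {
            struct example_device *edev = example_bdev_to_dev(bdev);

            /* hypothetical: buffers owned by a VM bind into that VM's GART */
            if (example_bo_to_vm(bo))
                    return example_vm_backend_create(edev, example_bo_to_vm(bo));
            return example_global_backend_create(edev);
    }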
[PATCH] drm/radeon/kms: consolidate GART code, fix memory fault after GPU lockup
From: Jerome Glisse After GPU lockup VRAM gart table is unpinned and thus its pointer becomes unvalid. This patch move the unpin code to a common helper function and set pointer to NULL so that page update code can check if it should update GPU page table or not. That way bo still bound to GART can be unbound (pci_unmap_page for all there page) properly while there is no need to update the GPU page table. Signed-off-by: Jerome Glisse cc: sta...@kernel.org --- drivers/gpu/drm/radeon/evergreen.c | 12 +- drivers/gpu/drm/radeon/ni.c | 13 +-- drivers/gpu/drm/radeon/r100.c|6 ++- drivers/gpu/drm/radeon/r300.c| 16 ++-- drivers/gpu/drm/radeon/r600.c| 17 +++-- drivers/gpu/drm/radeon/radeon.h | 22 +++- drivers/gpu/drm/radeon/radeon_gart.c | 66 - drivers/gpu/drm/radeon/rs400.c |5 ++- drivers/gpu/drm/radeon/rs600.c | 16 ++-- drivers/gpu/drm/radeon/rv770.c | 13 ++- 10 files changed, 72 insertions(+), 114 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index c4ffa14f..fe5cf3e 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -893,7 +893,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev) u32 tmp; int r; - if (rdev->gart.table.vram.robj == NULL) { + if (rdev->gart.robj == NULL) { dev_err(rdev->dev, "No VRAM object for PCIE GART.\n"); return -EINVAL; } @@ -942,7 +942,6 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev) void evergreen_pcie_gart_disable(struct radeon_device *rdev) { u32 tmp; - int r; /* Disable all tables */ WREG32(VM_CONTEXT0_CNTL, 0); @@ -962,14 +961,7 @@ void evergreen_pcie_gart_disable(struct radeon_device *rdev) WREG32(MC_VM_MB_L1_TLB1_CNTL, tmp); WREG32(MC_VM_MB_L1_TLB2_CNTL, tmp); WREG32(MC_VM_MB_L1_TLB3_CNTL, tmp); - if (rdev->gart.table.vram.robj) { - r = radeon_bo_reserve(rdev->gart.table.vram.robj, false); - if (likely(r == 0)) { - radeon_bo_kunmap(rdev->gart.table.vram.robj); - radeon_bo_unpin(rdev->gart.table.vram.robj); - radeon_bo_unreserve(rdev->gart.table.vram.robj); - } - } + radeon_gart_table_vram_unpin(rdev); } void evergreen_pcie_gart_fini(struct radeon_device *rdev) diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 8c79ca9..529aaee 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -931,7 +931,7 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev) { int r; - if (rdev->gart.table.vram.robj == NULL) { + if (rdev->gart.robj == NULL) { dev_err(rdev->dev, "No VRAM object for PCIE GART.\n"); return -EINVAL; } @@ -973,8 +973,6 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev) void cayman_pcie_gart_disable(struct radeon_device *rdev) { - int r; - /* Disable all tables */ WREG32(VM_CONTEXT0_CNTL, 0); WREG32(VM_CONTEXT1_CNTL, 0); @@ -990,14 +988,7 @@ void cayman_pcie_gart_disable(struct radeon_device *rdev) WREG32(VM_L2_CNTL2, 0); WREG32(VM_L2_CNTL3, L2_CACHE_BIGK_ASSOCIATIVITY | L2_CACHE_BIGK_FRAGMENT_SIZE(6)); - if (rdev->gart.table.vram.robj) { - r = radeon_bo_reserve(rdev->gart.table.vram.robj, false); - if (likely(r == 0)) { - radeon_bo_kunmap(rdev->gart.table.vram.robj); - radeon_bo_unpin(rdev->gart.table.vram.robj); - radeon_bo_unreserve(rdev->gart.table.vram.robj); - } - } + radeon_gart_table_vram_unpin(rdev); } void cayman_pcie_gart_fini(struct radeon_device *rdev) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 7fcdbbb..8ad6769 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -474,7 +474,7 @@ int r100_pci_gart_init(struct 
radeon_device *rdev) { int r; - if (rdev->gart.table.ram.ptr) { + if (rdev->gart.ptr) { WARN(1, "R100 PCI GART already initialized\n"); return 0; } @@ -530,10 +530,12 @@ void r100_pci_gart_disable(struct radeon_device *rdev) int r100_pci_gart_set_page(struct radeon_device *rdev, int i, uint64_t addr) { + u32 *gtt = rdev->gart.ptr; + if (i < 0 || i > rdev->gart.num_gpu_pages) { return -EINVAL; } - rdev->gart.table.ram.ptr[i] = cpu_to_le32(lower_32_bits(addr)); + gtt[i] = cpu_to_le32(lower_32_bits(addr)); return 0; } diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c index 55a7f19..6c62d88 100644 --- a/drivers
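The radeon_gart.c half of this patch is truncated above, but from the call sites the shared helper plausibly reads as follows; the NULLing of the CPU pointer is the piece the changelog leans on, since the page-update path can then check it and skip writing a GPU table that is no longer pinned (a reconstruction, not a quote of the missing hunk):

    /* sketch: common unpin path for a VRAM-backed GART table */
    void radeon_gart_table_vram_unpin(struct radeon_device *rdev)
    {
            int r;

            if (rdev->gart.robj == NULL)
                    return;
            r = radeon_bo_reserve(rdev->gart.robj, false);
            if (likely(r == 0)) {
                    radeon_bo_kunmap(rdev->gart.robj);
                    radeon_bo_unpin(rdev->gart.robj);
                    radeon_bo_unreserve(rdev->gart.robj);
                    rdev->gart.ptr = NULL; /* page updates skip the GPU table */
            }
    }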
[PATCH] drm/radeon: avoid bouncing connector status btw disconnected & unknown
From: Jerome Glisse

Since the force-handling rework of d0d0a225e6ad43314c9aa7ea081f76adc5098ad4
we could end up bouncing connector status between disconnected and unknown.
When the connector status changes, a call to output_poll_changed happens,
which in turn asks for detection again, this time with force set. So set the
load-detect flag whenever we report the connector as connected or unknown;
this avoids bouncing between disconnected and unknown.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon_connectors.c | 5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
index dec6cbe..ff6a2e0 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -764,7 +764,7 @@ radeon_vga_detect(struct drm_connector *connector, bool force)
 	if (radeon_connector->dac_load_detect && encoder) {
 		encoder_funcs = encoder->helper_private;
 		ret = encoder_funcs->detect(encoder, connector);
-		if (ret == connector_status_connected)
+		if (ret != connector_status_disconnected)
 			radeon_connector->detected_by_load = true;
 	}
 }
@@ -1005,8 +1005,9 @@ radeon_dvi_detect(struct drm_connector *connector, bool force)
 			ret = encoder_funcs->detect(encoder, connector);
 			if (ret == connector_status_connected) {
 				radeon_connector->use_digital = false;
-				radeon_connector->detected_by_load = true;
 			}
+			if (ret != connector_status_disconnected)
+				radeon_connector->detected_by_load = true;
 		}
 		break;
 	}
-- 
1.7.1
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: flush read cache for gtt with fence on r6xx and newer GPU V2
From: Jerome Glisse Cayman seems to be particularly sensitive to read cache returning old data after bind/unbind to GTT. Flush read cache for GTT range with each fences for all new hw. Should fix several rendering glitches. Like V2 flush whole address space https://bugs.freedesktop.org/show_bug.cgi?id=40221 https://bugs.freedesktop.org/show_bug.cgi?id=38022 https://bugzilla.redhat.com/show_bug.cgi?id=738790 Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen_blit_kms.c |4 ++-- drivers/gpu/drm/radeon/r600.c | 12 drivers/gpu/drm/radeon/r600_blit_kms.c |4 ++-- 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index dcf11bb..e9aeeed 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -613,9 +613,9 @@ int evergreen_blit_init(struct radeon_device *rdev) rdev->r600_blit.primitives.set_default_state = set_default_state; rdev->r600_blit.ring_size_common = 55; /* shaders + def state */ - rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev->r600_blit.ring_size_common += 5; /* done copy */ - rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */ rdev->r600_blit.ring_size_per_loop = 74; diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 12470b0..983808a 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2331,6 +2331,12 @@ void r600_fence_ring_emit(struct radeon_device *rdev, if (rdev->wb.use_event) { u64 addr = rdev->wb.gpu_addr + R600_WB_EVENT_OFFSET + (u64)(rdev->fence_drv.scratch_reg - rdev->scratch.reg_base); + /* flush read cache over gart */ + radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3)); + radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA | PACKET3_VC_ACTION_ENA); + radeon_ring_write(rdev, 0x); + radeon_ring_write(rdev, 0); + radeon_ring_write(rdev, 10); /* poll interval */ /* EVENT_WRITE_EOP - flush caches, send int */ radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE_EOP, 4)); radeon_ring_write(rdev, EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT_TS) | EVENT_INDEX(5)); @@ -2339,6 +2345,12 @@ void r600_fence_ring_emit(struct radeon_device *rdev, radeon_ring_write(rdev, fence->seq); radeon_ring_write(rdev, 0); } else { + /* flush read cache over gart */ + radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3)); + radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA | PACKET3_VC_ACTION_ENA); + radeon_ring_write(rdev, 0x); + radeon_ring_write(rdev, 0); + radeon_ring_write(rdev, 10); /* poll interval */ radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE, 0)); radeon_ring_write(rdev, EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0)); /* wait for 3D idle clean */ diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index c4cf130..36e62f2 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -500,9 +500,9 @@ int r600_blit_init(struct radeon_device *rdev) rdev->r600_blit.primitives.set_default_state = set_default_state; rdev->r600_blit.ring_size_common = 40; /* shaders + def state */ - rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev->r600_blit.ring_size_common += 5; /* done copy */ - rdev->r600_blit.ring_size_common 
+= 10; /* fence emit for done copy */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */ rdev->r600_blit.ring_size_per_loop = 76; /* set_render_target emits 2 extra dwords on rv6xx */ -- 1.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: flush read cache for gtt with fence on r6xx and newer GPU V3
From: Jerome Glisse Cayman seems to be particularly sensitive to read cache returning old data after bind/unbind to GTT. Flush read cache for GTT range with each fences for all new hw. Should fix several rendering glitches. Like V2 flush whole address space V3 also flush shader read cache https://bugs.freedesktop.org/show_bug.cgi?id=40221 https://bugs.freedesktop.org/show_bug.cgi?id=38022 https://bugzilla.redhat.com/show_bug.cgi?id=738790 Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen_blit_kms.c |4 ++-- drivers/gpu/drm/radeon/r600.c | 16 drivers/gpu/drm/radeon/r600_blit_kms.c |4 ++-- 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index dcf11bb..e9aeeed 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -613,9 +613,9 @@ int evergreen_blit_init(struct radeon_device *rdev) rdev->r600_blit.primitives.set_default_state = set_default_state; rdev->r600_blit.ring_size_common = 55; /* shaders + def state */ - rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev->r600_blit.ring_size_common += 5; /* done copy */ - rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */ rdev->r600_blit.ring_size_per_loop = 74; diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 12470b0..1f007ad 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2331,6 +2331,14 @@ void r600_fence_ring_emit(struct radeon_device *rdev, if (rdev->wb.use_event) { u64 addr = rdev->wb.gpu_addr + R600_WB_EVENT_OFFSET + (u64)(rdev->fence_drv.scratch_reg - rdev->scratch.reg_base); + /* flush read cache over gart */ + radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3)); + radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA | + PACKET3_VC_ACTION_ENA | + PACKET3_SH_ACTION_ENA); + radeon_ring_write(rdev, 0x); + radeon_ring_write(rdev, 0); + radeon_ring_write(rdev, 10); /* poll interval */ /* EVENT_WRITE_EOP - flush caches, send int */ radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE_EOP, 4)); radeon_ring_write(rdev, EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT_TS) | EVENT_INDEX(5)); @@ -2339,6 +2347,14 @@ void r600_fence_ring_emit(struct radeon_device *rdev, radeon_ring_write(rdev, fence->seq); radeon_ring_write(rdev, 0); } else { + /* flush read cache over gart */ + radeon_ring_write(rdev, PACKET3(PACKET3_SURFACE_SYNC, 3)); + radeon_ring_write(rdev, PACKET3_TC_ACTION_ENA | + PACKET3_VC_ACTION_ENA | + PACKET3_SH_ACTION_ENA); + radeon_ring_write(rdev, 0x); + radeon_ring_write(rdev, 0); + radeon_ring_write(rdev, 10); /* poll interval */ radeon_ring_write(rdev, PACKET3(PACKET3_EVENT_WRITE, 0)); radeon_ring_write(rdev, EVENT_TYPE(CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0)); /* wait for 3D idle clean */ diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index c4cf130..36e62f2 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -500,9 +500,9 @@ int r600_blit_init(struct radeon_device *rdev) rdev->r600_blit.primitives.set_default_state = set_default_state; rdev->r600_blit.ring_size_common = 40; /* shaders + def state */ - rdev->r600_blit.ring_size_common += 10; /* fence emit for VB IB */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */ 
rdev->r600_blit.ring_size_common += 5; /* done copy */ - rdev->r600_blit.ring_size_common += 10; /* fence emit for done copy */ + rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */ rdev->r600_blit.ring_size_per_loop = 76; /* set_render_target emits 2 extra dwords on rv6xx */ -- 1.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: set hpd polarity at init time so hotplug detect works
From: Jerome Glisse

Polarity needs to be set according to the connector status (connected or
disconnected). Set it up at module init so the first hotplug works reliably
no matter what the initial set of connectors is.

Signed-off-by: Jerome Glisse
cc: sta...@kernel.org
---
 drivers/gpu/drm/radeon/radeon_connectors.c | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
index dec6cbe..bfdd48b 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -1789,6 +1789,7 @@ radeon_add_atom_connector(struct drm_device *dev,
 		connector->polled = DRM_CONNECTOR_POLL_CONNECT;
 	} else
 		connector->polled = DRM_CONNECTOR_POLL_HPD;
+	radeon_hpd_set_polarity(rdev, radeon_connector->hpd.hpd);
 
 	connector->display_info.subpixel_order = subpixel_order;
 	drm_sysfs_connector_add(connector);
-- 
1.7.6.4
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[RFC] ttm merge ttm_backend & ttm_t V2
Hi,

Attached is the last batch of patches. I split out the ttm put-page fix, and I fixed a bug in the page allocation when the zero-alloc flag wasn't set. I tested them on a bunch of radeons and everything seems fine (several GL apps, firefox, compositor ...). I will do more testing on AGP and nouveau tomorrow.

The last patch adds callbacks for populating and unpopulating (better name welcome, if any) a ttm_tt, allowing the driver to choose between different allocators; the idea is that Konrad's DMA allocator would provide helper functions the driver can call. I chose to allocate all pages at once because ttm_tt objects are meant to be bound, and thus fully populated, for their lifetime (vmwgfx might be different in this regard). It simplifies code in several places. I didn't see any performance impact in the few GL benchmarks I ran.

Konrad, I am planning on rebasing the last 4 patches of your patchset on top of this. They will likely shrink in size a bit.

Cheers,
Jerome Glisse
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/8] drm/ttm: remove userspace backed ttm object support
From: Jerome Glisse This was never use in none of the driver, properly using userspace page for bo would need more code (vma interaction mostly). Removing this dead code in preparation of ttm_tt & backend merge. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_bo.c| 22 drivers/gpu/drm/ttm/ttm_tt.c| 105 +-- include/drm/ttm/ttm_bo_api.h|5 -- include/drm/ttm/ttm_bo_driver.h | 24 - 4 files changed, 1 insertions(+), 155 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 617b646..4bde335 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; break; - case ttm_bo_type_user: - bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, - page_flags | TTM_PAGE_FLAG_USER, - glob->dummy_read_page); - if (unlikely(bo->ttm == NULL)) { - ret = -ENOMEM; - break; - } - - ret = ttm_tt_set_user(bo->ttm, current, - bo->buffer_start, bo->num_pages); - if (unlikely(ret != 0)) { - ttm_tt_destroy(bo->ttm); - bo->ttm = NULL; - } - break; default: printk(KERN_ERR TTM_PFX "Illegal buffer object type\n"); ret = -EINVAL; @@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man, } static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man, -bool disallow_fixed, uint32_t mem_type, uint32_t proposed_placement, uint32_t *masked_placement) { uint32_t cur_flags = ttm_bo_type_flags(mem_type); - if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed) - return false; - if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0) return false; @@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, man = &bdev->man[mem_type]; type_ok = ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->placement[i], &cur_flags); @@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (!man->has_type) continue; if (!ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->busy_placement[i], &cur_flags)) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58c271e..82a1161 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm) ttm->dma_address = NULL; } -static void ttm_tt_free_user_pages(struct ttm_tt *ttm) -{ - int write; - int dirty; - struct page *page; - int i; - struct ttm_backend *be = ttm->be; - - BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER)); - write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0); - dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0); - - if (be) - be->func->clear(be); - - for (i = 0; i < ttm->num_pages; ++i) { - page = ttm->pages[i]; - if (page == NULL) - continue; - - if (page == ttm->dummy_read_page) { - BUG_ON(write); - continue; - } - - if (write && dirty && !PageReserved(page)) - set_page_dirty_lock(page); - - ttm->pages[i] = NULL; - ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE); - put_page(page); - } - ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; -} - static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) { struct page *p; @@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm) } if (likely(ttm->pages != NULL)) { - if (ttm->page_flags & TTM_PAGE_FLAG_USER) -
[PATCH 2/8] drm/ttm: remove split btw highmen and lowmem page
From: Jerome Glisse Split btw highmem and lowmem page was rendered useless by the pool code. Remove it. Note further cleanup would change the ttm page allocation helper to actualy take an array instead of relying on list this could drasticly reduce the number of function call in the common case of allocation whole buffer. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_tt.c| 11 ++- include/drm/ttm/ttm_bo_driver.h |7 --- 2 files changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 82a1161..8b7a6d0 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) struct ttm_mem_global *mem_glob = ttm->glob->mem_glob; int ret; - while (NULL == (p = ttm->pages[index])) { + if (NULL == (p = ttm->pages[index])) { INIT_LIST_HEAD(&h); @@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) if (unlikely(ret != 0)) goto out_err; - if (PageHighMem(p)) - ttm->pages[--ttm->first_himem_page] = p; - else - ttm->pages[++ttm->last_lomem_page] = p; + ttm->pages[index] = p; } return p; out_err: @@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state, ttm->dma_address); ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; } void ttm_tt_destroy(struct ttm_tt *ttm) @@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->glob = bdev->glob; ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; ttm->caching_state = tt_cached; ttm->page_flags = page_flags; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 37527d6..9da182b 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -136,11 +136,6 @@ enum ttm_caching_state { * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL * pointer. * @pages: Array of pages backing the data. - * @first_himem_page: Himem pages are put last in the page array, which - * enables us to run caching attribute changes on only the first part - * of the page array containing lomem pages. This is the index of the - * first himem page. - * @last_lomem_page: Index of the last lomem page in the page array. * @num_pages: Number of pages in the page array. * @bdev: Pointer to the current struct ttm_bo_device. * @be: Pointer to the ttm backend. @@ -157,8 +152,6 @@ enum ttm_caching_state { struct ttm_tt { struct page *dummy_read_page; struct page **pages; - long first_himem_page; - long last_lomem_page; uint32_t page_flags; unsigned long num_pages; struct ttm_bo_global *glob; -- 1.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 3/8] drm/ttm: remove unused backend flags field
From: Jerome Glisse

This field is not used by any of the drivers; just drop it.

Signed-off-by: Jerome Glisse
Reviewed-by: Konrad Rzeszutek Wilk
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 1 -
 include/drm/ttm/ttm_bo_driver.h     | 2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0b5468b..97c76ae 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev)
 		return NULL;
 	}
 	gtt->backend.bdev = &rdev->mman.bdev;
-	gtt->backend.flags = 0;
 	gtt->backend.func = &radeon_backend_func;
 	gtt->rdev = rdev;
 	gtt->pages = NULL;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 9da182b..6d17140 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -106,7 +106,6 @@ struct ttm_backend_func {
  * struct ttm_backend
  *
  * @bdev: Pointer to a struct ttm_bo_device.
- * @flags: For driver use.
  * @func: Pointer to a struct ttm_backend_func that describes
  * the backend methods.
  *
@@ -114,7 +113,6 @@ struct ttm_backend_func {
 struct ttm_backend {
 	struct ttm_bo_device *bdev;
-	uint32_t flags;
 	struct ttm_backend_func *func;
 };
 
-- 
1.7.1
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 4/8] drm/ttm: use ttm put pages function to properly restore cache attribute
From: Jerome Glisse

On failure we need to make sure the page we free has the wb cache attribute
restored. Do this by calling the proper ttm page helper function.

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/ttm/ttm_tt.c | 5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8b7a6d0..3fb4c6d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index)
 	}
 	return p;
 out_err:
-	put_page(p);
+	INIT_LIST_HEAD(&h);
+	list_add(&p->lru, &h);
+	ttm_put_pages(&h, 1, ttm->page_flags,
+		      ttm->caching_state, &ttm->dma_address[index]);
 	return NULL;
 }
-- 
1.7.1
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 5/8] drm/ttm: convert page allocation to use page ptr array instead of list V2
From: Jerome Glisse Use the ttm_tt page ptr array for page allocation, move the list to array unwinding into the page allocation functions. V2 split the fix to use ttm put page as a separate fix properly fill pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set Signed-off-by: Jerome Glisse --- drivers/gpu/drm/ttm/ttm_memory.c | 44 + drivers/gpu/drm/ttm/ttm_page_alloc.c | 70 +++--- drivers/gpu/drm/ttm/ttm_tt.c | 61 ++ include/drm/ttm/ttm_memory.h | 11 +++-- include/drm/ttm/ttm_page_alloc.h | 17 5 files changed, 101 insertions(+), 102 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c index e70ddd8..3a3a58b 100644 --- a/drivers/gpu/drm/ttm/ttm_memory.c +++ b/drivers/gpu/drm/ttm/ttm_memory.c @@ -543,41 +543,53 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob, uint64_t memory, } EXPORT_SYMBOL(ttm_mem_global_alloc); -int ttm_mem_global_alloc_page(struct ttm_mem_global *glob, - struct page *page, - bool no_wait, bool interruptible) +int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob, + struct page **pages, + unsigned npages, + bool no_wait, bool interruptible) { struct ttm_mem_zone *zone = NULL; + unsigned i; + int r; /** * Page allocations may be registed in a single zone * only if highmem or !dma32. */ - + for (i = 0; i < npages; i++) { #ifdef CONFIG_HIGHMEM - if (PageHighMem(page) && glob->zone_highmem != NULL) - zone = glob->zone_highmem; + if (PageHighMem(pages[i]) && glob->zone_highmem != NULL) + zone = glob->zone_highmem; #else - if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL) - zone = glob->zone_kernel; + if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL) + zone = glob->zone_kernel; #endif - return ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait, -interruptible); + r = ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait, + interruptible); + if (r) { + return r; + } + } + return 0; } -void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page) +void ttm_mem_global_free_pages(struct ttm_mem_global *glob, + struct page **pages, unsigned npages) { struct ttm_mem_zone *zone = NULL; + unsigned i; + for (i = 0; i < npages; i++) { #ifdef CONFIG_HIGHMEM - if (PageHighMem(page) && glob->zone_highmem != NULL) - zone = glob->zone_highmem; + if (PageHighMem(pages[i]) && glob->zone_highmem != NULL) + zone = glob->zone_highmem; #else - if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL) - zone = glob->zone_kernel; + if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL) + zone = glob->zone_kernel; #endif - ttm_mem_global_free_zone(glob, zone, PAGE_SIZE); + ttm_mem_global_free_zone(glob, zone, PAGE_SIZE); + } } diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 727e93d..e94ff12 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool *pool, * @return count of pages still required to fulfill the request. */ static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool, - struct list_head *pages, int ttm_flags, - enum ttm_caching_state cstate, unsigned count) + struct list_head *pages, + int ttm_flags, + enum ttm_caching_state cstate, + unsigned count) { unsigned long irq_flags; struct list_head *p; @@ -664,13 +666,14 @@ out: * On success pages list will hold count number of correctly * cached pages. 
*/ -int ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, - dma_addr_t *dma_address) +int ttm_get_pages(struct page **pages, unsigned npages, int flags, + enum ttm_caching_state cstate, dma_addr_t *dma_address) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); struct page *p = NULL; + struct list_head plist; gfp_t gfp_flags = GFP_USER; + unsigned count = 0; int r; /*
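The function body is cut off above, but the API shift is fully visible in the new prototype: callers pass the destination page array and a count, and the list-to-array unwinding that every caller used to do moves inside the allocator. From the ttm_tt side the whole-object allocation then collapses to roughly (a sketch of the calling convention, not a quote of the truncated hunk):

    /* sketch: array-based allocation of a whole ttm_tt in one call */
    r = ttm_get_pages(ttm->pages, ttm->num_pages, ttm->page_flags,
                      ttm->caching_state, ttm->dma_address);
    if (r)
            return r; /* on success every pages[] slot is filled */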
[PATCH 6/8] drm/ttm: test for dma_address array allocation failure
From: Jerome Glisse

Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/ttm/ttm_tt.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 2dd45ca..58ea7dc 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -298,7 +298,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size,
 	ttm->dummy_read_page = dummy_read_page;
 
 	ttm_tt_alloc_page_directory(ttm);
-	if (!ttm->pages) {
+	if (!ttm->pages || !ttm->dma_address) {
 		ttm_tt_destroy(ttm);
 		printk(KERN_ERR TTM_PFX "Failed allocating page table\n");
 		return NULL;
-- 
1.7.1
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 7/8] drm/ttm: merge ttm_backend and ttm_tt
From: Jerome Glisse ttm_backend will exist only and only with a ttm_tt, and ttm_tt will be of interesting use only when bind to a backend. Thus to avoid code & data duplication btw the two merge them. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c| 14 ++- drivers/gpu/drm/nouveau/nouveau_drv.h |5 +- drivers/gpu/drm/nouveau/nouveau_sgdma.c | 188 -- drivers/gpu/drm/radeon/radeon_ttm.c | 222 --- drivers/gpu/drm/ttm/ttm_agp_backend.c | 88 + drivers/gpu/drm/ttm/ttm_bo.c|9 +- drivers/gpu/drm/ttm/ttm_tt.c| 59 ++--- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 66 +++-- include/drm/ttm/ttm_bo_driver.h | 104 ++- 9 files changed, 295 insertions(+), 460 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7226f41..b060fa4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) *mem = val; } -static struct ttm_backend * -nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) +static struct ttm_tt * +nouveau_ttm_tt_create(struct ttm_bo_device *bdev, + unsigned long size, uint32_t page_flags, + struct page *dummy_read_page) { struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev; @@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) switch (dev_priv->gart_info.type) { #if __OS_HAS_AGP case NOUVEAU_GART_AGP: - return ttm_agp_backend_init(bdev, dev->agp->bridge); + return ttm_agp_tt_create(bdev, dev->agp->bridge, +size, page_flags, dummy_read_page); #endif case NOUVEAU_GART_PDMA: case NOUVEAU_GART_HW: - return nouveau_sgdma_init_ttm(dev); + return nouveau_sgdma_create_ttm(bdev, size, page_flags, + dummy_read_page); default: NV_ERROR(dev, "Unknown GART type %d\n", dev_priv->gart_info.type); @@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) } struct ttm_bo_driver nouveau_bo_driver = { - .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry, + .ttm_tt_create = &nouveau_ttm_tt_create, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 29837da..0c53e39 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *); extern void nouveau_sgdma_takedown(struct drm_device *); extern uint32_t nouveau_sgdma_get_physical(struct drm_device *, uint32_t offset); -extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *); +extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev, + unsigned long size, + uint32_t page_flags, + struct page *dummy_read_page); /* nouveau_debugfs.c */ #if defined(CONFIG_DRM_NOUVEAU_DEBUG) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index b75258a..bc2ab90 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -8,44 +8,23 @@ #define NV_CTXDMA_PAGE_MASK (NV_CTXDMA_PAGE_SIZE - 1) struct nouveau_sgdma_be { - struct ttm_backend backend; + struct ttm_tt ttm; struct drm_device *dev; - - dma_addr_t *pages; - unsigned nr_pages; - bool unmap_pages; - u64 offset; - bool bound; }; static int -nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages, - struct 
page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs) +nouveau_sgdma_dma_map(struct ttm_tt *ttm) { - struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be; + struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm; struct drm_device *dev = nvbe->dev; int i; - NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages); - - nvbe->pages = dma_addrs; - nvbe->nr_pages = num_pages; - nvbe->unmap_pages = true; - - /* this code path isn't called and is incorrect anyways */ - if (0) { /* dma_addrs[0] != DMA_ERROR_CODE) { */ - nvbe->unmap_pages = fa
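The header side of the merge is truncated above, but its shape follows from the driver churn: struct ttm_backend disappears, its bdev and func pointers fold into struct ttm_tt, and drivers embed ttm_tt in their own state (as nouveau_sgdma_be now does). Trimmed to the fields implied by this diff, the merged struct looks approximately like this (field order and the omitted members are assumptions):

    /* sketch: ttm_tt after the merge -- backend fields folded in */
    struct ttm_tt {
            struct ttm_bo_device *bdev;
            struct ttm_backend_func *func;  /* bind/unbind/destroy ops */
            struct page *dummy_read_page;
            struct page **pages;
            uint32_t page_flags;
            unsigned long num_pages;
            enum ttm_caching_state caching_state;
            enum {
                    tt_bound,
                    tt_unbound,
                    tt_unpopulated,
            } state;
    };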
[PATCH] drm/radeon/kms: consolidate GART code, fix segfault after GPU lockup V2
From: Jerome Glisse After a GPU lockup the VRAM GART table is unpinned and thus its pointer becomes invalid. This patch moves the unpin code to a common helper function and sets the pointer to NULL so that the page update code can check whether it should update the GPU page table or not. That way BOs still bound to the GART can be unbound properly (pci_unmap_page for all their pages) while the GPU page table update is skipped. V2: move the test for a NULL GART pointer out of the loop, a small optimization. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen.c | 12 +- drivers/gpu/drm/radeon/ni.c | 13 +- drivers/gpu/drm/radeon/r100.c|6 ++- drivers/gpu/drm/radeon/r300.c| 16 ++-- drivers/gpu/drm/radeon/r600.c| 17 ++-- drivers/gpu/drm/radeon/radeon.h | 22 ++ drivers/gpu/drm/radeon/radeon_gart.c | 71 - drivers/gpu/drm/radeon/rs400.c |5 +- drivers/gpu/drm/radeon/rs600.c | 16 ++-- drivers/gpu/drm/radeon/rv770.c | 13 +- 10 files changed, 75 insertions(+), 116 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index ed406e8..ebd2092 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -893,7 +893,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev) u32 tmp; int r; - if (rdev->gart.table.vram.robj == NULL) { + if (rdev->gart.robj == NULL) { dev_err(rdev->dev, "No VRAM object for PCIE GART.\n"); return -EINVAL; } @@ -945,7 +945,6 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev) void evergreen_pcie_gart_disable(struct radeon_device *rdev) { u32 tmp; - int r; /* Disable all tables */ WREG32(VM_CONTEXT0_CNTL, 0); @@ -965,14 +964,7 @@ void evergreen_pcie_gart_disable(struct radeon_device *rdev) WREG32(MC_VM_MB_L1_TLB1_CNTL, tmp); WREG32(MC_VM_MB_L1_TLB2_CNTL, tmp); WREG32(MC_VM_MB_L1_TLB3_CNTL, tmp); - if (rdev->gart.table.vram.robj) { - r = radeon_bo_reserve(rdev->gart.table.vram.robj, false); - if (likely(r == 0)) { - radeon_bo_kunmap(rdev->gart.table.vram.robj); - radeon_bo_unpin(rdev->gart.table.vram.robj); - radeon_bo_unreserve(rdev->gart.table.vram.robj); - } - } + radeon_gart_table_vram_unpin(rdev); } void evergreen_pcie_gart_fini(struct radeon_device *rdev) diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 556b7bc..927af99 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -932,7 +932,7 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev) { int r; - if (rdev->gart.table.vram.robj == NULL) { + if (rdev->gart.robj == NULL) { dev_err(rdev->dev, "No VRAM object for PCIE GART.\n"); return -EINVAL; } @@ -977,8 +977,6 @@ int cayman_pcie_gart_enable(struct radeon_device *rdev) void cayman_pcie_gart_disable(struct radeon_device *rdev) { - int r; - /* Disable all tables */ WREG32(VM_CONTEXT0_CNTL, 0); WREG32(VM_CONTEXT1_CNTL, 0); @@ -994,14 +992,7 @@ void cayman_pcie_gart_disable(struct radeon_device *rdev) WREG32(VM_L2_CNTL2, 0); WREG32(VM_L2_CNTL3, L2_CACHE_BIGK_ASSOCIATIVITY | L2_CACHE_BIGK_FRAGMENT_SIZE(6)); - if (rdev->gart.table.vram.robj) { - r = radeon_bo_reserve(rdev->gart.table.vram.robj, false); - if (likely(r == 0)) { - radeon_bo_kunmap(rdev->gart.table.vram.robj); - radeon_bo_unpin(rdev->gart.table.vram.robj); - radeon_bo_unreserve(rdev->gart.table.vram.robj); - } - } + radeon_gart_table_vram_unpin(rdev); } void cayman_pcie_gart_fini(struct radeon_device *rdev) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 8f8b8fa..00d2fa9 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -576,7
+576,7 @@ int r100_pci_gart_init(struct radeon_device *rdev) { int r; - if (rdev->gart.table.ram.ptr) { + if (rdev->gart.ptr) { WARN(1, "R100 PCI GART already initialized\n"); return 0; } @@ -635,10 +635,12 @@ void r100_pci_gart_disable(struct radeon_device *rdev) int r100_pci_gart_set_page(struct radeon_device *rdev, int i, uint64_t addr) { + u32 *gtt = rdev->gart.ptr; + if (i < 0 || i > rdev->gart.num_gpu_pages) { return -EINVAL; } - rdev->gart.table.ram.ptr[i] = cpu_to_le32(lower_32_bits(addr)); + gtt[i] = cpu_to_le32(lower_32_bits(addr)); return 0; } diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c inde
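The radeon_gart.c hunk that introduces the shared helper is cut off above, so the body below is a reconstruction from the per-ASIC code it replaces plus the commit message (clear the CPU pointer after unpinning so the page-update code can tell the GPU table is gone), not a quote of the patch:

void radeon_gart_table_vram_unpin(struct radeon_device *rdev)
{
	int r;

	if (rdev->gart.robj == NULL)
		return;
	r = radeon_bo_reserve(rdev->gart.robj, false);
	if (likely(r == 0)) {
		radeon_bo_kunmap(rdev->gart.robj);
		radeon_bo_unpin(rdev->gart.robj);
		radeon_bo_unreserve(rdev->gart.robj);
		/* NULL pointer = GPU table unmapped; set_page() checks this */
		rdev->gart.ptr = NULL;
	}
}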
ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator
Hi, Here is the updated patchset; only patch 5 has seen changes since the last set. The last 3 patches are from your patchset, modified on top of mine. Konrad, I added your dma pool allocator on top of that and added support for it to radeon. All in all it's slightly smaller than your patchset. The biggest change is the use of a list_head in ttm_tt to keep the dma_page list inside the ttm_tt object, allowing faster and a lot simpler deallocation of pages (see the sketch right after this mail).

I have only briefly tested this code; it seems OK so far. Did you test booting the kernel with swiotlb=force and with your patchset? Because here it doesn't work. I still don't understand why swiotlb wants to create a bounce page when the supplied page fits the constraints. I need to dig into the kernel history to see if there is any good reason for that. Otherwise I believe this whole patchset makes things cleaner and simpler for ttm. Cheers, Jerome Glisse
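A minimal sketch of the list_head bookkeeping described above, in isolation. The struct and field names (dma_page, page_list, pages_list) are illustrative assumptions, not lifted from the patches; the point is that teardown becomes a single walk of a list owned by the ttm_tt instead of a search through pool-global state:

#include <linux/list.h>
#include <linux/dma-mapping.h>
#include <linux/slab.h>

struct dma_page {
	struct list_head page_list;	/* links this page into its owning ttm_tt */
	void *vaddr;			/* CPU address returned by dma_alloc_coherent() */
	dma_addr_t dma;			/* bus address the GPU is programmed with */
	struct page *p;
};

static void dma_pages_free_all(struct device *dev, struct list_head *pages_list)
{
	struct dma_page *d_page, *tmp;

	/* every page hangs off the ttm_tt's private list: O(n) teardown */
	list_for_each_entry_safe(d_page, tmp, pages_list, page_list) {
		list_del(&d_page->page_list);
		dma_free_coherent(dev, PAGE_SIZE, d_page->vaddr, d_page->dma);
		kfree(d_page);
	}
}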
[PATCH 01/11] drm/ttm: remove userspace backed ttm object support
From: Jerome Glisse This was never used by any of the drivers; properly using userspace pages for a bo would need more code (vma interaction mostly). Remove this dead code in preparation of the ttm_tt & backend merge. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_bo.c| 22 drivers/gpu/drm/ttm/ttm_tt.c| 105 +-- include/drm/ttm/ttm_bo_api.h|5 -- include/drm/ttm/ttm_bo_driver.h | 24 - 4 files changed, 1 insertions(+), 155 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 617b646..4bde335 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; break; - case ttm_bo_type_user: - bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, - page_flags | TTM_PAGE_FLAG_USER, - glob->dummy_read_page); - if (unlikely(bo->ttm == NULL)) { - ret = -ENOMEM; - break; - } - - ret = ttm_tt_set_user(bo->ttm, current, - bo->buffer_start, bo->num_pages); - if (unlikely(ret != 0)) { - ttm_tt_destroy(bo->ttm); - bo->ttm = NULL; - } - break; default: printk(KERN_ERR TTM_PFX "Illegal buffer object type\n"); ret = -EINVAL; @@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man, } static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man, -bool disallow_fixed, uint32_t mem_type, uint32_t proposed_placement, uint32_t *masked_placement) { uint32_t cur_flags = ttm_bo_type_flags(mem_type); - if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed) - return false; - if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0) return false; @@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, man = &bdev->man[mem_type]; type_ok = ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->placement[i], &cur_flags); @@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (!man->has_type) continue; if (!ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->busy_placement[i], &cur_flags)) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58c271e..82a1161 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm) ttm->dma_address = NULL; } -static void ttm_tt_free_user_pages(struct ttm_tt *ttm) -{ - int write; - int dirty; - struct page *page; - int i; - struct ttm_backend *be = ttm->be; - - BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER)); - write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0); - dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0); - - if (be) - be->func->clear(be); - - for (i = 0; i < ttm->num_pages; ++i) { - page = ttm->pages[i]; - if (page == NULL) - continue; - - if (page == ttm->dummy_read_page) { - BUG_ON(write); - continue; - } - - if (write && dirty && !PageReserved(page)) - set_page_dirty_lock(page); - - ttm->pages[i] = NULL; - ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE); - put_page(page); - } - ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; -} - static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) { struct page *p; @@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm) } if (likely(ttm->pages != NULL)) { - if (ttm->page_flags & TTM_PAGE_FLAG_USER) -
[PATCH 02/11] drm/ttm: remove split btw highmen and lowmem page
From: Jerome Glisse The split between highmem and lowmem pages was rendered useless by the pool code. Remove it. Note: further cleanup would change the ttm page allocation helper to actually take an array instead of relying on a list; this could drastically reduce the number of function calls in the common case of allocating a whole buffer. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_tt.c| 11 ++- include/drm/ttm/ttm_bo_driver.h |7 --- 2 files changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 82a1161..8b7a6d0 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) struct ttm_mem_global *mem_glob = ttm->glob->mem_glob; int ret; - while (NULL == (p = ttm->pages[index])) { + if (NULL == (p = ttm->pages[index])) { INIT_LIST_HEAD(&h); @@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) if (unlikely(ret != 0)) goto out_err; - if (PageHighMem(p)) - ttm->pages[--ttm->first_himem_page] = p; - else - ttm->pages[++ttm->last_lomem_page] = p; + ttm->pages[index] = p; } return p; out_err: @@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state, ttm->dma_address); ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; } void ttm_tt_destroy(struct ttm_tt *ttm) @@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->glob = bdev->glob; ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; ttm->caching_state = tt_cached; ttm->page_flags = page_flags; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 37527d6..9da182b 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -136,11 +136,6 @@ enum ttm_caching_state { * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL * pointer. * @pages: Array of pages backing the data. - * @first_himem_page: Himem pages are put last in the page array, which - * enables us to run caching attribute changes on only the first part - * of the page array containing lomem pages. This is the index of the - * first himem page. - * @last_lomem_page: Index of the last lomem page in the page array. * @num_pages: Number of pages in the page array. * @bdev: Pointer to the current struct ttm_bo_device. * @be: Pointer to the ttm backend. @@ -157,8 +152,6 @@ enum ttm_caching_state { struct ttm_tt { struct page *dummy_read_page; struct page **pages; - long first_himem_page; - long last_lomem_page; uint32_t page_flags; unsigned long num_pages; struct ttm_bo_global *glob; -- 1.7.7.1
[PATCH 03/11] drm/ttm: remove unused backend flags field
From: Jerome Glisse This field is not used by any of the drivers, so just drop it. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/radeon/radeon_ttm.c |1 - include/drm/ttm/ttm_bo_driver.h |2 -- 2 files changed, 0 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 0b5468b..97c76ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev) return NULL; } gtt->backend.bdev = &rdev->mman.bdev; - gtt->backend.flags = 0; gtt->backend.func = &radeon_backend_func; gtt->rdev = rdev; gtt->pages = NULL; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 9da182b..6d17140 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -106,7 +106,6 @@ struct ttm_backend_func { * struct ttm_backend * * @bdev: Pointer to a struct ttm_bo_device. - * @flags: For driver use. * @func: Pointer to a struct ttm_backend_func that describes * the backend methods. * @@ -114,7 +113,6 @@ struct ttm_backend_func { struct ttm_backend { struct ttm_bo_device *bdev; - uint32_t flags; struct ttm_backend_func *func; }; -- 1.7.7.1
[PATCH 04/11] drm/ttm: use ttm put pages function to properly restore cache attribute
From: Jerome Glisse On failure we need to make sure the page we free has the wb cache attribute. Do this by calling the proper ttm page helper function. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_tt.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 8b7a6d0..3fb4c6d 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) } return p; out_err: - put_page(p); + INIT_LIST_HEAD(&h); + list_add(&p->lru, &h); + ttm_put_pages(&h, 1, ttm->page_flags, + ttm->caching_state, &ttm->dma_address[index]); return NULL; } -- 1.7.7.1
[PATCH 05/11] drm/ttm: convert page allocation to use page ptr array instead of list V3
From: Jerome Glisse Use the ttm_tt page ptr array for page allocation, move the list to array unwinding into the page allocation functions. V2 split the fix to use ttm put page as a separate fix properly fill pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set V3 Added back page_count()==1 check when freeing page Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_memory.c | 44 +++-- drivers/gpu/drm/ttm/ttm_page_alloc.c | 90 -- drivers/gpu/drm/ttm/ttm_tt.c | 61 --- include/drm/ttm/ttm_memory.h | 11 ++-- include/drm/ttm/ttm_page_alloc.h | 17 +++--- 5 files changed, 115 insertions(+), 108 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c index e70ddd8..3a3a58b 100644 --- a/drivers/gpu/drm/ttm/ttm_memory.c +++ b/drivers/gpu/drm/ttm/ttm_memory.c @@ -543,41 +543,53 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob, uint64_t memory, } EXPORT_SYMBOL(ttm_mem_global_alloc); -int ttm_mem_global_alloc_page(struct ttm_mem_global *glob, - struct page *page, - bool no_wait, bool interruptible) +int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob, + struct page **pages, + unsigned npages, + bool no_wait, bool interruptible) { struct ttm_mem_zone *zone = NULL; + unsigned i; + int r; /** * Page allocations may be registed in a single zone * only if highmem or !dma32. */ - + for (i = 0; i < npages; i++) { #ifdef CONFIG_HIGHMEM - if (PageHighMem(page) && glob->zone_highmem != NULL) - zone = glob->zone_highmem; + if (PageHighMem(pages[i]) && glob->zone_highmem != NULL) + zone = glob->zone_highmem; #else - if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL) - zone = glob->zone_kernel; + if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL) + zone = glob->zone_kernel; #endif - return ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait, -interruptible); + r = ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait, + interruptible); + if (r) { + return r; + } + } + return 0; } -void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page) +void ttm_mem_global_free_pages(struct ttm_mem_global *glob, + struct page **pages, unsigned npages) { struct ttm_mem_zone *zone = NULL; + unsigned i; + for (i = 0; i < npages; i++) { #ifdef CONFIG_HIGHMEM - if (PageHighMem(page) && glob->zone_highmem != NULL) - zone = glob->zone_highmem; + if (PageHighMem(pages[i]) && glob->zone_highmem != NULL) + zone = glob->zone_highmem; #else - if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL) - zone = glob->zone_kernel; + if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL) + zone = glob->zone_kernel; #endif - ttm_mem_global_free_zone(glob, zone, PAGE_SIZE); + ttm_mem_global_free_zone(glob, zone, PAGE_SIZE); + } } diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 727e93d..c4f18b9 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool *pool, * @return count of pages still required to fulfill the request. */ static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool, - struct list_head *pages, int ttm_flags, - enum ttm_caching_state cstate, unsigned count) + struct list_head *pages, + int ttm_flags, + enum ttm_caching_state cstate, + unsigned count) { unsigned long irq_flags; struct list_head *p; @@ -664,13 +666,14 @@ out: * On success pages list will hold count number of correctly * cached pages. 
*/ -int ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, - dma_addr_t *dma_address) +int ttm_get_pages(struct page **pages, unsigned npages, int flags, + enum ttm_caching_state cstate, dma_addr_t *dma_address) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); struct page *p = NULL; + struct list_head plist; gfp_t g
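Even with the tail of this mail truncated, the new signature shows the shape of the conversion; a rough caller-side sketch (error handling simplified - per the commit message, the list-to-array unwinding now lives inside the allocation functions themselves):

	/* before: splice pages onto a list_head, then copy them into ttm->pages */
	/* after: the ttm->pages array is the interface */
	r = ttm_get_pages(ttm->pages, ttm->num_pages, ttm->page_flags,
			  ttm->caching_state, ttm->dma_address);
	if (r)
		return r;	/* any partial allocation was already released */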
[PATCH 06/11] drm/ttm: test for dma_address array allocation failure
From: Jerome Glisse Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_tt.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 2dd45ca..58ea7dc 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -298,7 +298,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->dummy_read_page = dummy_read_page; ttm_tt_alloc_page_directory(ttm); - if (!ttm->pages) { + if (!ttm->pages || !ttm->dma_address) { ttm_tt_destroy(ttm); printk(KERN_ERR TTM_PFX "Failed allocating page table\n"); return NULL; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 07/11] drm/ttm: merge ttm_backend and ttm_tt
From: Jerome Glisse ttm_backend will exist only and only with a ttm_tt, and ttm_tt will be of interesting use only when bind to a backend. Thus to avoid code & data duplication btw the two merge them. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c| 14 ++- drivers/gpu/drm/nouveau/nouveau_drv.h |5 +- drivers/gpu/drm/nouveau/nouveau_sgdma.c | 188 -- drivers/gpu/drm/radeon/radeon_ttm.c | 222 --- drivers/gpu/drm/ttm/ttm_agp_backend.c | 88 + drivers/gpu/drm/ttm/ttm_bo.c|9 +- drivers/gpu/drm/ttm/ttm_tt.c| 59 ++--- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 66 +++-- include/drm/ttm/ttm_bo_driver.h | 104 ++- 9 files changed, 295 insertions(+), 460 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7226f41..b060fa4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) *mem = val; } -static struct ttm_backend * -nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) +static struct ttm_tt * +nouveau_ttm_tt_create(struct ttm_bo_device *bdev, + unsigned long size, uint32_t page_flags, + struct page *dummy_read_page) { struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev; @@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) switch (dev_priv->gart_info.type) { #if __OS_HAS_AGP case NOUVEAU_GART_AGP: - return ttm_agp_backend_init(bdev, dev->agp->bridge); + return ttm_agp_tt_create(bdev, dev->agp->bridge, +size, page_flags, dummy_read_page); #endif case NOUVEAU_GART_PDMA: case NOUVEAU_GART_HW: - return nouveau_sgdma_init_ttm(dev); + return nouveau_sgdma_create_ttm(bdev, size, page_flags, + dummy_read_page); default: NV_ERROR(dev, "Unknown GART type %d\n", dev_priv->gart_info.type); @@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) } struct ttm_bo_driver nouveau_bo_driver = { - .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry, + .ttm_tt_create = &nouveau_ttm_tt_create, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 29837da..0c53e39 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *); extern void nouveau_sgdma_takedown(struct drm_device *); extern uint32_t nouveau_sgdma_get_physical(struct drm_device *, uint32_t offset); -extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *); +extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev, + unsigned long size, + uint32_t page_flags, + struct page *dummy_read_page); /* nouveau_debugfs.c */ #if defined(CONFIG_DRM_NOUVEAU_DEBUG) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index b75258a..bc2ab90 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -8,44 +8,23 @@ #define NV_CTXDMA_PAGE_MASK (NV_CTXDMA_PAGE_SIZE - 1) struct nouveau_sgdma_be { - struct ttm_backend backend; + struct ttm_tt ttm; struct drm_device *dev; - - dma_addr_t *pages; - unsigned nr_pages; - bool unmap_pages; - u64 offset; - bool bound; }; static int -nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages, - struct 
page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs) +nouveau_sgdma_dma_map(struct ttm_tt *ttm) { - struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be; + struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm; struct drm_device *dev = nvbe->dev; int i; - NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages); - - nvbe->pages = dma_addrs; - nvbe->nr_pages = num_pages; - nvbe->unmap_pages = true; - - /* this code path isn't called and is incorrect anyways */ - if (0) { /* dma_addrs[0] != DMA_ERROR_CODE) { */ - nvbe->unmap_pages = fa
[PATCH 08/11] drm/ttm: introduce callback for ttm_tt populate & unpopulate
From: Jerome Glisse Move the page allocation and freeing to driver callback and provide ttm code helper function for those. Most intrusive change, is the fact that we now only fully populate an object this simplify some of code designed around the page fault design. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c |3 + drivers/gpu/drm/radeon/radeon_ttm.c|2 + drivers/gpu/drm/ttm/ttm_bo_util.c | 31 ++- drivers/gpu/drm/ttm/ttm_bo_vm.c| 13 ++-- drivers/gpu/drm/ttm/ttm_page_alloc.c | 42 ++ drivers/gpu/drm/ttm/ttm_tt.c | 97 +++ drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 + include/drm/ttm/ttm_bo_driver.h| 41 -- include/drm/ttm/ttm_page_alloc.h | 18 ++ 9 files changed, 125 insertions(+), 125 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index b060fa4..7e5ca3f 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -28,6 +28,7 @@ */ #include "drmP.h" +#include "ttm/ttm_page_alloc.h" #include "nouveau_drm.h" #include "nouveau_drv.h" @@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) struct ttm_bo_driver nouveau_bo_driver = { .ttm_tt_create = &nouveau_ttm_tt_create, + .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, + .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 53ff62b..490afce 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev, static struct ttm_bo_driver radeon_bo_driver = { .ttm_tt_create = &radeon_ttm_tt_create, + .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, + .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, .invalidate_caches = &radeon_invalidate_caches, .init_mem_type = &radeon_init_mem_type, .evict_flags = &radeon_evict_flags, diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 082fcae..60f204d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src, unsigned long page, pgprot_t prot) { - struct page *d = ttm_tt_get_page(ttm, page); + struct page *d = ttm->pages[page]; void *dst; if (!d) @@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst, unsigned long page, pgprot_t prot) { - struct page *s = ttm_tt_get_page(ttm, page); + struct page *s = ttm->pages[page]; void *src; if (!s) @@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, if (old_iomap == NULL && ttm == NULL) goto out2; + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + goto out1; + } + add = 0; dir = 1; @@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo, { struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot; struct ttm_tt *ttm = bo->ttm; - struct page *d; - int i; + int ret; BUG_ON(!ttm); + + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + return ret; + } + if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) { /* * We're mapping a single page, and the desired @@ -513,18 +525,9 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo, */ map->bo_kmap_type = ttm_bo_map_kmap; - 
map->page = ttm_tt_get_page(ttm, start_page); + map->page = ttm->pages[start_page]; map->virtual = kmap(map->page); } else { - /* -* Populate the part we're mapping; -*/ - for (i = start_page; i < start_page + num_pages; ++i) { - d = ttm_tt_get_page(ttm, i); - if (!d) - return -ENOMEM; - } - /* * We need to use vmap to get the desired page protection * or to make the buffer object look contiguous. diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo
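The populate contract is easiest to see in the hunk repeated above: an object is now backed in full, lazily, the first time any path needs its pages, rather than page by page from the fault handler. Restated as the guard every consumer now carries:

	/* before touching ttm->pages[], make sure the object is fully backed */
	if (ttm->state == tt_unpopulated) {
		ret = ttm->bdev->driver->ttm_tt_populate(ttm);
		if (ret)
			return ret;
	}

Drivers with no special requirements simply wire the callbacks to the helpers exported from ttm_page_alloc.c (ttm_page_alloc_ttm_tt_populate and ttm_page_alloc_ttm_tt_unpopulate), as the nouveau and radeon hunks do.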
[PATCH 09/11] ttm: Provide DMA aware TTM page pool code.
From: Konrad Rzeszutek Wilk In the TTM world the pages for the graphics drivers are kept in three different pools: write combined, uncached, and cached (write-back). When the pages are used by the graphics driver, the graphics adapter programs them in via its built-in MMU (or AGP). The programming requires the virtual address (from the graphics adapter's perspective) and the physical address (either System RAM or the memory on the card), which is obtained using the pci_map_* calls (which do the virtual-to-physical - or bus address - translation). During the graphics application's "life" those pages can be shuffled around, swapped out to disk, moved from the VRAM to System RAM or vice-versa.

This all works with the existing TTM pool code - except when we want to use the software IOTLB (SWIOTLB) code to "map" the physical addresses to the graphics adapter MMU. We end up programming the bounce buffer's physical address instead of the TTM pool memory's and get a non-working driver. There are two solutions: 1) using the DMA API to allocate pages that are screened by the DMA API, or 2) using the pci_sync_* calls to copy the pages from the bounce-buffer and back. This patch fixes the issue by allocating pages using the DMA API. The second is a viable option - but it has performance drawbacks and potential correctness issues - think of the write cache page being bounced (SWIOTLB->TTM): the WC attribute is set on the TTM page and the copy from SWIOTLB does not make it to the TTM page until the page has been recycled in the pool (and used by another application).

The bounce buffer does not get activated often - only in cases where we have a 32-bit capable card and we want to use a page that is allocated above the 4GB limit. The bounce buffer offers the solution of copying the contents of that 4GB page to a location below 4GB and then back when the operation has been completed (or vice-versa). This is done by using the 'pci_sync_*' calls. Note: if you look carefully enough in the existing TTM page pool code you will notice the GFP_DMA32 flag is used - which should guarantee that the provided page is under 4GB. That certainly is the case, except it gets ignored in two cases: - If the user specifies 'swiotlb=force', which bounces _every_ page. - If the user is using a Xen PV Linux guest (which uses the SWIOTLB and where the underlying PFNs aren't necessarily under 4GB).

To not have this extra copying done, the other option is to allocate the pages using the DMA API so that there is no need to map the page and perform the expensive 'pci_sync_*' calls. This DMA API capable TTM pool requires the 'struct device' to properly call the DMA API. It also has to track the virtual and bus address of the page being handed out in case it ends up being swapped out or de-allocated - to make sure it is de-allocated using the proper 'struct device'. Implementation-wise the code keeps two lists: one that is attached to the 'struct device' (via the dev->dma_pools list) and a global one to be used when the 'struct device' is unavailable (think shrinker code). The global list can iterate over all of the 'struct device' entries and their associated dma_pools. The list in dev->dma_pools can only iterate the device's own dma_pools.
[The ASCII diagram here did not survive extraction; per its caption it showed the two pools associated with the device (WC and UC) hanging off the 'struct device' dma_pools list, and the parallel global list containing the 'struct dev' and 'struct dma_pool' entries.] The maximum number of dma pools a device can have is six: write-combined, uncached, and cached; then there are the DMA32 variants, which are: write-combined dma32, uncached dma32, and cached dma32. Currently this code only gets activated when any variant of the SWIOTLB IOMMU code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV with PCI devices). Tested-by: Michel Dänzer [v1: Using swiotlb_nr_tbl instead of swiotlb_enabled] [v2: Major overhaul - added 'inuse_list' to separate used from inuse and reordered the lists to get better performance.] [v3: Added comments/and some logic based on review, Added Jerome tag] [v4: rebase on top of ttm_tt & ttm_backend merge] Reviewed-by: Jerome Glisse Signed-off-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/Makefile |4 + drivers/gpu/drm/ttm/ttm_memory.c |2 + drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1212 ++ include/drm/ttm/ttm_bo_driver.h |2 + include/drm/ttm/ttm_page_alloc.h
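The core of option 1) is easiest to see in isolation. A minimal sketch of what allocating one pool page through the DMA API looks like (the function name is illustrative; the real ttm_page_alloc_dma.c additionally links the page into the per-device and global pool lists described above):

#include <linux/dma-mapping.h>

static int dma_pool_alloc_one(struct device *dev, void **vaddr, dma_addr_t *dma)
{
	/* The DMA API screens the allocation: if SWIOTLB would have had to
	 * bounce this page, we instead get a page the device can really
	 * reach, so the bus address we program into the GPU is always the
	 * real backing store - no bouncing, no pci_sync_* copies. */
	*vaddr = dma_alloc_coherent(dev, PAGE_SIZE, dma, GFP_KERNEL);
	return *vaddr ? 0 : -ENOMEM;
}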
[PATCH 10/11] swiotlb: Expose swiotlb_nr_tlb function to modules
From: Konrad Rzeszutek Wilk Expose swiotlb_nr_tbl() to modules as a mechanism to detect whether SWIOTLB is enabled or not. We also fix the spelling - it was swioltb instead of swiotlb. CC: FUJITA Tomonori [v1: Ripped out swiotlb_enabled] Signed-off-by: Konrad Rzeszutek Wilk --- drivers/xen/swiotlb-xen.c |2 +- include/linux/swiotlb.h |2 +- lib/swiotlb.c |5 +++-- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index c984768..c50fb0b 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose) char *m = NULL; unsigned int repeat = 3; - nr_tbl = swioltb_nr_tbl(); + nr_tbl = swiotlb_nr_tbl(); if (nr_tbl) xen_io_tlb_nslabs = nr_tbl; else { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 445702c..e872526 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -24,7 +24,7 @@ extern int swiotlb_force; extern void swiotlb_init(int verbose); extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); -extern unsigned long swioltb_nr_tbl(void); +extern unsigned long swiotlb_nr_tbl(void); /* * Enumeration for sync targets diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 99093b3..058935e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str) __setup("swiotlb=", setup_io_tlb_npages); /* make io_tlb_overflow tunable too? */ -unsigned long swioltb_nr_tbl(void) +unsigned long swiotlb_nr_tbl(void) { return io_tlb_nslabs; } - +EXPORT_SYMBOL_GPL(swiotlb_nr_tbl); /* Note that this doesn't work with highmem page */ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev, volatile void *address) @@ -321,6 +321,7 @@ void __init swiotlb_free(void) free_bootmem_late(__pa(io_tlb_start), PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT)); } + io_tlb_nslabs = 0; } static int is_swiotlb_buffer(phys_addr_t paddr) -- 1.7.7.1
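On the consumer side the check is then a one-liner; a sketch of the intended use (note the last hunk above also resets io_tlb_nslabs to 0 in swiotlb_free(), so the test correctly reads false once the bounce buffers are torn down):

#include <linux/swiotlb.h>

	if (swiotlb_nr_tbl()) {
		/* software IOTLB is active: bounce-safe allocations needed,
		 * e.g. switch TTM to the DMA-aware pool */
	}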
[PATCH 11/11] drm/radeon/kms: Enable the TTM DMA pool if swiotlb is on
From: Konrad Rzeszutek Wilk With the exception that we do not handle the AGP case. We only deal with PCIe cards such as ATI ES1000 or HD3200 that have been detected to only do DMA up to 32-bits. CC: Dave Airlie CC: Alex Deucher Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h|1 - drivers/gpu/drm/radeon/radeon_device.c |5 ++ drivers/gpu/drm/radeon/radeon_gart.c | 29 +--- drivers/gpu/drm/radeon/radeon_ttm.c| 83 +-- 4 files changed, 83 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e3170c7..63257ba 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -332,7 +332,6 @@ struct radeon_gart { union radeon_gart_table table; struct page **pages; dma_addr_t *pages_addr; - bool*ttm_alloced; boolready; }; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index c33bc91..11f6481 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -767,6 +767,11 @@ int radeon_device_init(struct radeon_device *rdev, rdev->need_dma32 = true; printk(KERN_WARNING "radeon: No suitable DMA available.\n"); } + r = pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits)); + if (r) { + pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32)); + printk(KERN_WARNING "radeon: No coherent DMA available.\n"); + } /* Registers mapping */ /* TODO: block userspace mapping of io register */ diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index fdc3a9a..18f496c 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -149,9 +149,6 @@ void radeon_gart_unbind(struct radeon_device *rdev, unsigned offset, p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { if (rdev->gart.pages[p]) { - if (!rdev->gart.ttm_alloced[p]) - pci_unmap_page(rdev->pdev, rdev->gart.pages_addr[p], - PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); rdev->gart.pages[p] = NULL; rdev->gart.pages_addr[p] = rdev->dummy_page.addr; page_base = rdev->gart.pages_addr[p]; @@ -181,23 +178,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned offset, p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { - /* we reverted the patch using dma_addr in TTM for now but this -* code stops building on alpha so just comment it out for now */ - if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */ - rdev->gart.ttm_alloced[p] = true; - rdev->gart.pages_addr[p] = dma_addr[i]; - } else { - /* we need to support large memory configurations */ - /* assume that unbind have already been call on the range */ - rdev->gart.pages_addr[p] = pci_map_page(rdev->pdev, pagelist[i], - 0, PAGE_SIZE, - PCI_DMA_BIDIRECTIONAL); - if (pci_dma_mapping_error(rdev->pdev, rdev->gart.pages_addr[p])) { - /* FIXME: failed to map page (return -ENOMEM?) 
*/ - radeon_gart_unbind(rdev, offset, pages); - return -ENOMEM; - } - } + rdev->gart.pages_addr[p] = dma_addr[i]; rdev->gart.pages[p] = pagelist[i]; page_base = rdev->gart.pages_addr[p]; for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) { @@ -259,12 +240,6 @@ int radeon_gart_init(struct radeon_device *rdev) radeon_gart_fini(rdev); return -ENOMEM; } - rdev->gart.ttm_alloced = kzalloc(sizeof(bool) * -rdev->gart.num_cpu_pages, GFP_KERNEL); - if (rdev->gart.ttm_alloced == NULL) { - radeon_gart_fini(rdev); - return -ENOMEM; - } /* set GART entry to point to the dummy page by default */ for (i = 0; i < rdev->gart.num_cpu_pages; i++) { rdev->gart.pages_addr[i] = rdev->dummy_page.addr; @@ -281,10 +256,8 @@ void radeon_gart_fini(struct radeon_device *rdev) rdev->gart.ready = false; kfree(rdev->gart.pages); kfree(rdev->gart.pag
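The radeon_ttm.c half of this diff is truncated above, so the following is an assumption about its shape rather than a quote: the natural wiring is a populate callback that selects the DMA pool only when SWIOTLB is active and otherwise falls back to the plain page allocator. Roughly:

static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
{
	struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);

#ifdef CONFIG_SWIOTLB
	if (swiotlb_nr_tbl())
		return ttm_dma_populate(ttm, rdev->dev);	/* hypothetical helper name */
#endif
	return ttm_page_alloc_ttm_tt_populate(ttm);
}

Here ttm_dma_populate() stands in for whatever entry point ttm_page_alloc_dma.c exports, radeon_get_rdev() is the accessor used elsewhere in radeon_ttm.c, and the fallback helper is the one added in patch 08.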
ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator [FULL]
Ok, so here is the full patchset, including nouveau support. Ben, could you review it? (If the changes to nouveau in patch 7 are correct, then the other changes to nouveau are more than likely 100% correct :)).

It has been tested on R7XX, EVERGREEN, CAICOS and CAYMAN with SWIOTLB, and also on NV50. I still need to test the AGP configuration, but I am quite confident that there is no regression for PCIE/PCI. Cheers, Jerome
[PATCH 01/12] drm/ttm: remove userspace backed ttm object support
From: Jerome Glisse This was never used by any of the drivers; properly using userspace pages for a bo would need more code (vma interaction mostly). Remove this dead code in preparation of the ttm_tt & backend merge. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_bo.c| 22 drivers/gpu/drm/ttm/ttm_tt.c| 105 +-- include/drm/ttm/ttm_bo_api.h|5 -- include/drm/ttm/ttm_bo_driver.h | 24 - 4 files changed, 1 insertions(+), 155 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 617b646..4bde335 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; break; - case ttm_bo_type_user: - bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, - page_flags | TTM_PAGE_FLAG_USER, - glob->dummy_read_page); - if (unlikely(bo->ttm == NULL)) { - ret = -ENOMEM; - break; - } - - ret = ttm_tt_set_user(bo->ttm, current, - bo->buffer_start, bo->num_pages); - if (unlikely(ret != 0)) { - ttm_tt_destroy(bo->ttm); - bo->ttm = NULL; - } - break; default: printk(KERN_ERR TTM_PFX "Illegal buffer object type\n"); ret = -EINVAL; @@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man, } static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man, -bool disallow_fixed, uint32_t mem_type, uint32_t proposed_placement, uint32_t *masked_placement) { uint32_t cur_flags = ttm_bo_type_flags(mem_type); - if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed) - return false; - if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0) return false; @@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, man = &bdev->man[mem_type]; type_ok = ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->placement[i], &cur_flags); @@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (!man->has_type) continue; if (!ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->busy_placement[i], &cur_flags)) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58c271e..82a1161 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm) ttm->dma_address = NULL; } -static void ttm_tt_free_user_pages(struct ttm_tt *ttm) -{ - int write; - int dirty; - struct page *page; - int i; - struct ttm_backend *be = ttm->be; - - BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER)); - write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0); - dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0); - - if (be) - be->func->clear(be); - - for (i = 0; i < ttm->num_pages; ++i) { - page = ttm->pages[i]; - if (page == NULL) - continue; - - if (page == ttm->dummy_read_page) { - BUG_ON(write); - continue; - } - - if (write && dirty && !PageReserved(page)) - set_page_dirty_lock(page); - - ttm->pages[i] = NULL; - ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE); - put_page(page); - } - ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; -} - static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) { struct page *p; @@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm) } if (likely(ttm->pages != NULL)) { - if (ttm->page_flags & TTM_PAGE_FLAG_USER) -
[PATCH 02/12] drm/ttm: remove split btw highmen and lowmem page
From: Jerome Glisse The split between highmem and lowmem pages was rendered useless by the pool code. Remove it. Note: further cleanup would change the ttm page allocation helper to actually take an array instead of relying on a list; this could drastically reduce the number of function calls in the common case of allocating a whole buffer. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_tt.c| 11 ++- include/drm/ttm/ttm_bo_driver.h |7 --- 2 files changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 82a1161..8b7a6d0 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) struct ttm_mem_global *mem_glob = ttm->glob->mem_glob; int ret; - while (NULL == (p = ttm->pages[index])) { + if (NULL == (p = ttm->pages[index])) { INIT_LIST_HEAD(&h); @@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) if (unlikely(ret != 0)) goto out_err; - if (PageHighMem(p)) - ttm->pages[--ttm->first_himem_page] = p; - else - ttm->pages[++ttm->last_lomem_page] = p; + ttm->pages[index] = p; } return p; out_err: @@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state, ttm->dma_address); ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; } void ttm_tt_destroy(struct ttm_tt *ttm) @@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->glob = bdev->glob; ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; ttm->caching_state = tt_cached; ttm->page_flags = page_flags; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 37527d6..9da182b 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -136,11 +136,6 @@ enum ttm_caching_state { * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL * pointer. * @pages: Array of pages backing the data. - * @first_himem_page: Himem pages are put last in the page array, which - * enables us to run caching attribute changes on only the first part - * of the page array containing lomem pages. This is the index of the - * first himem page. - * @last_lomem_page: Index of the last lomem page in the page array. * @num_pages: Number of pages in the page array. * @bdev: Pointer to the current struct ttm_bo_device. * @be: Pointer to the ttm backend. @@ -157,8 +152,6 @@ enum ttm_caching_state { struct ttm_tt { struct page *dummy_read_page; struct page **pages; - long first_himem_page; - long last_lomem_page; uint32_t page_flags; unsigned long num_pages; struct ttm_bo_global *glob; -- 1.7.7.1
[PATCH 03/12] drm/ttm: remove unused backend flags field
From: Jerome Glisse This field is not used by any of the drivers, so just drop it. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/radeon/radeon_ttm.c |1 - include/drm/ttm/ttm_bo_driver.h |2 -- 2 files changed, 0 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 0b5468b..97c76ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev) return NULL; } gtt->backend.bdev = &rdev->mman.bdev; - gtt->backend.flags = 0; gtt->backend.func = &radeon_backend_func; gtt->rdev = rdev; gtt->pages = NULL; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 9da182b..6d17140 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -106,7 +106,6 @@ struct ttm_backend_func { * struct ttm_backend * * @bdev: Pointer to a struct ttm_bo_device. - * @flags: For driver use. * @func: Pointer to a struct ttm_backend_func that describes * the backend methods. * @@ -114,7 +113,6 @@ struct ttm_backend_func { struct ttm_backend { struct ttm_bo_device *bdev; - uint32_t flags; struct ttm_backend_func *func; }; -- 1.7.7.1
[PATCH 04/12] drm/ttm: use ttm put pages function to properly restore cache attribute
From: Jerome Glisse On failure we need to make sure the page we free has the wb cache attribute. Do this by calling the proper ttm page helper function. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_tt.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 8b7a6d0..3fb4c6d 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) } return p; out_err: - put_page(p); + INIT_LIST_HEAD(&h); + list_add(&p->lru, &h); + ttm_put_pages(&h, 1, ttm->page_flags, + ttm->caching_state, &ttm->dma_address[index]); return NULL; } -- 1.7.7.1
[PATCH 05/12] drm/ttm: convert page allocation to use page ptr array instead of list V3
From: Jerome Glisse Use the ttm_tt page ptr array for page allocation, move the list to array unwinding into the page allocation functions. V2 split the fix to use ttm put page as a separate fix properly fill pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set V3 Added back page_count()==1 check when freeing page Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_memory.c | 44 +++-- drivers/gpu/drm/ttm/ttm_page_alloc.c | 90 -- drivers/gpu/drm/ttm/ttm_tt.c | 61 --- include/drm/ttm/ttm_memory.h | 11 ++-- include/drm/ttm/ttm_page_alloc.h | 17 +++--- 5 files changed, 115 insertions(+), 108 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c index e70ddd8..3a3a58b 100644 --- a/drivers/gpu/drm/ttm/ttm_memory.c +++ b/drivers/gpu/drm/ttm/ttm_memory.c @@ -543,41 +543,53 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob, uint64_t memory, } EXPORT_SYMBOL(ttm_mem_global_alloc); -int ttm_mem_global_alloc_page(struct ttm_mem_global *glob, - struct page *page, - bool no_wait, bool interruptible) +int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob, + struct page **pages, + unsigned npages, + bool no_wait, bool interruptible) { struct ttm_mem_zone *zone = NULL; + unsigned i; + int r; /** * Page allocations may be registed in a single zone * only if highmem or !dma32. */ - + for (i = 0; i < npages; i++) { #ifdef CONFIG_HIGHMEM - if (PageHighMem(page) && glob->zone_highmem != NULL) - zone = glob->zone_highmem; + if (PageHighMem(pages[i]) && glob->zone_highmem != NULL) + zone = glob->zone_highmem; #else - if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL) - zone = glob->zone_kernel; + if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL) + zone = glob->zone_kernel; #endif - return ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait, -interruptible); + r = ttm_mem_global_alloc_zone(glob, zone, PAGE_SIZE, no_wait, + interruptible); + if (r) { + return r; + } + } + return 0; } -void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page) +void ttm_mem_global_free_pages(struct ttm_mem_global *glob, + struct page **pages, unsigned npages) { struct ttm_mem_zone *zone = NULL; + unsigned i; + for (i = 0; i < npages; i++) { #ifdef CONFIG_HIGHMEM - if (PageHighMem(page) && glob->zone_highmem != NULL) - zone = glob->zone_highmem; + if (PageHighMem(pages[i]) && glob->zone_highmem != NULL) + zone = glob->zone_highmem; #else - if (glob->zone_dma32 && page_to_pfn(page) > 0x0010UL) - zone = glob->zone_kernel; + if (glob->zone_dma32 && page_to_pfn(pages[i]) > 0x0010UL) + zone = glob->zone_kernel; #endif - ttm_mem_global_free_zone(glob, zone, PAGE_SIZE); + ttm_mem_global_free_zone(glob, zone, PAGE_SIZE); + } } diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 727e93d..c4f18b9 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool *pool, * @return count of pages still required to fulfill the request. */ static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool, - struct list_head *pages, int ttm_flags, - enum ttm_caching_state cstate, unsigned count) + struct list_head *pages, + int ttm_flags, + enum ttm_caching_state cstate, + unsigned count) { unsigned long irq_flags; struct list_head *p; @@ -664,13 +666,14 @@ out: * On success pages list will hold count number of correctly * cached pages. 
*/ -int ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, - dma_addr_t *dma_address) +int ttm_get_pages(struct page **pages, unsigned npages, int flags, + enum ttm_caching_state cstate, dma_addr_t *dma_address) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); struct page *p = NULL; + struct list_head plist; gfp_t g
[PATCH 06/12] drm/ttm: test for dma_address array allocation failure
From: Jerome Glisse Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/ttm_tt.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 2dd45ca..58ea7dc 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -298,7 +298,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->dummy_read_page = dummy_read_page; ttm_tt_alloc_page_directory(ttm); - if (!ttm->pages) { + if (!ttm->pages || !ttm->dma_address) { ttm_tt_destroy(ttm); printk(KERN_ERR TTM_PFX "Failed allocating page table\n"); return NULL; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 07/12] drm/ttm: merge ttm_backend and ttm_tt
From: Jerome Glisse ttm_backend will exist only and only with a ttm_tt, and ttm_tt will be of interesting use only when bind to a backend. Thus to avoid code & data duplication btw the two merge them. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/nouveau/nouveau_bo.c| 14 ++- drivers/gpu/drm/nouveau/nouveau_drv.h |5 +- drivers/gpu/drm/nouveau/nouveau_sgdma.c | 188 -- drivers/gpu/drm/radeon/radeon_ttm.c | 222 --- drivers/gpu/drm/ttm/ttm_agp_backend.c | 88 + drivers/gpu/drm/ttm/ttm_bo.c|9 +- drivers/gpu/drm/ttm/ttm_tt.c| 59 ++--- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 66 +++-- include/drm/ttm/ttm_bo_driver.h | 104 ++- 9 files changed, 295 insertions(+), 460 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7226f41..b060fa4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) *mem = val; } -static struct ttm_backend * -nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) +static struct ttm_tt * +nouveau_ttm_tt_create(struct ttm_bo_device *bdev, + unsigned long size, uint32_t page_flags, + struct page *dummy_read_page) { struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev; @@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) switch (dev_priv->gart_info.type) { #if __OS_HAS_AGP case NOUVEAU_GART_AGP: - return ttm_agp_backend_init(bdev, dev->agp->bridge); + return ttm_agp_tt_create(bdev, dev->agp->bridge, +size, page_flags, dummy_read_page); #endif case NOUVEAU_GART_PDMA: case NOUVEAU_GART_HW: - return nouveau_sgdma_init_ttm(dev); + return nouveau_sgdma_create_ttm(bdev, size, page_flags, + dummy_read_page); default: NV_ERROR(dev, "Unknown GART type %d\n", dev_priv->gart_info.type); @@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) } struct ttm_bo_driver nouveau_bo_driver = { - .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry, + .ttm_tt_create = &nouveau_ttm_tt_create, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 29837da..0c53e39 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *); extern void nouveau_sgdma_takedown(struct drm_device *); extern uint32_t nouveau_sgdma_get_physical(struct drm_device *, uint32_t offset); -extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *); +extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev, + unsigned long size, + uint32_t page_flags, + struct page *dummy_read_page); /* nouveau_debugfs.c */ #if defined(CONFIG_DRM_NOUVEAU_DEBUG) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index b75258a..bc2ab90 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -8,44 +8,23 @@ #define NV_CTXDMA_PAGE_MASK (NV_CTXDMA_PAGE_SIZE - 1) struct nouveau_sgdma_be { - struct ttm_backend backend; + struct ttm_tt ttm; struct drm_device *dev; - - dma_addr_t *pages; - unsigned nr_pages; - bool unmap_pages; - u64 offset; - bool bound; }; static int -nouveau_sgdma_populate(struct ttm_backend *be, 
unsigned long num_pages, - struct page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs) +nouveau_sgdma_dma_map(struct ttm_tt *ttm) { - struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be; + struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm; struct drm_device *dev = nvbe->dev; int i; - NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages); - - nvbe->pages = dma_addrs; - nvbe->nr_pages = num_pages; - nvbe->unmap_pages = true; - - /* this code path isn't called and is incorrect anyways */ - if (0) { /* dma_addrs[0] != DMA_ERROR_CODE) { */ -
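For readers following the series, here is a minimal sketch of the driver-side pattern this merge establishes: the driver structure embeds struct ttm_tt as its first member (so the core can hand the driver a ttm_tt pointer and the driver can cast back), and the old backend_init entry point becomes a ttm_tt_create callback. The my_* names and the ttm_tt_init() helper are assumptions for illustration, not code quoted from the patch:

/* Sketch only: the embedded-ttm_tt pattern after the merge.
 * my_* identifiers are hypothetical; ttm_tt_init() is assumed to be
 * the core initializer the merged code provides. */
struct my_sgdma_be {
	struct ttm_tt ttm;		/* must stay first so casts work */
	struct drm_device *dev;
	u64 offset;
};

static struct ttm_tt *
my_ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size,
		 uint32_t page_flags, struct page *dummy_read_page)
{
	struct my_sgdma_be *nvbe = kzalloc(sizeof(*nvbe), GFP_KERNEL);

	if (!nvbe)
		return NULL;
	nvbe->ttm.func = &my_sgdma_backend_func;	/* bind/unbind/destroy */
	if (ttm_tt_init(&nvbe->ttm, bdev, size, page_flags, dummy_read_page)) {
		kfree(nvbe);
		return NULL;
	}
	return &nvbe->ttm;
}

The one-allocation layout is the point of the merge: the backend function table hangs off the ttm_tt itself instead of a separately allocated ttm_backend.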
[PATCH 08/12] drm/ttm: introduce callback for ttm_tt populate & unpopulate
From: Jerome Glisse Move the page allocation and freeing to driver callbacks and provide ttm helper functions for those. The most intrusive change is the fact that we now only fully populate an object; this simplifies some of the code designed around the page-fault design. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/nouveau/nouveau_bo.c |3 + drivers/gpu/drm/radeon/radeon_ttm.c|2 + drivers/gpu/drm/ttm/ttm_bo_util.c | 31 ++- drivers/gpu/drm/ttm/ttm_bo_vm.c| 13 ++-- drivers/gpu/drm/ttm/ttm_page_alloc.c | 42 ++ drivers/gpu/drm/ttm/ttm_tt.c | 97 +++ drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 + include/drm/ttm/ttm_bo_driver.h| 41 -- include/drm/ttm/ttm_page_alloc.h | 18 ++ 9 files changed, 125 insertions(+), 125 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index b060fa4..7e5ca3f 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -28,6 +28,7 @@ */ #include "drmP.h" +#include "ttm/ttm_page_alloc.h" #include "nouveau_drm.h" #include "nouveau_drv.h" @@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) struct ttm_bo_driver nouveau_bo_driver = { .ttm_tt_create = &nouveau_ttm_tt_create, + .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, + .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 53ff62b..490afce 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev, static struct ttm_bo_driver radeon_bo_driver = { .ttm_tt_create = &radeon_ttm_tt_create, + .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, + .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, .invalidate_caches = &radeon_invalidate_caches, .init_mem_type = &radeon_init_mem_type, .evict_flags = &radeon_evict_flags, diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 082fcae..60f204d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src, unsigned long page, pgprot_t prot) { - struct page *d = ttm_tt_get_page(ttm, page); + struct page *d = ttm->pages[page]; void *dst; if (!d) @@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst, unsigned long page, pgprot_t prot) { - struct page *s = ttm_tt_get_page(ttm, page); + struct page *s = ttm->pages[page]; void *src; if (!s) @@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, if (old_iomap == NULL && ttm == NULL) goto out2; + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + goto out1; + } + add = 0; dir = 1; @@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo, { struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot; struct ttm_tt *ttm = bo->ttm; - struct page *d; - int i; + int ret; BUG_ON(!ttm); + + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + return ret; + } + if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) { /* * We're mapping a single page, and the desired @@ -513,18 +525,9 @@
map->bo_kmap_type = ttm_bo_map_kmap; - map->page = ttm_tt_get_page(ttm, start_page); + map->page = ttm->pages[start_page]; map->virtual = kmap(map->page); } else { - /* -* Populate the part we're mapping; -*/ - for (i = start_page; i < start_page + num_pages; ++i) { - d = ttm_tt_get_page(ttm, i); - if (!d) - return -ENOMEM; - } - /* * We need to use vmap to get the desired page protection * or to make the buffer object look contiguous. diff --git a/drivers/gpu/drm/ttm/ttm_
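The core of this change is the populate-on-first-use idiom visible in the hunks above: any path that needs backing pages checks ttm->state and asks the driver to populate the whole object in one go. A condensed sketch of that pattern (the helper name is hypothetical; the check itself mirrors the hunks added to ttm_bo_move_memcpy() and ttm_bo_kmap_ttm()):

/* Sketch: the check every consumer of ttm->pages now performs. */
static int my_ensure_populated(struct ttm_tt *ttm)
{
	if (ttm->state == tt_unpopulated)
		return ttm->bdev->driver->ttm_tt_populate(ttm);
	return 0;
}

Because population is all-or-nothing, the old per-page ttm_tt_get_page() calls collapse into plain ttm->pages[i] array accesses.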
[PATCH 09/12] ttm: Provide DMA aware TTM page pool code.
From: Konrad Rzeszutek Wilk

In the TTM world the pages for the graphics drivers are kept in three different pools: write combined, uncached, and cached (write-back). When the pages are used by the graphics driver, the graphics adapter, via its built-in MMU (or AGP), programs these pages in. The programming requires the virtual address (from the graphics adapter's perspective) and the physical address (either System RAM or the memory on the card), which is obtained using the pci_map_* calls (which do the virtual-to-physical, or bus address, translation). During the graphics application's "life" those pages can be shuffled around, swapped out to disk, moved from VRAM to System RAM or vice-versa. This all works with the existing TTM pool code - except when we want to use the software IOTLB (SWIOTLB) code to "map" the physical addresses to the graphics adapter MMU. We end up programming the bounce buffer's physical address instead of the TTM pool memory's and get a non-working driver.

There are two solutions: 1) use the DMA API to allocate pages that are screened by the DMA API, or 2) use the pci_sync_* calls to copy the pages from the bounce buffer and back. This patch fixes the issue by allocating pages using the DMA API. The second is a viable option - but it has performance drawbacks and potential correctness issues - think of a write-combined page being bounced (SWIOTLB->TTM): WC is set on the TTM page, and the copy from SWIOTLB might not make it to the TTM page until the page has been recycled in the pool (and used by another application).

The bounce buffer does not get activated often - only in cases where we have a 32-bit capable card and we want to use a page that is allocated above the 4GB limit. The bounce buffer offers the solution of copying the contents of that above-4GB page to a location below 4GB and then back when the operation has been completed (or vice-versa). This is done by using the 'pci_sync_*' calls.

Note: if you look carefully enough in the existing TTM page pool code you will notice the GFP_DMA32 flag is used - which should guarantee that the provided page is under 4GB. That certainly is the case, except it gets ignored in two situations: - if the user specifies 'swiotlb=force', which bounces _every_ page; - if the user is running a Xen PV Linux guest (which uses the SWIOTLB, and the underlying PFNs aren't necessarily under 4GB).

To avoid this extra copying, the other option is to allocate the pages using the DMA API so that there is no need to map the page and perform the expensive 'pci_sync_*' calls. For this, the DMA-API-capable TTM pool requires the 'struct device' to properly call the DMA API. It also has to track the virtual and bus address of the page being handed out in case it ends up being swapped out or de-allocated - to make sure it is de-allocated using the proper 'struct device'.

Implementation-wise the code keeps two lists: one that is attached to the 'struct device' (via the dev->dma_pools list) and a global one to be used when the 'struct device' is unavailable (think shrinker code). The global list can iterate over all of the 'struct device's and their associated dma_pools; the list in dev->dma_pools can only iterate the device's own dma_pools.
[ASCII diagram elided - two pools associated with the device (WC and UC), and the parallel list containing the 'struct dev' and 'struct dma_pool' entries] The maximum number of dma pools a device can have is six: write-combined, uncached, and cached; then there are the DMA32 variants, which are: write-combined dma32, uncached dma32, and cached dma32. Currently this code only gets activated when any variant of the SWIOTLB IOMMU code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV with PCI devices). Tested-by: Michel Dänzer [v1: Using swiotlb_nr_tbl instead of swiotlb_enabled] [v2: Major overhaul - added 'inuse_list' to separate used from inuse and reordered the lists to get better performance.] [v3: Added comments and some logic based on review, Added Jerome tag] [v4: rebase on top of ttm_tt & ttm_backend merge] Reviewed-by: Jerome Glisse Signed-off-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/Makefile |4 + drivers/gpu/drm/ttm/ttm_memory.c |2 + drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1212 ++ include/drm/ttm/ttm_bo_driver.h |2 + include/drm/ttm/ttm_page_alloc.h
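As a rough illustration of the bookkeeping described above (not code from the patch; the my_* names are made up), each device keeps a small list of pools keyed by caching type, consulted at allocation time, while a parallel global list lets the shrinker walk every pool without a device pointer:

/* Illustrative sketch of the per-device pool lookup the description
 * implies; the real ttm_page_alloc_dma.c is far more complete. */
struct my_dma_pool {
	struct list_head pools;		/* chained on dev->dma_pools */
	int type;			/* WC, UC or cached */
	bool dma32;			/* DMA32 variant of the same type */
	struct device *dev;
	struct list_head free_list;	/* (virtual, dma_addr) page pairs */
};

static struct my_dma_pool *
my_pool_find(struct device *dev, int type, bool dma32)
{
	struct my_dma_pool *pool;

	list_for_each_entry(pool, &dev->dma_pools, pools)
		if (pool->type == type && pool->dma32 == dma32)
			return pool;
	return NULL;	/* caller creates the pool on demand */
}

Keeping the pool on the device is what lets every page be freed through dma_free_coherent() with the same 'struct device' that allocated it.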
[PATCH 10/12] swiotlb: Expose swiotlb_nr_tbl function to modules
From: Konrad Rzeszutek Wilk As a mechanism to detect whether SWIOTLB is enabled or not. We also fix the spelling - it was swioltb instead of swiotlb. CC: FUJITA Tomonori [v1: Ripped out swiotlb_enabled] Signed-off-by: Konrad Rzeszutek Wilk --- drivers/xen/swiotlb-xen.c |2 +- include/linux/swiotlb.h |2 +- lib/swiotlb.c |5 +++-- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index c984768..c50fb0b 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose) char *m = NULL; unsigned int repeat = 3; - nr_tbl = swioltb_nr_tbl(); + nr_tbl = swiotlb_nr_tbl(); if (nr_tbl) xen_io_tlb_nslabs = nr_tbl; else { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 445702c..e872526 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -24,7 +24,7 @@ extern int swiotlb_force; extern void swiotlb_init(int verbose); extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); -extern unsigned long swioltb_nr_tbl(void); +extern unsigned long swiotlb_nr_tbl(void); /* * Enumeration for sync targets diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 99093b3..058935e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str) __setup("swiotlb=", setup_io_tlb_npages); /* make io_tlb_overflow tunable too? */ -unsigned long swioltb_nr_tbl(void) +unsigned long swiotlb_nr_tbl(void) { return io_tlb_nslabs; } - +EXPORT_SYMBOL_GPL(swiotlb_nr_tbl); /* Note that this doesn't work with highmem page */ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev, volatile void *address) @@ -321,6 +321,7 @@ void __init swiotlb_free(void) free_bootmem_late(__pa(io_tlb_start), PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT)); } + io_tlb_nslabs = 0; } static int is_swiotlb_buffer(phys_addr_t paddr) -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
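With the symbol exported, a module can cheaply test whether the bounce-buffer machinery is live; this is exactly the check the radeon and nouveau patches later in the series perform. A hedged sketch (my_need_dma_pool is a hypothetical wrapper):

/* Sketch: deciding whether the DMA-aware pool is needed. A non-zero
 * slab count from swiotlb_nr_tbl() means SWIOTLB is active. */
#include <linux/swiotlb.h>
#include <linux/dma-mapping.h>

static bool my_need_dma_pool(struct device *dev)
{
	return swiotlb_nr_tbl() && dma_get_mask(dev) <= DMA_BIT_MASK(32);
}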
[PATCH 11/12] drm/radeon/kms: Enable the TTM DMA pool if swiotlb is on
From: Konrad Rzeszutek Wilk With the exception that we do not handle the AGP case. We only deal with PCIe cards such as ATI ES1000 or HD3200 that have been detected to only do DMA up to 32-bits. CC: Dave Airlie CC: Alex Deucher Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h|1 - drivers/gpu/drm/radeon/radeon_device.c |5 ++ drivers/gpu/drm/radeon/radeon_gart.c | 29 +--- drivers/gpu/drm/radeon/radeon_ttm.c| 83 +-- 4 files changed, 83 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e3170c7..63257ba 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -332,7 +332,6 @@ struct radeon_gart { union radeon_gart_table table; struct page **pages; dma_addr_t *pages_addr; - bool*ttm_alloced; boolready; }; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index c33bc91..11f6481 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -767,6 +767,11 @@ int radeon_device_init(struct radeon_device *rdev, rdev->need_dma32 = true; printk(KERN_WARNING "radeon: No suitable DMA available.\n"); } + r = pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits)); + if (r) { + pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32)); + printk(KERN_WARNING "radeon: No coherent DMA available.\n"); + } /* Registers mapping */ /* TODO: block userspace mapping of io register */ diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index fdc3a9a..18f496c 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -149,9 +149,6 @@ void radeon_gart_unbind(struct radeon_device *rdev, unsigned offset, p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { if (rdev->gart.pages[p]) { - if (!rdev->gart.ttm_alloced[p]) - pci_unmap_page(rdev->pdev, rdev->gart.pages_addr[p], - PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); rdev->gart.pages[p] = NULL; rdev->gart.pages_addr[p] = rdev->dummy_page.addr; page_base = rdev->gart.pages_addr[p]; @@ -181,23 +178,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned offset, p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { - /* we reverted the patch using dma_addr in TTM for now but this -* code stops building on alpha so just comment it out for now */ - if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */ - rdev->gart.ttm_alloced[p] = true; - rdev->gart.pages_addr[p] = dma_addr[i]; - } else { - /* we need to support large memory configurations */ - /* assume that unbind have already been call on the range */ - rdev->gart.pages_addr[p] = pci_map_page(rdev->pdev, pagelist[i], - 0, PAGE_SIZE, - PCI_DMA_BIDIRECTIONAL); - if (pci_dma_mapping_error(rdev->pdev, rdev->gart.pages_addr[p])) { - /* FIXME: failed to map page (return -ENOMEM?) 
*/ - radeon_gart_unbind(rdev, offset, pages); - return -ENOMEM; - } - } + rdev->gart.pages_addr[p] = dma_addr[i]; rdev->gart.pages[p] = pagelist[i]; page_base = rdev->gart.pages_addr[p]; for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) { @@ -259,12 +240,6 @@ int radeon_gart_init(struct radeon_device *rdev) radeon_gart_fini(rdev); return -ENOMEM; } - rdev->gart.ttm_alloced = kzalloc(sizeof(bool) * -rdev->gart.num_cpu_pages, GFP_KERNEL); - if (rdev->gart.ttm_alloced == NULL) { - radeon_gart_fini(rdev); - return -ENOMEM; - } /* set GART entry to point to the dummy page by default */ for (i = 0; i < rdev->gart.num_cpu_pages; i++) { rdev->gart.pages_addr[i] = rdev->dummy_page.addr; @@ -281,10 +256,8 @@ void radeon_gart_fini(struct radeon_device *rdev) rdev->gart.ready = false; kfree(rdev->gart.pages); kfree(rdev->gart.pag
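The radeon_device.c hunk above boils down to a simple fallback pattern for the coherent mask; a sketch with hypothetical names:

/* Sketch: try the wide consistent DMA mask, drop to 32 bits on failure,
 * mirroring the hunk added to radeon_device_init(). */
static void my_set_coherent_mask(struct pci_dev *pdev, int dma_bits)
{
	if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(dma_bits))) {
		pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
		dev_warn(&pdev->dev, "radeon: No coherent DMA available.\n");
	}
}

The coherent mask matters here because the DMA pool allocates with dma_alloc_coherent(), which honors this mask rather than the streaming one.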
[PATCH 12/12] nouveau/ttm/dma: Enable the TTM DMA pool if device can only do 32-bit DMA.
From: Konrad Rzeszutek Wilk If the card is capable of more than 32-bit, then use the default TTM page pool code which allocates from anywhere in the memory. Note: If the 'ttm.no_dma' parameter is set, the override is ignored and the default TTM pool is used. CC: Ben Skeggs CC: Francisco Jerez CC: Dave Airlie Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c | 73 - drivers/gpu/drm/nouveau/nouveau_debugfs.c |1 + drivers/gpu/drm/nouveau/nouveau_sgdma.c | 60 +--- 3 files changed, 73 insertions(+), 61 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7e5ca3f..36234a7 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -1049,10 +1049,79 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) nouveau_fence_unref(&old_fence); } +static int +nouveau_ttm_tt_populate(struct ttm_tt *ttm) +{ + struct drm_nouveau_private *dev_priv; + struct drm_device *dev; + unsigned i; + int r; + + if (ttm->state != tt_unpopulated) + return 0; + + dev_priv = nouveau_bdev(ttm->bdev); + dev = dev_priv->dev; + +#ifdef CONFIG_SWIOTLB + if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) { + return ttm_dma_populate(ttm, dev->dev); + } +#endif + + r = ttm_page_alloc_ttm_tt_populate(ttm); + if (r) { + return r; + } + + for (i = 0; i < ttm->num_pages; i++) { + ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i], + 0, PAGE_SIZE, + PCI_DMA_BIDIRECTIONAL); + if (pci_dma_mapping_error(dev->pdev, ttm->dma_address[i])) { + while (--i) { + pci_unmap_page(dev->pdev, ttm->dma_address[i], + PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); + ttm->dma_address[i] = 0; + } + ttm_page_alloc_ttm_tt_unpopulate(ttm); + return -EFAULT; + } + } + return 0; +} + +static void +nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) +{ + struct drm_nouveau_private *dev_priv; + struct drm_device *dev; + unsigned i; + + dev_priv = nouveau_bdev(ttm->bdev); + dev = dev_priv->dev; + +#ifdef CONFIG_SWIOTLB + if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) { + ttm_dma_unpopulate(ttm, dev->dev); + return; + } +#endif + + for (i = 0; i < ttm->num_pages; i++) { + if (ttm->dma_address[i]) { + pci_unmap_page(dev->pdev, ttm->dma_address[i], + PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); + } + } + + ttm_page_alloc_ttm_tt_unpopulate(ttm); +} + struct ttm_bo_driver nouveau_bo_driver = { .ttm_tt_create = &nouveau_ttm_tt_create, - .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, - .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, + .ttm_tt_populate = &nouveau_ttm_tt_populate, + .ttm_tt_unpopulate = &nouveau_ttm_tt_unpopulate, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c index 8e15923..f52c2db 100644 --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c @@ -178,6 +178,7 @@ static struct drm_info_list nouveau_debugfs_list[] = { { "memory", nouveau_debugfs_memory_info, 0, NULL }, { "vbios.rom", nouveau_debugfs_vbios_image, 0, NULL }, { "ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL }, + { "ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL }, }; #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index bc2ab90..ee1eb7c 100644 --- 
a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -13,41 +13,6 @@ struct nouveau_sgdma_be { u64 offset; }; -static int -nouveau_sgdma_dma_map(struct ttm_tt *ttm) -{ - struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm; - struct drm_device *dev = nvbe->dev; - int i; - - for (i = 0; i < ttm->num_pages; i++) { - ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i], - 0, PAGE_SIZE, - PCI_DMA_BIDIRECTIONAL); -
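The populate path above follows a classic map-with-unwind shape; a condensed sketch of the pattern (helper name hypothetical; note that the `while (i--)` form shown here also releases index 0 on failure):

/* Sketch of the map-and-unwind pattern used by nouveau_ttm_tt_populate:
 * map each page, and on error unmap everything mapped so far. */
static int my_map_all(struct pci_dev *pdev, struct ttm_tt *ttm)
{
	unsigned i;

	for (i = 0; i < ttm->num_pages; i++) {
		ttm->dma_address[i] = pci_map_page(pdev, ttm->pages[i], 0,
						   PAGE_SIZE,
						   PCI_DMA_BIDIRECTIONAL);
		if (pci_dma_mapping_error(pdev, ttm->dma_address[i])) {
			while (i--)	/* unwind, including page 0 */
				pci_unmap_page(pdev, ttm->dma_address[i],
					       PAGE_SIZE,
					       PCI_DMA_BIDIRECTIONAL);
			return -EFAULT;
		}
	}
	return 0;
}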
ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator
So I did an overhaul of ttm_memory; I believe the simplification I did makes sense. See patch 5 for a longer explanation. Thomas, with the ttm_memory change the allocation of pages won't happen if the accounting reports that we are going over the limit and the bo shrinker failed to free any memory to make room. The handling of the dma32 zone is done as a post pass of ttm memory accounting. Regarding the pagefault comment I removed: it doesn't make sense anymore because we now populate the whole page table in one shot, so there is no more prefaulting of a few pages but a full prefault. Though I can add a comment stating that if you like. For the ttm_tt_dma struct to hold page-allocator-specific information, I think it can be done as a followup patch, but if you prefer to have that in this patchset let me know and I will respin with such changes. I am in the process of retesting this whole series, and especially the whole memory accounting. Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 01/13] drm/ttm: remove userspace backed ttm object support
From: Jerome Glisse This was never used by any of the drivers; properly using userspace pages for a bo would need more code (vma interaction mostly). Remove this dead code in preparation for the ttm_tt & backend merge. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 22 drivers/gpu/drm/ttm/ttm_tt.c| 105 +-- include/drm/ttm/ttm_bo_api.h|5 -- include/drm/ttm/ttm_bo_driver.h | 24 - 4 files changed, 1 insertions(+), 155 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 617b646..4bde335 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; break; - case ttm_bo_type_user: - bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, - page_flags | TTM_PAGE_FLAG_USER, - glob->dummy_read_page); - if (unlikely(bo->ttm == NULL)) { - ret = -ENOMEM; - break; - } - - ret = ttm_tt_set_user(bo->ttm, current, - bo->buffer_start, bo->num_pages); - if (unlikely(ret != 0)) { - ttm_tt_destroy(bo->ttm); - bo->ttm = NULL; - } - break; default: printk(KERN_ERR TTM_PFX "Illegal buffer object type\n"); ret = -EINVAL; @@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man, } static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man, -bool disallow_fixed, uint32_t mem_type, uint32_t proposed_placement, uint32_t *masked_placement) { uint32_t cur_flags = ttm_bo_type_flags(mem_type); - if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed) - return false; - if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0) return false; @@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, man = &bdev->man[mem_type]; type_ok = ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->placement[i], &cur_flags); @@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (!man->has_type) continue; if (!ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->busy_placement[i], &cur_flags)) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58c271e..82a1161 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm) ttm->dma_address = NULL; } -static void ttm_tt_free_user_pages(struct ttm_tt *ttm) -{ - int write; - int dirty; - struct page *page; - int i; - struct ttm_backend *be = ttm->be; - - BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER)); - write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0); - dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0); - - if (be) - be->func->clear(be); - - for (i = 0; i < ttm->num_pages; ++i) { - page = ttm->pages[i]; - if (page == NULL) - continue; - - if (page == ttm->dummy_read_page) { - BUG_ON(write); - continue; - } - - if (write && dirty && !PageReserved(page)) - set_page_dirty_lock(page); - - ttm->pages[i] = NULL; - ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE); - put_page(page); - } - ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; -} - static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) { struct page *p; @@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm) } if (likely(ttm->pages != NULL)) { - if (ttm->page_flags & TTM_PAGE
[PATCH 02/13] drm/ttm: remove split btw highmen and lowmem page
From: Jerome Glisse The split between highmem and lowmem pages was rendered useless by the pool code. Remove it. Note: further cleanup would change the ttm page allocation helper to actually take an array instead of relying on a list; this could drastically reduce the number of function calls in the common case of allocating a whole buffer (see the sketch after this patch). Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_tt.c| 11 ++- include/drm/ttm/ttm_bo_driver.h |7 --- 2 files changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 82a1161..8b7a6d0 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) struct ttm_mem_global *mem_glob = ttm->glob->mem_glob; int ret; - while (NULL == (p = ttm->pages[index])) { + if (NULL == (p = ttm->pages[index])) { INIT_LIST_HEAD(&h); @@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) if (unlikely(ret != 0)) goto out_err; - if (PageHighMem(p)) - ttm->pages[--ttm->first_himem_page] = p; - else - ttm->pages[++ttm->last_lomem_page] = p; + ttm->pages[index] = p; } return p; out_err: @@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state, ttm->dma_address); ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; } void ttm_tt_destroy(struct ttm_tt *ttm) @@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->glob = bdev->glob; ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; ttm->caching_state = tt_cached; ttm->page_flags = page_flags; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 37527d6..9da182b 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -136,11 +136,6 @@ enum ttm_caching_state { * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL * pointer. * @pages: Array of pages backing the data. - * @first_himem_page: Himem pages are put last in the page array, which - * enables us to run caching attribute changes on only the first part - * of the page array containing lomem pages. This is the index of the - * first himem page. - * @last_lomem_page: Index of the last lomem page in the page array. * @num_pages: Number of pages in the page array. * @bdev: Pointer to the current struct ttm_bo_device. * @be: Pointer to the ttm backend. @@ -157,8 +152,6 @@ enum ttm_caching_state { struct ttm_tt { struct page *dummy_read_page; struct page **pages; - long first_himem_page; - long last_lomem_page; uint32_t page_flags; unsigned long num_pages; struct ttm_bo_global *glob; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
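The cleanup suggested in the note above would look roughly like this (a sketch, not part of the patch; the my_* name is invented): allocate straight into the caller's page array and unwind on failure, so the common whole-buffer case needs no list manipulation at all:

/* Sketch of an array-based allocation helper, as the note suggests. */
static int my_alloc_pages_array(struct page **pages, unsigned npages,
				gfp_t gfp_flags)
{
	unsigned i;

	for (i = 0; i < npages; i++) {
		pages[i] = alloc_page(gfp_flags);
		if (!pages[i]) {
			while (i--)
				__free_page(pages[i]);
			return -ENOMEM;
		}
	}
	return 0;
}

Patch 06/13 below performs essentially this conversion for the pool code.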
[PATCH 03/13] drm/ttm: remove unused backend flags field
From: Jerome Glisse This field is not used by any of the drivers; just drop it. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/radeon/radeon_ttm.c |1 - include/drm/ttm/ttm_bo_driver.h |2 -- 2 files changed, 0 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 0b5468b..97c76ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev) return NULL; } gtt->backend.bdev = &rdev->mman.bdev; - gtt->backend.flags = 0; gtt->backend.func = &radeon_backend_func; gtt->rdev = rdev; gtt->pages = NULL; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 9da182b..6d17140 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -106,7 +106,6 @@ struct ttm_backend_func { * struct ttm_backend * * @bdev: Pointer to a struct ttm_bo_device. - * @flags: For driver use. * @func: Pointer to a struct ttm_backend_func that describes * the backend methods. * @@ -114,7 +113,6 @@ struct ttm_backend_func { struct ttm_backend { struct ttm_bo_device *bdev; - uint32_t flags; struct ttm_backend_func *func; }; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 04/13] drm/ttm: use ttm put pages function to properly restore cache attribute
From: Jerome Glisse On failure we need to make sure the page we free has the wb cache attribute. Do this by calling the proper ttm page helper function. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_tt.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 8b7a6d0..3fb4c6d 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) } return p; out_err: - put_page(p); + INIT_LIST_HEAD(&h); + list_add(&p->lru, &h); + ttm_put_pages(&h, 1, ttm->page_flags, + ttm->caching_state, &ttm->dma_address[index]); return NULL; } -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
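The reason the helper matters: pages handed out by the pool may have been remapped write-combined or uncached, and on x86 they must be flipped back to write-back before returning to the page allocator. A minimal sketch of that invariant (the pool's real free path batches this; set_pages_array_wb() is the x86 helper the TTM pool code uses):

/* Sketch: restore the write-back attribute before freeing a page that
 * may have been set WC/UC while owned by TTM. */
#include <asm/cacheflush.h>

static void my_put_page_wb(struct page *page)
{
	set_pages_array_wb(&page, 1);
	__free_page(page);
}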
[PATCH 05/13] drm/ttm: overhaul memory accounting
From: Jerome Glisse

This is an overhaul of the ttm memory accounting. It tries to keep the same global behavior while removing the whole zone concept. It keeps a distinction for dma32 so that we make sure ttm doesn't starve the dma32 zone.

There are four thresholds for memory allocation:
- max_mem is the maximum memory the whole ttm infrastructure is going to allow allocation for (with the exception of system processes, see below)
- emer_mem is the maximum memory allowed for system processes; this limit is higher than max_mem
- swap_limit is the threshold at which point ttm will start trying to swap objects, because ttm is getting close to the max_mem limit
- swap_dma32_limit is the threshold at which point ttm will start swapping objects to try to reduce the pressure on the dma32 zone. Note that we don't specifically target objects to swap, so it might very well free more memory from highmem rather than from dma32.

Accounting is done through used_mem & used_dma32_mem, whose sum gives the total amount of memory actually accounted for by ttm. The idea is that an allocation will fail if (used_mem + used_dma32_mem) > max_mem and swapping fails to make enough room. The used_dma32_mem can be updated at a later stage, allowing us to perform the accounting test before allocating a whole batch of pages.

Signed-off-by: Jerome Glisse --- drivers/gpu/drm/ttm/ttm_bo.c |2 +- drivers/gpu/drm/ttm/ttm_memory.c | 517 +- drivers/gpu/drm/ttm/ttm_object.c |3 +- drivers/gpu/drm/ttm/ttm_tt.c |2 +- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c|8 +- drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |8 +- include/drm/ttm/ttm_memory.h | 23 +- 7 files changed, 168 insertions(+), 395 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 4bde335..92712798 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1252,7 +1252,7 @@ int ttm_bo_create(struct ttm_bo_device *bdev, size_t acc_size = ttm_bo_size(bdev->glob, (size + PAGE_SIZE - 1) >> PAGE_SHIFT); - ret = ttm_mem_global_alloc(mem_glob, acc_size, false, false); + ret = ttm_mem_global_alloc(mem_glob, acc_size, false); if (unlikely(ret != 0)) return ret; diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c index e70ddd8..b550baf 100644 --- a/drivers/gpu/drm/ttm/ttm_memory.c +++ b/drivers/gpu/drm/ttm/ttm_memory.c @@ -35,21 +35,10 @@ #include #include -#define TTM_MEMORY_ALLOC_RETRIES 4 - -struct ttm_mem_zone { - struct kobject kobj; - struct ttm_mem_global *glob; - const char *name; - uint64_t zone_mem; - uint64_t emer_mem; - uint64_t max_mem; - uint64_t swap_limit; - uint64_t used_mem; -}; +#define TTM_MEMORY_RETRIES 4 static struct attribute ttm_mem_sys = { - .name = "zone_memory", + .name = "memory", .mode = S_IRUGO }; static struct attribute ttm_mem_emer = { @@ -64,140 +53,141 @@ static struct attribute ttm_mem_swap = { .name = "swap_limit", .mode = S_IRUGO | S_IWUSR }; +static struct attribute ttm_mem_dma32_swap = { + .name = "swap_dma32_limit", + .mode = S_IRUGO | S_IWUSR +}; static struct attribute ttm_mem_used = { .name = "used_memory", .mode = S_IRUGO }; +static struct attribute ttm_mem_dma32_used = { + .name = "used_dma32_memory", + .mode = S_IRUGO +}; -static void ttm_mem_zone_kobj_release(struct kobject *kobj) -{ - struct ttm_mem_zone *zone = - container_of(kobj, struct ttm_mem_zone, kobj); - - printk(KERN_INFO TTM_PFX - "Zone %7s: Used memory at exit: %llu kiB.\n", - zone->name, (unsigned long long) zone->used_mem >> 10); - kfree(zone); -} - -static ssize_t ttm_mem_zone_show(struct kobject *kobj, -struct attribute *attr, -char
*buffer) +static ssize_t ttm_mem_global_show(struct kobject *kobj, + struct attribute *attr, + char *buffer) { - struct ttm_mem_zone *zone = - container_of(kobj, struct ttm_mem_zone, kobj); - uint64_t val = 0; + struct ttm_mem_global *glob = + container_of(kobj, struct ttm_mem_global, kobj); + unsigned long val = 0; - spin_lock(&zone->glob->lock); + spin_lock(&glob->lock); if (attr == &ttm_mem_sys) - val = zone->zone_mem; + val = glob->mem; else if (attr == &ttm_mem_emer) - val = zone->emer_mem; + val = glob->emer_mem; else if (attr == &ttm_mem_max) - val = zone->max_mem; + val = glob->max_mem; else if (attr == &ttm_mem_swap) - val = zone->swap_limit; + val = glob->swap_limit; else if (attr ==
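In code terms, the allocation rule described in the commit message reduces to a single comparison against max_mem over the sum of the two counters. A hedged sketch using the field names from the message (the helper itself is hypothetical; the real code also handles the emergency limit and retries after swapping):

/* Sketch of the accounting test: allocation fails once
 * (used_mem + used_dma32_mem) would exceed max_mem and swapping could
 * not make room. */
static bool my_over_limit(struct ttm_mem_global *glob, uint64_t amount)
{
	bool over;

	spin_lock(&glob->lock);
	over = glob->used_mem + glob->used_dma32_mem + amount > glob->max_mem;
	spin_unlock(&glob->lock);
	return over;
}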
[PATCH 06/13] drm/ttm: convert page allocation to use page ptr array instead of list V4
From: Jerome Glisse Use the ttm_tt page ptr array for page allocation, move the list to array unwinding into the page allocation functions. V2 split the fix to use ttm put page as a separate fix properly fill pages array when TTM_PAGE_FLAG_ZERO_ALLOC is not set V3 Added back page_count()==1 check when freeing page V4 Rebase on top of memory accounting overhaul Signed-off-by: Jerome Glisse --- drivers/gpu/drm/ttm/ttm_memory.c | 47 +++-- drivers/gpu/drm/ttm/ttm_page_alloc.c | 90 -- drivers/gpu/drm/ttm/ttm_tt.c | 68 -- include/drm/ttm/ttm_memory.h | 13 +++-- include/drm/ttm/ttm_page_alloc.h | 17 +++--- 5 files changed, 120 insertions(+), 115 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c index b550baf..98f6899 100644 --- a/drivers/gpu/drm/ttm/ttm_memory.c +++ b/drivers/gpu/drm/ttm/ttm_memory.c @@ -326,32 +326,45 @@ int ttm_mem_global_alloc(struct ttm_mem_global *glob, } EXPORT_SYMBOL(ttm_mem_global_alloc); -int ttm_mem_global_alloc_page(struct ttm_mem_global *glob, - struct page *page, - bool no_wait) +int ttm_mem_global_alloc_pages(struct ttm_mem_global *glob, + unsigned npages, + bool no_wait) { - - if (ttm_mem_global_alloc(glob, PAGE_SIZE, no_wait)) + if (ttm_mem_global_alloc(glob, PAGE_SIZE * npages, no_wait)) return -ENOMEM; + ttm_check_swapping(glob); + return 0; +} + +void ttm_mem_global_account_pages(struct ttm_mem_global *glob, + struct page **pages, + unsigned npages) +{ + unsigned i; /* check if page is dma32 */ - if (page_to_pfn(page) > 0x0010UL) { - spin_lock(&glob->lock); - glob->used_mem -= PAGE_SIZE; - glob->used_dma32_mem += PAGE_SIZE; - spin_unlock(&glob->lock); + spin_lock(&glob->lock); + for (i = 0; i < npages; i++) { + if (page_to_pfn(pages[i]) > 0x0010UL) { + glob->used_mem -= PAGE_SIZE; + glob->used_dma32_mem += PAGE_SIZE; + } } - ttm_check_swapping(glob); - return 0; + spin_unlock(&glob->lock); } -void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page) +void ttm_mem_global_free_pages(struct ttm_mem_global *glob, + struct page **pages, unsigned npages) { + unsigned i; + spin_lock(&glob->lock); - if (page_to_pfn(page) > 0x0010UL) { - glob->used_dma32_mem -= PAGE_SIZE; - } else { - glob->used_mem -= PAGE_SIZE; + for (i = 0; i < npages; i++) { + if (page_to_pfn(pages[i]) > 0x0010UL) { + glob->used_dma32_mem -= PAGE_SIZE; + } else { + glob->used_mem -= PAGE_SIZE; + } } spin_unlock(&glob->lock); } diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 727e93d..c4f18b9 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool *pool, * @return count of pages still required to fulfill the request. */ static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool, - struct list_head *pages, int ttm_flags, - enum ttm_caching_state cstate, unsigned count) + struct list_head *pages, + int ttm_flags, + enum ttm_caching_state cstate, + unsigned count) { unsigned long irq_flags; struct list_head *p; @@ -664,13 +666,14 @@ out: * On success pages list will hold count number of correctly * cached pages. 
*/ -int ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, - dma_addr_t *dma_address) +int ttm_get_pages(struct page **pages, unsigned npages, int flags, + enum ttm_caching_state cstate, dma_addr_t *dma_address) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); struct page *p = NULL; + struct list_head plist; gfp_t gfp_flags = GFP_USER; + unsigned count = 0; int r; /* set zero flag for page allocation if required */ @@ -684,94 +687,107 @@ int ttm_get_pages(struct list_head *pages, int flags, else gfp_flags |= GFP_HIGHUSER; - for (r = 0; r < count; ++r) { - p = alloc_page(gfp_flags); - if (!p) { -
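From a caller's perspective the V4 conversion turns the list-based round trip into two symmetric array calls. A sketch of the intended usage, based on the ttm_get_pages() signature introduced in the diff above (the put-side signature is assumed symmetric to the get side; the wrapper function is hypothetical):

/* Sketch: populate and release a ttm_tt with the array-based API. */
static int my_populate_and_release(struct ttm_tt *ttm)
{
	int r;

	r = ttm_get_pages(ttm->pages, ttm->num_pages, ttm->page_flags,
			  ttm->caching_state, ttm->dma_address);
	if (r)
		return r;
	/* ... use ttm->pages ... */
	ttm_put_pages(ttm->pages, ttm->num_pages, ttm->page_flags,
		      ttm->caching_state, ttm->dma_address);
	return 0;
}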
[PATCH 07/13] drm/ttm: test for dma_address array allocation failure
From: Jerome Glisse Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_tt.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 303bbba..2dab08b 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -293,7 +293,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->dummy_read_page = dummy_read_page; ttm_tt_alloc_page_directory(ttm); - if (!ttm->pages) { + if (!ttm->pages || !ttm->dma_address) { ttm_tt_destroy(ttm); printk(KERN_ERR TTM_PFX "Failed allocating page table\n"); return NULL; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 08/13] drm/ttm: merge ttm_backend and ttm_tt V2
From: Jerome Glisse A ttm_backend will exist if and only if there is a ttm_tt, and a ttm_tt is only of interest when bound to a backend. Thus, to avoid code & data duplication between the two, merge them. V2 Rebase on top of memory accounting overhaul Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/nouveau/nouveau_bo.c| 14 ++- drivers/gpu/drm/nouveau/nouveau_drv.h |5 +- drivers/gpu/drm/nouveau/nouveau_sgdma.c | 188 -- drivers/gpu/drm/radeon/radeon_ttm.c | 222 --- drivers/gpu/drm/ttm/ttm_agp_backend.c | 88 + drivers/gpu/drm/ttm/ttm_bo.c|9 +- drivers/gpu/drm/ttm/ttm_tt.c| 60 ++--- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 66 +++-- include/drm/ttm/ttm_bo_driver.h | 104 ++- 9 files changed, 295 insertions(+), 461 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7226f41..b060fa4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) *mem = val; } -static struct ttm_backend * -nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) +static struct ttm_tt * +nouveau_ttm_tt_create(struct ttm_bo_device *bdev, + unsigned long size, uint32_t page_flags, + struct page *dummy_read_page) { struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev; @@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) switch (dev_priv->gart_info.type) { #if __OS_HAS_AGP case NOUVEAU_GART_AGP: - return ttm_agp_backend_init(bdev, dev->agp->bridge); + return ttm_agp_tt_create(bdev, dev->agp->bridge, +size, page_flags, dummy_read_page); #endif case NOUVEAU_GART_PDMA: case NOUVEAU_GART_HW: - return nouveau_sgdma_init_ttm(dev); + return nouveau_sgdma_create_ttm(bdev, size, page_flags, + dummy_read_page); default: NV_ERROR(dev, "Unknown GART type %d\n", dev_priv->gart_info.type); @@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) } struct ttm_bo_driver nouveau_bo_driver = { - .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry, + .ttm_tt_create = &nouveau_ttm_tt_create, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 29837da..0c53e39 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *); extern void nouveau_sgdma_takedown(struct drm_device *); extern uint32_t nouveau_sgdma_get_physical(struct drm_device *, uint32_t offset); -extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *); +extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev, + unsigned long size, + uint32_t page_flags, + struct page *dummy_read_page); /* nouveau_debugfs.c */ #if defined(CONFIG_DRM_NOUVEAU_DEBUG) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index b75258a..bc2ab90 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -8,44 +8,23 @@ #define NV_CTXDMA_PAGE_MASK (NV_CTXDMA_PAGE_SIZE - 1) struct nouveau_sgdma_be { - struct ttm_backend backend; + struct ttm_tt ttm; struct drm_device *dev; - - dma_addr_t *pages; - unsigned nr_pages; - bool unmap_pages; - u64 offset; - bool bound; }; static int
-nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages, - struct page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs) +nouveau_sgdma_dma_map(struct ttm_tt *ttm) { - struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be; + struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm; struct drm_device *dev = nvbe->dev; int i; - NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages); - - nvbe->pages = dma_addrs; - nvbe->nr_pages = num_pages; - nvbe->unmap_pages = true; - - /* this code path isn't called and is incorrect anyways */ - if
[PATCH 09/13] drm/ttm: introduce callback for ttm_tt populate & unpopulate V2
From: Jerome Glisse Move the page allocation and freeing to driver callbacks and provide ttm helper functions for those. The most intrusive change is the fact that we now only fully populate an object; this simplifies some of the code designed around the page-fault design. V2 Rebase on top of memory accounting overhaul Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/nouveau/nouveau_bo.c |3 + drivers/gpu/drm/radeon/radeon_ttm.c|2 + drivers/gpu/drm/ttm/ttm_bo_util.c | 31 ++- drivers/gpu/drm/ttm/ttm_bo_vm.c| 13 +++-- drivers/gpu/drm/ttm/ttm_page_alloc.c | 45 + drivers/gpu/drm/ttm/ttm_tt.c | 86 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 + include/drm/ttm/ttm_bo_driver.h| 41 +--- include/drm/ttm/ttm_page_alloc.h | 18 +++ 9 files changed, 123 insertions(+), 119 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index b060fa4..7e5ca3f 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -28,6 +28,7 @@ */ #include "drmP.h" +#include "ttm/ttm_page_alloc.h" #include "nouveau_drm.h" #include "nouveau_drv.h" @@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) struct ttm_bo_driver nouveau_bo_driver = { .ttm_tt_create = &nouveau_ttm_tt_create, + .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, + .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 53ff62b..490afce 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev, static struct ttm_bo_driver radeon_bo_driver = { .ttm_tt_create = &radeon_ttm_tt_create, + .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, + .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, .invalidate_caches = &radeon_invalidate_caches, .init_mem_type = &radeon_init_mem_type, .evict_flags = &radeon_evict_flags, diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 082fcae..60f204d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src, unsigned long page, pgprot_t prot) { - struct page *d = ttm_tt_get_page(ttm, page); + struct page *d = ttm->pages[page]; void *dst; if (!d) @@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst, unsigned long page, pgprot_t prot) { - struct page *s = ttm_tt_get_page(ttm, page); + struct page *s = ttm->pages[page]; void *src; if (!s) @@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, if (old_iomap == NULL && ttm == NULL) goto out2; + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + goto out1; + } + add = 0; dir = 1; @@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo, { struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot; struct ttm_tt *ttm = bo->ttm; - struct page *d; - int i; + int ret; BUG_ON(!ttm); + + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + return ret; + } + if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) { /* * We're mapping a single page, and the desired @@ -513,18 +525,9 @@ static int
ttm_bo_kmap_ttm(struct ttm_buffer_object *bo, */ map->bo_kmap_type = ttm_bo_map_kmap; - map->page = ttm_tt_get_page(ttm, start_page); + map->page = ttm->pages[start_page]; map->virtual = kmap(map->page); } else { - /* -* Populate the part we're mapping; -*/ - for (i = start_page; i < start_page + num_pages; ++i) { - d = ttm_tt_get_page(ttm, i); - if (!d) - return -ENOMEM; - } - /* * We need to use vmap to get the desired page protection * or to make the buffer objec
[PATCH 11/13] swiotlb: Expose swiotlb_nr_tbl function to modules
From: Konrad Rzeszutek Wilk As a mechanism to detect whether SWIOTLB is enabled or not. We also fix the spelling - it was swioltb instead of swiotlb. CC: FUJITA Tomonori [v1: Ripped out swiotlb_enabled] Signed-off-by: Konrad Rzeszutek Wilk --- drivers/xen/swiotlb-xen.c |2 +- include/linux/swiotlb.h |2 +- lib/swiotlb.c |5 +++-- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index c984768..c50fb0b 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose) char *m = NULL; unsigned int repeat = 3; - nr_tbl = swioltb_nr_tbl(); + nr_tbl = swiotlb_nr_tbl(); if (nr_tbl) xen_io_tlb_nslabs = nr_tbl; else { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 445702c..e872526 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -24,7 +24,7 @@ extern int swiotlb_force; extern void swiotlb_init(int verbose); extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); -extern unsigned long swioltb_nr_tbl(void); +extern unsigned long swiotlb_nr_tbl(void); /* * Enumeration for sync targets diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 99093b3..058935e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str) __setup("swiotlb=", setup_io_tlb_npages); /* make io_tlb_overflow tunable too? */ -unsigned long swioltb_nr_tbl(void) +unsigned long swiotlb_nr_tbl(void) { return io_tlb_nslabs; } - +EXPORT_SYMBOL_GPL(swiotlb_nr_tbl); /* Note that this doesn't work with highmem page */ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev, volatile void *address) @@ -321,6 +321,7 @@ void __init swiotlb_free(void) free_bootmem_late(__pa(io_tlb_start), PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT)); } + io_tlb_nslabs = 0; } static int is_swiotlb_buffer(phys_addr_t paddr) -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 12/13] drm/radeon/kms: Enable the TTM DMA pool if swiotlb is on V2
From: Konrad Rzeszutek Wilk With the exception that we do not handle the AGP case. We only deal with PCIe cards such as ATI ES1000 or HD3200 that have been detected to only do DMA up to 32-bits. V2 force dma32 if we fail to set bigger dma mask CC: Dave Airlie CC: Alex Deucher Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h|1 - drivers/gpu/drm/radeon/radeon_device.c |6 ++ drivers/gpu/drm/radeon/radeon_gart.c | 29 +--- drivers/gpu/drm/radeon/radeon_ttm.c| 83 +-- 4 files changed, 84 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e3170c7..63257ba 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -332,7 +332,6 @@ struct radeon_gart { union radeon_gart_table table; struct page **pages; dma_addr_t *pages_addr; - bool*ttm_alloced; boolready; }; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index c33bc91..7c31321 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -765,8 +765,14 @@ int radeon_device_init(struct radeon_device *rdev, r = pci_set_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits)); if (r) { rdev->need_dma32 = true; + dma_bits = 32; printk(KERN_WARNING "radeon: No suitable DMA available.\n"); } + r = pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits)); + if (r) { + pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32)); + printk(KERN_WARNING "radeon: No coherent DMA available.\n"); + } /* Registers mapping */ /* TODO: block userspace mapping of io register */ diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index fdc3a9a..18f496c 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -149,9 +149,6 @@ void radeon_gart_unbind(struct radeon_device *rdev, unsigned offset, p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { if (rdev->gart.pages[p]) { - if (!rdev->gart.ttm_alloced[p]) - pci_unmap_page(rdev->pdev, rdev->gart.pages_addr[p], - PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); rdev->gart.pages[p] = NULL; rdev->gart.pages_addr[p] = rdev->dummy_page.addr; page_base = rdev->gart.pages_addr[p]; @@ -181,23 +178,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned offset, p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { - /* we reverted the patch using dma_addr in TTM for now but this -* code stops building on alpha so just comment it out for now */ - if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */ - rdev->gart.ttm_alloced[p] = true; - rdev->gart.pages_addr[p] = dma_addr[i]; - } else { - /* we need to support large memory configurations */ - /* assume that unbind have already been call on the range */ - rdev->gart.pages_addr[p] = pci_map_page(rdev->pdev, pagelist[i], - 0, PAGE_SIZE, - PCI_DMA_BIDIRECTIONAL); - if (pci_dma_mapping_error(rdev->pdev, rdev->gart.pages_addr[p])) { - /* FIXME: failed to map page (return -ENOMEM?) 
*/ - radeon_gart_unbind(rdev, offset, pages); - return -ENOMEM; - } - } + rdev->gart.pages_addr[p] = dma_addr[i]; rdev->gart.pages[p] = pagelist[i]; page_base = rdev->gart.pages_addr[p]; for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) { @@ -259,12 +240,6 @@ int radeon_gart_init(struct radeon_device *rdev) radeon_gart_fini(rdev); return -ENOMEM; } - rdev->gart.ttm_alloced = kzalloc(sizeof(bool) * -rdev->gart.num_cpu_pages, GFP_KERNEL); - if (rdev->gart.ttm_alloced == NULL) { - radeon_gart_fini(rdev); - return -ENOMEM; - } /* set GART entry to point to the dummy page by default */ for (i = 0; i < rdev->gart.num_cpu_pages; i++) { rdev->gart.pages_addr[i] = rdev->dummy_page.addr; @@
[PATCH 13/13] drm/nouveau: enable the TTM DMA pool on 32-bit DMA only device V2
From: Konrad Rzeszutek Wilk If the card is capable of more than 32-bit, then use the default TTM page pool code which allocates from anywhere in the memory. Note: If the 'ttm.no_dma' parameter is set, the override is ignored and the default TTM pool is used. V2 use pci_set_consistent_dma_mask CC: Ben Skeggs CC: Francisco Jerez CC: Dave Airlie Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c | 73 - drivers/gpu/drm/nouveau/nouveau_debugfs.c |1 + drivers/gpu/drm/nouveau/nouveau_mem.c |6 ++ drivers/gpu/drm/nouveau/nouveau_sgdma.c | 60 +--- 4 files changed, 79 insertions(+), 61 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7e5ca3f..36234a7 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -1049,10 +1049,79 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) nouveau_fence_unref(&old_fence); } +static int +nouveau_ttm_tt_populate(struct ttm_tt *ttm) +{ + struct drm_nouveau_private *dev_priv; + struct drm_device *dev; + unsigned i; + int r; + + if (ttm->state != tt_unpopulated) + return 0; + + dev_priv = nouveau_bdev(ttm->bdev); + dev = dev_priv->dev; + +#ifdef CONFIG_SWIOTLB + if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) { + return ttm_dma_populate(ttm, dev->dev); + } +#endif + + r = ttm_page_alloc_ttm_tt_populate(ttm); + if (r) { + return r; + } + + for (i = 0; i < ttm->num_pages; i++) { + ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i], + 0, PAGE_SIZE, + PCI_DMA_BIDIRECTIONAL); + if (pci_dma_mapping_error(dev->pdev, ttm->dma_address[i])) { + while (--i) { + pci_unmap_page(dev->pdev, ttm->dma_address[i], + PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); + ttm->dma_address[i] = 0; + } + ttm_page_alloc_ttm_tt_unpopulate(ttm); + return -EFAULT; + } + } + return 0; +} + +static void +nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) +{ + struct drm_nouveau_private *dev_priv; + struct drm_device *dev; + unsigned i; + + dev_priv = nouveau_bdev(ttm->bdev); + dev = dev_priv->dev; + +#ifdef CONFIG_SWIOTLB + if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) { + ttm_dma_unpopulate(ttm, dev->dev); + return; + } +#endif + + for (i = 0; i < ttm->num_pages; i++) { + if (ttm->dma_address[i]) { + pci_unmap_page(dev->pdev, ttm->dma_address[i], + PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); + } + } + + ttm_page_alloc_ttm_tt_unpopulate(ttm); +} + struct ttm_bo_driver nouveau_bo_driver = { .ttm_tt_create = &nouveau_ttm_tt_create, - .ttm_tt_populate = &ttm_page_alloc_ttm_tt_populate, - .ttm_tt_unpopulate = &ttm_page_alloc_ttm_tt_unpopulate, + .ttm_tt_populate = &nouveau_ttm_tt_populate, + .ttm_tt_unpopulate = &nouveau_ttm_tt_unpopulate, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c index 8e15923..f52c2db 100644 --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c @@ -178,6 +178,7 @@ static struct drm_info_list nouveau_debugfs_list[] = { { "memory", nouveau_debugfs_memory_info, 0, NULL }, { "vbios.rom", nouveau_debugfs_vbios_image, 0, NULL }, { "ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL }, + { "ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL }, }; #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list) diff --git a/drivers/gpu/drm/nouveau/nouveau_mem.c 
b/drivers/gpu/drm/nouveau/nouveau_mem.c index 36bec48..37fcaa2 100644 --- a/drivers/gpu/drm/nouveau/nouveau_mem.c +++ b/drivers/gpu/drm/nouveau/nouveau_mem.c @@ -407,6 +407,12 @@ nouveau_mem_vram_init(struct drm_device *dev) ret = pci_set_dma_mask(dev->pdev, DMA_BIT_MASK(dma_bits)); if (ret) return ret; + ret = pci_set_consistent_dma_mask(dev->pdev, DMA_BIT_MASK(dma_bits)); + if (ret) { + /* Reset to default value. */ + pci_set_consistent_dma_mask(dev->pdev, DMA_BIT_MASK(32)); + } + ret = nouveau_ttm_global_init(dev_priv);
Isolate dma information from ttm_tt
This applies on top of the ttm_tt & backend merge patchset. Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 14/14] drm/ttm: isolate dma data from ttm_tt
From: Jerome Glisse Move the dma data to a superset ttm_dma_tt structure which inherits from ttm_tt. This allows drivers that don't use the dma functionality to avoid wasting memory on it. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c | 18 + drivers/gpu/drm/nouveau/nouveau_sgdma.c | 22 +++ drivers/gpu/drm/radeon/radeon_ttm.c | 43 +++--- drivers/gpu/drm/ttm/ttm_page_alloc.c | 10 +++--- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 38 +++- drivers/gpu/drm/ttm/ttm_tt.c | 58 - drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |2 + include/drm/ttm/ttm_bo_driver.h | 31 +++- include/drm/ttm/ttm_page_alloc.h | 12 ++ 9 files changed, 155 insertions(+), 79 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 36234a7..df3f19c 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -1052,6 +1052,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) static int nouveau_ttm_tt_populate(struct ttm_tt *ttm) { + struct ttm_dma_tt *ttm_dma = (void *)ttm; struct drm_nouveau_private *dev_priv; struct drm_device *dev; unsigned i; @@ -1065,7 +1066,7 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm) #ifdef CONFIG_SWIOTLB if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) { - return ttm_dma_populate(ttm, dev->dev); + return ttm_dma_populate((void *)ttm, dev->dev); } #endif @@ -1075,14 +1076,14 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm) } for (i = 0; i < ttm->num_pages; i++) { - ttm->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i], + ttm_dma->dma_address[i] = pci_map_page(dev->pdev, ttm->pages[i], 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); - if (pci_dma_mapping_error(dev->pdev, ttm->dma_address[i])) { + if (pci_dma_mapping_error(dev->pdev, ttm_dma->dma_address[i])) { while (--i) { - pci_unmap_page(dev->pdev, ttm->dma_address[i], + pci_unmap_page(dev->pdev, ttm_dma->dma_address[i], PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); - ttm->dma_address[i] = 0; + ttm_dma->dma_address[i] = 0; } ttm_page_alloc_ttm_tt_unpopulate(ttm); return -EFAULT; @@ -1094,6 +1095,7 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm) static void nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) { + struct ttm_dma_tt *ttm_dma = (void *)ttm; struct drm_nouveau_private *dev_priv; struct drm_device *dev; unsigned i; @@ -1103,14 +1105,14 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) #ifdef CONFIG_SWIOTLB if ((dma_get_mask(dev->dev) <= DMA_BIT_MASK(32)) && swiotlb_nr_tbl()) { - ttm_dma_unpopulate(ttm, dev->dev); + ttm_dma_unpopulate((void *)ttm, dev->dev); return; } #endif for (i = 0; i < ttm->num_pages; i++) { - if (ttm->dma_address[i]) { - pci_unmap_page(dev->pdev, ttm->dma_address[i], + if (ttm_dma->dma_address[i]) { + pci_unmap_page(dev->pdev, ttm_dma->dma_address[i], PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); } } diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index ee1eb7c..47f245e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -8,7 +8,10 @@ #define NV_CTXDMA_PAGE_MASK (NV_CTXDMA_PAGE_SIZE - 1) struct nouveau_sgdma_be { - struct ttm_tt ttm; + /* this has to be the first field so populate/unpopulate in +* nouveau_bo.c works properly, otherwise have to move them here +*/ + struct ttm_dma_tt ttm; struct drm_device *dev; u64 offset; }; @@ -20,6 +23,7 @@ nouveau_sgdma_destroy(struct ttm_tt *ttm) if (ttm) { NV_DEBUG(nvbe->dev, "\n"); + ttm_dma_tt_fini(&nvbe->ttm); kfree(nvbe); } } @@ -38,7 +42,7 @@ nv04_sgdma_bind(struct ttm_tt
*ttm, struct ttm_mem_reg *mem) nvbe->offset = mem->start << PAGE_SHIFT; pte = (nvbe->offset >> NV_CTXDMA_PAGE_SHIFT) + 2; for (i = 0; i < ttm->num_pages; i++) { - dma_addr_t dma_offset = ttm->dma_address[i]; + dma_addr_t dma_offset = nvbe->ttm.dma_address[i]; uint32_t offset_l = lower_32_bits(dma_offset);
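The cast trick used above ('struct ttm_dma_tt *ttm_dma = (void *)ttm;') only works because of the structure layout. A minimal sketch of that layout, assuming dma_address is the only DMA-specific field (the real structure may carry additional bookkeeping):

#include <linux/kernel.h>
#include <linux/types.h>

struct ttm_dma_tt {
	struct ttm_tt ttm;		/* must stay the first member */
	dma_addr_t *dma_address;	/* one bus address per page */
};

/* A safer spelling of the (void *) downcast used in the patch. */
static inline struct ttm_dma_tt *ttm_to_dma(struct ttm_tt *ttm)
{
	return container_of(ttm, struct ttm_dma_tt, ttm);
}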
ttm: merge ttm_backend & ttm_tt, introduce ttm dma allocator V4
So I squeezed it all together to avoid any memory accounting mess; seems to work OK so far. Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 01/13] swiotlb: Expose swiotlb_nr_tbl function to modules
From: Konrad Rzeszutek Wilk As a mechanism to detect whether SWIOTLB is enabled or not. We also fix the spelling - it was swioltb instead of swiotlb. CC: FUJITA Tomonori [v1: Ripped out swiotlb_enabled] Signed-off-by: Konrad Rzeszutek Wilk --- drivers/xen/swiotlb-xen.c |2 +- include/linux/swiotlb.h |2 +- lib/swiotlb.c |5 +++-- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index c984768..c50fb0b 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -152,7 +152,7 @@ void __init xen_swiotlb_init(int verbose) char *m = NULL; unsigned int repeat = 3; - nr_tbl = swioltb_nr_tbl(); + nr_tbl = swiotlb_nr_tbl(); if (nr_tbl) xen_io_tlb_nslabs = nr_tbl; else { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 445702c..e872526 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -24,7 +24,7 @@ extern int swiotlb_force; extern void swiotlb_init(int verbose); extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); -extern unsigned long swioltb_nr_tbl(void); +extern unsigned long swiotlb_nr_tbl(void); /* * Enumeration for sync targets diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 99093b3..058935e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str) __setup("swiotlb=", setup_io_tlb_npages); /* make io_tlb_overflow tunable too? */ -unsigned long swioltb_nr_tbl(void) +unsigned long swiotlb_nr_tbl(void) { return io_tlb_nslabs; } - +EXPORT_SYMBOL_GPL(swiotlb_nr_tbl); /* Note that this doesn't work with highmem page */ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev, volatile void *address) @@ -321,6 +321,7 @@ void __init swiotlb_free(void) free_bootmem_late(__pa(io_tlb_start), PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT)); } + io_tlb_nslabs = 0; } static int is_swiotlb_buffer(phys_addr_t paddr) -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
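With the symbol exported, any module can cheaply probe for an active SWIOTLB. A minimal usage sketch (swiotlb_nr_tbl() returns the number of IO TLB slabs, and 0 when SWIOTLB is disabled or has been freed, as the io_tlb_nslabs reset in swiotlb_free() above guarantees):

#include <linux/swiotlb.h>

static bool swiotlb_active(void)
{
	/* a non-zero slab count means bounce buffering may happen */
	return swiotlb_nr_tbl() != 0;
}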
[PATCH 02/13] drm/ttm: remove userspace backed ttm object support
From: Jerome Glisse This was never used by any of the drivers; properly using userspace pages for a bo would need more code (vma interaction mostly). Remove this dead code in preparation for the ttm_tt & backend merge. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 22 drivers/gpu/drm/ttm/ttm_tt.c| 105 +-- include/drm/ttm/ttm_bo_api.h|5 -- include/drm/ttm/ttm_bo_driver.h | 24 - 4 files changed, 1 insertions(+), 155 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 617b646..4bde335 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -342,22 +342,6 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; break; - case ttm_bo_type_user: - bo->ttm = ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, - page_flags | TTM_PAGE_FLAG_USER, - glob->dummy_read_page); - if (unlikely(bo->ttm == NULL)) { - ret = -ENOMEM; - break; - } - - ret = ttm_tt_set_user(bo->ttm, current, - bo->buffer_start, bo->num_pages); - if (unlikely(ret != 0)) { - ttm_tt_destroy(bo->ttm); - bo->ttm = NULL; - } - break; default: printk(KERN_ERR TTM_PFX "Illegal buffer object type\n"); ret = -EINVAL; @@ -907,16 +891,12 @@ static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man, } static bool ttm_bo_mt_compatible(struct ttm_mem_type_manager *man, -bool disallow_fixed, uint32_t mem_type, uint32_t proposed_placement, uint32_t *masked_placement) { uint32_t cur_flags = ttm_bo_type_flags(mem_type); - if ((man->flags & TTM_MEMTYPE_FLAG_FIXED) && disallow_fixed) - return false; - if ((cur_flags & proposed_placement & TTM_PL_MASK_MEM) == 0) return false; @@ -961,7 +941,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, man = &bdev->man[mem_type]; type_ok = ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->placement[i], &cur_flags); @@ -1009,7 +988,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (!man->has_type) continue; if (!ttm_bo_mt_compatible(man, - bo->type == ttm_bo_type_user, mem_type, placement->busy_placement[i], &cur_flags)) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58c271e..82a1161 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -62,43 +62,6 @@ static void ttm_tt_free_page_directory(struct ttm_tt *ttm) ttm->dma_address = NULL; } -static void ttm_tt_free_user_pages(struct ttm_tt *ttm) -{ - int write; - int dirty; - struct page *page; - int i; - struct ttm_backend *be = ttm->be; - - BUG_ON(!(ttm->page_flags & TTM_PAGE_FLAG_USER)); - write = ((ttm->page_flags & TTM_PAGE_FLAG_WRITE) != 0); - dirty = ((ttm->page_flags & TTM_PAGE_FLAG_USER_DIRTY) != 0); - - if (be) - be->func->clear(be); - - for (i = 0; i < ttm->num_pages; ++i) { - page = ttm->pages[i]; - if (page == NULL) - continue; - - if (page == ttm->dummy_read_page) { - BUG_ON(write); - continue; - } - - if (write && dirty && !PageReserved(page)) - set_page_dirty_lock(page); - - ttm->pages[i] = NULL; - ttm_mem_global_free(ttm->glob->mem_glob, PAGE_SIZE); - put_page(page); - } - ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; -} - static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) { struct page *p; @@ -325,10 +288,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm) } if (likely(ttm->pages != NULL)) { - if (ttm->page_flags & TTM_PAGE
[PATCH 03/13] drm/ttm: remove split between highmem and lowmem pages
From: Jerome Glisse The split between highmem and lowmem pages was rendered useless by the pool code. Remove it. Note: further cleanup would change the ttm page allocation helper to actually take an array instead of relying on a list; this could drastically reduce the number of function calls in the common case of allocating a whole buffer. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_tt.c| 11 ++- include/drm/ttm/ttm_bo_driver.h |7 --- 2 files changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 82a1161..8b7a6d0 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -69,7 +69,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) struct ttm_mem_global *mem_glob = ttm->glob->mem_glob; int ret; - while (NULL == (p = ttm->pages[index])) { + if (NULL == (p = ttm->pages[index])) { INIT_LIST_HEAD(&h); @@ -85,10 +85,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) if (unlikely(ret != 0)) goto out_err; - if (PageHighMem(p)) - ttm->pages[--ttm->first_himem_page] = p; - else - ttm->pages[++ttm->last_lomem_page] = p; + ttm->pages[index] = p; } return p; out_err: @@ -270,8 +267,6 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state, ttm->dma_address); ttm->state = tt_unpopulated; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; } void ttm_tt_destroy(struct ttm_tt *ttm) @@ -315,8 +310,6 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->glob = bdev->glob; ttm->num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; - ttm->first_himem_page = ttm->num_pages; - ttm->last_lomem_page = -1; ttm->caching_state = tt_cached; ttm->page_flags = page_flags; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 37527d6..9da182b 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -136,11 +136,6 @@ enum ttm_caching_state { * @dummy_read_page: Page to map where the ttm_tt page array contains a NULL * pointer. * @pages: Array of pages backing the data. - * @first_himem_page: Himem pages are put last in the page array, which - * enables us to run caching attribute changes on only the first part - * of the page array containing lomem pages. This is the index of the - * first himem page. - * @last_lomem_page: Index of the last lomem page in the page array. * @num_pages: Number of pages in the page array. * @bdev: Pointer to the current struct ttm_bo_device. * @be: Pointer to the ttm backend. @@ -157,8 +152,6 @@ enum ttm_caching_state { struct ttm_tt { struct page *dummy_read_page; struct page **pages; - long first_himem_page; - long last_lomem_page; uint32_t page_flags; unsigned long num_pages; struct ttm_bo_global *glob; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 04/13] drm/ttm: remove unused backend flags field
From: Jerome Glisse This field is not used by any of the drivers, so just drop it. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/radeon/radeon_ttm.c |1 - include/drm/ttm/ttm_bo_driver.h |2 -- 2 files changed, 0 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 0b5468b..97c76ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -787,7 +787,6 @@ struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev) return NULL; } gtt->backend.bdev = &rdev->mman.bdev; - gtt->backend.flags = 0; gtt->backend.func = &radeon_backend_func; gtt->rdev = rdev; gtt->pages = NULL; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 9da182b..6d17140 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -106,7 +106,6 @@ struct ttm_backend_func { * struct ttm_backend * * @bdev: Pointer to a struct ttm_bo_device. - * @flags: For driver use. * @func: Pointer to a struct ttm_backend_func that describes * the backend methods. * @@ -114,7 +113,6 @@ struct ttm_backend_func { struct ttm_backend { struct ttm_bo_device *bdev; - uint32_t flags; struct ttm_backend_func *func; }; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 05/13] drm/ttm: use ttm put pages function to properly restore cache attribute
From: Jerome Glisse On failure we need to make sure the page we free has the wb cache attribute. Do this by calling the proper ttm page helper function. Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_tt.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 8b7a6d0..3fb4c6d 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,7 +89,10 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) } return p; out_err: - put_page(p); + INIT_LIST_HEAD(&h); + list_add(&p->lru, &h); + ttm_put_pages(&h, 1, ttm->page_flags, + ttm->caching_state, &ttm->dma_address[index]); return NULL; } -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
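For background, what "restore the wb cache attribute" means: a page that was handed out write-combined or uncached must be flipped back to write-back before it returns to the page allocator. A hedged, x86-only sketch of the idea (set_pages_wb() is the x86 helper; ttm_put_pages(), as used in the fix above, does the equivalent for whole batches and also recycles pages into the pool):

#include <asm/cacheflush.h>
#include <linux/gfp.h>
#include <linux/mm.h>

static void free_page_restore_wb(struct page *p)
{
	set_pages_wb(p, 1);	/* undo any WC/UC mapping attribute */
	__free_page(p);		/* now safe to hand back to the kernel */
}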
[PATCH 06/13] drm/ttm: test for dma_address array allocation failure
From: Jerome Glisse Signed-off-by: Jerome Glisse Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_tt.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 3fb4c6d..aceecb5 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -319,7 +319,7 @@ struct ttm_tt *ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, ttm->dummy_read_page = dummy_read_page; ttm_tt_alloc_page_directory(ttm); - if (!ttm->pages) { + if (!ttm->pages || !ttm->dma_address) { ttm_tt_destroy(ttm); printk(KERN_ERR TTM_PFX "Failed allocating page table\n"); return NULL; -- 1.7.7.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 07/13] drm/ttm: page allocation use page array instead of list
From: Jerome Glisse Use the ttm_tt pages array for page allocations, and move the list unwinding into the page allocation functions. Signed-off-by: Jerome Glisse --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 85 +- drivers/gpu/drm/ttm/ttm_tt.c | 36 +++ include/drm/ttm/ttm_page_alloc.h |8 ++-- 3 files changed, 63 insertions(+), 66 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 727e93d..0f3e6d2 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -619,8 +619,10 @@ static void ttm_page_pool_fill_locked(struct ttm_page_pool *pool, * @return count of pages still required to fulfill the request. */ static unsigned ttm_page_pool_get_pages(struct ttm_page_pool *pool, - struct list_head *pages, int ttm_flags, - enum ttm_caching_state cstate, unsigned count) + struct list_head *pages, + int ttm_flags, + enum ttm_caching_state cstate, + unsigned count) { unsigned long irq_flags; struct list_head *p; @@ -664,13 +666,15 @@ out: * On success pages list will hold count number of correctly * cached pages. */ -int ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, +int ttm_get_pages(struct page **pages, int flags, + enum ttm_caching_state cstate, unsigned npages, dma_addr_t *dma_address) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); + struct list_head plist; struct page *p = NULL; gfp_t gfp_flags = GFP_USER; + unsigned count; int r; /* set zero flag for page allocation if required */ @@ -684,7 +688,7 @@ int ttm_get_pages(struct list_head *pages, int flags, else gfp_flags |= GFP_HIGHUSER; - for (r = 0; r < count; ++r) { + for (r = 0; r < npages; ++r) { p = alloc_page(gfp_flags); if (!p) { @@ -693,85 +697,100 @@ int ttm_get_pages(struct list_head *pages, int flags, return -ENOMEM; } - list_add(&p->lru, pages); + pages[r] = p; } return 0; } - /* combine zero flag to pool flags */ gfp_flags |= pool->gfp_flags; /* First we take pages from the pool */ - count = ttm_page_pool_get_pages(pool, pages, flags, cstate, count); + INIT_LIST_HEAD(&plist); + npages = ttm_page_pool_get_pages(pool, &plist, flags, cstate, npages); + count = 0; + list_for_each_entry(p, &plist, lru) { + pages[count++] = p; + } /* clear the pages coming from the pool if requested */ if (flags & TTM_PAGE_FLAG_ZERO_ALLOC) { - list_for_each_entry(p, pages, lru) { + list_for_each_entry(p, &plist, lru) { clear_page(page_address(p)); } } /* If pool didn't have enough pages allocate new one. */ - if (count > 0) { + if (npages > 0) { /* ttm_alloc_new_pages doesn't reference pool so we can run * multiple requests in parallel. **/ - r = ttm_alloc_new_pages(pages, gfp_flags, flags, cstate, count); + INIT_LIST_HEAD(&plist); + r = ttm_alloc_new_pages(&plist, gfp_flags, flags, cstate, npages); + list_for_each_entry(p, &plist, lru) { + pages[count++] = p; + } if (r) { /* If there is any pages in the list put them back to * the pool. 
*/ printk(KERN_ERR TTM_PFX "Failed to allocate extra pages " "for large request."); - ttm_put_pages(pages, 0, flags, cstate, NULL); + ttm_put_pages(pages, count, flags, cstate, NULL); return r; } } - return 0; } /* Put all pages in pages list to correct pool to wait for reuse */ -void ttm_put_pages(struct list_head *pages, unsigned page_count, int flags, +void ttm_put_pages(struct page **pages, unsigned npages, int flags, enum ttm_caching_state cstate, dma_addr_t *dma_address) { unsigned long irq_flags; struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); - struct page *p, *tmp; + unsigned i; if (pool == NULL) { /* No pool for this memory type so free the pages */ - - list_for_each
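The reworked calling convention, sketched from the new signatures above: callers now pass a plain page array and a count, and the allocator itself unwinds partial work on failure. Error handling is trimmed and example_fill is an illustrative name:

#include <drm/ttm/ttm_page_alloc.h>

static int example_fill(struct page **pages, unsigned npages, int flags,
			enum ttm_caching_state cstate, dma_addr_t *dma_address)
{
	int r;

	r = ttm_get_pages(pages, flags, cstate, npages, dma_address);
	if (r)
		return r;	/* allocator already unwound partial work */
	/* ... use pages[0] .. pages[npages - 1] ... */
	ttm_put_pages(pages, npages, flags, cstate, dma_address);
	return 0;
}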
[PATCH 09/13] drm/ttm: introduce callback for ttm_tt populate & unpopulate V4
From: Jerome Glisse Move the page allocation and freeing to driver callbacks and provide ttm helper functions for those. The most intrusive change is the fact that we now only fully populate an object; this simplifies some of the code designed around the page fault design. V2 Rebase on top of memory accounting overhaul V3 New rebase on top of more memory accounting changes V4 Rebase on top of no memory accounting changes (where/when is my delorean when i need it ?) Signed-off-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c |3 + drivers/gpu/drm/radeon/radeon_ttm.c|2 + drivers/gpu/drm/ttm/ttm_bo_util.c | 31 ++- drivers/gpu/drm/ttm/ttm_bo_vm.c|9 +++- drivers/gpu/drm/ttm/ttm_page_alloc.c | 57 drivers/gpu/drm/ttm/ttm_tt.c | 91 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |3 + include/drm/ttm/ttm_bo_driver.h| 41 -- include/drm/ttm/ttm_page_alloc.h | 18 ++ 9 files changed, 135 insertions(+), 120 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index b060fa4..f19ac42 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -28,6 +28,7 @@ */ #include "drmP.h" +#include "ttm/ttm_page_alloc.h" #include "nouveau_drm.h" #include "nouveau_drv.h" @@ -1050,6 +1051,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) struct ttm_bo_driver nouveau_bo_driver = { .ttm_tt_create = &nouveau_ttm_tt_create, + .ttm_tt_populate = &ttm_pool_populate, + .ttm_tt_unpopulate = &ttm_pool_unpopulate, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 53ff62b..13d5996 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -584,6 +584,8 @@ struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev, static struct ttm_bo_driver radeon_bo_driver = { .ttm_tt_create = &radeon_ttm_tt_create, + .ttm_tt_populate = &ttm_pool_populate, + .ttm_tt_unpopulate = &ttm_pool_unpopulate, .invalidate_caches = &radeon_invalidate_caches, .init_mem_type = &radeon_init_mem_type, .evict_flags = &radeon_evict_flags, diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 082fcae..60f204d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -244,7 +244,7 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src, unsigned long page, pgprot_t prot) { - struct page *d = ttm_tt_get_page(ttm, page); + struct page *d = ttm->pages[page]; void *dst; if (!d) @@ -281,7 +281,7 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst, unsigned long page, pgprot_t prot) { - struct page *s = ttm_tt_get_page(ttm, page); + struct page *s = ttm->pages[page]; void *src; if (!s) @@ -342,6 +342,12 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, if (old_iomap == NULL && ttm == NULL) goto out2; + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + goto out1; + } + add = 0; dir = 1; @@ -502,10 +508,16 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo, { struct ttm_mem_reg *mem = &bo->mem; pgprot_t prot; struct ttm_tt *ttm = bo->ttm; - struct page *d; - int i; + int ret; BUG_ON(!ttm); + + if (ttm->state == tt_unpopulated) { + ret = ttm->bdev->driver->ttm_tt_populate(ttm); + if (ret) + return ret; + } + if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) { /* * We're mapping a single page, and 
the desired @@ -513,18 +525,9 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo, */ map->bo_kmap_type = ttm_bo_map_kmap; - map->page = ttm_tt_get_page(ttm, start_page); + map->page = ttm->pages[start_page]; map->virtual = kmap(map->page); } else { - /* -* Populate the part we're mapping; -*/ - for (i = start_page; i < start_page + num_pages; ++i) { - d = ttm_tt_get_page(ttm, i); - if (!d) - return -ENOMEM; - } - /* * We need to use vmap to get the desired pag
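The recurring pattern this patch adds to ttm_bo_util.c and ttm_bo_vm.c, isolated as a sketch: any path about to dereference ttm->pages must first populate through the driver callback (ensure_populated is an illustrative name, not a helper the patch itself introduces):

static int ensure_populated(struct ttm_tt *ttm)
{
	if (ttm->state == tt_unpopulated)
		return ttm->bdev->driver->ttm_tt_populate(ttm);
	return 0;	/* already populated */
}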
[PATCH 08/13] drm/ttm: merge ttm_backend and ttm_tt V4
From: Jerome Glisse A ttm_backend only ever exists together with a ttm_tt, and a ttm_tt is only of interest when bound to a backend. Thus, to avoid code & data duplication between the two, merge them. V2 Rebase on top of memory accounting overhaul V3 Rebase on top of more memory accounting changes V4 Rebase on top of no memory accounting changes (where/when is my delorean when i need it ?) Signed-off-by: Jerome Glisse --- drivers/gpu/drm/nouveau/nouveau_bo.c| 14 ++- drivers/gpu/drm/nouveau/nouveau_drv.h |5 +- drivers/gpu/drm/nouveau/nouveau_sgdma.c | 188 -- drivers/gpu/drm/radeon/radeon_ttm.c | 222 --- drivers/gpu/drm/ttm/ttm_agp_backend.c | 88 + drivers/gpu/drm/ttm/ttm_bo.c|9 +- drivers/gpu/drm/ttm/ttm_tt.c| 59 ++--- drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 66 +++-- include/drm/ttm/ttm_bo_driver.h | 104 ++- 9 files changed, 295 insertions(+), 460 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7226f41..b060fa4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -343,8 +343,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) *mem = val; } -static struct ttm_backend * -nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) +static struct ttm_tt * +nouveau_ttm_tt_create(struct ttm_bo_device *bdev, + unsigned long size, uint32_t page_flags, + struct page *dummy_read_page) { struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev; @@ -352,11 +354,13 @@ nouveau_bo_create_ttm_backend_entry(struct ttm_bo_device *bdev) switch (dev_priv->gart_info.type) { #if __OS_HAS_AGP case NOUVEAU_GART_AGP: - return ttm_agp_backend_init(bdev, dev->agp->bridge); + return ttm_agp_tt_create(bdev, dev->agp->bridge, +size, page_flags, dummy_read_page); #endif case NOUVEAU_GART_PDMA: case NOUVEAU_GART_HW: - return nouveau_sgdma_init_ttm(dev); + return nouveau_sgdma_create_ttm(bdev, size, page_flags, + dummy_read_page); default: NV_ERROR(dev, "Unknown GART type %d\n", dev_priv->gart_info.type); @@ -1045,7 +1049,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence) } struct ttm_bo_driver nouveau_bo_driver = { - .create_ttm_backend_entry = nouveau_bo_create_ttm_backend_entry, + .ttm_tt_create = &nouveau_ttm_tt_create, .invalidate_caches = nouveau_bo_invalidate_caches, .init_mem_type = nouveau_bo_init_mem_type, .evict_flags = nouveau_bo_evict_flags, diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 29837da..0c53e39 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1000,7 +1000,10 @@ extern int nouveau_sgdma_init(struct drm_device *); extern void nouveau_sgdma_takedown(struct drm_device *); extern uint32_t nouveau_sgdma_get_physical(struct drm_device *, uint32_t offset); -extern struct ttm_backend *nouveau_sgdma_init_ttm(struct drm_device *); +extern struct ttm_tt *nouveau_sgdma_create_ttm(struct ttm_bo_device *bdev, + unsigned long size, + uint32_t page_flags, + struct page *dummy_read_page); /* nouveau_debugfs.c */ #if defined(CONFIG_DRM_NOUVEAU_DEBUG) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index b75258a..bc2ab90 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -8,44 +8,23 @@ #define NV_CTXDMA_PAGE_MASK (NV_CTXDMA_PAGE_SIZE - 1) struct nouveau_sgdma_be { - struct ttm_backend backend; + struct ttm_tt ttm; struct drm_device *dev; 
- - dma_addr_t *pages; - unsigned nr_pages; - bool unmap_pages; - u64 offset; - bool bound; }; static int -nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages, - struct page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs) +nouveau_sgdma_dma_map(struct ttm_tt *ttm) { - struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be; + struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm; struct drm_device *dev = nvbe->dev; int i; - NV_DEBUG(nvbe->dev, "num_pages = %ld\n", num_pages); - - nvbe->pages = dma_addrs; - nvbe->nr_pages = num_pages; -
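After the merge, a driver backend embeds struct ttm_tt directly and downcasts the ttm_tt pointer it receives in its ttm_backend_func callbacks. A sketch modeled on the nouveau_sgdma_be hunks above (my_sgdma_be and my_sgdma_bind are illustrative names; the cast is valid because the ttm_tt is the first member):

struct my_sgdma_be {
	struct ttm_tt ttm;	/* embedded; no separate ttm_backend */
	struct drm_device *dev;
	u64 offset;
};

static int my_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
{
	struct my_sgdma_be *nvbe = (struct my_sgdma_be *)ttm;

	nvbe->offset = mem->start << PAGE_SHIFT;
	/* program GART/ctxdma entries for ttm->num_pages pages here */
	return 0;
}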
[PATCH 10/13] drm/ttm: provide dma aware ttm page pool code V7
From: Konrad Rzeszutek Wilk In the TTM world the pages for the graphics drivers are kept in three different pools: write combined, uncached, and cached (write-back). When the pages are used by the graphics driver, the graphics adapter programs them in via its built-in MMU (or AGP). The programming requires the virtual address (from the graphics adapter's perspective) and the physical address (either System RAM or the memory on the card), which is obtained using the pci_map_* calls (which do the virtual-to-physical, or bus, address translation). During the graphics application's "life" those pages can be shuffled around, swapped out to disk, moved from the VRAM to System RAM or vice-versa. This all works with the existing TTM pool code - except when we want to use the software IOTLB (SWIOTLB) code to "map" the physical addresses to the graphics adapter MMU. We end up programming the bounce buffer's physical address instead of the TTM pool memory's and get a non-working driver. There are two solutions: 1) using the DMA API to allocate pages that are screened by the DMA API, or 2) using the pci_sync_* calls to copy the pages from the bounce-buffer and back. This patch fixes the issue by allocating pages using the DMA API. The second is a viable option - but it has performance drawbacks and potential correctness issues - think of a write-combined page being bounced (SWIOTLB->TTM): WC is set on the TTM page, but the copy from the SWIOTLB may not make it to the TTM page until the page has been recycled in the pool (and used by another application). The bounce buffer does not get activated often - only in cases where we have a 32-bit capable card and we want to use a page that is allocated above the 4GB limit. The bounce buffer offers the solution of copying the contents of that page to a location below 4GB and then back when the operation has been completed (or vice-versa). This is done by using the 'pci_sync_*' calls. Note: If you look carefully enough in the existing TTM page pool code you will notice the GFP_DMA32 flag is used - which should guarantee that the provided page is under 4GB. That is certainly the case, except it gets ignored in two situations: - If the user specifies 'swiotlb=force', which bounces _every_ page. - If the user is using a Xen PV Linux guest (which uses the SWIOTLB, and the underlying PFNs aren't necessarily under 4GB). To avoid this extra copying, the other option is to allocate the pages using the DMA API so that there is no need to map the page and perform the expensive 'pci_sync_*' calls. For this, the DMA-API-capable TTM pool requires the 'struct device' so it can call the DMA API properly. It also has to track the virtual and bus address of the page being handed out in case it ends up being swapped out or de-allocated - to make sure it is de-allocated using the proper 'struct device'. Implementation-wise the code keeps two lists: one that is attached to the 'struct device' (via the dev->dma_pools list) and a global one to be used when the 'struct device' is unavailable (think shrinker code). The global list can iterate over all of the 'struct device' entries and their associated dma_pools. The list in dev->dma_pools can only iterate the device's own dma_pools. 
[ASCII diagram elided: two pools associated with the device (WC and UC) hanging off the dev->dma_pools list, and the parallel global list containing the 'struct dev' and 'struct dma_pool' entries] The maximum number of dma pools a device can have is six: write-combined, uncached, and cached; then there are the DMA32 variants, which are: write-combined dma32, uncached dma32, and cached dma32. Currently this code only gets activated when any variant of the SWIOTLB IOMMU code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV with PCI devices). Tested-by: Michel Dänzer [v1: Using swiotlb_nr_tbl instead of swiotlb_enabled] [v2: Major overhaul - added 'inuse_list' to separate used from inuse and reordered the lists to get better performance.] [v3: Added comments and some logic based on review, Added Jerome tag] [v4: rebase on top of ttm_tt & ttm_backend merge] [v5: rebase on top of ttm memory accounting overhaul] [v6: New rebase on top of more memory accounting changes] [v7: well rebase on top of no memory accounting changes] Reviewed-by: Jerome Glisse Signed-off-by: Konrad Rzeszutek Wilk --- drivers/gpu/drm/ttm/Makefile |4 + drivers/gpu/drm/ttm/ttm_memory.c |2 +
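The core idea of the DMA-aware pool in isolation, sketched with the generic DMA API. This is a sketch only: the real pool keeps this bookkeeping in its own per-pool structures and lists as described above, and dma_page_alloc/struct dma_page are illustrative names:

#include <linux/dma-mapping.h>

struct dma_page {
	void *vaddr;	/* CPU address of the page */
	dma_addr_t dma;	/* bus address the GPU gets programmed with */
};

/* Allocating through the DMA API guarantees the device can reach the
 * page directly, so SWIOTLB never interposes a bounce buffer. Both
 * addresses must be remembered so the page can later be freed with
 * dma_free_coherent() on the very same struct device.
 */
static int dma_page_alloc(struct device *dev, struct dma_page *d)
{
	d->vaddr = dma_alloc_coherent(dev, PAGE_SIZE, &d->dma, GFP_KERNEL);
	return d->vaddr ? 0 : -ENOMEM;
}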