Re: [Intel-gfx] [PATCH v2 00/17] drm/i915/dg2: Enabling 64k page size and flat ccs

2021-10-25 Thread Robert Beckett

(apologies for not quoting, I wasn't subscribed before now)


some quick thoughts:

- Can we split these patches in to two series, one for each topic. They 
don't seem specifically related.


- to simplify 64K page support, could we just set minimum allocation 
size to 64K and round up for allocation requests?
Placement then becomes much simpler, no need to align the va to 2MB, 
just fit it in wherever it fits and always use 64K PTEs in GTT


This would simplify the code a lot and would benefit performance due up 
to 16x fewer page walks.
If we did this, we would not have to consider 2MB boundaries at all, we 
could drop all the colour handling etc.


The only down side might be some waste of allocation if there are lots 
of very small buffers.
However, I think most gfx related use cases would not be badly affected 
by this (even a cursor plane is 64k, usually).


Are there any usecases that you are aware of that would be impacted 
badly by this idea? (maybe some compute workload?)



- flat ccs modifiers: there seems to be some confusion over whether 
there should be a separate modifier for this.

As it dictates a new layout it seems like it should be a new modifier.
Was there any internal discussions about this that you could elaborate 
on here?




[Intel-gfx] [PATCH v6 00/10] drm/i915: ttm for stolen

2022-06-17 Thread Robert Beckett
This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes 
testing
dmesg warning regressions.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory regions creators to do allocation. Allows stolen region 
to track
  it's own reservations.
- Pre-reserve first page of stolen mem (add back 
WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit descritpion for "drm/i915: sanitize mem_flags for 
stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking 
and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma 
obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c|  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h|   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   3 +-
 18 files changed, 414 insertions(+), 369 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH v6 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-06-17 Thread Robert Beckett
Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v6 02/10] drm/i915: limit ttm to dma32 for i965G[M]

2022-06-17 Thread Robert Beckett
i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v6 04/10] drm/i915/gem: selftest should not attempt mmap of private regions

2022-06-17 Thread Robert Beckett
During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1



[Intel-gfx] [PATCH v6 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-06-17 Thread Robert Beckett
By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1



[Intel-gfx] [PATCH v6 09/10] drm/i915/ttm: add buffer pin on alloc flag

2022-06-17 Thread Robert Beckett
For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1



[Intel-gfx] [PATCH v6 08/10] drm/i915: allow memory region creators to alloc and free the region

2022-06-17 Thread Robert Beckett
add callbacks for alloc and free.
this allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c 
b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
struct intel_memory_region *mem;
int err;
 
-   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+   if (ops->alloc)
+   mem = ops->alloc();
+   else
+   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
if (mem->ops->release)
mem->ops->release(mem);
 err_free:
-   kfree(mem);
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
+   kfree(mem);
return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct 
intel_memory_region *mem)
 
GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
mutex_destroy(&mem->objects.lock);
-   if (!ret)
+   if (ret)
+   return;
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h 
b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
   resource_size_t size,
   resource_size_t page_size,
   unsigned int flags);
+   struct intel_memory_region *(*alloc)(void);
+   void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1



[Intel-gfx] [PATCH v6 07/10] drm/i915: ttm move/clear logic fix

2022-06-17 Thread Robert Beckett
ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1



[Intel-gfx] [PATCH v6 06/10] drm/i915: sanitize mem_flags for stolen buffers

2022-06-17 Thread Robert Beckett
Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1



[Intel-gfx] [PATCH v6 05/10] drm/i915: instantiate ttm ranger manager for stolen memory

2022-06-17 Thread Robert Beckett
prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1



[Intel-gfx] [PATCH v6 10/10] drm/i915: stolen memory use ttm backend

2022-06-17 Thread Robert Beckett
refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   3 +-
 13 files changed, 294 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, 

[Intel-gfx] [PATCH v7 00/10] drm/i915: ttm for stolen

2022-06-20 Thread Robert Beckett
This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes 
testing
dmesg warning regressions.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory regions creators to do allocation. Allows stolen region 
to track
  it's own reservations.
- Pre-reserve first page of stolen mem (add back 
WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit descritpion for "drm/i915: sanitize mem_flags for 
stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking 
and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma 
obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

v7: - fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c|  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h|   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 18 files changed, 423 insertions(+), 369 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH v7 02/10] drm/i915: limit ttm to dma32 for i965G[M]

2022-06-20 Thread Robert Beckett
i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v7 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-06-20 Thread Robert Beckett
Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v7 05/10] drm/i915: instantiate ttm ranger manager for stolen memory

2022-06-20 Thread Robert Beckett
prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1



[Intel-gfx] [PATCH v7 06/10] drm/i915: sanitize mem_flags for stolen buffers

2022-06-20 Thread Robert Beckett
Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1



[Intel-gfx] [PATCH v7 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-06-20 Thread Robert Beckett
By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1



[Intel-gfx] [PATCH v7 07/10] drm/i915: ttm move/clear logic fix

2022-06-20 Thread Robert Beckett
ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1



[Intel-gfx] [PATCH v7 08/10] drm/i915: allow memory region creators to alloc and free the region

2022-06-20 Thread Robert Beckett
add callbacks for alloc and free.
this allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c 
b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
struct intel_memory_region *mem;
int err;
 
-   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+   if (ops->alloc)
+   mem = ops->alloc();
+   else
+   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
if (mem->ops->release)
mem->ops->release(mem);
 err_free:
-   kfree(mem);
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
+   kfree(mem);
return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct 
intel_memory_region *mem)
 
GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
mutex_destroy(&mem->objects.lock);
-   if (!ret)
+   if (ret)
+   return;
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h 
b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
   resource_size_t size,
   resource_size_t page_size,
   unsigned int flags);
+   struct intel_memory_region *(*alloc)(void);
+   void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1



[Intel-gfx] [PATCH v7 09/10] drm/i915/ttm: add buffer pin on alloc flag

2022-06-20 Thread Robert Beckett
For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1



[Intel-gfx] [PATCH v7 04/10] drm/i915/gem: selftest should not attempt mmap of private regions

2022-06-20 Thread Robert Beckett
During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1



[Intel-gfx] [PATCH v7 10/10] drm/i915: stolen memory use ttm backend

2022-06-20 Thread Robert Beckett
refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 303 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, 

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-21 Thread Robert Beckett




On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:*   drm/i915: ttm for stolen (rev5)
*URL:*	https://patchwork.freedesktop.org/series/101396/ 


*State:*failure
*Details:* 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 




  CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5


Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.

External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html



Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_101396v5:



  IGT changes


Possible regressions

  * igt@i915_selftest@live@reset:
  o bat-adlp-4: PASS


-> DMESG-FAIL





I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally on 
bat-adlp-6.


Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 is 
for now?


Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low 
pages (128KiB) of stolen from use"


could this be an indication that maybe the original issue is worse on 
adlp machines?
I have only ever seen page page 135 or 136 clobbered across many runs 
via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.




Suppressed

The following results come from untrusted machines, tests, or statuses.
They do not affect the overall result.

  * igt@kms_busy@basic@flip:
  o {bat-dg2-9}: NOTRUN -> DMESG-WARN




Known issues

Here are the changes found in Patchwork_101396v5 that come from known 
issues:



  IGT changes


Issues hit

  *

igt@gem_huc_copy@huc-copy:

  o fi-icl-u2: NOTRUN -> SKIP


(i915#2190 )
  *

igt@gem_lmem_swapping@random-engines:

  o fi-icl-u2: NOTRUN -> SKIP


(i915#4613
) +3
similar issues
  *

igt@i915_pm_rpm@module-reload:

  o bat-adlp-4: PASS


-> DMESG-WARN


(i915#1888
 /
i915#3576 )
  *

igt@i915_selftest@live@hangcheck:

  o bat-dg1-6: PASS


-> DMESG-FAIL


(i915#4494
 /
i915#4957 )
  *

igt@i915_suspend@basic-s3-without-i915:

  o fi-icl-u2: NOTRUN -> SKIP


(i915#5903 )
  *

igt@kms_busy@basic@flip:

  o bat-adlp-4: PASS


-> DMESG-WARN


(i915#3576 )
  *

igt@kms_chamelium@common-hpd-after-suspend:

  o

fi-hsw-g3258: NOTRUN -> SKIP



[Intel-gfx] [PATCH v8 00/10] drm/i915: ttm for stolen

2022-06-21 Thread Robert Beckett
This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes 
testing
dmesg warning regressions.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory regions creators to do allocation. Allows stolen region 
to track
  it's own reservations.
- Pre-reserve first page of stolen mem (add back 
WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit descritpion for "drm/i915: sanitize mem_flags for 
stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking 
and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma 
obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

v7: - fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

v8: - Reserve I915_GEM_STOLEN_BIAS area from stolen

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 441 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c|  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h|   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 18 files changed, 424 insertions(+), 369 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH v8 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-06-21 Thread Robert Beckett
By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1



[Intel-gfx] [PATCH v8 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-06-21 Thread Robert Beckett
Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v8 05/10] drm/i915: instantiate ttm ranger manager for stolen memory

2022-06-21 Thread Robert Beckett
prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1



[Intel-gfx] [PATCH v8 02/10] drm/i915: limit ttm to dma32 for i965G[M]

2022-06-21 Thread Robert Beckett
i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v8 04/10] drm/i915/gem: selftest should not attempt mmap of private regions

2022-06-21 Thread Robert Beckett
During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1



[Intel-gfx] [PATCH v8 06/10] drm/i915: sanitize mem_flags for stolen buffers

2022-06-21 Thread Robert Beckett
Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1



[Intel-gfx] [PATCH v8 07/10] drm/i915: ttm move/clear logic fix

2022-06-21 Thread Robert Beckett
ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1



[Intel-gfx] [PATCH v8 08/10] drm/i915: allow memory region creators to alloc and free the region

2022-06-21 Thread Robert Beckett
add callbacks for alloc and free.
this allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c 
b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
struct intel_memory_region *mem;
int err;
 
-   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+   if (ops->alloc)
+   mem = ops->alloc();
+   else
+   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
if (mem->ops->release)
mem->ops->release(mem);
 err_free:
-   kfree(mem);
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
+   kfree(mem);
return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct 
intel_memory_region *mem)
 
GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
mutex_destroy(&mem->objects.lock);
-   if (!ret)
+   if (ret)
+   return;
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h 
b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
   resource_size_t size,
   resource_size_t page_size,
   unsigned int flags);
+   struct intel_memory_region *(*alloc)(void);
+   void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1



[Intel-gfx] [PATCH v8 09/10] drm/i915/ttm: add buffer pin on alloc flag

2022-06-21 Thread Robert Beckett
For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1



[Intel-gfx] [PATCH v8 10/10] drm/i915: stolen memory use ttm backend

2022-06-21 Thread Robert Beckett
refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 441 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 304 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, 

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-27 Thread Robert Beckett




On 22/06/2022 10:05, Tvrtko Ursulin wrote:


On 21/06/2022 20:11, Robert Beckett wrote:



On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:*    drm/i915: ttm for stolen (rev5)
*URL:*    https://patchwork.freedesktop.org/series/101396/ 
<https://patchwork.freedesktop.org/series/101396/>

*State:*    failure
*Details:* 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 




  CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5


    Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely 
need to be

verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to 
allow them
to document this new failure mode, which will reduce false positives 
in CI.


External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html



    Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


    Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_101396v5:



  IGT changes


    Possible regressions

  * igt@i915_selftest@live@reset:
  o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html> 


    -> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html> 





I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally 
on bat-adlp-6.


Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 
is for now?


Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low 
pages (128KiB) of stolen from use"


could this be an indication that maybe the original issue is worse on 
adlp machines?
I have only ever seen page page 135 or 136 clobbered across many runs 
via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.


Don't know but I see that on the latest version you even hit pages 165/166.

Any history of hitting this in CI without your series? If not, are there 
some other changes which could explain it? Are you touching the selftest 
itself?


Hexdump of the clobbered page looks quite complex. Especially 
POISON_FREE. Any idea how that ends up there?



(see 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@l...@reset.html#dmesg-warnings702)


after lots of slow debug via CI, it looks like the issue is that a ring 
buffer was allocated and taking up that page during the initial crc 
capture in the test, but by the time it came to check for corruption, it 
had been freed from that page.


The test has a number of weaknesses:

1. the busy check is done twice, without taking in to account any change 
in between. I assume previously this could be relied on never to occur, 
but now it can for some reason (more on that later)


2. the engine reset returns early with an error for guc submission 
engines, but it is silently ignored in the test. Perhaps it should 
ignore guc submission engines as it is a largely useless test for those 
situations.



A quick obvious fix is to have a busy bitmask that remembers each page's 
busy state initially and only check for corruption if it was busy during 
both checks.


However, the main question is why this is occurring now with my changes.
I have added more debug to check where the stolen memory is being freed, 
but the first run last night didn't hit the issue for once.
I am running again now, will report back if I figure out where it is 
being freed.


I am pretty sure the "corruption" (which isn't actually corruption) is 
from a ring buffer.
The POISON_FREE is the only difference between the captured before and 
after dumps:


[0040]  0280 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
6b6b6b6b


with the 2nd dword being the MI_ARB_CHECK used for the spinner.
I think this is the request poisoning from i915_request_retire()

The bit I don't know yet is why a ring buffer was freed between the 
initial crc capture and the corruption check. The spinner should be 
active across the entire test, maintaining a ref on the context and it's 
ring.


hopefully my latest debug will give more answers.




Btw what is the benefit of converting stolen to start with? It's not 
much of a backend since it just uses the drm range manager. So quite 
thin and uneventful. Diffstats for the series also do not look like you 
end up with much code reduction?


Regards,

Tvrtko


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-28 Thread Robert Beckett




On 28/06/2022 09:46, Tvrtko Ursulin wrote:


On 27/06/2022 18:08, Robert Beckett wrote:



On 22/06/2022 10:05, Tvrtko Ursulin wrote:


On 21/06/2022 20:11, Robert Beckett wrote:



On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:*    drm/i915: ttm for stolen (rev5)
*URL:*    https://patchwork.freedesktop.org/series/101396/ 
<https://patchwork.freedesktop.org/series/101396/>

*State:*    failure
*Details:* 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 




  CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5


    Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely 
need to be

verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to 
allow them
to document this new failure mode, which will reduce false 
positives in CI.


External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html



    Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


    Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_101396v5:



  IGT changes


    Possible regressions

  * igt@i915_selftest@live@reset:
  o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html> 


    -> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html> 





I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally 
on bat-adlp-6.


Should bat-adlp-4 be considered an unreliable machine like 
bat-adlp-6 is for now?


Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude 
low pages (128KiB) of stolen from use"


could this be an indication that maybe the original issue is worse 
on adlp machines?
I have only ever seen page page 135 or 136 clobbered across many 
runs via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.


Don't know but I see that on the latest version you even hit pages 
165/166.


Any history of hitting this in CI without your series? If not, are 
there some other changes which could explain it? Are you touching the 
selftest itself?


Hexdump of the clobbered page looks quite complex. Especially 
POISON_FREE. Any idea how that ends up there?



(see 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@l...@reset.html#dmesg-warnings702) 



after lots of slow debug via CI, it looks like the issue is that a 
ring buffer was allocated and taking up that page during the initial 
crc capture in the test, but by the time it came to check for 
corruption, it had been freed from that page.


The test has a number of weaknesses:

1. the busy check is done twice, without taking in to account any 
change in between. I assume previously this could be relied on never 
to occur, but now it can for some reason (more on that later)


You mean the stolen page used/unused test? Probably the premise is that 
the test controls the driver completely ie. is the sole user and the two 
checks are run at the time where nothing else could have changed the state.


With the nerfed request (as with GuC) this actually should hold. In the 
generic case I am less sure, my working knowledge faded a bit, but 
perhaps there was something guaranteeing the spinner couldn't have been 
retired yet at the time of the second check. Would need clarifying at 
least in comments.


2. the engine reset returns early with an error for guc submission 
engines, but it is silently ignored in the test. Perhaps it should 
ignore guc submission engines as it is a largely useless test for 
those situations.


Yes looks dodgy indeed. You will need to summon the owners of the GuC 
backend to comment on this.


However even if the test should be skipped with GuC it is extremely 
interesting that you are hitting this so I suspect there is a more 
serious issue at play.


indeed. That's why I am keen to get to the root cause instead of just 
slapping in a fix.




A quick obvious fix is to have a busy bitmask that remembers each 
page's busy state initially and only check for corruption if it was 
busy during both checks.


However, the main question is why this is occurring now with my changes.
I have added more debug to check where the stolen memory is being 
freed, but the first run last night didn't hit the issue for once.
I am running again now, will report back if I figure out where it is 
being freed.


I am pretty sure the "corruption" (which isn't actually corruption) 

Re: [Intel-gfx] [PATCH 2/3] drm/i915/ttm: don't overwrite cache_dirty after setting coherency

2022-06-28 Thread Robert Beckett




On 14/06/2022 18:55, Matthew Auld wrote:

On Tue, 14 Jun 2022 at 02:14, Adrian Larumbe
 wrote:


When i915_gem_object_set_cache_level sets the GEM object's cache_dirty to
true, in the case of TTM that will sometimes be overwritten when getting
the object's pages, more specifically for shmem-placed objects for which
its ttm structure has just been populated.

This wasn't an issue so far, even though intel_dpt_create was setting the
object's cache level to 'none', regardless of the platform and memory
placement of the framebuffer. However, commit 2f0ec95ed20c ("drm/i915/ttm:
dont trample cache_level overrides during ttm move") makes sure the cache
level set by older backends soon to be managed by TTM isn't altered after
their TTM bo ttm structure is populated.

However this led to the 'obj->cache_dirty = true' set in
i915_gem_object_set_cache_level to stick around rather than being reset
inside i915_ttm_adjust_gem_after_move after calling ttm_tt_populate in
__i915_ttm_get_pages, which eventually led to a warning in DGFX platforms.

There also seems to be no need for this statement to be kept in
i915_gem_object_set_cache_level, since i915_gem_object_set_cache_coherency
is already taking care of it, and also considering whether it's a discrete
platform.

Remove statement altogether.

Signed-off-by: Adrian Larumbe 
---
  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 3e5d6057b3ef..b2c9e16bfa55 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -273,10 +273,8 @@ int i915_gem_object_set_cache_level(struct 
drm_i915_gem_object *obj,
 return ret;

 /* Always invalidate stale cachelines */
-   if (obj->cache_level != cache_level) {
+   if (obj->cache_level != cache_level)
 i915_gem_object_set_cache_coherency(obj, cache_level);
-   obj->cache_dirty = true;


Maybe ban calling this on dgpu or have it fail silently? At the ioctl
level this should already be banned.

Ignoring dgpu, the cache_dirty handling is quite thorny on non-LLC
platforms. I'm not sure if there are other historical reasons for
having this here, but one big issue is that we are not allowed to
freely set cache_dirty = false, unless we are certain that the pages
underneath have been populated and the potential flush-on-acquire
completed. See the kernel-doc for @cache_dirty for more details.


given the commit "068b1bd09253 drm/i915: stop setting cache_dirty on 
discrete"
with it's justification of cache_dirty should not be set on discreet as 
it is not needed, I think this patch should change to set


obj->cache_dirty = !IS_DGFX(to_i915(obj->base.dev));

along with the assignment in flush_write_domain()

and should be considered a fix for that patch.

It should keep the asignment for integrated as it's original purpose 
still holds there.







-   }

 /* The cache-level will be applied when each vma is rebound. */
 return i915_gem_object_unbind(obj,
--
2.36.1



Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-29 Thread Robert Beckett




On 28/06/2022 17:22, Robert Beckett wrote:



On 28/06/2022 09:46, Tvrtko Ursulin wrote:


On 27/06/2022 18:08, Robert Beckett wrote:



On 22/06/2022 10:05, Tvrtko Ursulin wrote:


On 21/06/2022 20:11, Robert Beckett wrote:



On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:*    drm/i915: ttm for stolen (rev5)
*URL:*    https://patchwork.freedesktop.org/series/101396/ 
<https://patchwork.freedesktop.org/series/101396/>

*State:*    failure
*Details:* 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 




  CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5


    Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely 
need to be

verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to 
allow them
to document this new failure mode, which will reduce false 
positives in CI.


External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 




    Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


    Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_101396v5:



  IGT changes


    Possible regressions

  * igt@i915_selftest@live@reset:
  o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html> 


    -> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html> 





I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and 
occasionally on bat-adlp-6.


Should bat-adlp-4 be considered an unreliable machine like 
bat-adlp-6 is for now?


Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude 
low pages (128KiB) of stolen from use"


could this be an indication that maybe the original issue is worse 
on adlp machines?
I have only ever seen page page 135 or 136 clobbered across many 
runs via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.


Don't know but I see that on the latest version you even hit pages 
165/166.


Any history of hitting this in CI without your series? If not, are 
there some other changes which could explain it? Are you touching 
the selftest itself?


Hexdump of the clobbered page looks quite complex. Especially 
POISON_FREE. Any idea how that ends up there?



(see 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@l...@reset.html#dmesg-warnings702) 



after lots of slow debug via CI, it looks like the issue is that a 
ring buffer was allocated and taking up that page during the initial 
crc capture in the test, but by the time it came to check for 
corruption, it had been freed from that page.


The test has a number of weaknesses:

1. the busy check is done twice, without taking in to account any 
change in between. I assume previously this could be relied on never 
to occur, but now it can for some reason (more on that later)


You mean the stolen page used/unused test? Probably the premise is 
that the test controls the driver completely ie. is the sole user and 
the two checks are run at the time where nothing else could have 
changed the state.


With the nerfed request (as with GuC) this actually should hold. In 
the generic case I am less sure, my working knowledge faded a bit, but 
perhaps there was something guaranteeing the spinner couldn't have 
been retired yet at the time of the second check. Would need 
clarifying at least in comments.


2. the engine reset returns early with an error for guc submission 
engines, but it is silently ignored in the test. Perhaps it should 
ignore guc submission engines as it is a largely useless test for 
those situations.


Yes looks dodgy indeed. You will need to summon the owners of the GuC 
backend to comment on this.


However even if the test should be skipped with GuC it is extremely 
interesting that you are hitting this so I suspect there is a more 
serious issue at play.


indeed. That's why I am keen to get to the root cause instead of just 
slapping in a fix.




A quick obvious fix is to have a busy bitmask that remembers each 
page's busy state initially and only check for corruption if it was 
busy during both checks.


However, the main question is why this is occurring now with my changes.
I have added more debug to check where the stolen memory is being 
freed, but the first run last night didn't hit the issue for once.
I am running again now, will report back if I figure out where it is 
being freed.


I am pretty sure the "corruption&q

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-30 Thread Robert Beckett




On 29/06/2022 13:51, Robert Beckett wrote:



On 28/06/2022 17:22, Robert Beckett wrote:



On 28/06/2022 09:46, Tvrtko Ursulin wrote:


On 27/06/2022 18:08, Robert Beckett wrote:



On 22/06/2022 10:05, Tvrtko Ursulin wrote:


On 21/06/2022 20:11, Robert Beckett wrote:



On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:*    drm/i915: ttm for stolen (rev5)
*URL:*    https://patchwork.freedesktop.org/series/101396/ 
<https://patchwork.freedesktop.org/series/101396/>

*State:*    failure
*Details:* 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 




  CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5


    Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely 
need to be

verified manually.

If you think the reported changes have nothing to do with the 
changes
introduced in Patchwork_101396v5, please notify your bug team to 
allow them
to document this new failure mode, which will reduce false 
positives in CI.


External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 




    Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


    Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_101396v5:



  IGT changes


    Possible regressions

  * igt@i915_selftest@live@reset:
  o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html> 


    -> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html> 





I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and 
occasionally on bat-adlp-6.


Should bat-adlp-4 be considered an unreliable machine like 
bat-adlp-6 is for now?


Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude 
low pages (128KiB) of stolen from use"


could this be an indication that maybe the original issue is worse 
on adlp machines?
I have only ever seen page page 135 or 136 clobbered across many 
runs via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.


Don't know but I see that on the latest version you even hit pages 
165/166.


Any history of hitting this in CI without your series? If not, are 
there some other changes which could explain it? Are you touching 
the selftest itself?


Hexdump of the clobbered page looks quite complex. Especially 
POISON_FREE. Any idea how that ends up there?



(see 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@l...@reset.html#dmesg-warnings702) 



after lots of slow debug via CI, it looks like the issue is that a 
ring buffer was allocated and taking up that page during the initial 
crc capture in the test, but by the time it came to check for 
corruption, it had been freed from that page.


The test has a number of weaknesses:

1. the busy check is done twice, without taking in to account any 
change in between. I assume previously this could be relied on never 
to occur, but now it can for some reason (more on that later)


You mean the stolen page used/unused test? Probably the premise is 
that the test controls the driver completely ie. is the sole user and 
the two checks are run at the time where nothing else could have 
changed the state.


With the nerfed request (as with GuC) this actually should hold. In 
the generic case I am less sure, my working knowledge faded a bit, 
but perhaps there was something guaranteeing the spinner couldn't 
have been retired yet at the time of the second check. Would need 
clarifying at least in comments.


2. the engine reset returns early with an error for guc submission 
engines, but it is silently ignored in the test. Perhaps it should 
ignore guc submission engines as it is a largely useless test for 
those situations.


Yes looks dodgy indeed. You will need to summon the owners of the GuC 
backend to comment on this.


However even if the test should be skipped with GuC it is extremely 
interesting that you are hitting this so I suspect there is a more 
serious issue at play.


indeed. That's why I am keen to get to the root cause instead of just 
slapping in a fix.




A quick obvious fix is to have a busy bitmask that remembers each 
page's busy state initially and only check for corruption if it was 
busy during both checks.


However, the main question is why this is occurring now with my 
changes.
I have added more debug to check where the stolen memory is being 
freed, but the first run last night didn't hit the issue for once.
I am running again now, will report back if I figure out where it is 
be

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-30 Thread Robert Beckett




On 30/06/2022 15:52, Hellstrom, Thomas wrote:

Hi!

On Thu, 2022-06-30 at 15:20 +0100, Robert Beckett wrote:



On 29/06/2022 13:51, Robert Beckett wrote:



On 28/06/2022 17:22, Robert Beckett wrote:



On 28/06/2022 09:46, Tvrtko Ursulin wrote:


On 27/06/2022 18:08, Robert Beckett wrote:



On 22/06/2022 10:05, Tvrtko Ursulin wrote:


On 21/06/2022 20:11, Robert Beckett wrote:



On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:*    drm/i915: ttm for stolen (rev5)
*URL:*
https://patchwork.freedesktop.org/series/101396/
<https://patchwork.freedesktop.org/series/101396/>
*State:*    failure
*Details:*
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
  
<

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html






   CI Bug Log - changes from CI_DRM_11790 ->
Patchwork_101396v5


     Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5
absolutely
need to be
verified manually.

If you think the reported changes have nothing to do
with the
changes
introduced in Patchwork_101396v5, please notify your
bug team to
allow them
to document this new failure mode, which will reduce
false
positives in CI.

External URL:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
  




     Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


     Possible new issues

Here are the unknown changes that may have been
introduced in
Patchwork_101396v5:


   IGT changes


     Possible regressions

   * igt@i915_selftest@live@reset:
   o bat-adlp-4: PASS
<
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html




     -> DMESG-FAIL
<
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html







I keep hitting clobbered pages during engine resets on
bat-adlp-4.
It seems to happen most of the time on that machine and
occasionally on bat-adlp-6.

Should bat-adlp-4 be considered an unreliable machine
like
bat-adlp-6 is for now?

Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969
"drm/i915: Exclude
low pages (128KiB) of stolen from use"

could this be an indication that maybe the original issue
is worse
on adlp machines?
I have only ever seen page page 135 or 136 clobbered
across many
runs via trybot, so it looks fairly consistent.
Though excluding the use of over 540K of stolen might be
too severe.


Don't know but I see that on the latest version you even
hit pages
165/166.

Any history of hitting this in CI without your series? If
not, are
there some other changes which could explain it? Are you
touching
the selftest itself?

Hexdump of the clobbered page looks quite complex.
Especially
POISON_FREE. Any idea how that ends up there?



(see
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@l...@reset.html#dmesg-warnings702
)


after lots of slow debug via CI, it looks like the issue is
that a
ring buffer was allocated and taking up that page during the
initial
crc capture in the test, but by the time it came to check for
corruption, it had been freed from that page.

The test has a number of weaknesses:

1. the busy check is done twice, without taking in to account
any
change in between. I assume previously this could be relied
on never
to occur, but now it can for some reason (more on that later)


You mean the stolen page used/unused test? Probably the premise
is
that the test controls the driver completely ie. is the sole
user and
the two checks are run at the time where nothing else could
have
changed the state.

With the nerfed request (as with GuC) this actually should
hold. In
the generic case I am less sure, my working knowledge faded a
bit,
but perhaps there was something guaranteeing the spinner
couldn't
have been retired yet at the time of the second check. Would
need
clarifying at least in comments.


2. the engine reset returns early with an error for guc
submission
engines, but it is silently ignored in the test. Perhaps it
should
ignore guc submission engines as it is a largely useless test
for
those situations.


Yes looks dodgy indeed. You will need to summon the owners of
the GuC
backend to comment on this.

However even if the test should be skipped with GuC it is
extremely
interesting that you are hitting this so I suspect there is a
more
serious issue at play.


indeed. That's why I am keen to get to the root cause instead of
just
slapping in a fix.




A quick obvious fix is to have a busy bitmask that remembers
each
page's busy state initially and only check for corruption if
it was
busy during both checks.

However, the main question is why this is occurring now with
my
changes.
I have added more debug to check where the stolen memory is
being
freed, but the first run last night didn't hit the issue for
once.
I am running ag

[Intel-gfx] [PATCH v9 00/11] drm/i915: ttm for stolen

2022-07-07 Thread Robert Beckett
This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes 
testing
dmesg warning regressions.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory regions creators to do allocation. Allows stolen region 
to track
  it's own reservations.
- Pre-reserve first page of stolen mem (add back 
WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit descritpion for "drm/i915: sanitize mem_flags for 
stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking 
and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma 
obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

v7: - fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

v8: - Reserve I915_GEM_STOLEN_BIAS area from stolen

v9: - drop patch 8 "drm/i915: allow memory region creators to alloc and 
free the region"
  store bias reservation in drm_i915_private instead.
- Restrict reset selftest to only test !GuC engines.
  Resetting individual GuC engines from host is not supported
- Wait for outstanding requests in reset selftest
  This prevents previous engine test context cleanup appearing
  as false positive stolen corruption check

Robert Beckett (11):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915/selftest: don't attempt engine reset of guc submission
engines
  drm/i915/selftest: wait for requests during engine reset selftest
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 425 ++
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  42 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   6 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 16 files changed, 411 insertions(+), 375 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH v9 01/11] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-07-07 Thread Robert Beckett
Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v9 03/11] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-07-07 Thread Robert Beckett
By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1



[Intel-gfx] [PATCH v9 02/11] drm/i915: limit ttm to dma32 for i965G[M]

2022-07-07 Thread Robert Beckett
i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v9 04/11] drm/i915/gem: selftest should not attempt mmap of private regions

2022-07-07 Thread Robert Beckett
During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1



[Intel-gfx] [PATCH v9 07/11] drm/i915: ttm move/clear logic fix

2022-07-07 Thread Robert Beckett
ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1



[Intel-gfx] [PATCH v9 06/11] drm/i915: sanitize mem_flags for stolen buffers

2022-07-07 Thread Robert Beckett
Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1



[Intel-gfx] [PATCH v9 05/11] drm/i915: instantiate ttm ranger manager for stolen memory

2022-07-07 Thread Robert Beckett
prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1



[Intel-gfx] [PATCH v9 08/11] drm/i915/ttm: add buffer pin on alloc flag

2022-07-07 Thread Robert Beckett
For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1



[Intel-gfx] [PATCH v9 09/11] drm/i915/selftest: don't attempt engine reset of guc submission engines

2022-07-07 Thread Robert Beckett
igt_reset_engines_stolen tries to reset engines without checking if it
is possible.
Engines using GuC submission are not able to be reset from the host.

In this scenario, the reset exits early, then on the next iteration of
the each engine loop, the async teardown of the spinner request
context's ring occurs while the next engine is under test.

This is seen as a stolen memory corruption as the ring buffer was busy
initially, but free during the confirmation check and had been poisoned
during cleanup.

Fix this by not testing GuC submission using engines.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gt/selftest_reset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c 
b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..55f3b34e5f6e 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -194,6 +194,8 @@ static int igt_reset_engines_stolen(void *arg)
return 0;
 
for_each_engine(engine, gt, id) {
+   if (intel_engine_uses_guc(engine))
+   continue;
err = __igt_reset_stolen(gt, engine->mask, engine->name);
if (err)
return err;
-- 
2.25.1



[Intel-gfx] [PATCH v9 10/11] drm/i915/selftest: wait for requests during engine reset selftest

2022-07-07 Thread Robert Beckett
While looping around each engine and testing for corrupted solen memory
during engine reset, the old requests from the previous engine can still
be yet to retire.
To prevent false positive corruption tests, wait for the outstanding
requests at the end of the test

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gt/selftest_reset.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c 
b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 55f3b34e5f6e..52acef647396 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -6,6 +6,7 @@
 #include 
 
 #include "gem/i915_gem_stolen.h"
+#include "gt/intel_gt.h"
 
 #include "i915_memcpy.h"
 #include "i915_selftest.h"
@@ -26,6 +27,7 @@ __igt_reset_stolen(struct intel_gt *gt,
intel_wakeref_t wakeref;
enum intel_engine_id id;
struct igt_spinner spin;
+   struct i915_request *requests[I915_NUM_ENGINES] = {0};
long max, count;
void *tmp;
u32 *crc;
@@ -77,6 +79,7 @@ __igt_reset_stolen(struct intel_gt *gt,
goto err_spin;
}
i915_request_add(rq);
+   requests[id] = i915_request_get(rq);
}
 
for (page = 0; page < num_pages; page++) {
@@ -165,6 +168,27 @@ __igt_reset_stolen(struct intel_gt *gt,
err = -EINVAL;
}
 
+   /* wait for requests and idle, otherwise cleanup can happen on next 
loop */
+   for (id = 0; id < I915_NUM_ENGINES; id++) {
+   if (!requests[id])
+   continue;
+   err = i915_request_wait(requests[id], I915_WAIT_INTERRUPTIBLE, 
HZ);
+   if (err < 0) {
+   pr_err("%s failed to wait for rq: %d\n", msg, err);
+   goto err_spin;
+   }
+
+   i915_request_put(requests[id]);
+   }
+
+   err = intel_gt_wait_for_idle(gt, HZ);
+   if (err < 0) {
+   pr_err("%s failed to wait for gt idle: %d\n", msg, err);
+   goto err_spin;
+   }
+
+   err = 0;
+
 err_spin:
igt_spinner_fini(&spin);
 
-- 
2.25.1



[Intel-gfx] [PATCH v9 11/11] drm/i915: stolen memory use ttm backend

2022-07-07 Thread Robert Beckett
refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 425 ++
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   6 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 280 insertions(+), 351 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, 

[Intel-gfx] [PATCH v10 00/11] drm/i915: ttm for stolen

2022-07-07 Thread Robert Beckett
This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes 
testing
dmesg warning regressions.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory regions creators to do allocation. Allows stolen region 
to track
  it's own reservations.
- Pre-reserve first page of stolen mem (add back 
WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit descritpion for "drm/i915: sanitize mem_flags for 
stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking 
and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma 
obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

v7: - fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

v8: - Reserve I915_GEM_STOLEN_BIAS area from stolen

v9: - drop patch 8 "drm/i915: allow memory region creators to alloc and 
free the region"
  store bias reservation in drm_i915_private instead.
- Restrict reset selftest to only test !GuC engines.
  Resetting individual GuC engines from host is not supported
- Wait for outstanding requests in reset selftest
  This prevents previous engine test context cleanup appearing
  as false positive stolen corruption check

v10:- Fix wiating on requests early error path during reset selftest
  If a single request fails to complete, the others would not be
  put, resulting in leaks. Make sure all requests are put before
  test exit.

Robert Beckett (11):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915/selftest: don't attempt engine reset of guc submission
engines
  drm/i915/selftest: wait for requests during engine reset selftest
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 425 ++
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  53 ++-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   6 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 16 files changed, 420 insertions(+), 377 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH v10 01/11] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-07-07 Thread Robert Beckett
Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v10 07/11] drm/i915: ttm move/clear logic fix

2022-07-07 Thread Robert Beckett
ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1



[Intel-gfx] [PATCH v10 05/11] drm/i915: instantiate ttm ranger manager for stolen memory

2022-07-07 Thread Robert Beckett
prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1



[Intel-gfx] [PATCH v10 04/11] drm/i915/gem: selftest should not attempt mmap of private regions

2022-07-07 Thread Robert Beckett
During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1



[Intel-gfx] [PATCH v10 09/11] drm/i915/selftest: don't attempt engine reset of guc submission engines

2022-07-07 Thread Robert Beckett
igt_reset_engines_stolen tries to reset engines without checking if it
is possible.
Engines using GuC submission are not able to be reset from the host.

In this scenario, the reset exits early, then on the next iteration of
the each engine loop, the async teardown of the spinner request
context's ring occurs while the next engine is under test.

This is seen as a stolen memory corruption as the ring buffer was busy
initially, but free during the confirmation check and had been poisoned
during cleanup.

Fix this by not testing GuC submission using engines.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gt/selftest_reset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c 
b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..55f3b34e5f6e 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -194,6 +194,8 @@ static int igt_reset_engines_stolen(void *arg)
return 0;
 
for_each_engine(engine, gt, id) {
+   if (intel_engine_uses_guc(engine))
+   continue;
err = __igt_reset_stolen(gt, engine->mask, engine->name);
if (err)
return err;
-- 
2.25.1



[Intel-gfx] [PATCH v10 06/11] drm/i915: sanitize mem_flags for stolen buffers

2022-07-07 Thread Robert Beckett
Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1



[Intel-gfx] [PATCH v10 10/11] drm/i915/selftest: wait for requests during engine reset selftest

2022-07-07 Thread Robert Beckett
While looping around each engine and testing for corrupted solen memory
during engine reset, the old requests from the previous engine can still
be yet to retire.
To prevent false positive corruption tests, wait for the outstanding
requests at the end of the test

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gt/selftest_reset.c | 35 ++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c 
b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 55f3b34e5f6e..a2558bc31408 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -6,6 +6,7 @@
 #include 
 
 #include "gem/i915_gem_stolen.h"
+#include "gt/intel_gt.h"
 
 #include "i915_memcpy.h"
 #include "i915_selftest.h"
@@ -26,6 +27,7 @@ __igt_reset_stolen(struct intel_gt *gt,
intel_wakeref_t wakeref;
enum intel_engine_id id;
struct igt_spinner spin;
+   struct i915_request *requests[I915_NUM_ENGINES] = {0};
long max, count;
void *tmp;
u32 *crc;
@@ -68,15 +70,16 @@ __igt_reset_stolen(struct intel_gt *gt,
ce = intel_context_create(engine);
if (IS_ERR(ce)) {
err = PTR_ERR(ce);
-   goto err_spin;
+   goto err_requests;
}
rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
intel_context_put(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
-   goto err_spin;
+   goto err_requests;
}
i915_request_add(rq);
+   requests[id] = i915_request_get(rq);
}
 
for (page = 0; page < num_pages; page++) {
@@ -165,6 +168,34 @@ __igt_reset_stolen(struct intel_gt *gt,
err = -EINVAL;
}
 
+   /* wait for requests and idle, otherwise cleanup can happen on next 
loop */
+   for (id = 0; id < I915_NUM_ENGINES; id++) {
+   if (!requests[id])
+   continue;
+   err = i915_request_wait(requests[id], I915_WAIT_INTERRUPTIBLE, 
HZ);
+   if (err < 0) {
+   pr_err("%s failed to wait for rq: %d\n", msg, err);
+   goto err_requests;
+   }
+
+   i915_request_put(requests[id]);
+   requests[id] = NULL;
+   }
+
+   err = intel_gt_wait_for_idle(gt, HZ);
+   if (err < 0) {
+   pr_err("%s failed to wait for gt idle: %d\n", msg, err);
+   goto err_spin;
+   }
+
+   err = 0;
+
+err_requests:
+   for (id = 0; id < I915_NUM_ENGINES; id++) {
+   if (!requests[id])
+   continue;
+   i915_request_put(requests[id]);
+   }
 err_spin:
igt_spinner_fini(&spin);
 
-- 
2.25.1



[Intel-gfx] [PATCH v10 03/11] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-07-07 Thread Robert Beckett
By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1



[Intel-gfx] [PATCH v10 08/11] drm/i915/ttm: add buffer pin on alloc flag

2022-07-07 Thread Robert Beckett
For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1



[Intel-gfx] [PATCH v10 02/11] drm/i915: limit ttm to dma32 for i965G[M]

2022-07-07 Thread Robert Beckett
i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v10 11/11] drm/i915: stolen memory use ttm backend

2022-07-07 Thread Robert Beckett
refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 425 ++
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   6 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 280 insertions(+), 351 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, 

Re: [Intel-gfx] [PATCH v10 04/11] drm/i915/gem: selftest should not attempt mmap of private regions

2022-07-08 Thread Robert Beckett




On 08/07/2022 08:53, Matthew Auld wrote:

On 07/07/2022 21:02, Robert Beckett wrote:

During testing make can_mmap consider whether the region is private.


Do we still need this with: 938d2fd17d17 ("drm/i915/selftests: skip the 
mman tests for stolen") ?


huh, I guess not. That wasn't in my tree. I guess I should rebase.

Looking at it, my patch would have been preferable initially I think. 
Each location of the additional checks in that patch first call 
cam_mmap(), which I think is the most appropriate place to make the 
decision.


I could do a replacement patch that reverts that one if preferred, or we 
can leave it as is and I will drop this patch.







Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
  drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c

index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object 
*obj, enum i915_mmap_type type)

  struct drm_i915_private *i915 = to_i915(obj->base.dev);
  bool no_map;
+    if (obj->mm.region && obj->mm.region->private)
+    return false;
+
  if (obj->ops->mmap_offset)
  return type == I915_MMAP_TYPE_FIXED;
  else if (type == I915_MMAP_TYPE_FIXED)


Re: [Intel-gfx] [PATCH v10 04/11] drm/i915/gem: selftest should not attempt mmap of private regions

2022-07-08 Thread Robert Beckett




On 08/07/2022 14:27, Matthew Auld wrote:

On 08/07/2022 14:22, Robert Beckett wrote:



On 08/07/2022 08:53, Matthew Auld wrote:

On 07/07/2022 21:02, Robert Beckett wrote:

During testing make can_mmap consider whether the region is private.


Do we still need this with: 938d2fd17d17 ("drm/i915/selftests: skip 
the mman tests for stolen") ?


huh, I guess not. That wasn't in my tree. I guess I should rebase.

Looking at it, my patch would have been preferable initially I think. 
Each location of the additional checks in that patch first call 
cam_mmap(), which I think is the most appropriate place to make the 
decision.


It fails at the object_create() I think (on small-BAR I mean), which is 
before we can call can_mmap(), passing in the object.


ah, okay. That makes sense to keep as is then.
I'll drop this patch.
Thanks.





I could do a replacement patch that reverts that one if preferred, or 
we can leave it as is and I will drop this patch.







Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
  drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c

index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object 
*obj, enum i915_mmap_type type)

  struct drm_i915_private *i915 = to_i915(obj->base.dev);
  bool no_map;
+    if (obj->mm.region && obj->mm.region->private)
+    return false;
+
  if (obj->ops->mmap_offset)
  return type == I915_MMAP_TYPE_FIXED;
  else if (type == I915_MMAP_TYPE_FIXED)


Re: [Intel-gfx] [PATCH] drm/i915/dg2: make GuC FW a requirement for Gen12 and beyond devices

2021-12-08 Thread Robert Beckett




On 07/12/2021 23:15, John Harrison wrote:

On 12/7/2021 09:53, Adrian Larumbe wrote:

Beginning with DG2, all successive devices will require GuC FW to be
present and loaded at probe() time. This change alters error handling in
the FW init and load functions so that the driver's probe() function will
fail if GuC could not be loaded.
We still need to load the i915 driver in fall back mode (display but no 
engines) if the GuC is missing. Otherwise you may have just bricked the 
user's device.


good point, well made.
though this still seems like an issue for gen12+ (excluding rkl and adl).

maybe a redesign of toplevel driver probe, with i915_driver_early_probe 
before i915_driver_create could work. If the GuC fw is not found, it 
could then register a new kms only version of i915_drm_driver.


or something like like that ...



Also, we do want to be able to disable the GuC via the enable_guc module 
parameter.


John.



Signed-off-by: Adrian Larumbe 
---
  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 20 
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |  4 ++--
  drivers/gpu/drm/i915/i915_gem.c   |  7 ++-
  3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c

index 7660eba893fa..8b0778b6d9ab 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -277,14 +277,19 @@ static void guc_disable_communication(struct 
intel_guc *guc)

  drm_dbg(&i915->drm, "GuC communication disabled\n");
  }
-static void __uc_fetch_firmwares(struct intel_uc *uc)
+static int __uc_fetch_firmwares(struct intel_uc *uc)
  {
+    struct drm_i915_private *i915 = uc_to_gt(uc)->i915;
  int err;
  GEM_BUG_ON(!intel_uc_wants_guc(uc));
  err = intel_uc_fw_fetch(&uc->guc.fw);
  if (err) {
+    /* GuC is mandatory on Gen12 and beyond */
+    if (GRAPHICS_VER(i915) >= 12)
+    return err;
+
  /* Make sure we transition out of transient "SELECTED" state */
  if (intel_uc_wants_huc(uc)) {
  drm_dbg(&uc_to_gt(uc)->i915->drm,
@@ -293,11 +298,13 @@ static void __uc_fetch_firmwares(struct intel_uc 
*uc)

    INTEL_UC_FIRMWARE_ERROR);
  }
-    return;
+    return 0;
  }
  if (intel_uc_wants_huc(uc))
  intel_uc_fw_fetch(&uc->huc.fw);
+
+    return 0;
  }
  static void __uc_cleanup_firmwares(struct intel_uc *uc)
@@ -308,14 +315,19 @@ static void __uc_cleanup_firmwares(struct 
intel_uc *uc)

  static int __uc_init(struct intel_uc *uc)
  {
+    struct drm_i915_private *i915 = uc_to_gt(uc)->i915;
  struct intel_guc *guc = &uc->guc;
  struct intel_huc *huc = &uc->huc;
  int ret;
  GEM_BUG_ON(!intel_uc_wants_guc(uc));
-    if (!intel_uc_uses_guc(uc))
-    return 0;
+    if (!intel_uc_uses_guc(uc)) {
+    if (GRAPHICS_VER(i915) >= 12)
+    return -EINVAL;
+    else
+    return 0;
+    }
  if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
  return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.h

index 866b462821c0..3bcd781447bc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -17,7 +17,7 @@ struct intel_uc;
  struct intel_uc_ops {
  int (*sanitize)(struct intel_uc *uc);
-    void (*init_fw)(struct intel_uc *uc);
+    int (*init_fw)(struct intel_uc *uc);
  void (*fini_fw)(struct intel_uc *uc);
  int (*init)(struct intel_uc *uc);
  void (*fini)(struct intel_uc *uc);
@@ -104,7 +104,7 @@ static inline _TYPE intel_uc_##_NAME(struct 
intel_uc *uc) \

  return _RET; \
  }
  intel_uc_ops_function(sanitize, sanitize, int, 0);
-intel_uc_ops_function(fetch_firmwares, init_fw, void, );
+intel_uc_ops_function(fetch_firmwares, init_fw, int, 0);
  intel_uc_ops_function(cleanup_firmwares, fini_fw, void, );
  intel_uc_ops_function(init, init, int, 0);
  intel_uc_ops_function(fini, fini, void, );
diff --git a/drivers/gpu/drm/i915/i915_gem.c 
b/drivers/gpu/drm/i915/i915_gem.c

index 527228d4da7e..7f8204af6826 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1049,7 +1049,12 @@ int i915_gem_init(struct drm_i915_private 
*dev_priv)

  if (ret)
  return ret;
-    intel_uc_fetch_firmwares(&dev_priv->gt.uc);
+    ret = intel_uc_fetch_firmwares(&dev_priv->gt.uc);
+    if (ret) {
+    i915_probe_error(dev_priv, "Failed to fetch firmware\n");
+    return ret;
+    }
+
  intel_wopcm_init(&dev_priv->wopcm);
  ret = i915_init_ggtt(dev_priv);




Re: [Intel-gfx] [PATCH] drm/i915/dg2: make GuC FW a requirement for Gen12 and beyond devices

2021-12-09 Thread Robert Beckett




On 09/12/2021 00:24, John Harrison wrote:

On 12/8/2021 09:58, Robert Beckett wrote:

On 07/12/2021 23:15, John Harrison wrote:

On 12/7/2021 09:53, Adrian Larumbe wrote:

Beginning with DG2, all successive devices will require GuC FW to be
present and loaded at probe() time. This change alters error 
handling in
the FW init and load functions so that the driver's probe() function 
will

fail if GuC could not be loaded.
We still need to load the i915 driver in fall back mode (display but 
no engines) if the GuC is missing. Otherwise you may have just 
bricked the user's device.


good point, well made.
though this still seems like an issue for gen12+ (excluding rkl and adl).

maybe a redesign of toplevel driver probe, with 
i915_driver_early_probe before i915_driver_create could work. If the 
GuC fw is not found, it could then register a new kms only version of 
i915_drm_driver.


or something like like that ...

Or we could just leave it all alone?

AFAIK, this is working just fine at the moment. If the platform default 
is to use GuC submission and you have the fw then the driver loads fine. 
If the platform default is to use GuC submission and you don't have the 
firmware then the driver wedges but keeps loading. That means it returns 
no engines to userland but the display is unaffected. Hence the user 
gets a slow but safe fallback path in which they can still load their 
Ubuntu desktop and try to work out what package they need to install.


What is the problem that this patch is trying to fix?


In dg2 enablement branch, when fw was unavailable, submissions could 
still be attempted and it would segfault the kernel due to some function 
pointers not being set up.


From what you said, it sounds like this may just be a bug in the dg2 
enablement, which we can diagnose and fix if so.


Though I still think it would be a better design to only register kms 
capabilities if that is all that will be supported without the fw. It 
seems a bit messy to advertise render and create the render node for 
userland sw to attempt to use and have it fail, but if that is the 
prefered design, then we can make dg2 match that.





John.






Also, we do want to be able to disable the GuC via the enable_guc 
module parameter.


John.



Signed-off-by: Adrian Larumbe 
---
  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 20 
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |  4 ++--
  drivers/gpu/drm/i915/i915_gem.c   |  7 ++-
  3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c

index 7660eba893fa..8b0778b6d9ab 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -277,14 +277,19 @@ static void guc_disable_communication(struct 
intel_guc *guc)

  drm_dbg(&i915->drm, "GuC communication disabled\n");
  }
-static void __uc_fetch_firmwares(struct intel_uc *uc)
+static int __uc_fetch_firmwares(struct intel_uc *uc)
  {
+    struct drm_i915_private *i915 = uc_to_gt(uc)->i915;
  int err;
  GEM_BUG_ON(!intel_uc_wants_guc(uc));
  err = intel_uc_fw_fetch(&uc->guc.fw);
  if (err) {
+    /* GuC is mandatory on Gen12 and beyond */
+    if (GRAPHICS_VER(i915) >= 12)
+    return err;
+
  /* Make sure we transition out of transient "SELECTED" 
state */

  if (intel_uc_wants_huc(uc)) {
  drm_dbg(&uc_to_gt(uc)->i915->drm,
@@ -293,11 +298,13 @@ static void __uc_fetch_firmwares(struct 
intel_uc *uc)

    INTEL_UC_FIRMWARE_ERROR);
  }
-    return;
+    return 0;
  }
  if (intel_uc_wants_huc(uc))
  intel_uc_fw_fetch(&uc->huc.fw);
+
+    return 0;
  }
  static void __uc_cleanup_firmwares(struct intel_uc *uc)
@@ -308,14 +315,19 @@ static void __uc_cleanup_firmwares(struct 
intel_uc *uc)

  static int __uc_init(struct intel_uc *uc)
  {
+    struct drm_i915_private *i915 = uc_to_gt(uc)->i915;
  struct intel_guc *guc = &uc->guc;
  struct intel_huc *huc = &uc->huc;
  int ret;
  GEM_BUG_ON(!intel_uc_wants_guc(uc));
-    if (!intel_uc_uses_guc(uc))
-    return 0;
+    if (!intel_uc_uses_guc(uc)) {
+    if (GRAPHICS_VER(i915) >= 12)
+    return -EINVAL;
+    else
+    return 0;
+    }
  if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
  return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.h

index 866b462821c0..3bcd781447bc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -17,7 +17,7 @@ struct intel_uc;
  struct intel_uc_ops {
  int (*sanitize)(struct intel_uc *uc);
-    void (*init_fw)(struct intel_uc *uc);
+    int (*init_fw)(struct intel_uc *uc);
  void (*fini_fw)(struct intel_uc *uc);
  int (*init)(struct intel_uc *

[Intel-gfx] [PATCH] drm/i915/ttm: fix large buffer population trucation

2021-12-10 Thread Robert Beckett
ttm->num_pages is uint32_t which was causing very large buffers to
only populate a truncated size.

This fixes gem_create@create-clear igt test on large memory systems.

Fixes: 7ae034590cea ("drm/i915/ttm: add tt shmem backend")
Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 218a9b3037c7..923cc7ad8d70 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -166,7 +166,7 @@ static int i915_ttm_tt_shmem_populate(struct ttm_device 
*bdev,
struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM];
struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
const unsigned int max_segment = i915_sg_segment_size();
-   const size_t size = ttm->num_pages << PAGE_SHIFT;
+   const size_t size = (size_t)ttm->num_pages << PAGE_SHIFT;
struct file *filp = i915_tt->filp;
struct sgt_iter sgt_iter;
struct sg_table *st;
-- 
2.25.1



Re: [Intel-gfx] [PATCH v4 06/16] drm/i915/gt: Clear compress metadata for Xe_HP platforms

2021-12-15 Thread Robert Beckett

The fixes below fix gem_lmem_swapping@basic igt test

On 09/12/2021 15:45, Ramalingam C wrote:

From: Ayaz A Siddiqui 

Xe-HP and latest devices support Flat CCS which reserved a portion of
the device memory to store compression metadata, during the clearing of
device memory buffer object we also need to clear the associated
CCS buffer.

Flat CCS memory can not be directly accessed by S/W.
Address of CCS buffer associated main BO is automatically calculated
by device itself. KMD/UMD can only access this buffer indirectly using
XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.

v2: Fixed issues with platform naming [Lucas]

Cc: CQ Tang 
Signed-off-by: Ayaz A Siddiqui 
Signed-off-by: Ramalingam C 
---
  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  14 +++
  drivers/gpu/drm/i915/gt/intel_migrate.c  | 120 ++-
  2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f8253012d166..07bf5a1753bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -203,6 +203,20 @@
  #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
  
+#define XY_CTRL_SURF_INSTR_SIZE	5

+#define MI_FLUSH_DW_SIZE   3
+#define XY_CTRL_SURF_COPY_BLT  ((2 << 29) | (0x48 << 22) | 3)
+#define   SRC_ACCESS_TYPE_SHIFT21
+#define   DST_ACCESS_TYPE_SHIFT20
+#define   CCS_SIZE_SHIFT   8
+#define   XY_CTRL_SURF_MOCS_SHIFT  25
+#define   NUM_CCS_BYTES_PER_BLOCK  256
+#define   NUM_CCS_BLKS_PER_XFER1024
+#define   INDIRECT_ACCESS  0
+#define   DIRECT_ACCESS1
+#define  MI_FLUSH_LLC  BIT(9)
+#define  MI_FLUSH_CCS  BIT(16)
+
  #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - 2))
  #define XY_COLOR_BLT_CMD  (2 << 29 | 0x50 << 22)
  #define SRC_COPY_BLT_CMD  (2 << 29 | 0x43 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c 
b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 19a01878fee3..64ffaacac1e0 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -16,6 +16,7 @@ struct insert_pte_data {
  };
  
  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */

+#define GET_CCS_SIZE(i915, size)   (HAS_FLAT_CCS(i915) ? (size) >> 8 : 0)


do the rounding here. Don't manually round, use kernel macros:

-#define GET_CCS_SIZE(i915, size)   (HAS_FLAT_CCS(i915) ? (size) >> 
8 : 0)


+#define GET_CCS_SIZE(i915, size)   (HAS_FLAT_CCS(i915) ? 
DIV_ROUND_UP(size, NUM_CCS_BYTES_PER_BLOCK ) : 0)



  
  static bool engine_supports_migration(struct intel_engine_cs *engine)

  {
@@ -488,15 +489,104 @@ intel_context_migrate_copy(struct intel_context *ce,
return err;
  }
  
-static int emit_clear(struct i915_request *rq, int size, u32 value)

+static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
+{
+   /* Mask the 3 LSB to use the PPGTT address space */
+   *cmd++ = MI_FLUSH_DW | flags;
+   *cmd++ = lower_32_bits(dst);
+   *cmd++ = upper_32_bits(dst);
+
+   return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
+{
+   u32 num_cmds, num_blks, total_size;
+
+   if (!GET_CCS_SIZE(i915, size))
+   return 0;
+
+   /*
+* XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
+* blocks. one XY_CTRL_SURF_COPY_BLT command can
+* trnasfer upto 1024 blocks.
+*/
+   num_blks = (GET_CCS_SIZE(i915, size) +
+  (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;


-   num_blks = (GET_CCS_SIZE(i915, size) +

-  (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;

+   num_blks = GET_CCS_SIZE(i915, size);




+   num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
+   total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
+
+   /*
+* We need to add a flush before and after
+* XY_CTRL_SURF_COPY_BLT
+*/
+   total_size += 2 * MI_FLUSH_DW_SIZE;
+   return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+u8 src_mem_access, u8 dst_mem_access,
+int src_mocs, int dst_mocs,
+u16 num_ccs_blocks)
+{
+   int i = num_ccs_blocks;
+
+   /*
+* The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+* data in and out of the CCS region.
+*
+* We can copy at most 1024 blocks of 256 bytes using one
+* XY_CTRL_SURF_COPY_BLT instruction.
+*
+* In case we need to copy more than 1024 blocks, we need to add
+* another instruction to the same batch buffer.
+*
+* 1024 bl

[Intel-gfx] [PATCH 0/4] dicsrete card 64K page support

2022-01-11 Thread Robert Beckett
This series continues support for 64K pages for discrete cards.
It supersedes the 64K patches from 
https://patchwork.freedesktop.org/series/95686/#rev4
Changes since that series:

- set min alignment for DG2 to 2MB in i915_address_space_init
- replace coloring with simpler 2MB VA alignment for lmem buffers
- enforce alignment to 2MB for lmem objects on DG2 in i915_vma_insert
- expand vma reservation to round up to 2MB on DG2 in i915_vma_insert
- add alignment test


Matthew Auld (3):
  drm/i915: enforce min GTT alignment for discrete cards
  drm/i915: support 64K GTT pages for discrete cards
  drm/i915/uapi: document behaviour for DG2 64K support

Robert Beckett (1):
  drm/i915: add gtt misalignment test

 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 +
 .../i915/gem/selftests/i915_gem_client_blt.c  |  23 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 109 -
 drivers/gpu/drm/i915/gt/intel_gtt.c   |  14 ++
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  12 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |   1 +
 drivers/gpu/drm/i915/i915_vma.c   |  14 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 226 +++---
 include/uapi/drm/i915_drm.h   |  44 +++-
 9 files changed, 454 insertions(+), 49 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH 1/4] drm/i915: enforce min GTT alignment for discrete cards

2022-01-11 Thread Robert Beckett
From: Matthew Auld 

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

For DG2 we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.

Signed-off-by: Matthew Auld 
Signed-off-by: Ramalingam C 
Signed-off-by: Robert Beckett 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
 drivers/gpu/drm/i915/gt/intel_gtt.c   | 14 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  9 ++
 drivers/gpu/drm/i915/i915_vma.c   | 14 +++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ---
 5 files changed, 115 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index c08f766e6e15..7fee95a65414 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
struct blit_buffer scratch;
struct i915_vma *batch;
u64 hole;
+   u64 align;
u32 width;
u32 height;
 };
@@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct 
rnd_state *prng)
goto err_free;
}
 
-   hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+   t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
+   t->align = max(t->align,
+  i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
+   t->align = max(t->align,
+  i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+   hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
hole_size *= 2; /* room to maneuver */
-   hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+   hole_size += 2 * t->align; /* padding on either side */
 
mutex_lock(&t->ce->vm->mutex);
memset(&hole, 0, sizeof(hole));
err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
- hole_size, 0, I915_COLOR_UNEVICTABLE,
+ hole_size, t->align,
+ I915_COLOR_UNEVICTABLE,
  0, U64_MAX,
  DRM_MM_INSERT_BEST);
if (!err)
@@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct 
rnd_state *prng)
goto err_put;
}
 
-   t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+   t->hole = hole.start + t->align;
pr_info("Using hole at %llx\n", t->hole);
 
err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
   struct rnd_state *prng)
 {
-   u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+   u64 offset = round_up(t->width * t->height * 4, t->align);
u32 *map;
int err;
int i;
@@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-   u64 offset =
-   round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+   u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
int err;
 
/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct 
rnd_state *prng)
 
/* Reposition so that we overlap the old addresses, and slightly off */
err = tiled_blit(t,
-&t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+&t->buffers[2], t->hole + t->align,
 &t->buffers[1], t->hole + 3 * offset / 2);
if (err)
return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index a94be0306464..156852dcf33a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -219,6 +219,20 @@ void i915_address_space_init(struct i915_address_space 
*vm, int subclass)
 
GEM_BUG_ON(!vm->total);
drm_mm_init(&vm->mm, 0, vm->total);
+
+   memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+ARRAY_SIZE(vm->min_alignment));
+
+   if (HAS_64K_PAGES(vm->i915)) {
+   if (IS_DG2(vm->i915

[Intel-gfx] [PATCH 2/4] drm/i915: support 64K GTT pages for discrete cards

2022-01-11 Thread Robert Beckett
From: Matthew Auld 

discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.

Signed-off-by: Matthew Auld 
Signed-off-by: Stuart Summers 
Signed-off-by: Ramalingam C 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 109 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h   |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |   1 +
 4 files changed, 170 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 11f0aa65f8a3..ef3439b290ca 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1483,6 +1483,65 @@ static int igt_ppgtt_sanity_check(void *arg)
return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+   struct drm_i915_private *i915 = arg;
+   struct drm_i915_gem_object *obj;
+   int err;
+
+   /*
+* Simple test to catch issues with compact 64K pages -- since the pt is
+* compacted to 256B that gives us 32 entries per pt, however since the
+* backing page for the pt is 4K, any extra entries we might incorrectly
+* write out should be ignored by the HW. If ever hit such a case this
+* test should catch it since some of our writes would land in scratch.
+*/
+
+   if (!HAS_64K_PAGES(i915)) {
+   pr_info("device lacks compact 64K page support, skipping\n");
+   return 0;
+   }
+
+   if (!HAS_LMEM(i915)) {
+   pr_info("device lacks LMEM support, skipping\n");
+   return 0;
+   }
+
+   /* We want the range to cover multiple page-table boundaries. */
+   obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+   if (IS_ERR(obj))
+   return err;
+
+   err = i915_gem_object_pin_pages_unlocked(obj);
+   if (err)
+   goto out_put;
+
+   if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+   pr_info("LMEM compact unable to allocate huge-page(s)\n");
+   goto out_unpin;
+   }
+
+   /*
+* Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+* insertion.
+*/
+   obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+   err = igt_write_huge(i915, obj);
+   if (err)
+   pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+   i915_gem_object_unpin_pages(obj);
+out_put:
+   i915_gem_object_put(obj);
+
+   if (err == -ENOMEM)
+   err = 0;
+
+   return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
struct drm_i915_private *i915 = arg;
@@ -1740,6 +1799,7 @@ int i915_gem_huge_page_live_selftests(struct 
drm_i915_private *i915)
SUBTEST(igt_tmpfs_fallback),
SUBTEST(igt_ppgtt_smoke_huge),
SUBTEST(igt_ppgtt_sanity_check),
+   SUBTEST(igt_ppgtt_compact),
};
 
if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index b012c50f7ce7..8d081497e87e 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * 
const vm,
   start, end, lvl);
} else {
unsigned int count;
+   unsigned int pte = gen8_pd_index(start, 0);
+   unsigned int num_ptes;
u64 *vaddr;
 
count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * 
const vm,
atomic_read(&pt->used));
GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+   num_ptes = count;
+   if (pt->is_compact) {
+   GEM_BUG_ON(num_ptes % 16);
+   GEM_BUG_ON(pte % 16);
+   num_ptes /= 16;
+   pte /= 16;
+   }
+
vaddr = px_vaddr(pt);
-   memset64(vaddr + gen8_pd_index(start, 0),
+   memset64(vaddr + pte,
 vm->scratch[0]->encode,
-count);
+num_ptes);
 
atomic_sub(count, &pt->used);
start += count;
@@ -453,6 +463,96 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,

[Intel-gfx] [PATCH 3/4] drm/i915: add gtt misalignment test

2022-01-11 Thread Robert Beckett
add test to check handling of misaligned offsets and sizes

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 130 ++
 1 file changed, 130 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fea031b4ec4f..28de0b333835 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -22,10 +22,12 @@
  *
  */
 
+#include "gt/intel_gtt.h"
 #include 
 #include 
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_region.h"
 #include "gem/selftests/mock_context.h"
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
@@ -1066,6 +1068,120 @@ static int shrink_boom(struct i915_address_space *vm,
return err;
 }
 
+static int misaligned_case(struct i915_address_space *vm, struct 
intel_memory_region *mr,
+  u64 addr, u64 size, unsigned long flags)
+{
+   struct drm_i915_gem_object *obj;
+   struct i915_vma *vma;
+   int err = 0;
+   u64 expected_vma_size, expected_node_size;
+
+   obj = i915_gem_object_create_region(mr, size, 0, 0);
+   if (IS_ERR(obj))
+   return PTR_ERR(obj);
+
+   vma = i915_vma_instance(obj, vm, NULL);
+   if (IS_ERR(vma)) {
+   err = PTR_ERR(vma);
+   goto err_put;
+   }
+
+   err = i915_vma_pin(vma, 0, 0, addr | flags);
+   if (err)
+   goto err_put;
+   i915_vma_unpin(vma);
+
+   if (!drm_mm_node_allocated(&vma->node)) {
+   err = -EINVAL;
+   goto err_put;
+   }
+
+   if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
+   err = -EINVAL;
+   goto err_put;
+   }
+
+   expected_vma_size = round_up(size, 1 << (ffs(vma->page_sizes.gtt) - 1));
+   expected_node_size = expected_vma_size;
+
+   if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) {
+   /* dg2 should expand lmem node to 2MB */
+   expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K);
+   expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+   }
+
+   if (vma->size != expected_vma_size || vma->node.size != 
expected_node_size) {
+   err = i915_vma_unbind(vma);
+   err = -EBADSLT;
+   goto err_put;
+   }
+
+   err = i915_vma_unbind(vma);
+   if (err)
+   goto err_put;
+
+   GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+
+err_put:
+   i915_gem_object_put(obj);
+   cleanup_freed_objects(vm->i915);
+   return err;
+}
+
+static int misaligned_pin(struct i915_address_space *vm,
+ u64 hole_start, u64 hole_end,
+ unsigned long end_time)
+{
+   struct intel_memory_region *mr;
+   enum intel_region_id id;
+   unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+   int err = 0;
+   u64 hole_size = hole_end - hole_start;
+
+   if (i915_is_ggtt(vm))
+   flags |= PIN_GLOBAL;
+
+   for_each_memory_region(mr, vm->i915, id) {
+   u64 min_alignment = i915_vm_min_alignment(vm, id);
+   u64 size = min_alignment;
+   u64 addr = round_up(hole_start + (hole_size / 2), 
min_alignment);
+
+   /* we can't test < 4k alignment due to flags being encoded in 
lower bits */
+   if (min_alignment != I915_GTT_PAGE_SIZE_4K) {
+   err = misaligned_case(vm, mr, addr + (min_alignment / 
2), size, flags);
+   /* misaligned should error with -EINVAL*/
+   if (!err)
+   err = -EBADSLT;
+   if (err != -EINVAL)
+   return err;
+   }
+
+   /* test for vma->size expansion to min page size */
+   err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags);
+   if (min_alignment > hole_size) {
+   if (!err)
+   err = -EBADSLT;
+   else if (err == -ENOSPC)
+   err = 0;
+   }
+   if (err)
+   return err;
+
+   /* test for intermediate size not expanding vma->size for large 
alignments */
+   err = misaligned_case(vm, mr, addr, size / 2, flags);
+   if (min_alignment > hole_size) {
+   if (!err)
+   err = -EBADSLT;
+   else if (err == -ENOSPC)
+   err = 0;
+   }
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
+
 static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 

[Intel-gfx] [PATCH 4/4] drm/i915/uapi: document behaviour for DG2 64K support

2022-01-11 Thread Robert Beckett
From: Matthew Auld 

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld 
Signed-off-by: Ramalingam C 
Signed-off-by: Robert Beckett 
cc: Simon Ser 
cc: Pekka Paalanen 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-...@lists.freedesktop.org
Cc: Tony Ye 
Cc: Slawomir Milczarek 
---
 include/uapi/drm/i915_drm.h | 44 -
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5e678917da70..486b7b96291e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
/**
 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 * the user with the GTT offset at which this object will be pinned.
+*
 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 * presumed_offset of the object.
+*
 * During execbuffer2 the kernel populates it with the value of the
 * current GTT offset of the object, for future presumed_offset writes.
+*
+* See struct drm_i915_gem_create_ext for the rules when dealing with
+* alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+* minimum page sizes, like DG2.
 */
__u64 offset;
 
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
 *
 * The (page-aligned) allocated size for the object will be returned.
 *
-* Note that for some devices we have might have further minimum
-* page-size restrictions(larger than 4K), like for device local-memory.
-* However in general the final size here should always reflect any
-* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
-* extension to place the object in device local-memory.
+*
+* **DG2 64K min page size implications:**
+*
+* On discrete platforms, starting from DG2, we have to contend with GTT
+* page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+* objects.  Specifically the hardware only supports 64K or larger GTT
+* page sizes for such memory. The kernel will already ensure that all
+* I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+* sizes underneath.
+*
+* Note that the returned size here will always reflect any required
+* rounding up done by the kernel, i.e 4K will now become 64K on devices
+* such as DG2.
+*
+* **Special DG2 GTT address alignment requirement:**
+*
+* The GTT alignment will also need be at least 2M for  such objects.
+*
+* Note that due to how the hardware implements 64K GTT page support, we
+* have some further complications:
+*
+*   1) The entire PDE(which covers a 2MB virtual address range), must
+*   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
+*   PDE is forbidden by the hardware.
+*
+*   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+*   objects.
+*
+* To keep things simple for userland, we mandate that any GTT mappings
+* must be aligned to and rounded up to 2MB. As this only wastes virtual
+* address space and avoids userland having to copy any needlessly
+* complicated PDE sharing scheme (coloring) and only affects GD2, this
+* id deemed to be a good compromise.
 */
__u64 size;
/**
-- 
2.25.1



Re: [Intel-gfx] [PATCH v7 4/6] drm/i915: Use vma resources for async unbinding

2022-01-17 Thread Robert Beckett




On 10/01/2022 17:22, Thomas Hellström wrote:

Implement async (non-blocking) unbinding by not syncing the vma before
calling unbind on the vma_resource.
Add the resulting unbind fence to the object's dma_resv from where it is
picked up by the ttm migration code.
Ideally these unbind fences should be coalesced with the migration blit
fence to avoid stalling the migration blit waiting for unbind, as they
can certainly go on in parallel, but since we don't yet have a
reasonable data structure to use to coalesce fences and attach the
resulting fence to a timeline, we defer that for now.

Note that with async unbinding, even while the unbind waits for the
preceding bind to complete before unbinding, the vma itself might have been
destroyed in the process, clearing the vma pages. Therefore we can
only allow async unbinding if we have a refcounted sg-list and keep a
refcount on that for the vma resource pages to stay intact until
binding occurs. If this condition is not met, a request for an async
unbind is diverted to a sync unbind.

v2:
- Use a separate kmem_cache for vma resources for now to isolate their
   memory allocation and aid debugging.
- Move the check for vm closed to the actual unbinding thread. Regardless
   of whether the vm is closed, we need the unbind fence to properly wait
   for capture.
- Clear vma_res::vm on unbind and update its documentation.
v4:
- Take cache coloring into account when searching for vma resources
   pending unbind. (Matthew Auld)
v5:
- Fix timeout and error check in i915_vma_resource_bind_dep_await().
- Avoid taking a reference on the object for async binding if
   async unbind capable.
- Fix braces around a single-line if statement.
v6:
- Fix up the cache coloring adjustment. (Kernel test robot )
- Don't allow async unbinding if the vma_res pages are not the same as
   the object pages. (Matthew Auld)
v7:
- s/unsigned long/u64/ in a number of places (Matthew Auld)

Signed-off-by: Thomas Hellström 
Reviewed-by: Matthew Auld 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  11 +-
  drivers/gpu/drm/i915/gt/intel_ggtt.c |   2 +-
  drivers/gpu/drm/i915/gt/intel_gtt.c  |   4 +
  drivers/gpu/drm/i915/gt/intel_gtt.h  |   3 +
  drivers/gpu/drm/i915/i915_drv.h  |   1 +
  drivers/gpu/drm/i915/i915_gem.c  |  12 +-
  drivers/gpu/drm/i915/i915_module.c   |   3 +
  drivers/gpu/drm/i915/i915_vma.c  | 205 +--
  drivers/gpu/drm/i915/i915_vma.h  |   3 +-
  drivers/gpu/drm/i915/i915_vma_resource.c | 354 +--
  drivers/gpu/drm/i915/i915_vma_resource.h |  48 +++
  11 files changed, 579 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 8653855d808b..1de306c03aaf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -142,7 +142,16 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo)
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
int ret;
  
-	ret = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE);

+   /*
+* Note: The async unbinding here will actually transform the
+* blocking wait for unbind into a wait before finally submitting
+* evict / migration blit and thus stall the migration timeline
+* which may not be good for overall throughput. We should make
+* sure we await the unbind fences *after* the migration blit
+* instead of *before* as we currently do.
+*/
+   ret = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE |
+I915_GEM_OBJECT_UNBIND_ASYNC);
if (ret)
return ret;
  
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c

index e49b6250c4b7..a1b2761bc16e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -142,7 +142,7 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)
continue;
  
  		if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) {

-   __i915_vma_evict(vma);
+   __i915_vma_evict(vma, false);
drm_mm_remove_node(&vma->node);
}
}
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index a94be0306464..46be4197b93f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -161,6 +161,9 @@ static void __i915_vm_release(struct work_struct *work)
struct i915_address_space *vm =
container_of(work, struct i915_address_space, release_work);
  
+	/* Synchronize async unbinds. */

+   i915_vma_resource_bind_dep_sync_all(vm);
+
vm->cleanup(vm);
i915_address_space_fini(vm);
  
@@ -189,6 +192,7 @@ void i915_address_space_init(struct i915_address_sp

[Intel-gfx] [PATCH v2 0/4] discsrete card 64K page support

2022-01-18 Thread Robert Beckett
This series continues support for 64K pages for discrete cards.
It supersedes the 64K patches from 
https://patchwork.freedesktop.org/series/95686/#rev4
Changes since that series:

- set min alignment for DG2 to 2MB in i915_address_space_init
- replace coloring with simpler 2MB VA alignment for lmem buffers
- enforce alignment to 2MB for lmem objects on DG2 in i915_vma_insert
- expand vma reservation to round up to 2MB on DG2 in i915_vma_insert
- add alignment test

v2: rebase and fix for async vma that landed

Matthew Auld (3):
  drm/i915: enforce min GTT alignment for discrete cards
  drm/i915: support 64K GTT pages for discrete cards
  drm/i915/uapi: document behaviour for DG2 64K support

Robert Beckett (1):
  drm/i915: add gtt misalignment test

 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 +
 .../i915/gem/selftests/i915_gem_client_blt.c  |  23 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 108 -
 drivers/gpu/drm/i915/gt/intel_gtt.c   |  14 ++
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  12 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |   1 +
 drivers/gpu/drm/i915/i915_vma.c   |  14 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 226 +++---
 include/uapi/drm/i915_drm.h   |  44 +++-
 9 files changed, 453 insertions(+), 49 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards

2022-01-18 Thread Robert Beckett
From: Matthew Auld 

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

For DG2 we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.

Signed-off-by: Matthew Auld 
Signed-off-by: Ramalingam C 
Signed-off-by: Robert Beckett 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
 drivers/gpu/drm/i915/gt/intel_gtt.c   | 14 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  9 ++
 drivers/gpu/drm/i915/i915_vma.c   | 14 +++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ---
 5 files changed, 115 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index c08f766e6e15..7fee95a65414 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
struct blit_buffer scratch;
struct i915_vma *batch;
u64 hole;
+   u64 align;
u32 width;
u32 height;
 };
@@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct 
rnd_state *prng)
goto err_free;
}
 
-   hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+   t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
+   t->align = max(t->align,
+  i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
+   t->align = max(t->align,
+  i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+   hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
hole_size *= 2; /* room to maneuver */
-   hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+   hole_size += 2 * t->align; /* padding on either side */
 
mutex_lock(&t->ce->vm->mutex);
memset(&hole, 0, sizeof(hole));
err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
- hole_size, 0, I915_COLOR_UNEVICTABLE,
+ hole_size, t->align,
+ I915_COLOR_UNEVICTABLE,
  0, U64_MAX,
  DRM_MM_INSERT_BEST);
if (!err)
@@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct 
rnd_state *prng)
goto err_put;
}
 
-   t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+   t->hole = hole.start + t->align;
pr_info("Using hole at %llx\n", t->hole);
 
err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
   struct rnd_state *prng)
 {
-   u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+   u64 offset = round_up(t->width * t->height * 4, t->align);
u32 *map;
int err;
int i;
@@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-   u64 offset =
-   round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+   u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
int err;
 
/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct 
rnd_state *prng)
 
/* Reposition so that we overlap the old addresses, and slightly off */
err = tiled_blit(t,
-&t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+&t->buffers[2], t->hole + t->align,
 &t->buffers[1], t->hole + 3 * offset / 2);
if (err)
return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 46be4197b93f..7c92b25c0f26 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space 
*vm, int subclass)
 
GEM_BUG_ON(!vm->total);
drm_mm_init(&vm->mm, 0, vm->total);
+
+   memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+ARRAY_SIZE(vm->min_alignment));
+
+   if (HAS_64K_PAGES(vm->i915)) {
+   if (IS_DG2(vm->i915

[Intel-gfx] [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support

2022-01-18 Thread Robert Beckett
From: Matthew Auld 

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld 
Signed-off-by: Ramalingam C 
Signed-off-by: Robert Beckett 
cc: Simon Ser 
cc: Pekka Paalanen 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-...@lists.freedesktop.org
Cc: Tony Ye 
Cc: Slawomir Milczarek 
---
 include/uapi/drm/i915_drm.h | 44 -
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5e678917da70..486b7b96291e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
/**
 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 * the user with the GTT offset at which this object will be pinned.
+*
 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 * presumed_offset of the object.
+*
 * During execbuffer2 the kernel populates it with the value of the
 * current GTT offset of the object, for future presumed_offset writes.
+*
+* See struct drm_i915_gem_create_ext for the rules when dealing with
+* alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+* minimum page sizes, like DG2.
 */
__u64 offset;
 
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
 *
 * The (page-aligned) allocated size for the object will be returned.
 *
-* Note that for some devices we have might have further minimum
-* page-size restrictions(larger than 4K), like for device local-memory.
-* However in general the final size here should always reflect any
-* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
-* extension to place the object in device local-memory.
+*
+* **DG2 64K min page size implications:**
+*
+* On discrete platforms, starting from DG2, we have to contend with GTT
+* page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+* objects.  Specifically the hardware only supports 64K or larger GTT
+* page sizes for such memory. The kernel will already ensure that all
+* I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+* sizes underneath.
+*
+* Note that the returned size here will always reflect any required
+* rounding up done by the kernel, i.e 4K will now become 64K on devices
+* such as DG2.
+*
+* **Special DG2 GTT address alignment requirement:**
+*
+* The GTT alignment will also need be at least 2M for  such objects.
+*
+* Note that due to how the hardware implements 64K GTT page support, we
+* have some further complications:
+*
+*   1) The entire PDE(which covers a 2MB virtual address range), must
+*   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
+*   PDE is forbidden by the hardware.
+*
+*   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+*   objects.
+*
+* To keep things simple for userland, we mandate that any GTT mappings
+* must be aligned to and rounded up to 2MB. As this only wastes virtual
+* address space and avoids userland having to copy any needlessly
+* complicated PDE sharing scheme (coloring) and only affects GD2, this
+* id deemed to be a good compromise.
 */
__u64 size;
/**
-- 
2.25.1



[Intel-gfx] [PATCH v2 2/4] drm/i915: support 64K GTT pages for discrete cards

2022-01-18 Thread Robert Beckett
From: Matthew Auld 

discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.

Signed-off-by: Matthew Auld 
Signed-off-by: Stuart Summers 
Signed-off-by: Ramalingam C 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 108 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h   |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |   1 +
 4 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 26f997c376a2..7efa6a598b03 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg)
return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+   struct drm_i915_private *i915 = arg;
+   struct drm_i915_gem_object *obj;
+   int err;
+
+   /*
+* Simple test to catch issues with compact 64K pages -- since the pt is
+* compacted to 256B that gives us 32 entries per pt, however since the
+* backing page for the pt is 4K, any extra entries we might incorrectly
+* write out should be ignored by the HW. If ever hit such a case this
+* test should catch it since some of our writes would land in scratch.
+*/
+
+   if (!HAS_64K_PAGES(i915)) {
+   pr_info("device lacks compact 64K page support, skipping\n");
+   return 0;
+   }
+
+   if (!HAS_LMEM(i915)) {
+   pr_info("device lacks LMEM support, skipping\n");
+   return 0;
+   }
+
+   /* We want the range to cover multiple page-table boundaries. */
+   obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+   if (IS_ERR(obj))
+   return err;
+
+   err = i915_gem_object_pin_pages_unlocked(obj);
+   if (err)
+   goto out_put;
+
+   if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+   pr_info("LMEM compact unable to allocate huge-page(s)\n");
+   goto out_unpin;
+   }
+
+   /*
+* Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+* insertion.
+*/
+   obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+   err = igt_write_huge(i915, obj);
+   if (err)
+   pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+   i915_gem_object_unpin_pages(obj);
+out_put:
+   i915_gem_object_put(obj);
+
+   if (err == -ENOMEM)
+   err = 0;
+
+   return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
struct drm_i915_private *i915 = arg;
@@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct 
drm_i915_private *i915)
SUBTEST(igt_tmpfs_fallback),
SUBTEST(igt_ppgtt_smoke_huge),
SUBTEST(igt_ppgtt_sanity_check),
+   SUBTEST(igt_ppgtt_compact),
};
 
if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index c43e724afa9f..62471730266c 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * 
const vm,
   start, end, lvl);
} else {
unsigned int count;
+   unsigned int pte = gen8_pd_index(start, 0);
+   unsigned int num_ptes;
u64 *vaddr;
 
count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * 
const vm,
atomic_read(&pt->used));
GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+   num_ptes = count;
+   if (pt->is_compact) {
+   GEM_BUG_ON(num_ptes % 16);
+   GEM_BUG_ON(pte % 16);
+   num_ptes /= 16;
+   pte /= 16;
+   }
+
vaddr = px_vaddr(pt);
-   memset64(vaddr + gen8_pd_index(start, 0),
+   memset64(vaddr + pte,
 vm->scratch[0]->encode,
-count);
+num_ptes);
 
atomic_sub(count, &pt->used);
start += count;
@@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,

[Intel-gfx] [PATCH v2 3/4] drm/i915: add gtt misalignment test

2022-01-18 Thread Robert Beckett
add test to check handling of misaligned offsets and sizes

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 130 ++
 1 file changed, 130 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 2f3f0c01786b..76696a5e547e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -22,10 +22,12 @@
  *
  */
 
+#include "gt/intel_gtt.h"
 #include 
 #include 
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_region.h"
 #include "gem/selftests/mock_context.h"
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
@@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm,
return err;
 }
 
+static int misaligned_case(struct i915_address_space *vm, struct 
intel_memory_region *mr,
+  u64 addr, u64 size, unsigned long flags)
+{
+   struct drm_i915_gem_object *obj;
+   struct i915_vma *vma;
+   int err = 0;
+   u64 expected_vma_size, expected_node_size;
+
+   obj = i915_gem_object_create_region(mr, size, 0, 0);
+   if (IS_ERR(obj))
+   return PTR_ERR(obj);
+
+   vma = i915_vma_instance(obj, vm, NULL);
+   if (IS_ERR(vma)) {
+   err = PTR_ERR(vma);
+   goto err_put;
+   }
+
+   err = i915_vma_pin(vma, 0, 0, addr | flags);
+   if (err)
+   goto err_put;
+   i915_vma_unpin(vma);
+
+   if (!drm_mm_node_allocated(&vma->node)) {
+   err = -EINVAL;
+   goto err_put;
+   }
+
+   if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
+   err = -EINVAL;
+   goto err_put;
+   }
+
+   expected_vma_size = round_up(size, 1 << 
(ffs(vma->resource->page_sizes_gtt) - 1));
+   expected_node_size = expected_vma_size;
+
+   if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) {
+   /* dg2 should expand lmem node to 2MB */
+   expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K);
+   expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+   }
+
+   if (vma->size != expected_vma_size || vma->node.size != 
expected_node_size) {
+   err = i915_vma_unbind(vma);
+   err = -EBADSLT;
+   goto err_put;
+   }
+
+   err = i915_vma_unbind(vma);
+   if (err)
+   goto err_put;
+
+   GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+
+err_put:
+   i915_gem_object_put(obj);
+   cleanup_freed_objects(vm->i915);
+   return err;
+}
+
+static int misaligned_pin(struct i915_address_space *vm,
+ u64 hole_start, u64 hole_end,
+ unsigned long end_time)
+{
+   struct intel_memory_region *mr;
+   enum intel_region_id id;
+   unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+   int err = 0;
+   u64 hole_size = hole_end - hole_start;
+
+   if (i915_is_ggtt(vm))
+   flags |= PIN_GLOBAL;
+
+   for_each_memory_region(mr, vm->i915, id) {
+   u64 min_alignment = i915_vm_min_alignment(vm, id);
+   u64 size = min_alignment;
+   u64 addr = round_up(hole_start + (hole_size / 2), 
min_alignment);
+
+   /* we can't test < 4k alignment due to flags being encoded in 
lower bits */
+   if (min_alignment != I915_GTT_PAGE_SIZE_4K) {
+   err = misaligned_case(vm, mr, addr + (min_alignment / 
2), size, flags);
+   /* misaligned should error with -EINVAL*/
+   if (!err)
+   err = -EBADSLT;
+   if (err != -EINVAL)
+   return err;
+   }
+
+   /* test for vma->size expansion to min page size */
+   err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags);
+   if (min_alignment > hole_size) {
+   if (!err)
+   err = -EBADSLT;
+   else if (err == -ENOSPC)
+   err = 0;
+   }
+   if (err)
+   return err;
+
+   /* test for intermediate size not expanding vma->size for large 
alignments */
+   err = misaligned_case(vm, mr, addr, size / 2, flags);
+   if (min_alignment > hole_size) {
+   if (!err)
+   err = -EBADSLT;
+   else if (err == -ENOSPC)
+   err = 0;
+   }
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
+
 static int exercise_ppgtt(struct drm_i915_private *dev_pri

Re: [Intel-gfx] [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support

2022-01-19 Thread Robert Beckett




On 19/01/2022 18:36, Jordan Justen wrote:

Robert Beckett  writes:


From: Matthew Auld 

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld 
Signed-off-by: Ramalingam C 
Signed-off-by: Robert Beckett 
cc: Simon Ser 
cc: Pekka Paalanen 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-...@lists.freedesktop.org
Cc: Tony Ye 
Cc: Slawomir Milczarek 
---
  include/uapi/drm/i915_drm.h | 44 -
  1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5e678917da70..486b7b96291e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
/**
 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 * the user with the GTT offset at which this object will be pinned.
+*
 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 * presumed_offset of the object.
+*
 * During execbuffer2 the kernel populates it with the value of the
 * current GTT offset of the object, for future presumed_offset writes.
+*
+* See struct drm_i915_gem_create_ext for the rules when dealing with
+* alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+* minimum page sizes, like DG2.
 */
__u64 offset;
  
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {

 *
 * The (page-aligned) allocated size for the object will be returned.
 *
-* Note that for some devices we have might have further minimum
-* page-size restrictions(larger than 4K), like for device local-memory.
-* However in general the final size here should always reflect any
-* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
-* extension to place the object in device local-memory.
+*
+* **DG2 64K min page size implications:**


Long term, I'm not sure that the "**" (for emphasis) is needed here or
below. It's interesting at the moment, but will be just another thing
baked into the kernel/user code in a month from now. :)


fair point, I'll make it less shouty




+*
+* On discrete platforms, starting from DG2, we have to contend with GTT
+* page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+* objects.  Specifically the hardware only supports 64K or larger GTT
+* page sizes for such memory. The kernel will already ensure that all
+* I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+* sizes underneath.
+*
+* Note that the returned size here will always reflect any required
+* rounding up done by the kernel, i.e 4K will now become 64K on devices
+* such as DG2.
+*
+* **Special DG2 GTT address alignment requirement:**
+*
+* The GTT alignment will also need be at least 2M for  such objects.
+*
+* Note that due to how the hardware implements 64K GTT page support, we
+* have some further complications:
+*
+*   1) The entire PDE(which covers a 2MB virtual address range), must
+*   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
+*   PDE is forbidden by the hardware.
+*
+*   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+*   objects.
+*
+* To keep things simple for userland, we mandate that any GTT mappings
+* must be aligned to and rounded up to 2MB. As this only wastes virtual
+* address space and avoids userland having to copy any needlessly
+* complicated PDE sharing scheme (coloring) and only affects GD2, this
+* id deemed to be a good compromise.


typos: GD2, id


thanks



Isn't much of this more relavent to the vma offset at exec time? Is
there actually any new restriction on the size field during buffer
creation?


No new restriction on size, just placement, which mesa is already doing.
The request for ack was just to get an ack from mesa folks that they are 
happy with the mandatory 2MB alignment for DG2 vma.




I see Matthew references these notes from the offset comments, so if the
kernel devs prefer it here, then you can add my Acked-by on this patch.


thanks!



-Jordan


 */
__u64 size;
/**
--
2.25.1


[Intel-gfx] [RFC PATCH 1/7] drm/i915: instantiate ttm ranger manager for stolen memory

2022-03-15 Thread Robert Beckett
Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 29 +++--
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 737ef3f4ab54..bb564b830c96 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -57,11 +57,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return TTM_PL_VRAM;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -85,10 +91,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == TTM_PL_VRAM) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -108,6 +120,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -138,8 +151,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == TTM_PL_VRAM)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1



[Intel-gfx] [RFC PATCH 0/7] drm/i915: ttm for stolen

2022-03-15 Thread Robert Beckett
Refactor stolen gem backend to use ttm.

While this series is finished off to handle CI issues, I would
appreciate a design review.
In particulare any opinions on the following would be appreciated:

1. display fbc code using gem objects instead of drm_mm_node. The intent
is rely on memory region as interface, instead of relying on knowledge
of internals. This way ttm can be used in place of original stolen
region without issue.

2. checking if a region has anything alloceted within a range. Instead
of relying on internal access to the stolen region's drm_mm, add an
interface to check if the range is busy that can work with any backend
if implemetned.

3. Instead of region busy checking which is currently only used in
testing, would we prefer a more general interface that could potentially
be used for other infrastructure? e.g. for_each with callback over
each resource/buffer within the range.

Robert Beckett (7):
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: add ability to create memory region object in place
  drm/i915: use gem objects to track stolen nodes
  drm/i915: stolen memory use ttm backend
  drm/ttm: add range busy check for range manager
  drm/i915: add range busy check for ttm region
  drm/i915: cleanup old stolen state

 drivers/gpu/drm/i915/display/intel_fbc.c   |  76 +++--
 drivers/gpu/drm/i915/gem/i915_gem_region.c |  55 
 drivers/gpu/drm/i915/gem/i915_gem_region.h |   6 +
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 351 +++--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c|  84 -
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h|   7 +
 drivers/gpu/drm/i915/gt/selftest_reset.c   |  16 +-
 drivers/gpu/drm/i915/i915_drv.h|   5 -
 drivers/gpu/drm/i915/intel_memory_region.h |   6 +
 drivers/gpu/drm/i915/intel_region_ttm.c|  48 ++-
 drivers/gpu/drm/i915/intel_region_ttm.h|   3 +
 drivers/gpu/drm/ttm/ttm_range_manager.c|  21 ++
 include/drm/ttm/ttm_range_manager.h|   3 +
 14 files changed, 306 insertions(+), 391 deletions(-)

-- 
2.25.1



[Intel-gfx] [RFC PATCH 2/7] drm/i915: add ability to create memory region object in place

2022-03-15 Thread Robert Beckett
Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_region.c | 55 ++
 drivers/gpu/drm/i915/gem/i915_gem_region.h |  6 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 84 ++
 drivers/gpu/drm/i915/intel_memory_region.h |  6 ++
 4 files changed, 136 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c 
b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index c9b2e8b91053..e25ad0b9b636 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -98,6 +98,61 @@ i915_gem_object_create_region(struct intel_memory_region 
*mem,
return ERR_PTR(err);
 }
 
+struct drm_i915_gem_object *
+i915_gem_object_create_region_in_place(struct intel_memory_region *mem,
+  resource_size_t size,
+  resource_size_t page_size,
+  unsigned int flags,
+  u64 start, u64 end)
+{
+   struct drm_i915_gem_object *obj;
+   resource_size_t default_page_size;
+   int err;
+
+   /*
+* NB: Our use of resource_size_t for the size stems from using struct
+* resource for the mem->region. We might need to revisit this in the
+* future.
+*/
+
+   GEM_BUG_ON(flags & ~I915_BO_ALLOC_FLAGS);
+
+   if (!mem)
+   return ERR_PTR(-ENODEV);
+   if (!mem->ops->init_object_in_place)
+   return ERR_PTR(-EINVAL);
+
+   default_page_size = mem->min_page_size;
+   if (page_size)
+   default_page_size = page_size;
+
+   GEM_BUG_ON(!is_power_of_2_u64(default_page_size));
+   GEM_BUG_ON(default_page_size < PAGE_SIZE);
+
+   size = round_up(size, default_page_size);
+
+   GEM_BUG_ON(!size);
+   GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_MIN_ALIGNMENT));
+
+   if (i915_gem_object_size_2big(size))
+   return ERR_PTR(-E2BIG);
+
+   obj = i915_gem_object_alloc();
+   if (!obj)
+   return ERR_PTR(-ENOMEM);
+
+   err = mem->ops->init_object_in_place(mem, obj, size, page_size, flags, 
start, end);
+   if (err)
+   goto err_object_free;
+
+   trace_i915_gem_object_create(obj);
+   return obj;
+
+err_object_free:
+   i915_gem_object_free(obj);
+   return ERR_PTR(err);
+}
+
 /**
  * i915_gem_process_region - Iterate over all objects of a region using ops
  * to process and optionally skip objects
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.h 
b/drivers/gpu/drm/i915/gem/i915_gem_region.h
index fcaa12d657d4..0cad90ac4a92 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.h
@@ -56,6 +56,12 @@ i915_gem_object_create_region(struct intel_memory_region 
*mem,
  resource_size_t size,
  resource_size_t page_size,
  unsigned int flags);
+struct drm_i915_gem_object *
+i915_gem_object_create_region_in_place(struct intel_memory_region *mem,
+  resource_size_t size,
+  resource_size_t page_size,
+  unsigned int flags,
+  u64 start, u64 end);
 
 int i915_gem_process_region(struct intel_memory_region *mr,
struct i915_gem_apply_to_region *apply);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 45cc5837ce00..35d1bde19267 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1131,20 +1131,12 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
}
 }
 
-/**
- * __i915_gem_ttm_object_init - Initialize a ttm-backed i915 gem object
- * @mem: The initial memory region for the object.
- * @obj: The gem object.
- * @size: Object size in bytes.
- * @flags: gem object flags.
- *
- * Return: 0 on success, negative error code on failure.
- */
-int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
-  struct drm_i915_gem_object *obj,
-  resource_size_t size,
-  resource_size_t page_size,
-  unsigned int flags)
+static int __i915_gem_ttm_object_init_with_placement(struct 
intel_memory_region *mem,
+struct drm_i915_gem_object 
*obj,
+resource_size_t size,
+resource_size_t page_size,
+unsigned int flags,
+struct ttm_placement 
*placement)
 {
static struct lock_class_key lock_class;
struct drm_i915_private *i91

[Intel-gfx] [RFC PATCH 6/7] drm/i915: add range busy check for ttm region

2022-03-15 Thread Robert Beckett
RFC: should this become a generic interface in intel_memory_region_ops?

RFC: would we prefer an different interface? e.g. for_each_obj_in_range

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 19 +++
 drivers/gpu/drm/i915/intel_region_ttm.h |  3 +++
 2 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index bb564b830c96..2ccefa76348f 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -256,3 +256,22 @@ void intel_region_ttm_resource_free(struct 
intel_memory_region *mem,
 
man->func->free(man, res);
 }
+
+/**
+ * intel_region_ttm_range_busy - check whether range has any allocations
+ * @mem: The region to check
+ * @start: the start of the range to check
+ * @end: the end of the range to check
+ *
+ * Return: true if something is alloceted within the region, false otherwise.
+ */
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+u64 start, u64 end)
+{
+   struct ttm_resource_manager *man = mem->region_private;
+
+   /* currently only supported for range allocator */
+   GEM_BUG_ON(!mem->is_range_manager);
+
+   return ttm_range_man_range_busy(man, PFN_DOWN(start), PFN_UP(end));
+}
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h 
b/drivers/gpu/drm/i915/intel_region_ttm.h
index fdee5e7bd46c..670ba9b618f7 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -29,6 +29,9 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region 
*mem,
 void intel_region_ttm_resource_free(struct intel_memory_region *mem,
struct ttm_resource *res);
 
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+u64 start, u64 end);
+
 int intel_region_to_ttm_type(const struct intel_memory_region *mem);
 
 struct ttm_device_funcs *i915_ttm_driver(void);
-- 
2.25.1



[Intel-gfx] [RFC PATCH 3/7] drm/i915: use gem objects to track stolen nodes

2022-03-15 Thread Robert Beckett
Construct gem objects around stolen nodes.
This stops the abuse of interfaces and aids future patches that done use
drm nodes for stolen areas.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c   | 72 --
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 60 ++
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h |  7 +++
 3 files changed, 106 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 142280b6ce6d..9df64ecab70e 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -92,8 +92,8 @@ struct intel_fbc {
unsigned int possible_framebuffer_bits;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct drm_i915_gem_object *compressed_fb;
+   struct drm_i915_gem_object *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +331,18 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_object_stolen_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_object_stolen_offset(fbc->compressed_llb);
 
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -449,7 +451,8 @@ static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   intel_de_write(i915, DPFC_CB_BASE,
+  i915_gem_object_stolen_offset(fbc->compressed_fb));
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -500,7 +503,8 @@ static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id),
+   i915_gem_object_stolen_offset(fbc->compressed_fb));
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -740,21 +744,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct drm_i915_gem_object *obj;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   obj = i915_gem_object_create_stolen_in_range(i915, size <<= 1, 4096, 0, 
end);
+   if (!IS_ERR(obj)) {
+   fbc->compressed_fb = obj;
return limit;
+   }
 
for (; limit <= intel_fbc_max_limit(i915); limit <<= 1) {
-   ret = i915_gem_stolen_insert_node_in_range(i915, 
&fbc->compressed_fb,
-  size >>= 1, 4096, 0, 
end);
-   if (ret == 0)
+   obj = i915_gem_object_create_stolen_in_range(i915, size >>= 1, 
4096, 0, end);
+   if (!IS_ERR(obj)) {
+   fbc->compressed_fb = obj;
return limit;
+   }
}
 
return 0;
@@ -765,17 +772,19 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
int ret;
+   struct drm_i915_gem_object *obj;
 
-   drm_WARN_ON(&i915->drm,
-   drm_mm_node_allocated(&fbc->compressed_fb));
-   drm_WARN_ON(&i915->drm,
-   drm_mm_node_allocated(&fbc->compressed_llb));
+   drm_WARN_ON(&i915->drm, fbc->compressed_fb);
+   drm_WARN_ON(&i915->drm, fbc->compressed_llb);
 
if (DISPLAY_VER(i915) < 5 && !IS_G4X(i915)) {
-   

[Intel-gfx] [RFC PATCH 5/7] drm/ttm: add range busy check for range manager

2022-03-15 Thread Robert Beckett
RFC: do we want this to become a generic interface in
ttm_resource_manager_func?

RFC: would we prefer a different interface? e.g.
for_each_resource_in_range or for_each_bo_in_range

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/ttm/ttm_range_manager.c | 21 +
 include/drm/ttm/ttm_range_manager.h |  3 +++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index 8cd4f3fb9f79..5662627bb933 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -206,3 +206,24 @@ int ttm_range_man_fini_nocheck(struct ttm_device *bdev,
return 0;
 }
 EXPORT_SYMBOL(ttm_range_man_fini_nocheck);
+
+/**
+ * ttm_range_man_range_busy - Check whether anything is allocated with a range
+ *
+ * @man: memory manager to check
+ * @fpfn: first page number to check
+ * @lpfn: last page number to check
+ *
+ * Return: true if anything allocated within the range, false otherwise.
+ */
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+ unsigned fpfn, unsigned lpfn)
+{
+   struct ttm_range_manager *rman = to_range_manager(man);
+   struct drm_mm *mm = &rman->mm;
+
+   if (__drm_mm_interval_first(mm, PFN_PHYS(fpfn), PFN_PHYS(lpfn + 1) - 1))
+   return true;
+   return false;
+}
+EXPORT_SYMBOL(ttm_range_man_range_busy);
diff --git a/include/drm/ttm/ttm_range_manager.h 
b/include/drm/ttm/ttm_range_manager.h
index 7963b957e9ef..86794a3f9101 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -53,4 +53,7 @@ static __always_inline int ttm_range_man_fini(struct 
ttm_device *bdev,
BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
return ttm_range_man_fini_nocheck(bdev, type);
 }
+
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+ unsigned fpfn, unsigned lpfn);
 #endif
-- 
2.25.1



[Intel-gfx] [RFC PATCH 4/7] drm/i915: stolen memory use ttm backend

2022-03-15 Thread Robert Beckett
Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 385 ++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h |   9 -
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c|  14 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h|   7 +
 4 files changed, 40 insertions(+), 375 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 265133cb2a47..e58f9902ef47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -4,19 +4,22 @@
  * Copyright © 2008-2012 Intel Corporation
  */
 
+#include "drm/ttm/ttm_placement.h"
+#include "gem/i915_gem_object_types.h"
 #include 
 #include 
 
-#include 
 #include 
 
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
+#include "gem/i915_gem_ttm.h"
 #include "i915_drv.h"
 #include "i915_gem_stolen.h"
 #include "i915_reg.h"
 #include "i915_vgpu.h"
 #include "intel_mchbar_regs.h"
+#include "intel_region_ttm.h"
 
 /*
  * The BIOS typically reserves some of the system's memory for the exclusive
@@ -30,46 +33,6 @@
  * for is a boon.
  */
 
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *i915,
-struct drm_mm_node *node, u64 size,
-unsigned alignment, u64 start, u64 end)
-{
-   int ret;
-
-   if (!drm_mm_initialized(&i915->mm.stolen))
-   return -ENODEV;
-
-   /* WaSkipStolenMemoryFirstPage:bdw+ */
-   if (GRAPHICS_VER(i915) >= 8 && start < 4096)
-   start = 4096;
-
-   mutex_lock(&i915->mm.stolen_lock);
-   ret = drm_mm_insert_node_in_range(&i915->mm.stolen, node,
- size, alignment, 0,
- start, end, DRM_MM_INSERT_BEST);
-   mutex_unlock(&i915->mm.stolen_lock);
-
-   return ret;
-}
-
-int i915_gem_stolen_insert_node(struct drm_i915_private *i915,
-   struct drm_mm_node *node, u64 size,
-   unsigned alignment)
-{
-   return i915_gem_stolen_insert_node_in_range(i915, node,
-   size, alignment,
-   I915_GEM_STOLEN_BIAS,
-   U64_MAX);
-}
-
-void i915_gem_stolen_remove_node(struct drm_i915_private *i915,
-struct drm_mm_node *node)
-{
-   mutex_lock(&i915->mm.stolen_lock);
-   drm_mm_remove_node(node);
-   mutex_unlock(&i915->mm.stolen_lock);
-}
-
 static int i915_adjust_stolen(struct drm_i915_private *i915,
  struct resource *dsm)
 {
@@ -170,14 +133,6 @@ static int i915_adjust_stolen(struct drm_i915_private 
*i915,
return 0;
 }
 
-static void i915_gem_cleanup_stolen(struct drm_i915_private *i915)
-{
-   if (!drm_mm_initialized(&i915->mm.stolen))
-   return;
-
-   drm_mm_takedown(&i915->mm.stolen);
-}
-
 static void g4x_get_stolen_reserved(struct drm_i915_private *i915,
struct intel_uncore *uncore,
resource_size_t *base,
@@ -510,216 +465,15 @@ static int i915_gem_init_stolen(struct 
intel_memory_region *mem)
return 0;
 
/* Basic memrange allocator for stolen space. */
-   drm_mm_init(&i915->mm.stolen, 0, i915->stolen_usable_size);
-
-   return 0;
-}
-
-static void dbg_poison(struct i915_ggtt *ggtt,
-  dma_addr_t addr, resource_size_t size,
-  u8 x)
-{
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
-   if (!drm_mm_node_allocated(&ggtt->error_capture))
-   return;
-
-   if (ggtt->vm.bind_async_flags & I915_VMA_GLOBAL_BIND)
-   return; /* beware stop_machine() inversion */
-
-   GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
-
-   mutex_lock(&ggtt->error_mutex);
-   while (size) {
-   void __iomem *s;
-
-   ggtt->vm.insert_page(&ggtt->vm, addr,
-ggtt->error_capture.start,
-I915_CACHE_NONE, 0);
-   mb();
-
-   s = io_mapping_map_wc(&ggtt->iomap,
- ggtt->error_capture.start,
- PAGE_SIZE);
-   memset_io(s, x, PAGE_SIZE);
-   io_mapping_unmap(s);
-
-   addr += PAGE_SIZE;
-   size -= PAGE_SIZE;
-   }
-   mb();
-   ggtt->vm.clear_range(&ggtt->vm, ggtt->error_capture.start, PAGE_SIZE);
-   mutex_unlock(&ggtt->error_m

[Intel-gfx] [RFC PATCH 7/7] drm/i915: cleanup old stolen state

2022-03-15 Thread Robert Beckett
remove i915->mm.stolen
remove i915->mm.stolen_lock

they are no longer needed.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c   |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c |  2 --
 drivers/gpu/drm/i915/gt/selftest_reset.c   | 16 +---
 drivers/gpu/drm/i915/i915_drv.h|  5 -
 4 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 9df64ecab70e..644bb599eee6 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -805,7 +805,7 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 err_llb:
i915_gem_object_put(fetch_and_zero(&fbc->compressed_llb));
 err:
-   if (drm_mm_initialized(&i915->mm.stolen))
+   if (IS_ERR(obj) && (PTR_ERR(obj) == -ENOMEM || PTR_ERR(obj) == -ENXIO))
drm_info_once(&i915->drm, "not enough stolen space for 
compressed buffer (need %d more bytes), disabling. Hint: you may be able to 
increase stolen memory size in the BIOS to avoid this.\n", size);
return -ENOSPC;
 }
@@ -1708,7 +1708,7 @@ void intel_fbc_init(struct drm_i915_private *i915)
 {
enum intel_fbc_id fbc_id;
 
-   if (!drm_mm_initialized(&i915->mm.stolen))
+   if (!i915->mm.stolen_region)
mkwrite_device_info(i915)->display.fbc_mask = 0;
 
if (need_fbc_vtd_wa(i915))
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index e58f9902ef47..930521a84607 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -347,8 +347,6 @@ static int i915_gem_init_stolen(struct intel_memory_region 
*mem)
resource_size_t reserved_base, stolen_top;
resource_size_t reserved_total, reserved_size;
 
-   mutex_init(&i915->mm.stolen_lock);
-
if (intel_vgpu_active(i915)) {
drm_notice(&i915->drm,
   "%s, disabling use of stolen memory\n",
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c 
b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..ad2ecc582be2 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -6,6 +6,7 @@
 #include 
 
 #include "gem/i915_gem_stolen.h"
+#include "intel_region_ttm.h"
 
 #include "i915_memcpy.h"
 #include "i915_selftest.h"
@@ -83,6 +84,7 @@ __igt_reset_stolen(struct intel_gt *gt,
dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
void __iomem *s;
void *in;
+   bool busy;
 
ggtt->vm.insert_page(&ggtt->vm, dma,
 ggtt->error_capture.start,
@@ -93,9 +95,9 @@ __igt_reset_stolen(struct intel_gt *gt,
  ggtt->error_capture.start,
  PAGE_SIZE);
 
-   if (!__drm_mm_interval_first(>->i915->mm.stolen,
-page << PAGE_SHIFT,
-((page + 1) << PAGE_SHIFT) - 1))
+   busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+  PFN_PHYS(page), 
PFN_PHYS(page + 1) - 1);
+   if (!busy)
memset_io(s, STACK_MAGIC, PAGE_SIZE);
 
in = (void __force *)s;
@@ -124,6 +126,7 @@ __igt_reset_stolen(struct intel_gt *gt,
void __iomem *s;
void *in;
u32 x;
+   bool busy;
 
ggtt->vm.insert_page(&ggtt->vm, dma,
 ggtt->error_capture.start,
@@ -139,10 +142,9 @@ __igt_reset_stolen(struct intel_gt *gt,
in = tmp;
x = crc32_le(0, in, PAGE_SIZE);
 
-   if (x != crc[page] &&
-   !__drm_mm_interval_first(>->i915->mm.stolen,
-page << PAGE_SHIFT,
-((page + 1) << PAGE_SHIFT) - 1)) {
+   busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+  PFN_PHYS(page), 
PFN_PHYS(page + 1) - 1);
+   if (x != crc[page] && !busy) {
pr_debug("unused stolen page %pa modified by GPU 
reset\n",
 &page);
if (count++ == 0)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d622d1afe93..1f9fa2d6d198 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_dr

Re: [Intel-gfx] [RFC PATCH 5/7] drm/ttm: add range busy check for range manager

2022-03-16 Thread Robert Beckett




On 16/03/2022 09:54, Christian König wrote:

Am 15.03.22 um 19:04 schrieb Robert Beckett:

RFC: do we want this to become a generic interface in
ttm_resource_manager_func?

RFC: would we prefer a different interface? e.g.
for_each_resource_in_range or for_each_bo_in_range


Well completely NAK to that. Why do you need that?

The long term goal is to completely remove the range checks from TTM 
instead.


ah, I did not know that.
I wanted it just to enable parity with a selftest that checks whether a 
range is allocated before initializing a given range with test data 
behind the allocator's back. It needs to check the range so that it 
doesn't destroy in use data.


I suppose we could add another drm_mm range tracker just for testing and 
shadow track each allocation in the range, but that seemed like a lot of 
extra infrastructure for no general runtime use.


would you mind explaining the rationale for removing range checks? It 
seems to me like a natural fit for a memory manager




Regards,
Christian.



Signed-off-by: Robert Beckett 
---
  drivers/gpu/drm/ttm/ttm_range_manager.c | 21 +
  include/drm/ttm/ttm_range_manager.h |  3 +++
  2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c

index 8cd4f3fb9f79..5662627bb933 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -206,3 +206,24 @@ int ttm_range_man_fini_nocheck(struct ttm_device 
*bdev,

  return 0;
  }
  EXPORT_SYMBOL(ttm_range_man_fini_nocheck);
+
+/**
+ * ttm_range_man_range_busy - Check whether anything is allocated 
with a range

+ *
+ * @man: memory manager to check
+ * @fpfn: first page number to check
+ * @lpfn: last page number to check
+ *
+ * Return: true if anything allocated within the range, false otherwise.
+ */
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+  unsigned fpfn, unsigned lpfn)
+{
+    struct ttm_range_manager *rman = to_range_manager(man);
+    struct drm_mm *mm = &rman->mm;
+
+    if (__drm_mm_interval_first(mm, PFN_PHYS(fpfn), PFN_PHYS(lpfn + 
1) - 1))

+    return true;
+    return false;
+}
+EXPORT_SYMBOL(ttm_range_man_range_busy);
diff --git a/include/drm/ttm/ttm_range_manager.h 
b/include/drm/ttm/ttm_range_manager.h

index 7963b957e9ef..86794a3f9101 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -53,4 +53,7 @@ static __always_inline int ttm_range_man_fini(struct 
ttm_device *bdev,
  BUILD_BUG_ON(__builtin_constant_p(type) && type >= 
TTM_NUM_MEM_TYPES);

  return ttm_range_man_fini_nocheck(bdev, type);
  }
+
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+  unsigned fpfn, unsigned lpfn);
  #endif




Re: [Intel-gfx] [RFC PATCH 5/7] drm/ttm: add range busy check for range manager

2022-03-16 Thread Robert Beckett




On 16/03/2022 13:43, Christian König wrote:

Am 16.03.22 um 14:19 schrieb Robert Beckett:



On 16/03/2022 09:54, Christian König wrote:

Am 15.03.22 um 19:04 schrieb Robert Beckett:

RFC: do we want this to become a generic interface in
ttm_resource_manager_func?

RFC: would we prefer a different interface? e.g.
for_each_resource_in_range or for_each_bo_in_range


Well completely NAK to that. Why do you need that?

The long term goal is to completely remove the range checks from TTM 
instead.


ah, I did not know that.
I wanted it just to enable parity with a selftest that checks whether 
a range is allocated before initializing a given range with test data 
behind the allocator's back. It needs to check the range so that it 
doesn't destroy in use data.


Mhm, of hand that doesn't sounds like a valid test case. Do you have the 
code at hand?


https://patchwork.freedesktop.org/patch/478347/?series=101396&rev=1
this is where I replace an existing range check via drm_mm with the 
range check I added in this patch.






I suppose we could add another drm_mm range tracker just for testing 
and shadow track each allocation in the range, but that seemed like a 
lot of extra infrastructure for no general runtime use.


I have no idea what you mean with that.


I meant as a potential solution to tracking allocations without a range 
check, we would need to add something external. e.g. adding a shadow 
drm_mm range tracker, or a bitmask across the range, or stick objects in 
a list etc.






would you mind explaining the rationale for removing range checks? It 
seems to me like a natural fit for a memory manager


TTM manages buffer objects and resources, not address space. The 
lpfn/fpfn parameter for the resource allocators are actually used as 
just two independent parameters and not define any range. We just keep 
the names for historical reasons.


The only places we still use and compare them as ranges are 
ttm_resource_compat() and ttm_bo_eviction_valuable() and I already have 
patches to clean up those and move them into the backend resource handling.


except the ttm_range_manager seems to still use them as a range specifier.

If the general design going forward is to not consider ranges, how would 
you recommend constructing buffers around pre-allocated regions e.g. 
uefi frame buffers who's range is dictated externally?




Regards,
Christian.





Regards,
Christian.



Signed-off-by: Robert Beckett 
---
  drivers/gpu/drm/ttm/ttm_range_manager.c | 21 +
  include/drm/ttm/ttm_range_manager.h |  3 +++
  2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c

index 8cd4f3fb9f79..5662627bb933 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -206,3 +206,24 @@ int ttm_range_man_fini_nocheck(struct 
ttm_device *bdev,

  return 0;
  }
  EXPORT_SYMBOL(ttm_range_man_fini_nocheck);
+
+/**
+ * ttm_range_man_range_busy - Check whether anything is allocated 
with a range

+ *
+ * @man: memory manager to check
+ * @fpfn: first page number to check
+ * @lpfn: last page number to check
+ *
+ * Return: true if anything allocated within the range, false 
otherwise.

+ */
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+  unsigned fpfn, unsigned lpfn)
+{
+    struct ttm_range_manager *rman = to_range_manager(man);
+    struct drm_mm *mm = &rman->mm;
+
+    if (__drm_mm_interval_first(mm, PFN_PHYS(fpfn), PFN_PHYS(lpfn + 
1) - 1))

+    return true;
+    return false;
+}
+EXPORT_SYMBOL(ttm_range_man_range_busy);
diff --git a/include/drm/ttm/ttm_range_manager.h 
b/include/drm/ttm/ttm_range_manager.h

index 7963b957e9ef..86794a3f9101 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -53,4 +53,7 @@ static __always_inline int 
ttm_range_man_fini(struct ttm_device *bdev,
  BUILD_BUG_ON(__builtin_constant_p(type) && type >= 
TTM_NUM_MEM_TYPES);

  return ttm_range_man_fini_nocheck(bdev, type);
  }
+
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+  unsigned fpfn, unsigned lpfn);
  #endif






Re: [Intel-gfx] [RFC PATCH 5/7] drm/ttm: add range busy check for range manager

2022-03-16 Thread Robert Beckett




On 16/03/2022 14:39, Christian König wrote:

Am 16.03.22 um 15:26 schrieb Robert Beckett:


[SNIP]
this is where I replace an existing range check via drm_mm with the 
range check I added in this patch.


Mhm, I still don't get the use case from the code, but I don't think it 
matters any more.


I suppose we could add another drm_mm range tracker just for testing 
and shadow track each allocation in the range, but that seemed like 
a lot of extra infrastructure for no general runtime use.


I have no idea what you mean with that.


I meant as a potential solution to tracking allocations without a 
range check, we would need to add something external. e.g. adding a 
shadow drm_mm range tracker, or a bitmask across the range, or stick 
objects in a list etc.


Ah! So you are trying to get access to the drm_mm inside the 
ttm_range_manager and not add some additional range check function! Now 
I got your use case.


well, specifically I was trying to avoid having to get access to the drm_mm.
I wanted to maintain an abstract interface at the resource manager 
level, hence the rfc to ask if we could add a range check to 
ttm_resource_manager_func.


I don't like the idea of code external to ttm having to poke in to the 
implementation details of the manager to get it's underlying drm_mm.




would you mind explaining the rationale for removing range checks? 
It seems to me like a natural fit for a memory manager


TTM manages buffer objects and resources, not address space. The 
lpfn/fpfn parameter for the resource allocators are actually used as 
just two independent parameters and not define any range. We just 
keep the names for historical reasons.


The only places we still use and compare them as ranges are 
ttm_resource_compat() and ttm_bo_eviction_valuable() and I already 
have patches to clean up those and move them into the backend 
resource handling.


except the ttm_range_manager seems to still use them as a range 
specifier.


Yeah, because the range manager is the backend which handles ranges 
using the drm_mm :)


If the general design going forward is to not consider ranges, how 
would you recommend constructing buffers around pre-allocated regions 
e.g. uefi frame buffers who's range is dictated externally?


Call ttm_bo_mem_space() with the fpfn/lpfn filled in as required. See 
function amdgpu_bo_create_kernel_at() for an example.


ah, I see, thanks.

To allow similar code to before, which was conceptually just trying to 
see if a range was currently free, would you be okay with a new 
ttm_bo_mem_try_space, which does not do the force to evict, but instead 
returns -EBUSY?


If so, the test can try to alloc, and immediately free if successful 
which would imply it was free.




Regards,
Christian.





Regards,
Christian.





Regards,
Christian.



Signed-off-by: Robert Beckett 
---
  drivers/gpu/drm/ttm/ttm_range_manager.c | 21 +
  include/drm/ttm/ttm_range_manager.h |  3 +++
  2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c

index 8cd4f3fb9f79..5662627bb933 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -206,3 +206,24 @@ int ttm_range_man_fini_nocheck(struct 
ttm_device *bdev,

  return 0;
  }
  EXPORT_SYMBOL(ttm_range_man_fini_nocheck);
+
+/**
+ * ttm_range_man_range_busy - Check whether anything is allocated 
with a range

+ *
+ * @man: memory manager to check
+ * @fpfn: first page number to check
+ * @lpfn: last page number to check
+ *
+ * Return: true if anything allocated within the range, false 
otherwise.

+ */
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+  unsigned fpfn, unsigned lpfn)
+{
+    struct ttm_range_manager *rman = to_range_manager(man);
+    struct drm_mm *mm = &rman->mm;
+
+    if (__drm_mm_interval_first(mm, PFN_PHYS(fpfn), PFN_PHYS(lpfn 
+ 1) - 1))

+    return true;
+    return false;
+}
+EXPORT_SYMBOL(ttm_range_man_range_busy);
diff --git a/include/drm/ttm/ttm_range_manager.h 
b/include/drm/ttm/ttm_range_manager.h

index 7963b957e9ef..86794a3f9101 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -53,4 +53,7 @@ static __always_inline int 
ttm_range_man_fini(struct ttm_device *bdev,
  BUILD_BUG_ON(__builtin_constant_p(type) && type >= 
TTM_NUM_MEM_TYPES);

  return ttm_range_man_fini_nocheck(bdev, type);
  }
+
+bool ttm_range_man_range_busy(struct ttm_resource_manager *man,
+  unsigned fpfn, unsigned lpfn);
  #endif








Re: [Intel-gfx] [PATCH 1/4] drm/i915: add gen6 ppgtt dummy creation function

2022-05-23 Thread Robert Beckett




On 11/05/2022 11:13, Thomas Hellström wrote:

Hi,

On Tue, 2022-05-03 at 19:13 +, Robert Beckett wrote:

Internal gem objects will soon just be volatile system memory region
objects.
To enable this, create a separate dummy object creation function
for gen6 ppgtt



It's not clear from the commit message why we need a special case for
this. Could you describe more in detail?


it always was a special case, that used the internal backend but 
provided it's own ops, so actually had no benefit from using the 
internal backend. See b0b0f2d225da6fe58417fae37e3f797e2db27b62


I'll add some further explanation in the commit message for v2.



Thanks,
Thomas




Signed-off-by: Robert Beckett 
---
  drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 43
++--
  1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 1bb766c79dcb..f3b660cfeb7f 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -372,6 +372,45 @@ static const struct drm_i915_gem_object_ops
pd_dummy_obj_ops = {
 .put_pages = pd_dummy_obj_put_pages,
  };
  
+static struct drm_i915_gem_object *

+i915_gem_object_create_dummy(struct drm_i915_private *i915,
phys_addr_t size)
+{
+   static struct lock_class_key lock_class;
+   struct drm_i915_gem_object *obj;
+   unsigned int cache_level;
+
+   GEM_BUG_ON(!size);
+   GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
+
+   if (overflows_type(size, obj->base.size))
+   return ERR_PTR(-E2BIG);
+
+   obj = i915_gem_object_alloc();
+   if (!obj)
+   return ERR_PTR(-ENOMEM);
+
+   drm_gem_private_object_init(&i915->drm, &obj->base, size);
+   i915_gem_object_init(obj, &pd_dummy_obj_ops, &lock_class, 0);
+   obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
+
+   /*
+    * Mark the object as volatile, such that the pages are
marked as
+    * dontneed whilst they are still pinned. As soon as they are
unpinned
+    * they are allowed to be reaped by the shrinker, and the
caller is
+    * expected to repopulate - the contents of this object are
only valid
+    * whilst active and pinned.
+    */
+   i915_gem_object_set_volatile(obj);
+
+   obj->read_domains = I915_GEM_DOMAIN_CPU;
+   obj->write_domain = I915_GEM_DOMAIN_CPU;
+
+   cache_level = HAS_LLC(i915) ? I915_CACHE_LLC :
I915_CACHE_NONE;
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+
+   return obj;
+}
+
  static struct i915_page_directory *
  gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
  {
@@ -383,9 +422,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
 if (unlikely(!pd))
 return ERR_PTR(-ENOMEM);
  
-   pd->pt.base = __i915_gem_object_create_internal(ppgtt-

base.vm.gt->i915,

-
    &pd_dummy_obj_o
ps,
-   I915_PDES *
SZ_4K);
+   pd->pt.base = i915_gem_object_create_dummy(ppgtt->base.vm.gt-

i915, I915_PDES * SZ_4K);

 if (IS_ERR(pd->pt.base)) {
 err = PTR_ERR(pd->pt.base);
 pd->pt.base = NULL;





Re: [Intel-gfx] [PATCH 4/4] drm/i915: internal buffers use ttm backend

2022-05-23 Thread Robert Beckett




On 11/05/2022 15:14, Thomas Hellström wrote:

On Tue, 2022-05-03 at 19:13 +, Robert Beckett wrote:

refactor internal buffer backend to allocate volatile pages via
ttm pool allocator

Signed-off-by: Robert Beckett 
---
  drivers/gpu/drm/i915/gem/i915_gem_internal.c | 264 -
--
  drivers/gpu/drm/i915/gem/i915_gem_internal.h |   5 -
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c  |  12 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm.h  |  12 +-
  4 files changed, 125 insertions(+), 168 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index c698f95af15f..815ec9466cc0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -4,156 +4,119 @@
   * Copyright © 2014-2016 Intel Corporation
   */
  
-#include 

-#include 
-#include 
-
+#include 
+#include 
+#include "drm/ttm/ttm_bo_api.h"
+#include "gem/i915_gem_internal.h"
+#include "gem/i915_gem_region.h"
+#include "gem/i915_gem_ttm.h"
  #include "i915_drv.h"
-#include "i915_gem.h"
-#include "i915_gem_internal.h"
-#include "i915_gem_object.h"
-#include "i915_scatterlist.h"
-#include "i915_utils.h"
-
-#define QUIET (__GFP_NORETRY | __GFP_NOWARN)
-#define MAYFAIL (__GFP_RETRY_MAYFAIL | __GFP_NOWARN)
-
-static void internal_free_pages(struct sg_table *st)
-{
-   struct scatterlist *sg;
-
-   for (sg = st->sgl; sg; sg = __sg_next(sg)) {
-   if (sg_page(sg))
-   __free_pages(sg_page(sg), get_order(sg-

length));

-   }
-
-   sg_free_table(st);
-   kfree(st);
-}
  
-static int i915_gem_object_get_pages_internal(struct

drm_i915_gem_object *obj)
+static int i915_internal_get_pages(struct drm_i915_gem_object *obj)
  {
-   struct drm_i915_private *i915 = to_i915(obj->base.dev);
-   struct sg_table *st;
-   struct scatterlist *sg;
-   unsigned int sg_page_sizes;
-   unsigned int npages;
-   int max_order;
-   gfp_t gfp;
-
-   max_order = MAX_ORDER;
-#ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active(obj->base.dev->dev)) {
-   unsigned int max_segment;
-
-   max_segment = swiotlb_max_segment();
-   if (max_segment) {
-   max_segment = max_t(unsigned int,
max_segment,
-   PAGE_SIZE) >> PAGE_SHIFT;
-   max_order = min(max_order,
ilog2(max_segment));
-   }
+   struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+   struct ttm_operation_ctx ctx = {
+   .interruptible = true,
+   .no_wait_gpu = false,
+   };
+   struct ttm_place place = {
+   .fpfn = 0,
+   .lpfn = 0,
+   .mem_type = I915_PL_SYSTEM,
+   .flags = 0,
+   };
+   struct ttm_placement placement = {
+   .num_placement = 1,
+   .placement = &place,
+   .num_busy_placement = 0,
+   .busy_placement = NULL,
+   };
+   int ret;
+
+   ret = ttm_bo_validate(bo, &placement, &ctx);
+   if (ret) {
+   ret = i915_ttm_err_to_gem(ret);
+   return ret;
 }
-#endif
  
-   gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_RECLAIMABLE;

-   if (IS_I965GM(i915) || IS_I965G(i915)) {
-   /* 965gm cannot relocate objects above 4GiB. */
-   gfp &= ~__GFP_HIGHMEM;
-   gfp |= __GFP_DMA32;



It looks like we're losing this restriction?

There is a flag to ttm_device_init() to make TTM only do __GFP_DMA32
allocations.


agreed. will fix for v2




+   if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
+   ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
+   if (ret)
+   return ret;
 }
  
-create_st:

-   st = kmalloc(sizeof(*st), GFP_KERNEL);
-   if (!st)
-   return -ENOMEM;
+   if (!i915_gem_object_has_pages(obj)) {
+   struct i915_refct_sgt *rsgt =
+   i915_ttm_resource_get_st(obj, bo->resource);
  
-   npages = obj->base.size / PAGE_SIZE;

-   if (sg_alloc_table(st, npages, GFP_KERNEL)) {
-   kfree(st);
-   return -ENOMEM;
-   }
+   if (IS_ERR(rsgt))
+   return PTR_ERR(rsgt);
  
-   sg = st->sgl;

-   st->nents = 0;
-   sg_page_sizes = 0;
-
-   do {
-   int order = min(fls(npages) - 1, max_order);
-   struct page *page;
-
-   do {
-   page = alloc_pages(gfp | (order ? QUIET :
MAYFAIL),
-  order);
-   if (page)
-   break;
-   if (!order--)
-

Re: [Intel-gfx] [PATCH 3/4] drm/i915: allow volatile buffers to use ttm pool allocator

2022-05-23 Thread Robert Beckett




On 11/05/2022 13:42, Thomas Hellström wrote:

Hi, Bob,

On Tue, 2022-05-03 at 19:13 +, Robert Beckett wrote:

internal buffers should be shmem backed.
if a volatile buffer is requested, allow ttm to use the pool
allocator
to provide volatile pages as backing

Signed-off-by: Robert Beckett 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..fdb3a1c18cb6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -309,7 +309,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct
ttm_buffer_object *bo,
 page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
  
 caching = i915_ttm_select_tt_caching(obj);

-   if (i915_gem_object_is_shrinkable(obj) && caching ==
ttm_cached) {
+   if (i915_gem_object_is_shrinkable(obj) && caching ==
ttm_cached &&
+   !i915_gem_object_is_volatile(obj)) {
 page_flags |= TTM_TT_FLAG_EXTERNAL |
   TTM_TT_FLAG_EXTERNAL_MAPPABLE;
 i915_tt->is_shmem = true;


While this is ok, I think it also needs adjustment in the i915_ttm
shrink callback. If someone creates a volatile smem object which then
hits the shrinker, I think we might hit asserts that it's a is_shem
ttm?

In this case, the shrink callback should just i915_ttm_purge().


agreed. nice catch.
I'll fix for v2

looks like we could maybe do with some extra shrinker testing too? looks 
like nothing caught this during CI testing




/Thomas




[Intel-gfx] [PATCH v2 0/8] drm/i915: ttm for internal

2022-06-08 Thread Robert Beckett
This series refactors i915's internal buffer backend to use ttm.
It uses ttm's pool allocator to allocate volatile pages in place of the
old code which rolled its own via alloc_pages.
This is continuing progress to align all backends on using ttm.

v2: - commit message improvements to add detail
- fix i915_ttm_shrink to purge !is_shmem volatile buffers
- limit ttm pool allocator to using dma32 on i965G[M]
- fix mman selftest to always use smem buffers
- create new internal memory region
- make internal backend allocate from internal region
- Fixed various issues with tests and i915 ttm usage as a result
  of supporting regions other than lmem via ttm.

Robert Beckett (8):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: add gen6 ppgtt dummy creation function
  drm/i915: setup ggtt scratch page after memory regions
  drm/i915: allow volatile buffers to use ttm pool allocator
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/gem: further fix mman selftest
  drm/i915/ttm: trust snooping when gen 6+ when deciding default
cache_level
  drm/i915: internal buffers use ttm backend

 drivers/gpu/drm/i915/gem/i915_gem_internal.c  | 187 +-
 drivers/gpu/drm/i915/gem/i915_gem_internal.h  |   5 -
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  13 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|  20 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |  43 +++-
 drivers/gpu/drm/i915/gt/intel_gt_gmch.c   |  20 +-
 drivers/gpu/drm/i915/gt/intel_gt_gmch.h   |   6 +
 drivers/gpu/drm/i915/i915_driver.c|  16 +-
 drivers/gpu/drm/i915/i915_pci.c   |   4 +-
 drivers/gpu/drm/i915/intel_memory_region.c|   8 +-
 drivers/gpu/drm/i915/intel_memory_region.h|   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c   |   7 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   2 +-
 17 files changed, 123 insertions(+), 221 deletions(-)

-- 
2.25.1



[Intel-gfx] [PATCH v2 1/8] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-06-08 Thread Robert Beckett
Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1



[Intel-gfx] [PATCH v2 2/8] drm/i915: add gen6 ppgtt dummy creation function

2022-06-08 Thread Robert Beckett
Internal gem objects will soon just be volatile system memory region
objects.
To enable this, create a separate dummy object creation function
for gen6 ppgtt. The object only exists as a fake object pointing to ggtt
and gains no benefit in going via the internal backend.
Instead, create a dummy gem object and avoid having to maintain a custom
ops api in the internal backend, which makes later refactoring easier.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 43 ++--
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 1bb766c79dcb..f3b660cfeb7f 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -372,6 +372,45 @@ static const struct drm_i915_gem_object_ops 
pd_dummy_obj_ops = {
.put_pages = pd_dummy_obj_put_pages,
 };
 
+static struct drm_i915_gem_object *
+i915_gem_object_create_dummy(struct drm_i915_private *i915, phys_addr_t size)
+{
+   static struct lock_class_key lock_class;
+   struct drm_i915_gem_object *obj;
+   unsigned int cache_level;
+
+   GEM_BUG_ON(!size);
+   GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
+
+   if (overflows_type(size, obj->base.size))
+   return ERR_PTR(-E2BIG);
+
+   obj = i915_gem_object_alloc();
+   if (!obj)
+   return ERR_PTR(-ENOMEM);
+
+   drm_gem_private_object_init(&i915->drm, &obj->base, size);
+   i915_gem_object_init(obj, &pd_dummy_obj_ops, &lock_class, 0);
+   obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
+
+   /*
+* Mark the object as volatile, such that the pages are marked as
+* dontneed whilst they are still pinned. As soon as they are unpinned
+* they are allowed to be reaped by the shrinker, and the caller is
+* expected to repopulate - the contents of this object are only valid
+* whilst active and pinned.
+*/
+   i915_gem_object_set_volatile(obj);
+
+   obj->read_domains = I915_GEM_DOMAIN_CPU;
+   obj->write_domain = I915_GEM_DOMAIN_CPU;
+
+   cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+
+   return obj;
+}
+
 static struct i915_page_directory *
 gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
 {
@@ -383,9 +422,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
if (unlikely(!pd))
return ERR_PTR(-ENOMEM);
 
-   pd->pt.base = __i915_gem_object_create_internal(ppgtt->base.vm.gt->i915,
-   &pd_dummy_obj_ops,
-   I915_PDES * SZ_4K);
+   pd->pt.base = i915_gem_object_create_dummy(ppgtt->base.vm.gt->i915, 
I915_PDES * SZ_4K);
if (IS_ERR(pd->pt.base)) {
err = PTR_ERR(pd->pt.base);
pd->pt.base = NULL;
-- 
2.25.1



  1   2   3   >