from:"Robert Beckett"

Re: [Intel-gfx] [PATCH v2 00/17] drm/i915/dg2: Enabling 64k page size and flat ccs

2021-10-25 Thread Robert Beckett


(apologies for not quoting, I wasn't subscribed before now)


some quick thoughts:

- Can we split these patches into two series, one for each topic? They
don't seem specifically related.


- To simplify 64K page support, could we just set the minimum allocation
size to 64K and round up allocation requests?
Placement then becomes much simpler: no need to align the va to 2MB,
just fit it in wherever it fits and always use 64K PTEs in the GTT.


This would simplify the code a lot and would benefit performance due to
up to 16x fewer page walks.
If we did this, we would not have to consider 2MB boundaries at all, and
we could drop all the colour handling etc.


The only downside might be some wasted allocation if there are lots
of very small buffers.
However, I think most gfx related use cases would not be badly affected
by this (even a cursor plane is 64K, usually).


Are there any use cases that you are aware of that would be impacted
badly by this idea? (maybe some compute workload?)
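
To make the trade-off concrete, here is a minimal sketch of the arithmetic, not driver code. The helper names (round_up_pow2, pte_count, padding_waste) and the SZ_* macros are made up for illustration, and the PTE count is only a rough proxy for page-walk cost:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  (UINT64_C(4) << 10)
#define SZ_64K (UINT64_C(64) << 10)

/*
 * Round size up to the next multiple of align.
 * Assumes align is a power of two and size + align does not overflow.
 */
static uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Number of PTEs needed to map size bytes with the given page size. */
static uint64_t pte_count(uint64_t size, uint64_t page_size)
{
	return round_up_pow2(size, page_size) / page_size;
}

/* Bytes lost to padding when a request is rounded up to the 64K minimum. */
static uint64_t padding_waste(uint64_t size)
{
	return round_up_pow2(size, SZ_64K) - size;
}
```

For a 1MiB buffer, pte_count() gives 256 entries with 4K pages and 16 with 64K pages, which is where the "up to 16x" figure comes from. A single 4K request padded to 64K wastes 60K, which is the small-buffer cost mentioned above.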



- Flat ccs modifiers: there seems to be some confusion over whether
there should be a separate modifier for this.

As it dictates a new layout, it seems like it should be a new modifier.
Were there any internal discussions about this that you could elaborate
on here?

[Intel-gfx] [PATCH v6 00/10] drm/i915: ttm for stolen

2022-06-17 Thread Robert Beckett

This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes
dmesg warning regressions in testing.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory region creators to do allocation. Allows stolen region
  to track its own reservations.
- Pre-reserve first page of stolen mem (add back
  WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit description for "drm/i915: sanitize mem_flags for
  stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking
  and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma
  obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c|  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h|   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   3 +-
 18 files changed, 414 insertions(+), 369 deletions(-)

-- 
2.25.1

[Intel-gfx] [PATCH v6 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-06-17 Thread Robert Beckett

Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

The TTM move code later started making that decision itself at move time,
which is usually well after object creation, so it overwrote any earlier
cache_level override and reverted it to the default choice.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1
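
The flag handling in this patch can be modelled outside the driver. The following is a minimal sketch under that assumption; the model_* names and the two-value enum are illustrative stand-ins, not the i915 types:

```c
#include <assert.h>
#include <stdbool.h>

enum model_cache_level { MODEL_CACHE_NONE, MODEL_CACHE_LLC };

struct model_obj {
	enum model_cache_level cache_level;
	bool cache_level_override;
};

/* Every caller that sets the cache level marks it as an explicit override. */
static void model_set_cache_coherency(struct model_obj *obj,
				      enum model_cache_level level)
{
	assert(obj);
	obj->cache_level = level;
	obj->cache_level_override = true;
}

/* Move path: apply the default only if nothing has overridden it. */
static void model_adjust_after_move(struct model_obj *obj,
				    enum model_cache_level default_level)
{
	assert(obj);
	if (!obj->cache_level_override) {
		model_set_cache_coherency(obj, default_level);
		/* The move logic's own choice must not count as an override. */
		obj->cache_level_override = false;
	}
}
```

In this model, an object whose level was set explicitly keeps that level across later moves, while an untouched object keeps following the default.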

[Intel-gfx] [PATCH v6 02/10] drm/i915: limit ttm to dma32 for i965G[M]

2022-06-17 Thread Robert Beckett

i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1

[Intel-gfx] [PATCH v6 04/10] drm/i915/gem: selftest should not attempt mmap of private regions

2022-06-17 Thread Robert Beckett

During testing, make can_mmap() consider whether the region is private.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1

[Intel-gfx] [PATCH v6 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-06-17 Thread Robert Beckett

By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This diverges from the existing backends' code, which only considers
HAS_LLC.
Testing shows that trusting snooping is unreliable on gen5- and on bsw
via ggtt mappings, so limit it to DGFX for now and maintain previous
behaviour.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1
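
As a standalone illustration of the decision after this change, here is a sketch in which the device capabilities are plain booleans. The struct and function names are hypothetical; only the boolean expression mirrors the patch:

```c
#include <assert.h>
#include <stdbool.h>

enum model_cache_level { MODEL_CACHE_NONE, MODEL_CACHE_LLC };

/* Stand-ins for HAS_LLC(), HAS_SNOOP() and IS_DGFX(). */
struct model_dev {
	bool has_llc;
	bool has_snoop;
	bool is_dgfx;
};

static enum model_cache_level
model_default_cache_level(const struct model_dev *dev,
			  bool gtt_binds_lmem, bool ttm_cached)
{
	bool can_snoop;

	assert(dev);
	/* Snooping is only trusted on discrete parts. */
	can_snoop = dev->has_snoop && dev->is_dgfx;

	return ((dev->has_llc || can_snoop) &&
		!gtt_binds_lmem && ttm_cached) ?
		MODEL_CACHE_LLC : MODEL_CACHE_NONE;
}
```

An integrated part with snooping but no LLC now gets MODEL_CACHE_NONE in this model, matching the pre-ttm behaviour described in the commit message.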

[Intel-gfx] [PATCH v6 09/10] drm/i915/ttm: add buffer pin on alloc flag

2022-06-17 Thread Robert Beckett

For situations where allocations need to fail at alloc time instead of
at a delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1

[Intel-gfx] [PATCH v6 08/10] drm/i915: allow memory region creators to alloc and free the region

2022-06-17 Thread Robert Beckett

Add callbacks for alloc and free.
This allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c 
b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
struct intel_memory_region *mem;
int err;
 
-   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+   if (ops->alloc)
+   mem = ops->alloc();
+   else
+   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
if (mem->ops->release)
mem->ops->release(mem);
 err_free:
-   kfree(mem);
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
+   kfree(mem);
return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct 
intel_memory_region *mem)
 
GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
mutex_destroy(&mem->objects.lock);
-   if (!ret)
+   if (ret)
+   return;
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h 
b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
   resource_size_t size,
   resource_size_t page_size,
   unsigned int flags);
+   struct intel_memory_region *(*alloc)(void);
+   void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1
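
One way a region creator might use such callbacks is to embed the region in a larger structure. The sketch below is a userspace model of that pattern only; model_stolen, its reserved_pages field and the use of calloc()/free() are assumptions for illustration, not the layout used by the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_region;

struct model_region_ops {
	struct model_region *(*alloc)(void);
	void (*free)(struct model_region *mem);
};

struct model_region {
	const struct model_region_ops *ops;
};

/* A creator needing extra storage wraps the region in its own struct. */
struct model_stolen {
	struct model_region base;
	unsigned long reserved_pages;	/* hypothetical extra state */
};

static struct model_region *stolen_alloc(void)
{
	struct model_stolen *s = calloc(1, sizeof(*s));

	return s ? &s->base : NULL;
}

static void stolen_free(struct model_region *mem)
{
	/* Recover the containing struct before freeing it. */
	free((char *)mem - offsetof(struct model_stolen, base));
}

/* Use the creator's allocator when provided, else a plain allocation. */
static struct model_region *
model_region_create(const struct model_region_ops *ops)
{
	struct model_region *mem;

	assert(ops);
	mem = ops->alloc ? ops->alloc() : calloc(1, sizeof(*mem));
	if (!mem)
		return NULL;
	mem->ops = ops;
	return mem;
}

static void model_region_destroy(struct model_region *mem)
{
	if (!mem)
		return;
	if (mem->ops->free)
		mem->ops->free(mem);
	else
		free(mem);
}
```

The point of pairing alloc with free is that only the creator knows the real size and layout of what it allocated, so the generic code must hand the pointer back rather than free it directly.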

[Intel-gfx] [PATCH v6 07/10] drm/i915: ttm move/clear logic fix

2022-06-17 Thread Robert Beckett

ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
Currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear only user allocated buffers and any
kernel buffers that specifically request it.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1
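
The clear policy above reduces to a small truth table. Here is a sketch of it with the inputs flattened to booleans; the clear_req struct and the model_* names are hypothetical, and model_runs_guarded_block() only mirrors the `!clear || allow_clear(...)` condition from the hunk:

```c
#include <assert.h>
#include <stdbool.h>

/* Flattened inputs standing in for the obj, ttm and dst_mem arguments. */
struct clear_req {
	bool dst_is_stolen;	/* destination is the stolen placement */
	bool user_alloc;	/* object has the user alloc flag */
	bool has_tt;		/* a ttm_tt exists for the buffer */
	bool zero_alloc;	/* the ttm_tt asks for zeroed pages */
};

static bool model_allow_clear(const struct clear_req *r)
{
	assert(r);
	/* Never clear stolen. */
	if (r->dst_is_stolen)
		return false;
	/* User buffers are always cleared. */
	if (r->user_alloc)
		return true;
	/* Kernel buffers are cleared only when they ask for it. */
	if (r->has_tt && r->zero_alloc)
		return true;
	return false;
}

/* Mirrors the condition guarding the block in the move callback. */
static bool model_runs_guarded_block(bool clear, const struct clear_req *r)
{
	return !clear || model_allow_clear(r);
}
```

Note that the stolen check comes first, so in this model even a user buffer placed in stolen is not cleared.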

[Intel-gfx] [PATCH v6 06/10] drm/i915: sanitize mem_flags for stolen buffers

2022-06-17 Thread Robert Beckett

Stolen regions are not page backed and are not considered iomem.
Prevent flags indicating such.
This correctly prevents attempts to directly map stolen buffers.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1

[Intel-gfx] [PATCH v6 05/10] drm/i915: instantiate ttm ranger manager for stolen memory

2022-06-17 Thread Robert Beckett

Prepare for the ttm based stolen region by using the ttm range manager
as the resource manager for the stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1

[Intel-gfx] [PATCH v6 10/10] drm/i915: stolen memory use ttm backend

2022-06-17 Thread Robert Beckett

Refactor the stolen memory region to use ttm.
This necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   3 +-
 13 files changed, 294 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0,

[Intel-gfx] [PATCH v7 00/10] drm/i915: ttm for stolen

2022-06-20 Thread Robert Beckett

This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes
dmesg warning regressions in testing.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory region creators to do allocation. Allows stolen region
  to track its own reservations.
- Pre-reserve first page of stolen mem (add back
  WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit description for "drm/i915: sanitize mem_flags for
  stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking
  and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma
  obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

v7: - fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c|  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h|   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 18 files changed, 423 insertions(+), 369 deletions(-)

-- 
2.25.1

[Intel-gfx] [PATCH v7 02/10] drm/i915: limit ttm to dma32 for i965G[M]

2022-06-20 Thread Robert Beckett

i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1
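
The constraint behind this patch can be illustrated in plain userspace C. This is only a sketch, not kernel code: `fits_dma32()` and `SZ_4G` are hypothetical names introduced here, and the check simply asks whether a buffer lies entirely below the 4GiB boundary, which is the property that dma32 page allocation is meant to provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4G (1ULL << 32)

/* Hypothetical helper: true if [addr, addr + size) lies entirely below 4GiB. */
static bool fits_dma32(uint64_t addr, uint64_t size)
{
	assert(size > 0);
	/* Written to avoid overflow in addr + size. */
	return addr < SZ_4G && size <= SZ_4G - addr;
}
```

A page ending exactly at the 4GiB boundary passes the check, while one that starts below the boundary and crosses it does not.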

[Intel-gfx] [PATCH v7 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-06-20 Thread Robert Beckett

Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1
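
The override tracking in this patch can be modelled outside the kernel. The sketch below is a simplified userspace model with hypothetical names (`struct obj`, `set_cache_coherency()`, `adjust_after_move()`, `obj_init()`); it only mirrors the flag handling shown in the diff and omits everything else the driver does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_level { CACHE_NONE, CACHE_LLC };

struct obj {
	enum cache_level cache_level;
	bool cache_level_override;
};

/* Any caller that sets the cache level marks it as an explicit override. */
static void set_cache_coherency(struct obj *o, enum cache_level level)
{
	assert(o != NULL);
	o->cache_level = level;
	o->cache_level_override = true;
}

/* Move path: apply the default only when nobody has overridden it. */
static void adjust_after_move(struct obj *o, enum cache_level def)
{
	if (!o->cache_level_override) {
		set_cache_coherency(o, def);
		/* The move logic's own choice does not count as an override. */
		o->cache_level_override = false;
	}
}

/* Simplified object init: start with the flag clear, then pick a default. */
static void obj_init(struct obj *o, enum cache_level def)
{
	o->cache_level_override = false;
	adjust_after_move(o, def);
}
```

In this model, a later `set_cache_coherency()` call from outside the move path sticks: subsequent `adjust_after_move()` calls leave the chosen level alone.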

[Intel-gfx] [PATCH v7 05/10] drm/i915: instantiate ttm ranger manager for stolen memory

2022-06-20 Thread Robert Beckett

Prepare for the ttm based stolen region by using the ttm range manager
as the resource manager for the stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1
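
The manager selection in this patch comes down to two decisions: stolen regions get the range manager, and the range manager is sized in pages rather than bytes. A minimal userspace sketch of that, with hypothetical enum and function names and an assumed 4KiB page size:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4KiB pages */

enum region_type { REGION_LOCAL, REGION_STOLEN_SYSTEM, REGION_STOLEN_LOCAL };
enum manager_kind { MANAGER_BUDDY, MANAGER_RANGE };

/* Both flavours of stolen use the range manager; other regions keep buddy. */
static enum manager_kind pick_manager(enum region_type type)
{
	if (type == REGION_STOLEN_SYSTEM || type == REGION_STOLEN_LOCAL)
		return MANAGER_RANGE;
	return MANAGER_BUDDY;
}

/* Convert a region size in bytes to the page count the range manager takes. */
static uint64_t range_manager_pages(uint64_t region_bytes)
{
	assert((region_bytes & ((1ULL << PAGE_SHIFT) - 1)) == 0);
	return region_bytes >> PAGE_SHIFT;
}
```

Under these assumptions a 64MiB stolen region would be handed to the range manager as 16384 pages.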

[Intel-gfx] [PATCH v7 06/10] drm/i915: sanitize mem_flags for stolen buffers

2022-06-20 Thread Robert Beckett

Stolen regions are neither page backed nor considered iomem.
Avoid setting the flags that would indicate either.
This prevents attempts to directly map stolen buffers.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1
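
The flag handling above can be sketched as a pure function. The flag names and values below are hypothetical stand-ins for the driver's `I915_BO_FLAG_*` bits, and `FLAG_OTHER` exists only to show that unrelated bits are preserved.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_STRUCT_PAGE (1u << 0)
#define FLAG_IOMEM       (1u << 1)
#define FLAG_OTHER       (1u << 2) /* hypothetical unrelated flag */

/* Recompute the placement flags after a move, leaving other bits alone. */
static unsigned int adjust_mem_flags(unsigned int flags, bool is_stolen,
				     bool cpu_maps_iomem)
{
	flags &= ~(FLAG_STRUCT_PAGE | FLAG_IOMEM);

	if (!is_stolen)
		flags |= cpu_maps_iomem ? FLAG_IOMEM : FLAG_STRUCT_PAGE;

	/* Stolen objects must end up with neither flag set. */
	assert(!is_stolen || !(flags & (FLAG_STRUCT_PAGE | FLAG_IOMEM)));
	return flags;
}
```

With neither flag set, checks in the style of `i915_gem_object_has_struct_page()` and `i915_gem_object_has_iomem()` would both report false for a stolen object, which is the behaviour the commit message describes.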

[Intel-gfx] [PATCH v7 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-06-20 Thread Robert Beckett

By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping is unreliable on gen5- and on bsw via
ggtt mappings, so limit it to DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1
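
The decision made by the patched `i915_ttm_cache_level()` can be written as a small truth function. The sketch below is a userspace model: `struct dev_caps` and `default_cache_level()` are hypothetical names, and the three booleans stand in for `HAS_LLC()`, `HAS_SNOOP()` and `IS_DGFX()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_level { CACHE_NONE, CACHE_LLC };

struct dev_caps {
	bool has_llc;
	bool has_snoop;
	bool is_dgfx;
};

/* Default cache level: snooping only counts on discrete devices. */
static enum cache_level default_cache_level(const struct dev_caps *caps,
					    bool gtt_binds_lmem,
					    bool ttm_cached)
{
	bool can_snoop;

	assert(caps != NULL);
	can_snoop = caps->has_snoop && caps->is_dgfx;

	return ((caps->has_llc || can_snoop) && !gtt_binds_lmem && ttm_cached) ?
		CACHE_LLC : CACHE_NONE;
}
```

An integrated device that only snoops now falls back to the uncached default, while LLC devices and snooping discrete devices keep the cached default.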

[Intel-gfx] [PATCH v7 07/10] drm/i915: ttm move/clear logic fix

2022-06-20 Thread Robert Beckett

ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
Currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear only user allocated buffers and any
kernel buffers that specifically request it.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1
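
The clearing policy added by this patch reads as a short decision table. Below is a userspace sketch of it with hypothetical flag and enum names; `runs_move_or_clear()` models only the rewritten `if` condition in `i915_ttm_move()`, not the move itself.

```c
#include <assert.h>
#include <stdbool.h>

#define ALLOC_USER    (1u << 0) /* stand-in for I915_BO_ALLOC_USER */
#define TT_ZERO_ALLOC (1u << 0) /* stand-in for TTM_TT_FLAG_ZERO_ALLOC */

enum placement { PL_SYSTEM, PL_STOLEN, PL_LMEM };

static bool allow_clear(unsigned int obj_flags, bool has_ttm,
			unsigned int page_flags, enum placement dst)
{
	/* Page flags are meaningless without a ttm_tt. */
	assert(has_ttm || page_flags == 0);

	/* Never clear stolen. */
	if (dst == PL_STOLEN)
		return false;

	/* User buffers are always cleared. */
	if (obj_flags & ALLOC_USER)
		return true;

	/* Kernel buffers are cleared only on request. */
	return has_ttm && (page_flags & TT_ZERO_ALLOC);
}

/* Mirrors the patched condition: !clear || allow_clear(...). */
static bool runs_move_or_clear(bool clear, unsigned int obj_flags, bool has_ttm,
			       unsigned int page_flags, enum placement dst)
{
	return !clear || allow_clear(obj_flags, has_ttm, page_flags, dst);
}
```

A plain copy (`clear == false`) always proceeds; a clear proceeds only for the cases the commit message lists.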

[Intel-gfx] [PATCH v7 08/10] drm/i915: allow memory region creators to alloc and free the region

2022-06-20 Thread Robert Beckett

Add callbacks for alloc and free.
This allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c 
b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
struct intel_memory_region *mem;
int err;
 
-   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+   if (ops->alloc)
+   mem = ops->alloc();
+   else
+   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
if (mem->ops->release)
mem->ops->release(mem);
 err_free:
-   kfree(mem);
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
+   kfree(mem);
return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct 
intel_memory_region *mem)
 
GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
mutex_destroy(&mem->objects.lock);
-   if (!ret)
+   if (ret)
+   return;
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h 
b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
   resource_size_t size,
   resource_size_t page_size,
   unsigned int flags);
+   struct intel_memory_region *(*alloc)(void);
+   void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1
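
One way a region creator might use these callbacks is to embed the generic region in a larger structure and recover the container later. The userspace sketch below assumes that pattern; all names are hypothetical, and the members are called `alloc_fn` and `free_fn` here only to avoid clashing with the C library's `free`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct region;

struct region_ops {
	struct region *(*alloc_fn)(void);
	void (*free_fn)(struct region *mem);
};

struct region {
	const struct region_ops *ops;
};

/* Hypothetical creator-specific wrapper carrying extra state. */
struct stolen_region {
	unsigned long reserved_pages;
	struct region base;
};

static struct stolen_region *to_stolen(struct region *mem)
{
	return (struct stolen_region *)((char *)mem -
					offsetof(struct stolen_region, base));
}

static struct region *stolen_alloc(void)
{
	struct stolen_region *s = calloc(1, sizeof(*s));

	return s ? &s->base : NULL;
}

static void stolen_free(struct region *mem)
{
	free(to_stolen(mem));
}

/* Use the creator's allocator when provided, else a plain zeroed region. */
static struct region *region_create(const struct region_ops *ops)
{
	struct region *mem;

	assert(ops != NULL);
	mem = ops->alloc_fn ? ops->alloc_fn() : calloc(1, sizeof(*mem));
	if (!mem)
		return NULL;

	mem->ops = ops;
	return mem;
}

/* The free path must match whichever allocator created the region. */
static void region_destroy(struct region *mem)
{
	if (mem->ops->free_fn)
		mem->ops->free_fn(mem);
	else
		free(mem);
}
```

The point of pairing the two callbacks is that a region allocated as part of a larger structure cannot safely be released with a plain free of the inner pointer.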

[Intel-gfx] [PATCH v7 09/10] drm/i915/ttm: add buffer pin on alloc flag

2022-06-20 Thread Robert Beckett

For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1
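
Two things happen in this patch: a new allocation flag takes bit 7, pushing the object state bits up by one, and objects created with that flag are pinned at init and unpinned on free. A userspace sketch of both, with hypothetical names and the assumption that the allocation flags occupy bits 0..7:

```c
#include <assert.h>

#define BIT(n) (1u << (n))

#define ALLOC_GPU_ONLY BIT(6)
#define ALLOC_PINNED   BIT(7)
#define ALLOC_FLAGS    (BIT(8) - 1) /* assumption: alloc flags fill bits 0..7 */
#define BO_READONLY    BIT(8)       /* first state bit, moved up from bit 7 */

struct bo {
	unsigned int flags;
	unsigned int pin_count;
};

/* Pin at creation when requested, so failures surface at alloc time. */
static void bo_init(struct bo *bo, unsigned int alloc_flags)
{
	assert((alloc_flags & ~ALLOC_FLAGS) == 0);
	bo->flags = alloc_flags;
	bo->pin_count = 0;
	if (alloc_flags & ALLOC_PINNED)
		bo->pin_count++;
}

/* Objects pinned for their lifetime are unpinned just before release. */
static void bo_delayed_free(struct bo *bo)
{
	if (bo->flags & ALLOC_PINNED) {
		assert(bo->pin_count > 0);
		bo->pin_count--;
	}
}
```

Keeping the allocation mask and the state bits disjoint is what forces the renumbering of `I915_BO_READONLY` and the bits after it in the diff.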

[Intel-gfx] [PATCH v7 04/10] drm/i915/gem: selftest should not attempt mmap of private regions

2022-06-20 Thread Robert Beckett

During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1

[Intel-gfx] [PATCH v7 10/10] drm/i915: stolen memory use ttm backend

2022-06-20 Thread Robert Beckett

Refactor the stolen memory region to use ttm.
This necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 440 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 303 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0,

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-21 Thread Robert Beckett





On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:*   drm/i915: ttm for stolen (rev5)
*URL:*      https://patchwork.freedesktop.org/series/101396/
*State:*    failure
*Details:*  https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html

  CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5


Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.

External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html



Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_101396v5:



  IGT changes


Possible regressions

  * igt@i915_selftest@live@reset:
      o bat-adlp-4: PASS -> DMESG-FAIL

I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally on 
bat-adlp-6.


Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 is 
for now?


Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low 
pages (128KiB) of stolen from use"


could this be an indication that maybe the original issue is worse on 
adlp machines?
I have only ever seen page 135 or 136 clobbered across many runs 
via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.
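
The sizes mentioned here follow from simple page arithmetic. The sketch below assumes 4KiB pages and zero-based page numbering in the selftest report; both are assumptions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_KIB 4u /* assumption: 4KiB pages */

/* Offset in KiB at which a given page starts. */
static unsigned int page_start_kib(unsigned int page)
{
	assert(page <= UINT_MAX / PAGE_KIB);
	return page * PAGE_KIB;
}

/* KiB that must be excluded so that pages 0..last_bad_page go unused. */
static unsigned int exclusion_kib(unsigned int last_bad_page)
{
	assert(last_bad_page < UINT_MAX / PAGE_KIB);
	return (last_bad_page + 1) * PAGE_KIB;
}
```

Under those assumptions page 135 starts at 540KiB, and excluding everything through page 136 would cost 548KiB, compared with the 128KiB (32 pages) excluded by the commit referenced above.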




Suppressed

The following results come from untrusted machines, tests, or statuses.
They do not affect the overall result.

  * igt@kms_busy@basic@flip:
  o {bat-dg2-9}: NOTRUN -> DMESG-WARN




Known issues

Here are the changes found in Patchwork_101396v5 that come from known 
issues:



  IGT changes


Issues hit

  * igt@gem_huc_copy@huc-copy:
      o fi-icl-u2: NOTRUN -> SKIP (i915#2190)
  * igt@gem_lmem_swapping@random-engines:
      o fi-icl-u2: NOTRUN -> SKIP (i915#4613) +3 similar issues
  * igt@i915_pm_rpm@module-reload:
      o bat-adlp-4: PASS -> DMESG-WARN (i915#1888 / i915#3576)
  * igt@i915_selftest@live@hangcheck:
      o bat-dg1-6: PASS -> DMESG-FAIL (i915#4494 / i915#4957)
  * igt@i915_suspend@basic-s3-without-i915:
      o fi-icl-u2: NOTRUN -> SKIP (i915#5903)
  * igt@kms_busy@basic@flip:
      o bat-adlp-4: PASS -> DMESG-WARN (i915#3576)
  * igt@kms_chamelium@common-hpd-after-suspend:
      o fi-hsw-g3258: NOTRUN -> SKIP

[Intel-gfx] [PATCH v8 00/10] drm/i915: ttm for stolen

2022-06-21 Thread Robert Beckett

This series refactors i915's stolen memory region to use ttm.

v2: handle disabled stolen similar to legacy version.
relying on ttm to fail allocs works fine, but is dmesg noisy and causes 
testing
dmesg warning regressions.

v3: rebase to latest drm-tip.
fix v2 code refactor which could leave a buffer pinned.
locally passes fftl again now.

v4: - Allow memory region creators to do allocation. Allows stolen region 
to track
  its own reservations.
- Pre-reserve first page of stolen mem (add back 
WaSkipStolenMemoryFirstPage:bdw+)
- Improve commit description for "drm/i915: sanitize mem_flags for 
stolen buffers"
- replace i915_gem_object_pin_pages_unlocked() call with manual locking 
and pinning.
  this avoids ww ctx class reuse during context creation -> ring vma 
obj alloc.

v5: - detect both types of stolen as stolen buffers in
  "drm/i915: sanitize mem_flags for stolen buffers"
- in stolen_object_init limit page size to mem region minimum.
  The range allocator expects the page_size to define the
  alignment

v6: - Share first 4 patches from ttm for internal series as generic
  i915 ttm fixes
- Drop patch 4 from v5. We don't need separate object ops just
  to satisfy test interfaces. The tests have now been fixed via
  checking whether the memory region is private to decide
  whether to mmap
- Add new buffer pin alloc flag to allow creation of buffers in
  their final ttm placement instead of deferring until
  get_pages. This fixes legacy fallback paths for buffer
  allocations during stolen memory pressure.

v7: - fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

v8: - Reserve I915_GEM_STOLEN_BIAS area from stolen

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 441 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c|  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h|   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c   |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 18 files changed, 424 insertions(+), 369 deletions(-)

-- 
2.25.1

[Intel-gfx] [PATCH v8 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level

2022-06-21 Thread Robert Beckett

By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This diverges from the existing backends' code, which only considers
HAS_LLC.
Testing shows that trusting snooping is unreliable on gen5- and on bsw via
ggtt mappings, so limit this to DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 struct ttm_tt *ttm)
 {
-   return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+   bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+   return ((HAS_LLC(i915) || can_snoop) &&
!i915_ttm_gtt_binds_lmem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
-- 
2.25.1
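
For readers skimming the diff above, here is a minimal standalone sketch of
the decision after this patch. The struct and enum names are hypothetical
stand-ins for the driver's HAS_LLC()/HAS_SNOOP()/IS_DGFX() checks and cache
levels, not the i915 API itself; only the boolean logic is taken from the
hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for I915_CACHE_NONE / I915_CACHE_LLC. */
enum cache_level { CACHE_NONE, CACHE_LLC };

/* Hypothetical, flattened view of the platform checks the driver makes. */
struct platform {
	bool has_llc;	/* models HAS_LLC(i915) */
	bool has_snoop;	/* models HAS_SNOOP(i915) */
	bool is_dgfx;	/* models IS_DGFX(i915) */
};

static enum cache_level
default_cache_level(const struct platform *p, bool gtt_binds_lmem,
		    bool ttm_cached)
{
	/* Snooping is only trusted on discrete parts. */
	bool can_snoop = p->has_snoop && p->is_dgfx;

	return ((p->has_llc || can_snoop) && !gtt_binds_lmem && ttm_cached) ?
		CACHE_LLC : CACHE_NONE;
}
```

With this shape, an integrated part that only has snooping falls back to the
uncached default, which matches the "maintain previous behaviour" intent
stated in the commit message.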

[Intel-gfx] [PATCH v8 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move

2022-06-21 Thread Robert Beckett

Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 ++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct 
drm_i915_gem_object *obj,
struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
obj->cache_level = cache_level;
+   obj->ttm.cache_level_override = true;
 
if (cache_level != I915_CACHE_NONE)
obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
struct i915_gem_object_page_iter get_io_page;
struct drm_i915_gem_object *backup;
bool created:1;
+   bool cache_level_override:1;
} ttm;
 
/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_gem_object_init_memory_region(obj, mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
+   obj->ttm.cache_level_override = false;
i915_gem_object_unlock(obj);
 
return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
 
-   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-  bo->ttm);
-   i915_gem_object_set_cache_coherency(obj, cache_level);
+   if (!obj->ttm.cache_level_override) {
+   cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+  bo->resource, bo->ttm);
+   i915_gem_object_set_cache_coherency(obj, cache_level);
+   obj->ttm.cache_level_override = false;
+   }
 }
 
 /**
-- 
2.25.1
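
The interplay between the setter and the move logic in the patch above can be
modelled in isolation. This is only a sketch: struct object and both function
names are hypothetical simplifications, and the real functions do more than
shown here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's cache levels. */
enum cache_level { CACHE_NONE, CACHE_LLC };

/* Hypothetical, reduced view of the object state touched by the patch. */
struct object {
	enum cache_level cache_level;
	bool cache_level_override;
};

/* Models the setter: every caller is assumed to be an explicit override. */
static void set_cache_coherency(struct object *obj, enum cache_level level)
{
	obj->cache_level = level;
	obj->cache_level_override = true;
}

/* Models the tail of the post-move adjustment. */
static void adjust_after_move(struct object *obj, enum cache_level dflt)
{
	if (!obj->cache_level_override) {
		set_cache_coherency(obj, dflt);
		/* The move logic's own choice must not count as an override. */
		obj->cache_level_override = false;
	}
}
```

The flag reset inside the branch is needed because the move path reuses the
same setter as everyone else; without it, the first move would latch the
default as if it were an override.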

[Intel-gfx] [PATCH v8 05/10] drm/i915: instantiate ttm ranger manager for stolen memory

2022-06-21 Thread Robert Beckett

Prepare for a ttm based stolen region by using the ttm range manager
as the resource manager for the stolen region.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c  | 31 +++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
/* There's some room for optimization here... */
-   GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-  ttm_mem_type < I915_PL_LMEM0);
+   GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
if (ttm_mem_type == I915_PL_SYSTEM)
return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
  0);
+   if (ttm_mem_type == I915_PL_STOLEN)
+   return i915->mm.stolen_region;
 
return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private 
*dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct 
intel_memory_region *mem)
 
GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
   mem->type != INTEL_MEMORY_MOCK &&
-  mem->type != INTEL_MEMORY_SYSTEM);
+  mem->type != INTEL_MEMORY_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+  mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
if (mem->type == INTEL_MEMORY_SYSTEM)
return TTM_PL_SYSTEM;
 
+   if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+   mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+   return I915_PL_STOLEN;
+
type = mem->instance + TTM_PL_PRIV;
GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
int mem_type = intel_region_to_ttm_type(mem);
int ret;
 
-   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
- resource_size(&mem->region),
- mem->io_size,
- mem->min_page_size, PAGE_SIZE);
+   if (mem_type == I915_PL_STOLEN) {
+   ret = ttm_range_man_init(bdev, mem_type, false,
+resource_size(&mem->region) >> 
PAGE_SHIFT);
+   mem->is_range_manager = true;
+   } else {
+   ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+ resource_size(&mem->region),
+ mem->io_size,
+ mem->min_page_size, PAGE_SIZE);
+   }
if (ret)
return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
struct ttm_resource_manager *man = mem->region_private;
+   int mem_type = intel_region_to_ttm_type(mem);
int ret = -EBUSY;
int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
if (ret || !man)
return ret;
 
-   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
- intel_region_to_ttm_type(mem));
+   if (mem_type == I915_PL_STOLEN)
+   ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+   else
+   ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
GEM_WARN_ON(ret);
mem->region_private = NULL;
 
-- 
2.25.1

[Intel-gfx] [PATCH v8 02/10] drm/i915: limit ttm to dma32 for i965G[M]

2022-06-21 Thread Robert Beckett

i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
struct drm_device *drm = &dev_priv->drm;
+   bool use_dma32 = false;
+
+   /* i965g[m] cannot relocate objects above 4GiB. */
+   if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+   use_dma32 = true;
 
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1

[Intel-gfx] [PATCH v8 04/10] drm/i915/gem: selftest should not attempt mmap of private regions

2022-06-21 Thread Robert Beckett

During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum 
i915_mmap_type type)
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
 
+   if (obj->mm.region && obj->mm.region->private)
+   return false;
+
if (obj->ops->mmap_offset)
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1

[Intel-gfx] [PATCH v8 06/10] drm/i915: sanitize mem_flags for stolen buffers

2022-06-21 Thread Robert Beckett

Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents attempts to directly map stolen buffers.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct 
drm_i915_gem_object *obj)
 
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
-   I915_BO_FLAG_STRUCT_PAGE;
+   if (!i915_gem_object_is_stolen(obj))
+   obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? 
I915_BO_FLAG_IOMEM :
+   I915_BO_FLAG_STRUCT_PAGE;
 
if (!obj->ttm.cache_level_override) {
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1

[Intel-gfx] [PATCH v8 07/10] drm/i915: ttm move/clear logic fix

2022-06-21 Thread Robert Beckett

ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
Currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear only user allocated buffers and any
kernel buffers that specifically request it.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include 
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct 
ttm_resource *dst_mem)
+{
+   /* never clear stolen */
+   if (dst_mem->mem_type == I915_PL_STOLEN)
+   return false;
+   /*
+* we want to clear user buffers and any kernel buffers
+* that specifically request clearing.
+*/
+   if (obj->flags & I915_BO_ALLOC_USER)
+   return true;
+
+   if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+   return true;
+
+   return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
return PTR_ERR(dst_rsgt);
 
clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-   if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+   if (!clear || allow_clear(obj, ttm, dst_mem)) {
struct i915_deps deps;
 
i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);
-- 
2.25.1
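
As a reading aid for the patch above, the clear decision can be expressed as a
pure function over a few booleans. The struct and the needs_blit() helper are
hypothetical; the conditions mirror allow_clear() and the rewritten if
statement in i915_ttm_move(), with "clear" meaning the source has no populated
data to copy.

```c
#include <stdbool.h>

/* Hypothetical, flattened inputs to the decision. */
struct move_state {
	bool dst_is_stolen;	/* dst_mem->mem_type == I915_PL_STOLEN */
	bool user_alloc;	/* obj->flags & I915_BO_ALLOC_USER */
	bool has_ttm;		/* ttm != NULL */
	bool zero_alloc;	/* ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC */
};

static bool allow_clear(const struct move_state *s)
{
	/* Never clear stolen. */
	if (s->dst_is_stolen)
		return false;

	/* Clear user buffers and kernel buffers that asked for it. */
	if (s->user_alloc)
		return true;

	return s->has_ttm && s->zero_alloc;
}

/* True when the move path should issue a copy or a clear. */
static bool needs_blit(bool clear, const struct move_state *s)
{
	return !clear || allow_clear(s);
}
```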

[Intel-gfx] [PATCH v8 08/10] drm/i915: allow memory region creators to alloc and free the region

2022-06-21 Thread Robert Beckett

Add callbacks for alloc and free.
This allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c 
b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
struct intel_memory_region *mem;
int err;
 
-   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+   if (ops->alloc)
+   mem = ops->alloc();
+   else
+   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
if (mem->ops->release)
mem->ops->release(mem);
 err_free:
-   kfree(mem);
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
+   kfree(mem);
return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct 
intel_memory_region *mem)
 
GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
mutex_destroy(&mem->objects.lock);
-   if (!ret)
+   if (ret)
+   return;
+   if (mem->ops->free)
+   mem->ops->free(mem);
+   else
kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h 
b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
   resource_size_t size,
   resource_size_t page_size,
   unsigned int flags);
+   struct intel_memory_region *(*alloc)(void);
+   void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1

[Intel-gfx] [PATCH v8 09/10] drm/i915/ttm: add buffer pin on alloc flag

2022-06-21 Thread Robert Beckett

For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 25 ++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY   BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
 I915_BO_ALLOC_PM_VOLATILE | \
 I915_BO_ALLOC_PM_EARLY | \
-I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY  BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
+I915_BO_ALLOC_GPU_ONLY | \
+I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY  BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
+#define I915_BO_WAS_BOUND_BIT 11
/**
 * @mem_flags - Mutable placement-related flags
 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct 
drm_i915_gem_object *obj)
 {
GEM_BUG_ON(!obj->ttm.created);
 
+   /* stolen objects are pinned for lifetime. Unpin before putting */
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+   ttm_bo_unpin(i915_gem_to_ttm(obj));
+   ttm_bo_unreserve(i915_gem_to_ttm(obj));
+   }
+
ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
+   struct ttm_place _place;
+   struct ttm_placement _placement;
+   struct ttm_placement *placement;
int ret;
 
drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct 
intel_memory_region *mem,
 */
i915_gem_object_make_unshrinkable(obj);
 
+   if (obj->flags & I915_BO_ALLOC_PINNED) {
+   i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+  obj->base.size, obj->flags);
+   _placement.num_placement = 1;
+   _placement.placement = &_place;
+   _placement.num_busy_placement = 0;
+   _placement.busy_placement = NULL;
+   placement = &_placement;
+   } else {
+   placement = &i915_sys_placement;
+   }
/*
 * If this function fails, it will call the destructor, but
 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
 * until successful initialization.
 */
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-  bo_type, &i915_sys_placement,
+  bo_type, placement,
   page_size >> PAGE_SHIFT,
   &ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region 
*mem,
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
obj->ttm.cache_level_override = false;
+   if (obj->flags & I915_BO_ALLOC_PINNED)
+   ttm_bo_pin(i915_gem_to_ttm(obj));
i915_gem_object_unlock(obj);
 
return 0;
-- 
2.25.1

[Intel-gfx] [PATCH v8 10/10] drm/i915: stolen memory use ttm backend

2022-06-21 Thread Robert Beckett

Refactor the stolen memory region to use ttm.
This necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett 
---
 drivers/gpu/drm/i915/display/intel_fbc.c  |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 441 +++---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |   7 +-
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 304 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c 
b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; 
(__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
struct mutex lock;
unsigned int busy_bits;
 
-   struct drm_mm_node compressed_fb;
-   struct drm_mm_node compressed_llb;
+   struct ttm_resource *compressed_fb;
+   struct ttm_resource *compressed_llb;
 
enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+   u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_fb.start, U32_MAX));
+fb_offset, U32_MAX));
GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-fbc->compressed_llb.start, U32_MAX));
+llb_offset, U32_MAX));
 
intel_de_write(i915, FBC_CFB_BASE,
-  i915->dsm.start + fbc->compressed_fb.start);
+  i915->dsm.start + fb_offset);
intel_de_write(i915, FBC_LL_BASE,
-  i915->dsm.start + fbc->compressed_llb.start);
+  i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
struct drm_i915_private *i915 = fbc->i915;
+   u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), 
fbc->compressed_fb.start);
+   GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+   intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
struct drm_i915_private *i915 = fbc->i915;
u64 end = intel_fbc_stolen_end(i915);
-   int ret, limit = min_limit;
+   int limit = min_limit;
+   struct ttm_resource *res;
 
size /= limit;
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-  size <<= 1, 4096, 0, end);
-   if (ret == 0)
+   res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0,

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-27 Thread Robert Beckett

On 22/06/2022 10:05, Tvrtko Ursulin wrote:

On 21/06/2022 20:11, Robert Beckett wrote:

On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:* drm/i915: ttm for stolen (rev5)
*URL:* https://patchwork.freedesktop.org/series/101396/

*State:* failure
*Details:*
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html

CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5

Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.

External URL:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html

Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus

Possible new issues

Here are the unknown changes that may have been introduced in
Patchwork_101396v5:

IGT changes

Possible regressions

* igt@i915_selftest@live@reset:
o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html>

-> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html>

I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally
on bat-adlp-6.

Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6
is for now?

Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low
pages (128KiB) of stolen from use"

could this be an indication that maybe the original issue is worse on
adlp machines?
I have only ever seen page 135 or 136 clobbered across many runs
via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.
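
For scale, a rough sketch of the arithmetic behind that figure, assuming 4 KiB
pages and that everything from the start of stolen up to and including the
highest clobbered page would be excluded. The helper name and macro are
hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Bytes excluded if pages 0..highest_page (inclusive) are reserved. */
static uint64_t excluded_bytes(uint64_t highest_page)
{
	return (highest_page + 1) << SKETCH_PAGE_SHIFT;
}
```

Under those assumptions page 136 gives 137 pages, i.e. 548 KiB, compared with
the 128 KiB (32 pages) named in the commit title quoted above.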

Don't know but I see that on the latest version you even hit pages 165/166.

Any history of hitting this in CI without your series? If not, are there
some other changes which could explain it? Are you touching the selftest
itself?

Hexdump of the clobbered page looks quite complex. Especially
POISON_FREE. Any idea how that ends up there?

(see
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@l...@reset.html#dmesg-warnings702)

After lots of slow debugging via CI, it looks like the issue is that a ring
buffer was allocated and taking up that page during the initial crc
capture in the test, but by the time it came to check for corruption, it
had been freed from that page.

The test has a number of weaknesses:

1. The busy check is done twice, without taking into account any change
in between. I assume previously this could be relied on never to occur,
but now it can for some reason (more on that later)

2. the engine reset returns early with an error for guc submission
engines, but it is silently ignored in the test. Perhaps it should
ignore guc submission engines as it is a largely useless test for those
situations.

A quick obvious fix is to have a busy bitmask that remembers each page's
busy state initially and only check for corruption if it was busy during
both checks.
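
A minimal sketch of one reading of that idea, not the selftest's actual code:
record each page's busy state in a bitmap during the crc capture pass, then in
the verification pass skip any page whose state no longer matches what was
recorded. The page count, struct and helper names are all hypothetical, and
how "busy" is determined is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_PAGES 256	/* hypothetical size of the region in pages */
#define SKETCH_WORD_BITS 64

struct busy_map {
	uint64_t bits[SKETCH_NUM_PAGES / SKETCH_WORD_BITS];
};

/* First pass: remember whether the page was busy at capture time. */
static void busy_map_record(struct busy_map *m, size_t page, bool busy)
{
	uint64_t mask = UINT64_C(1) << (page % SKETCH_WORD_BITS);

	if (busy)
		m->bits[page / SKETCH_WORD_BITS] |= mask;
	else
		m->bits[page / SKETCH_WORD_BITS] &= ~mask;
}

static bool busy_map_test(const struct busy_map *m, size_t page)
{
	return (m->bits[page / SKETCH_WORD_BITS] >>
		(page % SKETCH_WORD_BITS)) & 1;
}

/*
 * Second pass: only compare crcs for pages whose busy state is the same
 * as it was at capture time. A page that was allocated or freed in
 * between is skipped rather than reported as clobbered.
 */
static bool should_check_page(const struct busy_map *m, size_t page,
			      bool busy_now)
{
	return busy_map_test(m, page) == busy_now;
}
```

This only hides the false positive; it does not explain why the state changed
between the two passes.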

However, the main question is why this is occurring now with my changes.
I have added more debug to check where the stolen memory is being freed,
but the first run last night didn't hit the issue for once.
I am running again now, will report back if I figure out where it is
being freed.

I am pretty sure the "corruption" (which isn't actually corruption) is
from a ring buffer.
The POISON_FREE is the only difference between the captured before and
after dumps:

[0040] 0280 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b
6b6b6b6b

with the 2nd dword being the MI_ARB_CHECK used for the spinner.
I think this is the request poisoning from i915_request_retire()

The bit I don't know yet is why a ring buffer was freed between the
initial crc capture and the corruption check. The spinner should be
active across the entire test, maintaining a ref on the context and its
ring.

hopefully my latest debug will give more answers.

Btw what is the benefit of converting stolen to start with? It's not
much of a backend since it just uses the drm range manager. So quite
thin and uneventful. Diffstats for the series also do not look like you
end up with much code reduction?

Regards,

Tvrtko

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-28 Thread Robert Beckett

On 28/06/2022 09:46, Tvrtko Ursulin wrote:

On 27/06/2022 18:08, Robert Beckett wrote:

On 22/06/2022 10:05, Tvrtko Ursulin wrote:

On 21/06/2022 20:11, Robert Beckett wrote:

On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:* drm/i915: ttm for stolen (rev5)
*URL:* https://patchwork.freedesktop.org/series/101396/
<https://patchwork.freedesktop.org/series/101396/>

*State:* failure
*Details:*
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html

CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5

Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
verified manually.

External URL:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html

Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus

Possible new issues

Here are the unknown changes that may have been introduced in
Patchwork_101396v5:

IGT changes

Possible regressions

* igt@i915_selftest@live@reset:
o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html>

-> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html>

I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally
on bat-adlp-6.

Should bat-adlp-4 be considered an unreliable machine like
bat-adlp-6 is for now?

Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude
low pages (128KiB) of stolen from use"

could this be an indication that maybe the original issue is worse
on adlp machines?
I have only ever seen page 135 or 136 clobbered across many
runs via trybot, so it looks fairly consistent.

Though excluding the use of over 540K of stolen might be too severe.

Don't know but I see that on the latest version you even hit pages
165/166.

Any history of hitting this in CI without your series? If not, are
there some other changes which could explain it? Are you touching the
selftest itself?

Hexdump of the clobbered page looks quite complex. Especially
POISON_FREE. Any idea how that ends up there?

(see
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@l...@reset.html#dmesg-warnings702)

After lots of slow debugging via CI, it looks like the issue is that a
ring buffer was allocated and taking up that page during the initial
crc capture in the test, but by the time it came to check for
corruption, it had been freed from that page.

The test has a number of weaknesses:

1. the busy check is done twice, without taking into account any
change in between. I assume previously this could be relied on never
to occur, but now it can for some reason (more on that later)

You mean the stolen page used/unused test? Probably the premise is that
the test controls the driver completely, i.e. is the sole user, and the two
checks are run at a time when nothing else could have changed the state.

With the nerfed request (as with GuC) this actually should hold. In the
generic case I am less sure, my working knowledge faded a bit, but
perhaps there was something guaranteeing the spinner couldn't have been
retired yet at the time of the second check. Would need clarifying at
least in comments.

Yes looks dodgy indeed. You will need to summon the owners of the GuC
backend to comment on this.

However even if the test should be skipped with GuC it is extremely
interesting that you are hitting this so I suspect there is a more
serious issue at play.

indeed. That's why I am keen to get to the root cause instead of just
slapping in a fix.

A quick obvious fix is to have a busy bitmask that remembers each
page's busy state initially and only check for corruption if it was
busy during both checks.
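
The bitmask idea above can be sketched as a small standalone model. This is not the i915 selftest code: every name here (page_snapshot, snapshot_capture, snapshot_count_clobbered, NUM_PAGES) is hypothetical, and the busy flags and checksums are passed in as plain arrays rather than read from stolen memory. The want_busy parameter is an assumption added so the caller picks which stable state is worth checking; passing true follows the wording above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and names; a model of the idea, not driver code. */
#define NUM_PAGES 256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((NUM_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct page_snapshot {
	unsigned long busy[BITMAP_WORDS];	/* busy state at capture time */
	uint32_t crc[NUM_PAGES];		/* checksum at capture time */
};

static void set_busy(unsigned long *map, size_t page)
{
	map[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

static bool test_busy(const unsigned long *map, size_t page)
{
	return (map[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1UL;
}

/* First pass: remember each page's busy state and checksum. */
static void snapshot_capture(struct page_snapshot *snap,
			     const bool *busy_now, const uint32_t *crc_now)
{
	size_t page;

	assert(snap && busy_now && crc_now);
	memset(snap, 0, sizeof(*snap));

	for (page = 0; page < NUM_PAGES; page++) {
		if (busy_now[page])
			set_busy(snap->busy, page);
		snap->crc[page] = crc_now[page];
	}
}

/*
 * Second pass: count pages whose checksum changed, looking only at pages
 * that were in the want_busy state during BOTH passes. A page that was
 * allocated or freed in between is skipped instead of being reported.
 */
static size_t snapshot_count_clobbered(const struct page_snapshot *snap,
				       const bool *busy_now,
				       const uint32_t *crc_now,
				       bool want_busy)
{
	size_t page, clobbered = 0;

	for (page = 0; page < NUM_PAGES; page++) {
		bool was_busy = test_busy(snap->busy, page);

		if (was_busy != want_busy || busy_now[page] != want_busy)
			continue;	/* state differs or is not of interest */

		if (snap->crc[page] != crc_now[page])
			clobbered++;
	}

	return clobbered;
}
```

With this shape, a page that held a ring buffer at capture time and was freed before the second pass is skipped, so it no longer shows up as a clobbered page.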

However, the main question is why this is occurring now with my changes.
I have added more debug to check where the stolen memory is being
freed, but the first run last night didn't hit the issue for once.
I am running again now, will report back if I figure out where it is
being freed.

I am pretty sure the "corruption" (which isn't actually corruption)

Re: [Intel-gfx] [PATCH 2/3] drm/i915/ttm: don't overwrite cache_dirty after setting coherency

2022-06-28 Thread Robert Beckett





On 14/06/2022 18:55, Matthew Auld wrote:

On Tue, 14 Jun 2022 at 02:14, Adrian Larumbe wrote:


When i915_gem_object_set_cache_level sets the GEM object's cache_dirty to
true, in the case of TTM that will sometimes be overwritten when getting
the object's pages, more specifically for shmem-placed objects for which
its ttm structure has just been populated.

This wasn't an issue so far, even though intel_dpt_create was setting the
object's cache level to 'none', regardless of the platform and memory
placement of the framebuffer. However, commit 2f0ec95ed20c ("drm/i915/ttm:
dont trample cache_level overrides during ttm move") makes sure the cache
level set by older backends soon to be managed by TTM isn't altered after
their TTM bo ttm structure is populated.

However this led to the 'obj->cache_dirty = true' set in
i915_gem_object_set_cache_level to stick around rather than being reset
inside i915_ttm_adjust_gem_after_move after calling ttm_tt_populate in
__i915_ttm_get_pages, which eventually led to a warning in DGFX platforms.

There also seems to be no need for this statement to be kept in
i915_gem_object_set_cache_level, since i915_gem_object_set_cache_coherency
is already taking care of it, and also considering whether it's a discrete
platform.

Remove statement altogether.

Signed-off-by: Adrian Larumbe 
---
  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 3e5d6057b3ef..b2c9e16bfa55 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -273,10 +273,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 return ret;

 /* Always invalidate stale cachelines */
-   if (obj->cache_level != cache_level) {
+   if (obj->cache_level != cache_level)
 i915_gem_object_set_cache_coherency(obj, cache_level);
-   obj->cache_dirty = true;


Maybe ban calling this on dgpu or have it fail silently? At the ioctl
level this should already be banned.

Ignoring dgpu, the cache_dirty handling is quite thorny on non-LLC
platforms. I'm not sure if there are other historical reasons for
having this here, but one big issue is that we are not allowed to
freely set cache_dirty = false, unless we are certain that the pages
underneath have been populated and the potential flush-on-acquire
completed. See the kernel-doc for @cache_dirty for more details.


Given commit 068b1bd09253 ("drm/i915: stop setting cache_dirty on
discrete"), with its justification that cache_dirty should not be set on
discrete as it is not needed, I think this patch should change to set


obj->cache_dirty = !IS_DGFX(to_i915(obj->base.dev));

along with the assignment in flush_write_domain()

and should be considered a fix for that patch.

It should keep the assignment for integrated, as its original purpose
still holds there.







-   }

 /* The cache-level will be applied when each vma is rebound. */
 return i915_gem_object_unbind(obj,
--
2.36.1

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-29 Thread Robert Beckett

On 28/06/2022 17:22, Robert Beckett wrote:

On 28/06/2022 09:46, Tvrtko Ursulin wrote:

On 27/06/2022 18:08, Robert Beckett wrote:

On 22/06/2022 10:05, Tvrtko Ursulin wrote:

On 21/06/2022 20:11, Robert Beckett wrote:

On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:* drm/i915: ttm for stolen (rev5)
*URL:* https://patchwork.freedesktop.org/series/101396/
<https://patchwork.freedesktop.org/series/101396/>

*State:* failure
*Details:*
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html>

CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5

Summary

*FAILURE*


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

2022-06-30 Thread Robert Beckett

On 29/06/2022 13:51, Robert Beckett wrote:

On 28/06/2022 17:22, Robert Beckett wrote:

On 28/06/2022 09:46, Tvrtko Ursulin wrote:

On 27/06/2022 18:08, Robert Beckett wrote:

On 22/06/2022 10:05, Tvrtko Ursulin wrote:

On 21/06/2022 20:11, Robert Beckett wrote:

On 21/06/2022 18:37, Patchwork wrote:

*Patch Details*
*Series:* drm/i915: ttm for stolen (rev5)
*URL:* https://patchwork.freedesktop.org/series/101396/
<https://patchwork.freedesktop.org/series/101396/>

*State:* failure
*Details:*
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html>

CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5

Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely
need to be verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to allow
them to document this new failure mode, which will reduce false
positives in CI.