TTM clears pooled pages on free and non-pooled pages on allocation,
using the CPU in both cases. This can add significant latency for
large buffer objects.

The GPU can clear pages much faster, but mostly for larger buffers,
because GPU clearing requires a GPU job submission, which can make
latency worse.

On devices with flat CCS, the Xe driver already clears CCS metadata
with a clear job submission for all buffers. This series extends that
clear job to also clear system pages using the GPU, to improve job
submission latency.

To test the series, I created a small test that submits a job after
binding buffers of various sizes.
                                                                                
Without the series:

sudo ~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark
IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.9.0-xe+ x86_64)
Using IGT_SRANDOM=1718889799 for randomisation
Opened device: /dev/dri/card0
Starting subtest: basic-store-benchmark
Starting dynamic subtest: WC
Dynamic subtest WC: SUCCESS (0.000s)
Time taken for size SZ_4K: 4882 us
Time taken for size SZ_2M: 3679 us
Time taken for size SZ_64M: 13367 us
Time taken for size SZ_128M: 21034 us
Time taken for size SZ_256M: 32940 us
Time taken for size SZ_1G: 116261 us
Starting dynamic subtest: WB
Dynamic subtest WB: SUCCESS (0.000s)
Time taken for size SZ_4K: 5417 us
Time taken for size SZ_2M: 5711 us
Time taken for size SZ_64M: 15718 us
Time taken for size SZ_128M: 26170 us
Time taken for size SZ_256M: 50529 us
Time taken for size SZ_1G: 177933 us
Subtest basic-store-benchmark: SUCCESS (0.504s)

With the series:

~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark
IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.9.0-xe+ x86_64)
Using IGT_SRANDOM=1718889593 for randomisation
Opened device: /dev/dri/card0
Starting subtest: basic-store-benchmark
Starting dynamic subtest: WC
Dynamic subtest WC: SUCCESS (0.000s)
Time taken for size SZ_4K: 4479 us
Time taken for size SZ_2M: 3291 us
Time taken for size SZ_64M: 6595 us
Time taken for size SZ_128M: 9069 us
Time taken for size SZ_256M: 12681 us
Time taken for size SZ_1G: 41806 us
Starting dynamic subtest: WB
Dynamic subtest WB: SUCCESS (0.000s)
Time taken for size SZ_4K: 3317 us
Time taken for size SZ_2M: 6458 us
Time taken for size SZ_64M: 12802 us
Time taken for size SZ_128M: 19579 us
Time taken for size SZ_256M: 38768 us
Time taken for size SZ_1G: 143250 us
Subtest basic-store-benchmark: SUCCESS (0.328s)
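
For a quick side-by-side reading of the two runs, the per-size ratio
(time without the series divided by time with it) can be computed with
a short script. This is only an illustrative sketch and not part of
the series: the timings are copied from the logs in this mail, while
the dictionary layout and the helper name speedups() are made up for
the example.

```python
# Timings in microseconds, copied from the two benchmark runs in this mail.
# The data layout and the helper name below are illustrative assumptions.
without_series = {
    "WC": {"SZ_4K": 4882, "SZ_2M": 3679, "SZ_64M": 13367,
           "SZ_128M": 21034, "SZ_256M": 32940, "SZ_1G": 116261},
    "WB": {"SZ_4K": 5417, "SZ_2M": 5711, "SZ_64M": 15718,
           "SZ_128M": 26170, "SZ_256M": 50529, "SZ_1G": 177933},
}
with_series = {
    "WC": {"SZ_4K": 4479, "SZ_2M": 3291, "SZ_64M": 6595,
           "SZ_128M": 9069, "SZ_256M": 12681, "SZ_1G": 41806},
    "WB": {"SZ_4K": 3317, "SZ_2M": 6458, "SZ_64M": 12802,
           "SZ_128M": 19579, "SZ_256M": 38768, "SZ_1G": 143250},
}


def speedups(before, after):
    """Return before/after ratios per caching mode and buffer size."""
    return {
        mode: {size: before[mode][size] / after[mode][size]
               for size in before[mode]}
        for mode in before
    }


ratios = speedups(without_series, with_series)
for mode, sizes in ratios.items():
    for size, ratio in sizes.items():
        print(f"{mode} {size}: {ratio:.2f}x")
```

A ratio above 1 means the run with the series was faster for that
size, and a ratio below 1 means it was slower. Each figure comes from
a single run, so small differences may be noise.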

Cc: Christian Koenig <christian.koe...@amd.com>
Cc: "Thomas Hellström" <thomas.hellst...@linux.intel.com>
Cc: Matthew Auld <matthew.a...@intel.com>

Nirmoy Das (2):
  drm/ttm/pool: Introduce a way to skip clear on free
  drm/xe/lnl: Offload system clear page activity to GPU

 drivers/gpu/drm/ttm/ttm_device.c     | 42 +++++++++++++++++++++---
 drivers/gpu/drm/ttm/ttm_pool.c       | 49 +++++++++++++++++++++-------
 drivers/gpu/drm/xe/xe_bo.c           |  4 +++
 drivers/gpu/drm/xe/xe_device.c       | 38 ++++++++++++++++-----
 drivers/gpu/drm/xe/xe_device_types.h |  2 ++
 drivers/gpu/drm/xe/xe_migrate.c      |  6 ++--
 include/drm/ttm/ttm_device.h         |  8 +++++
 include/drm/ttm/ttm_pool.h           | 11 +++++++
 8 files changed, 133 insertions(+), 27 deletions(-)

-- 
2.42.0
