[Bug 106175] amdgpu.dc=1 shows performance issues with Xorg compositors when moving windows

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=106175

--- Comment #27 from tempel.jul...@gmail.com ---
Is this commit related to it?
https://lists.freedesktop.org/archives/amd-gfx/2018-October/027726.html

-- 
You are receiving this mail because:
You are the assignee for the bug.___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[Bug 107928] Screen regularly turns black, reboot needed

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107928

--- Comment #7 from Matthew Vaughn  ---
I can reproduce every detail of this bug report on my machine. The only
difference is that I am never present to observe the driver deadlock directly;
it always occurs after I have left the machine idle for at least a few hours.

Both tests dwagner proposed yielded negative results.

I am attaching dmesg logs from the most recent instance of the problem.

Please advise. I run Gentoo, and am able to easily introduce patches into any
part of the system for testing.



[Bug 107928] Screen regularly turns black, reboot needed

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107928

--- Comment #8 from Matthew Vaughn  ---
Created attachment 142018
  --> https://bugs.freedesktop.org/attachment.cgi?id=142018&action=edit
Trimmed dmesg logs



[Bug 108356] AMD DC: Mullins APU: Possible race condition between vblank interrupt and atomic pageflip

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108356

           Bug ID: 108356
          Summary: AMD DC: Mullins APU: Possible race condition between
                   vblank interrupt and atomic pageflip
          Product: DRI
          Version: DRI git
         Hardware: Other
               OS: All
           Status: NEW
         Severity: normal
         Priority: medium
        Component: DRM/AMDgpu
         Assignee: dri-devel@lists.freedesktop.org
         Reporter: issor.or...@gmail.com

Created attachment 142019
  --> https://bugs.freedesktop.org/attachment.cgi?id=142019&action=edit
Screen without issue

Hi,

While testing AMD DC on a Mullins APU (Acer ES1-521), a visual problem was
observed on the HDMI output to an LCD monitor.

From a visual point of view, a trapezoidal shape appears at the top of the
screen from time to time, lasting a fraction of a second.

Stack: drm_hwcomposer + gbm_gralloc with AMD DC
Kernel: all kernels from 4.16 to 4.19-rc7 are affected

I would like to understand which IRQs/signals are involved in screen scanout
and how to trace/profile the problem in Android.
NOTE: even with the HWC disabled by forcing GPU compositing, the problem
still happens.

The problem does not happen at all with Bonaire (HD 7790) or Polaris (RX 560).
A visual representation of the rapid glitches is in the attachments.

Thanks for any help

Mauro
android-x86 team



[Bug 108356] AMD DC: Mullins APU: Possible race condition between vblank interrupt and atomic pageflip

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108356

--- Comment #1 from Mauro Rossi  ---
Created attachment 142020
  --> https://bugs.freedesktop.org/attachment.cgi?id=142020&action=edit
Screen slightly affected



[Bug 108356] AMD DC: Mullins APU: Possible race condition between vblank interrupt and atomic pageflip

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108356

--- Comment #2 from Mauro Rossi  ---
Created attachment 142021
  --> https://bugs.freedesktop.org/attachment.cgi?id=142021&action=edit
Screen affected up to one third of screen



[Bug 107928] Screen regularly turns black, reboot needed

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107928

Matthew Vaughn  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #142018|0                           |1
        is obsolete|                            |

--- Comment #9 from Matthew Vaughn  ---
Created attachment 142022
  --> https://bugs.freedesktop.org/attachment.cgi?id=142022&action=edit
Full dmesg logs



Re: Possible lock inversion in ttm_bo_vm_access

2018-10-14 Thread Koenig, Christian
Hi Thomas,

> that the access() handler took a shortcut when the new locking order 
> was  established
There is no new locking order; the access handler is just for debugging 
and ignores the correct locking order between mmap_sem and bo_reserve.

That this is throwing a lockdep warning is perfectly possible. We should 
probably move that to a trylock instead.

> bo_reserve()
> copy_to_user() / copy_from_user()
> bo_unreserve() 
That one is illegal for a completely different reason.

The address accessed by copy_to_user()/copy_from_user() could itself be 
backed by a BO, so resolving the resulting fault could end up locking a 
BO twice.
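One way to avoid the bo_reserve()/copy_*_user()/bo_unreserve() sequence is to stage all user data before reserving anything. The sketch below models that in userspace; `model_copy_from_user()` is a hypothetical stand-in (plain memcpy here) for copy_from_user(), the reservation is again a pthread mutex, and it assumes the user data fits in a small fixed-size staging buffer.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BO_DWORDS 16

/* Hypothetical userspace model of a BO: a reservation lock plus contents. */
struct model_bo {
    pthread_mutex_t resv;            /* stands in for the BO reservation */
    uint32_t dw[MODEL_BO_DWORDS];
};

/* Hypothetical stand-in for copy_from_user(). In the kernel this may fault,
 * and servicing the fault may take mmap_sem and the reservation of a BO
 * that backs the user address. Here it is a plain memcpy. */
static int model_copy_from_user(void *dst, const void *user_src, size_t len)
{
    memcpy(dst, user_src, len);
    return 0;
}

/* Copy-then-reserve: every user access happens before any reservation is
 * taken; only kernel (staged) memory is touched while the BO is reserved. */
static int model_apply_patches(struct model_bo *bo, const uint32_t *user_vals,
                               size_t n)
{
    uint32_t staged[MODEL_BO_DWORDS];

    assert(bo != NULL);
    if (n > MODEL_BO_DWORDS)
        return -EINVAL;

    /* Step 1: may fault, but no reservation is held yet, so a fault on a
     * BO-backed address cannot try to reserve a BO this path already holds. */
    if (model_copy_from_user(staged, user_vals, n * sizeof(staged[0])) != 0)
        return -EFAULT;

    /* Step 2: bo_reserve(), work on staged data only, bo_unreserve(). */
    pthread_mutex_lock(&bo->resv);
    memcpy(bo->dw, staged, n * sizeof(staged[0]));
    pthread_mutex_unlock(&bo->resv);
    return 0;
}
```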

Adding a might_lock() to the beginning of ttm_bo_vm_fault as you 
suggested doesn't work either, because at this point the mmap_sem is 
still locked.

So lockdep would complain about the incorrect bo_reserve and mmap_sem order.

Christian.

Am 13.10.2018 um 21:04 schrieb Thomas Hellstrom:
> Hi, Christian,
>
> On 10/13/2018 07:36 PM, Christian König wrote:
>> Hi Thomas,
>>
>>> bo_reserve()
>>> copy_to_user() / copy_from_user()
>>> bo_unreserve() 
>>
>> That pattern is illegal for a number of reasons, and the mmap_sem is 
>> only one of them.
>>
>> So the locking order must always be mmap_sem->bo_reservation. See the 
>> userptr implementation in amdgpu as well.
>>
>> Christian.
>
> I'm not arguing against that. Since vmwgfx doesn't use that pattern, 
> the locking order doesn't really matter to me; fixing the locking order 
> to mmap_sem->bo_reserve would even make it possible to make the TTM 
> fault() handler better behaved.
>
> My concern is, since the _opposite_ locking order is (admittedly 
> vaguely) documented in the fault handler, that the access() handler 
> took a shortcut when the new locking order was established possibly 
> without auditing of the other TTM drivers for locking inversion: For 
> example it looks from a quick glance like 
> nouveau_gem_pushbuf_reloc_apply() calls copy_from_user() with BOs 
> reserved (which IIRC was the typical use-case at the time this was 
> last lifted). And lockdep won't trip unless the access() callback is 
> actually called.
>
> My point is if AMD wants to enforce this locking order, then IMHO the 
> other drivers need to be audited and corrected if they are assuming 
> the locking order documented in fault(). A good way to catch such 
> drivers would be to add that might_lock().
>
> Thanks,
> Thomas
>
>
>>
>> Am 12.10.2018 um 16:52 schrieb Thomas Hellstrom:
>>> Hi, Felix,
>>>
>>> It looks like there is a locking inversion in ttm_bo_vm_access() 
>>> where we take a sleeping bo_reserve() while holding mmap_sem.
>>>
>>> Previously we've been assuming the other way around or at least 
>>> undefined, allowing drivers to do
>>>
>>> bo_reserve()
>>> copy_to_user() / copy_from_user()
>>> bo_unreserve()
>>>
>>> I'm not sure the latter pattern is used in any drivers, though, and 
>>> I guess there are ways around it. So it might make sense to fix the 
>>> locking order at this point. In that case, perhaps one should add a
>>>
>>> might_lock(&bo->resv->lock.base);
>>>
>>> at the start of the TTM fault handler to trip lockdep on locking 
>>> order violations in situations where the access() callback isn't 
>>> commonly used...
>>>
>>> /Thomas
>>>
>
>



[Bug 107266] Radeon Pro Duo (Polaris) - ring sdma0 timeout

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107266

--- Comment #7 from robert  ---
All Polaris cards are experiencing ring errors on mainline kernels; it's not
just the Pro Duo Polaris.


# lspci | grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef)
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef)
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev cf)
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef)

# uname -a
Linux localhost 4.19.0-999-lowlatency #201810092201 SMP PREEMPT Wed Oct 10 02:12:06 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


# dmesg | grep amdgpu
[    8.125848] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[    8.125849] amdgpu: [powerplay] Error in phm_get_clock_info
[    8.260967] [drm] Initialized amdgpu 3.27.0 20150101 for :09:00.0 on minor 4
[   70.238071] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=597, emitted seq=599
[   70.238198] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=597, emitted seq=599

etc etc



[Bug 108359] amdgpu-pro rpm packages cyclical dependencies

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108359

           Bug ID: 108359
          Summary: amdgpu-pro rpm packages cyclical dependencies
          Product: DRI
          Version: unspecified
         Hardware: x86-64 (AMD64)
               OS: Linux (All)
           Status: NEW
         Severity: normal
         Priority: medium
        Component: DRM/AMDgpu-pro
         Assignee: dri-devel@lists.freedesktop.org
         Reporter: ilmost...@gmail.com

Using the latest amdgpu-pro packages from
https://www.amd.com/en/support/kb/release-notes/rn-prorad-lin-18-30 for RHEL7,
it seems that a number of the packages and their "*-pro" counterparts are
constantly reported as upgrades for each other.  This means that every time an
attempt is made to upgrade the system, e.g. vulkan-amdgpu is listed as an
upgrade that obsoletes vulkan-amdgpu-pro, and vice versa.
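One plausible reading, assuming each package's metadata declares an Obsoletes on the other (the yum output below shows the obsoleting relation in both directions), is that the upgrade can never settle. The toy model below uses hypothetical types and helper names; real yum/RPM dependency resolution is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified package record: one Obsoletes entry. */
struct pkg {
    const char *name;
    const char *obsoletes;   /* NULL if it obsoletes nothing */
};

/* Name of the repo package that an upgrade would install in place of
 * 'installed', or NULL if nothing in 'repo' obsoletes it. */
static const char *model_upgrade(const struct pkg *repo, size_t n,
                                 const char *installed)
{
    for (size_t i = 0; i < n; i++)
        if (repo[i].obsoletes != NULL &&
            strcmp(repo[i].obsoletes, installed) == 0)
            return repo[i].name;
    return NULL;
}

/* Number of upgrade runs that replaced something before a run found nothing
 * to do, or -1 if packages were still being replaced after 'max_runs' runs
 * (an obsoletes cycle). */
static int model_runs_until_stable(const struct pkg *repo, size_t n,
                                   const char *installed, int max_runs)
{
    assert(max_runs >= 0);
    for (int run = 0; run < max_runs; run++) {
        const char *next = model_upgrade(repo, n, installed);
        if (next == NULL)
            return run;
        installed = next;
    }
    return -1;
}
```

With a one-directional Obsoletes the model settles after a single replacement; with the mutual pair it flips between the two names on every run.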



# yum upgrade
Loaded plugins: aliases, changelog, copr, langpacks, priorities, product-id,
protectbase, ps, search-disabled-repos, subscription-manager, versionlock
0 packages excluded due to repository protections
Resolving Dependencies
--> Running transaction check
---> Package vulkan-amdgpu.x86_64 0:18.30-641594.el7 will be obsoleting
---> Package vulkan-amdgpu-pro.x86_64 0:18.30-641594.el7 will be obsoleted
--> Finished Dependency Resolution

Dependencies Resolved

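================================================================================
 Package              Arch        Version                 Repository        Size
================================================================================
Installing:
 vulkan-amdgpu        x86_64      18.30-641594.el7        amdgpu-pro-local  10 M
     replacing  vulkan-amdgpu-pro.x86_64 18.30-641594.el7

Transaction Summary
================================================================================
Install  1 Package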

Total download size: 10 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : vulkan-amdgpu-18.30-641594.el7.x86_64                        1/2
  Erasing    : vulkan-amdgpu-pro-18.30-641594.el7.x86_64                    2/2
  Verifying  : vulkan-amdgpu-18.30-641594.el7.x86_64                        1/2
  Verifying  : vulkan-amdgpu-pro-18.30-641594.el7.x86_64                    2/2

Installed:
  vulkan-amdgpu.x86_64 0:18.30-641594.el7   

Replaced:
  vulkan-amdgpu-pro.x86_64 0:18.30-641594.el7   

Complete!
[19:24][20181014-1]# yum upgrade
Loaded plugins: aliases, changelog, copr, langpacks, priorities, product-id,
protectbase, ps, search-disabled-repos, subscription-manager, versionlock
0 packages excluded due to repository protections
Resolving Dependencies
--> Running transaction check
---> Package vulkan-amdgpu.x86_64 0:18.30-641594.el7 will be obsoleted
---> Package vulkan-amdgpu-pro.x86_64 0:18.30-641594.el7 will be obsoleting
--> Finished Dependency

Re: Gemini Lake graphics corruption at top of screen

2018-10-14 Thread Daniel Drake
Hi,

On Mon, Oct 8, 2018 at 1:48 PM Daniel Drake  wrote:
> I recently filed a bug report regarding graphics corruption seen on
> Gemini Lake platforms:
> https://bugs.freedesktop.org/show_bug.cgi?id=108085
>
> This has been reproduced on multiple distros on products from at least
> 4 vendors. It seems to apply to every GeminiLake product that we have
> seen.
>
> The graphics corruption is quite prominent when using these platforms
> for daily use.

Ping... how can we help diagnose this issue?

If you provide a shipping address we can send a sample to Intel, with
the issue easily reproducible and ready-to-go.

Thanks
Daniel


[PATCH] drm/amdgpu: correct SPDX identifier in amdgpu_trace_points.c

2018-10-14 Thread Jonathan Gray
Commit b24413180f5600bcb3bb70fbed5cf186b60864bd
'License cleanup: add SPDX GPL-2.0 license identifier to files with no license'
incorrectly added "SPDX-License-Identifier: GPL-2.0" to a file with MIT
license text.  Change the SPDX identifier to match the license text.

Signed-off-by: Jonathan Gray 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace_points.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace_points.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace_points.c
index b160b958e5fe..f212402570a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace_points.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace_points.c
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+// SPDX-License-Identifier: MIT
 /* Copyright Red Hat Inc 2010.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
-- 
2.19.1
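The mismatch this patch fixes (a GPL-2.0 SPDX tag above MIT permission text) can be spotted mechanically. A hypothetical heuristic helper, not an existing kernel or SPDX tool, might look like the following; it only matches the exact identifier GPL-2.0 and the opening words of the MIT grant visible in the diff context above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical heuristic helper, not an existing kernel or SPDX tool:
 * true if the first SPDX tag in 'src' is exactly "GPL-2.0" while the text
 * also contains the opening words of the MIT permission grant. */
static bool spdx_gpl2_tag_on_mit_text(const char *src)
{
    static const char tag[] = "SPDX-License-Identifier: ";
    static const char mit_grant[] =
        "Permission is hereby granted, free of charge";
    const char *id;
    size_t idlen;

    assert(src != NULL);
    id = strstr(src, tag);
    if (id == NULL)
        return false;                     /* no SPDX tag at all */
    id += sizeof(tag) - 1;
    /* The identifier ends at a space, a comment delimiter or end of line. */
    idlen = strcspn(id, " */\r\n");
    if (idlen != strlen("GPL-2.0") || strncmp(id, "GPL-2.0", idlen) != 0)
        return false;                     /* tag names some other license */
    return strstr(src, mit_grant) != NULL;
}
```

Identifiers such as GPL-2.0+ or GPL-2.0-only are deliberately not matched, so a real audit would still need a human to read the license text.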



[PATCH] drm/radeon: change SPDX identifier to MIT

2018-10-14 Thread Jonathan Gray
Commit b24413180f5600bcb3bb70fbed5cf186b60864bd added
"SPDX-License-Identifier: GPL-2.0" to files which previously had no
license. Change this to MIT for radeon, matching the license text of the
other radeon files.

Signed-off-by: Jonathan Gray 
---
 drivers/gpu/drm/radeon/mkregtable.c          | 2 +-
 drivers/gpu/drm/radeon/r100_track.h          | 2 +-
 drivers/gpu/drm/radeon/radeon_dp_mst.c       | 2 +-
 drivers/gpu/drm/radeon/radeon_legacy_tv.c    | 2 +-
 drivers/gpu/drm/radeon/radeon_trace.h        | 2 +-
 drivers/gpu/drm/radeon/radeon_trace_points.c | 2 +-
 include/drm/drm_pciids.h                     | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/mkregtable.c b/drivers/gpu/drm/radeon/mkregtable.c
index ba704633b072..52a7246fed9e 100644
--- a/drivers/gpu/drm/radeon/mkregtable.c
+++ b/drivers/gpu/drm/radeon/mkregtable.c
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+// SPDX-License-Identifier: MIT
 /* utility to create the register check tables
  * this includes inlined list.h safe for userspace.
  *
diff --git a/drivers/gpu/drm/radeon/r100_track.h b/drivers/gpu/drm/radeon/r100_track.h
index ad16a925f8d5..57e2b09784be 100644
--- a/drivers/gpu/drm/radeon/r100_track.h
+++ b/drivers/gpu/drm/radeon/r100_track.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: MIT */
 
 #define R100_TRACK_MAX_TEXTURE 3
 #define R200_TRACK_MAX_TEXTURE 6
diff --git a/drivers/gpu/drm/radeon/radeon_dp_mst.c b/drivers/gpu/drm/radeon/radeon_dp_mst.c
index f920be236cc9..84b3ad2172a3 100644
--- a/drivers/gpu/drm/radeon/radeon_dp_mst.c
+++ b/drivers/gpu/drm/radeon/radeon_dp_mst.c
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+// SPDX-License-Identifier: MIT
 
 #include 
 #include 
diff --git a/drivers/gpu/drm/radeon/radeon_legacy_tv.c b/drivers/gpu/drm/radeon/radeon_legacy_tv.c
index 611cf934b211..4278272e3191 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_tv.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_tv.c
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+// SPDX-License-Identifier: MIT
 #include 
 #include 
 #include "radeon.h"
diff --git a/drivers/gpu/drm/radeon/radeon_trace.h b/drivers/gpu/drm/radeon/radeon_trace.h
index bc26efd1793e..0d84b8aafab3 100644
--- a/drivers/gpu/drm/radeon/radeon_trace.h
+++ b/drivers/gpu/drm/radeon/radeon_trace.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: MIT */
 #if !defined(_RADEON_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _RADEON_TRACE_H_
 
diff --git a/drivers/gpu/drm/radeon/radeon_trace_points.c b/drivers/gpu/drm/radeon/radeon_trace_points.c
index 66b3d5084662..65e92302f974 100644
--- a/drivers/gpu/drm/radeon/radeon_trace_points.c
+++ b/drivers/gpu/drm/radeon/radeon_trace_points.c
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+// SPDX-License-Identifier: MIT
 /* Copyright Red Hat Inc 2010.
  * Author : Dave Airlie 
  */
diff --git a/include/drm/drm_pciids.h b/include/drm/drm_pciids.h
index 683742826511..b7e899ce44f0 100644
--- a/include/drm/drm_pciids.h
+++ b/include/drm/drm_pciids.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: MIT */
 #define radeon_PCI_IDS \
 	{0x1002, 0x1304, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_KAVERI|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP|RADEON_IS_IGP}, \
 	{0x1002, 0x1305, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_KAVERI|RADEON_NEW_MEMMAP|RADEON_IS_IGP}, \
-- 
2.19.1



[Bug 108361] Radeon/Xorg crash during boot with Radeon R5 M230

2018-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108361

           Bug ID: 108361
          Summary: Radeon/Xorg crash during boot with Radeon R5 M230
          Product: DRI
          Version: unspecified
         Hardware: x86-64 (AMD64)
               OS: Linux (All)
           Status: NEW
         Severity: normal
         Priority: medium
        Component: DRM/Radeon
         Assignee: dri-devel@lists.freedesktop.org
         Reporter: jian-h...@endlessm.com

Created attachment 142025
  --> https://bugs.freedesktop.org/attachment.cgi?id=142025&action=edit
journal log when radeon/Xorg crash

This was found on an Acer Veriton Z4660G desktop equipped with an Intel(R)
Core(TM) i7-8700 CPU and an AMD/ATI Jet PRO Radeon R5 M230 graphics card.

01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Jet PRO [Radeon R5 M230] [1002:6665] (rev 83)
	Subsystem: PC Partner Limited / Sapphire Technology Jet PRO [Radeon R5 M230] [174b:e332]
	Flags: bus master, fast devsel, latency 0, IRQ 127
	Memory at 9000 (64-bit, prefetchable) [size=256M]
	Memory at 7c30 (64-bit, non-prefetchable) [size=256K]
	I/O ports at 4000 [size=256]
	Expansion ROM at 7c34 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 
	Capabilities: [150] Advanced Error Reporting
	Capabilities: [200] #15
	Capabilities: [270] #19
	Kernel driver in use: radeon
	Kernel modules: radeon, amdgpu

I have tested it with Linux kernel 4.19-rc7. The system (radeon module)
sometimes hits the following error during boot, after which Xorg crashes.

Oct 12 17:28:52 endless kernel: [drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
Oct 12 17:28:52 endless kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 67C0 (len 254, WS 0, PS 4) @ 0x67CE
Oct 12 17:28:52 endless kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 612C (len 78, WS 12, PS 8) @ 0x6165
Oct 12 17:28:52 endless kernel: iwlwifi :00:14.3: HCMD_ACTIVE already clear for command SCAN_REQ_UMAC
Oct 12 17:28:52 endless kernel: [drm] PCIE gen 3 link speeds already enabled
Oct 12 17:28:54 endless kernel: radeon :01:00.0: Wait for MC idle timedout !
Oct 12 17:28:54 endless kernel: radeon :01:00.0: Wait for MC idle timedout !
Oct 12 17:28:54 endless eos-metrics-ins[606]: Failed to start GeoClue2 client: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying.
Oct 12 17:28:54 endless kernel: [drm] PCIE GART of 2048M enabled (table at 0x0004).
Oct 12 17:28:54 endless kernel: radeon :01:00.0: WB enabled
Oct 12 17:28:54 endless kernel: radeon :01:00.0: fence driver on ring 0 use gpu addr 0x8c00 and cpu addr 0x7d0c53c5
Oct 12 17:28:54 endless kernel: radeon :01:00.0: fence driver on ring 1 use gpu addr 0x8c04 and cpu addr 0x6e1c12be
Oct 12 17:28:54 endless kernel: radeon :01:00.0: fence driver on ring 2 use gpu addr 0x8c08 and cpu addr 0xa603d5e9
Oct 12 17:28:54 endless kernel: radeon :01:00.0: fence driver on ring 3 use gpu addr 0x8c0c and cpu addr 0x39a9e421
Oct 12 17:28:54 endless kernel: radeon :01:00.0: fence driver on ring 4 use gpu addr 0x8c10 and cpu addr 0xba920de2
Oct 12 17:28:55 endless kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
Oct 12 17:28:55 endless kernel: [drm:si_resume [radeon]] *ERROR* si startup failed on resume

I also tried disabling radeon's runtime power management by passing
"radeon.runpm=0" on the kernel command line. With that, the system runs
stably on this model.
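To confirm that the workaround actually reached the kernel, one can inspect the running command line (for example the contents of /proc/cmdline). A minimal parser sketch follows; `cmdline_radeon_runpm()` is a hypothetical helper, and it assumes simple space-separated parameters without quoting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of the last "radeon.runpm=" parameter in a
 * kernel command line such as the contents of /proc/cmdline, or 'fallback'
 * if the parameter is absent. Assumes space-separated parameters without
 * quoting. */
static long cmdline_radeon_runpm(const char *cmdline, long fallback)
{
    static const char key[] = "radeon.runpm=";
    const size_t keylen = sizeof(key) - 1;
    long value = fallback;
    const char *p;

    assert(cmdline != NULL);
    p = cmdline;
    while (*p != '\0') {
        size_t toklen;

        while (*p == ' ' || *p == '\n')
            p++;                          /* skip separators */
        toklen = strcspn(p, " \n");
        if (toklen > keylen && strncmp(p, key, keylen) == 0)
            value = strtol(p + keylen, NULL, 10); /* sketch keeps the last one */
        p += toklen;
    }
    return value;
}
```

Reading /proc/cmdline into a buffer and passing it in should yield 0 when "radeon.runpm=0" was given at boot.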



Re: Possible lock inversion in ttm_bo_vm_access

2018-10-14 Thread Daniel Vetter
On Sun, Oct 14, 2018 at 8:20 PM Koenig, Christian
 wrote:
>
> Hi Thomas,
>
> > that the access() handler took a shortcut when the new locking order
> > was  established
> There is no new locking order; the access handler is just for debugging
> and ignores the correct locking order between mmap_sem and bo_reserve.
>
> That this is throwing a lockdep warning is perfectly possible. We should
> probably move that to a trylock instead.
>
> > bo_reserve()
> > copy_to_user() / copy_from_user()
> > bo_unreserve()
> That one is illegal for a completely different reason.
>
> The address accessed by copy_to_user()/copy_from_user() could be a BO
> itself, so to resolve this we could end up locking a BO twice.
>
> Adding a might_lock() to the beginning of ttm_bo_vm_fault as you
> suggested doesn't work either, because at this point the mmap_sem is
> still locked.
>
> So lockdep would complain about the incorrect bo_reserve and mmap_sem order.

I think Thomas' point is the one below:

> Christian.
>
> Am 13.10.2018 um 21:04 schrieb Thomas Hellstrom:
> > Hi, Christian,
> >
> > On 10/13/2018 07:36 PM, Christian König wrote:
> >> Hi Thomas,
> >>
> >>> bo_reserve()
> >>> copy_to_user() / copy_from_user()
> >>> bo_unreserve()
> >>
> >> That pattern is illegal for a number of reasons, and the mmap_sem is
> >> only one of them.
> >>
> >> So the locking order must always be mmap_sem->bo_reservation. See the
> >> userptr implementation in amdgpu as well.
> >>
> >> Christian.
> >
> > I'm not arguing against that. Since vmwgfx doesn't use that pattern,
> > the locking order doesn't really matter to me; fixing the locking order
> > to mmap_sem->bo_reserve would even make it possible to make the TTM
> > fault() handler better behaved.
> >
> > My concern is, since the _opposite_ locking order is (admittedly
> > vaguely) documented in the fault handler, that the access() handler
> > took a shortcut when the new locking order was established possibly
> > without auditing of the other TTM drivers for locking inversion: For
> > example it looks from a quick glance like
> > nouveau_gem_pushbuf_reloc_apply() calls copy_from_user() with BOs
> > reserved (which IIRC was the typical use-case at the time this was
> > last lifted). And lockdep won't trip unless the access() callback is
> > actually called.
> >
> > My point is if AMD wants to enforce this locking order, then IMHO the
> > other drivers need to be audited and corrected if they are assuming
> > the locking order documented in fault(). A good way to catch such
> > drivers would be to add that might_lock().

^^ This one here. There's a bunch of drivers which trylock in the
fault handler, so that they _can_ do the

bo_reserve()
copy*user()
bo_unreserve()

pattern. Yes, the trylock will just loop forever if copy*user() hits a
BO that is itself already in the CS. IIRC I complained about this years
back. Now amdgpu has switched over (as i915 did years earlier), because
it's the only thing that reliably works even when facing evil userspace,
but there's still radeon and nouveau. I think Thomas argues that we
should fix those, and I agree.

Once those are fixed I also think that a might_lock in the fault
handler should not blow up anymore. If it does, you have an inversion
still somewhere.
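Why a might_lock() in the fault handler would flag a remaining inversion can be seen in a deliberately tiny model of lock-order tracking. This is not lockdep's implementation: it knows only two hypothetical lock classes (standing in for mmap_sem and a BO reservation), runs single-threaded, records which class was taken while the other was held, and counts an inversion when the opposite order has been seen before.

```c
#include <assert.h>
#include <stdbool.h>

/* Two hypothetical lock classes standing in for mmap_sem and a BO
 * reservation. Single-threaded model; not lockdep's real algorithm. */
enum lock_class { LC_MMAP_SEM, LC_BO_RESV, LC_COUNT };

static bool order_seen[LC_COUNT][LC_COUNT]; /* [a][b]: b acquired while a held */
static bool held[LC_COUNT];
static int inversions;

/* Check and record the order of a (possible) acquisition of 'cls' against
 * everything currently held, without marking 'cls' as held. This mirrors
 * the idea of might_lock(): validate ordering even if the lock is not taken. */
static void model_might_lock(enum lock_class cls)
{
    assert(cls < LC_COUNT);
    for (int h = 0; h < LC_COUNT; h++) {
        if (!held[h] || h == (int)cls)
            continue;
        if (order_seen[cls][h])
            inversions++;            /* opposite order seen earlier: ABBA */
        order_seen[h][cls] = true;
    }
}

static void model_lock(enum lock_class cls)
{
    model_might_lock(cls);
    held[cls] = true;
}

static void model_unlock(enum lock_class cls)
{
    held[cls] = false;
}
```

In this model the fault-style path (mmap_sem, then reservation) alone leaves `inversions` at 0; a later reserve-then-copy path that may fault raises it to 1, which is the kind of report that should disappear once no driver faults under a reservation.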

Aside: I think it'd be good to document this as part of struct
reservation_object, preferably with lockdep annotations, to make sure
no one gets this wrong, since we need _every_ driver to obey this;
otherwise the right buffer-sharing combination is all it takes to deadlock.

Cheers, Daniel

> >
> > Thanks,
> > Thomas
> >
> >
> >>
> >> Am 12.10.2018 um 16:52 schrieb Thomas Hellstrom:
> >>> Hi, Felix,
> >>>
> >>> It looks like there is a locking inversion in ttm_bo_vm_access()
> >>> where we take a sleeping bo_reserve() while holding mmap_sem.
> >>>
> >>> Previously we've been assuming the other way around or at least
> >>> undefined, allowing drivers to do
> >>>
> >>> bo_reserve()
> >>> copy_to_user() / copy_from_user()
> >>> bo_unreserve()
> >>>
> >>> I'm not sure the latter pattern is used in any drivers, though, and
> >>> I guess there are ways around it. So it might make sense to fix the
> >>> locking order at this point. In that case, perhaps one should add a
> >>>
> >>> might_lock(&bo->resv->lock.base);
> >>>
> >>> at the start of the TTM fault handler to trip lockdep on locking
> >>> order violations in situations where the access() callback isn't
> >>> commonly used...
> >>>
> >>> /Thomas
> >>>
> >
> >
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch