On Fri, 2012-06-29 at 14:07 -0400, Jerome Glisse wrote:
> On Fri, Jun 29, 2012 at 12:14 PM, Michel Dänzer <michel at daenzer.net> wrote:
> > On Fri, 2012-06-29 at 11:28 -0400, Jerome Glisse wrote:
> >> On Fri, Jun 29, 2012 at 11:23 AM, Alex Deucher <alexdeucher at gmail.com>
> >> wrote:
> >> > On Fri, Jun 29, 2012 at 10:49 AM, Michel Dänzer <michel at daenzer.net>
> >> > wrote:
> >> >> On Thu, 2012-06-28 at 17:53 -0400, alexdeucher at gmail.com wrote:
> >> >>> From: Alex Deucher <alexander.deucher at amd.com>
> >> >>>
> >> >>> Cayman and trinity allow for variable sized VM page
> >> >>> tables, but SI requires that all page tables be the
> >> >>> same size. The current code assumes variably sized
> >> >>> VM page tables so SI may end up with part of each page
> >> >>> table overlapping with other memory which could end
> >> >>> up being interpreted by the VM hw as garbage.
> >> >>>
> >> >>> Change the code to better accommodate SI. Allocate enough
> >> >>> space for at least 2 full page tables and always set
> >> >>> last_pfn to max_pfn on SI so each VM is backed by a full
> >> >>> page table. This limits us to only 2 VMs active at any
> >> >>> given time on SI. This will be rectified and the code can
> >> >>> be reunified once we move to two level page tables.
> >> >>>
> >> >>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >> >>
> >> >> This change breaks the radeonsi driver for me. egltri_screen (the
> >> >> 'golden' test for radeonsi at least basically working) locks up the
> >> >> GPU.
> >> >>
> >> >> I don't have any details about the lockup yet, as the GPU reset attempt
> >> >> hangs the machine. Any ideas offhand what radeonsi might be doing wrong?
> >> >
> >> > Maybe trying to access an unmapped page that happened to work by
> >> > accident before and now causes a fault in the VM which halts the MC?
> >
> > Indeed, looks like it:
> >
> > radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000FF01B
> > radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0202400C
> >
> > Oddly, while I have seen similar errors before (so at
> > least some access to unmapped pages was caught even before your patch),
> > I hadn't noticed them for a while with egltri_screen...
> >
> >
> > Anyway, some more experimentation shows that it doesn't happen if I skip
> > the clear, and it still happens when doing only a clear. I'll look into
> > what might be wrong with the clears next week.
> >
> >
> >> Yeah, only thing I can think of. Can you get a dump of the various MC
> >> fault regs after the lockup?
> >
> > Did you have any particular registers in mind?
>
> I am guessing it's related to default page behavior. Prior to
> this patch you would likely have ended up writing/reading to the dummy
> page and thus not getting the segfault you deserved. With this patch
> you get the segfault you deserve ;)

Actually, the problem doesn't occur when applying the patch to current
drm-core-next. I'm guessing it was some kind of backend / tiling setup
issue that's been fixed in the meantime.

Thanks for the help anyway, guys.

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer