Update: UVD status on loongson 3a platform
Hi all,

This thread is about http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html. We recently found something interesting about UVD-based playback on the loongson 3a platform, and also found a way to fix the problem.

First, we found that the memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c causes the problem:
* If memcpy is implemented with 16-byte or 8-byte load/store instructions, it normally causes video mosaic. When we insert a memcmp after the copying code in memcpy, it reports that src and dest are not equal.
* If memcpy uses 1-byte load/store instructions only, the memcmp after the copying code reports equal.

Then we found that the following changeset fixes our problem:

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
index 2f98de2..f9599b6 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec, unsigned size)
 {
 	buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
-					     RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
+					     RADEON_DOMAIN_GTT);
 	if (!buffer->buf)
 		return false;

VRAM is mapped to an uncached area on our platform, so my question is: what could go wrong when using >4-byte load/store instructions in the UVD workflow? Any idea?

--
Regards,
Chen Jie
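As a reference point, here is a minimal user-space sketch of the check described above: copy into a mapping byte-by-byte versus in 8-byte chunks, then memcmp. The dst pointer standing in for the uncached UVD buffer mapping is an assumption for illustration, not the actual mesa code:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise copy: the variant that passed the memcmp check above. */
static void copy_bytes(volatile uint8_t *dst, const uint8_t *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}

/* 8-byte-wide copy: the variant reported to leave dst != src when dst
 * is the uncached mapping (n is assumed to be a multiple of 8 here). */
static void copy_wide(volatile uint64_t *dst, const uint64_t *src, size_t n)
{
	size_t i;

	for (i = 0; i < n / 8; i++)
		dst[i] = src[i];
}

/* Returns 1 when dst matches src after the copy, 0 otherwise. */
static int copy_ok(const volatile void *dst, const void *src, size_t n)
{
	return memcmp((const void *)dst, src, n) == 0;
}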
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
Hi,

Adding more information. We occasionally get a "GPU lockup" after resuming from suspend (on a mipsel platform with a MIPS64-compatible CPU and an RS780E; the kernel is 3.1.0-rc8, 64-bit). Related kernel messages:

/* return from STR */
[ 156.152343] radeon 0000:01:05.0: WB enabled
[ 156.187500] [drm] ring test succeeded in 0 usecs
[ 156.187500] [drm] ib test succeeded in 0 usecs
[ 156.398437] ata2: SATA link down (SStatus 0 SControl 300)
[ 156.398437] ata3: SATA link down (SStatus 0 SControl 300)
[ 156.398437] ata4: SATA link down (SStatus 0 SControl 300)
[ 156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 156.597656] ata1.00: configured for UDMA/133
[ 156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
[ 157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
[ 157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
[ 157.683593] r8169 0000:02:00.0: eth0: link up
[ 165.621093] PM: resume of devices complete after 9679.556 msecs
[ 165.628906] Restarting tasks ... done.
[ 177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
[ 177.089843] ------------[ cut here ]------------
[ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
[ 177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
[ 177.113281] Modules linked in: psmouse serio_raw
[ 177.117187] Call Trace:
[ 177.121093] [] dump_stack+0x8/0x34
[ 177.125000] [] warn_slowpath_common+0x78/0xa0
[ 177.132812] [] warn_slowpath_fmt+0x38/0x44
[ 177.136718] [] radeon_fence_wait+0x25c/0x33c
[ 177.144531] [] ttm_bo_wait+0x108/0x220
[ 177.148437] [] radeon_gem_wait_idle_ioctl+0x80/0x114
[ 177.156250] [] drm_ioctl+0x2e4/0x3fc
[ 177.160156] [] radeon_kms_compat_ioctl+0x28/0x38
[ 177.167968] [] compat_sys_ioctl+0x120/0x35c
[ 177.171875] [] handle_sys+0x118/0x138
[ 177.179687] ---[ end trace 92f63d998efe4c6d ]---
[ 177.187500] radeon 0000:01:05.0: GPU softreset
[ 177.191406] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xF57C2030
[ 177.195312] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
[ 177.203125] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20023040
[ 177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 177.367187] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 177.414062] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
[ 177.417968] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
[ 177.425781] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x2002B040
[ 177.433593] radeon 0000:01:05.0: GPU reset succeed
[ 177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 177.804687] radeon 0000:01:05.0: WB enabled
[ 178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
[ 178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
[ 178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
[ 178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[ 179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
...

What may cause a "GPU lockup"? Why didn't the reset work? Any idea?

BTW, one question: I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes need_dma32 to be set. Is that correct? (drivers/char/agp is not available on mips; could that be the reason?)

On Sep 28, 2011 at 3:23 PM, wrote:
> Hi Alex,
>
> When we do STR (S3) with an RS780E radeon card on a MIPS platform, "GPU
> reset" may happen after resume (the probability is about 5%). After that,
> X is unusable.
>
> We know there is a "ring test" at system resume time and at GPU reset time.
> Whether or not GPU reset happens, the "ring test" at system resume time always
> succeeds. But the "ring test" at GPU reset time usually fails.
>
> We use the latest kernel (3.1.0-rc8 from git) and X.org is 7.6.
>
> Any ideas?
>
> Best regards,
> Huacai Chen

Regards,
- Chen Jie
Re: [PATCH] drm/radeon/kms: set DMA mask properly on newer PCI asics
Hi Alex,

Sorry for the late reply. I tried the patch on our mipsel platform, but got the following:

[1.335937] [drm] Loading RS780 Microcode
[1.910156] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
[1.917968] radeon 0000:01:05.0: disabling GPU acceleration

The platform is equipped with 1G of memory, and the physical address layout is:
[0 - 256M] physical memory
[256M - 4352M] hole
[4352M - ] physical memory

After applying the patch, the ring buffer BO is allocated at a physical address (equal to the bus address) near 5G. I suspect the RS780 fails to access such a high bus address? (I can't validate this on x86 + RS780E, since I don't have >4G of memory at hand; could somebody please validate it?)

BTW, I found that radeon_gart_bind() calls pci_map_page(), which hooks to swiotlb_map_page on our platform; that seems to allocate and return the dma_addr_t of a new page from the pool if the original doesn't meet the dma_mask. This looks like a bug, since the BO is backed by one set of pages, but what gets mapped into the GART is another set of pages?

Regards,

--
cee1

2011/10/5 :
> From: Alex Deucher
>
> If a card wasn't PCIE, we always set the DMA mask to 32 bits.
> This only applies to the old rage128/r1xx gart block on
> early radeon asics (~r1xx-r4xx). Newer PCI and IGP cards
> can handle 40 bits just fine.
>
> Signed-off-by: Alex Deucher
> Cc: Chen Jie
> ---
> drivers/gpu/drm/radeon/radeon_device.c | 7 ++++---
> 1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index b51e157..2c3429d 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -750,14 +750,15 @@ int radeon_device_init(struct radeon_device *rdev,
>
>  	/* set DMA mask + need_dma32 flags.
>  	 * PCIE - can handle 40-bits.
> -	 * IGP - can handle 40-bits (in theory)
> +	 * IGP - can handle 40-bits
>  	 * AGP - generally dma32 is safest
> -	 * PCI - only dma32
> +	 * PCI - dma32 for legacy pci gart, 40 bits on newer asics
>  	 */
>  	rdev->need_dma32 = false;
>  	if (rdev->flags & RADEON_IS_AGP)
>  		rdev->need_dma32 = true;
> -	if (rdev->flags & RADEON_IS_PCI)
> +	if ((rdev->flags & RADEON_IS_PCI) &&
> +	    (rdev->family < CHIP_RS400))
>  		rdev->need_dma32 = true;
>
>  	dma_bits = rdev->need_dma32 ? 32 : 40;
> --
> 1.7.1.1
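A hedged diagnostic sketch of the swiotlb concern above: on a platform where bus address == physical address, a mapping that comes back different from page_to_phys() suggests the page was bounced into the swiotlb pool, so the GART would point at the bounce copy rather than the BO's real page. The map_and_check() wrapper is illustrative, not the actual radeon_gart_bind() code:

#include <linux/pci.h>
#include <linux/mm.h>

/* Map one BO page for the device and warn when a swiotlb bounce page
 * came back. Assumes phys == bus, as on the platform described above. */
static dma_addr_t map_and_check(struct pci_dev *pdev, struct page *page)
{
	dma_addr_t bus = pci_map_page(pdev, page, 0, PAGE_SIZE,
				      PCI_DMA_BIDIRECTIONAL);

	if (bus != (dma_addr_t)page_to_phys(page))
		dev_warn(&pdev->dev,
			 "page bounced: bus 0x%llx != phys 0x%llx; GART would point at the bounce copy\n",
			 (unsigned long long)bus,
			 (unsigned long long)page_to_phys(page));
	return bus;
}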
Re: Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
Hi,

On Oct 17, 2011 at 2:34 PM, wrote:
> If I start X but switch to the console, then do suspend & resume, "GPU
> reset" hardly ever happens, but there is a new problem: the IRQ of the radeon
> card is disabled. Maybe "GPU reset" has something to do with "IRQ
> disabled"?
>
> I have tried "irqpoll"; it doesn't fix this problem.
>
> [ 571.914062] irq 6: nobody cared (try booting with the "irqpoll" option)
> [ 571.914062] Call Trace:
> [ 571.914062] [] dump_stack+0x8/0x34
> [ 571.914062] [] __report_bad_irq.clone.6+0x44/0x15c
> [ 571.914062] [] note_interrupt+0x204/0x2a0
> [ 571.914062] [] handle_irq_event_percpu+0x19c/0x1f8
> [ 571.914062] [] handle_irq_event+0x68/0xa8
> [ 571.914062] [] handle_level_irq+0xd8/0x13c
> [ 571.914062] [] generic_handle_irq+0x48/0x58
> [ 571.914062] [] do_IRQ+0x18/0x24
> [ 571.914062] [] mach_irq_dispatch+0xf0/0x194
> [ 571.914062] [] ret_from_irq+0x0/0x4
> [ 571.914062]
> [ 571.914062] handlers:
> [ 571.914062] [] radeon_driver_irq_handler_kms
>
> P.S.: this uses the latest kernel from git, and irq 6 is not shared by other
> devices.

Does fence_wait depend on the GPU's interrupt? If yes, can I say the "GPU lockup" is caused by unexpected disabling of the GPU's IRQ?

> > Hi Alex, Michel
> >
> > 2011/10/5 Alex Deucher
> >
> >> 2011/10/5 Michel Dänzer :
> >> > On Don, 2011-09-29 at 17:17 +0800, Chen Jie wrote:
> >> >>
> >> >> We got occasionally "GPU lockup" after resuming from suspend(on mipsel
> >> >> platform with a mips64 compatible CPU and rs780e, the kernel is
> >> >> 3.1.0-rc8 64bit). Related kernel message:
> >> >
> >> > [...]
> >> >
> >> >> [ 177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
> >> >> [ 177.089843] ------------[ cut here ]------------
> >> >> [ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
> >> >> [ 177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
> >> >> [ 177.113281] Modules linked in: psmouse serio_raw
> >> >> [ 177.117187] Call Trace:
> >> >> [ 177.121093] [] dump_stack+0x8/0x34
> >> >> [ 177.125000] [] warn_slowpath_common+0x78/0xa0
> >> >> [ 177.132812] [] warn_slowpath_fmt+0x38/0x44
> >> >> [ 177.136718] [] radeon_fence_wait+0x25c/0x33c
> >> >> [ 177.144531] [] ttm_bo_wait+0x108/0x220
> >> >> [ 177.148437] [] radeon_gem_wait_idle_ioctl+0x80/0x114
> >> >> [ 177.156250] [] drm_ioctl+0x2e4/0x3fc
> >> >> [ 177.160156] [] radeon_kms_compat_ioctl+0x28/0x38
> >> >> [ 177.167968] [] compat_sys_ioctl+0x120/0x35c
> >> >> [ 177.171875] [] handle_sys+0x118/0x138
> >> >> [ 177.179687] ---[ end trace 92f63d998efe4c6d ]---
> >> >> [ 177.187500] radeon 0000:01:05.0: GPU softreset
> >> >> [ 177.191406] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xF57C2030
> >> >> [ 177.195312] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
> >> >> [ 177.203125] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20023040
> >> >> [ 177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
> >> >
> >> > [...]
> >> >
> >> >> What may cause a "GPU lockup"?
> >> >
> >> > Lots of things... The most common cause is an incorrect command stream
> >> > sent to the GPU by userspace or the kernel.
> >> >
> >> >> Why didn't reset work?
> >> >
> >> > Might be related to 'Wait for MC idle timedout !', but I don't know
> >> > offhand what could be up with that.
> >> >
> >> >
> >> >> BTW, one question:
> >> >> I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes
> >> >> need_dma32 to be set.
> >> >> Is it correct? (drivers/char/agp is not available on mips, could that
> >> >> be the reason?)
> >> >
> >> > Not sure, Alex?
> >>
> >> You don't need AGP for newer IGP cards (rs4xx+). It gets set by default if
> >> the card is not AGP or PCIE. That should be changed, as only the
> >> legacy r1xx PCI GART block has that limitation. I'll send a patch out
> >> shortly.
> >>
Got it, thanks for the reply.
[radeon] Question about create ring BO in VRAM
Hi all,

I tried to create/pin the ring BO in VRAM instead of GTT to debug some ring-related problems. After I did this, it rendered a black screen in X (on an x86 RS780E board), but radeon.test passed. 'ps aux' shows X sleeping uninterruptibly in radeon.

Curious why this doesn't work?

Regards,

--
Chen Jie
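For context, a sketch of the create-and-pin change being described, using the radeon_bo API of that era (the same calls appear in the all_in_vram.patch posted later in this archive); treat it as illustrative rather than the exact diff, and note that error unwinding is omitted:

/* Create the ring BO in VRAM rather than GTT and pin it there. */
static int ring_bo_in_vram(struct radeon_device *rdev, unsigned size,
			   struct radeon_bo **bo, u64 *gpu_addr)
{
	int r;

	r = radeon_bo_create(rdev, NULL, size, true,
			     RADEON_GEM_DOMAIN_VRAM, bo);	/* was ..._GTT */
	if (r)
		return r;
	r = radeon_bo_reserve(*bo, false);
	if (unlikely(r != 0))
		return r;
	r = radeon_bo_pin(*bo, RADEON_GEM_DOMAIN_VRAM, gpu_addr);
	radeon_bo_unreserve(*bo);
	return r;
}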
Re: [radeon] Question about create ring BO in VRAM
2011/11/5 Alex Deucher :
> On Fri, Nov 4, 2011 at 10:26 AM, Chen Jie wrote:
>> Hi all,
>>
>> I tried to create/pin the ring BO in VRAM instead of GTT to debug some
>> ring-related problems. After I did this, it rendered a black screen in
>> X (on an x86 RS780E board), but radeon.test passed.
>> 'ps aux' shows X sleeping uninterruptibly in radeon.
>>
>> Curious why this doesn't work?
>
> The tricky part is dealing with the HDP cache. Access to vram via the
> PCI FB BAR goes through the HDP cache; you have to make sure it's
> flushed properly before the GPU starts using the data there. To flush
> it, either read back from vram, or write 1 to the
> HDP_MEM_COHERENCY_FLUSH_CNTL register. We generally don't recommend
> putting the ring in vram.

Got it, thanks. After adding an HDP cache flush in r600_cp_commit(), it works fine.

Regards,

--
Chen Jie
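Concretely, the flush Chen Jie describes matches the r600_cp_commit() hunk of the all_in_vram.patch attached to a later message in this archive; a sketch:

void r600_cp_commit(struct radeon_device *rdev)
{
	/* Ring writes went through the CPU's VRAM mapping, i.e. the HDP
	 * cache; flush it so the CP sees the new commands. Writing 1 to
	 * HDP_MEM_COHERENCY_FLUSH_CNTL (or reading back from vram) does it. */
	if ((rdev->cp.ring_obj->tbo.mem.placement & TTM_PL_MASK_MEM) ==
	    TTM_PL_FLAG_VRAM)
		WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);
	WREG32(CP_RB_WPTR, rdev->cp.wptr);
	(void)RREG32(CP_RB_WPTR);	/* posting read */
}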
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
Hi,

Some status updates.

On Sep 29, 2011 at 5:17 PM, Chen Jie wrote:
> Hi,
> Adding more information.
> We got occasionally "GPU lockup" after resuming from suspend (on a mipsel
> platform with a mips64 compatible CPU and rs780e, the kernel is 3.1.0-rc8
> 64bit). Related kernel message:
> /* return from STR */
> [ 156.152343] radeon 0000:01:05.0: WB enabled
> [ 156.187500] [drm] ring test succeeded in 0 usecs
> [ 156.187500] [drm] ib test succeeded in 0 usecs
> [ 156.398437] ata2: SATA link down (SStatus 0 SControl 300)
> [ 156.398437] ata3: SATA link down (SStatus 0 SControl 300)
> [ 156.398437] ata4: SATA link down (SStatus 0 SControl 300)
> [ 156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 156.597656] ata1.00: configured for UDMA/133
> [ 156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
> [ 157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
> [ 157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
> [ 157.683593] r8169 0000:02:00.0: eth0: link up
> [ 165.621093] PM: resume of devices complete after 9679.556 msecs
> [ 165.628906] Restarting tasks ... done.
> [ 177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
> [ 177.089843] ------------[ cut here ]------------
> [ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
> [ 177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
> [ 177.113281] Modules linked in: psmouse serio_raw
> [ 177.117187] Call Trace:
> [ 177.121093] [] dump_stack+0x8/0x34
> [ 177.125000] [] warn_slowpath_common+0x78/0xa0
> [ 177.132812] [] warn_slowpath_fmt+0x38/0x44
> [ 177.136718] [] radeon_fence_wait+0x25c/0x33c
> [ 177.144531] [] ttm_bo_wait+0x108/0x220
> [ 177.148437] [] radeon_gem_wait_idle_ioctl+0x80/0x114
> [ 177.156250] [] drm_ioctl+0x2e4/0x3fc
> [ 177.160156] [] radeon_kms_compat_ioctl+0x28/0x38
> [ 177.167968] [] compat_sys_ioctl+0x120/0x35c
> [ 177.171875] [] handle_sys+0x118/0x138
> [ 177.179687] ---[ end trace 92f63d998efe4c6d ]---
> [ 177.187500] radeon 0000:01:05.0: GPU softreset
> [ 177.191406] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xF57C2030
> [ 177.195312] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
> [ 177.203125] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20023040
> [ 177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
> [ 177.367187] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
> [ 177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
> [ 177.414062] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
> [ 177.417968] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
> [ 177.425781] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x2002B040
> [ 177.433593] radeon 0000:01:05.0: GPU reset succeed
> [ 177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
> [ 177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
> [ 177.804687] radeon 0000:01:05.0: WB enabled
> [ 178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)

After pinning the ring in VRAM, it warned of an ib test failure. It seems something is wrong with accessing memory through GTT. We dumped the GART table just after stopping the CP and compared it with the one dumped just after r600_pcie_gart_enable, and didn't find any difference (a sketch of this check follows this message).

Any idea?

> [ 178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
> [ 178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
> [ 178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
> [ 179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
> ...

Regards,

--
Chen Jie
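A minimal sketch of the dump-and-compare check mentioned above; gart_ptr and gart_size are hypothetical stand-ins for however the driver exposes the table (e.g. the CPU mapping of the table BO):

#include <linux/slab.h>
#include <linux/string.h>

/* Snapshot the GART table right after r600_pcie_gart_enable()... */
static void *gart_snapshot(const void *gart_ptr, size_t gart_size)
{
	void *snap = kmalloc(gart_size, GFP_KERNEL);

	if (snap)
		memcpy(snap, gart_ptr, gart_size);
	return snap;
}

/* ...then diff the live table against it just after stopping the CP. */
static bool gart_changed(const void *snap, const void *gart_ptr,
			 size_t gart_size)
{
	return memcmp(snap, gart_ptr, gart_size) != 0;
}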
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
Hi,

Status update about the problem 'Occasionally "GPU lockup" after resuming from suspend.'

First, this can happen when the system returns from STR (suspend to RAM) or STD (suspend to disk, aka hibernation). When returning from STD, the initialization process is most similar to a normal boot. Standby is OK; it is similar to STR, except that standby does not shut down the power of the CPU, GPU, etc.

We've dumped and compared the registers, and found something:

CP_STAT
  normal value: 0x00000000
  value when this problem occurred: 0x802100C1 or 0x802300C1
CP_ME_CNTL
  normal value: 0x000000FF
  value when this problem occurred: always 0x200000FF in our test

Questions:

According to the manual, CP_STAT = 0x802100C1 means:
* CSF_RING_BUSY (bit 0): The ring fetcher still has command buffer data to fetch, or the PFP still has data left to process from the reorder queue.
* CSF_BUSY (bit 6): The input FIFOs have command buffers to fetch, or one or more of the fetchers are busy, or the arbiter has a request to send to the MIU.
* MIU_RDREQ_BUSY (bit 7): The read path logic inside the MIU is busy.
* MEQ_BUSY (bit 16): The PFP-to-ME queue has valid data in it.
* SURFACE_SYNC_BUSY (bit 21): The Surface Sync unit is busy.
* CP_BUSY (bit 31): Any block in the CP is busy.

What does this suggest? (A decoder for these bits is sketched after this message.)

What does it mean if bit 29 of CP_ME_CNTL is set?

BTW, how does the dummy page work in GART?

Regards,

--
Chen Jie

On Dec 7, 2011 at 10:21 PM, Alex Deucher wrote:
> 2011/12/7 :
>> When "MC timeout" happens at GPU reset, we found the 12th and 13th
>> bits of R_000E50_SRBM_STATUS are 1. From the kernel code we found these
>> two bits are like this:
>> #define G_000E50_MCDX_BUSY(x) (((x) >> 12) & 1)
>> #define G_000E50_MCDW_BUSY(x) (((x) >> 13) & 1)
>>
>> Could you please tell me what they mean? And if possible,
>
> They refer to sub-blocks in the memory controller. I don't really
> know offhand what the names mean.
>
>> I want to know the functionality of these 5 registers in detail:
>> #define R_000E60_SRBM_SOFT_RESET 0x0E60
>> #define R_000E50_SRBM_STATUS 0x0E50
>> #define R_008020_GRBM_SOFT_RESET 0x8020
>> #define R_008010_GRBM_STATUS 0x8010
>> #define R_008014_GRBM_STATUS2 0x8014
>>
>> A bit more info: If I reset the MC after resetting the CP (this is what
>> Linux-2.6.34 does, but it was removed since 2.6.35), then the "MC timeout"
>> disappears, but there is still "ring test failed".
>
> The bits are defined in r600d.h. As to the acronyms:
> BIF - Bus InterFace
> CG - clocks
> DC - Display Controller
> GRBM - Graphics block (3D engine)
> HDP - Host Data Path (CPU access to vram via the PCI BAR)
> IH, RLC - Interrupt controller
> MC - Memory controller
> ROM - ROM
> SEM - semaphore controller
>
> When you reset the MC, you will probably have to reset just about
> everything else, since most blocks depend on the MC for access to
> memory. If you do reset the MC, you should do it prior to calling
> asic_init so you make sure all the hw gets re-initialized properly.
> Additionally, you should probably reset the GRBM either via
> SRBM_SOFT_RESET or the individual sub-blocks via GRBM_SOFT_RESET.
>
> Alex
>
>>
>> Huacai Chen
>>
>>> 2011/11/8 :
>>>> And, I want to know something:
>>>> 1. Does the GPU use the MC to access GTT?
>>>
>>> Yes. All GPU clients (display, 3D, etc.) go through the MC to access
>>> memory (vram or gart).
>>>
>>>> 2. What can cause an MC timeout?
>>>
>>> Lots of things. Some GPU client still active, some GPU client hung or
>>> not properly initialized.
>>>
>>> Alex
>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Some status updates.
>>>>> On Sep 29, 2011 at 5:17 PM, Chen Jie wrote:
>>>>>> Hi,
>>>>>> Adding more information.
>>>>>> We got occasionally "GPU lockup" after resuming from suspend (on mipsel
>>>>>> platform with a mips64 compatible CPU and rs780e, the kernel is
>>>>>> 3.1.0-rc8
>>>>>> 64bit). Related kernel message:
>>>>>> /* return from STR */
>>>>>> [ 156.152343] radeon 0000:01:05.0: WB enabled
>>>>>> [ 156.187500] [drm] ring test succeeded in 0 usecs
>>>>>> [ 156.187500] [drm] ib test succeeded in 0 usecs
>>>>>> ...
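The bit positions quoted in the message above can be turned into a small decoder; this sketch is built only from the bits named there, not from the full register spec:

/* Decode the CP_STAT bits discussed above; e.g. 0x802100C1 sets exactly
 * CSF_RING_BUSY, CSF_BUSY, MIU_RDREQ_BUSY, MEQ_BUSY, SURFACE_SYNC_BUSY
 * and CP_BUSY. */
static void decode_cp_stat(u32 v)
{
	if (v & (1u << 0))
		pr_info("CSF_RING_BUSY: ring fetcher/PFP still has data\n");
	if (v & (1u << 6))
		pr_info("CSF_BUSY: input FIFOs/fetchers/arbiter busy\n");
	if (v & (1u << 7))
		pr_info("MIU_RDREQ_BUSY: MIU read path busy\n");
	if (v & (1u << 16))
		pr_info("MEQ_BUSY: PFP-to-ME queue has valid data\n");
	if (v & (1u << 21))
		pr_info("SURFACE_SYNC_BUSY: surface sync unit busy\n");
	if (v & (1u << 31))
		pr_info("CP_BUSY: some block in the CP is busy\n");
}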
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
Hi,

On Feb 15, 2012 at 11:53 PM, Jerome Glisse wrote:
> To me it looks like the CP is trying to fetch memory but the
> GPU memory controller fails to fulfill the cp request. Did you
> check the PCI configuration before & after (when things don't
> work)? My best guess is PCI bus mastering is not properly working,
> or the PCIE GPU gart table has wrong data.
>
> Maybe one needs to drop bus master and reenable bus master to
> work around some bug...

Thanks for your suggestion. We've tried the 'drop and reenable bus master' trick; unfortunately it doesn't work. The PCI configuration comparison will be done later.

Some additional information: the "GPU lockup" always seems to occur after tasks are restarted -- we inserted more ring tests, and none of them failed before tasks were restarted.

BTW, I hacked the GART table to try to simulate the problem:
1. Change the system memory address (bus address) of ring_obj to an arbitrary value, e.g. 0 or 128M.
2. Change the system memory address of a BO in radeon_test to an arbitrary value, e.g. 0.

Neither of the above led to a GPU lockup: point 1 rendered a black screen; with point 2, only the test itself failed.

Any idea?

Regards,

--
Chen Jie
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
On Feb 16, 2012 at 5:21 PM, Chen Jie wrote:
> Hi,
>
> On Feb 15, 2012 at 11:53 PM, Jerome Glisse wrote:
>> To me it looks like the CP is trying to fetch memory but the
>> GPU memory controller fails to fulfill the cp request. Did you
>> check the PCI configuration before & after (when things don't
>> work)? My best guess is PCI bus mastering is not properly working,
>> or the PCIE GPU gart table has wrong data.
>>
>> Maybe one needs to drop bus master and reenable bus master to
>> work around some bug...
> Thanks for your suggestion. We've tried the 'drop and reenable bus master'
> trick; unfortunately it doesn't work.
> The PCI configuration comparison will be done later.

Update: We've checked the first 64 bytes of PCI configuration space before & after, and didn't find any difference.

Regards,

--
Chen Jie
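A sketch of the 64-byte config-space comparison described above, using the standard pci_read_config_dword() accessor; the before[] snapshot timing (taken pre-suspend) is the assumption:

#include <linux/pci.h>

/* Compare the first 64 bytes (16 dwords) of PCI config space against a
 * snapshot taken before suspend, printing every dword that changed. */
static void pci_cfg_diff(struct pci_dev *pdev, const u32 before[16])
{
	u32 val;
	int i;

	for (i = 0; i < 16; i++) {
		pci_read_config_dword(pdev, i * 4, &val);
		if (val != before[i])
			dev_info(&pdev->dev, "cfg dword %d: %08x -> %08x\n",
				 i, before[i], val);
	}
}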
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
On Feb 17, 2012 at 12:32 AM, Jerome Glisse wrote:
> OK, let's start from the beginning. I'm convinced it's related to the GPU
> memory controller failing to fulfill some request that hits system
> memory. So in another mail you wrote:
>
>> BTW, I found radeon_gart_bind() will call pci_map_page(), it hooks
>> to swiotlb_map_page on our platform, which seems to allocate and return
>> the dma_addr_t of a new page from the pool if the original doesn't meet the
>> dma_mask. Seems a bug, since the BO is backed by one set of pages, but what
>> is mapped into the GART is another set of pages?
>
> Is this still the case? As this is obviously wrong; we fixed that
> recently. What drm code are you using? The rs780 dma mask is something
> like 40 bits iirc, so you should never have an issue on your system with
> 1G of memory, right?

Right.

> If you have an iommu, what happens on resume? Are all pages previously
> mapped with pci_map_page still valid?

The physical address is directly mapped to the bus address, so the iommu does nothing on resume; the pages should be valid?

> One good way to test gart is to go over the GPU gart table and write a
> dword using the GPU at the end of each page, something like 0xCAFEDEAD
> or some value that is unlikely to be already set. And then go over
> all the pages and check that the GPU write succeeded. Abusing the scratch
> register write-back feature is the easiest way to try that.

I'm planning to add a GART table check procedure at resume, which will go over the GPU gart table (see the sketch after this message):
1. read (back up) a dword at the end of each GPU page
2. write a mark by GPU and check it
3. restore the original dword

Hopefully this will help.
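A sketch of the three-step sweep planned above. page_cpu_addr() (the CPU mapping of the backing page) and gpu_write_dword() (the scratch-register-style GPU write Jerome suggests) are hypothetical helpers, not real driver functions:

#define GART_MARK 0xCAFEDEAD

/* For each GART entry: save the last dword of the page, let the GPU
 * write a mark there, verify it from the CPU, then restore the dword. */
static int gart_sweep(struct radeon_device *rdev, unsigned num_entries)
{
	unsigned i;

	for (i = 0; i < num_entries; i++) {
		u32 __iomem *last = page_cpu_addr(rdev, i) +
				    RADEON_GPU_PAGE_SIZE / 4 - 1;
		u32 saved = readl(last);		/* step 1: back up */

		gpu_write_dword(rdev, i, GART_MARK);	/* step 2: GPU write */
		if (readl(last) != GART_MARK) {
			DRM_ERROR("GART entry %u failed GPU write\n", i);
			return -EIO;
		}
		writel(saved, last);			/* step 3: restore */
	}
	return 0;
}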
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
>> On Feb 15, 2012 at 11:53 PM, Jerome Glisse wrote:
>>> To me it looks like the CP is trying to fetch memory but the
>>> GPU memory controller fails to fulfill the cp request. Did you
>>> check the PCI configuration before & after (when things don't
>>> work)? My best guess is PCI bus mastering is not properly working,
>>> or the PCIE GPU gart table has wrong data.
>>>
>>> Maybe one needs to drop bus master and reenable bus master to
>>> work around some bug...
>> Thanks for your suggestion. We've tried the 'drop and reenable bus master'
>> trick; unfortunately it doesn't work.
>> The PCI configuration comparison will be done later.
> Update: We've checked the first 64 bytes of PCI configuration space
> before & after, and didn't find any difference.

Hi,

Status update: We tried to analyze the GPU instruction stream at lockup today. The lockup always occurs after tasks are restarted, so the related instructions should reside in an ib, as pointed out by dmesg:

[ 2456.585937] GPU lockup (waiting for 0x0002F98B last fence id 0x0002F98A)

Printing the instructions in the related ib:

[ 2462.492187] PM4 block 10 has 115 instructions, with fence seq 2f98b
[ 2462.976562] Type3:PACKET3_SET_CONTEXT_REG ref_addr
[ 2462.984375] Type3:PACKET3_SET_CONTEXT_REG ref_addr
[ 2462.988281] Type3:PACKET3_SET_CONTEXT_REG ref_addr
[ 2462.992187] Type3:PACKET3_SET_ALU_CONST ref_addr
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
[ 2463.003906] Type3:PACKET3_SET_RESOURCE ref_addr
[ 2463.007812] Type3:PACKET3_SET_CONFIG_REG ref_addr
[ 2463.011718] Type3:PACKET3_INDEX_TYPE ref_addr
[ 2463.015625] Type3:PACKET3_NUM_INSTANCES ref_addr
[ 2463.019531] Type3:PACKET3_DRAW_INDEX_AUTO ref_addr
[ 2463.027343] Type3:PACKET3_EVENT_WRITE ref_addr
[ 2463.031250] Type3:PACKET3_SET_CONFIG_REG ref_addr
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680
[ 2463.039062] Type3:PACKET3_SET_CONTEXT_REG ref_addr
[ 2463.046875] Type3:PACKET3_SET_CONTEXT_REG ref_addr
[ 2463.050781] Type3:PACKET3_SET_CONTEXT_REG ref_addr
[ 2463.054687] Type3:PACKET3_SET_BOOL_CONST ref_addr
[ 2463.062500] Type3:PACKET3_SURFACE_SYNC ref_addr 10668e

CP_COHER_BASE was 0x0018C880, so the instruction that caused the lockup should be between:

[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
...
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680

Here, only SURFACE_SYNC, SET_RESOURCE and EVENT_WRITE access GPU memory. We guess it may be SURFACE_SYNC?

BTW, when the lockup happens, if we place the CP ring in vram, ring_test passes but ib_test fails -- which suggests the ME fails to feed the CP during the lockup? Might an earlier SURFACE_SYNC block the MC?

P.S. We hacked to place the CP ring, ib and ih in vram and disabled wb (radeon_no_wb=1) in today's debugging.

Any idea?

Regards,

--
Chen Jie
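For readers following such dumps: r600-family PM4 type-3 packets pack their fields into the header dword as sketched below (type in bits 31:30, dword count minus one in bits 29:16, opcode in bits 15:8), which is how instruction names like the ones above are recovered. For example, a header of 0xc0016800 decodes as type 3, count 1, opcode 0x68 (SET_CONFIG_REG), matching the ring dumps seen elsewhere in this thread:

/* Split a PM4 header dword into its fields. The filler packet
 * CP_PACKET2 is simply 0x80000000 (type 2, all other fields zero). */
static inline unsigned pm4_type(u32 header)   { return header >> 30; }
static inline unsigned pm4_count(u32 header)  { return (header >> 16) & 0x3fff; }
static inline unsigned pm4_opcode(u32 header) { return (header >> 8) & 0xff; }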
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
On Feb 17, 2012 at 5:27 PM, Chen Jie wrote:
>> One good way to test gart is to go over the GPU gart table and write a
>> dword using the GPU at the end of each page, something like 0xCAFEDEAD
>> or some value that is unlikely to be already set. And then go over
>> all the pages and check that the GPU write succeeded. Abusing the scratch
>> register write-back feature is the easiest way to try that.
> I'm planning to add a GART table check procedure at resume, which
> will go over the GPU gart table:
> 1. read (back up) a dword at the end of each GPU page
> 2. write a mark by GPU and check it
> 3. restore the original dword

The attached validateGART.patch does the job:
* It currently only works on the mips64 platform.
* To use it, apply all_in_vram.patch first, which allocates the CP ring, ih and ib in VRAM and hard-codes no_wb=1. The gart test routine is invoked in r600_resume.

We've tried it, and found that when the lockup happened, the gart table was still good before userspace restarted. The related dmesg follows:

[ 1521.820312] [drm] r600_gart_table_validate(): Validate GART Table at 90004004, 32768 entries, Dummy Page[0x0e004000-0x0e007fff]
[ 1522.019531] [drm] r600_gart_table_validate(): Sweep 32768 entries(valid=8544, invalid=24224, total=32768).
...
[ 1531.156250] PM: resume of devices complete after 9396.588 msecs
[ 1532.152343] Restarting tasks ... done.
[ 1544.468750] radeon 0000:01:05.0: GPU lockup CP stall for more than 10003msec
[ 1544.472656] ------------[ cut here ]------------
[ 1544.480468] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:243 radeon_fence_wait+0x25c/0x314()
[ 1544.488281] GPU lockup (waiting for 0x0002136B last fence id 0x0002136A)
...
[ 1544.886718] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 1545.046875] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 1545.062500] radeon 0000:01:05.0: WB disabled
[ 1545.097656] [drm] ring test succeeded in 0 usecs
[ 1545.105468] [drm] ib test succeeded in 0 usecs
[ 1545.109375] [drm] Enabling audio support
[ 1545.113281] [drm] r600_gart_table_validate(): Validate GART Table at 90004004, 32768 entries, Dummy Page[0x0e004000-0x0e007fff]
[ 1545.125000] [drm:r600_gart_table_validate] *ERROR* Iter=0: unexpected value 0x745aaad1(expect 0xDEADBEEF) entry=0x0e008067, orignal=0x745aaad1
...
/* System blocked here. */

Any idea?

BTW, we find the following in r600_pcie_gart_enable() (drivers/gpu/drm/radeon/r600.c):

	WREG32(VM_CONTEXT0_PROTECTION_FAULT_DEFAULT_ADDR,
	       (u32)(rdev->dummy_page.addr >> 12));

On our platform, PAGE_SIZE is 16K; is that a problem here?

Also, in radeon_gart_unbind() and radeon_gart_restore(), shouldn't the logic change to:

	for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
		radeon_gart_set_page(rdev, t, page_base);
-		page_base += RADEON_GPU_PAGE_SIZE;
+		if (page_base != rdev->dummy_page.addr)
+			page_base += RADEON_GPU_PAGE_SIZE;
	}

???
Regards,

--
Chen Jie

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 53dbf50..e5961ed 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2215,6 +2218,8 @@ int r600_cp_resume(struct radeon_device *rdev)
 
 void r600_cp_commit(struct radeon_device *rdev)
 {
+	if ((rdev->cp.ring_obj->tbo.mem.placement & TTM_PL_MASK_MEM) == TTM_PL_FLAG_VRAM)
+		WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);
 	WREG32(CP_RB_WPTR, rdev->cp.wptr);
 	(void)RREG32(CP_RB_WPTR);
 }
@@ -2754,7 +2764,7 @@ static int r600_ih_ring_alloc(struct radeon_device *rdev)
 	if (rdev->ih.ring_obj == NULL) {
 		r = radeon_bo_create(rdev, NULL, rdev->ih.ring_size,
 				     true,
-				     RADEON_GEM_DOMAIN_GTT,
+				     RADEON_GEM_DOMAIN_VRAM,
 				     &rdev->ih.ring_obj);
 		if (r) {
 			DRM_ERROR("radeon: failed to create ih ring buffer (%d).\n", r);
@@ -2764,7 +2774,7 @@ static int r600_ih_ring_alloc(struct radeon_device *rdev)
 		if (unlikely(r != 0))
 			return r;
 		r = radeon_bo_pin(rdev->ih.ring_obj,
-				  RADEON_GEM_DOMAIN_GTT,
+				  RADEON_GEM_DOMAIN_VRAM,
 				  &rdev->ih.gpu_addr);
 		if (r) {
 			radeon_bo_unreserve(rdev->ih.ring_obj);
@@ -3444,6 +3454,8 @@ restart_ih:
 	if (queue_hotplug)
 		queue_work(rdev->wq, &rdev->hotplug_work);
 	rdev->ih.rptr = rptr;
+	if ((rdev->ih.ring_obj->tbo.mem.placement & TTM_PL_MASK_MEM) == TTM_PL_FLAG_VRAM)
+		WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);
 	WREG32(IH_RB_RPTR, rdev->ih.rptr);
 	spin_unlock_irqrestore(&rdev->ih.lock, flags);
 	return IRQ_HANDLED;
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 795403b..c5326e0 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -82,13 +82,13 @@ void radeon_debugfs_cleanup(struct drm_minor *minor);
 #endif
 
-int radeon_no_wb;
+int radeon_no_wb = 1;
Re: [mipsel+rs780e] Occasionally "GPU lockup" after resuming from suspend.
Hi,

For this occasional GPU lockup when returning from STR/STD, I found the following (when the problem happens):

The value of SRBM_STATUS is either 0x20002040 or 0x20003040, which means:
* HI_RQ_PENDING (there is a HI/BIF request pending in the SRBM)
* MCDW_BUSY (a memory controller block is busy)
* BIF_BUSY (the bus interface is busy)
* MCDX_BUSY (a memory controller block is busy), if the value is 0x20003040

Are MCDW and MCDX two memory channels? What is the relationship among GART-mapped memory, on-board video memory, and MCDX, MCDW? (A small decoder for these bits is sketched after this message.)

CP_STAT: the CSF_RING_BUSY bit is always set. There are many CP_PACKET2s (0x80000000) in the CP ring (more than three hundred), e.g.:

r[131800]=0x00028000
r[131801]=0xc0016800
r[131802]=0x0140
r[131803]=0x79c5
r[131804]=0x304a
r[131805] ...
r[132143]=0x8000
r[132144]=0x

After the first reset, the GPU locks up again; this time there are typically 320 dwords in the CP ring -- 319 CP_PACKET2s with 0xc0033d00 at the end. Is this normal?

BTW, is there any way for X to switch to NOACCEL mode when the problem happens? That way users would have a chance to save their documents and then reboot the machine.

Regards,

--
Chen Jie
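Putting the observed values together, a decoder sketch for the bits this thread has identified: the MCDX/MCDW positions come from the G_000E50_* macros quoted earlier, while the HI_RQ_PENDING and BIF_BUSY positions (bits 6 and 29) are inferred from decomposing 0x20002040, so treat those two as assumptions:

/* 0x20002040 -> HI_RQ_PENDING | MCDW_BUSY | BIF_BUSY;
 * 0x20003040 additionally sets MCDX_BUSY (bit 12). */
static void decode_srbm_status(u32 v)
{
	if ((v >> 6) & 1)	/* assumed bit position */
		pr_info("HI_RQ_PENDING: HI/BIF request pending in SRBM\n");
	if ((v >> 12) & 1)	/* G_000E50_MCDX_BUSY */
		pr_info("MCDX_BUSY: memory controller block busy\n");
	if ((v >> 13) & 1)	/* G_000E50_MCDW_BUSY */
		pr_info("MCDW_BUSY: memory controller block busy\n");
	if ((v >> 29) & 1)	/* assumed bit position */
		pr_info("BIF_BUSY: bus interface busy\n");
}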
UVD status on loongson 3a platform
Hi all,

Recently, UVD support was released, and we've tried it on the loongson 3a platform.

A brief introduction to loongson 3a: it's a MIPS-III-compatible, 4-core processor. More details about the platform [1]:
* The board: RS780E + SB710 chipset, with an AMD Radeon HD 6570 video card
* The kernel is 64-bit (n64 ABI), and the userland is 32-bit (o32 ABI)
* OS: LOonux 3.3.6 [2] + LTP-uvd-installer-20130419.bin [3]
** kernel: 3.9 + uvd related patches
** mesa: git master version (d0e9aa)

We tried three video samples:
* big_buck_bunny_1080p_h264.mov (http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_1080p_h264.mov)
* Sintel.2010.2K.x264-VODO.mp4 (http://dev.lemote.com/files/upload/software/UVD-debug/Sintel.2010.2K.x264-VODO.mp4)
* test.avi (http://dev.lemote.com/files/upload/software/UVD-debug/test.avi)

For big_buck_bunny_1080p_h264.mov, playback is not very fluent at the beginning, and it has some video mosaic. We've recorded a video of it, see http://dev.lemote.com/files/upload/software/UVD-debug/bbb-1080P.mp4 -- what could the video mosaic be caused by?

For Sintel.2010.2K.x264-VODO.mp4, there is a very long wait for the first frame. We've also recorded a video of it, see http://dev.lemote.com/files/upload/software/UVD-debug/sintel.2K.mp4 -- any idea about the long wait for the first frame?

For test.avi (video: ITU H.264, 1920x1080), it plays back perfectly! Thanks for the effort on UVD!

In all of these tests, the CPU usage is around 50%, and all three video samples play well on an x86 platform with the same video card.

BTW, the 785G also has UVD 2.0; is it supported currently, or will it be supported in the near future?

Regards,
Chen Jie

[1] http://www.lemote.com/products/computer/fulong/348.html (zh_CN)
[2] http://dev.lemote.com/653.html (zh_CN)
[3] http://dev.lemote.com/663.html (zh_CN)
[PATCH] drm/radeon/kms: set DMA mask properly on newer PCI asics
Hi Alex, Sorry for the late reply. I tried the patch on our mipsel platform, but got the following: [1.335937] [drm] Loading RS780 Microcode [1.910156] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD) [1.917968] radeon :01:05.0: disabling GPU acceleration The platform is equipped with 1G memory, and the physical address layout is: [0-256M] physical memory [256M - 4352M] hole [4352M - ] physical memory After applying the patch, the ring buffer BO is allocated at physical address(and is equal to the bus address) near 5G. I doubt RS780 fails to access such high bus address? (I can't validate it on X86+rs780e, since I doesn't have >4G memory at hand, could somebody please to validate it?) BTW, I found radeon_gart_bind() will call pci_map_page(), it hooks to swiotlb_map_page on our platform, which seems allocates and returns dma_addr_t of a new page from pool if not meet dma_mask. Seems a bug, since the BO backed by one set of pages, but mapped to GART was another set of pages? Regards, -- cee1 2011/10/5 > From: Alex Deucher > > If a card wasn't PCIE, we always set the DMA mask to 32 bits. > This is only applies to the old rage128/r1xx gart block on > early radeon asics (~r1xx-r4xx). Newer PCI and IGP cards > can handle 40 bits just fine. > > Signed-off-by: Alex Deucher > Cc: Chen Jie > --- > drivers/gpu/drm/radeon/radeon_device.c |7 --- > 1 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_device.c > b/drivers/gpu/drm/radeon/radeon_device.c > index b51e157..2c3429d 100644 > --- a/drivers/gpu/drm/radeon/radeon_device.c > +++ b/drivers/gpu/drm/radeon/radeon_device.c > @@ -750,14 +750,15 @@ int radeon_device_init(struct radeon_device *rdev, > >/* set DMA mask + need_dma32 flags. > * PCIE - can handle 40-bits. > -* IGP - can handle 40-bits (in theory) > +* IGP - can handle 40-bits > * AGP - generally dma32 is safest > -* PCI - only dma32 > +* PCI - dma32 for legacy pci gart, 40 bits on newer asics > */ >rdev->need_dma32 = false; >if (rdev->flags & RADEON_IS_AGP) >rdev->need_dma32 = true; > - if (rdev->flags & RADEON_IS_PCI) > + if ((rdev->flags & RADEON_IS_PCI) && > + (rdev->family < CHIP_RS400)) >rdev->need_dma32 = true; > >dma_bits = rdev->need_dma32 ? 32 : 40; > -- > 1.7.1.1 > > ___ > dri-devel mailing list > dri-devel at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > -- next part -- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20111018/3001a527/attachment-0001.htm>
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
Hi, ? 2011?10?17? ??2:34? ??? > If I start X but switch to the console, then do suspend & resume, "GPU > reset" hardly happen. but there is a new problem that the IRQ of radeon > card is disabled. Maybe "GPU reset" has something to do with "IRQ > disabled"? > > I have tried "irqpoll", it doesn't fix this problem. > > [ 571.914062] irq 6: nobody cared (try booting with the "irqpoll" option) > [ 571.914062] Call Trace: > [ 571.914062] [] dump_stack+0x8/0x34 > [ 571.914062] [] __report_bad_irq.clone.6+0x44/0x15c > [ 571.914062] [] note_interrupt+0x204/0x2a0 > [ 571.914062] [] handle_irq_event_percpu+0x19c/0x1f8 > [ 571.914062] [] handle_irq_event+0x68/0xa8 > [ 571.914062] [] handle_level_irq+0xd8/0x13c > [ 571.914062] [] generic_handle_irq+0x48/0x58 > [ 571.914062] [] do_IRQ+0x18/0x24 > [ 571.914062] [] mach_irq_dispatch+0xf0/0x194 > [ 571.914062] [] ret_from_irq+0x0/0x4 > [ 571.914062] > [ 571.914062] handlers: > [ 571.914062] [] radeon_driver_irq_handler_kms > > P.S.: use the latest kernel from git, and irq6 is not shared by other > devices. > > Does fence_wait depends on GPU's interrupt? If yes, then can I say "GPU lockup" is caused by unexpected disabling of GPU's irq? > > Hi Alex, Michel > > > > 2011/10/5 Alex Deucher > > > >> 2011/10/5 Michel D?zer : > >> > On Don, 2011-09-29 at 17:17 +0800, Chen Jie wrote: > >> >> > >> >> We got occasionally "GPU lockup" after resuming from suspend(on > >> mipsel > >> >> platform with a mips64 compatible CPU and rs780e, the kernel is > >> >> 3.1.0-rc8 64bit). Related kernel message: > >> > > >> > [...] > >> > > >> >> [ 177.085937] radeon :01:05.0: GPU lockup CP stall for more than > >> >> 10019msec > >> >> [ 177.089843] [ cut here ] > >> >> [ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 > >> >> radeon_fence_wait+0x25c/0x33c() > >> >> [ 177.105468] GPU lockup (waiting for 0x13C3 last fence id > >> >> 0x13AD) > >> >> [ 177.113281] Modules linked in: psmouse serio_raw > >> >> [ 177.117187] Call Trace: > >> >> [ 177.121093] [] dump_stack+0x8/0x34 > >> >> [ 177.125000] [] warn_slowpath_common+0x78/0xa0 > >> >> [ 177.132812] [] warn_slowpath_fmt+0x38/0x44 > >> >> [ 177.136718] [] radeon_fence_wait+0x25c/0x33c > >> >> [ 177.144531] [] ttm_bo_wait+0x108/0x220 > >> >> [ 177.148437] [] radeon_gem_wait_idle_ioctl > >> >> +0x80/0x114 > >> >> [ 177.156250] [] drm_ioctl+0x2e4/0x3fc > >> >> [ 177.160156] [] radeon_kms_compat_ioctl+0x28/0x38 > >> >> [ 177.167968] [] compat_sys_ioctl+0x120/0x35c > >> >> [ 177.171875] [] handle_sys+0x118/0x138 > >> >> [ 177.179687] ---[ end trace 92f63d998efe4c6d ]--- > >> >> [ 177.187500] radeon :01:05.0: GPU softreset > >> >> [ 177.191406] radeon :01:05.0: R_008010_GRBM_STATUS=0xF57C2030 > >> >> [ 177.195312] radeon :01:05.0: > >> R_008014_GRBM_STATUS2=0x0003 > >> >> [ 177.203125] radeon :01:05.0: R_000E50_SRBM_STATUS=0x20023040 > >> >> [ 177.363281] radeon :01:05.0: Wait for MC idle timedout ! > >> > > >> > [...] > >> > > >> >> What may cause a "GPU lockup"? > >> > > >> > Lots of things... The most common cause is an incorrect command stream > >> > sent to the GPU by userspace or the kernel. > >> > > >> >> Why reset didn't work? > >> > > >> > Might be related to 'Wait for MC idle timedout !', but I don't know > >> > offhand what could be up with that. > >> > > >> > > >> >> BTW, one question: > >> >> I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes > >> >> need_dma32 was set. > >> >> Is it correct? (drivers/char/agp is not available on mips, could that > >> >> be the reason?) 
> >> > > >> > Not sure, Alex? > >> > >> You don't AGP for newer IGP cards (rs4xx+). It gets set by default if > >> the card is not AGP or PCIE. That should be changed as only the > >> legacy r1xx PCI GART block has that limitation. I'll send a patch out > >> shortly. > >> > >> Got it, thanks for the reply. > > > -- next part -- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20111018/95b83eaf/attachment.htm>
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
Hi, Add more information. We got occasionally "GPU lockup" after resuming from suspend(on mipsel platform with a mips64 compatible CPU and rs780e, the kernel is 3.1.0-rc8 64bit). Related kernel message: /* return from STR */ [ 156.152343] radeon :01:05.0: WB enabled [ 156.187500] [drm] ring test succeeded in 0 usecs [ 156.187500] [drm] ib test succeeded in 0 usecs [ 156.398437] ata2: SATA link down (SStatus 0 SControl 300) [ 156.398437] ata3: SATA link down (SStatus 0 SControl 300) [ 156.398437] ata4: SATA link down (SStatus 0 SControl 300) [ 156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 156.597656] ata1.00: configured for UDMA/133 [ 156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd [ 157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd [ 157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd [ 157.683593] r8169 :02:00.0: eth0: link up [ 165.621093] PM: resume of devices complete after 9679.556 msecs [ 165.628906] Restarting tasks ... done. [ 177.085937] radeon :01:05.0: GPU lockup CP stall for more than 10019msec [ 177.089843] [ cut here ] [ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c() [ 177.105468] GPU lockup (waiting for 0x13C3 last fence id 0x13AD) [ 177.113281] Modules linked in: psmouse serio_raw [ 177.117187] Call Trace: [ 177.121093] [] dump_stack+0x8/0x34 [ 177.125000] [] warn_slowpath_common+0x78/0xa0 [ 177.132812] [] warn_slowpath_fmt+0x38/0x44 [ 177.136718] [] radeon_fence_wait+0x25c/0x33c [ 177.144531] [] ttm_bo_wait+0x108/0x220 [ 177.148437] [] radeon_gem_wait_idle_ioctl+0x80/0x114 [ 177.156250] [] drm_ioctl+0x2e4/0x3fc [ 177.160156] [] radeon_kms_compat_ioctl+0x28/0x38 [ 177.167968] [] compat_sys_ioctl+0x120/0x35c [ 177.171875] [] handle_sys+0x118/0x138 [ 177.179687] ---[ end trace 92f63d998efe4c6d ]--- [ 177.187500] radeon :01:05.0: GPU softreset [ 177.191406] radeon :01:05.0: R_008010_GRBM_STATUS=0xF57C2030 [ 177.195312] radeon :01:05.0: R_008014_GRBM_STATUS2=0x0003 [ 177.203125] radeon :01:05.0: R_000E50_SRBM_STATUS=0x20023040 [ 177.363281] radeon :01:05.0: Wait for MC idle timedout ! [ 177.367187] radeon :01:05.0: R_008020_GRBM_SOFT_RESET=0x7FEE [ 177.390625] radeon :01:05.0: R_008020_GRBM_SOFT_RESET=0x0001 [ 177.414062] radeon :01:05.0: R_008010_GRBM_STATUS=0xA0003030 [ 177.417968] radeon :01:05.0: R_008014_GRBM_STATUS2=0x0003 [ 177.425781] radeon :01:05.0: R_000E50_SRBM_STATUS=0x2002B040 [ 177.433593] radeon :01:05.0: GPU reset succeed [ 177.605468] radeon :01:05.0: Wait for MC idle timedout ! [ 177.761718] radeon :01:05.0: Wait for MC idle timedout ! [ 177.804687] radeon :01:05.0: WB enabled [ 178.00] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD) [ 178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume [ 178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5). [ 178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB ! [ 179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6). ... What may cause a "GPU lockup"? Why reset didn't work? Any idea? BTW, one question: I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes need_dma32 was set. Is it correct? (drivers/char/agp is not available on mips, could that be the reason?) [ 177.179687]? 2011?9?28? ??3:23? ??? > Hi Alex, > > When we do STR (S3) with a RS780E radeon card on MIPS platform. "GPU > reset" may happen after resume (the possibility is about 5%). 
After that, > X is unusuable. > > We know there is a "ring test" at system resume time and GPU reset time. > Whether GPU reset happens, the "ring test" at system resume time is always > successful. But the "ring test" at GPU reset time usually fails. > > We use the latest kernel (3.1.0-RC8 from git) and X.org is 7.6. > > Any ideas? > > Best regards, > Huacai Chen > > Regards, - Chen Jie -- next part -- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20110929/718d8ecf/attachment.htm>
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
Hi, Status update about the problem 'Occasionally "GPU lockup" after resuming from suspend.' First, this could happen when system returns from a STR(suspend to ram) or STD(suspend to disk, aka hibernation). When returns from STD, the initialization process is most similar to the normal boot. The standby is ok, which is similar to STR, except that standby will not shutdown the power of CPU,GPU etc. We've dumped and compared the registers, and found something: CP_STAT normal value: 0x value when this problem occurred: 0x802100C1 or 0x802300C1 CP_ME_CNTL normal value: 0x00FF value when this problem occurred: always 0x20FF in our test Questions: According to the manual, CP_STAT = 0x802100C1 means CSF_RING_BUSY(bit 0): The Ring fetcher still has command buffer data to fetch, or the PFP still has data left to process from the reorder queue. CSF_BUSY(bit 6): The input FIFOs have command buffers to fetch, or one or more of the fetchers are busy, or the arbiter has a request to send to the MIU. MIU_RDREQ_BUSY(bit 7): The read path logic inside the MIU is busy. MEQ_BUSY(bit 16): The PFP-to-ME queue has valid data in it. SURFACE_SYNC_BUSY(bit 21): The Surface Sync unit is busy. CP_BUSY(bit 31): Any block in the CP is busy. What does it suggest? What does it mean if bit 29 of CP_ME_CNTL is set? BTW, how does the dummy page work in GART? Regards, -- Chen Jie ? 2011?12?7? ??10:21?Alex Deucher ??? > 2011/12/7 : >> When "MC timeout" happens at GPU reset, we found the 12th and 13th >> bits of R_000E50_SRBM_STATUS is 1. From kernel code we found these >> two bits are like this: >> #define G_000E50_MCDX_BUSY(x) (((x) >> 12) & 1) >> #define G_000E50_MCDW_BUSY(x) (((x) >> 13) & 1) >> >> Could you please tell me what does they mean? And if possible, > > They refer to sub-blocks in the memory controller. I don't really > know off hand what the name mean. > >> I want to know the functionalities of these 5 registers in detail: >> #define R_000E60_SRBM_SOFT_RESET 0x0E60 >> #define R_000E50_SRBM_STATUS 0x0E50 >> #define R_008020_GRBM_SOFT_RESET0x8020 >> #define R_008010_GRBM_STATUS0x8010 >> #define R_008014_GRBM_STATUS2 0x8014 >> >> A bit more info: If I reset the MC after resetting CP (this is what >> Linux-2.6.34 does, but removed since 2.6.35), then "MC timeout" will >> disappear, but there is still "ring test failed". > > The bits are defined in r600d.h. As to the acronyms: > BIF - Bus InterFace > CG - clocks > DC - Display Controller > GRBM - Graphics block (3D engine) > HDP - Host Data Path (CPU access to vram via the PCI BAR) > IH, RLC - Interrupt controller > MC - Memory controller > ROM - ROM > SEM - semaphore controller > > When you reset the MC, you will probably have to reset just about > everything else since most blocks depend on the MC for access to > memory. If you do reset the MC, you should do it at prior to calling > asic_init so you make sure all the hw gets re-initialized properly. > Additionally, you should probably reset the GRBM either via > SRBM_SOFT_RESET or the individual sub-blocks via GRBM_SOFT_RESET. > > Alex > >> >> Huacai Chen >> >>> 2011/11/8 : >>>> And, I want to know something: >>>> 1, Does GPU use MC to access GTT? >>> >>> Yes. All GPU clients (display, 3D, etc.) go through the MC to access >>> memory (vram or gart). >>> >>>> 2, What can cause MC timeout? >>> >>> Lots of things. Some GPU client still active, some GPU client hung or >>> not properly initialized. >>> >>> Alex >>> >>>> >>>>> Hi, >>>>> >>>>> Some status update. >>>>> ? 2011?9?29? ??5:17?Chen Jie ??? 
>>>>>> Hi, >>>>>> Add more information. >>>>>> We got occasionally "GPU lockup" after resuming from suspend(on mipsel >>>>>> platform with a mips64 compatible CPU and rs780e, the kernel is >>>>>> 3.1.0-rc8 >>>>>> 64bit). Related kernel message: >>>>>> /* return from STR */ >>>>>> [ 156.152343] radeon :01:05.0: WB enabled >>>>>> [ 156.187500] [drm] ring test succeeded in 0 usecs >>>>>> [ 156.187500] [drm] ib test succeeded in 0 usecs >>>>&
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
Hi, ? 2012?2?15? ??11:53?Jerome Glisse ??? > To me it looks like the CP is trying to fetch memory but the > GPU memory controller fail to fullfill cp request. Did you > check the PCI configuration before & after (when things don't > work) My best guest is PCI bus mastering is no properly working > or the PCIE GPU gart table as wrong data. > > Maybe one need to drop bus master and reenable bus master to > work around some bug... Thanks for your suggestion. We've tried the 'drop and reenable master' trick, unfortunately doesn't work. The PCI configuration compare will be done later. Some additional information: The "GPU Lockup" seems always occur after tasks be restarting -- We inserted more ring tests , non of them failed before restarting tasks. BTW, I hacked GART table to try to simulate the problem: 1. Changes the system memory address(bus address) of ring_obj to an arbitrary value, e.g. 0 or 128M. 2. Changes the system memory address of a BO in radeon_test to an arbitrary value, e.g. 0 Non of above leaded to a GPU Lockup: Point 1 rendered a black screen; Point 2 only the test itself failed Any idea? Regards, -- Chen Jie
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
? 2012?2?16? ??5:21?Chen Jie ??? > Hi, > > ? 2012?2?15? ??11:53?Jerome Glisse ??? >> To me it looks like the CP is trying to fetch memory but the >> GPU memory controller fail to fullfill cp request. Did you >> check the PCI configuration before & after (when things don't >> work) My best guest is PCI bus mastering is no properly working >> or the PCIE GPU gart table as wrong data. >> >> Maybe one need to drop bus master and reenable bus master to >> work around some bug... > Thanks for your suggestion. We've tried the 'drop and reenable master' > trick, unfortunately doesn't work. > The PCI configuration compare will be done later. Update: We've checked the first 64 bytes of PCI configuration space before & after, and didn't find any difference. Regards, -- Chen Jie
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
? 2012?2?17? ??12:32?Jerome Glisse ??? > Ok let's start from the begining, i convince it's related to GPU > memory controller failing to full fill some request that hit system > memory. So in another mail you wrote : > >> BTW, I found radeon_gart_bind() will call pci_map_page(), it hooks >> to swiotlb_map_page on our platform, which seems allocates and returns >> dma_addr_t of a new page from pool if not meet dma_mask. Seems a bug, since >> the BO backed by one set of pages, but mapped to GART was another set of >> pages? > > Is this still the case ? As this is obviously wrong, we fixed that > recently. What drm code are you using. rs780 dma mask is something > like 40bits iirc so you should never have issue on your system with > 1G of memory right ? Right. > > If you have an iommu what happens on resume ? Are all page previously > mapped with pci map page still valid ? The physical address is directly mapped to bus address, so iommu do nothing on resume, the pages should be valid? > > One good way to test gart is to go over GPU gart table and write a > dword using the GPU at end of each page something like 0xCAFEDEAD > or somevalue that is unlikely to be already set. And then go over > all the page and check that GPU write succeed. Abusing the scratch > register write back feature is the easiest way to try that. I'm planning to add a GART table check procedure when resume, which will go over GPU gart table: 1. read(backup) a dword at end of each GPU page 2. write a mark by GPU and check it 3. restore the original dword Hopefully, this can do some help.
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
>> ? 2012?2?15? ??11:53?Jerome Glisse ??? >>> To me it looks like the CP is trying to fetch memory but the >>> GPU memory controller fail to fullfill cp request. Did you >>> check the PCI configuration before & after (when things don't >>> work) My best guest is PCI bus mastering is no properly working >>> or the PCIE GPU gart table as wrong data. >>> >>> Maybe one need to drop bus master and reenable bus master to >>> work around some bug... >> Thanks for your suggestion. We've tried the 'drop and reenable master' >> trick, unfortunately doesn't work. >> The PCI configuration compare will be done later. > Update: We've checked the first 64 bytes of PCI configuration space > before & after, and didn't find any difference. Hi, Status update: We try to analyze the GPU instruction stream when lockup today. The lockup always occurs after tasks restarting, so the related instructions should reside at ib, as pointed by dmesg: [ 2456.585937] GPU lockup (waiting for 0x0002F98B last fence id 0x0002F98A) Print instructions in related ib: [ 2462.492187] PM4 block 10 has 115 instructions, with fence seq 2f98b [ 2462.976562] Type3:PACKET3_SET_CONTEXT_REG ref_addr [ 2462.984375] Type3:PACKET3_SET_CONTEXT_REG ref_addr [ 2462.988281] Type3:PACKET3_SET_CONTEXT_REG ref_addr [ 2462.992187] Type3:PACKET3_SET_ALU_CONST ref_addr [ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880 [ 2463.003906] Type3:PACKET3_SET_RESOURCE ref_addr [ 2463.007812] Type3:PACKET3_SET_CONFIG_REG ref_addr [ 2463.011718] Type3:PACKET3_INDEX_TYPE ref_addr [ 2463.015625] Type3:PACKET3_NUM_INSTANCES ref_addr [ 2463.019531] Type3:PACKET3_DRAW_INDEX_AUTO ref_addr [ 2463.027343] Type3:PACKET3_EVENT_WRITE ref_addr [ 2463.031250] Type3:PACKET3_SET_CONFIG_REG ref_addr [ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680 [ 2463.039062] Type3:PACKET3_SET_CONTEXT_REG ref_addr [ 2463.046875] Type3:PACKET3_SET_CONTEXT_REG ref_addr [ 2463.050781] Type3:PACKET3_SET_CONTEXT_REG ref_addr [ 2463.054687] Type3:PACKET3_SET_BOOL_CONST ref_addr [ 2463.062500] Type3:PACKET3_SURFACE_SYNC ref_addr 10668e CP_COHER_BASE was 0x0018C880, so the instruction which caused lockup should be in: [ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880 ... [ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680 Here, only SURFACE_SYNC, SET_RESOURCE and EVENT_WRITE will access GPU memory. We guess it maybe SURFACE_SYNC? BTW, when lockup happens, if places the CP ring at vram, ring_test will pass, but ib_test fails -- which suggests ME fails to feed CP when lockup? May a former SURFACE_SYNC block the MC? P.S. We hack to place CP ring, ib and ih at vram and disable wb(radeon_no_wb=1) in today's debugging. Any idea? Regards, -- Chen Jie
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
On 2012-02-17 at 5:27, Chen Jie wrote:
>> One good way to test the gart is to go over the GPU gart table and write a
>> dword using the GPU at the end of each page, something like 0xCAFEDEAD
>> or some value that is unlikely to be already set. And then go over
>> all the pages and check that the GPU write succeeded. Abusing the scratch
>> register write-back feature is the easiest way to try that.
> I'm planning to add a GART table check procedure on resume, which
> will go over the GPU gart table:
> 1. read (back up) the dword at the end of each GPU page
> 2. write a mark via the GPU and check it
> 3. restore the original dword

The attached validateGART.patch does the job:
* It currently only works on the mips64 platform.
* To use it, apply all_in_vram.patch first, which allocates the CP ring, ih and ib in VRAM and hard-codes no_wb=1. The gart test routine is invoked in r600_resume.

We've tried it, and found that when the lockup happened, the gart table was still good before userspace restarted. The related dmesg follows:

[ 1521.820312] [drm] r600_gart_table_validate(): Validate GART Table at 90004004, 32768 entries, Dummy Page[0x0e004000-0x0e007fff]
[ 1522.019531] [drm] r600_gart_table_validate(): Sweep 32768 entries(valid=8544, invalid=24224, total=32768).
...
[ 1531.156250] PM: resume of devices complete after 9396.588 msecs
[ 1532.152343] Restarting tasks ... done.
[ 1544.468750] radeon :01:05.0: GPU lockup CP stall for more than 10003msec
[ 1544.472656] [ cut here ]
[ 1544.480468] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:243 radeon_fence_wait+0x25c/0x314()
[ 1544.488281] GPU lockup (waiting for 0x0002136B last fence id 0x0002136A)
...
[ 1544.886718] radeon :01:05.0: Wait for MC idle timedout !
[ 1545.046875] radeon :01:05.0: Wait for MC idle timedout !
[ 1545.062500] radeon :01:05.0: WB disabled
[ 1545.097656] [drm] ring test succeeded in 0 usecs
[ 1545.105468] [drm] ib test succeeded in 0 usecs
[ 1545.109375] [drm] Enabling audio support
[ 1545.113281] [drm] r600_gart_table_validate(): Validate GART Table at 90004004, 32768 entries, Dummy Page[0x0e004000-0x0e007fff]
[ 1545.125000] [drm:r600_gart_table_validate] *ERROR* Iter=0: unexpected value 0x745aaad1(expect 0xDEADBEEF) entry=0x0e008067, orignal=0x745aaad1
...
/* System blocked here. */

Any idea?

BTW, we found the following in r600_pcie_gart_enable() (drivers/gpu/drm/radeon/r600.c):

	WREG32(VM_CONTEXT0_PROTECTION_FAULT_DEFAULT_ADDR,
	       (u32)(rdev->dummy_page.addr >> 12));

On our platform, PAGE_SIZE is 16K; does that cause any problem?

Also, in radeon_gart_unbind() and radeon_gart_restore(), shouldn't the logic change to the following (see the sketch after this message)?

	for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
		radeon_gart_set_page(rdev, t, page_base);
-		page_base += RADEON_GPU_PAGE_SIZE;
+		if (page_base != rdev->dummy_page.addr)
+			page_base += RADEON_GPU_PAGE_SIZE;
	}

Regards,
-- Chen Jie

-- next part --
A non-text attachment was scrubbed...
Name: all_in_vram.patch
Type: text/x-patch
Size: 3971 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20120221/cac7c118/attachment-0002.bin>
-- next part --
A non-text attachment was scrubbed...
Name: validateGART.patch
Type: text/x-patch
Size: 3947 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20120221/cac7c118/attachment-0003.bin>
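To make the PAGE_SIZE concern concrete: with a 16K kernel PAGE_SIZE and a 4K RADEON_GPU_PAGE_SIZE, each CPU page occupies four consecutive GART entries, but the dummy page is a single CPU page, so its address must not be advanced past its own end. A minimal sketch of the suggested loop under those assumptions ('dummy' and 'dma_addr' are illustrative names, not the actual driver variables):

	/* Sketch: program the GART entries covering one CPU page. With
	 * PAGE_SIZE=16K and RADEON_GPU_PAGE_SIZE=4K this writes 4 entries.
	 * When unbinding, every entry points at the dummy page, so its
	 * base address is re-used rather than stepped forward. */
	unsigned entries_per_cpu_page = PAGE_SIZE / RADEON_GPU_PAGE_SIZE;
	u64 page_base = dummy ? rdev->dummy_page.addr : dma_addr;
	unsigned j;

	for (j = 0; j < entries_per_cpu_page; j++, t++) {
		radeon_gart_set_page(rdev, t, page_base);
		if (page_base != rdev->dummy_page.addr)
			page_base += RADEON_GPU_PAGE_SIZE;
	}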
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
Hi,

For this occasional GPU lockup on return from STR/STD, I found the following (when the problem happens):

The value of SRBM_STATUS is either 0x20002040 or 0x20003040, which means:
* HI_RQ_PENDING (there is a HI/BIF request pending in the SRBM)
* MCDW_BUSY (Memory Controller Block is busy)
* BIF_BUSY (Bus Interface is busy)
* MCDX_BUSY (Memory Controller Block is busy), if the value is 0x20003040

Are MCDW_BUSY and MCDX_BUSY two memory channels? What is the relationship among GART-mapped memory, on-board video memory, and MCDX/MCDW? (A decoding sketch follows this message.)

CP_STAT: CSF_RING_BUSY is always set. There are many CP_PACKET2 (0x8000) entries in the CP ring (more than three hundred), e.g.:
r[131800]=0x00028000
r[131801]=0xc0016800
r[131802]=0x0140
r[131803]=0x79c5
r[131804]=0x304a
r[131805] ...
r[132143]=0x8000
r[132144]=0x

After the first reset, the GPU locks up again; this time there are typically 320 dwords in the CP ring -- 319 CP_PACKET2 with 0xc0033d00 at the end. Is this normal?

BTW, is there any way for X to switch to a NOACCEL mode when the problem happens? That way users would have a chance to save their documents and then reboot the machine.

Regards,
-- Chen Jie
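Not an answer, but a sketch of how the two status words decode. Only MCDX_BUSY's position is pinned down by the observed values (0x20003040 ^ 0x20002040 = 0x1000, i.e. bit 12); the other three masks below are assumptions based on the list above, and the authoritative positions are the G_000E50_* accessors in r600d.h:

/* Sketch: decode the SRBM_STATUS values seen at lockup time.
 * MCDX_BUSY (bit 12) follows from the delta between the two observed
 * words; the remaining bit positions are assumed, not verified. */
#define SRBM_HI_RQ_PENDING (1u << 6)   /* assumed */
#define SRBM_MCDX_BUSY     (1u << 12)  /* from 0x20003040 ^ 0x20002040 */
#define SRBM_MCDW_BUSY     (1u << 13)  /* assumed */
#define SRBM_BIF_BUSY      (1u << 29)  /* assumed */

static void decode_srbm_status(u32 v)
{
	printk(KERN_INFO "SRBM_STATUS=0x%08x:%s%s%s%s\n", v,
	       (v & SRBM_HI_RQ_PENDING) ? " HI_RQ_PENDING" : "",
	       (v & SRBM_MCDW_BUSY)     ? " MCDW_BUSY"     : "",
	       (v & SRBM_MCDX_BUSY)     ? " MCDX_BUSY"     : "",
	       (v & SRBM_BIF_BUSY)      ? " BIF_BUSY"      : "");
}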
[radeon] Question about creating a ring BO in VRAM
Hi all,

I tried to create/pin the ring BO in VRAM instead of GTT to debug some ring-related problems. After I did this, it rendered a black screen in X (on an X86 RS780E board), but radeon.test passed. 'ps aux' shows X sleeping uninterruptibly in radeon.

Curious why this doesn't work?

Regards,
-- Chen Jie
[radeon] Question about creating a ring BO in VRAM
2011/11/5 Alex Deucher :
> On Fri, Nov 4, 2011 at 10:26 AM, Chen Jie wrote:
>> Hi all,
>>
>> I tried to create/pin the ring BO in VRAM instead of GTT to debug some
>> ring-related problems. After I did this, it rendered a black screen in
>> X (on an X86 RS780E board), but radeon.test passed.
>> 'ps aux' shows X sleeping uninterruptibly in radeon.
>>
>> Curious why this doesn't work?
>
> The tricky part is dealing with the HDP cache. Access to vram via the
> PCI FB BAR goes through the HDP cache; you have to make sure it's
> flushed properly before the GPU starts using the data there. To flush
> it, either read back from vram, or write 1 to the
> HDP_MEM_COHERENCY_FLUSH_CNTL register. We generally don't recommend
> putting the ring in vram.

Got it, thanks. After adding an HDP cache flush in r600_cp_commit(), it works fine (a sketch follows this message).

Regards,
-- Chen Jie
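For reference, a minimal sketch of the workaround, assuming the r600-family register offset 0x5480 for HDP_MEM_COHERENCY_FLUSH_CNTL and the driver's usual WREG32/RREG32 accessors; the actual change made to r600_cp_commit() may differ:

/* Sketch: flush the HDP cache before bumping the CP write pointer,
 * so the GPU sees coherent ring contents when the ring lives in VRAM
 * (CPU writes reach VRAM through the PCI FB BAR via the HDP cache). */
#define HDP_MEM_COHERENCY_FLUSH_CNTL 0x5480

static void r600_cp_commit_hdp_flush(struct radeon_device *rdev)
{
	WREG32(HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1); /* flush HDP cache */
	WREG32(CP_RB_WPTR, rdev->cp.wptr);         /* commit ring wptr */
	(void)RREG32(CP_RB_WPTR);                  /* read back to post */
}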
[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
Hi,

Some status update.

On 2011-09-29 at 5:17, Chen Jie wrote:
> Hi,
> Add more information.
> We got an occasional "GPU lockup" after resuming from suspend (on a mipsel
> platform with a mips64-compatible CPU and rs780e; the kernel is 3.1.0-rc8,
> 64-bit). Related kernel message:
> [resume and softreset log snipped]
> [ 178.00] [drm:r600_ring_test] *ERROR* radeon: ring test failed
> (scratch(0x8504)=0xCAFEDEAD)

After pinning the ring in VRAM, it warned of an ib test failure instead. It seems something is wrong with accesses through the GTT. We dumped the gart table just after stopping the CP and compared it with the one dumped just after r600_pcie_gart_enable() (a sketch of the dump-and-compare follows this message), and didn't find any difference.

Any idea?

> [ 178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
> [ 178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
> [ 178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
> [ 179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
> ...
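A minimal sketch of the dump-and-compare step, assuming the GART table is CPU-visible through rdev->gart.ptr with 64-bit entries (as on rs600-family IGP GARTs) and that the snapshot buffer is allocated by the caller; names are illustrative, not the actual debugging patch:

/* Sketch: snapshot the GART table after r600_pcie_gart_enable(), then
 * diff it against the live table after stopping the CP at lockup. */
static void gart_snapshot(struct radeon_device *rdev, u64 *snap)
{
	memcpy(snap, rdev->gart.ptr,
	       rdev->gart.num_gpu_pages * sizeof(u64));
}

static void gart_compare(struct radeon_device *rdev, const u64 *snap)
{
	const u64 *now = rdev->gart.ptr;
	unsigned i;

	for (i = 0; i < rdev->gart.num_gpu_pages; i++)
		if (now[i] != snap[i])
			DRM_ERROR("GART entry %u changed: %016llx -> %016llx\n",
				  i, (unsigned long long)snap[i],
				  (unsigned long long)now[i]);
}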
Regards, -- Chen Jie
UVD status on loongson 3a platform
Hi all,

Recently, UVD support was released, and we've tried it on the Loongson 3A platform. A brief introduction to Loongson 3A: it's a MIPS III-compatible, four-core processor.

More details about the platform [1]:
* The board: RS780E + SB710 chipset, with an AMD Radeon HD6570 video card
* The kernel is 64-bit (n64 ABI), and the userland is 32-bit (o32 ABI)
* OS: LOonux 3.3.6 [2] + LTP-uvd-installer-20130419.bin [3]
** kernel: 3.9 + uvd related patches
** mesa: git master version (d0e9aa)

We tried three video samples:
* big_buck_bunny_1080p_h264.mov (http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_1080p_h264.mov)
* Sintel.2010.2K.x264-VODO.mp4 (http://dev.lemote.com/files/upload/software/UVD-debug/Sintel.2010.2K.x264-VODO.mp4)
* test.avi (http://dev.lemote.com/files/upload/software/UVD-debug/test.avi)

For big_buck_bunny_1080p_h264.mov, playback is not very smooth at the beginning, and it shows some video mosaic. We've recorded a video of it, see http://dev.lemote.com/files/upload/software/UVD-debug/bbb-1080P.mp4 -- what could cause the video mosaic?

For Sintel.2010.2K.x264-VODO.mp4, there is a very long wait before the first frame. We've also recorded a video of it, see http://dev.lemote.com/files/upload/software/UVD-debug/sintel.2K.mp4 -- any idea about the long wait for the first frame?

For test.avi (video: ITU H.264, 1920x1080), it plays back perfectly. Thanks for the effort on UVD!

In all of these tests, the CPU usage is around 50%, and all three video samples play well on an X86 platform with the same video card.

BTW, the 785G also has UVD 2.0 -- is it supported currently, or will it be supported in the near future?

Regards,
Chen Jie

[1] http://www.lemote.com/products/computer/fulong/348.html (zh_CN)
[2] http://dev.lemote.com/653.html (zh_CN)
[3] http://dev.lemote.com/663.html (zh_CN)
UVD on RS880
2014/1/9 Christian König :
> Hi,
>
> The code for the first generation UVD blocks (RV6xx, RS780, RS880 and RV790)
> is already implemented and I'm only waiting for the OK to release it.
>
> The only problem is that I don't know if and when we are getting this OK for
> release. Maybe tomorrow, maybe never. It just doesn't have a high priority
> for the reviewer because we don't really sell that old hardware any more.

Hey, as far as I know, the rs780e is still on sale, and currently almost all Loongson 3 [1] (both Loongson 3A and 3B) based machines use that chip :) So releasing it would improve the HD video playback experience on that platform.

Regards,
Chen Jie

[1] http://en.wikipedia.org/wiki/Loongson#Loongson_3
UVD on RS880
2014/1/10 Mike Lothian :
> Fingers crossed this happens soon, especially now that BluRays can be played
> on Linux.
>
> 1080p VC1 does not play well on a Phenom II X4 even when multi-threaded.

It seems UVD-based VC1 playback is broken? With a Radeon 6570 card, neither X86 nor the Loongson platform gives me a correct picture for my VC1 samples:
* http://dev.lemote.com/files/upload/software/temp/video-test/HDVideoSamples/VC1/Robotica_1080.wmv
* http://dev.lemote.com/files/upload/software/temp/video-test/HDVideoSamples/VC1/Wildlife.wmv