On 11/23/2015 09:04 PM, Harry Wentland wrote: > Hi Mario, > > when we've had issues with this on amdgpu Christian fixed it by enabling > page flip irq all the time, rather than turning it on when usermode > request a flip and turning it back off after we handled it. I believe > that fix exists on radeon already. Michel should have more info on that. > > See other comments inline. > > Thanks, > Harry > > > On 2015-11-23 11:02 AM, Mario Kleiner wrote: >> On 11/20/2015 04:42 AM, Alex Deucher wrote: >>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner >>> <mario.kleiner.de at gmail.com> wrote: >>>> Hi Alex and Michel and Ville, >>>> >>>> it's "fix vblank stuff" time again ;-) >>> >>> Adding Harry from our display team. He might be able to fill in the >>> blanks of on some of this better than I can. It might also be worth >>> checking to see how our DAL (our new display code which is being >>> developed directly by our display team) code handles this. It could >>> be that we are just missing register settings: >> >> Thanks Alex! And hello Harry :) >> >>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip >> >> I'll have a look at this. >> >>> Additionally we've published full registers headers for the display >>> block: >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce >>> >>> The DCE8 stuff should generally apply back to DCE4. If you have >>> questions about registers older asics not covered in the hw docs, let >>> me know. Note the new headers are dword aligned rather than byte >>> aligned. >>> >> >> I've tested now with two different progressive modes on DCE3 and one >> progressive mode on DCE4, the only cards i have atm. So far it seems >> that the framecounter indeed increments when the vpos from the scanout >> position query jumps back to zero. Attached for reference is my >> current patch to radeon-kms. This one seems to work reliably so far, >> also if i enable the immediate vblank irq disable which we so far only >> used on intel-kms. >> >> But according to this patch the framecounter increment happens >> somewhere in the middle of vblank. >> >> Essentially from the vpos extracted from >> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start" >> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 >> the framecounter stays on the count of the old previous scanout cycle. >> Then when vpos wraps to zero the framecounter increments by 1. And >> then we have another couple of dozen lines inside vblank until >> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and >> entering active scanout for the new frame. >> >> So the position of observed framecounter increment seems to be not >> close to the end of vblank ("start of frame" indicator?), but a couple >> of scanlines after start of vblank. >> >> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, >> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the >> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and >> the framecounter increments by 1, then 38 scanlines later the vblank >> ends. >> >> So i seem to have something that seems to work in practice and this >> "increment framecounter if vpos wraps back to zero" behavior makes >> some sense. It just doesn't conform to what those descriptions for >> start_line and "start of frame" indicator describe? >> > This is correct. Our HW doesn't really have a vblank counter but a frame > counter. The framecounter increments at the start of vsync, which is > when we wrap to zero which doesn't coincide with the start of vblank. > > What we're trying to do with get_vblank_counter isn't the same as what > framecount gives us, but we could probably do something like: > > if (get_scanout_pos > vblank_start) > return frame_count + 1; > else > return frame_count; >
Great! That's what my current patch does and it seems to work well with different video modes on both DCE3 and DCE4. So theory agrees with practice again :) - thanks for clarifying this. So the other problem we have since forever is the vblank irq firing before the start of vblank. We are typically 1-2 scanlines before vblank_start when we sample the scanout position and framecounter. This needs a slightly ugly workaround. Is there a way, maybe some config register, to fire the irq at leading edge of vblank instead of a bit too early? thanks, -mario > Harry > >> I'll test with a few more video modes. >> >> thanks, >> -mario >> >> >>>> >>>> Ville's changes to the DRM's drm_handle_vblank() / >>>> drm_update_vblank_count() >>>> code in Linux 4.4 not only made that code more elegant, but also >>>> removed the >>>> robustness against the vblank irq quirks in AMD hw and similar >>>> hardware. So >>>> now i get tons of off-by-one errors and >>>> >>>> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip >>>> completion event has impossible msc 24803 < target_msc 24804" XOrg >>>> messages >>>> from that kernel. >>>> >>>> One of the reasons for trouble is that AMD hw quirk where the hw >>>> fires an >>>> extra vblank irq shortly after vblank irq's get enabled, not >>>> synchronized to >>>> vblank, but typically in the middle of active scanout, so we get a >>>> redundant >>>> call to drm_handle_vblank in the middle of scanout. >>>> >>>> To fix that i have a minor patch to make drm_update_vblank_count() >>>> again >>>> robust against such redundant calls, which i will send out later to the >>>> mailing list. Diff attached for reference. >>>> >>>> The second quirk of AMD hw is that the vblank interrupt fires a few >>>> scanlines before start of vblank, so drm_handle_vblank -> >>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets >>>> called >>>> before the start of the vblank for which the new vblank count should be >>>> queried. >>>> >>>> The third problem is that the DRM vblank handling always had the >>>> assumption >>>> that hardware vblank counters would essentially increment at leading >>>> edge of >>>> vblank - basically in sync with the firing of the vblank irq, so >>>> that a hw >>>> counter readout from within the vblank irq handler would always >>>> deliver the >>>> new incremented value. If this assumption is violated then the >>>> counting by >>>> use of the hw counter gets unreliable, because depending on random >>>> small >>>> delays in irq handling the code may end up sampling the hw counter >>>> pre- or >>>> post-increment, leading to inconsistent updating and funky bugs. It >>>> just so >>>> happens that AMD hardware doesn't increment the hw counter at >>>> leading edge >>>> of vblank, so stuff falls apart. >>>> >>>> So to fix those two problems i'm tinkering with cooking the hw vblank >>>> counter value returned by radeon_get_vblank_counter_kms() to make it >>>> appear >>>> as if the counter incremented at leading edge of vblank in sync with >>>> vblank >>>> irq. >>>> >>>> It almost sort of works on the rs600 code path, but i need a bit of >>>> info >>>> from you: >>>> >>>> 1. There's this register from the old specs for m76.pdf, which is >>>> not part >>>> of the current register defines for radeon-kms: >>>> >>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >>>> >>>> It contains the lower 16 bits of framecounter and the 13 bits of >>>> vertical >>>> scanout position. It seems to give the same readings as the 24 bit >>>> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >>>> would >>>> come handy. >>>> >>>> Does Evergreen and later have a same/similar register and where is it? >>> >>> Yes. CRTC_STATUS_VF_COUNT >>> >>> CRTC:CRTC_STATUS_VF_COUNT · [R/W] · 32 bits · Access: 8/16/32 · >>> [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2] >>> GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4] >>> GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c >>> DESCRIPTION: Current composite vertical and frame count for CRTC >>> Field Name Bits Default Description >>> CRTC_VF_COUNT >>> (Access: R) 28:0 0x0 Reports current vertical and frame count >>> >>>> >>>> 2. The hw framecounter seems to increment when the vertical scanout >>>> position >>>> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i >>>> tested so >>>> far. Is this so on all asics? And is the hw counter increment happening >>>> exactly at the moment that vertical scanout position jumps back to >>>> zero, ie. >>>> both events are driven by the same signal? Or is the framecounter >>>> increment >>>> just happening somewhere inside either scanline VTOTAL-1 or scanline 0? >>> >>> Unless Harry knows, we can ask the hw team, but I doubt they would >>> have changed it. I think it's tied to the start line which is >>> controlled by this register: >>> CRTC:CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 >>> · [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] >>> GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] >>> GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc >>> DESCRIPTION: move start_line signal earlier by 1 line in CRTC >>> Field Name Bits Default Description >>> CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line >>> signal by 1 line eariler in progressive mode >>> >>> CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line >>> signal by 1 line earlier in interlaced timing mode >>> >>> CRTC_ADVANCED_START_LINE_POSITION 19:16 0x4 Advanced >>> Start Line position respect to not VBLANK. Default of 4 means the >>> Advanced Start Line is 4 lines before the first non VBLANK line >>> >>> The following info I dug up internally may be useful: >>> >>> The position of the CRTC output timing generator when the âstart of >>> frameâ indicator occurs depends on several conditions. These >>> conditions are whether the timing generator is outputting timing for a >>> progressive or interlaced display mode, whether the scaler is enabled, >>> and if so, whether the register to select an earlier than normal >>> âstart of frameâ indicator is set. The âstart of frameâ indicator >>> typically occurs 2 lines before the vertical blank end position >>> (DxCRTC_V_BLANK_END) is reached >>> >>> When interlaced output display modes are enabled >>> (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled >>> (DxCRTC_MASTER_EN = 1), the timing generatorâs vertical counter counts >>> by 2 for both the even fields and odd fields of each frame. For both >>> the even fields and the odd fields of interlaced output modes, the >>> âstart of frameâ indicator occurs 2 lines before the end of the >>> vertical blank occurs (DxCRTC_V_BLANK_END â 4 or DxCRTC_V_BLANK_END â >>> 3 depending on the field since the vertical counter counts by 2 in >>> interlaced), since the vertical counter counts by 2 in this mode). >>> There is one exception to the previous statement. If scaling is >>> enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled >>> (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then >>> the âstart of frameâ indicator happens 3 lines before the end of the >>> vertical blank (DxCRTC_V_BLANK_END â 6 or DxCRTC_V_BLANK_END â 5 >>> depending on the field since the vertical counter counts by 2 in >>> interlaced). >>> >>> When progressive output display modes are enabled >>> (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled >>> (DxCRTC_MASTER_EN = 1), the âstart of frameâ indicator occurs 2 lines >>> before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2). >>> Similar to interlaced output mode, there is one exception to the >>> previous statement. If scaling is enabled (DxSCL_SCALE_EN = 1) and >>> the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY = >>> 1) in progressive output mode then the âstart of frameâ indicator >>> happens 3 lines before the end of the vertical blank >>> (DxCRTC_V_BLANK_END â 3) >>> >>> I hope this helps, >>> >>> Alex >>> >>>> >>>> >>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >>>> regression and with a bit of luck at the same time improve by being >>>> able to >>>> set dev->vblank_disable_immediate = true then and allow vblank irqs >>>> to get >>>> turned off more aggressively for a bit of extra power saving. >>>> >>>> thanks, >>>> -mario >