On Mon, 29 Oct 2018 10:03:01 +0100 Daniel Vetter <dan...@ffwll.ch> wrote:
> On Mon, Oct 29, 2018 at 09:41:36AM +0100, Boris Brezillon wrote: > > On Mon, 29 Oct 2018 09:06:40 +0100 > > Daniel Vetter <dan...@ffwll.ch> wrote: > > > > > On Fri, Oct 26, 2018 at 06:10:03PM +0300, Ville Syrjälä wrote: > > > > On Fri, Oct 26, 2018 at 04:52:33PM +0200, Boris Brezillon wrote: > > > > > On Fri, 26 Oct 2018 16:26:03 +0200 > > > > > Daniel Vetter <dan...@ffwll.ch> wrote: > > > > > > > > > > > On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon > > > > > > <boris.brezil...@bootlin.com> wrote: > > > > > > > > > > > > > > On Fri, 26 Oct 2018 15:30:26 +0200 > > > > > > > Daniel Vetter <dan...@ffwll.ch> wrote: > > > > > > > > > > > > > > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon > > > > > > > > wrote: > > > > > > > > > On Thu, 25 Oct 2018 11:33:14 +0200 > > > > > > > > > Daniel Vetter <dan...@ffwll.ch> wrote: > > > > > > > > > > > > > > > > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon > > > > > > > > > > wrote: > > > > > > > > > > > On Tue, 23 Oct 2018 15:44:43 +0200 > > > > > > > > > > > Daniel Vetter <dan...@ffwll.ch> wrote: > > > > > > > > > > > > > > > > > > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris > > > > > > > > > > > > Brezillon wrote: > > > > > > > > > > > > > Hi Daniel, > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200 > > > > > > > > > > > > > Daniel Vetter <dan...@ffwll.ch> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris > > > > > > > > > > > > > > Brezillon wrote: > > > > > > > > > > > > > > > The HVS block is supposed to fill the pixelvalve > > > > > > > > > > > > > > > FIFOs fast enough to > > > > > > > > > > > > > > > meet the requested framerate. The problem is, the > > > > > > > > > > > > > > > HVS and memory bus > > > > > > > > > > > > > > > bandwidths are limited, and if we don't take > > > > > > > > > > > > > > > these limitations into > > > > > > > > > > > > > > > account we might end up with HVS underflow errors. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch is trying to model the per-plane HVS > > > > > > > > > > > > > > > and memory bus bandwidth > > > > > > > > > > > > > > > consumption and take a decision at atomic_check() > > > > > > > > > > > > > > > time whether the > > > > > > > > > > > > > > > estimated load will fit in the HVS and membus > > > > > > > > > > > > > > > budget. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Note that we take an extra margin on the memory > > > > > > > > > > > > > > > bus consumption to let > > > > > > > > > > > > > > > the system run smoothly when other blocks are > > > > > > > > > > > > > > > doing heavy use of the > > > > > > > > > > > > > > > memory bus. Same goes for the HVS limit, except > > > > > > > > > > > > > > > the margin is smaller in > > > > > > > > > > > > > > > this case, since the HVS is not used by external > > > > > > > > > > > > > > > components. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Boris Brezillon > > > > > > > > > > > > > > > <boris.brezil...@bootlin.com> > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > This logic has been validated using a simple > > > > > > > > > > > > > > > shell script and > > > > > > > > > > > > > > > some instrumentation in the VC4 driver: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - capture underflow errors at the HVS level and > > > > > > > > > > > > > > > expose a debugfs file > > > > > > > > > > > > > > > reporting those errors > > > > > > > > > > > > > > > - add debugfs files to expose when atomic_check > > > > > > > > > > > > > > > fails because of the > > > > > > > > > > > > > > > HVS or membus load limitation or when it fails > > > > > > > > > > > > > > > for other reasons > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The branch containing those modification is > > > > > > > > > > > > > > > available here [1], and the > > > > > > > > > > > > > > > script (which is internally using modetest) is > > > > > > > > > > > > > > > here [2] (please note > > > > > > > > > > > > > > > that I'm bad at writing shell scripts :-)). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Note that those modification tend to > > > > > > > > > > > > > > > over-estimate the load, and thus > > > > > > > > > > > > > > > reject setups that might have previously worked, > > > > > > > > > > > > > > > so we might want to > > > > > > > > > > > > > > > adjust the limits to avoid that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval > > > > > > > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Any interest in using igt to test this stuff? We > > > > > > > > > > > > > > have at least a bunch of > > > > > > > > > > > > > > tests already in there that try all kinds of plane > > > > > > > > > > > > > > setups. And we use > > > > > > > > > > > > > > those to hunt for underruns on i915 hw. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Wrt underrun reporting: On i915 we just dump them > > > > > > > > > > > > > > into dmesg at the error > > > > > > > > > > > > > > level, using DRM_ERROR, > > > > > > > > > > > > > > > > > > > > > > > > > > Are you masking the underrun interrupt after it's > > > > > > > > > > > > > been reported? If we > > > > > > > > > > > > > don't do that on VC4 we just end up flooding the > > > > > > > > > > > > > kernel-log buffer until > > > > > > > > > > > > > someone comes and update the config. > > > > > > > > > > > > > > > > > > > > > > > > Yeah we do that too. Rule is that a full modeset will > > > > > > > > > > > > clear any underrun > > > > > > > > > > > > masking (so tests need to make sure they start with a > > > > > > > > > > > > modeset, or it'll be > > > > > > > > > > > > for nothing). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > plus a tracepoint. See e.g. > > > > > > > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's > > > > > > > > > > > > > > interest we could > > > > > > > > > > > > > > perhaps extract this into something common, similar > > > > > > > > > > > > > > to what was done with > > > > > > > > > > > > > > crc support already. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm not a big fan of hardcoded trace points in > > > > > > > > > > > > > general (because of the > > > > > > > > > > > > > whole "it's part of the stable ABI" thing), and in > > > > > > > > > > > > > this case, making the > > > > > > > > > > > > > tracepoint generic sounds even more risky to me. > > > > > > > > > > > > > Indeed, how can we know > > > > > > > > > > > > > about all the HW specific bits one might want to > > > > > > > > > > > > > expose. For instance, > > > > > > > > > > > > > I see the intel underrun tracepoint exposes a struct > > > > > > > > > > > > > with a frame and > > > > > > > > > > > > > scanline field, and AFAICT, we don't have such > > > > > > > > > > > > > information in the VC4 > > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > > > > > Any opinion on that? > > > > > > > > > > > > > > > > > > > > > > > > It's only abi if you're unlucky. If it's just for > > > > > > > > > > > > debugging and > > > > > > > > > > > > validation, you can change it again. Tbh, no idea why > > > > > > > > > > > > we even have these > > > > > > > > > > > > tracepoints, they're fairly useless imo. CI only relies > > > > > > > > > > > > upon the dmesg > > > > > > > > > > > > output. Maybe run git blame and ask the original > > > > > > > > > > > > author, we can probably > > > > > > > > > > > > update them to suit our needs. > > > > > > > > > > > > > > > > > > > > > > Okay, I think I'll go for a generic debugfs entry that > > > > > > > > > > > returns true > > > > > > > > > > > when an underrun error happened since the last modeset, > > > > > > > > > > > false otherwise. > > > > > > > > > > > > > > > > > > > > > > Next question is: should I attach the underrun status to > > > > > > > > > > > the drm_device > > > > > > > > > > > or have one per CRTC? In my case, I only care about the > > > > > > > > > > > "has an > > > > > > > > > > > underrun error occurred on any of the active CRTC" case, > > > > > > > > > > > so I'd vote for > > > > > > > > > > > a per-device underrun status. > > > > > > > > > > > > > > > > > > > > Yeah probably good enough. For our CI all we care about is > > > > > > > > > > the warn/error > > > > > > > > > > level dmesg output. Anything at that level is considered a > > > > > > > > > > CI failure. > > > > > > > > > > > > > > > > > > So igt is grepping dmesg to detect when an underrun happens? > > > > > > > > > > > > > > > > > > > > > > > > > No, but the CI runner is also observing dmesg. Anything in > > > > > > > > there at > > > > > > > > warning or higher level is considered a failure. > > > > > > > > > > > > > > Eric, do you do the same when you launch the IGT testsuite? > > > > > > > > > > > > > > > > > > > > > > > > > What do you need the debugfs file for? > > > > > > > > > > > > > > > > > > I just thought having a debugfs file to expose the underrun > > > > > > > > > information > > > > > > > > > would be cleaner than having to grep in dmesg to detect such > > > > > > > > > failures. > > > > > > > > > > > > > > > > The issue is that you want to detect underruns everywhere, not > > > > > > > > just in the > > > > > > > > specific tests you're checking for it. Anything that does a > > > > > > > > modeset could > > > > > > > > cause an underrun (at least we've managed to do so pretty much > > > > > > > > everywhere > > > > > > > > on i915 hw, if you misprogram is sufficiently). > > > > > > > > > > > > > > In my specific case, I want to have the IGT test check the > > > > > > > underrun > > > > > > > value while the test is being executed so that I know which exact > > > > > > > configuration triggers the underrun situation. At least that's > > > > > > > how I > > > > > > > did to adjust/debug the HVS load tracking code. Maybe it's not > > > > > > > really a > > > > > > > problem when all we do is tracking regressions. > > > > > > > > > > > > Ok, that makes sense, and explains why you want the overall underrun > > > > > > counter exposed in debugfs. > > > > > > > > > > Just one tiny detail which is not exactly related to this discussion > > > > > but I thought I'd mention it here: underrun is actually not a counter, > > > > > it's just a boolean. I used an atomic_t type to avoid having to add a > > > > > spinlock to protect the variable (the variable is modified from > > > > > an interrupt context and read from a non-atomic context). So, the > > > > > acceptable values for underrun are currently 0 or 1. I can make it a > > > > > counter if required. > > > > > > > > One idea I had a while back for i915 would be to count the number of > > > > frames that had an underrun. So basically something like this: > > > > > > > > underrun_irq() { > > > > underrun_frames=1 > > > > disable_underrun_irq(); > > > > vblank_get(); > > > > } > > > > vblank_irq() { > > > > if (underrun) { > > > > underrun_frames++; > > > > } else if (underrun_frames) { > > > > vblank_put(); > > > > enable_underrun_irq(); > > > > DEBUG("%d frames had an underrun\n", underrun_frames); > > > > underrun_frames=0; > > > > } > > > > } > > > > > > > > This avoids drowning in underrun interrupts while still > > > > reporting at least how many frames had problems. > > > > > > > > But I haven't had time to implement that yet. > > > > May I ask why you need to know how many frames had underrun issues? > > Much quicker testing when you're experimenting around with e.g. watermark > settings. Full modeset for resetting can easily take a few seconds. I > think we even had patches that restored the interrupt after 1ms already, > to have more accurate sampling to judge whether a given change made things > worse or better. We might have somewhat strange hw :-) > > > > The other upshot of a counter is that there's no problem with resetting. > > > Userspace simply grabs the counter before and after the test and compares. > > > If you only have 0/1 you need some way to reset, or otherwise automated > > > running in a CI farm isn't possible. > > > > The reset was done during atomic commit in my proposal, so no action > > was required on the user side apart from applying a new config. > > Anyway, I'll change that for a real counter. > > We want to measure underruns while doing full modesets too (specifically, > when shutting down the pipeline). At least for our hw this has > historically helped greatly in catching programming sequence issues. Looks like you have particular needs for intel HW which I don't know about, maybe I shouldn't be the one driving this rework... _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel