On Thu, 02 Nov 2017, Joonas Lahtinen <joonas.lahti...@linux.intel.com> wrote:
> On Thu, 2017-11-02 at 07:47 -0700, Rodrigo Vivi wrote:
>> On Thu, Nov 02, 2017 at 08:06:29AM +0000, Jani Nikula wrote:
>> > On Wed, 01 Nov 2017, Rodrigo Vivi <rodrigo.v...@intel.com> wrote:
>> > > On Wed, Nov 01, 2017 at 04:21:08PM +0000, Ben Widawsky wrote:
>> > > > On 17-11-01 18:09:47, Joonas Lahtinen wrote:
>> > > > > + Kimmo and Paul
>> > > > >
>> > > > > On Wed, 2017-11-01 at 07:43 -0700, Ben Widawsky wrote:
>> > > > > > On 17-11-01 14:07:28, Joonas Lahtinen wrote:
>> > > > > > > On Mon, 2017-10-30 at 10:48 -0700, Rodrigo Vivi wrote:
>> > > > > > > > On Mon, Oct 30, 2017 at 01:00:51PM +0000, David Weinehall wrote:
>> > > > > > > > > On Fri, Oct 27, 2017 at 01:57:09PM -0700, Daniele Ceraolo Spurio wrote:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On 26/10/17 03:32, Chris Wilson wrote:
>> > > > > > > > > > > It has been many years since the last confirmed sighting (and fix) of an
>> > > > > > > > > > > RC6 related bug (usually a system hang). Remove the parameter to stop
>> > > > > > > > > > > users from setting dangerous values, as they often set it during triage
>> > > > > > > > > > > and end up disabling the entire runtime pm instead (the option is not a
>> > > > > > > > > > > fine scalpel!).
>> > > > > > > > > > >
>> > > > > > > > > > > Furthermore, it allows users to set known dangerous values which were
>> > > > > > > > > > > intended for testing and not for production use. For testing, we can
>> > > > > > > > > > > always patch in the required setting without having to expose ourselves
>> > > > > > > > > > > to random abuse.
>> > > > > > > > > > >
>> > > > > > > > > > > v2: Fixup NEEDS_WaRsDisableCoarsePowerGating fumble, and document the
>> > > > > > > > > > > lack of ilk support better.
>> > > > > > > > > > > v3: Clear intel_info->rc6p if we don't support rc6 itself.
>> > > > > > > > > > >
>> > > > > > > > > > > Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
>> > > > > > > > > > > Cc: Rodrigo Vivi <rodrigo.v...@intel.com>
>> > > > > > > > > > > Cc: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
>> > > > > > > > > > > Cc: Jani Nikula <jani.nik...@intel.com>
>> > > > > > > > > > > Cc: Imre Deak <imre.d...@intel.com>
>> > > > > > > > > > > Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
>> > > > > > > > > > > Acked-by: Daniel Vetter <daniel.vet...@ffwll.ch>
>> > > > > > > > > > > ---
>> > > > > > > > > >
>> > > > > > > > > > I think that for execution/debug on early silicon we might still want the
>> > > > > > > > > > ability to turn features like RC6 off. Maybe we can add a debug kconfig to
>> > > > > > > > > > force info->has_rc6 = 0? Not a blocker to this patch but worth considering
>> > > > > > > > > > IMO.
>> > > > > > > > >
>> > > > > > > > > Most of the BIOSes I've seen on our RVPs have had an option to disable
>> > > > > > > > > RC6.
>> > > > > > > >
>> > > > > > > > BIOS option don't block our code to run and set some MMIOs.
>> > > > > > > > Not sure how the GPU will behave on such cases.
>> > > > > > > >
>> > > > > > > > I like the idea of removing some and keeping the parameters clean.
>> > > > > > > > But there are few ones like RC6 and disable_power_wells that are very
>> > > > > > > > useful on platform enabling and also when assisting others to debug issues.
>> > > > > > > >
>> > > > > > > > For instance right now that we fixed RC6 on CNL someone told that
>> > > > > > > > he believe seeing more hangs, so I immediately asked to boot with
>> > > > > > > > i915.enable_rc6=0 to double check. It is easier and straighforward
>> > > > > > > > to direct them to the unsafe param than to ask them to compile the code
>> > > > > > > > with different options or to use some BIOS options that we are not sure.
>> > > > > > > >
>> > > > > > > > Also on bug triage some options like this are helpful.
>> > > > > > > >
>> > > > > > > > Also BIOS and compile are saved flags. So if you need to do a quick test
>> > > > > > > > you have to save it, and then unsave later. Parameters are very convinient
>> > > > > > > > for 1 boot only check.
>> > > > > > >
>> > > > > > > It's convenient for sure, but the unsafe module parameters seems to be
>> > > > > > > finding their way into way too many HOWTOs, and from there to some
>> > > > > > > "productized" use-cases. Chris states that setting .enable_rc6=0 to
>> > > > > > > solving an issue on publicly shipping products has been some years ago,
>> > > > > > > so I don't see a need for carrying this.
>> > > > > > >
>> > > > > > > We shouldn't allow the convenience of not having to change one line and
>> > > > > > > recompile kernel during development to affect the end-users who are
>> > > > > > > Googling how to get the best performance out of their hardware (I could
>> > > > > > > mention some distro here).
>> > > > > > >
>> > > > > > > This seems the like the best option as I don't think introducing kernel
>> > > > > > > parameters that only exists on debug builds would be too convenient
>> > > > > > > either. It'd maybe just add more confusion.
>> > > > > > >
>> > > > > > > Regards, Joonas
>> > > > > >
>> > > > > > I believe the ability to disable RC6 is valuable not just for debugging
>> > > > > > purposes. Folks with very latency sensitive workloads are often willing to
>> > > > > > forego power savings. The real problem I see is that we don't test without rc6
>> > > > > > in our setup, which indeed makes it unsafe. As such, I see the other option here
>> > > > > > going back to the ability to toggle rc6 after load (either module parameter, or
>> > > > > > make it sysfs), and actually run some subset of our workloads with RC6. I
>> > > > > > suspect people will poop on that suggestion, but I figured I'd mention.
>> > > > >
>> > > > > I agree there, but by my understanding there's really no ask to support
>> > > > > the feature in upstream. And the original motive from Chris to drop the
>> > > > > feature is that it's unsafe as it currently is.
>> > > > >
>> > > > > So unless we've got the resources to bring it back from the unsafe
>> > > > > zone, I think we should drop it like this patch proposes.
>> > > > >
>> > > > > Regards, Joonas
>> > > >
>> > > > Yep, I agree. One other option would be to move i915_forcewake_user to sysfs and
>> > > > let them use that.
>> > >
>> > > Well, I won't try to block that. I just put my 2 cents that I believe it is a very
>> > > useful parameter.
>> > >
>> > > It wasn't that long ago the last time that we needed the flag to allow
>> > > end users to have a functional machine:
>> > > https://plus.google.com/+JonMasters/posts/BqWLEjenLKv.
>> > >
>> > > or to debug a related issue:
>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1440988
>> > > https://bugzilla.kernel.org/show_bug.cgi?id=116431
>> > >
>> > > Although date on few seems over than 1 year. We need to consider that
>> > > that was our latest new gpu... gen9.
>> > >
>> > > If products are recommending the use of enable_rc6=0 I can see they
>> > > adding the patch to disable that. Effect is the same and our convenience
>> > > is gone.
>> > >
>> > > But again, just my view here. Not a nack ;)
>> >
>> > I suppose the compromise would be to make it a boolean module parameter
>> > to only allow disabling rc6 on platforms where it's enabled by default,
>> > but not letting you enable rc6 where it's disabled by default. I.e. only
>> > support i915.enable_rc6=0 to be passed by the user.
>>
>> +1. I like this approach.
>
> Umm, it still doesn't resolve the issue that it's not being tested.
>
> I try to be super clear; until we have resources to support that
> specific code path, I'd much prefer not to have an easy kernel
> parameter to set it.

It resolves the worst part of the issue: people enabling rc6 where it's
known not to work.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx