Am 15.08.2014 um 17:32 schrieb Grigori Goronzy: > On 15.08.2014 17:26, Alex Deucher wrote: >> On Fri, Aug 15, 2014 at 11:20 AM, Grigori Goronzy <greg at chown.ath.cx> >> wrote: >>> On 15.08.2014 16:11, Christian K?nig wrote: >>>> Hi Marco, >>>> >>>> the problem with an CS ioctl flag is that we sometimes don't know how >>>> much SCLK/MCLK boost is needed, for example when we do post processing >>>> in the player using OpenGL and UVD decoding with VDPAU. In this case >>>> VDPAU don't has the slightest idea how high SCLK/MCLK must be and so >>>> can't give that info to the kernel either. >>>> >>> Maybe it's an acceptable workaround to simply disable dynamic UVD state >>> selection in case the UVD states only have a single power level. That >>> will avoid the performance issues on affected systems, while still >>> allowing dynamic UVD states on systems that have a saner DPM table >>> setup. I think it is mosly older systems that suffer from this. >>> >> That is exactly what we do now. >> > Is it? In 3.17-wip, dynamic UVD state selection (according to active > streams) is still completely disabled. It will always use the generic > UVD state. In fact wasn't it reverted again because of the performance > issues on some systems?
This is the performance table of my laptop (at least the interesting parts), which I think is a typical example of the problem: [ 4.106772] == power state 1 == [ 4.106774] ui class: performance [ 4.106776] internal class: none [ 4.106780] uvd vclk: 0 dclk: 0 [ 4.106782] power level 0 sclk: 20000 vddc_index: 2 [ 4.106784] power level 1 sclk: 50000 vddc_index: 2 [ 4.106805] == power state 3 == [ 4.106807] ui class: none [ 4.106808] internal class: uvd [ 4.106813] uvd vclk: 55000 dclk: 40000 [ 4.106816] power level 0 sclk: 50000 vddc_index: 2 [ 4.106818] power level 1 sclk: 50000 vddc_index: 2 [ 4.106820] status: [ 4.106822] == power state 4 == [ 4.106823] ui class: battery [ 4.106825] internal class: uvd_hd [ 4.106831] uvd vclk: 40000 dclk: 30000 [ 4.106833] power level 0 sclk: 38000 vddc_index: 1 [ 4.106835] power level 1 sclk: 38000 vddc_index: 1 [ 4.106839] == power state 5 == [ 4.106841] ui class: battery [ 4.106843] internal class: uvd_sd [ 4.106848] uvd vclk: 40000 dclk: 30000 [ 4.106850] power level 0 sclk: 38000 vddc_index: 2 [ 4.106853] power level 1 sclk: 38000 vddc_index: 2 As you can see we currently always select the performance level uvd, which results in selecting the maximum sclk/dclk and vclk. Unfortunately neither uvd, uvd_sd nor uvd_hd allows the hardware to switch the sclk once selected (it's a hardware limitation of older uvd blocks). So for all cases where this is interesting you actually always have only a single power level to choose from. Christian. > > Grigori > >> Alex >> >> >>> Best regards >>> Grigori >>> >>>> Regards, >>>> Christian. >>>> >>>> Am 15.08.2014 um 15:21 schrieb Marco Benatto: >>>>> Hey all, >>>>> >>>>> I also had a talk with Alex yesterday about post-processing issues >>>>> when using dynamic UVD profiles and a chamge on CS ioctl >>>>> including a flag to let user mode driver tell to the kernel which >>>>> performance requirement it wants for post processing. A commom >>>>> point for both discussion is to stablish the default values for these >>>>> profiles, but probably this ioctl change would be more impacting/complex >>>>> to implement than a sysfs entry. >>>>> >>>>> If a sysfs entry is anough for now I can handle the code to create it >>>>> and, with your help, the code to setup the UVD profile requested >>>>> through it. >>>>> >>>>> Is there any suggestion? >>>>> >>>>> Thanks all for your help, >>>>> >>>>> >>>>> On Fri, Aug 15, 2014 at 5:48 AM, Christian K?nig >>>>> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote: >>>>> >>>>> Hi guys, >>>>> >>>>> to make a long story short every time I watch a movie my laptop >>>>> start to heat up because we always select the standard UVD power >>>>> profile without actually measuring if that is necessary. >>>>> >>>>> Marco came up with a patch that seems to reliable measure the fps >>>>> send down to the kernel and so together with knowing the frame >>>>> size of the video should allow us to select the right UVD power >>>>> profile. >>>>> >>>>> The problem is that Alex (unnoticed by me) completely disabled >>>>> selecting the UVD profiles because of some issues with advanced >>>>> post processing discussed on IRC. The problem seems to be that the >>>>> lower UVD profiles have a to low SCLK/MCLK to handle the 3D load >>>>> that comes with scaling, deinterlacing etc... >>>>> >>>>> I unfortunately don't have time for it, cause this only affects >>>>> the hardware generations R600-SI and not the newest one CIK. So >>>>> could you guys stick together and come up with a solution? >>>>> Something like a sysfs entry that let's us select the minimum UVD >>>>> power level allowed? >>>>> >>>>> I think Marco is happy to come up with a patch, we just need to >>>>> know what's really needed and what should be the default values. >>>>> I'm happy to review everything that comes out of it, just don't >>>>> have time to do it myself. >>>>> >>>>> Happy discussion and thanks in advance, >>>>> Christian. >>>>> >>>>> Am 12.08.2014 um 15:05 schrieb Alex Deucher: >>>>> >>>>> On Tue, Aug 12, 2014 at 6:00 AM, Christian K?nig >>>>> <deathsimple at vodafone.de <mailto:deathsimple at vodafone.de>> >>>>> wrote: >>>>> >>>>> Am 11.08.2014 um 16:52 schrieb Alex Deucher: >>>>> >>>>> On Mon, Aug 11, 2014 at 5:08 AM, Christian K?nig >>>>> <deathsimple at vodafone.de >>>>> <mailto:deathsimple at vodafone.de>> wrote: >>>>> >>>>> Am 07.08.2014 um 21:43 schrieb Alex Deucher: >>>>> >>>>> On Thu, Aug 7, 2014 at 11:32 AM, Christian K?nig >>>>> <deathsimple at vodafone.de >>>>> <mailto:deathsimple at vodafone.de>> wrote: >>>>> >>>>> Am 07.08.2014 um 16:32 schrieb Alex Deucher: >>>>> >>>>> On Thu, Aug 7, 2014 at 7:33 AM, >>>>> Christian K?nig >>>>> <deathsimple at vodafone.de >>>>> <mailto:deathsimple at vodafone.de>> >>>>> wrote: >>>>> >>>>> From: Marco A Benatto >>>>> <marco.antonio.780 at gmail.com >>>>> <mailto:marco.antonio.780 at >>>>> gmail.com>> >>>>> >>>>> Adding a Frames Per Second >>>>> estimation logic on UVD handles >>>>> when it has being used. This >>>>> estimation is per handle basis >>>>> and will help on DPM profile >>>>> calculation. >>>>> >>>>> v2 (chk): fix timestamp type, move >>>>> functions around and >>>>> cleanup code a bit. >>>>> >>>>> Will this really help much? I thought >>>>> the problem was mainly due to >>>>> sclk and mclk for post processing. >>>>> >>>>> >>>>> It should at least handle the UVD side for >>>>> upclocking when you get a >>>>> lot >>>>> of >>>>> streams / fps. And at on my NI the patch >>>>> seems to do exactly that. >>>>> >>>>> Switching sclk and mclk for post >>>>> processing is a different task, and I >>>>> actually have no idea what to do with them. >>>>> >>>>> At this point we always choose the plain UVD >>>>> state anyway so this >>>>> patch would only take effect if we re-enabled >>>>> the dynamic UVD state >>>>> selection. >>>>> >>>>> >>>>> Hui? I thought we already re-enabled the dynamic >>>>> UVD state selection, but >>>>> double checking this I found it disabled again. >>>>> >>>>> What was the problem with that? Looks like I >>>>> somehow missed the >>>>> discussion >>>>> around it. >>>>> >>>>> We did, but after doing so a number of people >>>>> complained about a >>>>> regression on IRC because when apps like xmbc enabled >>>>> post processing, >>>>> performance went down. >>>>> >>>>> >>>>> That's strange, from my experience the different UVD >>>>> performance states only >>>>> affect UVDs dclk/vclk, not sclk/mclk. I need to get the >>>>> DPM dumps to >>>>> confirms this. >>>>> >>>>> The sclks and mclks are usually different as well, especially >>>>> on APUs. >>>>> I can send you some examples. >>>>> >>>>> You not off hand remember who complained on IRC? Finding >>>>> something in the >>>>> IRC logs is like searching for a needle in a haystack. >>>>> >>>>> I don't remember off hand. I think zgreg was involved in some >>>>> of the >>>>> discussions. >>>>> >>>>> Alex >>>>> >>>>> Thanks, >>>>> Christian. >>>>> >>>>> >>>>> Alex >>>>> >>>>> >>>>> Christian. >>>>> >>>>> >>>>> For the post processing, we probably need a >>>>> hint we can >>>>> pass to the driver in the CS ioctl to denote >>>>> what state we need. >>>>> Although if we did that, this could would >>>>> largely be moot. That said, >>>>> newer asics support dynamic UVD clocks so we >>>>> really only need >>>>> something like that for older asics and I >>>>> guess VCE. >>>>> >>>>> Alex >>>>> >>>>> Christian. >>>>> >>>>> >>>>> Alex >>>>> >>>>> Signed-off-by: Marco A Benatto >>>>> <marco.antonio.780 at gmail.com >>>>> <mailto:marco.antonio.780 at >>>>> gmail.com>> >>>>> Signed-off-by: Christian K?nig >>>>> <christian.koenig at amd.com >>>>> <mailto:christian.koenig at amd.com>> >>>>> --- >>>>> >>>>> drivers/gpu/drm/radeon/radeon.h >>>>> | 10 ++++++ >>>>> >>>>> drivers/gpu/drm/radeon/radeon_uvd.c >>>>> | 64 >>>>> +++++++++++++++++++++++++++++++++---- >>>>> 2 files changed, 68 >>>>> insertions(+), 6 deletions(-) >>>>> >>>>> diff --git >>>>> a/drivers/gpu/drm/radeon/radeon.h >>>>> b/drivers/gpu/drm/radeon/radeon.h >>>>> index 9e1732e..e92f6cb 100644 >>>>> --- a/drivers/gpu/drm/radeon/radeon.h >>>>> +++ b/drivers/gpu/drm/radeon/radeon.h >>>>> @@ -1617,6 +1617,15 @@ int >>>>> radeon_pm_get_type_index(struct >>>>> radeon_device >>>>> *rdev, >>>>> #define >>>>> RADEON_UVD_STACK_SIZE (1024*1024) >>>>> #define RADEON_UVD_HEAP_SIZE >>>>> (1024*1024) >>>>> >>>>> +#define RADEON_UVD_FPS_EVENTS_MAX 8 >>>>> +#define RADEON_UVD_DEFAULT_FPS 60 >>>>> + >>>>> +struct radeon_uvd_fps { >>>>> + uint64_t timestamp; >>>>> + uint8_t event_index; >>>>> + uint8_t >>>>> events[RADEON_UVD_FPS_EVENTS_MAX]; >>>>> +}; >>>>> + >>>>> struct radeon_uvd { >>>>> struct radeon_bo >>>>> *vcpu_bo; >>>>> void >>>>> *cpu_addr; >>>>> @@ -1626,6 +1635,7 @@ struct >>>>> radeon_uvd { >>>>> struct drm_file >>>>> *filp[RADEON_MAX_UVD_HANDLES]; >>>>> unsigned >>>>> img_size[RADEON_MAX_UVD_HANDLES]; >>>>> struct delayed_work >>>>> idle_work; >>>>> + struct radeon_uvd_fps >>>>> fps_info[RADEON_MAX_UVD_HANDLES]; >>>>> }; >>>>> >>>>> int radeon_uvd_init(struct >>>>> radeon_device *rdev); >>>>> diff --git >>>>> a/drivers/gpu/drm/radeon/radeon_uvd.c >>>>> b/drivers/gpu/drm/radeon/radeon_uvd.c >>>>> index 6bf55ec..ef5667a 100644 >>>>> --- >>>>> a/drivers/gpu/drm/radeon/radeon_uvd.c >>>>> +++ >>>>> b/drivers/gpu/drm/radeon/radeon_uvd.c >>>>> @@ -237,6 +237,51 @@ void >>>>> >>>>> radeon_uvd_force_into_uvd_segment(struct >>>>> radeon_bo *rbo) >>>>> rbo->placement.lpfn = >>>>> (256 * 1024 * 1024) >> PAGE_SHIFT; >>>>> } >>>>> >>>>> +static void >>>>> radeon_uvd_fps_clear_events(struct >>>>> radeon_device *rdev, >>>>> int >>>>> idx) >>>>> +{ >>>>> + struct radeon_uvd_fps *fps >>>>> = &rdev->uvd.fps_info[idx]; >>>>> + unsigned i; >>>>> + >>>>> + fps->timestamp = jiffies_64; >>>>> + fps->event_index = 0; >>>>> + for (i = 0; i < >>>>> RADEON_UVD_FPS_EVENTS_MAX; i++) >>>>> + fps->events[i] = 0; >>>>> +} >>>>> + >>>>> +static void >>>>> radeon_uvd_fps_note_event(struct >>>>> radeon_device *rdev, >>>>> int >>>>> idx) >>>>> +{ >>>>> + struct radeon_uvd_fps *fps >>>>> = &rdev->uvd.fps_info[idx]; >>>>> + uint64_t timestamp = >>>>> jiffies_64; >>>>> + unsigned rate = 0; >>>>> + >>>>> + uint8_t index = >>>>> fps->event_index++; >>>>> + fps->event_index %= >>>>> RADEON_UVD_FPS_EVENTS_MAX; >>>>> + >>>>> + rate = div64_u64(HZ, >>>>> max(timestamp - fps->timestamp, >>>>> 1ULL)); >>>>> + >>>>> + fps->timestamp = timestamp; >>>>> + fps->events[index] = >>>>> min(rate, 120u); >>>>> +} >>>>> + >>>>> +static unsigned >>>>> radeon_uvd_estimate_fps(struct >>>>> radeon_device *rdev, >>>>> int >>>>> idx) >>>>> +{ >>>>> + struct radeon_uvd_fps *fps >>>>> = &rdev->uvd.fps_info[idx]; >>>>> + unsigned i, valid = 0, >>>>> count = 0; >>>>> + >>>>> + for (i = 0; i < >>>>> RADEON_UVD_FPS_EVENTS_MAX; i++) { >>>>> + /* We should >>>>> ignore zero values */ >>>>> + if (fps->events[i] >>>>> != 0) { >>>>> + count += >>>>> fps->events[i]; >>>>> + valid++; >>>>> + } >>>>> + } >>>>> + >>>>> + if (valid > 0) >>>>> + return count / valid; >>>>> + else >>>>> + return >>>>> RADEON_UVD_DEFAULT_FPS; >>>>> +} >>>>> + >>>>> void >>>>> radeon_uvd_free_handles(struct >>>>> radeon_device *rdev, struct >>>>> drm_file *filp) >>>>> { >>>>> int i, r; >>>>> @@ -419,8 +464,10 @@ static int >>>>> radeon_uvd_cs_msg(struct >>>>> radeon_cs_parser >>>>> *p, struct radeon_bo *bo, >>>>> >>>>> /* create or decode, >>>>> validate the handle */ >>>>> for (i = 0; i < >>>>> RADEON_MAX_UVD_HANDLES; ++i) { >>>>> - if >>>>> >>>>> (atomic_read(&p->rdev->uvd.handles[i]) >>>>> == handle) >>>>> + if >>>>> >>>>> (atomic_read(&p->rdev->uvd.handles[i]) >>>>> == handle) >>>>> { >>>>> + >>>>> radeon_uvd_fps_note_event(p->rdev, >>>>> i); >>>>> return 0; >>>>> + } >>>>> } >>>>> >>>>> /* handle not found >>>>> try to alloc a new one */ >>>>> @@ -428,6 +475,7 @@ static int >>>>> radeon_uvd_cs_msg(struct >>>>> radeon_cs_parser >>>>> *p, struct radeon_bo *bo, >>>>> if >>>>> >>>>> (!atomic_cmpxchg(&p->rdev->uvd.handles[i], >>>>> 0, >>>>> handle)) { >>>>> >>>>> p->rdev->uvd.filp[i] = p->filp; >>>>> >>>>> p->rdev->uvd.img_size[i] = img_size; >>>>> + >>>>> radeon_uvd_fps_clear_events(p->rdev, >>>>> i); >>>>> return 0; >>>>> } >>>>> } >>>>> @@ -763,7 +811,7 @@ int >>>>> radeon_uvd_get_destroy_msg(struct >>>>> radeon_device >>>>> *rdev, int ring, >>>>> static void >>>>> radeon_uvd_count_handles(struct >>>>> radeon_device *rdev, >>>>> >>>>> unsigned *sd, unsigned *hd) >>>>> { >>>>> - unsigned i; >>>>> + unsigned i, fps_rate = 0; >>>>> >>>>> *sd = 0; >>>>> *hd = 0; >>>>> @@ -772,10 +820,13 @@ static void >>>>> radeon_uvd_count_handles(struct >>>>> radeon_device *rdev, >>>>> if >>>>> (!atomic_read(&rdev->uvd.handles[i])) >>>>> continue; >>>>> >>>>> - if >>>>> (rdev->uvd.img_size[i] >= 720*576) >>>>> - ++(*hd); >>>>> - else >>>>> - ++(*sd); >>>>> + fps_rate = >>>>> radeon_uvd_estimate_fps(rdev, i); >>>>> + >>>>> + if >>>>> (rdev->uvd.img_size[i] >= 720*576) { >>>>> + (*hd) += >>>>> fps_rate > 30 ? 1 : 2; >>>>> + } else { >>>>> + (*sd) += >>>>> fps_rate > 30 ? 1 : 2; >>>>> + } >>>>> } >>>>> } >>>>> >>>>> @@ -805,6 +856,7 @@ void >>>>> radeon_uvd_note_usage(struct >>>>> radeon_device >>>>> *rdev) >>>>> set_clocks &= >>>>> >>>>> schedule_delayed_work(&rdev->uvd.idle_work, >>>>> >>>>> >>>>> msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS)); >>>>> >>>>> + >>>>> if >>>>> ((rdev->pm.pm_method == >>>>> PM_METHOD_DPM) && >>>>> rdev->pm.dpm_enabled) { >>>>> unsigned hd = >>>>> 0, sd = 0; >>>>> >>>>> radeon_uvd_count_handles(rdev, >>>>> &sd, &hd); >>>>> -- >>>>> 1.9.1 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Marco Antonio Benatto >>>>> Linux user ID:#506236 >>> >