Re: Another drm/dri lockup - when moving the mouse

2005-03-16 Thread Roland Scheidegger
Helge Hafting wrote:
I have reported this before, but now I have some more data.
I have an office pc with this video card:
VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 
7000/VE]

In previous reports I found that starting xfree or xorg with dri support
cause a hang after a little while.  It seems that this only happens when
the mouse moves.  Something I didn't discover before because there
are lots of unplanned mouse movements - the thing is sensitive and jumps
a pixel now and then when I move stuff on the desk.
What xorg / xfree / drm versions are you talking about?
Taking care not to move the mouse, I can start X and run glxgears
with acceleration.  The slightest mouse movement during 3D activity
kills the machine instantly so it only responds to the reset button.  Mouse
movement without 3D activity may or may not kill the pc.
Could there be a problem where 3D-stuff and code to move the mouse
"steps on each other toes" somehow?  Or some way to test this further,
by disabling the mouse or force some kind of software fallback for
the mouse cursor?
You could use Option "SWcursor" "true".
Since it crashes even without 3d sometimes, the problem does not seem to 
be related to dri (as in, dri driver). Sounds more like it's related to 
CP activity. Not sure what would cause this, there seem to be a lot of 
mouse cursor movement crashes reported lately... Do you have a USB mouse 
whose controller shares the IRQ with the graphic card maybe?

Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Another drm/dri lockup - when moving the mouse

2005-03-16 Thread Roland Scheidegger
Helge Hafting wrote:
Since it crashes even without 3d sometimes, the problem does not
seem to be related to dri (as in, dri driver).

Stable as rock, _if_  Load "dri" is commented out from xorg.conf (or
 from XF86Config-4)
Well, commenting that out makes the 2d ddx driver not to use the CP, drm
etc.
Almost sounds like a hardware issue to me, very few people have problems 
when only using 2d (dri loaded or not) with the radeon driver (as far as 
I can tell). You could try using pci gart instead of agp, if there's a 
problem with your agp bridge (not sure if that would help though in that 
case), Option "BusType" "PCI" (old xorg versions might not understand 
that option however, they might have a boolean "ForcePCIMode" option 
instead).

Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] input: fix aux port detection with some i8042 chips

2007-05-03 Thread Roland Scheidegger
From: Roland Scheidegger <[EMAIL PROTECTED]>

The i8042 driver fails detection of the AUX port with some chips,
because they apparently do not change the I8042_CTR_AUXDIS bit
immediately. This is known to affect at least HP500 / HP510 notebooks,
consequently the built-in touchpad will not work. The patch will simply
reread the value until it gets the expected value or a retry limit is
hit, without touching other workaround code in the same area.

Signed-off-by: Roland Scheidegger <[EMAIL PROTECTED]>

---
There is some discussion about non-working touchpads in HP500 notebooks
in ubuntu and a (ugly) workaround for this problem here:
http://ubuntuforums.org/showthread.php?t=344103. I've got a HP510 and
even with 2.6.21 the aux port would get disabled. Works with the patch,
for the record the i8042 here needs around 6 tries (sometimes a bit
more, sometimes less) until it reads the I8042_CTR_AUXDIS bit correctly,
both after disabling and enabling the aux port.
(please CC: on any replies)

Signed-off-by: Roland Scheidegger <[EMAIL PROTECTED]>

--- linux-2.6/drivers/input/serio/i8042.c.orig  2007-05-03
16:32:26.0 +0200
+++ linux-2.6/drivers/input/serio/i8042.c   2007-05-03 16:56:00.0
+0200
@@ -537,6 +537,7 @@ static int __devinit i8042_check_aux(voi
int retval = -1;
int irq_registered = 0;
int aux_loop_broken = 0;
+   int i = 0;
unsigned long flags;
unsigned char param;

@@ -582,14 +583,27 @@ static int __devinit i8042_check_aux(voi

if (i8042_command(¶m, I8042_CMD_AUX_DISABLE))
return -1;
-   if (i8042_command(¶m, I8042_CMD_CTL_RCTR) || (~param &
I8042_CTR_AUXDIS)) {
+   /* some chips need some time to set the I8042_CTR_AUXDIS bit */
+   for (i = 0; i < 100; i++) {
+   if (!i8042_command(¶m, I8042_CMD_CTL_RCTR) && (param &
I8042_CTR_AUXDIS))
+   break;
+   udelay(50);
+   }
+   if (i == 100) {
printk(KERN_WARNING "Failed to disable AUX port, but continuing
anyway... Is this a SiS?\n");
printk(KERN_WARNING "If AUX port is really absent please use the
'i8042.noaux' option.\n");
}

if (i8042_command(¶m, I8042_CMD_AUX_ENABLE))
return -1;
-   if (i8042_command(¶m, I8042_CMD_CTL_RCTR) || (param &
I8042_CTR_AUXDIS))
+   for (i = 0; i < 100; i++) {
+   if (i8042_command(¶m, I8042_CMD_CTL_RCTR))
+   return -1;
+   if (~param & I8042_CTR_AUXDIS)
+   break;
+   udelay(50);
+   }
+   if (i == 100)
return -1;

 /*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/vmwgfx: fix spelling mistake "Cant" -> "Can't"

2020-08-10 Thread Roland Scheidegger
Thanks, I've put the fix in the vmwgfx-next branch.

Roland

Am 10.08.20 um 12:04 schrieb Colin King:
> From: Colin Ian King 
> 
> There is a spelling mistake in a DRM_ERROR message. Fix it.
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index bbce45d142aa..471836672312 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -186,7 +186,7 @@ void vmw_kms_cursor_snoop(struct vmw_surface *srf,
>   /* TODO handle none page aligned offsets */
>   /* TODO handle more dst & src != 0 */
>   /* TODO handle more then one copy */
> - DRM_ERROR("Cant snoop dma request for cursor!\n");
> + DRM_ERROR("Can't snoop dma request for cursor!\n");
>   DRM_ERROR("(%u, %u, %u) (%u, %u, %u) (%ux%ux%u) %u %u\n",
> box->srcx, box->srcy, box->srcz,
> box->x, box->y, box->z,
> 



Re: [vmwgfx] Xwayland crash on latest linus git

2020-08-19 Thread Roland Scheidegger
I'm now able to reproduce this, looking into it. (The crash looks
actually similar to what was also happening with the next commit,
drm/ttm: make TT creation purely optional v3, but that got reverted
already.)

Roland


Am 19.08.20 um 09:47 schrieb 허종만:
> 
> Hi,
> 
> 
> I'm running Linux guest OS (Fedora 33 + custom build kernel) on Windows 10 
> host, with VMWare workstation.
> 
> Kernel 5.8 was fine, but latest linus git kernel has a problem, Xwayland 
> crashes. 
> 
> Gnome login screen is Ok, but when I try login, crash occurs.
> 
> I just see black screen, cannot see graphic screen after the login screen.
> 
> 
>  * If I disable 3D accelaration from VMWare setting, then this issue doesn't 
> occur
> 
> 
> git bisect log follows;
> 
> =
> # git bisect log
> git bisect start '--' 'drivers/gpu/drm'
> # bad: [18445bf405cb331117bc98427b1ba6f12418ad17] Merge tag 
> 'spi-fix-v5.9-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
> git bisect bad 18445bf405cb331117bc98427b1ba6f12418ad17
> # good: [bcf876870b95592b52519ed4aafcf9d95999bc9c] Linux 5.8
> git bisect good bcf876870b95592b52519ed4aafcf9d95999bc9c
> # good: [026c396b41a4c9412f6f2b6496077949ea99e3ad] drm/amdgpu: add ih ip 
> block for navy_flounder
> git bisect good 026c396b41a4c9412f6f2b6496077949ea99e3ad
> # bad: [395be0f47a001a975d25a3e2a0bfe4f08ee193fa] Merge tag 
> 'drm/tegra/for-5.9-rc1' of ssh://git.freedesktop.org/git/tegra/linux into 
> drm-next
> git bisect bad 395be0f47a001a975d25a3e2a0bfe4f08ee193fa
> # good: [d524b87f77364db096855d7eb714ffacec974ddf] drm/i915: Update 
> DRIVER_DATE to 20200702
> git bisect good d524b87f77364db096855d7eb714ffacec974ddf
> # bad: [33c6855bda1b39eae88f3730d2baddce3bfd] Merge tag 
> 'drm-misc-next-2020-07-16' of git://anongit.freedesktop.org/drm/drm-misc into 
> drm-next
> git bisect bad 33c6855bda1b39eae88f3730d2baddce3bfd
> # bad: [247c12fcba30ecb9473fe6b6c0375d46916417c9] drm/vc4: Reorder the bind 
> order of the devices
> git bisect bad 247c12fcba30ecb9473fe6b6c0375d46916417c9
> # bad: [04f08f8831779e91fc59570dbbd3b7815c229d92] drm/ingenic-drm: drop use 
> of legacy drm_bus_flags
> git bisect bad 04f08f8831779e91fc59570dbbd3b7815c229d92
> # good: [60e9eabf41fa916d2ef68c5bf929197975917578] Backmerge remote-tracking 
> branch 'drm/drm-next' into drm-misc-next
> git bisect good 60e9eabf41fa916d2ef68c5bf929197975917578
> # bad: [35205ee9ba46dcb2a82dbb981ec5fb242c4d847d] drm/i915: Send hotplug 
> event if edid had changed
> git bisect bad 35205ee9ba46dcb2a82dbb981ec5fb242c4d847d
> # bad: [3f1f6981afed9fa21efa12ce396b35ca684b8a29] drm: pl111: Credit where 
> credit is due
> git bisect bad 3f1f6981afed9fa21efa12ce396b35ca684b8a29
> # bad: [2ddef17678bc2ea1d20517dd2b4ed4aa967ffa8b] drm/ttm: make TT creation 
> purely optional v3
> git bisect bad 2ddef17678bc2ea1d20517dd2b4ed4aa967ffa8b
> # bad: [58e4d686d456c3e356439ae160ff4a0728940b8e] drm/ttm: cleanup 
> ttm_mem_type_manager_func.get_node interface v3
> git bisect bad 58e4d686d456c3e356439ae160ff4a0728940b8e
> # first bad commit: [58e4d686d456c3e356439ae160ff4a0728940b8e] drm/ttm: 
> cleanup ttm_mem_type_manager_func.get_node interface v3
> =
> 
> But I can't revert first bad commit, 58e4d686d456, due to conflict. So I 
> couldn't test it with the commit reverted.
> Also, there was another issue, boot crash [1], which is now fixed. 
> But I'm afraid it affects git bisect result.
> 
> [1] 
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Farm-kernel%2Fmsg828262.html&data=02%7C01%7Csroland%40vmware.com%7C47c14d644bc2401449bf08d8441419ba%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637334200428210022&sdata=uky0qUPe9dYnChyx9aHzOyHg1a12xaLQIUmMqOkigfk%3D&reserved=0
> 
> Among above bad commits, followings are marked bad due to the boot crash, not 
> the Xwayland crash.
> 
> 395be0f47a001a975d25a3e2a0bfe4f08ee193fa
> 33c6855bda1b39eae88f3730d2baddce3bfd
> 247c12fcba30ecb9473fe6b6c0375d46916417c9
> 04f08f8831779e91fc59570dbbd3b7815c229d92
> 35205ee9ba46dcb2a82dbb981ec5fb242c4d847d
> 3f1f6981afed9fa21efa12ce396b35ca684b8a29
> 2ddef17678bc2ea1d20517dd2b4ed4aa967ffa8b
> 
> 
> And following is journalctl log.
> 
> ===
> Aug 14 15:26:39 localhost.localdomain org.gnome.Shell.desktop[1093]: VMware: 
> vmw_ioctl_command error Cannot allocate memory.
> Aug 14 15:26:39 localhost.localdomain org.gnome.Shell.desktop[1093]: (EE)
> Aug 14 15:26:39 localhost.localdomain org.gnome.Shell.desktop[1093]: (EE) 
> Backtrace:
> Aug 14 15:26:39 localhost.localdomain org.gnome.Shell.desktop[1093]: (EE) 0: 
> /usr/bin/Xwayland (XNFreallocarray+0xb9) [0x562e51c93bb>
> Aug 14 15:26:39 localhost.localdomain org.gnome.Shell.desktop[1093]: (EE) 1: 
> /lib64/libp

Re: [RESEND 00/53] Rid GPU from W=1 warnings

2021-03-05 Thread Roland Scheidegger
The vmwgfx ones look all good to me, so for
23-53: Reviewed-by: Roland Scheidegger 
That said, they were already signed off by Zack, so not sure what
happened here.

Roland

On 03.03.21 14:42, Lee Jones wrote:
> This is a resend.  All of these patches have been sent before.
> 
> The vmwgfx ones were even applied, but were dropped for some reason.
> 
> Lee Jones (53):
>   drm/nouveau/nvkm/subdev/bios/init: Demote obvious abuse of kernel-doc
>   drm/nouveau/dispnv50/disp: Remove unused variable 'ret'
>   drm/msm/dp/dp_display: Remove unused variable 'hpd'
>   drm/amd/display/dc/bios/command_table: Remove unused variable
>   include: drm: drm_atomic: Make use of 'new_plane_state'
>   drm/nouveau/nvkm/subdev/volt/gk20a: Demote non-conformant kernel-doc
> headers
>   drm/amd/display/dc/bios/command_table: Remove unused variable and
> associated comment
>   drm/amd/display/dc/calcs/dce_calcs: Move some large variables from the
> stack to the heap
>   drm/amd/display/dc/calcs/dce_calcs: Remove some large variables from
> the stack
>   drm/amd/display/dc/dce/dce_aux: Remove duplicate line causing 'field
> overwritten' issue
>   drm/amd/display/dc/dce80/dce80_resource: Make local functions static
>   drm/nouveau/nvkm/engine/gr/gf100: Demote non-conformant kernel-doc
> header
>   drm/nouveau/nouveau_bo: Remove unused variables 'dev'
>   drm/nouveau/nouveau_display: Remove set but unused variable 'width'
>   drm/nouveau/dispnv04/crtc: Demote non-conforming kernel-doc headers
>   drm/nouveau/dispnv50/disp: Remove unused variable 'ret' from function
> returning void
>   drm/nouveau/dispnv50/headc57d: Make local function 'headc57d_olut'
> static
>   drm/nouveau/nv50_display: Remove superfluous prototype for local
> static functions
>   drm/nouveau/dispnv50/disp: Include header containing our prototypes
>   drm/nouveau/nouveau_ioc32: File headers are not good candidates for
> kernel-doc
>   drm/nouveau/nouveau_svm: Remove unused variable 'ret' from void
> function
>   drm/nouveau/nouveau_ioc32: Demote kernel-doc abuse to standard comment
> block
>   drm/vmwgfx/vmwgfx_execbuf: Fix some kernel-doc related issues
>   drm/vmwgfx/vmwgfx_kms: Remove unused variable 'ret' from
> 'vmw_du_primary_plane_atomic_check()'
>   drm/vmwgfx/vmwgfx_kms: Mark vmw_{cursor,primary}_plane_formats as
> __maybe_unused
>   drm/vmwgfx/vmwgfx_drv: Fix some kernel-doc misdemeanours
>   drm/vmwgfx/vmwgfx_ioctl: Provide missing '@' sign required by
> kernel-doc
>   drm/vmwgfx/vmwgfx_resource: Fix worthy function headers demote some
> others
>   drm/vmwgfx/vmwgfx_ttm_buffer: Supply some missing parameter
> descriptions
>   drm/vmwgfx/vmwgfx_fifo: Demote non-conformant kernel-doc header
>   drm/vmwgfx/vmwgfx_ldu: Supply descriptions for 'state' function
> parameter
>   drm/vmwgfx/vmwgfx_kms: Update worthy function headers and demote
> others
>   drm/vmwgfx/vmwgfx_overlay: Demote kernel-doc abuses to standard
> comment blocks
>   drm/vmwgfx/vmwgfx_fence: Add, remove and demote various documentation
> params/headers
>   drm/vmwgfx/vmwgfx_bo: Remove superfluous param description and supply
> another
>   drm/vmwgfx/vmwgfx_context: Demote kernel-doc abuses
>   drm/vmwgfx/vmwgfx_scrn: Demote unworthy kernel-doc headers and update
> others
>   drm/vmwgfx/vmwgfx_surface: Fix some kernel-doc related issues
>   drm/vmwgfx/vmwgfx_cmdbuf_res: Rename param description and remove
> another
>   drm/vmwgfx/vmwgfx_shader: Demote kernel-doc abuses and fix-up worthy
> headers
>   drm/vmwgfx/vmwgfx_cmdbuf: Fix a bunch of missing or incorrectly
> formatted/named params
>   drm/vmwgfx/vmwgfx_cmdbuf_res: Remove unused variable 'ret'
>   drm/vmwgfx/vmwgfx_stdu: Add some missing param/member descriptions
>   drm/vmwgfx/vmwgfx_cmdbuf: Fix misnaming of 'headers' should be plural
>   drm/vmwgfx/vmwgfx_cotable: Fix a couple of simple documentation
> problems
>   drm/vmwgfx/vmwgfx_so: Add description for 'vmw_view's 'rcu' member
>   drm/vmwgfx/vmwgfx_binding: Provide some missing param descriptions and
> remove others
>   drm/vmwgfx/vmwgfx_msg: Fix misspelling of 'msg'
>   drm/vmwgfx/vmwgfx_blit: Add description for 'vmw_bo_cpu_blit's 'diff'
> param
>   drm/vmwgfx/vmwgfx_validation: Add some missing struct member/function
> param descriptions
>   drm/vmwgfx/ttm_object: Demote half-assed headers and fix-up another
>   drm/vmwgfx/vmwgfx_thp: Add description for 'vmw_thp_manager's member
> '

Re: [patch 2/7] drm/vmgfx: Replace kmap_atomic()

2021-03-05 Thread Roland Scheidegger
On 03.03.21 14:20, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> There is no reason to disable pagefaults and preemption as a side effect of
> kmap_atomic_prot().
> 
> Use kmap_local_page_prot() instead and document the reasoning for the
> mapping usage with the given pgprot.
> 
> Remove the NULL pointer check for the map. These functions return a valid
> address for valid pages and the return was bogus anyway as it would have
> left preemption and pagefaults disabled.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: VMware Graphics 
> Cc: Roland Scheidegger 
> Cc: Zack Rusin 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: dri-de...@lists.freedesktop.org
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_blit.c |   30 --
>  1 file changed, 12 insertions(+), 18 deletions(-)
> 
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
> @@ -375,12 +375,12 @@ static int vmw_bo_cpu_blit_line(struct v
>   copy_size = min_t(u32, copy_size, PAGE_SIZE - src_page_offset);
>  
>   if (unmap_src) {
> - kunmap_atomic(d->src_addr);
> + kunmap_local(d->src_addr);
>   d->src_addr = NULL;
>   }
>  
>   if (unmap_dst) {
> - kunmap_atomic(d->dst_addr);
> + kunmap_local(d->dst_addr);
>   d->dst_addr = NULL;
>   }
>  
> @@ -388,12 +388,8 @@ static int vmw_bo_cpu_blit_line(struct v
>   if (WARN_ON_ONCE(dst_page >= d->dst_num_pages))
>   return -EINVAL;
>  
> - d->dst_addr =
> - kmap_atomic_prot(d->dst_pages[dst_page],
> -  d->dst_prot);
> - if (!d->dst_addr)
> - return -ENOMEM;
> -
> + d->dst_addr = 
> kmap_local_page_prot(d->dst_pages[dst_page],
> +d->dst_prot);
>   d->mapped_dst = dst_page;
>   }
>  
> @@ -401,12 +397,8 @@ static int vmw_bo_cpu_blit_line(struct v
>   if (WARN_ON_ONCE(src_page >= d->src_num_pages))
>   return -EINVAL;
>  
> - d->src_addr =
> - kmap_atomic_prot(d->src_pages[src_page],
> -  d->src_prot);
> - if (!d->src_addr)
> - return -ENOMEM;
> -
> + d->src_addr = 
> kmap_local_page_prot(d->src_pages[src_page],
> +d->src_prot);
>   d->mapped_src = src_page;
>   }
>   diff->do_cpy(diff, d->dst_addr + dst_page_offset,
> @@ -436,8 +428,10 @@ static int vmw_bo_cpu_blit_line(struct v
>   *
>   * Performs a CPU blit from one buffer object to another avoiding a full
>   * bo vmap which may exhaust- or fragment vmalloc space.
> - * On supported architectures (x86), we're using kmap_atomic which avoids
> - * cross-processor TLB- and cache flushes and may, on non-HIGHMEM systems
> + *
> + * On supported architectures (x86), we're using kmap_local_prot() which
> + * avoids cross-processor TLB- and cache flushes. kmap_local_prot() will
> + * either map a highmem page with the proper pgprot on HIGHMEM=y systems or
>   * reference already set-up mappings.
>   *
>   * Neither of the buffer objects may be placed in PCI memory
> @@ -500,9 +494,9 @@ int vmw_bo_cpu_blit(struct ttm_buffer_ob
>   }
>  out:
>   if (d.src_addr)
> - kunmap_atomic(d.src_addr);
> + kunmap_local(d.src_addr);
>   if (d.dst_addr)
> - kunmap_atomic(d.dst_addr);
> + kunmap_local(d.dst_addr);
>  
>   return ret;
>  }
> 
> 

Seems reasonable to me.
Reviewed-by: Roland Scheidegger 



Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm

2016-12-13 Thread Roland Scheidegger
Am 13.12.2016 um 14:14 schrieb Thomas Gleixner:
> Roland reported interesting TSC ADJUST register wreckage on his DELL
> machine, which seems to populate that MSR with a random number generator.

FWIW, I thought about the actual values some more and I don't actually
think they are all that random any more: the behavior is consistent with
the bios trying to zero the TSC of all cpus. If I understand this right,
writing a zero to TSC would cause somewhat small negative values in the
TSC_ADJ register at boot time, and larger negative values at suspend
time (at least if the TSC just stops when suspended and isn't reset) -
exactly what I'm seeing.
(And of course the different TSC_ADJ values would be because the bios is
writing TSC without any thoughts of synchronization, just one cpu after
another).

> 
> Deeper investagation into fixing this wreckage unearthed another special
> feature which is designed by Intel: Negative TSC adjuste values cause
> interrupt storms on the TSC deadline timer. Further details in patch 2/2

This actually looks like quite a serious hw bug to me, shouldn't there
be an errata for such a bug?

And I still don't quite understand why the lockup doesn't happen after a
warm boot, there must be something different there...

(I didn't have the chance to test the patch yet.)

Roland




Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm

2016-12-13 Thread Roland Scheidegger
Am 13.12.2016 um 17:46 schrieb Thomas Gleixner:
> On Tue, 13 Dec 2016, Roland Scheidegger wrote:
> 
>> Am 13.12.2016 um 14:14 schrieb Thomas Gleixner:
>>> Roland reported interesting TSC ADJUST register wreckage on his DELL
>>> machine, which seems to populate that MSR with a random number generator.
>>
>> FWIW, I thought about the actual values some more and I don't actually
>> think they are all that random any more: the behavior is consistent with
>> the bios trying to zero the TSC of all cpus. If I understand this right,
>> writing a zero to TSC would cause somewhat small negative values in the
>> TSC_ADJ register at boot time, and larger negative values at suspend
>> time (at least if the TSC just stops when suspended and isn't reset) -
>> exactly what I'm seeing.
>> (And of course the different TSC_ADJ values would be because the bios is
>> writing TSC without any thoughts of synchronization, just one cpu after
>> another).
> 
> Yeah, that might be. Still it looks like random nonsense and definitely the
> BIOS developers did not follow the secrit boot protocol.
> 
>>> Deeper investagation into fixing this wreckage unearthed another special
>>> feature which is designed by Intel: Negative TSC adjuste values cause
>>> interrupt storms on the TSC deadline timer. Further details in patch 2/2
>>
>> This actually looks like quite a serious hw bug to me, shouldn't there
>> be an errata for such a bug?
>>
>> And I still don't quite understand why the lockup doesn't happen after a
>> warm boot, there must be something different there...
> 
> What are the adjust values after a warm boot?
> 

So, after cold boot with a kernel which doesn't adjust TSCs, then warm
boot I got:
[0.00] TSC ADJUST: CPU0: -602358264300 176072418728
[0.00] TSC ADJUST: Boot CPU0: -602358264300
[0.172245] TSC ADJUST: CPU1: -602360207584 176587932558
[0.172245] TSC ADJUST differs: Reference CPU0: -602358264300 CPU1:
-602360207584
[0.172246] TSC ADJUST synchronize: Reference CPU0: -602358264300
CPU1: -602360207584
[0.252663] TSC ADJUST: CPU2: -602359000822 176828627154
[0.252663] TSC ADJUST differs: Reference CPU0: -602358264300 CPU2:
-602359000822
[0.252664] TSC ADJUST synchronize: Reference CPU0: -602358264300
CPU2: -602359000822
[0.337014] TSC ADJUST: CPU3: -602360177680 177081093132
[0.337014] TSC ADJUST differs: Reference CPU0: -602358264300 CPU3:
-602360177680
[0.337015] TSC ADJUST synchronize: Reference CPU0: -602358264300
CPU3: -602360177680

and so on.

Albeit after another reboot (some minutes later), it actually straight
locked up again:

TSC ADJUST: CPU1: -8257481427958 165112676430
TSC ADJUST differs: Reference CPU0: -8257479484330 CPU1: -8257481427958
TSC ADJUST synchronize: Reference CPU0: -8257479484330 CPU1: -8254781427958
TSC target sync skip
...
smpboot: Target CPU is online

So, actually I thought the TSC would get reset too on warm boot, but
clearly looks like that isn't the case...
But I don't know what's the difference between first and second reboot -
the adjust values have just more magnitude, but otherwise even the
direction of the adjustments and everything looks all the same (just
like cold boot, which also looks all the same to me).

Roland



Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm

2016-12-14 Thread Roland Scheidegger
Am 14.12.2016 um 22:40 schrieb Thomas Gleixner:
> On Wed, 14 Dec 2016, Thomas Gleixner wrote:
>> Positive space, results in timer not firing anymore - at least not in a
>> time frame you are willing to wait for.
>>
>>  0x  8000 
>>
>> Negative space, results in an interrupt storm.
>>
>>  0x   
>>  0x fffe  
>>  0x fffd  
>>  0x fffc  
>>  0x fffb  
>>  
>>
>> These points are independent of the underlying counter value (cold boot,
>> warm boot) and even reproduce after hours of power on reliably.
>>
>> And looking at the values makes me wonder about 32bit vs. 64bit wreckage
>> combined with sign expansion done wrong. Im really impressed!
> 
> And the whole mess stems from the fact that the deadline is not as one
> would expect simply compared against the sum of the counter and the adjust
> MSR.
Why would it be compared against the sum? As far as I can tell the adjust
value should never come into play when using deadline timer (other than
indirectly because the TSC would change).
(And I'd guess better avoid an armed deadline timer while changing
TSC_ADJ...)

In any case, I've tested the two patches on top of x86-timers and they
work just fine - all TSC_ADJ values get set to zero both on boot and
resume, no lockups, and tsc clocksource active, with some whining in the
log of course :-).
So,
Tested-by: Roland Scheidegger 

> No, they subtract the adjust value from the MSR when you write the deadline
> and latch the result to compare it against the counter.
> 
> So when the following happens:
> 
>ADJUST = 0
>RDTSC  = 1000 
>DEADLINE   = 1100
> 
>ADJUST =  100
> 
>INTERRUPT
>RDTSC  = 1200
> 
>DEADLINE   = 1300
> 
>ADJUST =0
> 
>INTERRUPT
>RDTSC  = 1200
> 
> So depending on the direction of the adjustment the timer fires late or
> early.
> 
> Combined with that math wreckage this is a complete disaster. And of course
> nothing is documented anywhere and the SDM is outright wrong:
> 
> 10.5.4.1 TSC-Deadline Mode
> 
>   The processor generates a timer interrupt when the value of time-stamp
>   counter is greater than or equal to that of IA32_TSC_DEADLINE. It then
>   disarms the timer and clear the IA32_TSC_DEADLINE MSR. (Both the time-stamp
>   counter and the IA32_TSC_DEADLINE MSR are 64-bit unsigned integers.)
> 
> See the example above. 120 is neither equal nor greater than 130, at
> least not in my universe.
> 
> I serioulsy doubt that Intel manages it to design at least ONE functional
> non broken timer before I retire.
> 
> Thanks,
> 
>   tglx
> 



Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC

2016-12-09 Thread Roland Scheidegger
Am 09.12.2016 um 10:59 schrieb Thomas Gleixner:
> On Fri, 9 Dec 2016, Roland Scheidegger wrote:
>>
>> I saw some system lockups though:
>> When doing a cold boot, this kernel never managed to boot up. The last
>> message seen is:
>> x86: Booting SMP configuration:
>>  node  #0, CPUs:#1
> 
> Weird. That really would be interesting to figure out what goes wrong
> there. What bothers me is that we don't see something like this:
> 
>> [0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
>> -577423766270
> 
> Can you please apply the debug patch below and provide the output ?
Ok, this is the output (minus some typos maybe...):

x86: Booting SMP configuration:
 node #0, CPUs: #1
TSC ADJUST: CPU1: -2806491604
TSC source sync 0 -> 1 runs 3
TSC ADJUST differs: Reference CPU0: -2805503200 CPU1: -2806491604
TSC ADJUST synchronize: Reference CPU0: -2805503200 CPU1: -2806491604
TSC target sync skip
TSC source sync skipped

And that's it.


> 
> 
>> [0.094492] x86: Booting SMP configuration:
>> [0.094534]  node  #0, CPUs:#1
>> [0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
>> -577423766270
> 
> What on earth is this BIOS doing? That's a couple of minutes back in time.
Looks like that's only after a reset, after a cold boot the numbers are
significantly smaller (1 sec or so?). Though the difference between
these two is nearly a million cycles too, so might be similar between
min and max for all.


> 
> And the difference between the max and min adjust value is 2050932 cycles.
> 
>> Without the patches on cold boot it just was as expected:
>> [0.093700] x86: Booting SMP configuration:
>> [0.093737]  node  #0, CPUs:#1
>> [0.174304] TSC synchronization [CPU#0 -> CPU#1]:
>> [0.174375] Measured 1837188 cycles TSC warp between CPUs, turning
>> off TSC clock.
> 
> Not surprising given the above numbers.
> 

Hope that helps,

Roland




Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC

2016-12-09 Thread Roland Scheidegger
Am 09.12.2016 um 18:33 schrieb Thomas Gleixner:
> On Fri, 9 Dec 2016, Roland Scheidegger wrote:
>> Am 09.12.2016 um 10:59 schrieb Thomas Gleixner:
>>> On Fri, 9 Dec 2016, Roland Scheidegger wrote:
>>>>
>>>> I saw some system lockups though:
>>>> When doing a cold boot, this kernel never managed to boot up. The last
>>>> message seen is:
>>>> x86: Booting SMP configuration:
>>>>  node  #0, CPUs:#1
>>>
>>> Weird. That really would be interesting to figure out what goes wrong
>>> there. What bothers me is that we don't see something like this:
>>>
>>>> [0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
>>>> -577423766270
>>>
>>> Can you please apply the debug patch below and provide the output ?
>> Ok, this is the output (minus some typos maybe...):
>>
>> x86: Booting SMP configuration:
>>  node #0, CPUs: #1
>> TSC ADJUST: CPU1: -2806491604
>> TSC source sync 0 -> 1 runs 3
>> TSC ADJUST differs: Reference CPU0: -2805503200 CPU1: -2806491604
>> TSC ADJUST synchronize: Reference CPU0: -2805503200 CPU1: -2806491604
>> TSC target sync skip
>> TSC source sync skipped
>>
>> And that's it.
> 
> That's a cold boot (power on), right?
> 
> That looks like I expected. As we know that the CPUs are in the same
> package simply resynchronizing the TSC ADJUST MSR is enough. And the sync
> test is skipped as we have resynced TSC ADJUST already to be the same
> value.
> 
> What's unexpected is that the thing stops working :(
> 
> I'm pretty sure the boot cpu hangs in that endless loop waiting for CPU1 to
> set the online bit, but for whatever reason this does not happen.
> 
> Can you add the patch below to gather more information? There is a hunk in
> there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can
> turn on as second step.

Ok, here's the results:

...
TSC ADJUST synchronize: Reference CPU0: -2820267100 CPU1: -2822498296
TSC target sync skipped
smpboot: Vector locked
smpboot: Vector setup done
smpboot: Clock setup
TSC source sync skipped
smpboot: Target CPU is online


With the #if 0 block activated, it boots up fine, the output was:

[1.038892] x86: Booting SMP configuration:
[1.038930]  node  #0, CPUs:#1
[0.171851] TSC ADJUST: CPU1: -2830353064 218577682002
[1.117495] TSC source sync 0 -> 1 runs 3
[0.171852] TSC ADJUST differs: Reference CPU0: -2828600940 CPU1:
-2830353064
[0.171853] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2830353064
[1.117497] TSC target sync skip
[1.117497] smpboot: Vector locked
[1.117506] smpboot: Vector setup done
[1.117508] smpboot: Clock setup
[1.117510] smpboot: Enter idle startup
[1.117860] TSC source sync skipped
[1.117896] smpboot: Target CPU is online
[1.117990]   #2
[0.252336] TSC ADJUST: CPU2: -2828948550 218818578394
[1.197511] TSC source sync 0 -> 2 runs 3
[0.252337] TSC ADJUST differs: Reference CPU0: -2828600940 CPU2:
-2828948550
[0.252338] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2828948550
[1.197513] TSC target sync skip
[1.197513] smpboot: Vector locked
[1.197523] smpboot: Vector setup done
[1.197524] smpboot: Clock setup
[1.197527] smpboot: Enter idle startup
[1.197866] TSC source sync skipped
[1.197902] smpboot: Target CPU is online
[1.197997]   #3
[0.332199] TSC ADJUST: CPU3: -2829409422 219057612986
[1.277528] TSC source sync 0 -> 3 runs 3
[0.332201] TSC ADJUST differs: Reference CPU0: -2828600940 CPU3:
-2829409422
[0.332202] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2829409422
[1.277530] TSC target sync skip
[1.277530] smpboot: Vector locked
[1.277540] smpboot: Vector setup done
[1.277542] smpboot: Clock setup
[1.277544] smpboot: Enter idle startup
[1.277884] TSC source sync skipped
[1.277920] smpboot: Target CPU is online
[1.278016]   #4
[0.411955] TSC ADJUST: CPU4: -2830194520 219296322250
[1.357546] TSC source sync 0 -> 4 runs 3
[0.411956] TSC ADJUST differs: Reference CPU0: -2828600940 CPU4:
-2830194520
[0.411957] TSC ADJUST synchronize: Reference CPU0: 0 CPU4: -2830194520
[1.357548] TSC target sync skip
[1.357548] smpboot: Vector locked
[1.357558] smpboot: Vector setup done
[1.357560] smpboot: Clock setup
[1.357562] smpboot: Enter idle startup
[1.357901] TSC source sync skipped
[1.357937] smpboot: Target CPU is online
[1.358030]   #5
[0.491957] TSC ADJUST: CPU5: -2830235658 219535775620
[1.437562] TSC source sync 0 -> 5 runs 3
[0.491959] TSC ADJUST differs: Reference CPU0: -2828600940 CPU5:
-2830235658
[0.491960] TSC AD

Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC

2016-12-09 Thread Roland Scheidegger
Am 09.12.2016 um 23:59 schrieb Thomas Gleixner:
> On Fri, 9 Dec 2016, Roland Scheidegger wrote:
> 
> Cc'ed someone from Dell. 
> 
>> Am 09.12.2016 um 18:33 schrieb Thomas Gleixner:
>>> Can you add the patch below to gather more information? There is a hunk in
>>> there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can
>>> turn on as second step.
>>
>> Ok, here's the results:
>> ...
>> TSC ADJUST synchronize: Reference CPU0: -2820267100 CPU1: -2822498296
>> TSC target sync skipped
>> smpboot: Vector locked
>> smpboot: Vector setup done
>> smpboot: Clock setup
>> TSC source sync skipped
>> smpboot: Target CPU is online
> 
> I did not expect that to happen. Now I'm puzzled and curious where the
> machine gets lost after that. See below.
> 
>> With the #if 0 block activated, it boots up fine, the output was:
> 
> That does not make any sense at all, but yes, nothing in this context makes
> sense.
> 
>> [1.038892] x86: Booting SMP configuration:
>> [1.038930]  node  #0, CPUs:#1
>> [0.171851] TSC ADJUST: CPU1: -2830353064 218577682002
>> [1.117495] TSC source sync 0 -> 1 runs 3
>> [0.171852] TSC ADJUST differs: Reference CPU0: -2828600940 CPU1:
>> -2830353064
>> [0.171853] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2830353064
>> [1.117497] TSC target sync skip
> 
>> (And fwiw with my quick hack the lockups disappear to when I change that
>> back to blast a zero into TSC_ADJ for all cpus.)
> 
> Right, That's what that hunk does as well.
> 
> Now what's interesting is that the adjustement of CPU1 in the non write to
> zero case results in the following:
> 
> TSC ADJUST: CPU1: -2830353064 218577682002 <-- TSC value
> TSC ADJUST differs: Reference CPU0: -2828600940 CPU1: -2830353064
> 
> We write CPU1 adjust register to -2828600940 which makes the TSC on CPU1
> jump forwards by -2828600940 - -2830353064 = 1752124 cycles.
> 
> In the write to zero case the jump is forward as well, but this time it's
> huge, i.e. 2830353064 cycles.
> 
> I tried to wreckage the TSC by writing similar values to the adjust MSR on
> early boot, but independent of the values and independent of the write to
> zero part the machine comes up happily all the time.
> 
> The only difference is that my machine has a somewhat saner BIOS. So the
> thing might just die in the value add SMM crap, but who knows.
> 
> In the patch below is another bunch of debug prints which emit the state
> information of CPU1 during bringup. Maybe that gives a hint where the
> system gets stuck when you disable the 'write to zero' magic again.
> 
> The NMI watchdog does not catch anything, right?
Nope. (Though as mentioned earlier, with my hack when not writing zero
it did - but the lockup there was later after all 16 cpus were online,
and I only really tried that with the ubuntu 4.4 kernel. I never got to
see the full output from that NMI though due to limited screen space, my
attempts to try anything different than text mode were met with a blank
screen, and from the parts I did see I didn't really see anything
interesting albeit that's not saying much as I really have no idea about
that code...)

With the new patch here's the output (albeit the typing gets a bit
annoying...)
...
Invoking state 32 CB replay_prepare_cpu+0x0/0xe0
CB returned 0
Invoking state 35 CB rcutree_prepare_cpu+0x0/0x50
CB returned 0
Invoking state 41 CB notify_prepare+0x0/0xa0
CBreturned 0
Invoking state 48 CB bringup_cpu+0x0/0x90
x86: Booting SMP configuration:
 node  #0, CPUs:   #1
Invoking state 51 CB sched_cpu_starting+0x0/0x60
CB returned 0
Invoking state 62 CB x86_pmu_starting_cpu+0x0/0x20
CB returned 0
TSC ADJUST: CPU1: -2846131604 175264328618
TSC ADJUST differs: Reference CPU0: -2843967660 CPU1: -2846131604
TSC ADJUST synchronize: Reference CPU0: -2843967660 CPU1: -2846131604
TSC target sync skip
smpboot: Vector locked
TSC source sync 0 -> 1 runs 3
smpboot: Vector setup done
smpboot: Clock setup
TSC source sync skipped
smpboot: Target CPU is online


> 
>> The system also came back up fine from suspend with this (well - still
>> minus graphics...), however disabled tsc clocksource:
>>
>> [  579.931739] Enabling non-boot CPUs ...
>> [  579.943107] smpboot: Booting Node 0 Processor 1 APIC 0x2
>> [  579.943189] TSC ADJUST: CPU1: -1504429974 21601834126
> 
> Fun, yet another adjust value. Are they set by a random number generator?
> 
>> [  579.944093] CPU1 is up
> 
>> [  580.458983] clocksource: timekeeping watchdog on CPU1: Marking
>> clocksource 'tsc' as unstable because the skew is 

Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC

2016-12-09 Thread Roland Scheidegger
Am 10.12.2016 um 02:55 schrieb Roland Scheidegger:
> Am 09.12.2016 um 23:59 schrieb Thomas Gleixner:
>> On Fri, 9 Dec 2016, Roland Scheidegger wrote:
>>
>> Cc'ed someone from Dell. 
>>
>>> Am 09.12.2016 um 18:33 schrieb Thomas Gleixner:
>>>> Can you add the patch below to gather more information? There is a hunk in
>>>> there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can
>>>> turn on as second step.
>>>
>>> Ok, here's the results:
>>> ...
>>> TSC ADJUST synchronize: Reference CPU0: -2820267100 CPU1: -2822498296
>>> TSC target sync skipped
>>> smpboot: Vector locked
>>> smpboot: Vector setup done
>>> smpboot: Clock setup
>>> TSC source sync skipped
>>> smpboot: Target CPU is online
>>
>> I did not expect that to happen. Now I'm puzzled and curious where the
>> machine gets lost after that. See below.
>>
>>> With the #if 0 block activated, it boots up fine, the output was:
>>
>> That does not make any sense at all, but yes, nothing in this context makes
>> sense.
>>
>>> [1.038892] x86: Booting SMP configuration:
>>> [1.038930]  node  #0, CPUs:#1
>>> [0.171851] TSC ADJUST: CPU1: -2830353064 218577682002
>>> [1.117495] TSC source sync 0 -> 1 runs 3
>>> [0.171852] TSC ADJUST differs: Reference CPU0: -2828600940 CPU1:
>>> -2830353064
>>> [0.171853] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2830353064
>>> [1.117497] TSC target sync skip
>>
>>> (And fwiw with my quick hack the lockups disappear to when I change that
>>> back to blast a zero into TSC_ADJ for all cpus.)
>>
>> Right, That's what that hunk does as well.
>>
>> Now what's interesting is that the adjustement of CPU1 in the non write to
>> zero case results in the following:
>>
>> TSC ADJUST: CPU1: -2830353064 218577682002 <-- TSC value
>> TSC ADJUST differs: Reference CPU0: -2828600940 CPU1: -2830353064
>>
>> We write CPU1 adjust register to -2828600940 which makes the TSC on CPU1
>> jump forwards by -2828600940 - -2830353064 = 1752124 cycles.
>>
>> In the write to zero case the jump is forward as well, but this time it's
>> huge, i.e. 2830353064 cycles.
>>
>> I tried to wreckage the TSC by writing similar values to the adjust MSR on
>> early boot, but independent of the values and independent of the write to
>> zero part the machine comes up happily all the time.
>>
>> The only difference is that my machine has a somewhat saner BIOS. So the
>> thing might just die in the value add SMM crap, but who knows.
>>
>> In the patch below is another bunch of debug prints which emit the state
>> information of CPU1 during bringup. Maybe that gives a hint where the
>> system gets stuck when you disable the 'write to zero' magic again.
>>
>> The NMI watchdog does not catch anything, right?
> Nope. (Though as mentioned earlier, with my hack when not writing zero
> it did - but the lockup there was later after all 16 cpus were online,
> and I only really tried that with the ubuntu 4.4 kernel. I never got to
> see the full output from that NMI though due to limited screen space, my
> attempts to try anything different than text mode were met with a blank
> screen, and from the parts I did see I didn't really see anything
> interesting albeit that's not saying much as I really have no idea about
> that code...)
> 
> With the new patch here's the output (albeit the typing gets a bit
> annoying...)
> ...
> Invoking state 32 CB replay_prepare_cpu+0x0/0xe0
> CB returned 0
> Invoking state 35 CB rcutree_prepare_cpu+0x0/0x50
> CB returned 0
> Invoking state 41 CB notify_prepare+0x0/0xa0
> CBreturned 0
> Invoking state 48 CB bringup_cpu+0x0/0x90
> x86: Booting SMP configuration:
>  node  #0, CPUs:   #1
> Invoking state 51 CB sched_cpu_starting+0x0/0x60
> CB returned 0
> Invoking state 62 CB x86_pmu_starting_cpu+0x0/0x20
> CB returned 0
> TSC ADJUST: CPU1: -2846131604 175264328618
> TSC ADJUST differs: Reference CPU0: -2843967660 CPU1: -2846131604
> TSC ADJUST synchronize: Reference CPU0: -2843967660 CPU1: -2846131604
> TSC target sync skip
> smpboot: Vector locked
> TSC source sync 0 -> 1 runs 3
> smpboot: Vector setup done
> smpboot: Clock setup
> TSC source sync skipped
> smpboot: Target CPU is online


Ok I did some more digging. Since it appeared it never returned from
x86_cpuinit.setup_percpu_clockev() I followed that a bit more. This is
using the tsc deadline timer, ending up in clockevents_register_device()
finally. This executes all well except the raw_spin_unlock_irqrestore()
at the end which we never get past.

I disabled the tsc deadline timer (lapic=notscdeadline) and indeed, no
more lockups!
So could there be something be wrong with setting this up? Warping past
some event due to resynchronization or something?
Or hitting some bugs with TSC deadline interrupts?
Anyway, that's definitely out of my area of knowledge, hope it helps...

Roland



Re: [PATCH][next] drm/vmwgfx: Use struct_size() helper

2020-06-22 Thread Roland Scheidegger
I've commited this to our next branch, thanks!
(I'm actually kind of impressed this can be found automatically...)

Roland

Am 17.06.20 um 23:51 schrieb Gustavo A. R. Silva:
> Make use of the struct_size() helper instead of an open-coded version
> in order to avoid any potential type mistakes.
> 
> This code was detected with the help of Coccinelle and, audited and
> fixed manually.
> 
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> index 126f93c0b0b8..3914bfee0533 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> @@ -1969,7 +1969,7 @@ static int vmw_surface_dirty_alloc(struct vmw_resource 
> *res)
>   num_mip = 1;
>  
>   num_subres = num_layers * num_mip;
> - dirty_size = sizeof(*dirty) + num_subres * sizeof(dirty->boxes[0]);
> + dirty_size = struct_size(dirty, boxes, num_subres);
>   acc_size = ttm_round_pot(dirty_size);
>   ret = ttm_mem_global_alloc(vmw_mem_glob(res->dev_priv),
>  acc_size, &ctx);
> 



Re: [PATCH AUTOSEL 5.8 10/12] drm/vmwgfx: Fix error handling in get_node

2020-10-05 Thread Roland Scheidegger
Not entirely sure how the patches for autosel were selected, but this
patch is no good for 5.8, since the patch which introduced the breakage
in the first place is only in 5.9 (in particular it was
58e4d686d456c3e356439ae160ff4a0728940b8e, drm/ttm: cleanup
ttm_mem_type_manager_func.get_node interface v3), and at least I don't
see that one being backported to 5.8.

Roland

Am 05.10.20 um 16:44 schrieb Sasha Levin:
> From: Zack Rusin 
> 
> [ Upstream commit f54c4442893b8dfbd3aff8e903c54dfff1aef990 ]
> 
> ttm_mem_type_manager_func.get_node was changed to return -ENOSPC
> instead of setting the node pointer to NULL. Unfortunately
> vmwgfx still had two places where it was explicitly converting
> -ENOSPC to 0 causing regressions. This fixes those spots by
> allowing -ENOSPC to be returned. That seems to fix recent
> regressions with vmwgfx.
> 
> Signed-off-by: Zack Rusin 
> Reviewed-by: Roland Scheidegger 
> Reviewed-by: Martin Krastev 
> Sigend-off-by: Roland Scheidegger 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_thp.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c
> index 7da752ca1c34b..b93c558dd86e0 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c
> @@ -57,7 +57,7 @@ static int vmw_gmrid_man_get_node(struct 
> ttm_mem_type_manager *man,
>  
>   id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL);
>   if (id < 0)
> - return (id != -ENOMEM ? 0 : id);
> + return id;
>  
>   spin_lock(&gman->lock);
>  
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c
> index b7c816ba71663..c8b9335bccd8d 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c
> @@ -95,7 +95,7 @@ static int vmw_thp_get_node(struct ttm_mem_type_manager 
> *man,
>   mem->start = node->start;
>   }
>  
> - return 0;
> + return ret;
>  }
>  
>  
> 



Re: [PATCH 5.8 050/124] drm/vmwgfx: Fix error handling in get_node

2020-10-13 Thread Roland Scheidegger
Hi,

this commit should NOT be applied to 5.8.
It fixes a regression introduced by
58e4d686d456c3e356439ae160ff4a0728940b8e (drm/ttm: cleanup
ttm_mem_type_manager_func.get_node interface v3) which is part of 5.9
but not 5.8.
Applying this to 5.8 will very likely break things. I don't know why it
ended up as a candidate for 5.8.

Roland

Am 12.10.20 um 15:30 schrieb Greg Kroah-Hartman:
> From: Zack Rusin 
> 
> [ Upstream commit f54c4442893b8dfbd3aff8e903c54dfff1aef990 ]
> 
> ttm_mem_type_manager_func.get_node was changed to return -ENOSPC
> instead of setting the node pointer to NULL. Unfortunately
> vmwgfx still had two places where it was explicitly converting
> -ENOSPC to 0 causing regressions. This fixes those spots by
> allowing -ENOSPC to be returned. That seems to fix recent
> regressions with vmwgfx.
> 
> Signed-off-by: Zack Rusin 
> Reviewed-by: Roland Scheidegger 
> Reviewed-by: Martin Krastev 
> Sigend-off-by: Roland Scheidegger 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_thp.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c
> index 7da752ca1c34b..b93c558dd86e0 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c
> @@ -57,7 +57,7 @@ static int vmw_gmrid_man_get_node(struct 
> ttm_mem_type_manager *man,
>  
>   id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL);
>   if (id < 0)
> - return (id != -ENOMEM ? 0 : id);
> + return id;
>  
>   spin_lock(&gman->lock);
>  
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c
> index b7c816ba71663..c8b9335bccd8d 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c
> @@ -95,7 +95,7 @@ found_unlock:
>   mem->start = node->start;
>   }
>  
> - return 0;
> + return ret;
>  }
>  
>  
> 



Re: [Linux-graphics-maintainer] [PATCH] drm/vmwgfx: Return true in function vmw_fence_obj_signaled()

2020-05-14 Thread Roland Scheidegger
I've pulled that into our tree, thanks!

Roland

Am 07.05.20 um 13:07 schrieb Jason Yan:
> Fix the following coccicheck warning:
> 
> drivers/gpu/drm/vmwgfx/vmwgfx_fence.c:518:9-10: WARNING: return of 0/1
> in function 'vmw_fence_obj_signaled' with return type bool
> 
> Signed-off-by: Jason Yan 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 178a6cd1a06f..0f8d29397157 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -515,7 +515,7 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
>   struct vmw_fence_manager *fman = fman_from_fence(fence);
>  
>   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
> - return 1;
> + return true;
>  
>   vmw_fences_update(fman);
>  
> 



Re: [Linux-graphics-maintainer] [PATCH v3 15/25] drm: vmwgfx: fix common struct sg_table related issues

2020-05-11 Thread Roland Scheidegger
I'm not exactly an expert on this, but looks alright to me.
Acked-by: Roland Scheidegger 

Am 05.05.20 um 10:46 schrieb Marek Szyprowski:
> The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
> numer of the created entries in the DMA address space. However the
> subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
> called with the original number of the entries passed to dma_map_sg. The
> sg_table->nents in turn holds the result of the dma_map_sg call as stated
> in include/linux/scatterlist.h. A common mistake was to ignore a result
> of the dma_map_sg function and don't use the sg_table->orig_nents at all.
> 
> To avoid such issues, lets use common dma-mapping wrappers operating
> directly on the struct sg_table objects and adjust references to the
> nents and orig_nents respectively.
> 
> Signed-off-by: Marek Szyprowski 
> ---
> For more information, see '[PATCH v3 00/25] DRM: fix struct sg_table nents
> vs. orig_nents misuse' thread: https://lkml.org/lkml/2020/5/5/187
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 17 -
>  1 file changed, 4 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> index bf0bc46..e50ae8b 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> @@ -362,8 +362,7 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
> *vmw_tt)
>  {
>   struct device *dev = vmw_tt->dev_priv->dev->dev;
>  
> - dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
> - DMA_BIDIRECTIONAL);
> + dma_unmap_sgtable(dev, vmw_tt->sgt, DMA_BIDIRECTIONAL);
>   vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
>  }
>  
> @@ -383,16 +382,8 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
> *vmw_tt)
>  static int vmw_ttm_map_for_dma(struct vmw_ttm_tt *vmw_tt)
>  {
>   struct device *dev = vmw_tt->dev_priv->dev->dev;
> - int ret;
> -
> - ret = dma_map_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.orig_nents,
> -  DMA_BIDIRECTIONAL);
> - if (unlikely(ret == 0))
> - return -ENOMEM;
>  
> - vmw_tt->sgt.nents = ret;
> -
> - return 0;
> + return dma_map_sgtable(dev, vmw_tt->sgt, DMA_BIDIRECTIONAL);
>  }
>  
>  /**
> @@ -449,10 +440,10 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
>   if (unlikely(ret != 0))
>   goto out_sg_alloc_fail;
>  
> - if (vsgt->num_pages > vmw_tt->sgt.nents) {
> + if (vsgt->num_pages > vmw_tt->sgt.orig_nents) {
>   uint64_t over_alloc =
>   sgl_size * (vsgt->num_pages -
> - vmw_tt->sgt.nents);
> + vmw_tt->sgt.orig_nents);
>  
>   ttm_mem_global_free(glob, over_alloc);
>   vmw_tt->sg_alloc_size -= over_alloc;
>