On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
> > On 04/16/2015 03:03 PM, Daniel Vetter wrote:
> >> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
> >>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
> >>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
> >>>>> Hi Daniel,
> >>>>>
> >>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
> >>>>>> This was a bit too much cargo-culted, so let's make it solid:
> >>>>>> - vblank->count doesn't need to be an atomic; writes are always done
> >>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
> >>>>>>   long instead and update comments. Note that atomic_read is just a
> >>>>>>   normal read of a volatile variable, so no need to audit all the
> >>>>>>   read-side access specifically.
> >>>>>>
> >>>>>> - The barriers for the vblank counter seqlock weren't complete: the
> >>>>>>   read side was missing the first barrier between the counter read and
> >>>>>>   the timestamp read; it only had a barrier between the ts and the
> >>>>>>   counter read. We need both.
> >>>>>>
> >>>>>> - Barriers weren't properly documented. Since barriers only work if
> >>>>>>   you have them on both sides of the transaction, it's prudent to
> >>>>>>   reference where the other side is. To avoid duplicating the
> >>>>>>   write-side comment 3 times, extract a little store_vblank() helper.
> >>>>>>   In that helper also assert that we do indeed hold
> >>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
> >>>>>>   few functions up in the callchain.
> >>>>>>
> >>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> >>>>>> the vblank_wait ioctl.
> >>>>>>
> >>>>>> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
> >>>>>> Cc: Mario Kleiner <mario.kleiner...@gmail.com>
> >>>>>> Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
> >>>>>> Cc: Michel Dänzer <mic...@daenzer.net>
> >>>>>> Signed-off-by: Daniel Vetter <daniel.vet...@intel.com>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/drm_irq.c | 92 ++++++++++++++++++++++++-----------------------
> >>>>>>  include/drm/drmP.h        |  8 +++--
> >>>>>>  2 files changed, 54 insertions(+), 46 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>>>>> index c8a34476570a..23bfbc61a494 100644
> >>>>>> --- a/drivers/gpu/drm/drm_irq.c
> >>>>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
> >>>>>>  module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
> >>>>>>  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
> >>>>>>
> >>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
> >>>>>> +                         unsigned vblank_count_inc,
> >>>>>> +                         struct timeval *t_vblank)
> >>>>>> +{
> >>>>>> +        struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> >>>>>> +        u32 tslot;
> >>>>>> +
> >>>>>> +        assert_spin_locked(&dev->vblank_time_lock);
> >>>>>> +
> >>>>>> +        if (t_vblank) {
> >>>>>> +                tslot = vblank->count + vblank_count_inc;
> >>>>>> +                vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> >>>>>> +        }
> >>>>>> +
> >>>>>> +        /*
> >>>>>> +         * vblank timestamp updates are protected on the write side with
> >>>>>> +         * vblank_time_lock, but on the read side done locklessly using a
> >>>>>> +         * sequence-lock on the vblank counter. Ensure correct ordering using
> >>>>>> +         * memory barriers. We need the barrier both before and also after
> >>>>>> +         * the counter update to synchronize with the next timestamp write.
> >>>>>> +         * The read-side barriers for this are in drm_vblank_count_and_time.
> >>>>>> +         */
> >>>>>> +        smp_wmb();
> >>>>>> +        vblank->count += vblank_count_inc;
> >>>>>> +        smp_wmb();
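
For context: the write-side barriers above pair with read-side barriers in
drm_vblank_count_and_time(). A minimal sketch of that read side, as the
comment describes it (illustrative only, not necessarily the verbatim tree
code):

        u32 drm_vblank_count_and_time(struct drm_device *dev, int crtc,
                                      struct timeval *vblanktime)
        {
                struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
                u32 cur_vblank;

                /*
                 * The two smp_rmb()s mirror the two smp_wmb()s in
                 * store_vblank(): the first orders the counter read before
                 * the timestamp read, the second orders the timestamp read
                 * before the counter recheck. Retry until the counter is
                 * stable around the timestamp read.
                 */
                do {
                        cur_vblank = vblank->count;
                        smp_rmb();
                        *vblanktime = vblanktimestamp(dev, crtc, cur_vblank);
                        smp_rmb();
                } while (cur_vblank != vblank->count);

                return cur_vblank;
        }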
> >>>>>
> >>>>> The comment and the code are each self-contradictory.
> >>>>>
> >>>>> If vblank->count writes are always protected by vblank_time_lock
> >>>>> (something I did not verify but that the comment above asserts), then
> >>>>> the trailing write barrier is not required (and the assertion that it
> >>>>> is in the comment is incorrect).
> >>>>>
> >>>>> A spin unlock operation is always a write barrier.
> >>>>
> >>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own
> >>>> good". That the spinlock is held I can assure. That no one goes around
> >>>> and does multiple vblank updates (because somehow that code raced with
> >>>> the hw itself) I can't easily assure with a simple assert or something
> >>>> similar. It's not the case right now, but that can change.
> >>>
> >>> The algorithm would be broken if multiple updates for the same vblank
> >>> count were allowed; that's why it checks to see if the vblank count has
> >>> not advanced before storing a new timestamp.
> >>>
> >>> Otherwise, the read side would not be able to determine that the
> >>> timestamp is valid by double-checking that the vblank count has not
> >>> changed.
> >>>
> >>> And besides, even if the code looped without dropping the spinlock,
> >>> the correct write order would still be observed because it would still
> >>> be executing on the same cpu.
> >>>
> >>> My objection to the write memory barrier is not about optimization;
> >>> it's about correct code.
> >>
> >> Well, diff=0 is not allowed; I guess I could enforce this with some
> >> WARN_ON. And I still think my point about non-local correctness is solid.
> >> With the smp_wmb() removed the following still works correctly:
> >>
> >>         spin_lock(vblank_time_lock);
> >>         store_vblank(dev, crtc, 1, ts1);
> >>         spin_unlock(vblank_time_lock);
> >>
> >>         spin_lock(vblank_time_lock);
> >>         store_vblank(dev, crtc, 1, ts2);
> >>         spin_unlock(vblank_time_lock);
> >>
> >> But with the smp_wmb() removed the following would be broken:
> >>
> >>         spin_lock(vblank_time_lock);
> >>         store_vblank(dev, crtc, 1, ts1);
> >>         store_vblank(dev, crtc, 1, ts2);
> >>         spin_unlock(vblank_time_lock);
> >>
> >> because the compiler/cpu is free to reorder the store for vblank->count
> >> _ahead_ of the store for the timestamp. And that would trick readers into
> >> believing that they have a valid timestamp when they potentially raced.
> >>
> >> Now you're correct that right now there's no such thing going on, and
> >> it's unlikely to happen (given the nature of vblank updates). But my
> >> point is that if we optimize this, then the correctness can't be proven
> >> locally anymore by just looking at store_vblank; instead you must audit
> >> all the callers. And leaking locking/barriers like that is too fragile a
> >> design for my taste.
> >>
> >> But you insist that my approach is broken somehow and that dropping the
> >> smp_wmb is needed for correctness. I don't see how that's the case at all.
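
To make the hazard in the second sequence concrete, here are the same two
calls annotated with their individual stores (my annotation; it assumes the
timestamp array is a small ring indexed modulo its size, with
DRM_VBLANKTIME_RBSIZE == 2 at the time, so slot (n + 2) aliases slot n):

        /* vblank->count == n on entry; trailing smp_wmb() removed */
        spin_lock(&dev->vblank_time_lock);
        store_vblank(dev, crtc, 1, &ts1); /* STORE ts1 -> slot(n+1);
                                           * smp_wmb(); STORE count = n+1 */
        store_vblank(dev, crtc, 1, &ts2); /* STORE ts2 -> slot(n+2);
                                           * smp_wmb(); STORE count = n+2 */
        spin_unlock(&dev->vblank_time_lock);

Without the trailing barrier, nothing orders "STORE count = n+1" against
"STORE ts2 -> slot(n+2)", so the ts2 store may become visible first. A reader
that loaded count == n can then fetch the freshly clobbered slot n (aliased by
slot (n + 2)), recheck the counter, still see n, and exit its retry loop with
a timestamp belonging to a different vblank. The trailing smp_wmb() closes
exactly that window.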
> Daniel,
>
> I've been really busy this last week; my apologies for not replying
> promptly.
>
> > Fwiw, i spent some time re-educating myself about memory barriers (thanks
> > for your explanations) and thinking about this, and the last version of
> > your patch looks good to me. It also makes sense to me to leave that last
> > smp_wmb() in place to make future use of the helper robust - for
> > non-local correctness, to avoid having to audit all future callers of
> > that helper.
>
> My concern wrt unnecessary barriers in this algorithm is that the trailing
> barrier now appears mandatory, when in fact it is not.
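
For reference, the redundancy Peter is pointing at, sketched under the
assumption that every caller keeps one update per critical section (a sketch,
not tree code; it relies on spin_unlock's release semantics ordering the
stores of one section before those of the next):

        spin_lock(&dev->vblank_time_lock);
        store_vblank(dev, crtc, 1, &ts1);  /* STORE ts1; smp_wmb();
                                            * STORE count */
        spin_unlock(&dev->vblank_time_lock); /* release: orders both stores
                                              * before any later critical
                                              * section on this lock */

        spin_lock(&dev->vblank_time_lock);
        store_vblank(dev, crtc, 1, &ts2);  /* the ts2 store cannot become
                                            * visible ahead of the previous
                                            * counter store */
        spin_unlock(&dev->vblank_time_lock);

Daniel's point below is not that this is wrong, but that the helper's
correctness should not depend on every caller keeping that shape.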
> Moreover, this algorithm is, in general, fragile and not designed to handle
> random or poorly-researched changes.

Less fragility is exactly why I want that surplus barrier. But I've run out
of new ideas for how to explain that ...

> For example, if only the read and store operations are considered, it's
> obviously unsafe, since a read may unwittingly retrieve a store in progress.
>
>    CPU 0                                 | CPU 1
>                                          |
>                            /* vblank->count == 0 */
>                                          |
>    drm_vblank_count_and_time()           | store_vblank(.., inc = 2, ...)
>                                          |
>      cur_vblank <= LOAD vblank->count    |
>                                          |   tslot = vblank->count + 2
>                                          |   /* tslot == 2 */
>                                          |   STORE vblanktime[0]

This line here is wrong; it should be "STORE vblanktime[2]". The
"STORE vblanktime[0]" happened way earlier, before the two smp_wmb()s and
the previous update of vblank->count.

I'm also somewhat confused about how you can draw a line across both cpus
for barriers, because barriers only have cpu-local effects (which is why we
always need a barrier on both ends of a transaction).

In short, I still don't follow what's wrong.
-Daniel

> - - - - - - - - - - - - - - - - - - - - - smp_wmb() - - - - - - - - - -
>      /* cur_vblank == 0 */               |
>      local <= LOAD vblanktime[0]         |
> - - smp_rmb() - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>                                          |
>    * cpu has loaded the wrong timestamp *|
>                                          |
>      local <= LOAD vblank->count         |
>      cur_vblank == local?                |
>        yes - exit loop                   |
>                                          |   vblank->count += 2
> - - - - - - - - - - - - - - - - - - - - - smp_wmb() - - - - - - - - - -
>
> Regards,
> Peter Hurley
>
> > I also tested your patch + a slightly modified version of Chris' vblank
> > delayed disable / instant query patches + my fixes using my own stress
> > tests and hardware timing test equipment on both intel and nouveau, and
> > everything seems to work fine.
> >
> > So I'm all for including this patch and it has my
> >
> > Reviewed-and-tested-by: Mario Kleiner <mario.kleiner...@gmail.com>
> >
> > I just sent out an updated version of my patches, so they don't conflict
> > with this one and also fix a compile failure of drm/qxl with yours.
> >
> > Thanks,
> > -mario

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch