You need to ensure that the streaming loads do not execute speculatively too 
early (before the data to be copied has been produced).  The MFENCE at the 
start of the copy routine is sufficient to ensure that the loads don't see 
stale data older than that fence.

All loads must be completed at the time of their retirement.  There are no 
special ordering requirements to ensure completion of the streaming loads.

Joe


-----Original Message-----
From: Justen, Jordan L 
Sent: Thursday, October 01, 2015 21:20
To: Matt Turner; mesa-dev@lists.freedesktop.org
Cc: Nuzman, Joseph
Subject: Re: [Mesa-dev] [PATCH] mesa: Add missing _mm_mfence() before streaming 
loads.

On 2015-10-01 10:11:33, Matt Turner wrote:
> According to the Intel Software Development Manual:

How about a more specific doc location?

According to the Intel Software Development Manual (Volume 1: Basic 
Architecture, 12.10.3 Streaming Load Hint Instruction):

>    Streaming loads may be weakly ordered and may appear to software to
>    execute out of order with respect to other memory operations.
>    Software must explicitly use fences (e.g. MFENCE) if it needs to
>    preserve order among streaming loads or between streaming loads and
>    other memory operations.

Does this mean we need a mfence following the load as well?

-Jordan

> That is, a memory fence is needed to preserve the order between the 
> GPU writing the buffer and the streaming loads reading it back.
> 
> Reported-by: Joseph Nuzman <joseph.nuz...@intel.com>
> ---
>  src/mesa/main/streaming-load-memcpy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/mesa/main/streaming-load-memcpy.c 
> b/src/mesa/main/streaming-load-memcpy.c
> index d7147af..32854b6 100644
> --- a/src/mesa/main/streaming-load-memcpy.c
> +++ b/src/mesa/main/streaming-load-memcpy.c
> @@ -54,16 +54,19 @@ _mesa_streaming_load_memcpy(void *restrict dst, 
> void *restrict src, size_t len)
>  
>        memcpy(d, s, MIN2(bytes_before_alignment_boundary, len));
>  
>        d = (char *)ALIGN((uintptr_t)d, 16);
>        s = (char *)ALIGN((uintptr_t)s, 16);
>        len -= MIN2(bytes_before_alignment_boundary, len);
>     }
>  
> +   if (len >= 64)
> +      _mm_mfence();
> +
>     while (len >= 64) {
>        __m128i *dst_cacheline = (__m128i *)d;
>        __m128i *src_cacheline = (__m128i *)s;
>  
>        __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
>        __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
>        __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
>        __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
> --
> 2.4.9
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to