On Mon, Sep 10, 2018 at 8:05 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
> The instruction scheduler is re-ordering loads which is causing fence
> values to be loaded after the value they're fencing.  In particular,
> consider the following pseudocode:
>
>     void try_use_a_thing(int idx)
>     {
>         bool ready = ssbo.arr[idx].ready;
>         vec4 data = ssbo.arr[idx].data;
>         if (ready)
>             use(data);
>     }
>
>     void write_a_thing(int idx, vec4 data)
>     {
>         ssbo.arr[idx].data = data;
>         ssbo.arr[idx].ready = true;
>     }
>
> Our current instruction scheduling scheme doesn't see any problem with
> re-ordering the load of "ready" with respect to the load of "data".
> However, if try_use_a_thing is called in one thread and write_a_thing is
> called in another thread, such re-ordering could cause invalid data to
> be used.  Normally, some re-ordering of loads is fine; however, with the
> Vulkan memory model, there are some additional guarantees that are
> provided particularly in the case of atomic loads which we currently
> don't differentiate in any way in the back-end.
>
> Obviously, we need to come up with something better for this than just
> shutting off the scheduler but I wanted to send this out earlier rather
> than later and provide the opportunity for a discussion.

So how about adding a bitmask of flags like this to the load/store
intrinsics? I remember we still need a coherent bit to implement
coherent loads and stores for AMD. We could easily have a mask with
coherent+atomic+whatever and pass that through to the backend.

> ---
>  src/intel/compiler/brw_fs.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 3f7f2b4c984..9df238a6f6a 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -6427,7 +6427,7 @@ fs_visitor::allocate_registers(unsigned min_dispatch_width, bool allow_spilling)
>      * performance but increasing likelihood of allocating.
>      */
>     for (unsigned i = 0; i < ARRAY_SIZE(pre_modes); i++) {
> -      schedule_instructions(pre_modes[i]);
> +      //schedule_instructions(pre_modes[i]);
>
>        if (0) {
>           assign_regs_trivial();
> @@ -6478,7 +6478,7 @@ fs_visitor::allocate_registers(unsigned min_dispatch_width, bool allow_spilling)
>
>     opt_bank_conflicts();
>
> -   schedule_instructions(SCHEDULE_POST);
> +   //schedule_instructions(SCHEDULE_POST);
>
>     if (last_scratch > 0) {
>        MAYBE_UNUSED unsigned max_scratch_size = 2 * 1024 * 1024;
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev