https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108016

--- Comment #7 from Alexey Merzlyakov <alexey.merzlyakov at samsung dot com> ---
> For DSE to kick in, I'm pretty sure we'd need to eliminate the memory load
> first.  Eliminating the memory load will likely be nontrivial.
For this, I think we could start from loads that was partially filled by stores
before in the same BB, e.g. something like depicted below:

           [value] <- store                       |
               V                                  |
  {........|value} <- stack-frame            (execution)
    V     V    V                                  |
  [........|value] <- read in wider mode          V

And ensure that the rest of upper bits were not touched by any other store(s)
in the function.

I've locally added some sort of optimization into DSE. Like current
replace_read() cases it finds store-load pairs, that operate with the same
memory base and offset, while all bits in upper part of memory were never
touched by any store. If it is OK, it tries to replace_read() by non-memory
instruction; and delete_dead_store_insn() in case of replace success and if
store could be deleted. The approach could look as pseudo-code below:

  void remove_part_stores() {
    FOR_EACH_BB_FN (bb, cfun) {
      FOR_EACH_READ_INSN (read_insn = bb->last_insn; read_insn--) {
        if (read_info->group_id >= 0 && read_info->group_id->frame_related) {
          // Read insn selected. Let's try to find pair store
          insn_stored = NULL;
          FOR_EACH_STORE_INSN (store_insn = read_insn; store_insn--) {
            if (store_insn->wild_read || store_insn->non_frame_wild_read) {
              // Wild read between store and load as it could do read
              // of any part of memory, breaks the optimization
              break;
            }

            if (store_info->group_id == read_info->group_id &&
                store is inside read bounds) {
              // Nearest store insn found
              insn_stored = store_insn;
              break;
            }
          }

          if (insn_stored && remaining_frame_read_is_safe(read_info)) {
            // Try to replace read_insn with non-memory one
            if (replace_read(insn_stored, read_insn) &&
                !insn_stored->cannot_delete) {
              // If store could be deleted - delete it
              delete_dead_store_insn (insn_stored);
            }
          }
        }
      }
    }
  }

  bool remaining_frame_read_is_safe() {
    It checks that the rest of cfun code does not contain:
     (1) Any other stores with same as read_info->group_id to the upper part
         of frame
     (2) Non-fixed memory stores (with store_rec->group_id < 0), as it could
potentially
         touch the remaining bytes on frame; and thus could break the
optimization
  }

Unlike the rest of DSE where we are playing with stores and finding killing
memory reads, in this optimization we are staring to dance from frame loads and
then looking for the stores that may ruin the optimization. This approach
requires to have store_info-s to be preserved. Thus remove_part_stores() could
be inserted at the end of dse_step1, but before deleting cselib stores. At this
stage, we have information about all stores, including non-fixed memory ones
(with group_id < 0).

I've simply tested optimization prototype on local RISC-V build with ~10K
csmith tests and it seem to works well. May I ask you about the feedback - if
this way of optimization suitable for DSE, so I may develop and test it further
more seriously.
  • [Bug middle-end/108016] ... alexey.merzlyakov at samsung dot com via Gcc-bugs

Reply via email to