https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108016
--- Comment #7 from Alexey Merzlyakov <alexey.merzlyakov at samsung dot com> --- > For DSE to kick in, I'm pretty sure we'd need to eliminate the memory load > first. Eliminating the memory load will likely be nontrivial. For this, I think we could start from loads that was partially filled by stores before in the same BB, e.g. something like depicted below: [value] <- store | V | {........|value} <- stack-frame (execution) V V V | [........|value] <- read in wider mode V And ensure that the rest of upper bits were not touched by any other store(s) in the function. I've locally added some sort of optimization into DSE. Like current replace_read() cases it finds store-load pairs, that operate with the same memory base and offset, while all bits in upper part of memory were never touched by any store. If it is OK, it tries to replace_read() by non-memory instruction; and delete_dead_store_insn() in case of replace success and if store could be deleted. The approach could look as pseudo-code below: void remove_part_stores() { FOR_EACH_BB_FN (bb, cfun) { FOR_EACH_READ_INSN (read_insn = bb->last_insn; read_insn--) { if (read_info->group_id >= 0 && read_info->group_id->frame_related) { // Read insn selected. Let's try to find pair store insn_stored = NULL; FOR_EACH_STORE_INSN (store_insn = read_insn; store_insn--) { if (store_insn->wild_read || store_insn->non_frame_wild_read) { // Wild read between store and load as it could do read // of any part of memory, breaks the optimization break; } if (store_info->group_id == read_info->group_id && store is inside read bounds) { // Nearest store insn found insn_stored = store_insn; break; } } if (insn_stored && remaining_frame_read_is_safe(read_info)) { // Try to replace read_insn with non-memory one if (replace_read(insn_stored, read_insn) && !insn_stored->cannot_delete) { // If store could be deleted - delete it delete_dead_store_insn (insn_stored); } } } } } } bool remaining_frame_read_is_safe() { It checks that the rest of cfun code does not contain: (1) Any other stores with same as read_info->group_id to the upper part of frame (2) Non-fixed memory stores (with store_rec->group_id < 0), as it could potentially touch the remaining bytes on frame; and thus could break the optimization } Unlike the rest of DSE where we are playing with stores and finding killing memory reads, in this optimization we are staring to dance from frame loads and then looking for the stores that may ruin the optimization. This approach requires to have store_info-s to be preserved. Thus remove_part_stores() could be inserted at the end of dse_step1, but before deleting cselib stores. At this stage, we have information about all stores, including non-fixed memory ones (with group_id < 0). I've simply tested optimization prototype on local RISC-V build with ~10K csmith tests and it seem to works well. May I ask you about the feedback - if this way of optimization suitable for DSE, so I may develop and test it further more seriously.