https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116713
Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-09-20
     Ever confirmed|0                           |1

--- Comment #2 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to pietro from comment #1)
> (In reply to Oleg Endo from comment #0)
> > In some cases __builtin_prefetch will be eliminated and thus it can't be
> > reliably used for SH4 store queue writes.
> >
> > An example here: https://godbolt.org/z/TGsed8cnq
>
> It doesn't get eliminated on SH2A/SH3/SH4 on that example. It does get moved
> to before the writes to sq_part between O1 and O2 though:
>
> https://godbolt.org/z/sP8PPKz3K

Thanks!

It looks like the problem can be "fixed" by inserting a
'__atomic_thread_fence (1);' before the '__builtin_prefetch', which kinda makes
sense.

So maybe a new SH specific builtin function could be added which would insert
the memory fence automatically.
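
For illustration, the workaround described in the comment could look roughly like the sketch below. The helper name sq_write_and_flush and the SQ_WORDS limit are hypothetical and are not taken from the godbolt examples or from any existing API. The literal 1 is copied from the comment; in GCC's numbering it corresponds to __ATOMIC_CONSUME. The sketch only shows where the fence sits relative to the stores and the prefetch. It does not establish that the fence is sufficient in every case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one store queue holds 32 bytes, i.e. 8 32-bit words. */
#define SQ_WORDS 8

/*
 * Hypothetical helper (not an existing builtin): fill a store queue
 * area, then issue the prefetch that is meant to trigger the
 * write-back, with a fence in between as suggested in the comment.
 */
static void sq_write_and_flush(uint32_t *sq_part, const uint32_t *src,
                               size_t n_words)
{
    assert(n_words <= SQ_WORDS);

    /* Writes to the store queue area. */
    for (size_t i = 0; i < n_words; i++)
        sq_part[i] = src[i];

    /* Fence taken verbatim from the comment; the intent is to keep
       the prefetch from being moved ahead of the stores above. */
    __atomic_thread_fence(1);

    /* On SH4 this is the store queue trigger; elsewhere it is only
       a cache hint. */
    __builtin_prefetch(sq_part);
}
```

A dedicated SH builtin, as proposed above, would presumably fold the fence and the prefetch into a single call so that callers cannot forget the fence.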