https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045
--- Comment #25 from Mark Wielaard <mark at gcc dot gnu.org> ---
Note comment #16, which explains that valgrind seems to translate this large
read into smaller chunks. That most likely causes memcheck to flag the (last)
8 bytes read as fully invalid. See:

  --partial-loads-ok=<yes|no> [default: yes]
    Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
    aligned loads from addresses for which some bytes are addressable and
    others are not. When yes, such loads do not produce an address error.
    Instead, loaded bytes originating from illegal addresses are marked as
    uninitialised, and those corresponding to legal addresses are handled in
    the normal way. When no, loads from partially invalid addresses are
    treated the same as loads from completely invalid addresses: an
    illegal-address error is issued, and the resulting bytes are marked as
    initialised.

It would be helpful if someone with arm knowledge (and valgrind VEX knowledge)
could check whether there is a better translation of the vld1 instruction, so
that it is one big read. That way memcheck at least has a chance of detecting
that the part that is invalid isn't actually used. See
https://sourceware.org/cgit/valgrind/tree/VEX/priv/guest_arm_toIR.c#n8383

But maybe there is no good/natural translation of these vector loads that would
help memcheck see it is a valid read and only the defined bytes are used.
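
To illustrate why the chunking matters, here is a toy model in C of the
--partial-loads-ok policy as described in the documentation quoted above. It is
only a sketch of the documented behaviour, not memcheck's actual
implementation: the names load_verdict and model_load are made up for this
example, addressability is reduced to a bitmask with one bit per loaded byte,
the load is assumed to be naturally aligned, and the definedness of the legal
bytes is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of checking one load; not a Valgrind type. */
typedef struct {
    bool address_error;      /* would an illegal-address error be issued? */
    uint32_t undefined_mask; /* bit i set: loaded byte i marked uninitialised */
} load_verdict;

/*
 * Toy model of the documented policy for one naturally aligned load.
 * addressable_mask: bit i set means byte i of the load is addressable.
 * nbytes: size of the load in bytes (1..32).
 */
load_verdict model_load(uint32_t addressable_mask, size_t nbytes,
                        bool partial_loads_ok)
{
    assert(nbytes >= 1 && nbytes <= 32);

    uint32_t all = (nbytes == 32) ? 0xFFFFFFFFu : ((1u << nbytes) - 1u);
    uint32_t ok = addressable_mask & all;
    load_verdict v = { .address_error = false, .undefined_mask = 0 };

    if (ok == all) {
        /* Fully addressable: nothing to report. */
        return v;
    }
    if (ok == 0 || !partial_loads_ok) {
        /* Completely invalid, or partially invalid with the option set to
           "no": an address error, and the result is marked initialised. */
        v.address_error = true;
        return v;
    }
    /* Partially invalid with the option set to "yes": no address error,
       but the bytes from illegal addresses become uninitialised. */
    v.undefined_mask = all & ~ok;
    return v;
}
```

In this model, a single 16-byte load whose first 8 bytes are addressable,
model_load(0x00FF, 16, true), reports no address error and only marks the
upper 8 bytes as uninitialised (undefined_mask 0xFF00). If the same access is
split into two 8-byte loads, the second one is model_load(0x00, 8, true),
which is completely invalid and so reports an address error regardless of the
option.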