https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70998
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
The sse2_cvtsd2ss<round_name> pattern is wrong.
This pattern is written as:
(define_insn "sse2_cvtsd2ss<round_name>"
  [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
        (vec_merge:V4SF
          (vec_duplicate:V4SF
            (float_truncate:V2SF
              (match_operand:V2DF 2 "nonimmediate_operand"
                                    "x,m,<round_constraint>")))
          (match_operand:V4SF 1 "register_operand" "0,0,v")
          (const_int 1)))]
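This implies a full V2DF (128-bit) load from memory, which is not the case:
with a memory operand the instruction reads only the low DF (64-bit) element.
The pattern should be similar to, e.g., the cvtsi2ss pattern: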
(define_insn "sse_cvtsi2ss<round_name>"
  [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
        (vec_merge:V4SF
          (vec_duplicate:V4SF
            (float:SF (match_operand:SI 2 "<round_nimm_scalar_predicate>"
                                          "r,m,<round_constraint3>")))
          (match_operand:V4SF 1 "register_operand" "0,0,v")
          (const_int 1)))]
This is a correct scalar memory load.
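
To make the difference concrete, here is a minimal portable C sketch that models the two readings of the memory operand. It is only an illustration: the type v4sf and the functions model_as_written and model_scalar_load are hypothetical names invented here, not GCC code, and the model ignores rounding-mode and exception details.

```c
#include <string.h>

/* Hypothetical stand-in for a V4SF register value. */
typedef struct { float f[4]; } v4sf;

/* What the pattern as written implies: operand 2 is a V2DF, so a memory
   operand is accessed as 16 bytes (two doubles), even though only
   element 0 survives the vec_merge with mask 1. */
static v4sf model_as_written(v4sf op1, const void *mem)
{
    double src[2];
    memcpy(src, mem, sizeof src);   /* 16-byte access */
    v4sf r = op1;                   /* elements 1..3 come from operand 1 */
    r.f[0] = (float)src[0];         /* float_truncate of the low element */
    return r;
}

/* What the instruction does with a memory source: it reads a single
   double (8 bytes), analogous to the scalar SI load in cvtsi2ss. */
static v4sf model_scalar_load(v4sf op1, const void *mem)
{
    double src;
    memcpy(&src, mem, sizeof src);  /* 8-byte access */
    v4sf r = op1;                   /* elements 1..3 come from operand 1 */
    r.f[0] = (float)src;            /* float_truncate DF -> SF */
    return r;
}
```

Both functions produce the same register result when the memory is at least 16 bytes wide, but only model_scalar_load is safe to call with a pointer to a single double. That is the sense in which the V2DF operand in the pattern misdescribes the memory access.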