On Fri, Jul 22, 2016 at 7:06 AM Simon Pilgrim via cfe-commits <cfe-commits@lists.llvm.org> wrote:
> Author: rksimon
> Date: Fri Jul 22 08:58:56 2016
> New Revision: 276417
>
> URL: http://llvm.org/viewvc/llvm-project?rev=276417&view=rev
> Log:
> [X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128
> with generic IR
>
> As discussed on D22460, I've updated the vbroadcastf128 pd256/ps256
> builtins to map directly to generic IR - load+splat a 128-bit vector to
> both lanes of a 256-bit vector.
>
> Fix for PR28657.
>
> Modified:
>     cfe/trunk/lib/CodeGen/CGBuiltin.cpp
>     cfe/trunk/test/CodeGen/avx-builtins.c
>
> Modified: cfe/trunk/lib/CodeGen/CGBuiltin.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGBuiltin.cpp?rev=276417&r1=276416&r2=276417&view=diff
>
> ==============================================================================
> --- cfe/trunk/lib/CodeGen/CGBuiltin.cpp (original)
> +++ cfe/trunk/lib/CodeGen/CGBuiltin.cpp Fri Jul 22 08:58:56 2016
> @@ -6619,6 +6619,26 @@ static Value *EmitX86MaskedLoad(CodeGenF
>    return CGF.Builder.CreateMaskedLoad(Ops[0], Align, MaskVec, Ops[1]);
>  }
>
> +static Value *EmitX86SubVectorBroadcast(CodeGenFunction &CGF,
> +                                        SmallVectorImpl<Value *> &Ops,
> +                                        llvm::Type *DstTy,
> +                                        unsigned SrcSizeInBits,
> +                                        unsigned Align) {
> +  // Load the subvector.
> +  Ops[0] = CGF.Builder.CreateAlignedLoad(Ops[0], Align);
> +
> +  // Create broadcast mask.
> +  unsigned NumDstElts = DstTy->getVectorNumElements();
> +  unsigned NumSrcElts = SrcSizeInBits / DstTy->getScalarSizeInBits();
> +
> +  SmallVector<uint32_t, 8> Mask;
> +  for (unsigned i = 0; i != NumDstElts; i += NumSrcElts)
> +    for (unsigned j = 0; j != NumSrcElts; ++j)
> +      Mask.push_back(j);
> +
> +  return CGF.Builder.CreateShuffleVector(Ops[0], Ops[0], Mask, "subvecbcst");
> +}
> +
>  static Value *EmitX86Select(CodeGenFunction &CGF,
>                              Value *Mask, Value *Op0, Value *Op1) {
>
> @@ -6995,6 +7015,13 @@ Value *CodeGenFunction::EmitX86BuiltinEx
>        getContext().getTypeAlignInChars(E->getArg(1)->getType()).getQuantity();
>      return EmitX86MaskedLoad(*this, Ops, Align);
>    }
> +
> +  case X86::BI__builtin_ia32_vbroadcastf128_pd256:
> +  case X86::BI__builtin_ia32_vbroadcastf128_ps256: {
> +    llvm::Type *DstTy = ConvertType(E->getType());
> +    return EmitX86SubVectorBroadcast(*this, Ops, DstTy, 128, 16);

Somewhat to my surprise, after a bunch of debugging, we found a bug in this line. See my fix in r278202. I'm mentioning it here in case others bisect back to this commit and wonder, and because, frankly, I would never have thought of this: the broadcast instructions, even when taking a 128-bit memory operand, don't require that operand to be 16-byte aligned. So the load emitted here must not claim 16-byte alignment. Paint me surprised.

Anyway, just FYI, and in case you want to double-check my fix.
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits