On 30 June 2015 at 00:58, Roland Scheidegger <srol...@vmware.com> wrote:
> Don't worry about the AoS stuff. Only meant to do simple things.
>
> Looks good overall, I guess it makes sense to not split execution too
> (so you'd have native hw vector size there), llvm should handle that
> pretty well these days (the sse intrinsics won't get used that way
> probably (though there's a helper for that too which makes it possible
> but it might not be hooked up, but I guess there's not really much need
> for them).
>
> Some comments inline.

I've noticed we have no tests for indirect access to fp64 things, so
I'll probably write some first to validate the indirect paths I haven't
fixed up yet.

>> Two things that don't mix well are SoA and doubles, see
>> emit_fetch_double, and emit_store_double_chan in this.
>>
>> I've also had to split emit_data.chan, to add src_chan,
>> which can be different for doubles.
>>
>> Open issues:
>> are intrinsics okay for floor/ceil?
> The question is if they actually work if you don't have sse4.1 and don't
> just crash (at least I assume with sse4.1 it turns into round
> instruction). (Or on non-x86 cpus if there is no direct hw support). If
> they don't you'd have to provide your own implementation (at least as a
> fallback) or make support for the extension conditional. Otherwise llvm
> intrinsics are just fine (traditionally we didn't really use them much
> as most of the things we do with sse intrinsics were missing, and even
> if some intrinsic existed it often didn't work, but that was a long time
> ago - ideally we'd switch to llvm intrinsics where possible).

Okay, well, I'm fine with limiting fp64 to hardware where the intrinsics
work, I suppose, though that needs testing on older non-sse4.1 hw.

>> +
>> +      scalar = LLVMBuildExtractElement(builder, input, si, "");
>> +      res = LLVMBuildInsertElement(builder, res, scalar, ii, "");
>> +      scalar2 = LLVMBuildExtractElement(builder, input2, si, "");
>> +      res = LLVMBuildInsertElement(builder, res, scalar2, ii1, "");
>> +   }
> Did you check what code this generated? Traditionally, we tried to avoid
> the extract/insert stuff where possible and use shuffles instead.
> Because llvm would actually do inserts/extracts (i.e. move from simd
> domain to integer domain and back, which is pretty horrendous, and
> doubly so on some non-intel cpus which have like 15+ cycles latency for
> this). It is possible though this is no longer a problem, llvm 3.6 or
> 3.7 got some majorly improved shuffle optimizer which might also catch this.

No, I haven't looked at what it generated; I was pretty sure it was
going to be ugly. If I can use shufflevector for this direction I
probably will, that makes sense. I'm not sure it'll work for the other
way, but maybe two shufflevectors will; I hadn't looked into it that
much yet.

Dave.
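
[Editor's note] To make the shuffle suggestion concrete, here is a minimal sketch in plain C of the index mask a single shufflevector would need to replace the extract/insert loop quoted above. It assumes the loop interleaves element i of input and input2 into result positions 2*i and 2*i+1; the quoted hunk does not show how si, ii and ii1 are computed, so that layout is a guess. The helper names (build_interleave_mask, apply_shuffle, interleave_extract_insert) are hypothetical and are not part of gallivm. The mask uses the shufflevector convention: indices 0..n-1 select from the first operand, n..2n-1 from the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill mask[0 .. 2n-1] so that a two-operand shuffle
 * interleaves two n-element vectors (a0, b0, a1, b1, ...).
 * Indices < n pick from the first operand, indices >= n from the second. */
static void build_interleave_mask(size_t n, unsigned *mask)
{
   for (size_t i = 0; i < n; i++) {
      mask[2 * i]     = (unsigned)i;        /* element i of first operand  */
      mask[2 * i + 1] = (unsigned)(n + i);  /* element i of second operand */
   }
}

/* Emulates a two-operand shuffle on plain arrays, so the mask can be
 * checked without building any LLVM IR. */
static void apply_shuffle(const uint32_t *a, const uint32_t *b, size_t n,
                          const unsigned *mask, size_t mask_len,
                          uint32_t *out)
{
   for (size_t i = 0; i < mask_len; i++) {
      assert(mask[i] < 2 * n);
      out[i] = mask[i] < n ? a[mask[i]] : b[mask[i] - n];
   }
}

/* Reference: the element-by-element extract/insert pattern, under the
 * assumed index layout described in the note above. */
static void interleave_extract_insert(const uint32_t *a, const uint32_t *b,
                                      size_t n, uint32_t *out)
{
   for (size_t i = 0; i < n; i++) {
      out[2 * i]     = a[i];
      out[2 * i + 1] = b[i];
   }
}
```

With the LLVM C API, a mask like this would be turned into a constant vector of i32 indices and passed to LLVMBuildShuffleVector with input and input2 as the two operands. Whether that actually generates better code than the extract/insert loop on a given LLVM version would still need checking.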