Jan Vesely <jan.ves...@rutgers.edu> writes: > On Thu, 2018-02-15 at 22:50 -0800, Francisco Jerez wrote: >> Jan Vesely <jan.ves...@rutgers.edu> writes: >> >> > On Thu, 2018-02-15 at 20:36 -0800, Francisco Jerez wrote: >> > > Jan Vesely <jan.ves...@rutgers.edu> writes: >> > > >> > > > On Thu, 2018-02-08 at 15:56 -0800, Francisco Jerez wrote: >> > > > > Jan Vesely <jan.ves...@rutgers.edu> writes: >> > > > > >> > > > > > On Thu, 2018-02-08 at 23:16 +0100, Pierre Moreau wrote: >> > > > > > > (Moving the conversation to its own thread.) >> > > > > > > >> > > > > > > > target agnostic libclc is rather difficult to do. CLC includes >> > > > > > > > 3 levels >> > > > > > > > of precision on float (fp32) operands; full, half, native. The >> > > > > > > > implementation of each depends on capabilites of specific >> > > > > > > > device (e.g. >> > > > > > > > vega(VI+?) can do 1 ULP log2/exp2 in hw, other targets need sw >> > > > > > > > implementation to meet CLC requirement of 3ulp). Any >> > > > > > > > conversion backend >> > > > > > > > would thus need to implement sw versions of math builtins for >> > > > > > > > targets >> > > > > > > > that can't perform the op in HW. >> > > > > > > >> > > > > > > My initial thought for the target agnostic libclc, was to just >> > > > > > > provide some >> > > > > > > (fake?) implementations of OpenCL built-in functions to make >> > > > > > > clang happy and >> > > > > > > let me compile kernels using “get_global_id()”, as well as >> > > > > > > include headers >> > > > > > > defining OpenCL specific types like “float4” or others. If there >> > > > > > > is another >> > > > > > > (better?) way to achieve this, I am all ears. (There is probably >> > > > > > > one, as I had >> > > > > > > no issues when using the Khronos LLVM/clang fork rather than >> > > > > > > Tomeu’s >> > > > > > > out-of-tree module, the former having also some bits and pieces >> > > > > > > in clang.) >> > > > > > >> > > > > > I don't think you need libclc for this. workitem IDs are >> > > > > > platform/device specific, and iiuc SPIR-V builtins should handle >> > > > > > it in >> > > > > > an abstract way [0]. any conversion consuming SPIR-V needs to >> > > > > > replace >> > > > > > those with device/platform specific way of obtaining the >> > > > > > information. >> > > > > > you can also use clang's clc header to provide data types [1]. >> > > > > > >> > > > > > >> > > > > > [0] >> > > > > > https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#B >> > > > > > uiltIn >> > > > > > [1] >> > > > > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/opencl >> > > > > > -c.h >> > > > > > >> > > > > > > >> > > > > > > > Extending the current libclc to provide target specific SPIR-V >> > > > > > > > binaries >> > > > > > > > in addition to/in place of LLVM IR is rather straightforward. >> > > > > > > > Adding >> > > > > > > > additional targets it's more work since it relies on clang to >> > > > > > > > support >> > > > > > > > those targets. >> > > > > > > >> > > > > > > I’m curious how those target specific SPIR-V binaries would look >> > > > > > > like. I can >> > > > > > > imagine how some functions like “OpSign” could be implemented >> > > > > > > using other >> > > > > > > SPIR-V functions, but how would you handle something like >> > > > > > > “get_local_id()”? If >> > > > > > > you define it as the built-in “LocalInvocationId” and don’t >> > > > > > > supply an >> > > > > > > implementation of it, then you lose the target specificness. On >> > > > > > > the other hand, >> > > > > > > if you want to keep it device-specific, how would you express >> > > > > > > that in SPIR-V? >> > > > > > >> > > > > > getting IDs is not a problem. SPIR-V should provide builtins for >> > > > > > that. >> > > > > > >> > > > > > The problem I had in mind is when SPIR-V binary calls e.g. exp2(). >> > > > > > You >> > > > > > can either assume that the op needs CLC precision (3 ulp), or >> > > > > > device >> > > > > > native precision. >> > > > > >> > > > > That's up to the SPIR-V extended instruction set specification to >> > > > > define >> > > > > what precision the exp2 built-in is supposed to have. >> > > > > >> > > > > > SPIR-V binary can also call exp2(fp64), which does not have an >> > > > > > equivalent GPU instruction. >> > > > > >> > > > > Then it should probably be lowered by the SPIR-V front-end, right? >> > > > >> > > > I'm not sure what you mean by "spir-v frontend". If it's the tool that >> > > > generates SPIR-V, then no, not really. >> > > >> > > No, I meant the SPIR-V front-end of the driver (or whatever translation >> > > pass in control of the driver is translating machine-agnostic SPIR-V >> > > into some other more hardware-specific representation of the program). >> > >> > OK. my question still stands. How does generic SPIR-V based libclc >> > help the process? >> > >> >> That I can think of now, it would remove the need for maintaining any >> target-specific knowledge in libclc, for plumbing target-specific >> information in order to select the right libclc flavour at link time, > > it would only move the specific decisions to SPIR-V lowering time. > I understand the advantage of cross language usefulness, but I'm not > sure how practical it is. Taking the below example of exp2(fp64). CLC > requires precision <= 2ulp, other languages might have different > requirements. Thus to achieve good performance, you'd need to lower to > different routine for each precision requirement. >
Because of SPIR-V extended instruction sets, you do know at SPIR-V translation time what the required precision bound is, so the translation logic can still provide a more optimal implementation for certain APIs -- Or not, in the most common case where the performance difference isn't clear enough to justify separate lowering for each flavour of SPIR-V, doing things at SPIR-V translation time allows you not to bother to write a separate implementation for separate APIs until you have enough evidence that it's helpful. >> and it would allow solving common problems in a place where there is a >> chance that the solution could be shared among different drivers and >> APIs (e.g. the exp2(fp64) lowering example you mentioned earlier is not >> exclusively useful to CL). > > the exp2 example is not something that could be addressed in generic > SPIR-V libclc, since the decision is hw specific. But it can be addressed at SPIR-V translation time with minimal hardware-specific knowledge, and with largely target-independent logic, like the lowering of other fp64 intrinsics is done these days. > Sure we can provide implementation of all CLC builtins using only the > core SPIR-V operations, That's fine for the built-ins that admit a compliant and reasonably efficient implementation in terms of core SPIR-V exclusively, but I wasn't arguing for lowering all of them in terms of core SPIR-V. > but if a SPIR-V input uses clc extended instructions the same > functionality would have to be implemented in SPIR-V lowering anyway, > so it's just simpler to implement libclc as single op wrappers over > CLC extended ops. No objection against implementing libclc as a pile of one-liners written in terms of CLC extended SPIR-V ops -- That's fully target-independent. > Am I missing anything? > > Jan > >> >> > Jan >> > >> > > >> > > > My understanding is that those are run prior to application >> > > > distribution, and therefore have no information about the target HW. >> > > > >> > > > So if a program imports "CLC.std.11" extended instruction set to get >> > > > access CLC builtin functions. What would a generic SPIR-V libclc >> > > > provide? >> > > > >> > > > > >> > > > > > It's easier to translate these to libclc function calls (combined >> > > > > > with >> > > > > > the right library implementation of the exp2 builtin), than try to >> > > > > > generate exp2 algorithm when converting to NIR (or anything else >> > > > > > really). >> > > > > > >> > > > > >> > > > > But the SPIR-V front-end will need to lower that in terms of >> > > > > instructions supported by the back-end anyway in order to be able to >> > > > > handle general SPIR-V shaders as input, right? So why re-implement >> > > > > the >> > > > > lowering for those operations in libclc in a way that's only going >> > > > > to be >> > > > > useful for the OpenCL C language but not for other APIs? >> > > > > >> > > > > > The current libclc mostly assumes that LLVM ops are done in device >> > > > > > native precision, and provides sw implementation of operations that >> > > > > > don't have conformant device instruction. >> > > > > >> > > > > But I don't think there is any disadvantage from having a libclc >> > > > > implementation that doesn't make any precision assumptions beyond >> > > > > what >> > > > > is stated in the SPIR-V spec. In fact that would have the IMO more >> > > > > desirable advantage that you could re-use one and the same libclc >> > > > > implementation for *all* back-ends that want SPIR-V as input. >> > > > >> > > > Sure, a compiler-rt library would be more useful (usable by multiple >> > > > languages). However, unlike target specific libclc, it's not available >> > > > atm. >> > > > >> > > > Jan >> > > > >> > > > > >> > > > > > This obviates the need for compiler-rt library. And alternative >> > > > > > approach is to assume that the ops provide full precision and use >> > > > > > target intrinsics for native precision. it's still target specific >> > > > > > if >> > > > > > a library call uses the former or the latter. >> > > > > > >> > > > > > regards, >> > > > > > Jan >> > > > > > >> > > > > > > >> > > > > > > Regards, >> > > > > > > Pierre >> > > > > >> > > > > _______________________________________________ >> > > > > mesa-dev mailing list >> > > > > mesa-dev@lists.freedesktop.org >> > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > > > >> > > > -- >> > > > Jan Vesely <jan.ves...@rutgers.edu> >> > >> > -- >> > Jan Vesely <jan.ves...@rutgers.edu>
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev