On 14.08.2021 07:49, Dylan Fernando wrote:
> On Sat, Aug 14, 2021 at 9:11 AM Timo Rothenpieler <t...@rothenpieler.org> wrote:
>> On 13.08.2021 10:42, Dylan Fernando wrote:
>>> Any update on this?
>>>
>>> Kind Regards,
>>> Dylan
>>
>> Also, are you sure that exp() function is correct? The CUDA-Function exp() is defined as "double exp(double x)" and calculates the base e exponential. While __nvvm_ex2_approx_f reads to me like it does so for floats, and for base 2. For which the CUDA equivalent would be "float exp2f(float)".
>
> I wasn't sure about the exp() function. Is there a function like __nvvm_exp_approx_d? I can't seem to find a function for this.
Looking into it some more, that's simply because there is no fast approximate exp function other than ex2.
If I use __expf() with nvcc, it generates the following code:

    ld.param.f32 %f1, [param];
    mul.f32 %f2, %f1, 0f3FB8AA3B;
    ex2.approx.f32 %f3, %f2;

So it multiplies the input value by a constant factor and then runs the product through ex2.approx. Mathematically, that factor must be log2(e), expressed here as log2(exp(1)) because no constant for e is predefined.
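As a sanity check on that reading of the constant, here is a minimal host-side C sketch that reinterprets the bit pattern from the PTX literal 0f3FB8AA3B as a float and compares it with log2(e) rounded to single precision. The helper names ptx_float_from_bits() and log2_e() are made up for illustration and are not part of any CUDA or ffmpeg header.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as an IEEE-754 single,
 * which is how PTX reads a 0fXXXXXXXX literal. */
static float ptx_float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f); /* bit copy, avoids aliasing problems */
    return f;
}

/* Hypothetical helper: log2(e), computed in double and rounded to float. */
static float log2_e(void)
{
    return (float)log2(exp(1.0));
}
```

Calling ptx_float_from_bits(0x3FB8AA3Bu) should give roughly 1.442695, which is what log2_e() returns as well.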
So the implementation of __expf() would look like this:
    static inline __device__ float __expf(float a)
    {
        return __nvvm_ex2_approx_f(a * (float)__builtin_log2(__builtin_exp(1)));
    }
With llvm, this now generates the exact same code:

    ld.param.f32 %f1, [param];
    mul.f32 %f2, %f1, 0f3FB8AA3B;
    ex2.approx.f32 %f3, %f2;

I will push that function soon, so you can just use __expf() in your code, assuming you want exp to base e.
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".