Your message dated Tue, 18 Nov 2014 15:20:58 +0000
with message-id <[email protected]>
and subject line Bug#768090: fixed in beignet 1.0.0-1
has caused the Debian Bug report #768090,
regarding beignet: pow(n), erf(c), tgamma give wrong results
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
768090: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768090
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: beignet
Version: 0.9.3~dfsg-1
Severity: serious
Justification: a math library should NOT silently give wrong answers
Control: found -1 0.8-1.1
Control: tags -1 upstream patch

In current beignet (0.8, 0.9.3 and upstream-HEAD):
-pow/pown ignore the sign of their first argument (e.g. pow(-2,3) gives 8 instead of -8) -erf/erfc diverge (instead of converging to 1 or 0) for arguments above about 2 -tgamma is actually lgamma, a related but very different function (and the test suite doesn't notice because it checks against glibc's gammaf, which is also lgamma, instead of tgammaf)
#!/usr/bin/env python3
#Depends: python3-pyopencl python3-scipy
import pyopencl
import pyopencl.array
import numpy as np
import scipy.special as spfn
import time
ctx=pyopencl.create_some_context()
cq=pyopencl.CommandQueue(ctx)
a1=np.array(-0.95*np.random.rand(1e6)-0.01,dtype=np.float32)
a2=-1000*a1
a3=20*(a1+0.5)
ai=np.array(a3,dtype=np.int32)
aCL1=pyopencl.array.to_device(cq,a1)
aCL2=pyopencl.array.to_device(cq,a2)
aCL3=pyopencl.array.to_device(cq,a3)
aCLi=pyopencl.array.to_device(cq,ai)
bCL=aCL1+1
c=np.array(range(20),dtype=np.float32)/3-2
ci=np.array(c,dtype=np.int32)
cCL=pyopencl.array.to_device(cq,c)
cCLi=pyopencl.array.to_device(cq,ci)
dCL=cCL+1
def approx_erfc(x):
    p  =  0.3275911
    a1 =  0.254829592
    a2 = -0.284496736
    a3 =  1.421413741
    a4 = -1.453152027
    a5 =  1.061405429
    d = np.exp(-x*x)
    t = 1/(1+p*np.abs(x))
    r = (a1*t+a2*t*t+a3*t*t*t+a4*t*t*t*t+a5*t*t*t*t*t)*d
    return r*np.sign(x)+1-np.sign(x)
erfc_err=np.abs(spfn.erfc(a3)-approx_erfc(a3))
print("x:",c,"\n",ci)
print("erfc:",np.max(erfc_err),np.mean(erfc_err))
f_to_test=[("cos",np.cos),("sin",np.sin),("tan",np.tan),("cosh",np.cosh),("cospi",lambda x:np.cos(np.pi*x)),("tanh",np.tanh),("exp",np.exp),("log",np.log),("sqrt",np.sqrt),("acos",np.arccos),("acosh",np.arccosh),("pow(a[i],c[i])",lambda x,y:x**y),("pown(a[i],d[i])",lambda x,y:x**y),("tgamma",spfn.gamma),("erf",spfn.erf),("erfc",spfn.erfc)]
for f in f_to_test:
    for aCL in aCL1,aCL2,aCL3:
        fCL=pyopencl.elementwise.ElementwiseKernel(ctx,"float *a,float *b,float *c,int *d","b[i]="+f[0]+("" if f[0] in ("pow(a[i],c[i])","pown(a[i],d[i])","powr(a[i],c[i])") else "(a[i])"))
        t0=time.time()
        fCL(aCL,bCL,aCL3,aCLi).wait()
        t=time.time()-t0
        b=bCL.get()
        if f[0] in ("pow(a[i],c[i])","powr(a[i],c[i])"):
            b0=f[1](aCL.get(),a3)
        elif f[0] in ("pown(a[i],d[i])",):
            b0=f[1](aCL.get(),ai)
        else:
            b0=f[1](aCL.get())
        abserr=np.abs(b-b0)
        relerr=np.abs(b/b0-1)
        print(f[0],"abs avg err:",np.nanmean(abserr)," max err:",np.max(abserr),"rel avg err:",np.nanmean(relerr)," max err:",np.max(relerr)," time:",t)
    if f[0] in ("erf","erfc","tgamma","pown(a[i],d[i])","pow(a[i],c[i])"):
        fCL(cCL,dCL,cCL,cCLi).wait()
        if f[0] in ("pow(a[i],c[i])","powr(a[i],c[i])"):
            d0=f[1](c,c)
        elif f[0] in ("pown(a[i],d[i])",):
            d0=f[1](c,ci)
        else:
            d0=f[1](c)
        print(dCL.get(),"\n",d0)

Description: <short summary of the patch>
 TODO: Put a short summary on the line above and replace this paragraph
 with a longer explanation of this change. Complete the meta-information
 with other relevant fields (see below for details). To make it easier, the
 information below has been extracted from the changelog. Adjust it or drop
 it.
 .
 beignet (0.9.3~dfsg-2) UNRELEASED; urgency=medium
 .
   * Fix tgamma,pow,pown,erf,erfc
   * Enable debug output in tests
Author: Rebecca N. Palmer <rnpalmer@rnpalmer-laptop>

---
The information above should follow the Patch Tagging Guidelines, please
checkout http://dep.debian.net/deps/dep3/ to learn about the format. Here
are templates for supplementary fields that you might want to add:

Origin: <vendor|upstream|other>, <url of original patch>
Bug: <url in upstream bugtracker>
Bug-Debian: https://bugs.debian.org/<bugnumber>
Bug-Ubuntu: https://launchpad.net/bugs/<bugnumber>
Forwarded: <no|not-needed|url proving that it has been forwarded>
Reviewed-By: <name and email of someone who approved the patch>
Last-Update: <YYYY-MM-DD>

--- beignet-0.9.3~dfsg.orig/utests/builtin_acos_asin.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_acos_asin.cpp
@@ -2,12 +2,13 @@
 #include <cmath>
 #include <algorithm>
 
-#define udebug 0
+#define udebug 1
 #define printf_c(...) \
 {\
   printf("\033[1m\033[40;31m");\
   printf( __VA_ARGS__ );\
   printf("\033[0m");\
+  status = 1;\
 }
 
 const float input_data[] = {-30, -1, -0.92, -0.5, -0.09, 0, 0.09, 0.5, 0.92, 1, 30};
@@ -29,6 +30,7 @@ static void builtin_acos_asin(void)
 {
   // Setup kernel and buffers
   int k, i, index_cur;
+  int status = 0;
   float gpu_data[max_function * count_input] = {0}, cpu_data[max_function * count_input] = {0};
 
   OCL_CREATE_KERNEL("builtin_acos_asin");
@@ -82,6 +84,7 @@ static void builtin_acos_asin(void)
 #endif
     }
   }
+  OCL_ASSERT(status == 0);
 }
 
 MAKE_UTEST_FROM_FUNCTION(builtin_acos_asin)
--- beignet-0.9.3~dfsg.orig/utests/builtin_pow.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_pow.cpp
@@ -2,12 +2,13 @@
 #include <cmath>
 #include <algorithm>
 
-#define udebug 0
+#define udebug 1
 #define printf_c(...) \
 {\
   printf("\033[1m\033[40;31m");\
   printf( __VA_ARGS__ );\
   printf("\033[0m");\
+  status = 1;\
 }
 const float ori_data[] = {-20.5, -1, -0.9, -0.01, 0, 0.01, 0.9, 1.0, 20.5};
 const int count_input_ori = sizeof(ori_data) / sizeof(ori_data[0]);
@@ -27,6 +28,7 @@ static void builtin_pow(void)
 {
   // Setup kernel and buffers
   int k, i, index_cur;
+  int status = 0;
   float gpu_data[max_function * count_input] = {0}, cpu_data[max_function * count_input] = {0};
 
   for(i=0; i<count_input_ori;i++)
@@ -87,6 +89,7 @@ static void builtin_pow(void)
 #endif
     }
   }
+  OCL_ASSERT(status == 0);
 }
 
 MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(builtin_pow)
--- beignet-0.9.3~dfsg.orig/utests/utest_generator.py
+++ beignet-0.9.3~dfsg/utests/utest_generator.py
@@ -73,6 +73,7 @@ def udebug(ulpSize,returnType):
       }
 #endif
   }
+  OCL_ASSERTM(status == 0, log);
 }\n'''%(returnType,\
         ulpUnit(ulpSize),ulpNum(ulpSize),\
         ulpNum(ulpSize), ulpNum(ulpSize),\
@@ -139,7 +140,7 @@ which can print more values and informat
 #include <algorithm>
 #include <string.h>
 
-#define udebug 0
+#define udebug 1
 #define FLT_MAX 0x1.fffffep127f
 #define FLT_MIN 0x1.0p-126f
 #define INT_ULP 0
@@ -149,6 +150,7 @@ which can print more values and informat
   printf("\\033[1m\\033[40;31m");\\
   printf( __VA_ARGS__ );\\
   printf("\\033[0m");\\
+  status = 1;\\
 }
 '''
     #########Execute class itself
@@ -248,6 +250,7 @@ which can print more values and informat
 static void %s_%s(void)
 {
   int index;
+  int status = 0;
   %s gpu_data[count_input] = {0}, cpu_data[count_input] = {0}, diff=0.0;
   char log[1024] = {0};
 
Description: <short summary of the patch>
 TODO: Put a short summary on the line above and replace this paragraph
 with a longer explanation of this change. Complete the meta-information
 with other relevant fields (see below for details). To make it easier, the
 information below has been extracted from the changelog. Adjust it or drop
 it.
 .
 beignet (0.9.3~dfsg-2) UNRELEASED; urgency=medium
 .
   * Fix tgamma,pow,pown,erf,erfc
   * Enable debug output in tests
Author: Rebecca N. Palmer <rnpalmer@rnpalmer-laptop>

---
The information above should follow the Patch Tagging Guidelines, please
checkout http://dep.debian.net/deps/dep3/ to learn about the format. Here
are templates for supplementary fields that you might want to add:

Origin: <vendor|upstream|other>, <url of original patch>
Bug: <url in upstream bugtracker>
Bug-Debian: https://bugs.debian.org/<bugnumber>
Bug-Ubuntu: https://launchpad.net/bugs/<bugnumber>
Forwarded: <no|not-needed|url proving that it has been forwarded>
Reviewed-By: <name and email of someone who approved the patch>
Last-Update: <YYYY-MM-DD>

--- beignet-0.9.3~dfsg.orig/backend/src/builtin_vector_proto.def
+++ beignet-0.9.3~dfsg/backend/src/builtin_vector_proto.def
@@ -94,8 +94,7 @@ floatn pown (floatn x, intn y)
 float pown (float x, int y)
 doublen pown (doublen x, intn y)
 double pown (double x, int y)
-#XXX we define powr as pow
-#gentype powr (gentype x, gentype y)
+gentype powr (gentype x, gentype y)
 gentype remainder (gentype x, gentype y)
 floatn remquo (floatn x, floatn y, __global intn *quo)
 floatn remquo (floatn x, floatn y, __local intn *quo)
--- beignet-0.9.3~dfsg.orig/backend/src/ocl_stdlib.tmpl.h
+++ beignet-0.9.3~dfsg/backend/src/ocl_stdlib.tmpl.h
@@ -1565,189 +1565,6 @@ INLINE_OVERLOADABLE float native_log2(fl
 INLINE_OVERLOADABLE float native_log(float x) {
   return native_log2(x) * 0.6931472002f;
 }
-INLINE_OVERLOADABLE float tgamma(float x) {
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-  float pi = 3.1415927410e+00,
-    a0 = 7.7215664089e-02,
-    a1 = 3.2246702909e-01,
-    a2 = 6.7352302372e-02,
-    a3 = 2.0580807701e-02,
-    a4 = 7.3855509982e-03,
-    a5 = 2.8905137442e-03,
-    a6 = 1.1927076848e-03,
-    a7 = 5.1006977446e-04,
-    a8 = 2.2086278477e-04,
-    a9 = 1.0801156895e-04,
-    a10 = 2.5214456400e-05,
-    a11 = 4.4864096708e-05,
-    tc = 1.4616321325e+00,
-    tf = -1.2148628384e-01,
-    tt = 6.6971006518e-09,
-    t0 = 4.8383611441e-01,
-    t1 = -1.4758771658e-01,
-    t2 = 6.4624942839e-02,
-    t3 = -3.2788541168e-02,
-    t4 = 1.7970675603e-02,
-    t5 = -1.0314224288e-02,
-    t6 = 6.1005386524e-03,
-    t7 = -3.6845202558e-03,
-    t8 = 2.2596477065e-03,
-    t9 = -1.4034647029e-03,
-    t10 = 8.8108185446e-04,
-    t11 = -5.3859531181e-04,
-    t12 = 3.1563205994e-04,
-    t13 = -3.1275415677e-04,
-    t14 = 3.3552918467e-04,
-    u0 = -7.7215664089e-02,
-    u1 = 6.3282704353e-01,
-    u2 = 1.4549225569e+00,
-    u3 = 9.7771751881e-01,
-    u4 = 2.2896373272e-01,
-    u5 = 1.3381091878e-02,
-    v1 = 2.4559779167e+00,
-    v2 = 2.1284897327e+00,
-    v3 = 7.6928514242e-01,
-    v4 = 1.0422264785e-01,
-    v5 = 3.2170924824e-03,
-    s0 = -7.7215664089e-02,
-    s1 = 2.1498242021e-01,
-    s2 = 3.2577878237e-01,
-    s3 = 1.4635047317e-01,
-    s4 = 2.6642270386e-02,
-    s5 = 1.8402845599e-03,
-    s6 = 3.1947532989e-05,
-    r1 = 1.3920053244e+00,
-    r2 = 7.2193557024e-01,
-    r3 = 1.7193385959e-01,
-    r4 = 1.8645919859e-02,
-    r5 = 7.7794247773e-04,
-    r6 = 7.3266842264e-06,
-    w0 = 4.1893854737e-01,
-    w1 = 8.3333335817e-02,
-    w2 = -2.7777778450e-03,
-    w3 = 7.9365057172e-04,
-    w4 = -5.9518753551e-04,
-    w5 = 8.3633989561e-04,
-    w6 = -1.6309292987e-03;
-  float t, y, z, nadj, p, p1, p2, p3, q, r, w;
-  int i, hx, ix;
-  nadj = 0;
-  hx = *(int *) (&x);
-  ix = hx & 0x7fffffff;
-  if (ix >= 0x7f800000)
-    return x * x;
-  if (ix == 0)
-    return INFINITY;
-  if (ix < 0x1c800000) {
-    if (hx < 0) {
-      return - native_log(-x);
-    } else
-      return - native_log(x);
-  }
-  if (hx < 0) {
-    if (ix >= 0x4b000000)
-      return INFINITY;
-    t = __gen_ocl_internal_sinpi(x);
-    if (__gen_ocl_fabs(t) < 1e-8f)
-      return INFINITY;
-    nadj = native_log(M_PI_F / __gen_ocl_fabs(t * x));
-    x = -x;
-  }
-
-  if (ix == 0x3f800000 || ix == 0x40000000)
-    r = 0;
-  else if (ix < 0x40000000) {
-    if (ix <= 0x3f666666) {
-      r = - native_log(x);
-      if (ix >= 0x3f3b4a20) {
-        y = 1 - x;
-        i = 0;
-      } else if (ix >= 0x3e6d3308) {
-        y = x - (tc - 1);
-        i = 1;
-      } else {
-        y = x;
-        i = 2;
-      }
-    } else {
-      r = 0;
-      if (ix >= 0x3fdda618) {
-        y = 2 - x;
-        i = 0;
-      } else if (ix >= 0x3F9da620) {
-        y = x - tc;
-        i = 1;
-      } else {
-        y = x - 1;
-        i = 2;
-      }
-    }
-    switch (i) {
-    case 0:
-      z = y * y;
-      p1 = a0 + z * (a2 + z * (a4 + z * (a6 + z * (a8 + z * a10))));
-      p2 = z * (a1 + z * (a3 + z * (a5 + z * (a7 + z * (a9 + z * a11)))));
-      p = y * p1 + p2;
-      r += (p - .5f * y);
-      break;
-    case 1:
-      z = y * y;
-      w = z * y;
-      p1 = t0 + w * (t3 + w * (t6 + w * (t9 + w * t12)));
-      p2 = t1 + w * (t4 + w * (t7 + w * (t10 + w * t13)));
-      p3 = t2 + w * (t5 + w * (t8 + w * (t11 + w * t14)));
-      p = z * p1 - (tt - w * (p2 + y * p3));
-      r += (tf + p);
-      break;
-    case 2:
-      p1 = y * (u0 + y * (u1 + y * (u2 + y * (u3 + y * (u4 + y * u5)))));
-      p2 = 1 + y * (v1 + y * (v2 + y * (v3 + y * (v4 + y * v5))));
-      r += (-.5f * y + p1 / p2);
-    }
-  } else if (ix < 0x41000000) {
-    i = x;
-    t = 0;
-    y = x - i;
-    p = y*(s0+y*(s1+y*(s2+y*(s3+y*(s4+y*(s5+y*s6))))));
-    q = 1 + y * (r1 + y * (r2 + y * (r3 + y * (r4 + y * (r5 + y * r6)))));
-    r = .5f * y + p / q;
-    z = 1;
-    switch (i) {
-    case 7:
-      z *= (y + 6.f);
-    case 6:
-      z *= (y + 5.f);
-    case 5:
-      z *= (y + 4.f);
-    case 4:
-      z *= (y + 3.f);
-    case 3:
-      z *= (y + 2.f);
-      r += native_log(z);
-      break;
-    }
-  } else if (ix < 0x5c800000) {
-    t = native_log(x);
-    z = 1 / x;
-    y = z * z;
-    w = w0 + z * (w1 + y * (w2 + y * (w3 + y * (w4 + y * (w5 + y * w6)))));
-    r = (x - .5f) * (t - 1) + w;
-  } else
-    r = x * (native_log(x) - 1);
-  if (hx < 0)
-    r = nadj - r;
-  return r;
-}
-
 INLINE_OVERLOADABLE float lgamma(float x) {
 /*
  * ====================================================
@@ -2501,13 +2318,6 @@ INLINE_OVERLOADABLE float __gen_ocl_inte
 INLINE_OVERLOADABLE float __gen_ocl_internal_atanpi(float x) {
   return __gen_ocl_internal_atan(x) / M_PI_F;
 }
-INLINE_OVERLOADABLE float __gen_ocl_internal_erf(float x) {
-  return M_2_SQRTPI_F * (x - __gen_ocl_pow(x, 3) / 3 + __gen_ocl_pow(x, 5) / 10 - __gen_ocl_pow(x, 7) / 42 + __gen_ocl_pow(x, 9) / 216);
-}
-INLINE_OVERLOADABLE float __gen_ocl_internal_erfc(float x) {
-  return 1 - __gen_ocl_internal_erf(x);
-}
-
 // XXX work-around PTX profile
 #define sqrt native_sqrt
 INLINE_OVERLOADABLE float rsqrt(float x) { return native_rsqrt(x); }
@@ -2726,6 +2536,295 @@ INLINE_OVERLOADABLE float __gen_ocl_inte
     return y*twom100;
   }
 }
+INLINE_OVERLOADABLE float tgamma(float x) {
+  float y;
+  int s;
+  y=lgamma_r(x,&s);
+  return __gen_ocl_internal_exp(y)*s;
+}
+
+/* erf,erfc from glibc s_erff.c -- float version of s_erf.c.
+ * Conversion to float by Ian Lance Taylor, Cygnus Support, [email protected].
+ */
+
+/*
+ * ====================================================
+ * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
+ *
+ * Developed at SunPro, a Sun Microsystems, Inc. business.
+ * Permission to use, copy, modify, and distribute this
+ * software is freely granted, provided that this notice
+ * is preserved.
+ * ====================================================
+ */
+
+INLINE_OVERLOADABLE float __gen_ocl_internal_erf(float x) {
+/*...*/
+const float
+tiny = 1.0e-30,
+half_val=  5.0000000000e-01, /* 0x3F000000 */
+one =  1.0000000000e+00, /* 0x3F800000 */
+two =  2.0000000000e+00, /* 0x40000000 */
+	/* c = (subfloat)0.84506291151 */
+erx =  8.4506291151e-01, /* 0x3f58560b */
+/*
+ * Coefficients for approximation to  erf on [0,0.84375]
+ */
+efx =  1.2837916613e-01, /* 0x3e0375d4 */
+efx8=  1.0270333290e+00, /* 0x3f8375d4 */
+pp0  =  1.2837916613e-01, /* 0x3e0375d4 */
+pp1  = -3.2504209876e-01, /* 0xbea66beb */
+pp2  = -2.8481749818e-02, /* 0xbce9528f */
+pp3  = -5.7702702470e-03, /* 0xbbbd1489 */
+pp4  = -2.3763017452e-05, /* 0xb7c756b1 */
+qq1  =  3.9791721106e-01, /* 0x3ecbbbce */
+qq2  =  6.5022252500e-02, /* 0x3d852a63 */
+qq3  =  5.0813062117e-03, /* 0x3ba68116 */
+qq4  =  1.3249473704e-04, /* 0x390aee49 */
+qq5  = -3.9602282413e-06, /* 0xb684e21a */
+/*
+ * Coefficients for approximation to  erf  in [0.84375,1.25]
+ */
+pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
+pa1  =  4.1485610604e-01, /* 0x3ed46805 */
+pa2  = -3.7220788002e-01, /* 0xbebe9208 */
+pa3  =  3.1834661961e-01, /* 0x3ea2fe54 */
+pa4  = -1.1089469492e-01, /* 0xbde31cc2 */
+pa5  =  3.5478305072e-02, /* 0x3d1151b3 */
+pa6  = -2.1663755178e-03, /* 0xbb0df9c0 */
+qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
+qa2  =  5.4039794207e-01, /* 0x3f0a5785 */
+qa3  =  7.1828655899e-02, /* 0x3d931ae7 */
+qa4  =  1.2617121637e-01, /* 0x3e013307 */
+qa5  =  1.3637083583e-02, /* 0x3c5f6e13 */
+qa6  =  1.1984500103e-02, /* 0x3c445aa3 */
+ /*
+ * Coefficients for approximation to  erfc in [1.25,1/0.35]
+ */ra0  = -9.8649440333e-03, /* 0xbc21a093 */
+ra1  = -6.9385856390e-01, /* 0xbf31a0b7 */
+ra2  = -1.0558626175e+01, /* 0xc128f022 */
+ra3  = -6.2375331879e+01, /* 0xc2798057 */
+ra4  = -1.6239666748e+02, /* 0xc322658c */
+ra5  = -1.8460508728e+02, /* 0xc3389ae7 */
+ra6  = -8.1287437439e+01, /* 0xc2a2932b */
+ra7  = -9.8143291473e+00, /* 0xc11d077e */
+sa1  =  1.9651271820e+01, /* 0x419d35ce */
+sa2  =  1.3765776062e+02, /* 0x4309a863 */
+sa3  =  4.3456588745e+02, /* 0x43d9486f */
+sa4  =  6.4538726807e+02, /* 0x442158c9 */
+sa5  =  4.2900814819e+02, /* 0x43d6810b */
+sa6  =  1.0863500214e+02, /* 0x42d9451f */
+sa7  =  6.5702495575e+00, /* 0x40d23f7c */
+sa8  = -6.0424413532e-02, /* 0xbd777f97 */
+/*
+ * Coefficients for approximation to  erfc in [1/.35,28]
+ */
+rb0  = -9.8649431020e-03, /* 0xbc21a092 */
+rb1  = -7.9928326607e-01, /* 0xbf4c9dd4 */
+rb2  = -1.7757955551e+01, /* 0xc18e104b */
+rb3  = -1.6063638306e+02, /* 0xc320a2ea */
+rb4  = -6.3756646729e+02, /* 0xc41f6441 */
+rb5  = -1.0250950928e+03, /* 0xc480230b */
+rb6  = -4.8351919556e+02, /* 0xc3f1c275 */
+sb1  =  3.0338060379e+01, /* 0x41f2b459 */
+sb2  =  3.2579251099e+02, /* 0x43a2e571 */
+sb3  =  1.5367296143e+03, /* 0x44c01759 */
+sb4  =  3.1998581543e+03, /* 0x4547fdbb */
+sb5  =  2.5530502930e+03, /* 0x451f90ce */
+sb6  =  4.7452853394e+02, /* 0x43ed43a7 */
+sb7  = -2.2440952301e+01; /* 0xc1b38712 */
+
+	int hx,ix,i;
+	float R,S,P,Q,s,y,z,r;
+	GEN_OCL_GET_FLOAT_WORD(hx,x);
+	ix = hx&0x7fffffff;
+	if(ix>=0x7f800000) {		/* erf(nan)=nan */
+	    i = ((unsigned int)hx>>31)<<1;
+	    return (float)(1-i)+one/x;	/* erf(+-inf)=+-1 */
+	}
+
+	if(ix < 0x3f580000) {		/* |x|<0.84375 */
+	    if(ix < 0x31800000) { 	/* |x|<2**-28 */
+	        if (ix < 0x04000000)
+		    /*avoid underflow */
+		    return (float)0.125*((float)8.0*x+efx8*x);
+		return x + efx*x;
+	    }
+	    z = x*x;
+	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
+	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    y = r/s;
+	    return x + x*y;
+	}
+	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
+	    s = __gen_ocl_internal_fabs(x)-one;
+	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    if(hx>=0) return erx + P/Q; else return -erx - P/Q;
+	}
+	if (ix >= 0x40c00000) {		/* inf>|x|>=6 */
+	    if(hx>=0) return one-tiny; else return tiny-one;
+	}
+	x = __gen_ocl_internal_fabs(x);
+ 	s = one/(x*x);
+	if(ix< 0x4036DB6E) {	/* |x| < 1/0.35 */
+	    R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
+				ra5+s*(ra6+s*ra7))))));
+	    S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
+				sa5+s*(sa6+s*(sa7+s*sa8)))))));
+	} else {	/* |x| >= 1/0.35 */
+	    R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
+				rb5+s*rb6)))));
+	    S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
+				sb5+s*(sb6+s*sb7))))));
+	}
+	GEN_OCL_GET_FLOAT_WORD(ix,x);
+	GEN_OCL_SET_FLOAT_WORD(z,ix&0xfffff000);
+	r  =  __gen_ocl_internal_exp(-z*z-(float)0.5625)*__gen_ocl_internal_exp((z-x)*(z+x)+R/S);
+	if(hx>=0) return one-r/x; else return  r/x-one;
+}
+INLINE_OVERLOADABLE float __gen_ocl_internal_erfc(float x) {
+/*...*/
+const float
+tiny = 1.0e-30,
+half_val=  5.0000000000e-01, /* 0x3F000000 */
+one =  1.0000000000e+00, /* 0x3F800000 */
+two =  2.0000000000e+00, /* 0x40000000 */
+	/* c = (subfloat)0.84506291151 */
+erx =  8.4506291151e-01, /* 0x3f58560b */
+/*
+ * Coefficients for approximation to  erf on [0,0.84375]
+ */
+efx =  1.2837916613e-01, /* 0x3e0375d4 */
+efx8=  1.0270333290e+00, /* 0x3f8375d4 */
+pp0  =  1.2837916613e-01, /* 0x3e0375d4 */
+pp1  = -3.2504209876e-01, /* 0xbea66beb */
+pp2  = -2.8481749818e-02, /* 0xbce9528f */
+pp3  = -5.7702702470e-03, /* 0xbbbd1489 */
+pp4  = -2.3763017452e-05, /* 0xb7c756b1 */
+qq1  =  3.9791721106e-01, /* 0x3ecbbbce */
+qq2  =  6.5022252500e-02, /* 0x3d852a63 */
+qq3  =  5.0813062117e-03, /* 0x3ba68116 */
+qq4  =  1.3249473704e-04, /* 0x390aee49 */
+qq5  = -3.9602282413e-06, /* 0xb684e21a */
+/*
+ * Coefficients for approximation to  erf  in [0.84375,1.25]
+ */
+pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
+pa1  =  4.1485610604e-01, /* 0x3ed46805 */
+pa2  = -3.7220788002e-01, /* 0xbebe9208 */
+pa3  =  3.1834661961e-01, /* 0x3ea2fe54 */
+pa4  = -1.1089469492e-01, /* 0xbde31cc2 */
+pa5  =  3.5478305072e-02, /* 0x3d1151b3 */
+pa6  = -2.1663755178e-03, /* 0xbb0df9c0 */
+qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
+qa2  =  5.4039794207e-01, /* 0x3f0a5785 */
+qa3  =  7.1828655899e-02, /* 0x3d931ae7 */
+qa4  =  1.2617121637e-01, /* 0x3e013307 */
+qa5  =  1.3637083583e-02, /* 0x3c5f6e13 */
+qa6  =  1.1984500103e-02, /* 0x3c445aa3 */
+ /*
+ * Coefficients for approximation to  erfc in [1.25,1/0.35]
+ */ra0  = -9.8649440333e-03, /* 0xbc21a093 */
+ra1  = -6.9385856390e-01, /* 0xbf31a0b7 */
+ra2  = -1.0558626175e+01, /* 0xc128f022 */
+ra3  = -6.2375331879e+01, /* 0xc2798057 */
+ra4  = -1.6239666748e+02, /* 0xc322658c */
+ra5  = -1.8460508728e+02, /* 0xc3389ae7 */
+ra6  = -8.1287437439e+01, /* 0xc2a2932b */
+ra7  = -9.8143291473e+00, /* 0xc11d077e */
+sa1  =  1.9651271820e+01, /* 0x419d35ce */
+sa2  =  1.3765776062e+02, /* 0x4309a863 */
+sa3  =  4.3456588745e+02, /* 0x43d9486f */
+sa4  =  6.4538726807e+02, /* 0x442158c9 */
+sa5  =  4.2900814819e+02, /* 0x43d6810b */
+sa6  =  1.0863500214e+02, /* 0x42d9451f */
+sa7  =  6.5702495575e+00, /* 0x40d23f7c */
+sa8  = -6.0424413532e-02, /* 0xbd777f97 */
+/*
+ * Coefficients for approximation to  erfc in [1/.35,28]
+ */
+rb0  = -9.8649431020e-03, /* 0xbc21a092 */
+rb1  = -7.9928326607e-01, /* 0xbf4c9dd4 */
+rb2  = -1.7757955551e+01, /* 0xc18e104b */
+rb3  = -1.6063638306e+02, /* 0xc320a2ea */
+rb4  = -6.3756646729e+02, /* 0xc41f6441 */
+rb5  = -1.0250950928e+03, /* 0xc480230b */
+rb6  = -4.8351919556e+02, /* 0xc3f1c275 */
+sb1  =  3.0338060379e+01, /* 0x41f2b459 */
+sb2  =  3.2579251099e+02, /* 0x43a2e571 */
+sb3  =  1.5367296143e+03, /* 0x44c01759 */
+sb4  =  3.1998581543e+03, /* 0x4547fdbb */
+sb5  =  2.5530502930e+03, /* 0x451f90ce */
+sb6  =  4.7452853394e+02, /* 0x43ed43a7 */
+sb7  = -2.2440952301e+01; /* 0xc1b38712 */
+	int hx,ix;
+	float R,S,P,Q,s,y,z,r;
+	GEN_OCL_GET_FLOAT_WORD(hx,x);
+	ix = hx&0x7fffffff;
+	if(ix>=0x7f800000) {			/* erfc(nan)=nan */
+						/* erfc(+-inf)=0,2 */
+	    return (float)(((unsigned int)hx>>31)<<1)+one/x;
+	}
+
+	if(ix < 0x3f580000) {		/* |x|<0.84375 */
+	    if(ix < 0x23800000)  	/* |x|<2**-56 */
+		return one-x;
+	    z = x*x;
+	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
+	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    y = r/s;
+	    if(hx < 0x3e800000) {  	/* x<1/4 */
+		return one-(x+x*y);
+	    } else {
+		r = x*y;
+		r += (x-half_val);
+	        return half_val - r ;
+	    }
+	}
+	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
+	    s = __gen_ocl_internal_fabs(x)-one;
+	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    if(hx>=0) {
+	        z  = one-erx; return z - P/Q;
+	    } else {
+		z = erx+P/Q; return one+z;
+	    }
+	}
+	if (ix < 0x41e00000) {		/* |x|<28 */
+	    x = __gen_ocl_internal_fabs(x);
+ 	    s = one/(x*x);
+	    if(ix< 0x4036DB6D) {	/* |x| < 1/.35 ~ 2.857143*/
+	        R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
+				ra5+s*(ra6+s*ra7))))));
+	        S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
+				sa5+s*(sa6+s*(sa7+s*sa8)))))));
+	    } else {			/* |x| >= 1/.35 ~ 2.857143 */
+		if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */
+	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
+				rb5+s*rb6)))));
+	        S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
+				sb5+s*(sb6+s*sb7))))));
+	    }
+	    GEN_OCL_GET_FLOAT_WORD(ix,x);
+	    GEN_OCL_SET_FLOAT_WORD(z,ix&0xffffe000);
+	    r  =  __gen_ocl_internal_exp(-z*z-(float)0.5625)*
+			__gen_ocl_internal_exp((z-x)*(z+x)+R/S);
+	    if(hx>0) {
+		float ret = r/x;
+		return ret;
+	    } else
+		return two-r/x;
+	} else {
+	    if(hx>0) {
+		return tiny*tiny;
+	    } else
+		return two-tiny;
+	}
+}
+
+
 INLINE_OVERLOADABLE float __gen_ocl_internal_fmod (float x, float y) {
   //return x-y*__gen_ocl_rndz(x/y);
   float one = 1.0;
@@ -3190,7 +3289,6 @@ INLINE_OVERLOADABLE float __gen_ocl_inte
 #define atan2pi __gen_ocl_internal_atan2pi
 #define atanpi __gen_ocl_internal_atanpi
 #define atanh __gen_ocl_internal_atanh
-#define pow powr
 #define cbrt __gen_ocl_internal_cbrt
 #define rint __gen_ocl_internal_rint
 #define copysign __gen_ocl_internal_copysign
@@ -3729,10 +3827,23 @@ INLINE_OVERLOADABLE float remquo(float x
 #undef BODY
 INLINE_OVERLOADABLE float native_divide(float x, float y) { return x/y; }
 INLINE_OVERLOADABLE float pown(float x, int n) {
-  if (x == 0 && n == 0)
-    return 1;
+  if (x == 0.f && n == 0)
+    return 1.f;
+  if (x < 0.f && (n&1) )
+    return -powr(-x, n);
   return powr(x, n);
 }
+INLINE_OVERLOADABLE float pow(float x, float y) {
+  int n;
+  if (x == 0.f && y == 0.f)
+    return 1.f;
+  if (x >= 0.f)
+    return powr(x, y);
+  n = y;
+  if ((float)n == y)//is exact integer
+    return pown(x, n);
+  return NAN;
+}
 
 INLINE_OVERLOADABLE float internal_rootn(float x, int n, const bool isFastpath)
 {
--- beignet-0.9.3~dfsg.orig/utests/builtin_pow.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_pow.cpp
@@ -16,12 +16,12 @@ const int count_input = count_input_ori
 
 float input_data1[count_input];
 float input_data2[count_input];
-const int max_function = 1;
+const int max_function = 2; // builtin_pow.cl has 2 outputs: pow(src1,src2) and src1
 
 static void cpu_compiler_math(const float *src1, const float *src2, float *dst)
 {
   dst[0] = powf(src1[0], src2[0]);
-//  dst[1] = src1[0];
+  dst[1] = src1[0];
 }
 
 static void builtin_pow(void)
@@ -92,4 +92,4 @@ static void builtin_pow(void)
   OCL_ASSERT(status == 0);
 }
 
-MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(builtin_pow)
+MAKE_UTEST_FROM_FUNCTION(builtin_pow)
--- beignet-0.9.3~dfsg.orig/utests/builtin_tgamma.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_tgamma.cpp
@@ -27,7 +27,7 @@ void builtin_tgamma(void)
     OCL_MAP_BUFFER(1);
     float *dst = (float*)buf_data[1];
     for (int i = 0; i < n; ++i) {
-      float cpu = gammaf(src[i]);
+      float cpu = tgammaf(src[i]);
       if (isinf(cpu)) {
         OCL_ASSERT(isinf(dst[i]));
       } else if (fabsf(cpu - dst[i]) >= 1e-3) {
--- beignet-0.9.3~dfsg.orig/utests/utest_math_gen.py
+++ beignet-0.9.3~dfsg/utests/utest_math_gen.py
@@ -216,17 +216,17 @@ static float atanpi(float x){
   cospi_cpu_func=reduce1+cospi
   cospiUtests = func('cospi','cospi',[cospi_input_type],cospi_output_type,[cospi_input_values],'2 * FLT_ULP',cospi_cpu_func)
   
-#  ##### gentype erf(gentype)
-#  erf_input_values = base_input_values
-#  erf_input_type = ['float','float2','float4','float8','float16']
-#  erf_output_type = ['float','float2','float4','float8','float16']
-#  erfUtests = func('erf','erf',[erf_input_type],erf_output_type,[erf_input_values],'16 * FLT_ULP')
+  ##### gentype erf(gentype)
+  erf_input_values = base_input_values
+  erf_input_type = ['float','float2','float4','float8','float16']
+  erf_output_type = ['float','float2','float4','float8','float16']
+  erfUtests = func('erf','erf',[erf_input_type],erf_output_type,[erf_input_values],'16 * FLT_ULP')
 
-#  ##### gentype erfc(gentype)
-#  erfc_input_values = base_input_values
-#  erfc_input_type = ['float','float2','float4','float8','float16']
-#  erfc_output_type = ['float','float2','float4','float8','float16']
-#  erfcUtests = func('erfc','erfc',[erfc_input_type],erfc_output_type,[erfc_input_values],'16 * FLT_ULP')
+  ##### gentype erfc(gentype)
+  erfc_input_values = base_input_values
+  erfc_input_type = ['float','float2','float4','float8','float16']
+  erfc_output_type = ['float','float2','float4','float8','float16']
+  erfcUtests = func('erfc','erfc',[erfc_input_type],erfc_output_type,[erfc_input_values],'16 * FLT_ULP')
   
   ##### gentype exp(gentype x)
   exp_input_values = base_input_values

--- End Message ---
--- Begin Message ---
Source: beignet
Source-Version: 1.0.0-1

We believe that the bug you reported is fixed in the latest version of
beignet, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to [email protected],
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Andreas Beckmann <[email protected]> (supplier of updated beignet package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing [email protected])


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Tue, 18 Nov 2014 14:58:00 +0100
Source: beignet
Binary: beignet beignet-dev
Architecture: source
Version: 1.0.0-1
Distribution: experimental
Urgency: medium
Maintainer: Debian OpenCL Maintainers <[email protected]>
Changed-By: Andreas Beckmann <[email protected]>
Description:
 beignet    - OpenCL library for Intel GPUs
 beignet-dev - Intel OpenCL library (development files)
Closes: 768090
Changes:
 beignet (1.0.0-1) experimental; urgency=medium
 .
   [ Andreas Beckmann ]
   * New upstream release.
     - Fix tgamma, pow, pown, erf, erfc  (Closes: #768090)
     - Drop upstream patches.
     - Refresh patches.
   * d/copyright: Upstream switched to a consistent usage of LGPL-2.1+.
     Switch the packaging, too.
   * README.Debian: Document interoperability problem with mesa-opencl-icd.
   * Ship documentation.
   * Upload to experimental.
 .
   [ Rebecca N. Palmer ]
   * Re-enable GBE_DEBUG (lost when debhelper switched to
     None build type)
   * Enable extra debug output in tests
   * Document in the description what hardware this supports.
Checksums-Sha1:
 51b70a329954d502322975adef23e6308238755d 2469 beignet_1.0.0-1.dsc
 a2b0eb53e5f9a6055cd656531532a4c6ae03fbb0 827800 beignet_1.0.0.orig.tar.gz
 81734dbaaa365d78d5247f4a4bda0ad9e92bf763 8816 beignet_1.0.0-1.debian.tar.xz
Checksums-Sha256:
 2f6c63f1445882973be7772b00470c499190a87d408837eef3e8d5ba75a88d96 2469 
beignet_1.0.0-1.dsc
 e30c4d0f4c8917fa0df2467b2d70a4ee524f28d54c42c582262d5f08928ea543 827800 
beignet_1.0.0.orig.tar.gz
 fa4603ca5772c838391997fd589ad86e3550ecd27740f7ad2487fe00b459a629 8816 
beignet_1.0.0-1.debian.tar.xz
Files:
 89f04670b11064b180e256580ddea4f6 2469 libs extra beignet_1.0.0-1.dsc
 bfd755904c332cdd285d6058f5f3de8c 827800 libs extra beignet_1.0.0.orig.tar.gz
 21678f71c716ef0a8033dabf3f290982 8816 libs extra beignet_1.0.0-1.debian.tar.xz

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJUa1S/AAoJEF+zP5NZ6e0Iq+wP/jSYNanJ/luu/nXfIQDxdHdQ
hx1UX0eePew/luWjxlUgFKZ21oPYoZFq9GR2IUC0fDLLTkCJLRUmaMwTKolZeqNe
3GAsUJO6HF6n6vn8kecUli+x9FfreAsXCf7X2AI/dnvpKEBni012QINQm1tfUeJW
Jk4tiB+U+7dOzPsdgkjMKq96/rXv/JZPwZ6mFDcH+5Pg7uRgVrIso7PhDuKfAViZ
SQBYKeJO6CEUGrg0k0l3lW/iLP6ipxA3NSeK8yvGvzW2ub8aNYOsLqeSISa15fOw
Q/kC4OOqvw6XLxTyHsLINPBDBPSbynCz9pQ8eAYp2+9N+MoC6HaLCh+4/JZN8Jd0
KS0pcw4G4YBPaJDS62jKZvf9KAtumUOWanGoLlRBgFTJy3T0Ua9iRutr8Cg4lkH1
RFWJAh/xRiSjnxl23D/ZljadVrEn9Mn+R7sPLEoqWmftvlc+erC01QiJIDXAtMjt
ZEd+rJRRmUFQkaqtV73uqmwURLra0Lh8EkoMBOxnKFjStF0QmsELyy8NfWkviHaj
uhj5skd5pOjooXAs6J13uVnz2XIKwAO1z7zvjz3q/oanopRY1H0MjYryoMV5fzSB
n9CplsNp4E42s/CwD+YuDuL7BH2rwhwF9YbK49FUKB7ZQ7HwEd1fuuK53bxpYNOu
V400Mgp7YeKjtK/Qaavc
=r9To
-----END PGP SIGNATURE-----

--- End Message ---

Reply via email to