https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732

            Bug ID: 80732
           Summary: target_clones does not work with dlsym
           Product: gcc
           Version: 6.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
  Target Milestone: ---

Compile the code below to an executable with
`gcc -Wall -Wextra -O3 -fPIC -ldl -rdynamic`. On a Haswell or newer system, the
output is

```
1:
0, 4.93038e-32, 0
2:
4.93038e-32, 4.93038e-32, 4.93038e-32
```

This shows that with the manually created ifunc, calling through dlsym, calling
directly, and calling through the function's address all produce the same
result (the fma version), whereas with `target_clones` only the direct function
call uses the fma version.

This might be related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78366 but
I'm not entirely sure. From that bug report I understand that this is just how
`target_clones` is currently implemented, but I think this is not merely a
documentation issue and should be fixed or improved, because:

1. In this case there is a user-observable inconsistency in the result,
depending on which code path is used. A fast-math object should be allowed to
produce slightly inaccurate results, but I think it should produce the same
result every time the function is called.

2. Probably more importantly, this behavior makes the `target_clones` attribute
useless in a public interface if the shared library can ever be dynamically
loaded.

```
#include <stdio.h>
#include <dlfcn.h>

__attribute__((target_clones("default","fma"),noinline,optimize("fast-math")))
double f1(double a, double b, double c)
{
    return a * b + c;
}

double k1(double a, double b, double c, void **p)
{
    *p = f1;
    return f1(a, b, c);
}

__attribute__((target("fma"),optimize("fast-math")))
static double f2_fma(double a, double b, double c)
{
    return a * b + c;
}

__attribute__((optimize("fast-math")))
static double f2_default(double a, double b, double c)
{
    return a * b + c;
}

static void *f2_resolve(void)
{
    __builtin_cpu_init ();
    if (__builtin_cpu_supports("fma"))
        return f2_fma;
    else
        return f2_default;
}

double f2(double a, double b, double c) __attribute__((ifunc("f2_resolve")));

double k2(double a, double b, double c, void **p)
{
    *p = f2;
    return f2(a, b, c);
}

int main()
{
    volatile double a = 1.0000000000000002;
    volatile double b = -0.9999999999999998;
    volatile double c = 1.0;

    void *hdl = dlopen(NULL, RTLD_NOW);

    printf("1:\n");
    double (*pf1)(double, double, double) = dlsym(hdl, "f1");
    double (*pk1)(double, double, double, void**) = dlsym(hdl, "k1");
    double (*_pf1)(double, double, double);

    double v1_1 = pf1(a, b, c);
    double v1_2 = pk1(a, b, c, (void**)&_pf1);
    double v1_3 = _pf1(a, b, c);
    printf("%g, %g, %g\n", v1_1, v1_2, v1_3);

    printf("2:\n");
    double (*pf2)(double, double, double) = dlsym(hdl, "f2");
    double (*pk2)(double, double, double, void**) = dlsym(hdl, "k2");
    double (*_pf2)(double, double, double);

    double v2_1 = pf2(a, b, c);
    double v2_2 = pk2(a, b, c, (void**)&_pf2);
    double v2_3 = _pf2(a, b, c);
    printf("%g, %g, %g\n", v2_1, v2_2, v2_3);

    return 0;
}
```
