https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732
Bug ID: 80732
Summary: target_clones does not work with dlsym
Product: gcc
Version: 6.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
Target Milestone: ---

Compile the code below into an executable with `gcc -Wall -Wextra -O3 -fPIC -ldl -rdynamic`. On a Haswell or newer system, the output is

```
1:
0, 4.93038e-32, 0
2:
4.93038e-32, 4.93038e-32, 4.93038e-32
```

This shows that with the manually created ifunc (`f2`), calling through `dlsym`, calling directly, and calling through the function's address all produce the same result (the fma version). With `target_clones` (`f1`), only the direct call uses the fma version.

This might be related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78366, but I'm not entirely sure. From that bug report I understand that this is simply how `target_clones` is currently implemented. However, I don't think this is a documentation issue; it should be fixed or improved instead, because:

1. In this case there is a user-observable inconsistency in the result depending on which code path is used. Fast-math code may produce slightly inaccurate results, but it should produce the same result every time the function is called.
2. Probably more importantly, this behavior makes the `target_clones` attribute useless in the public interface of any shared library that can ever be dynamically loaded.
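For reference, both values in the output above follow from the inputs used in the reproducer below (`a = 1.0000000000000002`, `b = -0.9999999999999998`, `c = 1.0`), which as doubles are exactly $1 + 2^{-52}$ and $-(1 - 2^{-52})$. Writing $\mathrm{fl}(\cdot)$ for rounding to the nearest double, a separate multiply and add rounds twice, while a fused multiply-add rounds once:

$$
\begin{aligned}
ab &= -(1 + 2^{-52})(1 - 2^{-52}) = -(1 - 2^{-104}) \\
\mathrm{fl}(ab) &= -1 \quad \text{(doubles just below 1 are spaced } 2^{-53}\text{, and } 2^{-104} < 2^{-54}\text{)} \\
\mathrm{fl}\big(\mathrm{fl}(ab) + c\big) &= \mathrm{fl}(-1 + 1) = 0 \quad \text{(separate multiply and add)} \\
\mathrm{fl}(ab + c) &= 2^{-104} \approx 4.93038 \times 10^{-32} \quad \text{(fused multiply-add)}
\end{aligned}
$$

So a printed `0` comes from the default clone and `4.93038e-32` from the fma clone.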
```
#include <stdio.h>
#include <dlfcn.h>

/* Case 1: dispatch generated by target_clones. */
__attribute__((target_clones("default","fma"),noinline,optimize("fast-math")))
double f1(double a, double b, double c)
{
    return a * b + c;
}

/* Calls f1 directly and also returns f1's address, taken in this file. */
double k1(double a, double b, double c, void **p)
{
    *p = f1;
    return f1(a, b, c);
}

/* Case 2: the same dispatch written by hand as a GNU ifunc. */
__attribute__((target("fma"),optimize("fast-math")))
static double f2_fma(double a, double b, double c)
{
    return a * b + c;
}

__attribute__((optimize("fast-math")))
static double f2_default(double a, double b, double c)
{
    return a * b + c;
}

/* Resolver called by the dynamic loader; __builtin_cpu_init must be
   called explicitly because it may run before constructors. */
static void *f2_resolve(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("fma"))
        return f2_fma;
    else
        return f2_default;
}

double f2(double a, double b, double c) __attribute__((ifunc("f2_resolve")));

double k2(double a, double b, double c, void **p)
{
    *p = f2;
    return f2(a, b, c);
}

int main()
{
    /* volatile keeps the compiler from constant-folding the calls. */
    volatile double a = 1.0000000000000002;
    volatile double b = -0.9999999999999998;
    volatile double c = 1.0;
    /* Handle for the main program; -rdynamic exports its symbols to dlsym. */
    void *hdl = dlopen(NULL, RTLD_NOW);

    printf("1:\n");
    double (*pf1)(double, double, double) = dlsym(hdl, "f1");
    double (*pk1)(double, double, double, void**) = dlsym(hdl, "k1");
    double (*_pf1)(double, double, double);
    double v1_1 = pf1(a, b, c);                 /* call through dlsym */
    double v1_2 = pk1(a, b, c, (void**)&_pf1);  /* direct call inside k1 */
    double v1_3 = _pf1(a, b, c);                /* call through address taken in k1 */
    printf("%g, %g, %g\n", v1_1, v1_2, v1_3);

    printf("2:\n");
    double (*pf2)(double, double, double) = dlsym(hdl, "f2");
    double (*pk2)(double, double, double, void**) = dlsym(hdl, "k2");
    double (*_pf2)(double, double, double);
    double v2_1 = pf2(a, b, c);                 /* call through dlsym */
    double v2_2 = pk2(a, b, c, (void**)&_pf2);  /* direct call inside k2 */
    double v2_3 = _pf2(a, b, c);                /* call through address taken in k2 */
    printf("%g, %g, %g\n", v2_1, v2_2, v2_3);
    return 0;
}
```