https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99320
Bug ID: 99320
Summary: constexpr defined arrays within constexpr functions would benefit from lookup-tables
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc-bugs at marehr dot dialup.fu-berlin.de
Target Milestone: ---
Hi gcc-team,
first of all, sorry if this is the wrong component. I think this is a
"missed-optimization" issue rather than a regular C++ issue, so I wasn't sure
which component fits best.
I have the following code (which can be further reduced, but I kept it as
original as possible to reflect my use case):
```c++
#include <array>

struct foo
{
    static constexpr char bar(unsigned idx)
    {
        constexpr std::array<char, 256> lookup_table
        {
            [] () constexpr
            {
                std::array<char, 256> ret{};
                // reverse mapping for characters and their lowercase
                for (unsigned rnk = 0u; rnk < 15; ++rnk)
                {
                    ret[rnk + 'A'] = rnk;
                }
                // set U equal to T
                ret['U'] = ret['T']; ret['u'] = ret['t'];
                // iupac characters get special treatment, because there is no N
                ret['R'] = ret['A']; ret['r'] = ret['A']; // A or G
                ret['Y'] = ret['C']; ret['y'] = ret['C']; // C or T
                ret['S'] = ret['C']; ret['s'] = ret['C']; // C or G
                ret['W'] = ret['A']; ret['w'] = ret['A']; // A or T
                ret['K'] = ret['G']; ret['k'] = ret['G']; // G or T
                ret['M'] = ret['A']; ret['m'] = ret['A']; // A or C
                ret['B'] = ret['C']; ret['b'] = ret['C']; // C or G or T
                ret['D'] = ret['A']; ret['d'] = ret['A']; // A or G or T
                ret['H'] = ret['A']; ret['h'] = ret['A']; // A or C or T
                ret['V'] = ret['A']; ret['v'] = ret['A']; // A or C or G
                return ret;
            }()
        };
        return lookup_table[idx];
    }
};

int main(int argc, char const ** argv)
{
    return foo::bar(argc);
}
```
I wanted to move the lookup table from class scope (e.g.
`static constexpr ... lookup_table = ...`) into the function itself, and I
noticed a performance regression in my benchmarks. Some micro benchmarks went
from ~80ns to ~3000ns, and I also saw an impact on more "realistic" macro
benchmarks.
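For reference, the class-scope variant would look roughly like the sketch below. It is reduced: `foo_static` is a made-up name for illustration, and the table is shortened to the initial loop. It assumes C++17, where a `static constexpr` data member is implicitly inline.

```c++
#include <array>

// Hypothetical name; reduced variant of `foo` with the table at class scope.
struct foo_static
{
    // A static constexpr data member has static storage duration, so there is
    // one table object for the whole program instead of one per call.
    static constexpr std::array<char, 256> lookup_table
    {
        [] () constexpr
        {
            std::array<char, 256> ret{};
            // same initial loop as in the report; the IUPAC entries are omitted
            for (unsigned rnk = 0u; rnk < 15; ++rnk)
            {
                ret[rnk + 'A'] = rnk;
            }
            return ret;
        }()
    };

    static constexpr char bar(unsigned idx)
    {
        // indexes the class-scope table directly
        return lookup_table[idx];
    }
};
```

The only intended difference from the function-local version is where the table is declared; `bar` returns the same values for the entries that this reduced table fills in.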
After looking at the assembly (https://godbolt.org/z/n9bo7W), I noticed that the
table is "constructed" on each function call, instead of the function compiling
down to a single lookup instruction.
So I compared it to what clang does, and it seems that they are actually
generating a static lookup table.
I know that this use case is quite niche, but it would be cool to have it
nevertheless :)
Thank you!