p...@thetdh.com writes:
> Normally I'd try a small lookup table (1-byte index to 1-byte value)
> in this case. But if the bitscan instruction were even close in
> performance, it'd be preferable, due to its more-reliable caching
> behavior;

Well, the problem with the bitscan instruction is we won't have it at
all on a lot of configurations. I don't think this problem is so
important as to justify *two* hacked-up code paths.
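For concreteness, that path would look roughly like the sketch below
with GCC's __builtin_clz. This is illustrative only, not something
benchmarked here: it assumes __builtin_clz is available and that
unsigned int is 32 bits, and the _clz name is made up for this example.

static int
AllocSetFreeIndex_clz(size_t size)
{
    int         idx = 0;

    if (size > (1 << ALLOC_MINBITS))
    {
        /*
         * size - 1 is at least 1 << ALLOC_MINBITS here, so the shifted
         * value is nonzero and __builtin_clz's result is well-defined.
         */
        unsigned int v = (size - 1) >> ALLOC_MINBITS;

        /* bit length of v; assumes 32-bit unsigned int */
        idx = 32 - __builtin_clz(v);
    }

    return idx;
}

On a compiler without __builtin_clz we'd need the table (or something
worse) as a fallback anyway, which is exactly the two-code-paths
problem.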
> The specific code for large-versus-small testing would be useful; did
> I overlook it?

Sorry, I should have provided that. What I've tested is

#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n

static const char LogTable256[256] =
{
    0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
    LT(5), LT(6), LT(6),
    LT(7), LT(7), LT(7), LT(7),
    LT(8), LT(8), LT(8), LT(8), LT(8), LT(8), LT(8), LT(8)
};

int
AllocSetFreeIndex_lt(size_t size)
{
    int         idx = 0;

    if (size > (1 << ALLOC_MINBITS))
    {
        unsigned int t, tt;     /* temporaries */

        size = (size - 1) >> ALLOC_MINBITS;

        if ((tt = (size >> 16)))
        {
            idx = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
        }
        else
        {
            idx = (t = size >> 8) ? 8 + LogTable256[t] : LogTable256[size];
        }
    }

    return idx;
}

int
AllocSetFreeIndex_lts(size_t size)
{
    int         idx = 0;

    if (size > (1 << ALLOC_MINBITS))
    {
        unsigned int t;         /* temporary */

        size = (size - 1) >> ALLOC_MINBITS;

        t = size >> 8;
        idx = t ? 8 + LogTable256[t] : LogTable256[(unsigned int) size];
    }

    return idx;
}

plus the obvious additions to the other file for an additional test
type.

> Note that instruction alignment with respect to words is not the only
> potential instruction-alignment issue. In the past, when optimizing
> code to an extreme, I've run into cache-line issues where a small
> change that should've produced a small improvement resulted in a
> largish performance loss, without further work. Lookup tables can have
> an analogous issue; this could, in a simplistic test, explain an
> anomalous large-better-than-small result, if part of the large lookup
> table remains cached. (Do any modern CPUs attempt to address this??)

Yeah, I've seen similar things when trying to micro-optimize code.
That's why it's so critical to get results from multiple
platforms/machines before trusting the answers too much.

> This is difficult to tune in a multiplatform code base, so the numbers
> in a particular benchmark do not tell the whole tale; you'd need to
> make a judgment call, and perhaps to allow a code-configuration
> override.

Well, my judgment call is that we should go with the lookup table.
So far, the only machine where that doesn't seem to provide a nice win
is my old Mac G4 laptop, which is hardly a platform that people are
likely to be stressing about database performance on. Jeremy's PPC
machines like it a lot more, so there's not something fundamentally
wrong with the approach for that architecture. The other alternatives
all have significantly greater downsides.

            regards, tom lane
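PS: the "other file" is just a timing harness along the lines of the
sketch below. This is a from-memory illustration, not the exact file;
the loop count and size range here are arbitrary, and ALLOC_MINBITS is
hard-wired to the value aset.c uses.

#include <stdio.h>
#include <time.h>

#define ALLOC_MINBITS 3         /* smallest chunk is 8 bytes, as in aset.c */

/* the variant under test, as defined above */
extern int AllocSetFreeIndex_lt(size_t size);

int
main(void)
{
    unsigned long sum = 0;      /* keeps the compiler from optimizing
                                 * the calls away entirely */
    clock_t     start = clock();
    int         i;
    size_t      size;

    for (i = 0; i < 10000; i++)
    {
        for (size = 1; size <= 8192; size++)
            sum += AllocSetFreeIndex_lt(size);
    }

    printf("sum = %lu, elapsed = %.3f sec\n",
           sum, (double) (clock() - start) / CLOCKS_PER_SEC);
    return 0;
}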