On Mon, 15 May 2017, François Dumont wrote:
> I also added some optimizations, in particular replacing std::fill with calls to __builtin_memset. Has anyone ever proposed optimizing std::fill this way? It would require a test on the value used to fill the range, but it might be worth this additional runtime check, no?
Note that with -O3, gcc recognizes the pattern in std::fill and generates a call to memset (there is a bit too much extra code around the memset, but a couple of match.pd transformations should fix that). That doesn't mean we can't save the compiler the work. If you want to avoid the runtime check, there is always __builtin_constant_p...
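To make the __builtin_constant_p idea concrete, here is a rough sketch, not a patch. The names fill_uint and repeated_byte are made up for illustration, and I am assuming a plain unsigned int range. memset is only valid when every byte of the value is the same, and the test for that is only attempted when the compiler already knows the value, so it should fold away.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not libstdc++ code): returns the byte value if all
// bytes of v are identical, -1 otherwise.
inline int repeated_byte(unsigned int v)
{
    unsigned char b[sizeof v];
    std::memcpy(b, &v, sizeof v);
    for (std::size_t i = 1; i < sizeof v; ++i)
        if (b[i] != b[0])
            return -1;
    return b[0];
}

// Hypothetical fill for unsigned int ranges. The memset path is taken only
// when the compiler can prove 'value' is a constant (typically after
// inlining, with optimization enabled); otherwise we fall back to std::fill.
inline void fill_uint(unsigned int* first, unsigned int* last,
                      unsigned int value)
{
    if (first != last && __builtin_constant_p(value)) {
        const int byte = repeated_byte(value);
        if (byte >= 0) {
            __builtin_memset(first, byte, (last - first) * sizeof *first);
            return;
        }
    }
    std::fill(first, last, value);
}
```

Both paths store the same values, so the result does not depend on whether the builtin returns true. Without optimization gcc normally answers 0 and the plain std::fill is used.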
The __fill_bvector part of the fill overload for vector<bool> could do with some improvements as well. Looping is unnecessary: one just needs to produce the right mask and then AND or OR the word with it, which shouldn't take more than 4 instructions or so.
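Roughly what I have in mind for the partial-word case, as a sketch only. word_t, word_bits and fill_bits are illustrative names, not the identifiers used in stl_bvector.h, and I assume the bit range lies within a single word and is non-empty.

```cpp
#include <climits>

typedef unsigned long word_t;                       // stand-in for the storage word
const unsigned word_bits = sizeof(word_t) * CHAR_BIT;

// Sets (x == true) or clears (x == false) bits [first, last) of *w.
// Precondition: 0 <= first < last <= word_bits, so neither shift count
// reaches word_bits.
inline void fill_bits(word_t* w, unsigned first, unsigned last, bool x)
{
    // Ones in positions first .. last-1, zeros elsewhere.
    const word_t mask = (~word_t(0) >> (word_bits - last))
                      & (~word_t(0) << first);
    if (x)
        *w |= mask;     // set the selected bits
    else
        *w &= ~mask;    // clear the selected bits
}
```

For example, first = 2 and last = 5 give the mask 0b11100. The full words in the middle of the range can still go through memset or std::fill; only the first and last words need the mask.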
There was a time when I suggested overloading std::count and std::find in order to use __builtin_popcount, etc. But from what I've seen of committee discussions, I expect that there will be specialized algorithms (possibly member functions) eventually, making the overload less useful.
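For reference, the kind of thing such overloads would boil down to on whole words is sketched below. The function names are hypothetical and I assume unsigned long storage words; handling of partial first and last words is left out.

```cpp
#include <climits>
#include <cstddef>

// Counts the set bits in n whole words using the GCC/Clang builtin.
inline std::size_t count_set_bits(const unsigned long* words, std::size_t n)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += __builtin_popcountl(words[i]);
    return total;
}

// Returns the index of the first set bit, or n * bits-per-word if none.
// __builtin_ctzl is undefined for 0, so zero words are skipped first.
inline std::size_t find_first_set(const unsigned long* words, std::size_t n)
{
    const std::size_t bits = sizeof(unsigned long) * CHAR_BIT;
    for (std::size_t i = 0; i < n; ++i)
        if (words[i] != 0)
            return i * bits + __builtin_ctzl(words[i]);
    return n * bits;
}
```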
-- Marc Glisse