On 15/05/2017 21:31, Marc Glisse wrote:
On Mon, 15 May 2017, François Dumont wrote:
I also added some optimizations, in particular replacing
std::fill with calls to __builtin_memset. Has anyone ever proposed
optimizing std::fill this way? It would require a test on the
value used to fill the range, but it might be worth this additional
runtime check, no?
Note that with -O3, gcc recognizes the pattern in std::fill and
generates a call to memset (there is a bit too much extra code around
the memset, but a couple match.pd transformations should fix that).
Good to know, at least g++ will be able to spend more time on other
optimizations :-) What is match.pd?
That doesn't mean we can't save it the work. If you want to save the
runtime check, there is always __builtin_constant_p...
Good point, I will give it a try.
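
A minimal sketch of the __builtin_constant_p idea, assuming GCC or Clang.
fill_ints is a hypothetical helper for illustration only, not the code from
the patch: the memset path is taken only when the compiler can prove the
value is zero, so no runtime test on the value remains in the generated code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not libstdc++ code.
inline void fill_ints(int* first, int* last, int value)
{
    // __builtin_constant_p(value) is true only when the compiler knows
    // 'value' at compile time (typically after inlining). Otherwise the
    // whole condition folds to false and only std::fill remains.
    if (__builtin_constant_p(value) && value == 0 && first != last)
    {
        // All-zero bytes represent int 0, so memset is safe here.
        __builtin_memset(first, 0,
                         static_cast<std::size_t>(last - first) * sizeof(int));
        return;
    }
    std::fill(first, last, value);
}
```

Both paths give the same result; only the generated code differs.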
The __fill_bvector part of the fill overload for vector<bool> could do
with some improvements as well. Looping is unnecessary, one just needs
to produce the right mask and AND or OR with it, that shouldn't take
more than 4 instructions or so.
Yes, good idea, I'll submit another patch after this one.
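
The mask approach could look roughly like this. fill_word_bits is a
hypothetical name, and the sketch assumes the bits are packed in an
unsigned long with 0 <= first < last <= number of bits in the word:

```cpp
#include <climits>

// Hypothetical helper, not the actual __fill_bvector code.
// Sets or clears bits [first, last) of a single word without looping.
// Precondition: 0 <= first < last <= bits in unsigned long.
inline void fill_word_bits(unsigned long& word, unsigned first,
                           unsigned last, bool value)
{
    const unsigned bits = sizeof(unsigned long) * CHAR_BIT;
    // Ones from bit 'first' upward, intersected with ones below bit 'last'.
    const unsigned long mask = (~0UL << first) & (~0UL >> (bits - last));
    if (value)
        word |= mask;   // OR sets the selected bits
    else
        word &= ~mask;  // AND with the complement clears them
}
```

The precondition keeps both shift counts strictly below the word width, so
neither shift is undefined.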
There was a time when I suggested overloading std::count and std::find
in order to use __builtin_popcount, etc. But from what I've seen of
committee discussions, I expect that there will be specialized
algorithms (possibly member functions) eventually, making the overload
less useful.
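
For reference, a popcount-based count over packed words might be sketched
as follows. count_set_bits is a hypothetical free function, not a proposed
std::count overload, and it assumes every bit of every word is part of the
range:

```cpp
#include <cstddef>

// Hypothetical helper: counts the set bits in n whole words.
inline std::size_t count_set_bits(const unsigned long* words, std::size_t n)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += static_cast<std::size_t>(__builtin_popcountl(words[i]));
    return total;
}
```

A real overload would also have to mask off the partial words at both ends
of the range.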
OK, thanks for the feedback.
François