[Bug libstdc++/51078] New: [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 Bug #: 51078 Summary: [PATCH] performance improvement of std::count algorithm Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: libstdc++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: grygoriy.fuched...@gmail.com Created attachment 25778 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25778 patch itself This patch improves performance of std::count algorithm in case of random access iterators by loop unrolling. It doesn't affect std::count algorithm for usual input iterators. This is done by choosing different versions of function for different iterator types at compile time, similar technique is used for std::find algorithm. Besides the patch itself I've attached performance test which can be used to test this changes without even applying them. Test program has count_local function which implements improvements of this patch. Test program generates test bunch of test containers of different types and performs count_local and original count algorithm on them. It outputs table with following columns: container_size, type1_local_time, type1_orig_time, type1_performance_boost, type2_local_time, type2_orig_time, type2_performance_boost, ... container_size is length of data passed to each call to count algorithm. typeX_local_time is time elapsed counting bunch of containers of type X using count_local function. typeX_orig_time is time elapsed counting bunch of containers of type X using original count implementation. typeX_performance_boost is (typeX_orig_time - typeX_local_time)/typeX_orig_time. Number of containers for testing decreases with increasing of container_size to keep testing time relatively constant. There is also data file containing test results and gnuplot script to plot them. Results were obtained on my machine with i7 cpu. Also tried another machine with core2 processor, got similar results. Also tried changing range of generated random data from [1-128] to [1-2]. Got similar results. I also attached plots of performance boost against container size, first one is linear by size, second one is logarithmic. As you can see there is performance decrease for very small containers with length less then 10 elements. And around 30% performance boost for larger containers with fundamental types. Containers of complex data vector > have small performace boost around 1%. There is also list to demonstrate that this patch doesn't affect non random access iterators.
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #1 from Grygoriy Fuchedzhy 2011-11-10 11:39:08 UTC --- Created attachment 25779 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25779 performance test
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #2 from Grygoriy Fuchedzhy 2011-11-10 11:40:20 UTC --- Created attachment 25780 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25780 output of performance test
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #3 from Grygoriy Fuchedzhy 2011-11-10 11:41:14 UTC --- Created attachment 25781 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25781 gnuplot script to plot test data
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #4 from Grygoriy Fuchedzhy 2011-11-10 11:42:23 UTC --- Created attachment 25782 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25782 plot of performance boost against container size
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #5 from Grygoriy Fuchedzhy 2011-11-10 11:43:00 UTC --- Created attachment 25783 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25783 plot of performance boost against container size, logarithmic by size
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #7 from Grygoriy Fuchedzhy 2011-11-10 14:06:38 UTC --- (In reply to comment #6) > Before looking at your patch, do you have a copyright assignment in place for > contributing to GCC? No, but I'm ready to sign it. What should I do now? I've found this instructions: http://gcc.gnu.org/wiki/CopyrightAssignment
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #9 from Grygoriy Fuchedzhy 2011-11-10 18:08:31 UTC --- (In reply to comment #8) > I can't seem to find a mention of the compiler flags you used for your > benchmarks, what are they? -Ofast -funroll-loops? -march=native -O2
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #11 from Grygoriy Fuchedzhy 2011-11-11 21:53:50 UTC --- I've tried different optimization options: 1. -march=native -O2, 2. -march=native -O2 -funroll-loops, 3. -march=native -O2 -funroll-all-loops, 4. -march=native -O3, 5. -march=native -O3 -funroll-loops, 6. -march=native -O3 -funroll-all-loops And got following list of optimizations from faster to slower: 5, 6, 1, 4, 2, 3 You can see that code with automatic loop unrolling sometimes performs worse than code without one. 2 and 3 optimizations gives 1.5 times worse result compared to 1 variant. 5 and 6 variant is better then 1, but still manual loop unrolling performs better(more than 25%). Also I've tried changing number of unrolled iteration from 2 to 8 and got best performance for 4 and 5 on core2 cpu.
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 Grygoriy Fuchedzhy changed: What|Removed |Added Attachment #25779|0 |1 is obsolete|| --- Comment #12 from Grygoriy Fuchedzhy 2011-11-11 21:57:06 UTC --- Created attachment 25798 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25798 performance test Minor fix to implement patch functionality exactly. Doesn't affect performance.
[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078 --- Comment #14 from Grygoriy Fuchedzhy 2011-11-12 10:09:52 UTC --- (In reply to comment #13) > I would say, the next step, is analyzing why: std::count seems a very simple > algorithm, no aliasing issues for example, compiler should be able to unroll, > maybe vectorize too, and everything else, without having to fiddle with the > code in an ad-hoc way. I completely agree that analizing why is necessary, but this probably should be done in separate bug?(should I open it?) As regarding current bug: it seems that automatic loop unrolling is far from being considered "stable" or "recommended" optimization, so this patch may improve performance until this optimization become commonly used. By the way, manually unrolled std::find, one of old algorithms you mentioned also performs significantly better than version without loop unrolling.
[Bug c++/68926] New: decltype and sfinae to check for template instance availability
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68926 Bug ID: 68926 Summary: decltype and sfinae to check for template instance availability Product: gcc Version: 5.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: grygoriy.fuchedzhy at gmail dot com Target Milestone: --- Hi, here is minimal example of code which I believe valid, compiles using clang-3.5.2, but fails to compile using gcc-4.9.3 gcc-5.2.0 gcc-5.3.0 returning: no type named ‘type’ in ‘struct std::enable_if’ #include template typename std::enable_if::value>::type func(); template)> std::true_type test(T); std::false_type test(...); int main() { decltype(test(0))::value; // ok decltype(test(0.f))::value; // error }
[Bug c++/68926] [4.9/5/6 Regression] decltype and sfinae to check for template instance availability fails to compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68926 --- Comment #3 from Grygoriy Fuchedzhy --- Created attachment 37173 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37173&action=edit patch
[Bug c++/68926] [4.9/5/6 Regression] decltype and sfinae to check for template instance availability fails to compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68926 Grygoriy Fuchedzhy changed: What|Removed |Added CC||jason at redhat dot com --- Comment #4 from Grygoriy Fuchedzhy --- Regression was introduced in git commit 6395f98e518629ec25c36a680a3ec080590fa949 (190653 svn). Attached patch fixes it for me. However it is probably not a proper fix for it.