[Bug libstdc++/51078] New: [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

 Bug #: 51078
   Summary: [PATCH] performance improvement of std::count
algorithm
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: libstdc++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: grygoriy.fuched...@gmail.com


Created attachment 25778
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25778
patch itself

This patch improves performance of std::count algorithm in case of random
access iterators by loop unrolling. It doesn't affect std::count algorithm for
usual input iterators. This is done by choosing different versions of function
for different iterator types at compile time, similar technique is used for
std::find algorithm.

Besides the patch itself I've attached performance test which can be used to
test this changes without even applying them. Test program has count_local
function which implements improvements of this patch. Test program generates
test bunch of test containers of different types and performs count_local and
original count algorithm on them. It outputs table with following columns:
container_size, type1_local_time, type1_orig_time, type1_performance_boost,
type2_local_time, type2_orig_time, type2_performance_boost, ...

container_size is length of data passed to each call to count algorithm.
typeX_local_time is time elapsed counting bunch of containers of type X using
count_local function.
typeX_orig_time is time elapsed counting bunch of containers of type X using
original count implementation.
typeX_performance_boost is (typeX_orig_time -
typeX_local_time)/typeX_orig_time.
Number of containers for testing decreases with increasing of container_size to
keep testing time relatively constant.

There is also data file containing test results and gnuplot script to plot
them. Results were obtained on my machine with i7 cpu. Also tried another
machine with core2 processor, got similar results. Also tried changing range of
generated random data from [1-128] to [1-2]. Got similar results.

I also attached plots of performance boost against container size, first one is
linear by size, second one is logarithmic.

As you can see there is performance decrease for very small containers with
length less then 10 elements. And around 30% performance boost for larger
containers with fundamental types. Containers of complex data
vector > have small performace boost around 1%.

There is also list to demonstrate that this patch doesn't affect non
random access iterators.


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #1 from Grygoriy Fuchedzhy  
2011-11-10 11:39:08 UTC ---
Created attachment 25779
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25779
performance test


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #2 from Grygoriy Fuchedzhy  
2011-11-10 11:40:20 UTC ---
Created attachment 25780
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25780
output of performance test


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #3 from Grygoriy Fuchedzhy  
2011-11-10 11:41:14 UTC ---
Created attachment 25781
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25781
gnuplot script to plot test data


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #4 from Grygoriy Fuchedzhy  
2011-11-10 11:42:23 UTC ---
Created attachment 25782
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25782
plot of performance boost against container size


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #5 from Grygoriy Fuchedzhy  
2011-11-10 11:43:00 UTC ---
Created attachment 25783
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25783
plot of performance boost against container size, logarithmic by size


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #7 from Grygoriy Fuchedzhy  
2011-11-10 14:06:38 UTC ---
(In reply to comment #6)
> Before looking at your patch, do you have a copyright assignment in place for
> contributing to GCC?

No, but I'm ready to sign it. What should I do now? I've found this
instructions: http://gcc.gnu.org/wiki/CopyrightAssignment


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-10 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #9 from Grygoriy Fuchedzhy  
2011-11-10 18:08:31 UTC ---
(In reply to comment #8)
> I can't seem to find a mention of the compiler flags you used for your
> benchmarks, what are they? -Ofast -funroll-loops?

-march=native -O2


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-11 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #11 from Grygoriy Fuchedzhy  
2011-11-11 21:53:50 UTC ---
I've tried different optimization options:
1. -march=native -O2,
2. -march=native -O2 -funroll-loops,
3. -march=native -O2 -funroll-all-loops,
4. -march=native -O3,
5. -march=native -O3 -funroll-loops,
6. -march=native -O3 -funroll-all-loops

And got following list of optimizations from faster to slower: 5, 6, 1, 4, 2, 3

You can see that code with automatic loop unrolling sometimes performs worse
than code without one. 2 and 3 optimizations gives 1.5 times worse result
compared to 1 variant.

5 and 6 variant is better then 1, but still manual loop unrolling performs
better(more than 25%).

Also I've tried changing number of unrolled iteration from 2 to 8 and got best
performance for 4 and 5 on core2 cpu.


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-11 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

Grygoriy Fuchedzhy  changed:

   What|Removed |Added

  Attachment #25779|0   |1
is obsolete||

--- Comment #12 from Grygoriy Fuchedzhy  
2011-11-11 21:57:06 UTC ---
Created attachment 25798
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25798
performance test

Minor fix to implement patch functionality exactly. Doesn't affect performance.


[Bug libstdc++/51078] [PATCH] performance improvement of std::count algorithm

2011-11-12 Thread grygoriy.fuchedzhy at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #14 from Grygoriy Fuchedzhy  
2011-11-12 10:09:52 UTC ---
(In reply to comment #13)
> I would say, the next step, is analyzing why: std::count seems a very simple
> algorithm, no aliasing issues for example, compiler should be able to unroll,
> maybe vectorize too, and everything else, without having to fiddle with the
> code in an ad-hoc way.

I completely agree that analizing why is necessary, but this probably should be
done in separate bug?(should I open it?) As regarding current bug: it seems
that automatic loop unrolling is far from being considered "stable" or
"recommended" optimization, so this patch may improve performance until this
optimization become commonly used. By the way, manually unrolled std::find, one
of old algorithms you mentioned also performs significantly better than version
without loop unrolling.


[Bug c++/68926] New: decltype and sfinae to check for template instance availability

2015-12-15 Thread grygoriy.fuchedzhy at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68926

Bug ID: 68926
   Summary: decltype and sfinae to check for template instance
availability
   Product: gcc
   Version: 5.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: grygoriy.fuchedzhy at gmail dot com
  Target Milestone: ---

Hi,
here is minimal example of code which I believe valid, compiles using
clang-3.5.2, but fails to compile using gcc-4.9.3 gcc-5.2.0 gcc-5.3.0
returning:
no type named ‘type’ in ‘struct std::enable_if’

#include 

template
typename std::enable_if::value>::type
func();

template)>
std::true_type test(T);

std::false_type test(...);

int main()
{
   decltype(test(0))::value;   // ok
   decltype(test(0.f))::value; // error
}

[Bug c++/68926] [4.9/5/6 Regression] decltype and sfinae to check for template instance availability fails to compile

2015-12-27 Thread grygoriy.fuchedzhy at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68926

--- Comment #3 from Grygoriy Fuchedzhy  ---
Created attachment 37173
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37173&action=edit
patch

[Bug c++/68926] [4.9/5/6 Regression] decltype and sfinae to check for template instance availability fails to compile

2015-12-27 Thread grygoriy.fuchedzhy at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68926

Grygoriy Fuchedzhy  changed:

   What|Removed |Added

 CC||jason at redhat dot com

--- Comment #4 from Grygoriy Fuchedzhy  ---
Regression was introduced in git commit
6395f98e518629ec25c36a680a3ec080590fa949 (190653 svn).
Attached patch fixes it for me. However it is probably not a proper fix for it.