Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR c++/83938)

chang jc Thu, 25 Jan 2018 14:37:57 -0800

Hi:

1. The __len = (__len + 1) / 2; is as you suggested, need to modify as
__len = (__len == 1) ? 0 : ((__len + 1) / 2);


2. The coding gain has been shown  PR c++/83938; I re-post here




  21
  20
  19
  18
  17
  16


  0.471136
  0.625695
  0.767262
  0.907461
  1.04838
  1.19508


  0.340845
  0.48651
  0.639139
  0.770133
  0.898454
  1.04632

it means when Merge [0, 4325376, 16777216); A is a sorted integer with
4325376 & B with 12451840 elements, total with 16M integers

The proposed method has the speed up under given buffer size =, ex
2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means
given sizof(int)*64K bytes.

3. As your suggestion, _TmpBuf __buf should be rewrite.

4. It represents a fact that the intuitive idea to split from larger
part is wrong.

For example, if you have an input sorted array A & B, A has 8 integers
& B has 24 integers. Given tmp buffer whose capacity = 4 integers.

Current it tries to split from B, right?

Then we have:

A1 | A2  B1 | B2

B1 & B2 has 12 integers each, right?

Current algorithm selects pivot as 13th integer from B, right?

If the corresponding upper bound of A is 6th integer.

Then it split in

A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12

After rotate, we have two arrays to merge

[A1 = 5 | B1 = 12]  & [A2 = 3 | B2 = 12]

Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge.

Sadly, [A1 = 5 | B1 = 12] CANNOT.

So we do rotate again, split & merge the two split arrays from [A1 = 5
| B1 = 12] again.


But wait, if we always split from the smaller one instead of larger one.

After rotate, it promises two split arrays both contain ceiling[small/2].

Since tmp buffer also allocate by starting from sizeof(small) &
recursively downgrade by ceiling[small/2^(# of fail allocate)].

It means the allocated tmp buffer promises to be sufficient at the
level of (# of fail allocate).

Instead, you can see if split from large at level (# of fail allocate)
several split array still CANNOT use  tmp buffer to do buffered merge.


As you know, buffered merge is far speed then (split, rotate, and
merge two sub-arrays) (PR c++/83938 gives the profiling figures),

the way should provide speedup.


Thanks.










On 24/01/2018 18:23, François Dumont wrote:

Hi


    It sounds like a very sensitive change to make but nothing worth figures.
Do you have any bench showing the importance of the gain ?

    At least the memory usage optimization is obvious.

On 19/01/2018 10:43, chang jc wrote:

Current std::inplace_merge() suffers from performance issue by inefficient

logic under limited memory,

It leads to performance downgrade.

Please help to review it.

Index: include/bits/stl_algo.h
===================================================================
--- include/bits/stl_algo.h    (revision 256871)
+++ include/bits/stl_algo.h    (working copy)
@@ -2437,7 +2437,7 @@
        _BidirectionalIterator __second_cut = __middle;
        _Distance __len11 = 0;
        _Distance __len22 = 0;
-      if (__len1 > __len2)
+      if (__len1 < __len2)
          {
            __len11 = __len1 / 2;
            std::advance(__first_cut, __len11);
@@ -2539,9 +2539,15 @@
        const _DistanceType __len1 = std::distance(__first, __middle);
        const _DistanceType __len2 = std::distance(__middle, __last);

+

        typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
_TmpBuf;

-      _TmpBuf __buf(__first, __last);
-
+      _BidirectionalIterator __start, __end;
+      if (__len1 < __len2) {
+    __start = __first; __end = __middle;
+      } else {
+    __start = __middle; __end = __last;
+      }
+      _TmpBuf __buf(__start, ___end);

Note another optimization, we could introduce a _Temporary_buffer<> constructor
in order to write:

_TmpBuf __buf(std::min(__len1, __len2), __first);

So that std::distance do not need to be called again.

        if (__buf.begin() == 0)
      std::__merge_without_buffer
        (__first, __middle, __last, __len1, __len2, __comp);
Index: include/bits/stl_tempbuf.h
===================================================================
--- include/bits/stl_tempbuf.h    (revision 256871)
+++ include/bits/stl_tempbuf.h    (working copy)
@@ -95,7 +95,7 @@
                              std::nothrow));
        if (__tmp != 0)
          return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
-      __len /= 2;
+      __len = (__len + 1) / 2;

This part is problematic, if __len is 1 and allocation fails it will loop
forever.

It doesn't seem really necessary for your patch.


2018-01-20 4:05 GMT+08:00 Jason Merrill <ja...@redhat.com>:

> This is a libstdc++ bug and patch, not the C++ front end.  So I'm
> adding the libstdc++ list to CC.
>
> On Fri, Jan 19, 2018 at 3:02 AM, chang jc <r97922...@gmail.com> wrote:
> > Current std::inplace_merge() suffers from performance issue by
> inefficient
> > logic under limited memory,
> >
> > It leads to performance downgrade.
> >
> > Please help to review it.
> >
> > Index: include/bits/stl_algo.h
> > ===================================================================
> > --- include/bits/stl_algo.h     (revision 256871)
> > +++ include/bits/stl_algo.h     (working copy)
> > @@ -2437,7 +2437,7 @@
> >           _BidirectionalIterator __second_cut = __middle;
> >           _Distance __len11 = 0;
> >           _Distance __len22 = 0;
> > -         if (__len1 > __len2)
> > +         if (__len1 < __len2)
> >             {
> >               __len11 = __len1 / 2;
> >               std::advance(__first_cut, __len11);
> > @@ -2539,9 +2539,15 @@
> >        const _DistanceType __len1 = std::distance(__first, __middle);
> >        const _DistanceType __len2 = std::distance(__middle, __last);
> >
> > +
> >        typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
> _TmpBuf;
> > -      _TmpBuf __buf(__first, __last);
> > -
> > +      _BidirectionalIterator __start, __end;
> > +      if (__len1 < __len2) {
> > +       __start = __first; __end = __middle;
> > +      } else {
> > +       __start = __middle; __end = __last;
> > +      }
> > +      _TmpBuf __buf(__start, ___end);
> >        if (__buf.begin() == 0)
> >         std::__merge_without_buffer
> >           (__first, __middle, __last, __len1, __len2, __comp);
> > Index: include/bits/stl_tempbuf.h
> > ===================================================================
> > --- include/bits/stl_tempbuf.h  (revision 256871)
> > +++ include/bits/stl_tempbuf.h  (working copy)
> > @@ -95,7 +95,7 @@
> >                                                         std::nothrow));
> >           if (__tmp != 0)
> >             return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
> > -         __len /= 2;
> > +         __len = (__len + 1) / 2;
> >         }
> >        return std::pair<_Tp*, ptrdiff_t>(static_cast<_Tp*>(0), 0);
> >      }
> >
> >
> >
> >
> > Thanks
>

Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR c++/83938)

Reply via email to