Hi,

1. The __len = (__len + 1) / 2; line is, as you pointed out, an infinite
loop when __len is 1 and the allocation keeps failing; it needs to be
modified to

    __len = (__len == 1) ? 0 : ((__len + 1) / 2);
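To make the termination argument concrete, here is a minimal standalone
model of the allocation-fallback loop of __get_temporary_buffer in
stl_tempbuf.h with that guard applied; the free-function name is mine,
not the library's:

    #include <cstddef>
    #include <new>
    #include <utility>

    // Simplified model of the fallback loop in stl_tempbuf.h.  On each
    // allocation failure the request shrinks to ceil(len/2), so after k
    // failures it is still ceil(original/2^k); the guard maps 1 to 0 so
    // the loop always terminates.
    template<typename _Tp>
    std::pair<_Tp*, std::ptrdiff_t>
    get_buffer_sketch(std::ptrdiff_t __len)
    {
      while (__len > 0)
        {
          _Tp* __tmp = static_cast<_Tp*>(::operator new(__len * sizeof(_Tp),
                                                        std::nothrow));
          if (__tmp != 0)
            return std::pair<_Tp*, std::ptrdiff_t>(__tmp, __len);
          __len = (__len == 1) ? 0 : ((__len + 1) / 2);
        }
      return std::pair<_Tp*, std::ptrdiff_t>(static_cast<_Tp*>(0), 0);
    }

With the plain (__len + 1) / 2, a failed 1-element request would be
retried forever, which is exactly the problem noted below.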
2. The coding gain was shown in PR c++/83938; I re-post the figures
here. The scenario merges [0, 4325376, 16777216): A is a sorted run of
4325376 integers and B one of 12451840, 16M integers in total. The rows
below are merge times for the current and the proposed method under a
given buffer capacity of 2^16, 2^17, ..., 2^21 in units of sizeof(int);
for example, 2^16 means sizeof(int)*64K bytes are given. (A rough
reconstruction of this benchmark is appended at the end of this
message.)

    buffer capacity      2^21      2^20      2^19      2^18      2^17      2^16
    current          0.471136  0.625695  0.767262  0.907461   1.04838   1.19508
    proposed         0.340845   0.48651  0.639139  0.770133  0.898454   1.04632

The proposed method is faster at every given buffer size.

3. As you suggested, the _TmpBuf __buf construction should be rewritten.

4. This change reflects the fact that the intuitive idea of splitting
from the larger part is wrong. For example, take sorted input arrays A
and B, where A has 8 integers and B has 24, and a temporary buffer with
capacity for 4 integers. Currently the algorithm splits from B, the
larger side: B is cut into B1 and B2 of 12 integers each, with the pivot
at the 13th integer of B. Suppose the corresponding bound in A falls at
its 6th integer; the split is then

    A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12

After the rotate we have two ranges to merge, [A1 = 5 | B1 = 12] and
[A2 = 3 | B2 = 12]. Great, [A2 = 3 | B2 = 12] can be merged with the
temporary buffer, since 3 <= 4. Sadly, [A1 = 5 | B1 = 12] CANNOT, since
5 > 4, so it has to be rotated, split and merged recursively all over
again.

If instead we always split from the smaller part, the rotate guarantees
that each of the two resulting ranges contains at most ceil(small/2)
elements of the smaller run. The temporary buffer is likewise allocated
starting from the size of the smaller run and downgraded to
ceil(small/2^k) after k failed allocations, so the allocated buffer is
guaranteed to be sufficient at recursion depth k. Splitting from the
larger part, as the example shows, can leave ranges at depth k that
still cannot use the buffered merge. As you know, a buffered merge is
far faster than the (split, rotate, and merge two sub-ranges) path
(PR c++/83938 gives the profiling figures), so this way should provide
the speedup.

Thanks.

On 24/01/2018 18:23, François Dumont wrote:

Hi

It sounds like a sensible change to make, but nothing beats figures. Do
you have any benchmark showing the importance of the gain? At least the
memory usage optimization is obvious.

On 19/01/2018 10:43, chang jc wrote:

The current std::inplace_merge() suffers from a performance problem:
inefficient logic under limited memory leads to a performance downgrade.
Please help to review it.

Index: include/bits/stl_algo.h
===================================================================
--- include/bits/stl_algo.h	(revision 256871)
+++ include/bits/stl_algo.h	(working copy)
@@ -2437,7 +2437,7 @@
       _BidirectionalIterator __second_cut = __middle;
       _Distance __len11 = 0;
       _Distance __len22 = 0;
-      if (__len1 > __len2)
+      if (__len1 < __len2)
 	{
 	  __len11 = __len1 / 2;
 	  std::advance(__first_cut, __len11);
@@ -2539,9 +2539,15 @@
       const _DistanceType __len1 = std::distance(__first, __middle);
       const _DistanceType __len2 = std::distance(__middle, __last);
 
+
       typedef _Temporary_buffer<_BidirectionalIterator, _ValueType> _TmpBuf;
-      _TmpBuf __buf(__first, __last);
-
+      _BidirectionalIterator __start, __end;
+      if (__len1 < __len2) {
+	__start = __first; __end = __middle;
+      } else {
+	__start = __middle; __end = __last;
+      }
+      _TmpBuf __buf(__start, __end);

Note another optimization: we could introduce a _Temporary_buffer<>
constructor in order to write

    _TmpBuf __buf(std::min(__len1, __len2), __first);

so that std::distance does not need to be called again.

       if (__buf.begin() == 0)
 	std::__merge_without_buffer
 	  (__first, __middle, __last, __len1, __len2, __comp);
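Regarding that constructor suggestion, a rough standalone sketch of the
intended effect follows; the function name and structure here are
illustrative only, not the proposed library code:

    #include <algorithm>
    #include <cstddef>
    #include <iterator>
    #include <memory>
    #include <utility>

    // Size the temporary buffer from std::min(__len1, __len2) up front,
    // so the chosen range never has to be measured again with
    // std::distance.  A length-taking _Temporary_buffer constructor
    // would let __inplace_merge do exactly this.
    template<typename _BidirIt>
    void
    buffer_sizing_sketch(_BidirIt __first, _BidirIt __middle, _BidirIt __last)
    {
      typedef typename std::iterator_traits<_BidirIt>::value_type
        _ValueType;
      typedef typename std::iterator_traits<_BidirIt>::difference_type
        _DistanceType;

      const _DistanceType __len1 = std::distance(__first, __middle);
      const _DistanceType __len2 = std::distance(__middle, __last);

      // One allocation request, sized by the smaller run.
      std::pair<_ValueType*, std::ptrdiff_t> __buf =
        std::get_temporary_buffer<_ValueType>(std::min(__len1, __len2));

      // ... buffered merge using [__buf.first, __buf.first + __buf.second) ...

      std::return_temporary_buffer(__buf.first);
    }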
Index: include/bits/stl_tempbuf.h
===================================================================
--- include/bits/stl_tempbuf.h	(revision 256871)
+++ include/bits/stl_tempbuf.h	(working copy)
@@ -95,7 +95,7 @@
 						  std::nothrow));
 	  if (__tmp != 0)
 	    return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
-	  __len /= 2;
+	  __len = (__len + 1) / 2;
 	}
       return std::pair<_Tp*, ptrdiff_t>(static_cast<_Tp*>(0), 0);
     }

This part is problematic: if __len is 1 and the allocation fails, it
will loop forever. It doesn't seem really necessary for your patch.

2018-01-20 4:05 GMT+08:00 Jason Merrill <ja...@redhat.com>:
> This is a libstdc++ bug and patch, not the C++ front end. So I'm
> adding the libstdc++ list to CC.
>
> On Fri, Jan 19, 2018 at 3:02 AM, chang jc <r97922...@gmail.com> wrote:
> > The current std::inplace_merge() suffers from a performance problem:
> > inefficient logic under limited memory leads to a performance
> > downgrade.
> >
> > Please help to review it.
> >
> > [patch identical to the one quoted above]
> >
> > Thanks
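P.S. A rough reconstruction of the benchmark behind the table in point 2
is below. Only the run sizes come from this thread; the data generation
and timing harness are my assumptions, and reproducing the capped buffer
sizes (2^16 ... 2^21 ints) additionally requires making the nothrow
allocation fail above the cap (for example with an interposed operator
new), which is not shown:

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main()
    {
      // Sizes from the thread: A = 4325376 ints, B = 12451840 ints,
      // 16M ints in total.
      const std::ptrdiff_t n1 = 4325376;
      const std::ptrdiff_t n2 = 12451840;
      std::vector<int> v(static_cast<std::size_t>(n1 + n2));

      // Two sorted runs that interleave (evens, then odds), so the
      // merge has to move nearly every element.
      for (std::ptrdiff_t i = 0; i < n1; ++i)
        v[static_cast<std::size_t>(i)] = static_cast<int>(2 * i);
      for (std::ptrdiff_t i = 0; i < n2; ++i)
        v[static_cast<std::size_t>(n1 + i)] = static_cast<int>(2 * i + 1);

      std::chrono::steady_clock::time_point t0 =
        std::chrono::steady_clock::now();
      std::inplace_merge(v.begin(), v.begin() + n1, v.end());
      std::chrono::steady_clock::time_point t1 =
        std::chrono::steady_clock::now();

      std::printf("inplace_merge: %.6f s\n",
                  std::chrono::duration<double>(t1 - t0).count());
    }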