On Fri, Feb 27, 2026 at 5:12 PM Tomasz Kaminski <[email protected]> wrote:
> > > On Fri, Feb 27, 2026 at 4:48 PM Jonathan Wakely <[email protected]> > wrote: > >> On Wed, 07 Jan 2026 at 15:30 +0100, Tomasz Kamiński wrote: >> >The warning was produced by following seqeunce, given an >> istream_iterator<char> >> >it, such that *it will result in hitting EoF in it->_M_get(), and thus >> clearing >> >_M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() >> call >> >on null pointer dereference. This is however an false-positive, as in >> such sitution >> >it == istream_iteator() returns true, and the iterator should not be >> dereferenced >> >in first place. >> > >> >This patch addresses the above by clearing the _M_sbuf in operator++, >> instead of >> >_M_get(). This removes the need for making _M_sbuf mutable, and thus >> make the >> >implementation conforming with regards to C++11 [res.on.data.races] p3. >> > >> >This change should not or positive have performance impact on the usual >> iteration >> >> I think this should be: >> >> "This change should have zero or positive performance impact ..." >> >> >patterns, in form: >> > while (it != end) { process(*it); ++it; } >> >In case when it is end-of-stream iterator, the it != end returns in one >> call of >> >_M_sbuf->sgetc() both before and after the change. However we do not >> modify _M_sbuf >> >in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() >> with >> >_M_sbuf->snextc() in pre-increment, and extract the check again from *it >> to >> >++it. However, as _M_sbuf is now cleared during increment, so last it != >> end >> >check avoids _M_sbuf->sgetc() call to check against EoF. >> > >> >However, this change impact the behavior of the post-increment (*it++), >> as >> >we now load both current character (for return value) and next character >> (to >> >check against EoF). In consequence we call both sgetc() and snextc(), in >> contrast >> >to previous single sbumpc() call. >> > >> > PR libstdc++/105580 >> > >> >libstdc++-v3/ChangeLog: >> > >> > * include/bits/streambuf_iterator.h >> (istreambuf_iterator::_M_sbuf): >> > Remove mutable and adjust whitespace. >> > (istreambuf_iterator::_M_c): Adjust whitespace. >> > (istreambuf_iterator::operator++()): Clear _M_sbuf if next >> character >> > is EoF. >> > (istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to >> > load current character, and define in terms of __*this. >> >> This should be ++ not __ >> >> > (istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of >> EoF. >> > * testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using >> > multiple iterators to same rdbuf. >> >--- >> >This warning turned out to be false-positive (as explained in the commit >> >decription), so we can just add add pragrams to address that. But this >> >change seem to be beneficial anyway, as we are no longer causing >> >data-races. >> > >> >Tested on x86_64-linux. Locally tested with -Wnull-dereference, >> >and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc >> >disappears (there are some other warning present I haven't looked into >> >that yet). >> >> Can we add a new testsuite/24_iterators/istreambuf_iterator/105580.cc >> using exactly the testcase from the bug, with -Wnull-dereference in >> the dg-options? I don't think the additions to 2.cc cover it. I'd like >> us to have a test that gives the warning today and doesn't warn after >> the fix. >> > OK will do. > >> >> > .../include/bits/streambuf_iterator.h | 20 +++---- >> > .../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++ >> > 2 files changed, 66 insertions(+), 12 deletions(-) >> > >> >diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h >> b/libstdc++-v3/include/bits/streambuf_iterator.h >> >index 93d3dd24f93..095928ca4d8 100644 >> >--- a/libstdc++-v3/include/bits/streambuf_iterator.h >> >+++ b/libstdc++-v3/include/bits/streambuf_iterator.h >> >@@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> > // the "end of stream" iterator value. >> > // NB: This implementation assumes the "end of stream" value >> > // is EOF, or -1. >> >- mutable streambuf_type* _M_sbuf; >> >- int_type _M_c; >> >+ streambuf_type* _M_sbuf; >> >> I've been trying to figure out if this change (removing 'mutable') >> could be unsafe if we mix objects compiled with old and new GCC. >> >> In a TU compiled by an old GCC, _M_sbuf could change by changed when >> the iterator reaches EOF. In a TU compiled by the new GCC, could the >> compiler optimize based on the fact that if only const member >> functionss are used, the member can't change? >> >> I don't think it's a problem, because the new compiler has to assume >> that any non-inline code could already do a const_cast on the iterator >> and so call non-const members that way. The only way it can optimize >> by assuming _M_sbuf never, ever changes is if it can see the original >> definition of the iterator and it was defined as const. But in that >> case, nobody can ever call non-const members on it anyway. >> > I thought about it, and the only situation when removing the mutable could > have > any impact, is that you declare an iterator as a const object, so the > compiler can in > that case now can propagate _M_sbuf value. But in that case, there is no > way to > mutate _M_sbuf without UB. > > >> >> So I think this is safe. I would feel more comfortable if we left the >> 'mutable' for GCC 16 with a comment saying it's not needed now, and >> then remove it for GCC 17 early in stage 1. >> > Given that the warning is FP, and is emitted from system headers, do we > even > need to have fix for GCC16? Maybe I should simply add pragma to ignore > warnings > for now, and postpone the whole endeavor for GCC17? Or just leave the > warning as is. > is emitted from system headers anyway. > I also feel uneasy about touching istream iterator now. In the beginning, we prioritized the change because we thought this was a true bug, so I think a better approach now may be to add test emmiting and pragmas disabling the warning now (with backports) and land the changes to GCC17. > >> >+ int_type _M_c; >> > >> > public: >> > /// Construct end of input stream iterator. >> >@@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> > >> _M_message(__gnu_debug::__msg_inc_istreambuf) >> > ._M_iterator(*this)); >> > >> >- _M_sbuf->sbumpc(); >> >+ if (_S_is_eof(_M_sbuf->snextc())) >> >+ _M_sbuf = 0; >> >> Again, I'm considering old vs new compilers. What if one TU inlines >> the old definition of operator++ which doesn't set _M_sbuf to null, >> and makes a non-inline call to the new _M_get, which uses a definition >> from a new TU, which doesn't set _M_sbuf either? >> >> I think that means we will never set _M_sbuf = 0 but we still have the >> right behaviour, because comparing to the past-the-end iterator uses >> _M_at_eof() which will get a character from the streambuf, which will >> say we're at EOF. So it still works, it's just a bit slower because we >> have to go to the streambuf every time instead of just seeing that >> _M_sbuf is null. > > Again, I do not think that matters in practice, because "every time" > would be repeated comparision against end, after we determine that end > was reached. > > >> >> So I think this isn't an ABI change. I've been wondering whether we >> could be sure it's safe by adding __attribute__((__always_inline__)) >> to all the increment oeprators, and everything that directly or >> indirectly calls _M_get. But that's basically *everything*. We'd need >> to add it to equal() and _M_at_eof() and operator== etc. etc. >> >> If we did that, old TUs would continue to use their own definitions >> (via implicit instantiations) and new TUs would have force-inlined >> new definitions. But we could still get the behaviour above, if all >> the increments happen in the old TU and all the deferences happen in >> the new TU. So I don't think using always_inline really solves >> anything. And I don't think there's a real problem to solve anyway. >> >> > _M_c = traits_type::eof(); >> > return *this; >> > } >> >@@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> > istreambuf_iterator >> > operator++(int) >> > { >> >- __glibcxx_requires_cond(_M_sbuf && >> >- (!_S_is_eof(_M_c) || >> !_S_is_eof(_M_sbuf->sgetc())), >> >- >> _M_message(__gnu_debug::__msg_inc_istreambuf) >> >- ._M_iterator(*this)); >> >- >> > istreambuf_iterator __old = *this; >> >- __old._M_c = _M_sbuf->sbumpc(); >> >- _M_c = traits_type::eof(); >> >+ __old._M_c = _M_sbuf->sgetc(); >> >+ ++*this; >> > return __old; >> > } >> > >> >@@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> > _M_get() const >> > { >> > int_type __ret = _M_c; >> >- if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = >> _M_sbuf->sgetc())) >> >- _M_sbuf = 0; >> >+ if (_M_sbuf && _S_is_eof(__ret)) >> >> Would it make sense to use __builtin_expect here? >> >> if (__builtin_expect(_M_sbuf && _S_is_eof(__ret), 1)) >> >> We should only be calling _M_get() on dereferenceable iterators, so >> they should always have non-null _M_sbuf, and __ret is always eof >> except for the prvalue returned from the post-increment >> operator++(int). >> >> If that might pessimize process(*it++) then we could make it so the >> prediction is only on _M_sbuf: >> >> if (__builtin_expect(_M_sbuf, 1) && _S_is_eof(__ret)) >> >> Oh no ... because we call _M_get() to check if an iterator is >> past-the-end, which we do in operator==, so that happens on every >> iterator of a loop like while (it != eof_it) process(*it++); >> >> Drat. >> >> Comparing to the default_sentinel doesn't have that problem, because >> we only call _M_at_eof() for the iterator, not for both the iterator >> and the sentinel. That's an interesting data point I'd never >> considered before: it's better to use the default sentinel here, >> because istreambuf_iterator does more than just compare the data >> members of two iterators. >> >> We could maybe consider changing _M_at_eof() to do: >> >> return _M_sbuf && _S_is_eof(_M_get()); >> >> so that we can have the branch prediction in _M_get() and it would be >> correct, because we would only call it for operator* and when we >> already know _M_sbuf is not null. >> >> But I don't think we need to do that as part of this patch. >> > I tried, but I haven't seen any gains from it. What have actual benefit is > to > change the return type of postfix++ to proxy (as in your prototype), > eliminate writes to _M_c, and then adjust _M_get to: > if (!_S_is_eof(_M_c)) [[unlikely]] > return _M_c; > return _M_sbuf->sgetc() > The unlikely case of _M_c not being eof is someone explicitly converting > it++ to iterator, or getting in returned from older TU, compiled with older > standard, that returned ++it. > >> >+ __ret = _M_sbuf->sgetc(); >> > return __ret; >> > } >> > >> >diff --git >> a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> >index 78701d71cee..0318a677f35 100644 >> >--- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> >+++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> >@@ -109,8 +109,66 @@ void test02(void) >> > } >> > } >> > >> >+void >> >+test_empty() >> >+{ >> >+ std::istreambuf_iterator<char> null(0), end; >> >+ VERIFY( null == end ); >> >+ >> >+ std::istringstream ess; >> >+ // Thie specification hear seem to indicate, that such iterators >> >+ // are not end-of-stream iterators, as rdbuf pointer is nullptr, >> >+ // however we treat them as such, as otherwise we would produce >> >+ // a range containing single eof character. >> >> Could you mention LWG 2366 here please. I think the deviation from the >> standard that you describe is fixing the problem identified in 2366, >> but this patch makes the implementation simpler than what I proposed >> there. >> >> >+ std::istreambuf_iterator<char> it1(ess), it2(ess); >> >+ VERIFY( it1 == end ); >> >+ VERIFY( it2 == end ); >> >+} >> >+ >> >+void >> >+test_multi() >> >+{ >> >+ // C++98 (and later) define operator* in [istreambuf.iterator.ops] as: >> >+ // Returns: The character obtained via the streambuf member >> sbuf_->sgetc(). >> >+ // This nails down behavior of mulptiple iterators to same sequence. >> >> "multiple" >> >> >+ std::istringstream iss("abcd"); >> >+ std::istreambuf_iterator<char> it1(iss), it2(iss), end; >> >+ >> >+ VERIFY( it1 != end ); >> >+ VERIFY( it2 != end ); >> >+ VERIFY( *it1 == 'a' ); >> >+ VERIFY( *it2 == 'a' ); >> >+ ++it1; >> >+ >> >+ VERIFY( it1 != end ); >> >+ VERIFY( it2 != end ); >> >+ VERIFY( *it1 == 'b' ); >> >+ VERIFY( *it2 == 'b' ); >> >+ ++it2; >> >+ >> >+ VERIFY( it1 != end ); >> >+ VERIFY( it2 != end ); >> >+ VERIFY( *it1 == 'c' ); >> >+ VERIFY( *it2 == 'c' ); >> >+ ++it2; >> >+ >> >+ VERIFY( it1 != end ); >> >+ VERIFY( it2 != end ); >> >+ VERIFY( *it1 == 'd' ); >> >+ VERIFY( *it2 == 'd' ); >> >+ // second dereference >> >+ VERIFY( *it1 == 'd' ); >> >+ VERIFY( *it2 == 'd' ); >> >+ ++it1; >> >+ >> >+ VERIFY( it1 == end ); >> >+ VERIFY( it2 == end ); >> >+} >> >+ >> > int main() >> > { >> > test02(); >> >+ test_empty(); >> >+ test_multi(); >> > return 0; >> > } >> >-- >> >2.52.0 >> > >> > >> >>
