On Fri, Feb 27, 2026 at 4:48 PM Jonathan Wakely <[email protected]> wrote:
> On Wed, 07 Jan 2026 at 15:30 +0100, Tomasz Kamiński wrote:
> >The warning was produced by following seqeunce, given an
> istream_iterator<char>
> >it, such that *it will result in hitting EoF in it->_M_get(), and thus
> clearing
> >_M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc()
> call
> >on null pointer dereference. This is however an false-positive, as in
> such sitution
> >it == istream_iteator() returns true, and the iterator should not be
> dereferenced
> >in first place.
> >
> >This patch addresses the above by clearing the _M_sbuf in operator++,
> instead of
> >_M_get(). This removes the need for making _M_sbuf mutable, and thus make
> the
> >implementation conforming with regards to C++11 [res.on.data.races] p3.
> >
> >This change should not or positive have performance impact on the usual
> iteration
>
> I think this should be:
>
> "This change should have zero or positive performance impact ..."
>
> >patterns, in form:
> > while (it != end) { process(*it); ++it; }
> >In case when it is end-of-stream iterator, the it != end returns in one
> call of
> >_M_sbuf->sgetc() both before and after the change. However we do not
> modify _M_sbuf
> >in this case. For non-empty range, we replace call to _M_sbuf->sbumpc()
> with
> >_M_sbuf->snextc() in pre-increment, and extract the check again from *it
> to
> >++it. However, as _M_sbuf is now cleared during increment, so last it !=
> end
> >check avoids _M_sbuf->sgetc() call to check against EoF.
> >
> >However, this change impact the behavior of the post-increment (*it++), as
> >we now load both current character (for return value) and next character
> (to
> >check against EoF). In consequence we call both sgetc() and snextc(), in
> contrast
> >to previous single sbumpc() call.
> >
> > PR libstdc++/105580
> >
> >libstdc++-v3/ChangeLog:
> >
> > * include/bits/streambuf_iterator.h (istreambuf_iterator::_M_sbuf):
> > Remove mutable and adjust whitespace.
> > (istreambuf_iterator::_M_c): Adjust whitespace.
> > (istreambuf_iterator::operator++()): Clear _M_sbuf if next
> character
> > is EoF.
> > (istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to
> > load current character, and define in terms of __*this.
>
> This should be ++ not __
>
> > (istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of
> EoF.
> > * testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using
> > multiple iterators to same rdbuf.
> >---
> >This warning turned out to be false-positive (as explained in the commit
> >decription), so we can just add add pragrams to address that. But this
> >change seem to be beneficial anyway, as we are no longer causing
> >data-races.
> >
> >Tested on x86_64-linux. Locally tested with -Wnull-dereference,
> >and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc
> >disappears (there are some other warning present I haven't looked into
> >that yet).
>
> Can we add a new testsuite/24_iterators/istreambuf_iterator/105580.cc
> using exactly the testcase from the bug, with -Wnull-dereference in
> the dg-options? I don't think the additions to 2.cc cover it. I'd like
> us to have a test that gives the warning today and doesn't warn after
> the fix.
>
OK will do.
>
> > .../include/bits/streambuf_iterator.h | 20 +++----
> > .../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++
> > 2 files changed, 66 insertions(+), 12 deletions(-)
> >
> >diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h
> b/libstdc++-v3/include/bits/streambuf_iterator.h
> >index 93d3dd24f93..095928ca4d8 100644
> >--- a/libstdc++-v3/include/bits/streambuf_iterator.h
> >+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
> >@@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > // the "end of stream" iterator value.
> > // NB: This implementation assumes the "end of stream" value
> > // is EOF, or -1.
> >- mutable streambuf_type* _M_sbuf;
> >- int_type _M_c;
> >+ streambuf_type* _M_sbuf;
>
> I've been trying to figure out if this change (removing 'mutable')
> could be unsafe if we mix objects compiled with old and new GCC.
>
> In a TU compiled by an old GCC, _M_sbuf could change by changed when
> the iterator reaches EOF. In a TU compiled by the new GCC, could the
> compiler optimize based on the fact that if only const member
> functionss are used, the member can't change?
>
> I don't think it's a problem, because the new compiler has to assume
> that any non-inline code could already do a const_cast on the iterator
> and so call non-const members that way. The only way it can optimize
> by assuming _M_sbuf never, ever changes is if it can see the original
> definition of the iterator and it was defined as const. But in that
> case, nobody can ever call non-const members on it anyway.
>
I thought about it, and the only situation when removing the mutable could
have
any impact, is that you declare an iterator as a const object, so the
compiler can in
that case now can propagate _M_sbuf value. But in that case, there is no
way to
mutate _M_sbuf without UB.
>
> So I think this is safe. I would feel more comfortable if we left the
> 'mutable' for GCC 16 with a comment saying it's not needed now, and
> then remove it for GCC 17 early in stage 1.
>
Given that the warning is FP, and is emitted from system headers, do we even
need to have fix for GCC16? Maybe I should simply add pragma to ignore
warnings
for now, and postpone the whole endeavor for GCC17? Or just leave the
warning as is.
is emitted from system headers anyway.
>
> >+ int_type _M_c;
> >
> > public:
> > /// Construct end of input stream iterator.
> >@@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >
> _M_message(__gnu_debug::__msg_inc_istreambuf)
> > ._M_iterator(*this));
> >
> >- _M_sbuf->sbumpc();
> >+ if (_S_is_eof(_M_sbuf->snextc()))
> >+ _M_sbuf = 0;
>
> Again, I'm considering old vs new compilers. What if one TU inlines
> the old definition of operator++ which doesn't set _M_sbuf to null,
> and makes a non-inline call to the new _M_get, which uses a definition
> from a new TU, which doesn't set _M_sbuf either?
>
> I think that means we will never set _M_sbuf = 0 but we still have the
> right behaviour, because comparing to the past-the-end iterator uses
> _M_at_eof() which will get a character from the streambuf, which will
> say we're at EOF. So it still works, it's just a bit slower because we
> have to go to the streambuf every time instead of just seeing that
> _M_sbuf is null.
Again, I do not think that matters in practice, because "every time"
would be repeated comparision against end, after we determine that end
was reached.
>
> So I think this isn't an ABI change. I've been wondering whether we
> could be sure it's safe by adding __attribute__((__always_inline__))
> to all the increment oeprators, and everything that directly or
> indirectly calls _M_get. But that's basically *everything*. We'd need
> to add it to equal() and _M_at_eof() and operator== etc. etc.
>
> If we did that, old TUs would continue to use their own definitions
> (via implicit instantiations) and new TUs would have force-inlined
> new definitions. But we could still get the behaviour above, if all
> the increments happen in the old TU and all the deferences happen in
> the new TU. So I don't think using always_inline really solves
> anything. And I don't think there's a real problem to solve anyway.
>
> > _M_c = traits_type::eof();
> > return *this;
> > }
> >@@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > istreambuf_iterator
> > operator++(int)
> > {
> >- __glibcxx_requires_cond(_M_sbuf &&
> >- (!_S_is_eof(_M_c) ||
> !_S_is_eof(_M_sbuf->sgetc())),
> >-
> _M_message(__gnu_debug::__msg_inc_istreambuf)
> >- ._M_iterator(*this));
> >-
> > istreambuf_iterator __old = *this;
> >- __old._M_c = _M_sbuf->sbumpc();
> >- _M_c = traits_type::eof();
> >+ __old._M_c = _M_sbuf->sgetc();
> >+ ++*this;
> > return __old;
> > }
> >
> >@@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > _M_get() const
> > {
> > int_type __ret = _M_c;
> >- if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret =
> _M_sbuf->sgetc()))
> >- _M_sbuf = 0;
> >+ if (_M_sbuf && _S_is_eof(__ret))
>
> Would it make sense to use __builtin_expect here?
>
> if (__builtin_expect(_M_sbuf && _S_is_eof(__ret), 1))
>
> We should only be calling _M_get() on dereferenceable iterators, so
> they should always have non-null _M_sbuf, and __ret is always eof
> except for the prvalue returned from the post-increment
> operator++(int).
>
> If that might pessimize process(*it++) then we could make it so the
> prediction is only on _M_sbuf:
>
> if (__builtin_expect(_M_sbuf, 1) && _S_is_eof(__ret))
>
> Oh no ... because we call _M_get() to check if an iterator is
> past-the-end, which we do in operator==, so that happens on every
> iterator of a loop like while (it != eof_it) process(*it++);
>
> Drat.
>
> Comparing to the default_sentinel doesn't have that problem, because
> we only call _M_at_eof() for the iterator, not for both the iterator
> and the sentinel. That's an interesting data point I'd never
> considered before: it's better to use the default sentinel here,
> because istreambuf_iterator does more than just compare the data
> members of two iterators.
>
> We could maybe consider changing _M_at_eof() to do:
>
> return _M_sbuf && _S_is_eof(_M_get());
>
> so that we can have the branch prediction in _M_get() and it would be
> correct, because we would only call it for operator* and when we
> already know _M_sbuf is not null.
>
> But I don't think we need to do that as part of this patch.
>
I tried, but I haven't seen any gains from it. What have actual benefit is
to
change the return type of postfix++ to proxy (as in your prototype),
eliminate writes to _M_c, and then adjust _M_get to:
if (!_S_is_eof(_M_c)) [[unlikely]]
return _M_c;
return _M_sbuf->sgetc()
The unlikely case of _M_c not being eof is someone explicitly converting
it++ to iterator, or getting in returned from older TU, compiled with older
standard, that returned ++it.
> >+ __ret = _M_sbuf->sgetc();
> > return __ret;
> > }
> >
> >diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
> b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
> >index 78701d71cee..0318a677f35 100644
> >--- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
> >+++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
> >@@ -109,8 +109,66 @@ void test02(void)
> > }
> > }
> >
> >+void
> >+test_empty()
> >+{
> >+ std::istreambuf_iterator<char> null(0), end;
> >+ VERIFY( null == end );
> >+
> >+ std::istringstream ess;
> >+ // Thie specification hear seem to indicate, that such iterators
> >+ // are not end-of-stream iterators, as rdbuf pointer is nullptr,
> >+ // however we treat them as such, as otherwise we would produce
> >+ // a range containing single eof character.
>
> Could you mention LWG 2366 here please. I think the deviation from the
> standard that you describe is fixing the problem identified in 2366,
> but this patch makes the implementation simpler than what I proposed
> there.
>
> >+ std::istreambuf_iterator<char> it1(ess), it2(ess);
> >+ VERIFY( it1 == end );
> >+ VERIFY( it2 == end );
> >+}
> >+
> >+void
> >+test_multi()
> >+{
> >+ // C++98 (and later) define operator* in [istreambuf.iterator.ops] as:
> >+ // Returns: The character obtained via the streambuf member
> sbuf_->sgetc().
> >+ // This nails down behavior of mulptiple iterators to same sequence.
>
> "multiple"
>
> >+ std::istringstream iss("abcd");
> >+ std::istreambuf_iterator<char> it1(iss), it2(iss), end;
> >+
> >+ VERIFY( it1 != end );
> >+ VERIFY( it2 != end );
> >+ VERIFY( *it1 == 'a' );
> >+ VERIFY( *it2 == 'a' );
> >+ ++it1;
> >+
> >+ VERIFY( it1 != end );
> >+ VERIFY( it2 != end );
> >+ VERIFY( *it1 == 'b' );
> >+ VERIFY( *it2 == 'b' );
> >+ ++it2;
> >+
> >+ VERIFY( it1 != end );
> >+ VERIFY( it2 != end );
> >+ VERIFY( *it1 == 'c' );
> >+ VERIFY( *it2 == 'c' );
> >+ ++it2;
> >+
> >+ VERIFY( it1 != end );
> >+ VERIFY( it2 != end );
> >+ VERIFY( *it1 == 'd' );
> >+ VERIFY( *it2 == 'd' );
> >+ // second dereference
> >+ VERIFY( *it1 == 'd' );
> >+ VERIFY( *it2 == 'd' );
> >+ ++it1;
> >+
> >+ VERIFY( it1 == end );
> >+ VERIFY( it2 == end );
> >+}
> >+
> > int main()
> > {
> > test02();
> >+ test_empty();
> >+ test_multi();
> > return 0;
> > }
> >--
> >2.52.0
> >
> >
>
>