On Wed, 07 Jan 2026 at 15:30 +0100, Tomasz Kamiński wrote:
The warning was produced by following seqeunce, given an istream_iterator<char>
it, such that *it will result in hitting EoF in it->_M_get(), and thus clearing
_M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() call
on null pointer dereference. This is however an false-positive, as in such
sitution
it == istream_iteator() returns true, and the iterator should not be
dereferenced
in first place.
This patch addresses the above by clearing the _M_sbuf in operator++, instead of
_M_get(). This removes the need for making _M_sbuf mutable, and thus make the
implementation conforming with regards to C++11 [res.on.data.races] p3.
This change should not or positive have performance impact on the usual
iteration
I think this should be:
"This change should have zero or positive performance impact ..."
patterns, in form:
while (it != end) { process(*it); ++it; }
In case when it is end-of-stream iterator, the it != end returns in one call of
_M_sbuf->sgetc() both before and after the change. However we do not modify
_M_sbuf
in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() with
_M_sbuf->snextc() in pre-increment, and extract the check again from *it to
++it. However, as _M_sbuf is now cleared during increment, so last it != end
check avoids _M_sbuf->sgetc() call to check against EoF.
However, this change impact the behavior of the post-increment (*it++), as
we now load both current character (for return value) and next character (to
check against EoF). In consequence we call both sgetc() and snextc(), in
contrast
to previous single sbumpc() call.
PR libstdc++/105580
libstdc++-v3/ChangeLog:
* include/bits/streambuf_iterator.h (istreambuf_iterator::_M_sbuf):
Remove mutable and adjust whitespace.
(istreambuf_iterator::_M_c): Adjust whitespace.
(istreambuf_iterator::operator++()): Clear _M_sbuf if next character
is EoF.
(istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to
load current character, and define in terms of __*this.
This should be ++ not __
(istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of EoF.
* testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using
multiple iterators to same rdbuf.
---
This warning turned out to be false-positive (as explained in the commit
decription), so we can just add add pragrams to address that. But this
change seem to be beneficial anyway, as we are no longer causing
data-races.
Tested on x86_64-linux. Locally tested with -Wnull-dereference,
and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc
disappears (there are some other warning present I haven't looked into
that yet).
Can we add a new testsuite/24_iterators/istreambuf_iterator/105580.cc
using exactly the testcase from the bug, with -Wnull-dereference in
the dg-options? I don't think the additions to 2.cc cover it. I'd like
us to have a test that gives the warning today and doesn't warn after
the fix.
.../include/bits/streambuf_iterator.h | 20 +++----
.../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++
2 files changed, 66 insertions(+), 12 deletions(-)
diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h
b/libstdc++-v3/include/bits/streambuf_iterator.h
index 93d3dd24f93..095928ca4d8 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// the "end of stream" iterator value.
// NB: This implementation assumes the "end of stream" value
// is EOF, or -1.
- mutable streambuf_type* _M_sbuf;
- int_type _M_c;
+ streambuf_type* _M_sbuf;
I've been trying to figure out if this change (removing 'mutable')
could be unsafe if we mix objects compiled with old and new GCC.
In a TU compiled by an old GCC, _M_sbuf could change by changed when
the iterator reaches EOF. In a TU compiled by the new GCC, could the
compiler optimize based on the fact that if only const member
functionss are used, the member can't change?
I don't think it's a problem, because the new compiler has to assume
that any non-inline code could already do a const_cast on the iterator
and so call non-const members that way. The only way it can optimize
by assuming _M_sbuf never, ever changes is if it can see the original
definition of the iterator and it was defined as const. But in that
case, nobody can ever call non-const members on it anyway.
So I think this is safe. I would feel more comfortable if we left the
'mutable' for GCC 16 with a comment saying it's not needed now, and
then remove it for GCC 17 early in stage 1.
+ int_type _M_c;
public:
/// Construct end of input stream iterator.
@@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_message(__gnu_debug::__msg_inc_istreambuf)
._M_iterator(*this));
- _M_sbuf->sbumpc();
+ if (_S_is_eof(_M_sbuf->snextc()))
+ _M_sbuf = 0;
Again, I'm considering old vs new compilers. What if one TU inlines
the old definition of operator++ which doesn't set _M_sbuf to null,
and makes a non-inline call to the new _M_get, which uses a definition
from a new TU, which doesn't set _M_sbuf either?
I think that means we will never set _M_sbuf = 0 but we still have the
right behaviour, because comparing to the past-the-end iterator uses
_M_at_eof() which will get a character from the streambuf, which will
say we're at EOF. So it still works, it's just a bit slower because we
have to go to the streambuf every time instead of just seeing that
_M_sbuf is null.
So I think this isn't an ABI change. I've been wondering whether we
could be sure it's safe by adding __attribute__((__always_inline__))
to all the increment oeprators, and everything that directly or
indirectly calls _M_get. But that's basically *everything*. We'd need
to add it to equal() and _M_at_eof() and operator== etc. etc.
If we did that, old TUs would continue to use their own definitions
(via implicit instantiations) and new TUs would have force-inlined
new definitions. But we could still get the behaviour above, if all
the increments happen in the old TU and all the deferences happen in
the new TU. So I don't think using always_inline really solves
anything. And I don't think there's a real problem to solve anyway.
_M_c = traits_type::eof();
return *this;
}
@@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
istreambuf_iterator
operator++(int)
{
- __glibcxx_requires_cond(_M_sbuf &&
- (!_S_is_eof(_M_c) ||
!_S_is_eof(_M_sbuf->sgetc())),
- _M_message(__gnu_debug::__msg_inc_istreambuf)
- ._M_iterator(*this));
-
istreambuf_iterator __old = *this;
- __old._M_c = _M_sbuf->sbumpc();
- _M_c = traits_type::eof();
+ __old._M_c = _M_sbuf->sgetc();
+ ++*this;
return __old;
}
@@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_get() const
{
int_type __ret = _M_c;
- if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = _M_sbuf->sgetc()))
- _M_sbuf = 0;
+ if (_M_sbuf && _S_is_eof(__ret))
Would it make sense to use __builtin_expect here?
if (__builtin_expect(_M_sbuf && _S_is_eof(__ret), 1))
We should only be calling _M_get() on dereferenceable iterators, so
they should always have non-null _M_sbuf, and __ret is always eof
except for the prvalue returned from the post-increment
operator++(int).
If that might pessimize process(*it++) then we could make it so the
prediction is only on _M_sbuf:
if (__builtin_expect(_M_sbuf, 1) && _S_is_eof(__ret))
Oh no ... because we call _M_get() to check if an iterator is
past-the-end, which we do in operator==, so that happens on every
iterator of a loop like while (it != eof_it) process(*it++);
Drat.
Comparing to the default_sentinel doesn't have that problem, because
we only call _M_at_eof() for the iterator, not for both the iterator
and the sentinel. That's an interesting data point I'd never
considered before: it's better to use the default sentinel here,
because istreambuf_iterator does more than just compare the data
members of two iterators.
We could maybe consider changing _M_at_eof() to do:
return _M_sbuf && _S_is_eof(_M_get());
so that we can have the branch prediction in _M_get() and it would be
correct, because we would only call it for operator* and when we
already know _M_sbuf is not null.
But I don't think we need to do that as part of this patch.
+ __ret = _M_sbuf->sgetc();
return __ret;
}
diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
index 78701d71cee..0318a677f35 100644
--- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
+++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
@@ -109,8 +109,66 @@ void test02(void)
}
}
+void
+test_empty()
+{
+ std::istreambuf_iterator<char> null(0), end;
+ VERIFY( null == end );
+
+ std::istringstream ess;
+ // Thie specification hear seem to indicate, that such iterators
+ // are not end-of-stream iterators, as rdbuf pointer is nullptr,
+ // however we treat them as such, as otherwise we would produce
+ // a range containing single eof character.
Could you mention LWG 2366 here please. I think the deviation from the
standard that you describe is fixing the problem identified in 2366,
but this patch makes the implementation simpler than what I proposed
there.
+ std::istreambuf_iterator<char> it1(ess), it2(ess);
+ VERIFY( it1 == end );
+ VERIFY( it2 == end );
+}
+
+void
+test_multi()
+{
+ // C++98 (and later) define operator* in [istreambuf.iterator.ops] as:
+ // Returns: The character obtained via the streambuf member sbuf_->sgetc().
+ // This nails down behavior of mulptiple iterators to same sequence.
"multiple"
+ std::istringstream iss("abcd");
+ std::istreambuf_iterator<char> it1(iss), it2(iss), end;
+
+ VERIFY( it1 != end );
+ VERIFY( it2 != end );
+ VERIFY( *it1 == 'a' );
+ VERIFY( *it2 == 'a' );
+ ++it1;
+
+ VERIFY( it1 != end );
+ VERIFY( it2 != end );
+ VERIFY( *it1 == 'b' );
+ VERIFY( *it2 == 'b' );
+ ++it2;
+
+ VERIFY( it1 != end );
+ VERIFY( it2 != end );
+ VERIFY( *it1 == 'c' );
+ VERIFY( *it2 == 'c' );
+ ++it2;
+
+ VERIFY( it1 != end );
+ VERIFY( it2 != end );
+ VERIFY( *it1 == 'd' );
+ VERIFY( *it2 == 'd' );
+ // second dereference
+ VERIFY( *it1 == 'd' );
+ VERIFY( *it2 == 'd' );
+ ++it1;
+
+ VERIFY( it1 == end );
+ VERIFY( it2 == end );
+}
+
int main()
{
test02();
+ test_empty();
+ test_multi();
return 0;
}
--
2.52.0