[issue45617] sys.stdin does not iterate correctly on '\r' line separator

2021-10-26 Thread Kelly Brazil


New submission from Kelly Brazil:

When iterating over sys.stdin line by line, '\r\n' and '\n' are handled 
correctly, but a bare '\r' is not treated as a line separator, although the 
documentation suggests it should be supported.

Example code:
import sys

for line in sys.stdin:
    print(repr(line))

Results in Python 3.8.9:
$ echo -e 'line1\nline2\nline3' | python3 linetest.py 
'line1\n'
'line2\n'
'line3\n'

$ echo -e 'line1\r\nline2\r\nline3' | python3 linetest.py 
'line1\r\n'
'line2\r\n'
'line3\n'

$ echo -e 'line1\rline2\rline3' | python3 linetest.py 
'line1\rline2\rline3\n'

--
messages: 405057
nosy: kbrazil
priority: normal
severity: normal
status: open
title: sys.stdin does not iterate correctly on '\r' line separator
type: behavior
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue45617>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



2021-10-26 Thread Kelly Brazil


Change by Kelly Brazil:


--
components: +Library (Lib)

2021-10-27 Thread Kelly Brazil


Kelly Brazil added the comment:

'\r' support is implicitly documented under the sys.stdin section[0]:

"These streams are regular text files like those returned by the open() 
function. Their parameters are chosen as follows..."

The open() documentation[1] linked from there says:

"newline controls how universal newlines mode works (it only applies to text 
mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

When reading input from the stream, if newline is None, universal newlines mode 
is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are 
translated into '\n' before being returned to the caller. If it is '', 
universal newlines mode is enabled, but line endings are returned to the caller 
untranslated. If it has any of the other legal values, input lines are only 
terminated by the given string, and the line ending is returned to the caller 
untranslated."
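
The three read modes quoted above can be checked without a pipe by wrapping an 
in-memory byte buffer. This is only a sketch: read_lines is a hypothetical 
helper, and io.BytesIO stands in for the real standard input.

```python
import io


def read_lines(data, newline):
    """Decode data through a TextIOWrapper and return the lines it yields.

    read_lines is a hypothetical helper used for illustration only.
    """
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=newline)
    return list(stream)


data = b"line1\rline2\rline3\n"

# newline=None: universal newlines, every ending is translated to '\n'.
print(read_lines(data, None))

# newline='': universal newlines, endings are returned untranslated.
print(read_lines(data, ""))

# newline='\n': only '\n' ends a line, so each '\r' stays inside the line.
print(read_lines(data, "\n"))
```

The last call gives the same single line that sys.stdin produced in the 
original report, which would be consistent with a stream created with 
newline='\n'. That is an inference from the output, not something verified 
against the interpreter startup code.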

When inspecting a newly created sys.stdin object I see that it is an instance 
of _io.TextIOWrapper and that its newlines attribute is None:

>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>
>>> print(sys.stdin.newlines)
None

Note: an oddity here is that the attribute name is newlines instead of newline.
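
On the naming: as far as I can tell, newlines is a separate read-only attribute 
that reports which line endings have been seen so far, rather than echoing the 
newline argument. If so, None on a fresh stream does not by itself show that 
universal newlines mode is enabled. A small sketch on CPython, using an 
in-memory buffer as a stand-in for stdin:

```python
import io

# Universal newlines mode: the attribute starts as None and fills in on read.
universal = io.TextIOWrapper(io.BytesIO(b"a\rb\n"), encoding="utf-8", newline=None)
before = universal.newlines
universal.read()
after = universal.newlines
print(before, after)

# With newline='\n' nothing is translated, and on CPython the attribute
# stays None even after '\r' and '\n' have both been read.
fixed = io.TextIOWrapper(io.BytesIO(b"a\rb\n"), encoding="utf-8", newline="\n")
fixed.read()
print(fixed.newlines)
```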

Interestingly, when opening STDIN directly it seems to work fine:

import sys
for line in open(0, sys.stdin.mode):
    print(repr(line))

Result:
$ echo -e 'line1\rline2\rline3' | python3 linetest.py 
'line1\n'
'line2\n'
'line3\n'

So perhaps the sys.stdin documentation should be updated to mention this 
exception, or the behavior should be treated as a bug and made consistent 
with open()?

[0] https://docs.python.org/3/library/sys.html#sys.stdin
[1] https://docs.python.org/3/library/functions.html#open

--

2021-10-27 Thread Kelly Brazil


Kelly Brazil added the comment:

Also, I believe this docstring is inherited, but it is another place where 
'\r' appears to be documented as working with sys.stdin:

>>> print(sys.stdin.__doc__)
Character and line based layer over a BufferedIOBase object, buffer.

encoding gives the name of the encoding that the stream will be
decoded or encoded with. It defaults to locale.getpreferredencoding(False).

errors determines the strictness of encoding and decoding (see
help(codecs.Codec) or the documentation for codecs.register) and
defaults to "strict".

newline controls how line endings are handled. It can be None, '',
'\n', '\r', and '\r\n'.  It works as follows:

* On input, if newline is None, universal newlines mode is
  enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
  these are translated into '\n' before being returned to the
  caller. If it is '', universal newline mode is enabled, but line
  endings are returned to the caller untranslated. If it has any of
  the other legal values, input lines are only terminated by the given
  string, and the line ending is returned to the caller untranslated.

* On output, if newline is None, any '\n' characters written are
  translated to the system default line separator, os.linesep. If
  newline is '' or '\n', no translation takes place. If newline is any
  of the other legal values, any '\n' characters written are translated
  to the given string.

If line_buffering is True, a call to flush is implied when a call to
write contains a newline character.

I understand that sys.stdin is slightly different from an actual file opened 
in text mode, but the documentation suggests that it works in much the same 
way. In practice, though, the behavior differs.

--

2021-11-01 Thread Kelly Brazil


Kelly Brazil added the comment:

Are there other scenarios where line-splitting behavior deviates from the 
default of newline=None (Universal Newlines)? It seems sys.stdin (on 
non-Windows OS) is the outlier.

All of these use Universal Newlines:
- sys.stdin (on Windows)
- open(0, 'r')
- str.splitlines()

For the sake of consistency, it seems that sys.stdin on non-Windows should use 
the Universal Newlines behavior. Since the difference in behavior is not 
documented, users are likely to be confused by it.
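
One caveat on the comparison with str.splitlines(): it splits on a wider set 
of boundaries than universal newlines mode does (form feed, for example), so 
the two are close but not identical. A quick sketch of the difference, using 
an in-memory buffer in place of a real stream:

```python
import io

text = "a\rb\x0cc"  # a bare '\r' followed by a form feed

# str.splitlines() treats both '\r' and '\x0c' as line boundaries.
split_result = text.splitlines()

# Universal newlines mode only recognises '\n', '\r' and '\r\n'.
stream = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")),
                          encoding="utf-8", newline=None)
stream_result = list(stream)

print(split_result)
print(stream_result)
```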

Also, unless there is a technical reason for the difference, I'm not sure what 
the rationale would be to keep the behavior different. All types of data can be 
piped to STDIN on non-Windows systems. Just because the pipeline is happening 
on unix/linux doesn't mean the data inside conforms to \n newlines.

I believe Universal Newlines should be the default (as with the other 
scenarios) and the user should be able to decide if another newline option 
should be configured.
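
For anyone who needs this today, one possible way to opt in is 
TextIOWrapper.reconfigure() (available since Python 3.7), called before 
anything has been read from the stream. The sketch below uses an in-memory 
buffer opened with newline='\n' as an assumed stand-in for sys.stdin on 
non-Windows systems. On the real stream the equivalent call would be 
sys.stdin.reconfigure(newline=None).

```python
import io

# Stand-in for sys.stdin; the newline='\n' setting is an assumption based on
# the behavior observed in this issue.
stream = io.TextIOWrapper(io.BytesIO(b"line1\rline2\rline3\n"),
                          encoding="utf-8", newline="\n")

# Switch to universal newlines before the first read.
stream.reconfigure(newline=None)

lines = list(stream)
print(lines)
```

This keeps the change local to the script that needs it, at the cost of having 
to remember the call in every program that reads '\r'-separated input.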

--
