On Mon, Oct 20, 2014 at 4:18 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Dan Stromberg <drsali...@gmail.com>:
>> ...then everything acts line buffered, or perhaps even character
>> buffered [...]
>>
>> That, or we're using two different versions of netcat (there are at
>> least two available).
>
> Let's unconfuse the issue a bit. I'll take line buffering, netcat and
> the OS out of the picture.
>
> Here's a character generator (test.sh):
> ========================================================================
> while : ; do
>     echo -n x
>     sleep 1
> done
> ========================================================================
>
> and here's a character sink (test.py):
> ========================================================================
> import sys
> while True:
>     c = sys.stdin.read(1)
>     if not c:
>         break
>     print(ord(c[0]))
> ========================================================================
>
> Then, I run:
> ========================================================================
> $ bash ./test.sh | python3 ./test.py
> 120
> 120
> 120
> 120
> ========================================================================
>
> The lines are output at one-second intervals.
>
> That demonstrates that sys.stdin.read(1) does not block for more than
> one character. IOW, there is no buffering whatsoever.

Aren't character-buffered and unbuffered synonymous?

Often with TCP protocols, line buffered is preferred to character
buffered, both for performance and for simplicity: it doesn't suffer
from tinygrams (as much), and telnet becomes a useful test client.
Also, it's a straightforward way of framing your data, to avoid
getting messed up by Nagle or fragmentation.

One might find http://stromberg.dnsalias.org/~strombrg/bufsock.html
worth a glance. It's buffered, but it keeps things framed, and
doesn't fall prey to tinygrams nearly as much as character buffering.

> If I change the sink a bit: "c = sys.stdin.read(5)", I get the same
> output but at five-second intervals indicating that sys.stdin.read()
> calls the underlying os.read() function five times before returning. In
> fact, that conclusion is made explicit by running:
>
> ========================================================================
> $ bash ./test.sh | strace python3 ./test.py
> ...
> read(0, "x", 4096) = 1
> read(0, "x", 4096) = 1
> read(0, "x", 4096) = 1
> read(0, "x", 4096) = 1
> read(0, "x", 4096) = 1
> fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7f3143bab000
> write(1, "120\n", 4120
> ) = 4
> ...
> ========================================================================

This is tremendously inefficient. It demands a context switch for
every character.

> If I modify test.py to call os.read():
> ========================================================================
> import os
> while True:
>     c = os.read(0, 5)
>     if not c:
>         break
>     print(ord(c[0]))
> ========================================================================
>
> The output is again printed at one-second intervals: no buffering.
>
> Thus, we are back at my suggestion: use os.read() if you don't want
> Python to buffer stdin for you.

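For anyone who wants to reproduce that without bash or strace, here's
a rough sketch. It assumes a pipe from os.pipe() is a fair stand-in
for the shell pipeline above; the variable names are mine. It shows
os.read() returning as soon as anything is available, while a file
object's read(5) keeps going until it has five bytes or sees EOF.

```python
import os

# Assumption: an os.pipe() pair stands in for "bash ./test.sh | python3 ...".
r, w = os.pipe()

# One byte arrives, as if test.sh had just run "echo -n x".
os.write(w, b"x")

# os.read() asks for up to 5 bytes but returns what is already there.
chunk = os.read(r, 5)
print(chunk)

# Six more bytes arrive in two separate writes, then the writer goes away.
os.write(w, b"xx")
os.write(w, b"xxxx")
os.close(w)  # EOF, so the buffered reads below cannot hang

# A buffered file object collects bytes until it has the 5 it was asked for.
with os.fdopen(r, "rb") as f:
    first = f.read(5)
    rest = f.read(5)  # only one byte left before EOF
print(first, rest)
```

One caveat about the quoted os.read() version of test.py: under
Python 3, os.read() returns bytes, so c[0] is already an int and
ord(c[0]) raises TypeError. print(c[0]) or ord(c[:1]) would do there.
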
It's true that Python won't buffer (or will be character-buffered)
then, but that takes some potentially-salient elements out of the
picture. IOW, I don't think Python reading unbuffered is necessarily
the whole issue, and may even be going too far.

I have a habit of saying "necessary, but not necessarily sufficient",
but in this case I believe it's more of a "not necessarily necessary,
and not necessarily sufficient". A lot depends on the other pieces of
the puzzle that you've chosen to "unconfuse" away.

Yes, you can make Python unbuffered/character-buffered, but that's
not the whole story.
-- 
https://mail.python.org/mailman/listinfo/python-list
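
P.S. To make the framing point a little more concrete, here is a
minimal sketch of reading in chunks while still handing back whole
lines. It is not the bufsock module linked earlier, just an
illustration; iter_lines and chunk_size are names I made up, and the
tiny chunk size is only there to force lines to straddle reads.

```python
import os

def iter_lines(fd, chunk_size=4096):
    """Yield newline-delimited frames from fd, without the newline.

    Hypothetical helper for illustration only; not the bufsock API.
    """
    pending = b""
    while True:
        chunk = os.read(fd, chunk_size)  # one syscall may carry many bytes
        if not chunk:
            break  # EOF
        pending += chunk
        # Hand back every complete line; keep the partial tail for later.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line
    if pending:
        yield pending  # unterminated final line

# Fragments that do not respect line boundaries, as a network might deliver.
r, w = os.pipe()
for fragment in (b"hel", b"lo\nwor", b"ld\n", b"tail"):
    os.write(w, fragment)
os.close(w)

frames = list(iter_lines(r, chunk_size=4))
os.close(r)
print(frames)
```

The reader makes one os.read() call per chunk rather than per
character, yet the caller still sees one frame per line no matter
how the bytes were split in transit.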