On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
>
> On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla
> <marco.sulla.pyt...@gmail.com> declaimed the following:
>
> >Nevertheless, tail is a fundamental tool in *nix. It's fast and
> >reliable. Also the tail command can't handle different encodings?
>
> Based upon
> https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY
> thing tail looks at is single byte "\n". It does not handle other line
> endings, and appears to perform BINARY I/O, not text I/O. It does nothing
> for bytes that are not "\n". Split multi-byte encodings are irrelevant
> since, if it does not find enough "\n" bytes in the buffer (chunk) it reads
> another binary chunk and seeks for additional "\n" bytes. Once it finds the
> desired amount, it is synchronized on the byte following the "\n" (which,
> for multi-byte encodings might be a NUL, but in any event, should be a safe
> location for subsequent I/O).
>
> Interpretation of encoding appears to fall to the console driver
> configuration when displaying the bytes output by tail.

Ok, I understand. This should be a Python implementation of *nix tail:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"


def tail(filepath, n=10, chunk_size=100):
    if n <= 0 or n % 1 != 0:
        raise ValueError(_err_n)

    if chunk_size <= 0 or chunk_size % 1 != 0:
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            # Read only the bytes not read yet, so that the chunk at the
            # start of the file does not overlap the previous one
            read_size = min(n_chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            chars = f.read(read_size)
            search_pos = len(chars)

            # A newline at the very end of the file terminates the last
            # line and does not start a new one: skip it, as tail does
            if not text and chars.endswith(_lf):
                search_pos -= 1

            # Prepend the chunk, so indexes in chars are valid in text too
            text[0:0] = chars

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if chunk_line_pos != -1:
                    lines_not_found -= 1

                    if lines_not_found == 0:
                        break

                search_pos = chunk_line_pos

            if lines_not_found == 0:
                break

    return bytes(text[chunk_line_pos+1:])

The function opens the file in binary mode and searches only for b"\n". It
returns the last n lines of the file as bytes. A newline at the very end of
the file is not counted as a separator, so a file that ends with b"\n" still
yields n full lines.

I suppose this function is fast. It reads the file backwards from the end, in
chunks of n * chunk_size bytes, and stops as soon as it has found enough
newlines. Each chunk is prepended to a bytearray. The final result is sliced
from the bytearray and converted to bytes (to be consistent with the read
method).

I suppose the function is reliable. The file is opened in binary mode and only
b"\n" is treated as a line end, as *nix tail (and Python's readline in binary
mode) does. And bytes are returned. The caller can use them as they are,
convert them to a string using the encoding it wants, or do whatever its
imagination suggests :)

Finally, it seems to me the function is quite simple.

If all my claims are true, the three obstacles raised by Chris should be
overcome.

I'd very much like to see a CPython implementation of that function. It could
be a method of a file object opened in binary mode, and *only* in binary mode.

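For anyone who wants to check the output, one simple approach is to compare
it with a naive version that reads the whole file. This is only a sketch:
naive_tail and the sample file name are made up for illustration, and the
helper assumes n is a positive integer and the file fits in memory.

```python
import os
import tempfile


def naive_tail(filepath, n=10):
    """Hypothetical reference helper: last n lines as bytes, whole-file read.

    Only the LF byte counts as a line end; n is assumed to be positive.
    """
    with open(filepath, "rb") as f:
        data = f.read()

    # bytes.splitlines() would also split on b"\r", so split on b"\n" only
    pieces = data.split(b"\n")
    # Every piece but the last was followed by a newline: put it back
    lines = [piece + b"\n" for piece in pieces[:-1]]

    # The last piece is non-empty only if the file lacks a final newline
    if pieces[-1]:
        lines.append(pieces[-1])

    return b"".join(lines[-n:])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    with open(path, "wb") as f:
        f.write("uno\ndue\ntré\n".encode("utf-8"))

    last_two = naive_tail(path, n=2)

# The result is bytes: the caller chooses the encoding
print(last_two.decode("utf-8"))
```

Running both functions on the same files, with and without a final newline
and with different values of n and chunk_size, should give identical bytes.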
What do you think about it?
--
https://mail.python.org/mailman/listinfo/python-list