Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Sat, 16 Apr 2022 at 17:14, Peter J. Holzer wrote:
> On 2022-04-16 16:49:17 +0200, Marco Sulla wrote:
> > Furthermore, you didn't answer my simple question: why does the
> > security update package contain metadata about Debian patches, if the
> > Ubuntu security team did not benefit from Debian security patches but
> > only from internal work?
>
> It DOES NOT contain metadata about Debian patches. You are
> misinterpreting the name "debian". The directory has this name because
> the tools (dpkg, quilt, etc.) were originally written by the Debian team
> for the Debian distribution. Ubuntu uses the same tools. They didn't
> bother to rename the directory (why should they?), so the directory is
> still called "debian" on Ubuntu (and yes I know this because I've built
> numerous .deb packages on Ubuntu systems).

Ah ok, now I understand. Sorry for the confusion.
--
https://mail.python.org/mailman/listinfo/python-list
tail
What about introducing a method for text streams that reads the lines from the bottom? Java also has a ReversedLinesFileReader in Apache Commons IO.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Receive a signal when waking or suspending?
I don't know how to do it in Python directly, but maybe you can create a script that writes to a named pipe, and read the pipe from Python?
https://askubuntu.com/questions/226278/run-script-on-wakeup
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 23 Apr 2022 at 20:59, Chris Angelico wrote:
> On Sun, 24 Apr 2022 at 04:37, Marco Sulla wrote:
> >
> > What about introducing a method for text streams that reads the lines
> > from the bottom? Java has also a ReversedLinesFileReader with Apache
> > Commons IO.
>
> It's fundamentally difficult to get precise. In general, there are
> three steps to reading the last N lines of a file:
>
> 1) Find out the size of the file (currently, if it's being grown)
> 2) Seek to the end of the file, minus some threshold that you hope
>    will contain a number of lines
> 3) Read from there to the end of the file, split it into lines, and
>    keep the last N
>
> Reading the preceding N lines is basically a matter of repeating the
> same exercise, but instead of "end of the file", use the byte position
> of the line you last read.
>
> The problem is, seeking around in a file is done by bytes, not
> characters. So if you know for sure that you can resynchronize
> (possible with UTF-8, not possible with some other encodings), then
> you can do this, but it's probably best to build it yourself (opening
> the file in binary mode).

Well, indeed I have an implementation that does more or less what you described, for utf8 only. The only difference is that I just started from the end of file - 1. I'm just wondering if this would be useful in the stdlib. I think it's not too difficult to generalise for every encoding.

> This is quite inefficient in general.

Why inefficient? I think that readlines() will be much slower, not only more time consuming.
--
https://mail.python.org/mailman/listinfo/python-list
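If it helps make the discussion concrete, the three steps described above can be sketched like this. `last_lines` and `guess` are hypothetical names, and this is a sketch under the stated assumption of binary mode (so seeks count bytes), not a definitive implementation:

```python
import os


def last_lines(path, n, guess=1024):
    # Hedged sketch of the three-step approach: stat the file, seek
    # back a guessed threshold, read to the end and keep the last n
    # lines.  The threshold doubles until the chunk holds enough lines.
    with open(path, "rb") as f:
        size = os.stat(f.fileno()).st_size
        threshold = guess

        while True:
            start = max(size - threshold, 0)
            f.seek(start)
            lines = f.read().splitlines(keepends=True)

            # Unless we started at byte 0, the first item may be the
            # tail of a line we cut in half, so demand n + 1 items.
            if start == 0 or len(lines) > n:
                return lines[-n:]

            threshold *= 2
```

Because everything stays in bytes, the resynchronization problem Chris mentions only goes away for encodings like UTF-8 where b"\n" cannot occur inside a multi-byte character.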
Re: tail
On Sat, 23 Apr 2022 at 23:00, Chris Angelico wrote:
> > > This is quite inefficient in general.
> >
> > Why inefficient? I think that readlines() will be much slower, not
> > only more time consuming.
>
> It depends on which is more costly: reading the whole file (cost
> depends on size of file) or reading chunks and splitting into lines
> (cost depends on how well you guess at chunk size). If the lines are
> all *precisely* the same number of bytes each, you can pick a chunk
> size and step backwards with near-perfect efficiency (it's still
> likely to be less efficient than reading a file forwards, on most file
> systems, but it'll be close); but if you have to guess, adjust, and
> keep going, then you lose efficiency there.

Emh, why chunks? My function simply reads byte by byte and compares it to b"\n". When it finds it, it stops and does a readline():

def tail(filepath):
    """
    @author Marco Sulla
    @date May 31, 2016
    """
    try:
        filepath.is_file
        fp = str(filepath)
    except AttributeError:
        fp = filepath

    with open(fp, "rb") as f:
        size = os.stat(fp).st_size
        start_pos = 0 if size - 1 < 0 else size - 1

        if start_pos != 0:
            f.seek(start_pos)
            char = f.read(1)

            if char == b"\n":
                start_pos -= 1
                f.seek(start_pos)

        if start_pos == 0:
            f.seek(start_pos)
        else:
            for pos in range(start_pos, -1, -1):
                f.seek(pos)
                char = f.read(1)

                if char == b"\n":
                    break

        return f.readline()

This is only for one line and in utf8, but it can be generalised.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 23 Apr 2022 at 23:18, Chris Angelico wrote:
> Ah. Well, then, THAT is why it's inefficient: you're seeking back one
> single byte at a time, then reading forwards. That is NOT going to
> play nicely with file systems or buffers.
>
> Compare reading line by line over the file with readlines() and you'll
> see how abysmal this is.
>
> If you really only need one line (which isn't what your original post
> suggested), I would recommend starting with a chunk that is likely to
> include a full line, and expanding the chunk until you have that
> newline. Much more efficient than one byte at a time.

Well, I would like to have a sort of tail, so to generalise to more than 1 line. But I think that once you have a good algorithm for one line, you can repeat it N times.

I understand that you can read a chunk instead of a single byte, so when the newline is found you can return all the cached chunks concatenated. But will this make the search for the start of the line faster? I suppose you always have to read byte by byte (or more, if you're using utf16 etc.) and see if there's a newline.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sun, 24 Apr 2022 at 00:19, Cameron Simpson wrote:
> An approach I think you both may have missed: mmap the file and use
> mmap.rfind(b'\n') to locate line delimiters.
> https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind

Ah, I played very little with mmap, I didn't know about this. So I suppose you can locate the newline and at that point read the line without using chunks?
--
https://mail.python.org/mailman/listinfo/python-list
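For instance, a sketch of that idea for a single line might look like this (`last_line` is a hypothetical name; it assumes a non-empty file, since mmap cannot map an empty one):

```python
import mmap


def last_line(path):
    # Sketch of the mmap suggestion: rfind() scans backwards for the
    # last b"\n" with no manual chunking at all.  Assumes the file is
    # not empty.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)

            if mm[end - 1:end] == b"\n":  # ignore a trailing newline
                end -= 1

            # rfind returns -1 when no newline precedes the last line,
            # so -1 + 1 == 0 conveniently means "start of file".
            start = mm.rfind(b"\n", 0, end) + 1
            return mm[start:end]
```

Repeating the rfind with a smaller upper bound would walk further back, one line per call, which is essentially the "tac" behaviour discussed earlier.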
Re: tail
On Sun, 24 Apr 2022 at 11:21, Roel Schroeven wrote:
> dn schreef op 24/04/2022 om 0:04:
> > Disagreeing with @Chris in the sense that I use tail very frequently,
> > and usually in the context of server logs - but I'm talking about the
> > Linux implementation, not Python code!
> If I understand Marco correctly, what he want is to read the lines from
> bottom to top, i.e. tac instead of tail, despite his subject.
> I use tail very frequently too, but tac is something I almost never use.

Well, the inverse reader is only a secondary suggestion. I suppose a tail is much more useful.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
Is something like this OK?

import os


def tail(f):
    chunk_size = 100
    size = os.stat(f.fileno()).st_size

    positions = iter(range(size, -1, -chunk_size))
    next(positions)
    chunk_line_pos = -1
    pos = 0

    for pos in positions:
        f.seek(pos)
        chars = f.read(chunk_size)
        chunk_line_pos = chars.rfind(b"\n")

        if chunk_line_pos != -1:
            break

    if chunk_line_pos == -1:
        nbytes = pos
        pos = 0
        f.seek(pos)
        chars = f.read(nbytes)
        chunk_line_pos = chars.rfind(b"\n")

    if chunk_line_pos == -1:
        line_pos = pos
    else:
        line_pos = pos + chunk_line_pos + 1

    f.seek(line_pos)
    return f.readline()

This is simply for one line and for utf8.
--
https://mail.python.org/mailman/listinfo/python-list
Re: new sorting algorithm
I suppose you should write to python-...@python.org, or post on https://discuss.python.org/ under the Core Development section.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Mon, 2 May 2022 at 18:31, Stefan Ram wrote:
> |The Unicode standard defines a number of characters that
> |conforming applications should recognize as line terminators:[7]
> |
> |LF:    Line Feed, U+000A
> |VT:    Vertical Tab, U+000B
> |FF:    Form Feed, U+000C
> |CR:    Carriage Return, U+000D
> |CR+LF: CR (U+000D) followed by LF (U+000A)
> |NEL:   Next Line, U+0085
> |LS:    Line Separator, U+2028
> |PS:    Paragraph Separator, U+2029
> |
> Wikipedia "Newline".

Should I suppose that other encodings may have more line ending chars?
--
https://mail.python.org/mailman/listinfo/python-list
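For what it's worth, Python's str.splitlines already recognizes every terminator in that table, so a pure-Python reverse reader for text could lean on it:

```python
# str.splitlines honours all the Unicode terminators listed above
# (LF, VT, FF, CR, CR+LF, NEL, LS, PS), unlike text-mode readline,
# which only recognises \n, \r and \r\n via open()'s newline argument.
text = "one\u2028two\u0085three\vfour\nfive"
print(text.splitlines())  # ['one', 'two', 'three', 'four', 'five']
```

The gap between what splitlines accepts and what readline accepts is exactly why the newline parameter matters for a text-mode tail.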
Re: tail
Ok, I suppose \n and \r are enough:

readline(size=-1, /)
    Read and return one line from the stream. If size is specified, at
    most size bytes will be read.
    The line terminator is always b'\n' for binary files; for text files,
    the newline argument to open() can be used to select the line
    terminator(s) recognized.

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    [...]
    newline controls how universal newlines mode works (it only applies
    to text mode). It can be None, '', '\n', '\r', and '\r\n'.
--
https://mail.python.org/mailman/listinfo/python-list
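A small illustration of that newline parameter, using StringIO, whose newline argument works like open()'s for reading:

```python
import io

# With newline="" universal-newline detection is on but the terminators
# are returned untranslated, so \r, \n and \r\n lines can be told
# apart; with the default newline=None they would all be translated to
# "\n" on reading.
buf = io.StringIO("a\rb\r\nc\n", newline="")
print(buf.readlines())
```

This is the "hard mode" a generic text tail has to handle: with newline="" a line may end in any of three different ways.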
Re: tail
On Mon, 2 May 2022 at 00:20, Cameron Simpson wrote:
> On 01May2022 18:55, Marco Sulla wrote:
> >Something like this is OK?
> [...]
> >def tail(f):
> >    chunk_size = 100
> >    size = os.stat(f.fileno()).st_size
>
> I think you want os.fstat().

It's the same since Python 3.3.

> >    chunk_line_pos = -1
> >    pos = 0
> >
> >    for pos in positions:
> >        f.seek(pos)
> >        chars = f.read(chunk_size)
> >        chunk_line_pos = chars.rfind(b"\n")
> >
> >        if chunk_line_pos != -1:
> >            break
>
> Normal text files _end_ in a newline. I'd expect this to stop
> immediately at the end of the file.

I think it's correct. The last line in this case is an empty bytes.

> >    if chunk_line_pos == -1:
> >        nbytes = pos
> >        pos = 0
> >        f.seek(pos)
> >        chars = f.read(nbytes)
> >        chunk_line_pos = chars.rfind(b"\n")
>
> I presume this is because unless you're very lucky, 0 will not be a
> position in the range(). I'd be inclined to avoid duplicating this code
> and special case and instead maybe make the range unbounded and do
> something like this:
>
>     if pos < 0:
>         pos = 0
>     ... seek/read/etc ...
>     if pos == 0:
>         break
>
> around the for-loop body.

Yes, I was not very happy to duplicate the code... I have to think about it.

> Seems sane. I haven't tried to run it.

Thank you ^^
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
I have a little problem. I tried to extend the tail function, so it can read lines from the bottom of a file object opened in text mode. The problem is that it does not work. It gets a starting position that is lower than expected by 3 characters. So the first line is read only for 2 chars, and the last line is missing.

import os

_lf = "\n"
_cr = "\r"
_lf_ord = ord(_lf)


def tail(f, n=10, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(f.fileno()).st_size
    chunk_line_pos = -1
    lines_not_found = n

    binary_mode = "b" in f.mode
    lf = _lf_ord if binary_mode else _lf

    while pos != 0:
        pos -= n_chunk_size

        if pos < 0:
            pos = 0

        f.seek(pos)
        chars = f.read(n_chunk_size)

        for i, char in enumerate(reversed(chars)):
            if char == lf:
                lines_not_found -= 1

                if lines_not_found == 0:
                    chunk_line_pos = len(chars) - i - 1
                    print(chunk_line_pos, i)
                    break

        if lines_not_found == 0:
            break

    line_pos = pos + chunk_line_pos + 1
    f.seek(line_pos)
    res = b"" if binary_mode else ""

    for i in range(n):
        res += f.readline()

    return res

Maybe the problem is 1 char != 1 byte?
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 7 May 2022 at 01:03, Dennis Lee Bieber wrote:
> Windows also uses <cr><lf> for the EOL marker, but Python's I/O system
> condenses that to just <lf> internally (for TEXT mode) -- so using the
> length of a string so read to compute a file position may be off-by-one
> for each EOL in the string.

So there's no way to reliably read lines in reverse in text mode using seek and read, and the only option is readlines?
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 7 May 2022 at 16:08, Barry wrote:
> You need to handle the file in bin mode and do the handling of line endings
> and encodings yourself. It’s not that hard for the cases you wanted.

>>> "\n".encode("utf-16")
b'\xff\xfe\n\x00'
>>> "".encode("utf-16")
b'\xff\xfe'
>>> "a\nb".encode("utf-16")
b'\xff\xfea\x00\n\x00b\x00'
>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
b'\n\x00'

Can I use the last trick to get the encoding of a LF or a CR in any encoding?
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 7 May 2022 at 19:02, MRAB wrote:
> On 2022-05-07 17:28, Marco Sulla wrote:
> > On Sat, 7 May 2022 at 16:08, Barry wrote:
> >> You need to handle the file in bin mode and do the handling of line
> >> endings and encodings yourself. It’s not that hard for the cases you
> >> wanted.
> >
> >>>> "\n".encode("utf-16")
> > b'\xff\xfe\n\x00'
> >>>> "".encode("utf-16")
> > b'\xff\xfe'
> >>>> "a\nb".encode("utf-16")
> > b'\xff\xfea\x00\n\x00b\x00'
> >>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > b'\n\x00'
> >
> > Can I use the last trick to get the encoding of a LF or a CR in any
> > encoding?
>
> In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> could be little-endian or big-endian.
>
> As you didn't specify which you wanted, it defaulted to little-endian
> and added a BOM (U+FEFF).
>
> If you specify which endianness you want with "utf-16le" or "utf-16be",
> it won't add the BOM:
>
>     >>> # Little-endian.
>     >>> "\n".encode("utf-16le")
>     b'\n\x00'
>     >>> # Big-endian.
>     >>> "\n".encode("utf-16be")
>     b'\x00\n'

Well, ok, but I need a generic method to get LF and CR for any encoding a user can input. Do you think that

"\n".encode(encoding).lstrip("".encode(encoding))

is good for any encoding? Furthermore, is there a way to get the encoding of an opened file object?
--
https://mail.python.org/mailman/listinfo/python-list
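A sketch of a generic helper along those lines (`encoded_newlines` is a hypothetical name; encoding a dummy character and slicing it off sidesteps any BOM the codec prepends, though stateful codecs are not covered). As for the last question, a file opened in text mode exposes its codec as f.encoding:

```python
def encoded_newlines(encoding):
    # Hedged sketch: encode a dummy char together with the terminator,
    # then slice off the encoded dummy.  This drops any BOM the codec
    # prepends, which is what trips up the bare "\n".encode() approach.
    # Fine for stateless codecs; untested beyond those.
    prefix_len = len("x".encode(encoding))
    lf = "x\n".encode(encoding)[prefix_len:]
    cr = "x\r".encode(encoding)[prefix_len:]
    return lf, cr
```

For example, encoded_newlines("utf-16le") should give (b"\n\x00", b"\r\x00") with no BOM attached.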
Re: tail
I think I've _almost_ found a simpler, general way:

import os

_lf = "\n"
_cr = "\r"


def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, newline=newline, encoding=encoding) as f:
        text = ""

        hard_mode = False

        if newline == None:
            newline = _lf
        elif newline == "":
            hard_mode = True

        if hard_mode:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()
                lf_after = False

                for i, char in enumerate(reversed(text)):
                    if char == _lf:
                        lf_after = True
                    elif char == _cr:
                        lines_not_found -= 1
                        newline_size = 2 if lf_after else 1
                        lf_after = False
                    elif lf_after:
                        lines_not_found -= 1
                        newline_size = 1
                        lf_after = False

                    if lines_not_found == 0:
                        chunk_line_pos = len(text) - 1 - i + newline_size
                        break

                if lines_not_found == 0:
                    break
        else:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()

                for i, char in enumerate(reversed(text)):
                    if char == newline:
                        lines_not_found -= 1

                        if lines_not_found == 0:
                            chunk_line_pos = len(text) - 1 - i + len(newline)
                            break

                if lines_not_found == 0:
                    break

    if chunk_line_pos == -1:
        chunk_line_pos = 0

    return text[chunk_line_pos:]

In short, the file is always opened in text mode. The file is read at the end in bigger and bigger chunks, until the file is finished or all the lines are found.

Why? Because in encodings that have more than 1 byte per character, reading a chunk of n bytes, then reading the previous chunk, can end up splitting a character between the chunks in two distinct bytes.

I think one can read chunk by chunk and test the chunk junction problem. I suppose the code will be faster this way. Anyway, it seems that this trick is quite fast anyway and it's a lot simpler. The final result is read from the chunk, and not from the file, so there are no problems of misalignment of bytes and text.

Furthermore, the builtin encoding parameter is used, so this should work with all the encodings (untested). Furthermore, a newline parameter can be specified, as in open(). If it's equal to the empty string, things are a little more complicated, anyway I suppose the code is clear. It's untested too. I only tested with a utf8 Linux file.

Do you think there are chances to get this function as a method of the file object in CPython? The method for a file object opened in bytes mode is simpler, since there's no encoding and newline is only \n in that case.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sun, 8 May 2022 at 20:31, Barry Scott wrote:
> > On 8 May 2022, at 17:05, Marco Sulla wrote:
> >
> > def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >     n_chunk_size = n * chunk_size
>
> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as its
> typically the smaller size the file system will allocate.
> I tend to read on multiple of MiB as its near instant.

Well, I tested on a little file, a list of my preferred pizzas, so...

> >     pos = os.stat(filepath).st_size
>
> You cannot mix POSIX API with text mode.
> pos is in bytes from the start of the file.
> Textmode will be in code points. bytes != code points.
>
> >     with open(filepath, newline=newline, encoding=encoding) as f:
> >         [...]
> >         if hard_mode:
> >             while pos != 0:
> >                 pos -= n_chunk_size
> >
> >                 if pos < 0:
> >                     pos = 0
> >
> >                 f.seek(pos)
>
> In text mode you can only seek to a value returned from f.tell()
> otherwise the behaviour is undefined.

Why? I don't see any recommendation about it in the docs:
https://docs.python.org/3/library/io.html#io.IOBase.seek

> >                 text = f.read()
>
> You have on limit on the amount of data read.

I explained that previously. Anyway, chunk_size is small, so it's not a great problem.

> >                 lf_after = False
> >
> >                 for i, char in enumerate(reversed(text)):
>
> Simply use text.rindex('\n') or text.rfind('\n') for speed.

I can't use them when I have to find both \n or \r. So I preferred to simplify the code and use the for cycle every time. Bear in mind anyway that this is a prototype for a Python C API implementation (builtin I hope, or a C extension if not).

> > In short, the file is always opened in text mode. The file is read at
> > the end in bigger and bigger chunks, until the file is finished or all
> > the lines are found.
>
> It will fail if the contents is not ASCII.

Why?

> > Why? Because in encodings that have more than 1 byte per character,
> > reading a chunk of n bytes, then reading the previous chunk, can end up
> > splitting a character between the chunks in two distinct bytes.
>
> No it cannot. text mode only knows how to return code points. Now if you
> are in binary it could be split, but you are not in binary mode so it
> cannot.

From the docs:

seek(offset, whence=SEEK_SET)
    Change the stream position to the given byte offset.

> > Do you think there are chances to get this function as a method of the
> > file object in CPython? The method for a file object opened in bytes
> > mode is simpler, since there's no encoding and newline is only \n in
> > that case.
>
> State your requirements. Then see if your implementation meets them.

The method should return the last n lines from a file object. If the file object is in text mode, the newline parameter must be honored. If the file object is in binary mode, a newline is always b"\n", to be consistent with readline.

I suppose the current implementation of tail satisfies the requirements for text mode. The previous one satisfied binary mode.

Anyway, apart from my implementation, I'm curious whether you think a tail method is worth adding to the builtin file objects in CPython.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sun, 8 May 2022 at 22:02, Chris Angelico wrote:
> Absolutely not. As has been stated multiple times in this thread, a
> fully general approach is extremely complicated, horrifically
> unreliable, and hopelessly inefficient.

Well, my implementation is quite general now. It's not complicated or inefficient. About reliability, I can't say anything without a test case.

> The ONLY way to make this sort of thing any good whatsoever is to know
> your own use-case and code to exactly that. Given the size of files
> you're working with, for instance, a simple approach of just reading
> the whole file would make far more sense than the complex seeking
> you're doing. For reading a multi-gigabyte file, the choices will be
> different.

Apart from the fact that it's very, very simple to optimize for small files: this is, IMHO, a premature optimization. The code is quite fast even if the file is small. Can it be faster? Of course, but it depends on the use case. Every optimization in CPython must pass the benchmark suite test. If there's little or no gain, the optimization is usually rejected.

> No, this does NOT belong in the core language.

I respect your opinion, but IMHO you think that the task is more complicated than it really is. It seems to me that the method can be quite simple and fast.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sun, 8 May 2022 at 22:34, Barry wrote:
> > On 8 May 2022, at 20:48, Marco Sulla wrote:
> >
> > On Sun, 8 May 2022 at 20:31, Barry Scott wrote:
> >>
> >>>> On 8 May 2022, at 17:05, Marco Sulla wrote:
> >>>
> >>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >>>     n_chunk_size = n * chunk_size
> >>
> >> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as its
> >> typically the smaller size the file system will allocate.
> >> I tend to read on multiple of MiB as its near instant.
> >
> > Well, I tested on a little file, a list of my preferred pizzas, so...
>
> Try it on a very big file.

I'm not saying it's a good idea, it's only the value that I needed for my tests. Anyway, it's not a problem with big files. The problem is with files with long lines.

> >> In text mode you can only seek to a value returned from f.tell()
> >> otherwise the behaviour is undefined.
> >
> > Why? I don't see any recommendation about it in the docs:
> > https://docs.python.org/3/library/io.html#io.IOBase.seek
>
> What does adding 1 to a pos mean?
> If it’s binary it means 1 byte further down the file, but in text mode it
> may need to move the point 1, 2 or 3 bytes down the file.

Emh. I re-quote:

seek(offset, whence=SEEK_SET)
    Change the stream position to the given byte offset.

And so on. No mention of differences between text and binary mode.

> >> You have on limit on the amount of data read.
> >
> > I explained that previously. Anyway, chunk_size is small, so it's not
> > a great problem.
>
> Typo, I meant you have no limit.
> You read all the data till the end of the file, which might be megabytes
> of data.

Yes, I already explained why and how it could be optimized. I quote myself:

In short, the file is always opened in text mode. The file is read at the end in bigger and bigger chunks, until the file is finished or all the lines are found. Why? Because in encodings that have more than 1 byte per character, reading a chunk of n bytes, then reading the previous chunk, can end up splitting a character between the chunks in two distinct bytes.

I think one can read chunk by chunk and test the chunk junction problem. I suppose the code will be faster this way. Anyway, it seems that this trick is quite fast anyway and it's a lot simpler.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote:
> The point here is that text is a very different thing. Because you
> cannot seek to an absolute number of characters in an encoding with
> variable sized characters. _If_ you did a seek to an arbitrary number
> you can end up in the middle of some character. And there are encodings
> where you cannot inspect the data to find a character boundary in the
> byte stream.

Ooook, now I understand what you and Barry mean. I suppose there's no reliable way to tail a big file opened in text mode with a decent performance.

Anyway, the previous-previous function I posted worked only for files opened in binary mode, and I suppose it's reliable, since it searches only for b"\n", as readline() in binary mode does.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Mon, 9 May 2022 at 19:53, Chris Angelico wrote:
> On Tue, 10 May 2022 at 03:47, Marco Sulla wrote:
> >
> > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote:
> > >
> > > The point here is that text is a very different thing. Because you
> > > cannot seek to an absolute number of characters in an encoding with
> > > variable sized characters. _If_ you did a seek to an arbitrary number
> > > you can end up in the middle of some character. And there are encodings
> > > where you cannot inspect the data to find a character boundary in the
> > > byte stream.
> >
> > Ooook, now I understand what you and Barry mean. I suppose there's no
> > reliable way to tail a big file opened in text mode with a decent
> > performance.
> >
> > Anyway, the previous-previous function I posted worked only for files
> > opened in binary mode, and I suppose it's reliable, since it searches
> > only for b"\n", as readline() in binary mode do.
>
> It's still fundamentally impossible to solve this in a general way, so
> the best way to do things will always be to code for *your* specific
> use-case. That means that this doesn't belong in the stdlib or core
> language, but in your own toolkit.

Nevertheless, tail is a fundamental tool in *nix. It's fast and reliable. Or can't the tail command handle different encodings either?
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber wrote:
> On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla declaimed the following:
>
> > Nevertheless, tail is a fundamental tool in *nix. It's fast and
> > reliable. Also the tail command can't handle different encodings?
>
> Based upon https://github.com/coreutils/coreutils/blob/master/src/tail.c
> the ONLY thing tail looks at is single byte "\n". It does not handle
> other line endings, and appears to perform BINARY I/O, not text I/O. It
> does nothing for bytes that are not "\n". Split multi-byte encodings are
> irrelevant since, if it does not find enough "\n" bytes in the buffer
> (chunk) it reads another binary chunk and seeks for additional "\n"
> bytes. Once it finds the desired amount, it is synchronized on the byte
> following the "\n" (which, for multi-byte encodings might be a NUL, but
> in any event, should be a safe location for subsequent I/O).
>
> Interpretation of encoding appears to fall to the console driver
> configuration when displaying the bytes output by tail.

Ok, I understand.
This should be a Python implementation of *nix tail:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"


def tail(filepath, n=10, chunk_size=100):
    if (n <= 0):
        raise ValueError(_err_n)

    if (n % 1 != 0):
        raise ValueError(_err_n)

    if (chunk_size <= 0):
        raise ValueError(_err_chunk_size)

    if (chunk_size % 1 != 0):
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if chunk_line_pos != -1:
                    lines_not_found -= 1

                    if lines_not_found == 0:
                        break

                search_pos = chunk_line_pos

            if lines_not_found == 0:
                break

    return bytes(text[chunk_line_pos+1:])

The function opens the file in binary mode and searches only for b"\n". It returns the last n lines of the file as bytes.

I suppose this function is fast. It reads the bytes from the file in chunks and stores them in a bytearray, prepending them to it. The final result is read from the bytearray and converted to bytes (to be consistent with the read method).

I suppose the function is reliable. The file is opened in binary mode and only b"\n" is searched as line end, as *nix tail (and python readline in binary mode) do. And bytes are returned. The caller can use them as is, or convert them to a string using the encoding it wants, or do whatever its imagination can think :)

Finally, it seems to me the function is quite simple.

If all my affirmations are true, the three obstacles written by Chris should be passed.

I'd very much like to see a CPython implementation of that function. It could be a method of a file object opened in binary mode, and *only* in binary mode.

What do you think about it?
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Wed, 11 May 2022 at 22:09, Chris Angelico wrote:
> Have you actually checked those three, or do you merely suppose them to
> be true?

I only suppose, as I said. I should do some benchmarks and some other tests, and, frankly, I don't want to. I don't want to because I'm quite sure the implementation is fast, since it reads by chunks and caches them. I'm not sure it's 100% free of bugs, but the concept is very simple, since it simply mimics the *nix tail, so it should be reliable.

> > I'd very much like to see a CPython implementation of that function. It
> > could be a method of a file object opened in binary mode, and *only* in
> > binary mode.
> >
> > What do you think about it?
>
> Still not necessary. You can simply have it in your own toolkit. Why
> should it be part of the core language?

Why not?

> How much benefit would it be to anyone else?

I suppose that every programmer, at least once in their life, did a tail.

> All the same assumptions are still there, so it still isn't general

It's general. It mimics the *nix tail. I can't think of a more general way to implement a tail.

> I don't understand why this wants to be in the standard library.

Well, the answer is really simple: I needed it, and if I had found it in the stdlib, I would have used it instead of writing the first horrible function. Furthermore, tail is such a useful tool that I suppose many others are interested, based on this quick Google search:
https://www.google.com/search?q=python+tail

A much-voted question on Stackoverflow, many other Stackoverflow questions, a package that seems to do exactly the same thing, that is mimic *nix tail, and a blog post about how to tail in Python. Furthermore, if you search python tail pypi, you can find a bunch of other packages:
https://www.google.com/search?q=python+tail+pypi

It seems the subject is quite popular, and I can't imagine otherwise.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Thu, 12 May 2022 at 00:50, Stefan Ram wrote:
> Marco Sulla writes:
> > def tail(filepath, n=10, chunk_size=100):
> >     if (n <= 0):
> >         raise ValueError(_err_n)
> ...
>
> There's no spec/doc, so one can't even test it.

Excuse me, you're very right.

"""
A function that "tails" the file. If you don't know what that means,
google "man tail".

filepath: the file path of the file to be "tailed"
n: the number of lines "tailed"
chunk_size: oh don't care, use it as is
"""
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
Thank you very much. This helped me to improve the function:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"


def tail(filepath, n=10, chunk_size=100):
    if (n <= 0):
        raise ValueError(_err_n)

    if (n % 1 != 0):
        raise ValueError(_err_n)

    if (chunk_size <= 0):
        raise ValueError(_err_chunk_size)

    if (chunk_size % 1 != 0):
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    newlines_to_find = n
    first_step = True

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if first_step and chunk_line_pos == search_pos - 1:
                    newlines_to_find += 1

                first_step = False

                if chunk_line_pos != -1:
                    newlines_to_find -= 1

                    if newlines_to_find == 0:
                        break

                search_pos = chunk_line_pos

            if newlines_to_find == 0:
                break

    return bytes(text[chunk_line_pos+1:])

On Thu, 12 May 2022 at 20:29, Stefan Ram wrote:
> I am not aware of a definition of "line" above, but the PLR says:
>
> |A physical line is a sequence of characters terminated
> |by an end-of-line sequence.
>
> . So 10 lines should have 10 end-of-line sequences.

Maybe. Maybe not. What if the file ends with no newline?
--
https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Fri, 13 May 2022 at 00:31, Cameron Simpson wrote:
> On 12May2022 19:48, Marco Sulla wrote:
> >On Thu, 12 May 2022 at 00:50, Stefan Ram wrote:
> >> There's no spec/doc, so one can't even test it.
> >
> >Excuse me, you're very right.
> >
> >"""
> >A function that "tails" the file. If you don't know what that means,
> >google "man tail"
> >
> >filepath: the file path of the file to be "tailed"
> >n: the numbers of lines "tailed"
> >chunk_size: oh don't care, use it as is
>
> This is nearly the worst "specification" I have ever seen.

You're lucky. I've seen much worse (or none at all).
Re: tail
On Fri, 13 May 2022 at 12:49, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-05-13 at 12:16:57 +0200, Marco Sulla wrote:
>
> > On Fri, 13 May 2022 at 00:31, Cameron Simpson wrote:
> > [...]
> > > This is nearly the worst "specification" I have ever seen.
> >
> > You're lucky. I've seen much worse (or no one).
>
> At least with *no* documentation, the source code stands for itself.

So I did well not to put one in the first place. I think that after 100 posts about tail, chunks, etc., it was clear what that stuff was about and how to use it.

Speaking of more serious things, so far I've tested with:

* a file that does not end with \n
* a file that ends with \n (after Stefan's test)
* a file with more than 10 lines
* a file with less than 10 lines

It seemed to work. I only have to benchmark it. I suppose I have to test with at least a 1 GB file, a big lorem ipsum, and do an (admittedly unequal) comparison with Linux tail. I'll do it when I have time, so Chris will no longer be angry with me.
Re: tail
Well, I've done a benchmark.

>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail": tail}, number=10)
1.5963431186974049
>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail": tail}, number=10)
2.5240604374557734
>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail": tail}, number=10)
1.8944984432309866

small.txt is a text file of 1.3 KB. lorem.txt is a lorem ipsum of 1.2 GB.

It seems the performance is good, thanks to the chunk suggestion. But the time of Linux tail surprises me:

marco@buzz:~$ time tail lorem.txt
[text]

real	0m0.004s
user	0m0.003s
sys	0m0.001s

It's strange that my function is so slow by comparison. I thought it was because tail decodes and prints the result, but I timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))", globals={"tail": tail}, number=10)

and I got ~36 seconds. It seems quite strange to me. Maybe I got the benchmarks wrong at some point?
Re: tail
On Wed, 18 May 2022 at 23:32, Cameron Simpson wrote:
>
> On 17May2022 22:45, Marco Sulla wrote:
> >Well, I've done a benchmark.
> >>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=10)
> >1.5963431186974049
> >>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=10)
> >2.5240604374557734
> >>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=10)
> >1.8944984432309866
>
> This suggests that the file size does not dominate your runtime.

Yes, this is what I wanted to test, and it seems good.

> Ah.
> _Or_ that there are similar numbers of newlines vs text in the files so
> reading similar amounts of data from the end. If the "line density" of
> the files were similar you would hope that the runtimes would be
> similar.

No, well, small.txt has very short lines, and lorem.txt is a lorem ipsum, so really long lines. Indeed I get better results tuning chunk_size. Anyway, also with the default value the performance is not bad at all.

> >But the time of Linux tail surprise me:
> >
> >marco@buzz:~$ time tail lorem.txt
> >[text]
> >
> >real	0m0.004s
> >user	0m0.003s
> >sys	0m0.001s
> >
> >It's strange that it's so slow. I thought it was because it decodes
> >and print the result, but I timed
>
> You're measuring different things. timeit() tries hard to measure just
> the code snippet you provide. It doesn't measure the startup cost of the
> whole python interpreter. Try:
>
>     time python3 your-tail-prog.py /home/marco/lorem.txt

Well, I'll try it, but isn't it a bit unfair to compare Python startup with C?

> BTW, does your `tail()` print output? If not, again not measuring the
> same thing.
> [...]
> Also: does tail(1) do character set / encoding stuff? Does your Python
> code do that? Might be apples and oranges.
Well, as I wrote, I also timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))", globals={"tail": tail}, number=10)

and I got ~36 seconds.

> If you have the source of tail(1) to hand, consider getting to the core
> and measuring `time()` immediately before and immediately after the
> central tail operation and printing the result.

IMHO this is a very good idea, but I have to find the time(). Ahah. Emh.
Re: Subtract n months from datetime
The package arrow has a simple shift method for months, weeks, etc.:

https://arrow.readthedocs.io/en/latest/#replace-shift
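If pulling in a third-party package is undesirable, the same kind of month shift can be sketched with the stdlib alone. `shift_months` is a hypothetical helper, and clamping the day when the target month is shorter (March 31 minus one month → February 28) is an assumption about the desired semantics — arrow's shift makes the same choice:

```python
import calendar
import datetime

def shift_months(d, months):
    # Convert to a zero-based month count, shift, and convert back;
    # clamp the day to the length of the target month.
    m = d.month - 1 + months
    y = d.year + m // 12
    m = m % 12 + 1
    day = min(d.day, calendar.monthrange(y, m)[1])
    return d.replace(year=y, month=m, day=day)

print(shift_months(datetime.date(2022, 3, 31), -1))  # 2022-02-28
print(shift_months(datetime.date(2022, 1, 15), -2))  # 2021-11-15
```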
Why I fail so bad to check for memory leak with this code?
I tried to check for memory leaks in a bunch of functions of mine using a simple decorator. It works, but it fails with this code, returning a random count_diff at every run. Why?

import tracemalloc
import gc
import functools
from uuid import uuid4
import pickle

def getUuid():
    return str(uuid4())

def trace(func):
    @functools.wraps(func)
    def inner():
        tracemalloc.start()
        snapshot1 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )

        for i in range(100):
            func()

        gc.collect()

        snapshot2 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )

        top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        tracemalloc.stop()

        for stat in top_stats:
            if stat.count_diff > 3:
                raise ValueError(f"count_diff: {stat.count_diff}")

    return inner

dict_1 = {getUuid(): i for i in range(1000)}

@trace
def func_76():
    pickle.dumps(iter(dict_1))

func_76()
Re: Why I fail so bad to check for memory leak with this code?
On Thu, 21 Jul 2022 at 22:28, MRAB wrote:
>
> It's something to do with pickling iterators because it still occurs
> when I reduce func_76 to:
>
> @trace
> def func_76():
>     pickle.dumps(iter([]))

It's too strange. I found a bunch of true memory leaks with this decorator; it seems to be reliable. It works correctly with pickle and with iter, but not when pickling iterators.
Re: Why I fail so bad to check for memory leak with this code?
This naive code shows no leak:

import resource
import pickle

c = 0

while True:
    pickle.dumps(iter([]))

    if (c % 1) == 0:
        max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"iteration: {c}, max rss: {max_rss} kb")

    c += 1
Re: Why I fail so bad to check for memory leak with this code?
I've done this other simple test:

#!/usr/bin/env python3

import tracemalloc
import gc
import pickle

tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)

for i in range(1000):
    pickle.dumps(iter([]))

gc.collect()

snapshot2 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)

top_stats = snapshot2.compare_to(snapshot1, 'lineno')
tracemalloc.stop()

for stat in top_stats:
    print(stat)

The result is:

/home/marco/sources/test.py:14: size=3339 B (+3339 B), count=63 (+63), average=53 B
/home/marco/sources/test.py:9: size=464 B (+464 B), count=1 (+1), average=464 B
/home/marco/sources/test.py:10: size=456 B (+456 B), count=1 (+1), average=456 B
/home/marco/sources/test.py:13: size=28 B (+28 B), count=1 (+1), average=28 B

It seems that, after 10 million loops, only 63 allocations leaked, totalling only ~3 KB. It seems to me that we can't call it a leak, no? Probably pickle needs a lot more cycles to show there's actually a real leak.
Re: Why I fail so bad to check for memory leak with this code?
On Fri, 22 Jul 2022 at 09:00, Barry wrote:
> With code as complex as python's there will be memory allocations that occur that will not be directly related to the python code you test.
>
> To put it another way there is noise in your memory allocation signal.
>
> Usually the signal of a memory leak is very clear, as you noticed.
>
> For rare leaks I would use a tool like valgrind.

Thank you all, but I needed a simple decorator to automate the memory leak (and segfault) tests. I think this version is good enough; I hope it can be useful to someone:

def trace(iterations=100):
    def decorator(func):
        def wrapper():
            print(
                f"Loops: {iterations} - Evaluating: {func.__name__}",
                flush=True
            )

            tracemalloc.start()
            snapshot1 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )

            for i in range(iterations):
                func()

            gc.collect()

            snapshot2 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )

            top_stats = snapshot2.compare_to(snapshot1, 'lineno')
            tracemalloc.stop()

            for stat in top_stats:
                if stat.count_diff * 100 > iterations:
                    raise ValueError(f"stat: {stat}")

        return wrapper

    return decorator

If the decorated function fails, you can try raising the iterations parameter. I found that in my cases I sometimes needed a value of 200 or 300.
How to generate a .pyi file for a C Extension using stubgen
I tried to follow the instructions here:

https://mypy.readthedocs.io/en/stable/stubgen.html

but the instructions about creating a stub for a C extension are a little mysterious. I tried to use it on the .so file without luck.
Re: How to generate a .pyi file for a C Extension using stubgen
On Fri, 29 Jul 2022 at 23:23, Barry wrote:
>
> > On 29 Jul 2022, at 19:33, Marco Sulla wrote:
> >
> > I tried to follow the instructions here:
> >
> > https://mypy.readthedocs.io/en/stable/stubgen.html
> >
> > but the instructions about creating a stub for a C Extension are a little
> > mysterious. I tried to use it on the .so file without luck.
>
> It says that stubgen works on .py files not .so files.
> You will need to write the .pyi for your .so manually.
>
> The docs could do with splitting the need for .pyi for .so
> away from the stubgen description.

But it says:

"Mypy includes the stubgen tool that can automatically generate stub files (.pyi files) for Python modules and C extension modules."

I tried stubgen -m modulename, but it generates very little code.
Am I banned from Discuss forum?
I was banned from the mailing list and the Discuss forum for a very long time. Too long, IMHO, but I paid my dues. Now this is my status on the forum:

- I never posted anything disrespectful in the last months
- I have a limitation of three posts per thread, but only on some threads
- Some random posts of mine are obscured and must be restored manually by moderators
- I opened a thread proposing a new section called Brainstorming. It was closed without a reason.
- I can't post links
- Two discussions I posted in the Ideas section were moved to Help, without a single line of explanation.

If I'm not appreciated, I want to be publicly banned with a good reason, or at least a reason.
Re: use set notation for repr of dict_keys?
On Wed, 24 Feb 2021 at 06:29, Random832 wrote:
> I was surprised, though, to find that you can't remove items directly from
> the key set, or in general update it in place with &= or -= (these operators
> work, but give a new set object).

This is because they are a view. Changing the keys object would mean changing the underlying dict — probably not what you want or expect. You can just "cast" them into a "real" set object.

There was a discussion about implementing the whole Set interface for dicts. Currently, only `|` is supported.
Re: use set notation for repr of dict_keys?
On Wed, 24 Feb 2021 at 15:02, Random832 wrote:
> On Wed, Feb 24, 2021, at 02:59, Marco Sulla wrote:
> > On Wed, 24 Feb 2021 at 06:29, Random832 wrote:
> > > I was surprised, though, to find that you can't remove items directly
> > > from the key set, or in general update it in place with &= or -= (these
> > > operators work, but give a new set object).
> >
> > This is because they are a view. Changing the key object means you
> > will change the underlying dict. Probably not that you want or expect.
>
> Why wouldn't it be what I want or expect? Java allows exactly this

I didn't know this. I like Java, but IMHO it's quite confusing that you can remove a key from a Map using the keys object. In my mind it's more natural to think of views as read-only, with changes done only through the original object. But maybe my mind has too strict bounds.

> [and it's the only way provided to, for example, remove all keys matching a
> predicate in a single pass... an operation that Python sets don't support
> either]

I hope that someday Python can do:

filtered_dict = a_dict - a_set
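Until (if ever) dict grows a `-` operator, the wished-for operation can be emulated today — either by exploiting the set operations that key views already support, or with a comprehension. A small sketch (`a_dict` and `a_set` are just example values):

```python
a_dict = {"a": 1, "b": 2, "c": 3}
a_set = {"b", "c"}

# Key views already support set subtraction; note it returns a plain
# set of the surviving keys, not a new dict.
kept_keys = a_dict.keys() - a_set
print(kept_keys)  # {'a'}

# The hoped-for "a_dict - a_set", spelled as a dict comprehension.
filtered_dict = {k: v for k, v in a_dict.items() if k not in a_set}
print(filtered_dict)  # {'a': 1}
```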
Re: editor recommendations?
I use Sublime free for simple tasks. I like that it's fast and it saves to disk immediately; you don't even have to name the file. I use it also for taking notes. It's probably not as powerful as Vim, and it's proprietary.

For development, I use PyCharm, but it's an IDE.

I also used in the past:

gedit: slow
atom: slow
notepad++: Windows only
emacs: too much for my needs
scite: too minimalist
kate: not bad at all
visual studio: resource intensive
eclipse: slow (even if I continue to use it for non-Python coding)
Re: weirdness with list()
On Sun, 28 Feb 2021 at 01:19, Cameron Simpson wrote:
> My object represents an MDAT box in an MP4 file: it is the ludicrously
> large data box containing the raw audiovideo data; for a TV episode it
> is often about 2GB and a movie is often 4GB to 6GB.
> [...]
> That length is presented via the object's __len__ method
> [...]
>
> I noticed that it was stalling, and investigation revealed it was
> stalling at this line:
>
>     subboxes = list(self)
>
> when doing the MDAT box. That box (a) has no subboxes at all and (b) has
> a very large __len__ value.
>
> BUT... It also has a __iter__ value, which like any Box iterates over
> the subboxes. For MDAT that is implemented like this:
>
>     def __iter__(self):
>         yield from ()
>
> What I was expecting was pretty much instant construction of an empty
> list. What I was getting was a very time consuming (10 seconds or more)
> construction of an empty list.

I can't reproduce. Am I missing something?

marco@buzz:~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     def __len__(self):
...         return 1024**3
...     def __iter__(self):
...         yield from ()
...
>>> a = A()
>>> len(a)
1073741824
>>> list(a)
[]
>>>

It takes milliseconds to run list(a).
Why assert is not a function?
I have a curiosity. Python, like many languages, has assert as a keyword. Couldn't it be implemented as a function? Is there an advantage to having it as a keyword?
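One concrete advantage of the statement form, sketched below: the compiler strips assert statements entirely under `python -O`, which a function call could not match (its arguments would always be evaluated). The `check_positive` helper is just an illustration I made up:

```python
def check_positive(x):
    # As a statement, this line disappears completely under "python -O";
    # a hypothetical assert() function could not be compiled away, and
    # its message argument would be built eagerly on every call.
    assert x > 0, f"x must be positive, got {x}"
    return x

print(check_positive(3))  # 3

try:
    check_positive(-1)
except AssertionError as e:
    print(e)  # x must be positive, got -1

# Flip side: the statement syntax invites a classic pitfall, because a
# parenthesized "call" is really asserting a non-empty tuple, which is
# always true (CPython emits a SyntaxWarning for this since 3.8):
# assert (1 == 2, "this never fires")   # would NOT raise
```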
Re: yield from () Was: Re: weirdness with list()
On Mon, 1 Mar 2021 at 19:51, Alan Gauld via Python-list wrote:
> Sorry, a bit OT but I'm curious. I haven't seen
> this before:
>
>     yield from ()
>
> What is it doing?
> What do the () represent in this context?

It's the empty tuple.
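For the record, `yield from` delegates to any iterable, and the empty tuple is simply a cheap iterable that yields nothing — while the mere presence of `yield` still turns the function into a generator function. A tiny sketch:

```python
import inspect

def empty_gen():
    # Delegates to the empty tuple: the generator yields nothing,
    # but the function is still a generator function.
    yield from ()

print(list(empty_gen()))  # []
print(inspect.isgeneratorfunction(empty_gen))  # True

def also_empty():
    # A common alternative spelling with the same effect.
    return
    yield  # unreachable; only here to make this a generator

print(list(also_empty()))  # []
```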
How to create both a c extension and a pure python package
As per the title. Currently I ended up using this trick in my setup.py:

if len(argv) > 1 and argv[1] == "c":
    sys.argv = [sys.argv[0]] + sys.argv[2:]
    setuptools.setup(ext_modules=ext_modules, **common_setup_args)
else:
    setuptools.setup(**common_setup_args)

So if I pass "c" as the first argument of ./setup.py, the C extension is built; otherwise the pure-Python version is packaged. Is there not a better way to do this?
Re: How to create both a c extension and a pure python package
On Wed, 10 Mar 2021 at 16:45, Thomas Jollans wrote:
> Why are you doing this?
>
> If all you want is for it to be possible to install the package from
> source on a system that can't use the C part, you could just declare
> your extension modules optional

Because I want to provide (at least) two wheels: a wheel for Linux users with the C extension compiled, and a generic pure-Python wheel as a fallback for any other architecture.

If I make the extension optional, as far as I know, only one wheel is produced: the wheel with the extension if the build succeeds, or the pure-py wheel otherwise.
Re: Why assert is not a function?
On Thu, 11 Mar 2021 at 23:11, Ethan Furman wrote:
> Basically, you are looking at two different philosophies:
>
> - Always double check, get good error message when something fails
>
> vs
>
> - check during testing and QA, turn off double-checks for production for best
>   performance possible.

In a perfect world, I'd say the second option is the best. But for the majority of projects I contributed to, speed was not a critical issue. On the contrary, it's very hard to get meaningful information about problems in production, so I'm in favour of the first school :)
How to support annotations for a custom type in a C extension?
I created a custom dict in a C extension. Name it `promethea`. How can I implement `promethea[str, str]`? Now I get:

TypeError: 'type' object is not subscriptable
Re: How to support annotations for a custom type in a C extension?
Ooook. I have a question: why is this code not present in dictobject.c? Where are the dict annotations implemented?

On Sat, 18 Sept 2021 at 03:00, MRAB wrote:
>
> On 2021-09-17 21:03, Marco Sulla wrote:
> > I created a custom dict in a C extension. Name it `promethea`. How can
> > I implement `promethea[str, str]`? Now I get:
> >
> > TypeError: 'type' object is not subscriptable
>
> Somewhere you'll have a table of the class's methods. It needs an entry
> like this:
>
> static PyMethodDef customdict_methods[] = {
>     ...
>     {"__class_getitem__", (PyCFunction)Py_GenericAlias, METH_CLASS |
>         METH_O | METH_COEXIST, PyDoc_STR("See PEP 585")},
>     ...
> };
>
> Note the flags: METH_CLASS says that it's a class method and
> METH_COEXIST says that it should use this method instead of the slot.
Python C API: how to mark a type as subclass of another type
I have two types declared as

PyTypeObject PyX_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)

etc.

How can I mark one of the types as a subclass of the other one? I tried to use tp_base but it didn't work.
Re: Python C API: how to mark a type as subclass of another type
I already added the address of the type to tp_base, but it does not work.

On Mon, 1 Nov 2021 at 17:18, Dieter Maurer wrote:
>
> Marco Sulla wrote at 2021-10-31 23:59 +0100:
> >I have two types declared as
> >
> >PyTypeObject PyX_Type = {
> >    PyVarObject_HEAD_INIT(&PyType_Type, 0)
> >
> >etc.
> >
> >How can I mark one of the types as subclass of the other one? I tried
> >to use tp_base but it didn't work.
>
> Read the "Python/C Api" documentation. Watch out for `tp_base`.
Re: Python C API: how to mark a type as subclass of another type
*ahem* Evidently I didn't check the right package. It works like a charm :D

On Tue, 2 Nov 2021 at 13:43, Marco Sulla wrote:
>
> I already added the address of the type to tp_base, but it does not work.
>
> On Mon, 1 Nov 2021 at 17:18, Dieter Maurer wrote:
> >
> > Marco Sulla wrote at 2021-10-31 23:59 +0100:
> > >I have two types declared as
> > >
> > >PyTypeObject PyX_Type = {
> > >    PyVarObject_HEAD_INIT(&PyType_Type, 0)
> > >
> > >etc.
> > >
> > >How can I mark one of the types as subclass of the other one? I tried
> > >to use tp_base but it didn't work.
> >
> > Read the "Python/C Api" documentation. Watch out for `tp_base`.
Py_IS_TYPE(op, &PyDict_Type) does not work on MacOS
As you can read here:

https://github.com/Marco-Sulla/python-frozendict/issues/37

Py_IS_TYPE(op, &PyDict_Type) did not work on MacOS. I had to use PyDict_Check.

Why don't I have this problem on Linux?

PS: since I'm creating a modified version of dict, I copied the dict internal sources and I link against them. Maybe the problem is correlated.
Re: Py_IS_TYPE(op, &PyDict_Type) does not work on MacOS
Indeed now I use PyDict_Check, but anyway it's very strange that Py_IS_TYPE(op, &PyDict_Type) does not work only on MacOS.

On Mon, 8 Nov 2021 at 19:30, Barry wrote:
>
> > On 8 Nov 2021, at 07:45, Marco Sulla wrote:
> >
> > As you can read here:
> >
> > https://github.com/Marco-Sulla/python-frozendict/issues/37
> >
> > Py_IS_TYPE(op, &PyDict_Type) did not work on MacOS. I had to use PyDict_Check.
> >
> > Why don't I have this problem with Linux?
> >
> > PS: since I'm creating a modified version of dict, I copied the dict
> > internal source and I link against them. Maybe the problem is
> > correlated.
>
> You can see what I did for PyCXX at
> https://sourceforge.net/p/cxx/code/HEAD/tree/trunk/CXX/Src/IndirectPythonInterface.cxx
>
> See the _DictCheck and ends up using PyObject_IsInstance.
>
> My guess is that use PyDict_Check is a good and better for the future.
>
> Barry
Unable to compile my C Extension on Windows: unresolved external link errors
I have no problem compiling my extension under Linux and MacOS. Under Windows, I get a lot of:

Error LNK2001: unresolved external symbol PyErr_SetObject

and so on. I post the part of my setup.py about the C extension:

extra_compile_args = ["-DPY_SSIZE_T_CLEAN", "-DPy_BUILD_CORE"]
undef_macros = []

setuptools.Extension(
    ext1_fullname,
    sources = cpython_sources,
    include_dirs = cpython_include_dirs,
    extra_compile_args = extra_compile_args,
    undef_macros = undef_macros,
)

Here is the full code:

https://github.com/Marco-Sulla/python-frozendict/blob/master/setup.py

Steps to reproduce: I installed Python 3.10 and the VS compiler on my Windows 10 machine, then I created a venv, activated it, ran

pip install -U pip setuptools wheel

and then

python setup.py bdist_wheel
Re: Unable to compile my C Extension on Windows: unresolved external link errors
On Fri, 12 Nov 2021 at 15:55, Gisle Vanem wrote:
> Marco Sulla wrote:
> > Error LNK2001: unresolved external symbol PyErr_SetObject
> >
> > and so on.
> >
> > I post the part of my setup.py about the C Extension:
> >
> > extra_compile_args = ["-DPY_SSIZE_T_CLEAN", "-DPy_BUILD_CORE"]
>
> Shouldn't this be "-DPy_BUILD_CORE_MODULE"?

I tried it, but now I get three

error C2099: initializer is not a constant

when I try to compile dictobject.c. Yes, my extension needs dictobject.
Re: Unable to compile my C Extension on Windows: unresolved external link errors
Chris? Maybe I'm dreaming X-D

On Fri, 12 Nov 2021 at 17:38, Chris Angelico wrote:
> Are you sure that you really need Py_BUILD_CORE?

Yes, because I need the internal functions of `dict`. So I need to compile also dictobject.c and include it. So I need that flag.

This is the code:

https://github.com/Marco-Sulla/python-frozendict.git

On Linux and MacOS it works like a charm. On Windows, it seems it does not find python3.lib. I also added its path to the PATH variable.
Re: Unable to compile my C Extension on Windows: unresolved external link errors
On Fri, 12 Nov 2021 at 21:09, Chris Angelico wrote:
>
> On Sat, Nov 13, 2021 at 7:01 AM Marco Sulla wrote:
> > On Fri, 12 Nov 2021 at 17:38, Chris Angelico wrote:
> > > Are you sure that you really need Py_BUILD_CORE?
> >
> > Yes, because I need the internal functions of `dict`. So I need to
> > compile also dictobject.c and include it. So I need that flag.
> >
> > This is the code:
> >
> > https://github.com/Marco-Sulla/python-frozendict.git
>
> Ah, gotcha.
>
> Unfortunately that does mean you're delving deep into internals, and a
> lot of stuff that isn't designed for extensions to use. So my best
> recommendation is: dig even deeper into internals, and duplicate how
> the core is doing things (maybe including another header or
> something). It may be that, by declaring Py_BUILD_CORE, you're getting
> a macro version of that instead of the normal exported function.

I haven't understood what I have to do in practice, but anyway, as I said, Py_BUILD_CORE works on Linux and MacOS. And it works also on Windows: indeed, dictobject.c is compiled. The only problem is the linking phase, when the two objects should be linked into one library, _the_ library. It seems that on Windows it doesn't find python3.lib, even if I put it in the path. So I get the `unresolved external link` errors.
Re: Unable to compile my C Extension on Windows: unresolved external link errors
Sorry, the problem is I downloaded the 32-bit version of the VS compiler and the 64-bit version of Python...

On Sat, 13 Nov 2021 at 11:10, Barry Scott wrote:
>
> > On 13 Nov 2021, at 09:00, Barry wrote:
> >
> >> On 12 Nov 2021, at 22:53, Marco Sulla wrote:
> >>
> >> It seems that on Windows it doesn't find python3.lib,
> >> even if I put it in the path. So I get the `unresolved external link`
> >> errors.
> >
> > I think you need the python310.lib (not sure of file name) to get to the
> > internal symbols.
>
> Another thing that you will need to check is that the symbols you are after
> have been exposed in the DLL at all. Being external in the source is not
> enough: they also have to be listed in the DLL's def file (is that the
> right term?) as well.
>
> If it's not clear yet, you are going to have to read a lot of source code
> and understand the tool chain used on Windows to solve this.
>
> > You can use the objdump(?) utility to check that the symbols are in the
> > lib.
> >
> > Barry
>
> Barry
Re: Unable to compile my C Extension on Windows: unresolved external link errors
Okay, now the problem seems to be another one: I get the same "unresolved external link" errors, but only for internal functions.

This seems quite normal: the public .lib does not expose the internals of Python. The strange fact is: why can I compile it on Linux and MacOS? Do their shared libraries expose the internal functions?

Anyway, is there a way to compile Python on Windows in such a way that I get a shared library that exposes all the functions?

On Sat, 13 Nov 2021 at 12:17, Marco Sulla wrote:
>
> Sorry, the problem is I downloaded the 32-bit version of the VS
> compiler and the 64-bit version of Python...
>
> On Sat, 13 Nov 2021 at 11:10, Barry Scott wrote:
> >
> > > On 13 Nov 2021, at 09:00, Barry wrote:
> > >
> > >> On 12 Nov 2021, at 22:53, Marco Sulla wrote:
> > >>
> > >> It seems that on Windows it doesn't find python3.lib,
> > >> even if I put it in the path. So I get the `unresolved external link`
> > >> errors.
> > >
> > > I think you need the python310.lib (not sure of file name) to get to
> > > the internal symbols.
> >
> > Another thing that you will need to check is that the symbols you are
> > after have been exposed in the DLL at all. Being external in the source
> > is not enough: they also have to be listed in the DLL's def file (is
> > that the right term?) as well.
> >
> > If it's not clear yet, you are going to have to read a lot of source
> > code and understand the tool chain used on Windows to solve this.
> >
> > > You can use the objdump(?) utility to check that the symbols are in
> > > the lib.
> > >
> > > Barry
> >
> > Barry
Re: Unable to compile my C Extension on Windows: unresolved external link errors
On Sun, 14 Nov 2021 at 16:42, Barry Scott wrote:
>
> Sorry, iPad sent the message before it was complete...
>
> > On 14 Nov 2021, at 10:38, Marco Sulla wrote:
> >
> > Okay, now the problem seems to be another: I get the same "unresolved
> > external link" errors, but only for internal functions.
> >
> > This seems quite normal. The public .lib does not expose the internals
> > of Python.
> > The strange fact is: why can I compile it on Linux and MacOS? Their
> > external libraries expose the internal functions?
>
> Windows is not Linux is not macOS.
> The toolchain on each OS has its own strengths, weaknesses and quirks.
>
> On Windows, DLLs only allow access to the symbols that are explicitly
> listed to be accessible.

Where are those symbols listed?

> On macOS .dylib and Unix .so it's being extern that does this. And
> extern is the default.

I understand now.

> Maybe you could copy the code that you want and add it to your code?
> Change any conflicting symbols of course.

It's quite hard. I have to compile dictobject.c, which needs a lot of internal functions. And I suppose that every internal function may require one or more other internal functions.

I have two other solutions:

* compile a whole Python DLL with the symbols I need and link against it. I'd have to ship this DLL with my code, which is ugly.
* drop support of the C extension for Windows users and give them only the slow, pure-py version.

Since my interest in Windows is now near to zero, I think I'll opt for the latter for now.
Re: How to support annotations for a custom type in a C extension?
It works. Thanks a lot.

On Sun, 19 Sept 2021 at 19:23, Serhiy Storchaka wrote:
>
> On 19.09.21 05:59, MRAB wrote:
> > On 2021-09-18 16:09, Serhiy Storchaka wrote:
> >> "(PyCFunction)" is redundant, Py_GenericAlias already has the right
> >> type. Overuse of casting to PyCFunction can hide actual bugs.
> >>
> > I borrowed that from listobject.c, which does have the cast.
>
> Fixed. https://github.com/python/cpython/pull/28450
pytest segfault, not with -v
I have a battery of tests done with pytest. My tests break with a segfault if I run them normally. If I run them using pytest -v, the segfault does not happen.

What could cause this quantum phenomenon?
Re: getting source code line of error?
Have you tried the logging module and its format options?

On Fri, 19 Nov 2021 at 19:09, Ulli Horlacher wrote:
>
> I am trying to get the source code line of the last error.
> I know traceback.format_exc() but this contains much more information, e.g.:
>
> Traceback (most recent call last):
>   File "./error.py", line 18, in main
>     x=1/0
> ZeroDivisionError: division by zero
>
> I could extract the source code line with re.search(), but is there an
> easier way?
>
> I have:
>
> exc_type,exc_str,exc_tb = sys.exc_info()
> fname = exc_tb.tb_frame.f_code.co_filename
> line = exc_tb.tb_lineno
> print('%s in %s line %d' % (exc_str,fname,line))
>
> But I also want to output the line itself, not only its number.
>
> --
> Ullrich Horlacher           Server und Virtualisierung
> Rechenzentrum TIK
> Universitaet Stuttgart      E-Mail: horlac...@tik.uni-stuttgart.de
> Allmandring 30a             Tel: ++49-711-68565868
> 70569 Stuttgart (Germany)   WWW: http://www.tik.uni-stuttgart.de/
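Besides logging, the stdlib traceback module already exposes the source line directly: traceback.extract_tb() returns FrameSummary objects whose .line attribute holds the stripped source text (when the source file is available to linecache). A sketch — `failing` and `last_error_location` are names I made up for illustration:

```python
import sys
import traceback

def failing():
    return 1 / 0  # the line we want to report

def last_error_location():
    # Returns (message, filename, lineno, source_line) of the deepest
    # frame of the exception currently being handled.
    exc_type, exc_value, exc_tb = sys.exc_info()
    frame = traceback.extract_tb(exc_tb)[-1]  # a FrameSummary
    # frame.line is the source text, or None if the source can't be found
    return str(exc_value), frame.filename, frame.lineno, frame.line

try:
    failing()
except ZeroDivisionError:
    msg, fname, lineno, line = last_error_location()
    print(f"{msg} in {fname} line {lineno}: {line}")
```

This avoids the re.search() over format_exc() entirely.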
frozenset can be altered by |=
(venv_3_10) marco@buzz:~$ python Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18) [GCC 10.1.1 20200718] on linux Type "help", "copyright", "credits" or "license" for more information. >>> a = frozenset((3, 4)) >>> a frozenset({3, 4}) >>> a |= {5,} >>> a frozenset({3, 4, 5}) -- https://mail.python.org/mailman/listinfo/python-list
Re: frozenset can be altered by |=
Mh. Now I'm thinking that I've done a = "Marco " a += "Sulla" many times without bothering. On Fri, 19 Nov 2021 at 22:22, Chris Angelico wrote: > > On Sat, Nov 20, 2021 at 8:16 AM Chris Angelico wrote: > > > > On Sat, Nov 20, 2021 at 8:13 AM Marco Sulla > > wrote: > > > > > > (venv_3_10) marco@buzz:~$ python > > > Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18) > > > [GCC 10.1.1 20200718] on linux > > > Type "help", "copyright", "credits" or "license" for more information. > > > >>> a = frozenset((3, 4)) > > > >>> a > > > frozenset({3, 4}) > > > >>> a |= {5,} > > > >>> a > > > frozenset({3, 4, 5}) > > > > That's the same as how "x = 4; x += 1" can "alter" four into five. > > > > >>> a = frozenset((3, 4)) > > >>> id(a), a > > (140545764976096, frozenset({3, 4})) > > >>> a |= {5,} > > >>> id(a), a > > (140545763014944, frozenset({3, 4, 5})) > > > > It's a different frozenset. > > > > Oh, even better test: > > >>> a = frozenset((3, 4)); b = a > >>> id(a), a, id(b), b > (140602825123296, frozenset({3, 4}), 140602825123296, frozenset({3, 4})) > >>> a |= {5,} > >>> id(a), a, id(b), b > (140602825254144, frozenset({3, 4, 5}), 140602825123296, frozenset({3, 4})) > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
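The id() experiment in the reply above boils down to the usual augmented-assignment rule: with no __ior__ defined, a |= b expands to a = a | b and rebinds the name to a new object. A compact restatement:

```python
a = frozenset({3, 4})
b = a                 # a second name bound to the same object
a |= {5}              # no __ior__ on frozenset, so this runs a = a | {5}
assert a == frozenset({3, 4, 5})   # a is now a NEW frozenset...
assert b == frozenset({3, 4})      # ...and the original is untouched
assert a is not b
```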
Re: pytest segfault, not with -v
On Fri, 19 Nov 2021 at 20:38, MRAB wrote: > > On 2021-11-19 17:48, Marco Sulla wrote: > > I have a battery of tests done with pytest. My tests break with a > > segfault if I run them normally. If I run them using pytest -v, the > > segfault does not happen. > > > > What could cause this quantical phenomenon? > > > Are you testing an extension that you're compiling? That kind of problem > can occur if there's an uninitialised variable or incorrect reference > counting (Py_INCREF/Py_DECREF). Ok, I know. But why can't it be reproduced if I do pytest -v? This way I don't know which test fails. Furthermore I noticed that if I remove the __pycache__ dir of tests, pytest does not crash, until I re-run it with the __pycache__ dir present. This makes it very hard for me to understand what caused the segfault. I'm starting to think pytest is not good for testing C extensions. -- https://mail.python.org/mailman/listinfo/python-list
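MRAB's loop suggestion can be automated: run each collected test in a fresh interpreter, so one crash cannot hide the others and the exit status (128 + signal number, 139 for SIGSEGV) names the culprit. Running with faulthandler enabled also makes CPython print the Python traceback on a segfault. A hedged sketch, assuming pytest's usual file.py::test id format:

```shell
#!/bin/sh
# Report tests that die from a signal: exit status is 128 + signum,
# e.g. 139 = 128 + 11 (SIGSEGV). Running each test in its own
# interpreter keeps one crash from masking the others.
for t in $(python -m pytest --collect-only -q 2>/dev/null | grep '::' || true); do
    status=0
    python -X faulthandler -m pytest -x "$t" >/dev/null 2>&1 || status=$?
    if [ "$status" -ge 128 ]; then
        echo "crashed ($status): $t"
    fi
done
```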
Re: pytest segfault, not with -v
Indeed I have introduced a command line parameter in my bench.py script that simply specifies the number of times the benchmarks are performed. This way I have a sort of segfault checker. But I don't bench any part of the library. I suppose I have to create a separate script that does a simple loop for all the cases, and remove the optional parameter from bench. How boring. PS: is there a way to monitor the Python consumed memory inside Python itself? In this way I could also trap memory leaks. On Sat, 20 Nov 2021 at 01:46, MRAB wrote: > > On 2021-11-19 23:44, Marco Sulla wrote: > > On Fri, 19 Nov 2021 at 20:38, MRAB wrote: > >> > >> On 2021-11-19 17:48, Marco Sulla wrote: > >> > I have a battery of tests done with pytest. My tests break with a > >> > segfault if I run them normally. If I run them using pytest -v, the > >> > segfault does not happen. > >> > > >> > What could cause this quantical phenomenon? > >> > > >> Are you testing an extension that you're compiling? That kind of problem > >> can occur if there's an uninitialised variable or incorrect reference > >> counting (Py_INCREF/Py_DECREF). > > > > Ok, I know. But why can't it be reproduced if I do pytest -v? This way > > I don't know which test fails. > > Furthermore I noticed that if I remove the __pycache__ dir of tests, > > pytest does not crash, until I re-ran it with the __pycache__ dir > > present. > > This way is very hard for me to understand what caused the segfault. > > I'm starting to think pytest is not good for testing C extensions. > > > If there are too few Py_INCREF or too many Py_DECREF, it'll free the > object too soon, and whether or when that will cause a segfault will > depend on whatever other code is running. That's the nature of the > beast: it's unpredictable! > > You could try running each of the tests in a loop to see which one > causes a segfault. (Trying several in a loop will let you narrow it down > more quickly.) > > pytest et al. 
are good for testing behaviour, but not for narrowing down > segfaults. > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
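On the PS about monitoring memory from inside Python itself: the stdlib tracemalloc module (since 3.4) traces allocations made through Python's allocator — including C code that allocates via PyObject_Malloc/PyMem_Malloc, though not plain malloc — so a "current" figure that keeps growing across benchmark iterations is a leak signal. A minimal sketch:

```python
import tracemalloc

tracemalloc.start()

before = tracemalloc.take_snapshot()
data = [bytes(1000) for _ in range(1000)]   # allocate roughly 1 MB
after = tracemalloc.take_snapshot()

# Top allocation sites since the first snapshot
for stat in after.compare_to(before, 'lineno')[:3]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()
print('current=%d bytes, peak=%d bytes' % (current, peak))
```

Calling get_traced_memory() at the end of each benchmark iteration and checking that the figure stays flat is a cheap in-process leak trap.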
No right operator in tp_as_number?
I checked the documentation: https://docs.python.org/3/c-api/typeobj.html#number-structs and it seems that, in the Python C API, the right operators do not exist. For example, there is nb_add, which in Python is __add__, but there's no nb_right_add, which in Python would be __radd__. Am I missing something? -- https://mail.python.org/mailman/listinfo/python-list
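(For the record: no nb_radd slot is needed. The interpreter calls the left operand's nb_add and, if that returns Py_NotImplemented, the right operand's nb_add with the operands in the same order — so a single slot serves both sides and must check which argument is "self". The same protocol is visible at the Python level:)

```python
class Meters:
    """Toy type illustrating the protocol a single nb_add slot relies on."""
    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        # Return NotImplemented to let the interpreter try the
        # other operand's slot instead of raising TypeError.
        if isinstance(other, (int, float)):
            return Meters(self.n + other)
        return NotImplemented

    # One function serves both sides, like a shared nb_add does
    # (fine here because the operation is commutative).
    __radd__ = __add__

assert (Meters(2) + 3).n == 5
assert (3 + Meters(2)).n == 5
```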
Re: pytest segfault, not with -v
I know how to check the refcounts, but I don't know how to check the memory usage, since it's not a program, it's a simple library. Is there not a way to check inside Python the memory usage? I have to use a bash script (I'm on Linux)? On Sat, 20 Nov 2021 at 19:00, MRAB wrote: > > On 2021-11-20 17:40, Marco Sulla wrote: > > Indeed I have introduced a command line parameter in my bench.py > > script that simply specifies the number of times the benchmarks are > > performed. This way I have a sort of segfault checker. > > > > But I don't bench any part of the library. I suppose I have to create > > a separate script that does a simple loop for all the cases, and > > remove the optional parameter from bench. How boring. > > PS: is there a way to monitor the Python consumed memory inside Python > > itself? In this way I could also trap memory leaks. > > > I'm on Windows 10, so I debug in Microsoft Visual Studio. I also have a > look at the memory usage in Task Manager. If the program uses more > memory when there are more iterations, then that's a sign of a memory > leak. For some objects I'd look at the reference count to see if it's > increasing or decreasing for each iteration when it should be constant > over time. > > > On Sat, 20 Nov 2021 at 01:46, MRAB wrote: > >> > >> On 2021-11-19 23:44, Marco Sulla wrote: > >> > On Fri, 19 Nov 2021 at 20:38, MRAB wrote: > >> >> > >> >> On 2021-11-19 17:48, Marco Sulla wrote: > >> >> > I have a battery of tests done with pytest. My tests break with a > >> >> > segfault if I run them normally. If I run them using pytest -v, the > >> >> > segfault does not happen. > >> >> > > >> >> > What could cause this quantical phenomenon? > >> >> > > >> >> Are you testing an extension that you're compiling? That kind of problem > >> >> can occur if there's an uninitialised variable or incorrect reference > >> >> counting (Py_INCREF/Py_DECREF). > >> > > >> > Ok, I know. But why can't it be reproduced if I do pytest -v? 
This way > >> > I don't know which test fails. > >> > Furthermore I noticed that if I remove the __pycache__ dir of tests, > >> > pytest does not crash, until I re-ran it with the __pycache__ dir > >> > present. > >> > This way is very hard for me to understand what caused the segfault. > >> > I'm starting to think pytest is not good for testing C extensions. > >> > > >> If there are too few Py_INCREF or too many Py_DECREF, it'll free the > >> object too soon, and whether or when that will cause a segfault will > >> depend on whatever other code is running. That's the nature of the > >> beast: it's unpredictable! > >> > >> You could try running each of the tests in a loop to see which one > >> causes a segfault. (Trying several in a loop will let you narrow it down > >> more quickly.) > >> > >> pytest et al. are good for testing behaviour, but not for narrowing down > >> segfaults. > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: frozenset can be altered by |=
Yes, and you do this regularly. Indeed integers, for example, are immutables and a = 0 a += 1 is something you do dozens of times, and you simply don't think that another object is created and substituted for the variable named `a`. On Mon, 22 Nov 2021 at 14:59, Chris Angelico wrote: > > On Tue, Nov 23, 2021 at 12:52 AM David Raymond > wrote: > > It is a little confusing since the docs list this in a section that says > > they don't apply to frozensets, and lists the two versions next to each > > other as the same thing. > > > > https://docs.python.org/3.9/library/stdtypes.html#set-types-set-frozenset > > > > The following table lists operations available for set that do not apply to > > immutable instances of frozenset: > > > > update(*others) > > set |= other | ... > > > > Update the set, adding elements from all others. > > Yeah, it's a little confusing, but at the language level, something > that doesn't support |= will implicitly support it using the expanded > version: > > a |= b > a = a | b > > and in the section above, you can see that frozensets DO support the > Or operator. > > By not having specific behaviour on the |= operator, frozensets > implicitly fall back on this default. > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: frozenset can be altered by |=
I must say that I'm reading the documentation now, and it's a bit confusing. In the docs, inplace operators as |= should not work. They are listed under the set-only functions and operators. But, as we saw, this is not completely true: they work but they don't mutate the original object. The same for += and *= that are listed under `list` only. On Mon, 22 Nov 2021 at 19:54, Marco Sulla wrote: > > Yes, and you do this regularly. Indeed integers, for example, are immutables > and > > a = 0 > a += 1 > > is something you do dozens of times, and you simply don't think that > another object is created and substituted for the variable named `a`. > > On Mon, 22 Nov 2021 at 14:59, Chris Angelico wrote: > > > > On Tue, Nov 23, 2021 at 12:52 AM David Raymond > > wrote: > > > It is a little confusing since the docs list this in a section that says > > > they don't apply to frozensets, and lists the two versions next to each > > > other as the same thing. > > > > > > https://docs.python.org/3.9/library/stdtypes.html#set-types-set-frozenset > > > > > > The following table lists operations available for set that do not apply > > > to immutable instances of frozenset: > > > > > > update(*others) > > > set |= other | ... > > > > > > Update the set, adding elements from all others. > > > > Yeah, it's a little confusing, but at the language level, something > > that doesn't support |= will implicitly support it using the expanded > > version: > > > > a |= b > > a = a | b > > > > and in the section above, you can see that frozensets DO support the > > Or operator. > > > > By not having specific behaviour on the |= operator, frozensets > > implicitly fall back on this default. > > > > ChrisA > > -- > > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: pytest segfault, not with -v
Ok, I created the script: https://github.com/Marco-Sulla/python-frozendict/blob/master/test/debug.py The problem is it does _not_ crash, while a get a segfault using pytest with python 3.9 on MacOS 10.15 Maybe it's because I'm using eval / exec in the script? On Sat, 20 Nov 2021 at 18:40, Marco Sulla wrote: > > Indeed I have introduced a command line parameter in my bench.py > script that simply specifies the number of times the benchmarks are > performed. This way I have a sort of segfault checker. > > But I don't bench any part of the library. I suppose I have to create > a separate script that does a simple loop for all the cases, and > remove the optional parameter from bench. How boring. > PS: is there a way to monitor the Python consumed memory inside Python > itself? In this way I could also trap memory leaks. > > On Sat, 20 Nov 2021 at 01:46, MRAB wrote: > > > > On 2021-11-19 23:44, Marco Sulla wrote: > > > On Fri, 19 Nov 2021 at 20:38, MRAB wrote: > > >> > > >> On 2021-11-19 17:48, Marco Sulla wrote: > > >> > I have a battery of tests done with pytest. My tests break with a > > >> > segfault if I run them normally. If I run them using pytest -v, the > > >> > segfault does not happen. > > >> > > > >> > What could cause this quantical phenomenon? > > >> > > > >> Are you testing an extension that you're compiling? That kind of problem > > >> can occur if there's an uninitialised variable or incorrect reference > > >> counting (Py_INCREF/Py_DECREF). > > > > > > Ok, I know. But why can't it be reproduced if I do pytest -v? This way > > > I don't know which test fails. > > > Furthermore I noticed that if I remove the __pycache__ dir of tests, > > > pytest does not crash, until I re-ran it with the __pycache__ dir > > > present. > > > This way is very hard for me to understand what caused the segfault. > > > I'm starting to think pytest is not good for testing C extensions. 
> > > > > If there are too few Py_INCREF or too many Py_DECREF, it'll free the > > object too soon, and whether or when that will cause a segfault will > > depend on whatever other code is running. That's the nature of the > > beast: it's unpredictable! > > > > You could try running each of the tests in a loop to see which one > > causes a segfault. (Trying several in a loop will let you narrow it down > > more quickly.) > > > > pytest et al. are good for testing behaviour, but not for narrowing down > > segfaults. > > -- > > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: pytest segfault, not with -v
Emh, maybe I was not clear. I created a C extension and it segfaults. So I created that script to see where it segfaults. But the script does not segfault. My doubt is: is that because I'm using eval and exec in the script? On Sat, 18 Dec 2021 at 18:33, Dieter Maurer wrote: > > Marco Sulla wrote at 2021-12-18 14:10 +0100: > >Ok, I created the script: > > > >https://github.com/Marco-Sulla/python-frozendict/blob/master/test/debug.py > > > >The problem is it does _not_ crash, while a get a segfault using > >pytest with python 3.9 on MacOS 10.15 > > > >Maybe it's because I'm using eval / exec in the script? > > Segfaults can result from C stack overflow which in turn can > be caused in special cases by too deeply nested function calls > (usually, Python's "maximal recursion depth exceeded" prevents > this before a C stack overflow). > > Otherwise, whatever you do in Python (this includes "eval/exec") > should not cause a segfault. The cause for it likely comes from > a memory management bug in some C implemented part of your > application. > > Note that memory management bugs may not show deterministic > behavior. Minor changes (such as "with/without -v") > can significantly change the outcome. -- https://mail.python.org/mailman/listinfo/python-list
Py_TRASHCAN_SAFE_BEGIN/END in C extension?
In Python 3.7, must Py_TRASHCAN_SAFE_BEGIN - Py_TRASHCAN_SAFE_END be used in a C extension? I'm asking because in my C extension I use them in the deallocator without problems, but users reported that they get a segfault in Python 3.7 on Debian 10. I checked and this is true. -- https://mail.python.org/mailman/listinfo/python-list
Re: Py_TRASHCAN_SAFE_BEGIN/END in C extension?
Yes, it's deprecated, but I need it for Python 3.7, since there was yet no Py_TRASHCAN_BEGIN / END On Tue, 21 Dec 2021 at 23:22, Barry wrote: > > > > On 21 Dec 2021, at 22:08, Marco Sulla wrote: > > In Python 3.7, must Py_TRASHCAN_SAFE_BEGIN - Py_TRASHCAN_SAFE_END be > used in a C extension? > > I'm asking because in my C extension I use them in the deallocator > without problems, but users signalled me that they segfault in Python > 3.7 on Debian 10. I checked and this is true. > > > I searched the web for Py_TRASHCAN_SAFE_BEGIN > And that quickly lead me to this bug. > > https://bugs.python.org/issue40608. > > That gives lots of clues for what might be the problem. > It seems that is a deprecated api. > > Barry > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: recover pickled data: pickle data was truncated
Use a semaphore. On Sun, 26 Dec 2021 at 03:30, iMath wrote: > > Normally, the shelve data should be read and write by only one process at a > time, but unfortunately it was simultaneously read and write by two > processes, thus corrupted it. Is there any way to recover all data in it ? > Currently I just get "pickle data was truncated" exception after reading a > portion of the data? > > Data and code here > :https://drive.google.com/file/d/137nJFc1TvOge88EjzhnFX9bXg6vd0RYQ/view?usp=sharing > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
What's the public API alternative to _PyObject_GC_IS_TRACKED()?
I have to use _PyObject_GC_IS_TRACKED(). It can't be used unless you define Py_BUILD_CORE. I want to avoid this. What macro or function can I use instead? -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
I need it since I'm developing an immutable dict. And in dict that function is used. I do not understand why there's no public API for that function. It seems very useful. On Sun, 26 Dec 2021 at 17:28, Barry Scott wrote: > > > > > On 26 Dec 2021, at 13:48, Marco Sulla wrote: > > > > I have to use _PyObject_GC_IS_TRACKED(). It can't be used unless you > > define Py_BUILD_CORE. I want to avoid this. What macro or function can > > substitute it? > > Why is this needed by your code? Surely the GC does its thing as an > implementation detail of python. > > Barry > > > > -- > > https://mail.python.org/mailman/listinfo/python-list > > > -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
Hi, Inada Senpai. So I do not need PyObject_GC_Track on cloning or merging, or MAINTAIN_TRACKING on insert? On Tue, 28 Dec 2021 at 07:58, Inada Naoki wrote: > > On Tue, Dec 28, 2021 at 3:31 AM Marco Sulla > wrote: > > > > I need it since I'm developing an immutable dict. And in dict that > > function is used. > > > > I do not understand why there's no public API for that function. It > > seems very useful. > > > > I think it is useful only for optimization based on *current* Python > internals. > That's why it is not a public API. If we expose it as public API, it > makes harder to change Python's GC internals. > > > -- > Inada Naoki -- https://mail.python.org/mailman/listinfo/python-list
Option for venv to upgrade pip automatically?
I find it tedious that, after creating a venv, you immediately have to do every time: pip install -U pip Can't venv have an option for doing this automatically or, better, a config file where you can put commands that will be run every time after you create a venv? -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Tue, 28 Dec 2021 at 12:38, Inada Naoki wrote: > Your case is special. > You want to create a frozendict which performance is same to builtin dict. > Builtin dict has special optimization which tightly coupled with > current CPython implementation. > So you need to use private APIs for MAINTAIN_TRACKING. I solved this problem with a hacky trick: I included a reduced and slightly modified version of dictobject.c. Furthermore I copy / pasted stringlib\eq.h and _Py_bit_length. I'm currently doing this in a refactor branch. (Yes, I know that including a .c is very bad... but I need to do this to separate the code of dict from the code of frozendict. Putting all in the same files mess my head) > But PyObject_GC_Track() is a public API. The problem is I can't invoke PyObject_GC_Track() on an already tracked object. I tried it and Python segfaulted. That's why CPython uses _PyObject_GC_IS_TRACKED() before. I'll try to copy/paste it too... :D but I do not understand why there's not a public version of it. -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 00:03, Dieter Maurer wrote: > Why do you not derive from `dict` and override its mutating methods > (to raise a type error after initialization is complete)? I've done this for the pure py version, for speed. But in this way, frozendict results to be a subclass of MutableMapping. -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 07:46, Inada Naoki wrote: > You are right. I thought PyObject_GC_Track() can be used to tracked > objects because PyObject_GC_Untrack() can be used untracked object. > I think there is no enough reason for this asymmetry. > > Additionally, adding PyObject_GC_IsTracked() to public API will not > bother future Python improvements. > If Python changed its GC to mark-and-sweep, PyObject_GC_IsTracked() > can return true always. I think you are right :) -- https://mail.python.org/mailman/listinfo/python-list
Re: Option for venv to upgrade pip automatically?
Cool, thanks! On Wed, 29 Dec 2021 at 07:10, Inada Naoki wrote: > > You can use --upgrade-deps option. My alias is: > > alias mkvenv='python3 -m venv --upgrade-deps --prompt . venv' > > On Wed, Dec 29, 2021 at 4:55 AM Marco Sulla > wrote: > > > > I think it's very boring that, after creating a venv, you have > > immediately to do every time: > > > > pip install -U pip > > > > Can't venv have an option for doing this automatically or, better, a > > config file where you can put commands that will be launched every > > time after you create a venv? > > -- > > https://mail.python.org/mailman/listinfo/python-list > > > > -- > Inada Naoki -- https://mail.python.org/mailman/listinfo/python-list
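Inada's alias spelled out — note that --upgrade-deps exists since Python 3.9, so on older interpreters the explicit pip install -U pip after activation is still needed (the alias name "mkvenv" is just his suggestion, not a standard command):

```shell
# An alias for ~/.bashrc: create a venv with pip and setuptools
# upgraded at creation time (flag added in Python 3.9).
alias mkvenv='python3 -m venv --upgrade-deps --prompt . venv'

# Which expands to:
#   python3 -m venv --upgrade-deps --prompt . venv
#   . venv/bin/activate

# Confirm the flag is available in your interpreter:
python3 -m venv --help | grep -- --upgrade-deps
```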
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 09:12, Dieter Maurer wrote: > > Marco Sulla wrote at 2021-12-29 08:08 +0100: > >On Wed, 29 Dec 2021 at 00:03, Dieter Maurer wrote: > >> Why do you not derive from `dict` and override its mutating methods > >> (to raise a type error after initialization is complete)? > > > >I've done this for the pure py version, for speed. But in this way, > >frozendict results to be a subclass of MutableMapping. > > `MutableMapping` is a so called abstract base class (--> `abc`). > > It uses the `__subclass_check__` (and `__instance_check__`) of > `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`. > Those can be customized by overriding `MutableMapping.__subclasshook__` > to ensure that your `frozendict` class (and their subclasses) > are not considered subclasses of `MutableMapping`. Emh. Too hacky for me too, sorry :D -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On second thought, I think I'll do this for the pure py version. But I will definitely not do this for the C extension, since it's anyway strange that an immutable mapping inherits from a mutable one! I've done it in the pure py version only for a matter of speed. On Wed, 29 Dec 2021 at 09:24, Marco Sulla wrote: > > On Wed, 29 Dec 2021 at 09:12, Dieter Maurer wrote: > > > > Marco Sulla wrote at 2021-12-29 08:08 +0100: > > >On Wed, 29 Dec 2021 at 00:03, Dieter Maurer wrote: > > >> Why do you not derive from `dict` and override its mutating methods > > >> (to raise a type error after initialization is complete)? > > > > > >I've done this for the pure py version, for speed. But in this way, > > >frozendict results to be a subclass of MutableMapping. > > > > `MutableMapping` is a so called abstract base class (--> `abc`). > > > > It uses the `__subclass_check__` (and `__instance_check__`) of > > `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`. > > Those can be customized by overriding `MutableMapping.__subclasshook__` > > to ensure that your `frozendict` class (and their subclasses) > > are not considered subclasses of `MutableMapping`. > > Emh. Too hacky for me too, sorry :D -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 10:06, Dieter Maurer wrote: > > Are you sure you need to implement your type in C at all? It's already implemented, and, in some cases, is faster than dict: https://github.com/Marco-Sulla/python-frozendict#benchmarks PS: I'm doing a refactoring that speeds up creation even further, making it almost as fast as dict. -- https://mail.python.org/mailman/listinfo/python-list
How to implement freelists in dict 3.10 for previous versions?
I noticed that the freelists in dict now use _Py_dict_state. I suppose this is done for thread safety. I would like to implement the same in a C extension that supports CPython < 3.10. How can I achieve this? -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 09:12, Dieter Maurer wrote: > `MutableMapping` is a so called abstract base class (--> `abc`). > > It uses the `__subclass_check__` (and `__instance_check__`) of > `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`. > Those can be customized by overriding `MutableMapping.__subclasshook__` > to ensure that your `frozendict` class (and their subclasses) > are not considered subclasses of `MutableMapping`. It does not work: $ python Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18) [GCC 10.1.1 20200718] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import frozendict >>> frozendict.c_ext False >>> from frozendict import frozendict as fd >>> from collections.abc import MutableMapping as Mm >>> issubclass(fd, Mm) True >>> @classmethod ... def _my_subclasshook(klass, subclass): ... if subclass == fd: ... return False ... return NotImplemented ... >>> @classmethod ... def _my_subclasshook(klass, subclass): ... print(subclass) ... if subclass == fd: ... return False ... return NotImplemented ... >>> Mm.__subclasshook__ = _my_subclasshook >>> issubclass(fd, Mm) True >>> issubclass(tuple, Mm) False >>> -- https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 12:11, Dieter Maurer wrote: > > Marco Sulla wrote at 2021-12-29 11:59 +0100: > >On Wed, 29 Dec 2021 at 09:12, Dieter Maurer wrote: > >> `MutableMapping` is a so called abstract base class (--> `abc`). > >> > >> It uses the `__subclass_check__` (and `__instance_check__`) of > >> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`. > >> Those can be customized by overriding `MutableMapping.__subclasshook__` > >> to ensure that your `frozendict` class (and their subclasses) > >> are not considered subclasses of `MutableMapping`. > > > >It does not work: > > ... > >>>> issubclass(fd, Mm) > >True > > There is a cache involved. The `issubclass` above, > brings your `fd` in the `Mn`'s subclass cache. It works, thank you! I had to put it before Mapping.register(frozendict) -- https://mail.python.org/mailman/listinfo/python-list
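To record the working recipe in one place: the hook must be installed before any issubclass() call and before register(), because ABCMeta caches answers and an early True sticks. A sketch with a toy dict subclass standing in for the real C type:

```python
from collections.abc import Mapping, MutableMapping

class frozendict(dict):
    """Toy stand-in for the real C-extension type."""

# Install the hook BEFORE any issubclass() call and before
# register(): ABCMeta caches answers, so an early True sticks.
@classmethod
def _subclasshook(cls, subclass):
    if cls is MutableMapping and issubclass(subclass, frozendict):
        return False          # frozendict is NOT a MutableMapping
    return NotImplemented     # fall back to the normal ABC logic

MutableMapping.__subclasshook__ = _subclasshook

Mapping.register(frozendict)  # after the hook, as noted above

assert issubclass(frozendict, Mapping)
assert not issubclass(frozendict, MutableMapping)
assert issubclass(dict, MutableMapping)   # regular dicts unaffected
```

A non-NotImplemented return from __subclasshook__ short-circuits the registry lookup, which is why the hook wins even though frozendict inherits from the registered dict.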
Re: recover pickled data: pickle data was truncated
On Wed, 29 Dec 2021 at 18:33, iMath wrote: > But I found the size of the file of the shelve data didn't change much, so I > guess the data are still in it , I just wonder any way to recover my data. I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling it by hand is a harsh work and maybe unreliable. Is there any reason you can't simply add a semaphore to avoid writing at the same time and re-run the code and regenerate the data? -- https://mail.python.org/mailman/listinfo/python-list
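On "simply add a semaphore": since the two writers here are separate processes, a threading primitive won't help — an inter-process file lock is needed. A POSIX-only sketch using fcntl.flock (on Windows, msvcrt.locking is the rough equivalent); the lock-file name is illustrative:

```python
import contextlib
import fcntl   # POSIX only
import os
import tempfile

@contextlib.contextmanager
def file_lock(path):
    # fcntl.flock gives an advisory lock shared between processes;
    # every process that touches the data must take the same lock.
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until the lock is free
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)

# Wrap every shelve access in the lock, e.g.:
# with file_lock("mydata.lock"):
#     with shelve.open("mydata") as db:
#         db["key"] = "value"

lock_path = os.path.join(tempfile.gettempdir(), 'demo.lock')
with file_lock(lock_path):
    pass   # exclusive access inside this block
```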
Re: builtins.TypeError: catching classes that do not inherit from BaseException is not allowed
It was already done: https://pypi.org/project/tail-recursive/ On Thu, 30 Dec 2021 at 16:00, hongy...@gmail.com wrote: > > I try to compute the factorial of a large number with tail-recursion > optimization decorator in Python3. The following code snippet is converted > from the code snippet given here [1] by the following steps: > > $ pyenv shell datasci > $ python --version > Python 3.9.1 > $ pip install 2to3 > $ 2to3 -w this-script.py > > ``` > # This program shows off a python decorator( > # which implements tail call optimization. It > # does this by throwing an exception if it is > # its own grandparent, and catching such > # exceptions to recall the stack. > > import sys > > class TailRecurseException: > def __init__(self, args, kwargs): > self.args = args > self.kwargs = kwargs > > def tail_call_optimized(g): > """ > This function decorates a function with tail call > optimization. It does this by throwing an exception > if it is its own grandparent, and catching such > exceptions to fake the tail call optimization. > > This function fails if the decorated > function recurses in a non-tail context. > """ > def func(*args, **kwargs): > f = sys._getframe() > if f.f_back and f.f_back.f_back \ > and f.f_back.f_back.f_code == f.f_code: > raise TailRecurseException(args, kwargs) > else: > while 1: > try: > return g(*args, **kwargs) > except TailRecurseException as e: > args = e.args > kwargs = e.kwargs > func.__doc__ = g.__doc__ > return func > > @tail_call_optimized > def factorial(n, acc=1): > "calculate a factorial" > if n == 0: > return acc > return factorial(n-1, n*acc) > > print(factorial(1)) > # prints a big, big number, > # but doesn't hit the recursion limit. > > @tail_call_optimized > def fib(i, current = 0, next = 1): > if i == 0: > return current > else: > return fib(i - 1, next, current + next) > > print(fib(1)) > # also prints a big number, > # but doesn't hit the recursion limit. 
> ``` > However, when I try to test the above script, the following error will be > triggered: > ``` > $ python this-script.py > Traceback (most recent call last): > File "/home/werner/this-script.py", line 32, in func > return g(*args, **kwargs) > File "/home/werner/this-script.py", line 44, in factorial > return factorial(n-1, n*acc) > File "/home/werner/this-script.py", line 28, in func > raise TailRecurseException(args, kwargs) > TypeError: exceptions must derive from BaseException > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File "/home/werner/this-script.py", line 46, in > print(factorial(1)) > File "/home/werner/this-script.py", line 33, in func > except TailRecurseException as e: > TypeError: catching classes that do not inherit from BaseException is not > allowed > ``` > > Any hints for fixing this problem will be highly appreciated. > > [1] https://stackoverflow.com/q/27417874 > > Regards, > HZ > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
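For completeness, the direct fix for the traceback above: in Python 3 anything raised must derive from BaseException, and the 2to3 conversion did not add a base class. Deriving from BaseException rather than Exception also keeps a broad "except Exception:" in decorated code from swallowing the control-flow exception. A corrected sketch of the same recipe:

```python
import sys

class TailRecurseException(BaseException):
    # Python 3 only allows raising BaseException subclasses; using
    # BaseException (not Exception) keeps a stray "except Exception:"
    # from swallowing this control-flow exception.
    def __init__(self, args, kwargs):
        self.args = args
        self.kwargs = kwargs

def tail_call_optimized(g):
    """Decorator faking tail-call optimization via exceptions."""
    def func(*args, **kwargs):
        f = sys._getframe()
        # If our grandparent frame runs this same code object, we are
        # in a tail call: unwind back to the trampoline loop below.
        if f.f_back and f.f_back.f_back \
                and f.f_back.f_back.f_code == f.f_code:
            raise TailRecurseException(args, kwargs)
        while True:
            try:
                return g(*args, **kwargs)
            except TailRecurseException as e:
                args, kwargs = e.args, e.kwargs
    func.__doc__ = g.__doc__
    return func

@tail_call_optimized
def factorial(n, acc=1):
    "calculate a factorial"
    return acc if n == 0 else factorial(n - 1, n * acc)

assert factorial(10) == 3628800
assert factorial(2000) > 0   # no RecursionError despite the depth
```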
Re: recover pickled data: pickle data was truncated
I agree with Barry. You can create a folder or a file with pseudo-random names. I recommend you to use str(uuid.uuid4()) On Sat, 1 Jan 2022 at 14:11, Barry wrote: > > > > > On 31 Dec 2021, at 17:53, iMath wrote: > > > > 在 2021年12月30日星期四 UTC+8 03:13:21, 写道: > >>> On Wed, 29 Dec 2021 at 18:33, iMath wrote: > >>> But I found the size of the file of the shelve data didn't change much, > >>> so I guess the data are still in it , I just wonder any way to recover my > >>> data. > >> I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling > >> it by hand is a harsh work and maybe unreliable. > >> > >> Is there any reason you can't simply add a semaphore to avoid writing > >> at the same time and re-run the code and regenerate the data? > > > > Thanks for your replies! I didn't have a sense of adding a semaphore on > > writing to pickle data before, so corrupted the data. > > Since my data was colleted in the daily usage, so cannot re-run the code > > and regenerate the data. > > In order to avoid corrupting my data again and the complicity of using a > > semaphore, now I am using json text to store my data. > > That will not fix the problem. You will end up with corrupt json. > > If you have one writer and one read then may be you can use the fact that a > rename is atomic. > > Writer does this: > 1. Creat new json file in the same folder but with a tmp name > 2. Rename the file from its tmp name to the public name. > > The read will just read the public name. > > I am not sure what happens in your world if the writer runs a second time > before the data is read. > > In that case you need to create a queue of files to be read. > > But if the problem is two process racing against each other you MUST use > locking. > It cannot be avoided for robust operations. > > Barry > > > > -- > > https://mail.python.org/mailman/listinfo/python-list > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
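Barry's write-then-rename recipe combined with the uuid suggestion might look like this — os.replace is atomic within one filesystem on POSIX, and on Windows it also overwrites an existing target:

```python
import json
import os
import tempfile
import uuid

def atomic_write_json(path, data):
    # Write to a uniquely named temp file in the SAME directory
    # (rename is only atomic within one filesystem), then swap it
    # in: a reader sees either the old file or the new one, never
    # a half-written one.
    tmp = '%s.%s.tmp' % (path, uuid.uuid4())
    with open(tmp, 'w') as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())   # push the bytes to disk before the rename
    os.replace(tmp, path)

path = os.path.join(tempfile.gettempdir(), 'state.json')
atomic_write_json(path, {'count': 1})
with open(path) as f:
    assert json.load(f) == {'count': 1}
```

This protects readers against torn writes; it does not replace locking when two writers can race, as noted above.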
How to make a type of a C extension compatible with mypy
I created a type in a C extension that is an immutable dict. If I do:

a: mydict[str, str]

it works. But it doesn't work with mypy, as a user signalled to me:
https://github.com/Marco-Sulla/python-frozendict/issues/39

How can I make it work? I don't know what he means by annotating methods, and furthermore I suppose I can't do that in C.

--
https://mail.python.org/mailman/listinfo/python-list
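There are two separate halves to this. At runtime, subscription like mydict[str, str] works when the type has a __class_getitem__; for a C extension this is normally provided by installing Py_GenericAlias as the __class_getitem__ method. mypy, however, never executes the C code, so it additionally needs a stub (.pyi) file describing the type. A pure-Python sketch of the runtime half (the class name is illustrative, not frozendict's real API):

```python
import types

class mydict(dict):
    """Stand-in for an immutable-dict extension type."""
    # This mirrors what Py_GenericAlias gives a C type: subscription
    # returns a types.GenericAlias instead of raising TypeError.
    __class_getitem__ = classmethod(types.GenericAlias)

alias = mydict[str, str]
print(type(alias).__name__)  # → GenericAlias

a: mydict[str, str] = mydict(x="y")
print(a)  # → {'x': 'y'}
```

For the static half, shipping a mydict.pyi stub (e.g. declaring the class as a generic Mapping[KT, VT]) plus a py.typed marker file, as described by PEP 561, is the usual way to make mypy understand an extension type.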
Re: How to implement freelists in dict 3.10 for previous versions?
Ooookay, I suppose I have to study the thing a little :D

On Thu, 30 Dec 2021 at 07:59, Inada Naoki wrote:
>
> On Wed, Dec 29, 2021 at 7:25 PM Marco Sulla wrote:
> >
> > I noticed that now freelists in dict use _Py_dict_state. I suppose
> > this is done for thread safety.
>
> Some core devs are working on a per-interpreter GIL, but it is not done
> yet, so you don't need to follow it soon. Your extension module will
> work well in Python 3.11.
>
> > I would implement it also for a C extension that uses CPython < 3.10.
> > How can I achieve this?
>
> See PyModule_GetState() to have per-interpreter module state instead
> of static variables.
> https://docs.python.org/3/c-api/module.html#c.PyModule_GetState
>
> --
> Inada Naoki

--
https://mail.python.org/mailman/listinfo/python-list
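For readers unfamiliar with the term: a freelist keeps a small pool of dead objects around so the allocator can hand one back instead of allocating a fresh one. CPython's dict does this in C (and the move to _Py_dict_state is about making that pool per-interpreter); here is only a toy Python illustration of the idea, with made-up names and a cap loosely modeled on CPython's:

```python
class Node:
    """Toy object whose instances are recycled through a freelist."""
    _freelist = []       # dead instances waiting to be reused
    _MAXFREELIST = 80    # cap the pool so it cannot grow without bound

    def __new__(cls, value):
        if cls._freelist:
            self = cls._freelist.pop()   # reuse a dead instance
        else:
            self = super().__new__(cls)  # fall back to real allocation
        self.value = value
        return self

    def release(self):
        # Instead of letting the object die, park it on the freelist.
        if len(Node._freelist) < Node._MAXFREELIST:
            Node._freelist.append(self)

a = Node(1)
a.release()
b = Node(2)
print(a is b)  # → True: the second Node reused the first one's memory
```

The point of Inada's advice is that in C this pool must not be a static variable, or every interpreter in the process would share (and corrupt) it; hanging it off the module state via PyModule_GetState() gives each interpreter its own pool.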
Who wrote Py_UNREACHABLE?
#if defined(RANDALL_WAS_HERE)
#  define Py_UNREACHABLE() \
    Py_FatalError( \
        "If you're seeing this, the code is in what I thought was\n" \
        "an unreachable state.\n\n" \
        "I could give you advice for what to do, but honestly, why\n" \
        "should you trust me? I clearly screwed this up. I'm writing\n" \
        "a message that should never appear, yet I know it will\n" \
        "probably appear someday.\n\n" \
        "On a deep level, I know I'm not up to this task.\n" \
        "I'm so sorry.\n" \
        "https://xkcd.com/2200")
#elif defined(Py_DEBUG)
#  define Py_UNREACHABLE() \
    Py_FatalError( \
        "We've reached an unreachable state. Anything is possible.\n" \
        "The limits were in our heads all along. Follow your dreams.\n" \
        "https://xkcd.com/2200")

etc

--
https://mail.python.org/mailman/listinfo/python-list
Re: ModuleNotFoundError: No module named 'DistUtilsExtra'
https://askubuntu.com/questions/584857/distutilsextra-problem

On Sun, 2 Jan 2022 at 18:52, hongy...@gmail.com wrote:
>
> On Ubuntu 20.04.3 LTS, I tried to install pdfarranger [1] as follows,
> but it failed:
>
> $ sudo apt-get install python3-pip python3-distutils-extra \
>     python3-wheel python3-gi python3-gi-cairo \
>     gir1.2-gtk-3.0 gir1.2-poppler-0.18 python3-setuptools
> $ git clone https://github.com/pdfarranger/pdfarranger.git pdfarranger.git
> $ cd pdfarranger.git
> $ pyenv shell 3.8.3
> $ pyenv virtualenv --system-site-packages pdfarranger
> $ pyenv shell pdfarranger
> $ pip install -U pip
> $ ./setup.py build
> Traceback (most recent call last):
>   File "./setup.py", line 24, in <module>
>     from DistUtilsExtra.command import (
> ModuleNotFoundError: No module named 'DistUtilsExtra'
>
> See the following for the package list installed in this virtualenv:
>
> $ pip list
> Package    Version
> ---------- ------------
> pip        21.3.1
> pyfiglet   0.8.post1
> setuptools 41.2.0
> vtk        9.0.20200612
>
> Any hints for fixing this problem? Also see here [2-3] for relevant
> discussions.
>
> [1] https://github.com/pdfarranger/pdfarranger
> [2] https://github.com/pdfarranger/pdfarranger/issues/604
> [3] https://discuss.python.org/t/modulenotfounderror-no-module-named-distutilsextra/12834
>
> Regards,
> HZ

--
https://mail.python.org/mailman/listinfo/python-list