Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-04-18 Thread Marco Sulla
On Sat, 16 Apr 2022 at 17:14, Peter J. Holzer  wrote:
>
> On 2022-04-16 16:49:17 +0200, Marco Sulla wrote:
> > Furthermore, you didn't answer my simple question: why does the
> > security update package contain metadata about Debian patches, if the
> > Ubuntu security team did not benefit from Debian security patches but
> > only from internal work?
>
> It DOES NOT contain metadata about Debian patches. You are
> misinterpreting the name "debian". The directory has this name because
> the tools (dpkg, quilt, etc.) were originally written by the Debian team
> for the Debian distribution. Ubuntu uses the same tools. They didn't
> bother to rename the directory (why should they?), so the directory is
> still called "debian" on Ubuntu (and yes I know this because I've built
> numerous .deb packages on Ubuntu systems).

Ah ok, now I understand. Sorry for the confusion.
-- 
https://mail.python.org/mailman/listinfo/python-list


tail

2022-04-23 Thread Marco Sulla
What about introducing a method for text streams that reads the lines
from the bottom? Java also has a ReversedLinesFileReader in Apache
Commons IO.
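For comparison (this is just an illustrative sketch, not the proposed API), the shortest pure-Python way to get the last N lines today reads the whole stream forward, which is exactly the cost a reversed reader would avoid:

```python
import io
from collections import deque

def last_lines(stream, n=10):
    """Last n lines of a text stream, scanning forward (O(stream size))."""
    # a deque with maxlen keeps only the n most recent lines
    return list(deque(stream, maxlen=n))

stream = io.StringIO("".join(f"line {i}\n" for i in range(100)))
tail3 = last_lines(stream, 3)  # ["line 97\n", "line 98\n", "line 99\n"]
```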


Re: Receive a signal when waking or suspending?

2022-04-23 Thread Marco Sulla
I don't know how to do it in Python, but maybe you can create a script
that writes to a named pipe, and read the pipe from Python?
https://askubuntu.com/questions/226278/run-script-on-wakeup


Re: tail

2022-04-23 Thread Marco Sulla
On Sat, 23 Apr 2022 at 20:59, Chris Angelico  wrote:
>
> On Sun, 24 Apr 2022 at 04:37, Marco Sulla  
> wrote:
> >
> > What about introducing a method for text streams that reads the lines
> > from the bottom? Java has also a ReversedLinesFileReader with Apache
> > Commons IO.
>
> It's fundamentally difficult to get precise. In general, there are
> three steps to reading the last N lines of a file:
>
> 1) Find out the size of the file (currently, if it's being grown)
> 2) Seek to the end of the file, minus some threshold that you hope
> will contain a number of lines
> 3) Read from there to the end of the file, split it into lines, and
> keep the last N
>
> Reading the preceding N lines is basically a matter of repeating the
> same exercise, but instead of "end of the file", use the byte position
> of the line you last read.
>
> The problem is, seeking around in a file is done by bytes, not
> characters. So if you know for sure that you can resynchronize
> (possible with UTF-8, not possible with some other encodings), then
> you can do this, but it's probably best to build it yourself (opening
> the file in binary mode).

Well, indeed I have an implementation that does more or less what you
described, for UTF-8 only. The only difference is that I just started
from the end of the file minus 1. I'm just wondering if this would be
useful in the stdlib. I think it's not too difficult to generalise it
to every encoding.

> This is quite inefficient in general.

Why inefficient? I think that readlines() would be much slower, not
just more memory-consuming.


Re: tail

2022-04-23 Thread Marco Sulla
On Sat, 23 Apr 2022 at 23:00, Chris Angelico  wrote:
> > > This is quite inefficient in general.
> >
> > Why inefficient? I think that readlines() will be much slower, not
> > only more time consuming.
>
> It depends on which is more costly: reading the whole file (cost
> depends on size of file) or reading chunks and splitting into lines
> (cost depends on how well you guess at chunk size). If the lines are
> all *precisely* the same number of bytes each, you can pick a chunk
> size and step backwards with near-perfect efficiency (it's still
> likely to be less efficient than reading a file forwards, on most file
> systems, but it'll be close); but if you have to guess, adjust, and
> keep going, then you lose efficiency there.

Emh, why chunks? My function simply reads byte by byte and compares each
byte to b"\n". When it finds one, it stops and does a readline():

import os

def tail(filepath):
    """
    @author Marco Sulla
    @date May 31, 2016
    """

    try:
        filepath.is_file
        fp = str(filepath)
    except AttributeError:
        fp = filepath

    with open(fp, "rb") as f:
        size = os.stat(fp).st_size
        start_pos = 0 if size - 1 < 0 else size - 1

        if start_pos != 0:
            f.seek(start_pos)
            char = f.read(1)

            if char == b"\n":
                start_pos -= 1
                f.seek(start_pos)

        if start_pos == 0:
            f.seek(start_pos)
        else:
            for pos in range(start_pos, -1, -1):
                f.seek(pos)

                char = f.read(1)

                if char == b"\n":
                    break

        return f.readline()

This handles only one line and only UTF-8, but it can be generalised.


Re: tail

2022-04-24 Thread Marco Sulla
On Sat, 23 Apr 2022 at 23:18, Chris Angelico  wrote:

> Ah. Well, then, THAT is why it's inefficient: you're seeking back one
> single byte at a time, then reading forwards. That is NOT going to
> play nicely with file systems or buffers.
>
> Compare reading line by line over the file with readlines() and you'll
> see how abysmal this is.
>
> If you really only need one line (which isn't what your original post
> suggested), I would recommend starting with a chunk that is likely to
> include a full line, and expanding the chunk until you have that
> newline. Much more efficient than one byte at a time.
>

Well, I would like to have a sort of tail, so it should be generalised to
more than one line. But I think that once you have a good algorithm for one
line, you can repeat it N times.

I understand that you can read a chunk instead of a single byte, so when
the newline is found you can return all the cached chunks concatenated. But
will this make the search for the start of the line faster? I suppose you
always have to read byte by byte (or more, if you're using UTF-16 etc.) and
check whether there's a newline.
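For the record, here is a rough binary-mode sketch of the chunked backward scan being discussed (lines_reversed is just an illustrative name, untested against exotic inputs); note it uses rfind() on the cached buffer rather than a Python-level byte-by-byte loop:

```python
import os

def lines_reversed(path, chunk_size=8192):
    """Yield the lines of a binary file from last to first."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            # emit every complete line currently in the buffer, last first;
            # the final byte is excluded so a trailing b"\n" stays attached
            while True:
                nl = buf.rfind(b"\n", 0, len(buf) - 1)
                if nl == -1:
                    break
                yield buf[nl + 1:]
                buf = buf[:nl + 1]
        if buf:
            yield buf
```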


Re: tail

2022-04-24 Thread Marco Sulla
On Sun, 24 Apr 2022 at 00:19, Cameron Simpson  wrote:

> An approach I think you both may have missed: mmap the file and use
> mmap.rfind(b'\n') to locate line delimiters.
> https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind
>

Ah, I've played very little with mmap; I didn't know about this. So I
suppose you can locate the newline and at that point read the line without
using chunks?
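A minimal sketch of that approach (tail_mmap is an illustrative name, not a stdlib API; it assumes a non-empty file, since mmap cannot map zero bytes):

```python
import mmap

def tail_mmap(path, n=1):
    """Return the last n lines of a file as bytes, using mmap.rfind."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = len(mm)
            # a trailing newline should not count as an empty extra line
            if pos and mm[pos - 1:pos] == b"\n":
                pos -= 1
            nl = -1
            for _ in range(n):
                nl = mm.rfind(b"\n", 0, pos)
                if nl == -1:
                    return mm[:]  # fewer than n lines: whole file
                pos = nl
            return mm[nl + 1:]
```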


Re: tail

2022-04-24 Thread Marco Sulla
On Sun, 24 Apr 2022 at 11:21, Roel Schroeven  wrote:

> dn schreef op 24/04/2022 om 0:04:
> > Disagreeing with @Chris in the sense that I use tail very frequently,
> > and usually in the context of server logs - but I'm talking about the
> > Linux implementation, not Python code!
> If I understand Marco correctly, what he want is to read the lines from
> bottom to top, i.e. tac instead of tail, despite his subject.
> I use tail very frequently too, but tac is something I almost never use.
>

Well, the inverse reader is only a secondary suggestion. I suppose a tail
is much more useful.


Re: tail

2022-05-01 Thread Marco Sulla
Is something like this OK?

import os

def tail(f):
    chunk_size = 100
    size = os.stat(f.fileno()).st_size

    positions = iter(range(size, -1, -chunk_size))
    next(positions)

    chunk_line_pos = -1
    pos = 0

    for pos in positions:
        f.seek(pos)
        chars = f.read(chunk_size)
        chunk_line_pos = chars.rfind(b"\n")

        if chunk_line_pos != -1:
            break

    if chunk_line_pos == -1:
        nbytes = pos
        pos = 0
        f.seek(pos)
        chars = f.read(nbytes)
        chunk_line_pos = chars.rfind(b"\n")

    if chunk_line_pos == -1:
        line_pos = pos
    else:
        line_pos = pos + chunk_line_pos + 1

    f.seek(line_pos)

    return f.readline()

This is just for one line and for UTF-8.


Re: new sorting algorithm

2022-05-01 Thread Marco Sulla
I suppose you should write to python-...@python.org, or post on
https://discuss.python.org/ under the Core Development section.


Re: tail

2022-05-02 Thread Marco Sulla
On Mon, 2 May 2022 at 18:31, Stefan Ram  wrote:
>
> |The Unicode standard defines a number of characters that
> |conforming applications should recognize as line terminators:[7]
> |
> |LF:Line Feed, U+000A
> |VT:Vertical Tab, U+000B
> |FF:Form Feed, U+000C
> |CR:Carriage Return, U+000D
> |CR+LF: CR (U+000D) followed by LF (U+000A)
> |NEL:   Next Line, U+0085
> |LS:Line Separator, U+2028
> |PS:Paragraph Separator, U+2029
> |
> Wikipedia "Newline".

Should I assume that other encodings may have more line-ending characters?


Re: tail

2022-05-02 Thread Marco Sulla
Ok, I suppose \n and \r are enough:


readline(size=-1, /)

Read and return one line from the stream. If size is specified, at
most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files,
the newline argument to open() can be used to select the line
terminator(s) recognized.

open(file, mode='r', buffering=-1, encoding=None, errors=None,
newline=None, closefd=True, opener=None)
[...]
newline controls how universal newlines mode works (it only applies to
text mode). It can be None, '', '\n', '\r', and '\r\n'



Re: tail

2022-05-02 Thread Marco Sulla
On Mon, 2 May 2022 at 00:20, Cameron Simpson  wrote:
>
> On 01May2022 18:55, Marco Sulla  wrote:
> >Something like this is OK?
> [...]
> >def tail(f):
> >chunk_size = 100
> >size = os.stat(f.fileno()).st_size
>
> I think you want os.fstat().

It's the same since Python 3.3: os.stat() also accepts a file descriptor.

> >chunk_line_pos = -1
> >pos = 0
> >
> >for pos in positions:
> >f.seek(pos)
> >chars = f.read(chunk_size)
> >chunk_line_pos = chars.rfind(b"\n")
> >
> >if chunk_line_pos != -1:
> >break
>
> Normal text file _end_ in a newline. I'd expect this to stop immediately
> at the end of the file.

I think it's correct. The last line in this case is an empty bytes object.

> >if chunk_line_pos == -1:
> >nbytes = pos
> >pos = 0
> >f.seek(pos)
> >chars = f.read(nbytes)
> >chunk_line_pos = chars.rfind(b"\n")
>
> I presume this is because unless you're very lucky, 0 will not be a
> position in the range(). I'd be inclined to avoid duplicating this code
> and special case and instead maybe make the range unbounded and do
> something like this:
>
> if pos < 0:
> pos = 0
> ... seek/read/etc ...
> if pos == 0:
> break
>
> around the for-loop body.

Yes, I was not very happy to duplicate the code... I have to think about it.

> Seems sane. I haven't tried to run it.

Thank you ^^


Re: tail

2022-05-06 Thread Marco Sulla
I have a little problem.

I tried to extend the tail function, so it can read lines from the bottom
of a file object opened in text mode.

The problem is that it does not work. It gets a starting position that is
lower than expected by 3 characters, so the first line is read for only 2
chars, and the last line is missing.

import os

_lf = "\n"
_cr = "\r"
_lf_ord = ord(_lf)

def tail(f, n=10, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(f.fileno()).st_size
    chunk_line_pos = -1
    lines_not_found = n
    binary_mode = "b" in f.mode
    lf = _lf_ord if binary_mode else _lf

    while pos != 0:
        pos -= n_chunk_size

        if pos < 0:
            pos = 0

        f.seek(pos)
        chars = f.read(n_chunk_size)

        for i, char in enumerate(reversed(chars)):
            if char == lf:
                lines_not_found -= 1

                if lines_not_found == 0:
                    chunk_line_pos = len(chars) - i - 1
                    print(chunk_line_pos, i)
                    break

        if lines_not_found == 0:
            break

    line_pos = pos + chunk_line_pos + 1

    f.seek(line_pos)

    res = b"" if binary_mode else ""

    for i in range(n):
        res += f.readline()

    return res

Maybe the problem is that 1 char != 1 byte?
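That mismatch is easy to demonstrate: in text mode, universal newlines translate "\r\n" into "\n", so character counts computed from the decoded text drift from byte offsets in the file. A small sketch:

```python
import os
import tempfile

# Write a file with Windows-style CRLF line endings on purpose
with tempfile.NamedTemporaryFile(mode="wb", delete=False) as f:
    f.write(b"one\r\ntwo\r\n")
    path = f.name

size_in_bytes = os.stat(path).st_size  # 10 bytes on disk

with open(path, encoding="ascii") as f:  # text mode, newline=None
    text = f.read()  # each "\r\n" is translated to "\n"

os.unlink(path)
# len(text) == 8: two characters fewer than the file's 10 bytes,
# so seek offsets derived from len() of decoded text are wrong
```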


Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 01:03, Dennis Lee Bieber  wrote:
>
> Windows also uses  for the EOL marker, but Python's I/O system
> condenses that to just  internally (for TEXT mode) -- so using the
> length of a string so read to compute a file position may be off-by-one for
> each EOL in the string.

So there's no way to reliably read lines in reverse in text mode using
seek and read, and the only option is readlines?


Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 16:08, Barry  wrote:
> You need to handle the file in bin mode and do the handling of line endings 
> and encodings yourself. It’s not that hard for the cases you wanted.

>>> "\n".encode("utf-16")
b'\xff\xfe\n\x00'
>>> "".encode("utf-16")
b'\xff\xfe'
>>> "a\nb".encode("utf-16")
b'\xff\xfea\x00\n\x00b\x00'
>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
b'\n\x00'

Can I use the last trick to get the encoded form of a LF or a CR in any encoding?
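A sketch of that trick as a helper (encoded_newline is an illustrative name; note the caveat that lstrip() strips a *set* of leading byte values, not a prefix; it works here only because the LF byte never occurs in a BOM):

```python
def encoded_newline(encoding):
    """Bytes of "\\n" in the given encoding, without any BOM."""
    # "".encode(encoding) is the BOM alone (or b"" if there is none)
    return "\n".encode(encoding).lstrip("".encode(encoding))
```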


Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 19:02, MRAB  wrote:
>
> On 2022-05-07 17:28, Marco Sulla wrote:
> > On Sat, 7 May 2022 at 16:08, Barry  wrote:
> >> You need to handle the file in bin mode and do the handling of line 
> >> endings and encodings yourself. It’s not that hard for the cases you 
> >> wanted.
> >
> >>>> "\n".encode("utf-16")
> > b'\xff\xfe\n\x00'
> >>>> "".encode("utf-16")
> > b'\xff\xfe'
> >>>> "a\nb".encode("utf-16")
> > b'\xff\xfea\x00\n\x00b\x00'
> >>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > b'\n\x00'
> >
> > Can I use the last trick to get the encoding of a LF or a CR in any 
> > encoding?
>
> In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> could be little-endian or big-endian.
>
> As you didn't specify which you wanted, it defaulted to little-endian
> and added a BOM (U+FEFF).
>
> If you specify which endianness you want with "utf-16le" or "utf-16be",
> it won't add the BOM:
>
>  >>> # Little-endian.
>  >>> "\n".encode("utf-16le")
> b'\n\x00'
>  >>> # Big-endian.
>  >>> "\n".encode("utf-16be")
> b'\x00\n'

Well, ok, but I need a generic method to get LF and CR for any
encoding a user can input.
Do you think that

"\n".encode(encoding).lstrip("".encode(encoding))

is good for any encoding? Furthermore, is there a way to get the
encoding of an opened file object?
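For the second question: text-mode file objects (io.TextIOWrapper) expose the codec in use via their encoding attribute. For example:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "w", encoding="utf-16") as f:
    enc = f.encoding  # the codec passed to open(); locale default if None

os.unlink(path)
```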


Re: tail

2022-05-08 Thread Marco Sulla
I think I've _almost_ found a simpler, general way:

import os

_lf = "\n"
_cr = "\r"

def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, newline=newline, encoding=encoding) as f:
        text = ""

        hard_mode = False

        if newline is None:
            newline = _lf
        elif newline == "":
            hard_mode = True

        if hard_mode:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()
                lf_after = False

                for i, char in enumerate(reversed(text)):
                    if char == _lf:
                        lf_after = True
                    elif char == _cr:
                        lines_not_found -= 1

                        newline_size = 2 if lf_after else 1

                        lf_after = False
                    elif lf_after:
                        lines_not_found -= 1
                        newline_size = 1
                        lf_after = False

                    if lines_not_found == 0:
                        chunk_line_pos = len(text) - 1 - i + newline_size
                        break

                if lines_not_found == 0:
                    break
        else:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()

                for i, char in enumerate(reversed(text)):
                    if char == newline:
                        lines_not_found -= 1

                        if lines_not_found == 0:
                            chunk_line_pos = len(text) - 1 - i + len(newline)
                            break

                if lines_not_found == 0:
                    break

    if chunk_line_pos == -1:
        chunk_line_pos = 0

    return text[chunk_line_pos:]


Shortly, the file is always opened in text mode. File is read at the end in
bigger and bigger chunks, until the file is finished or all the lines are
found.

Why? Because in encodings that use more than one byte per character,
reading a chunk of n bytes and then reading the previous chunk can split a
character across the two chunks.

I think one could instead read chunk by chunk and handle the chunk-junction
problem; I suppose the code would be faster that way. Anyway, it seems this
trick is quite fast, and it's a lot simpler.

The final result is read from the chunk, and not from the file, so there's
no problems of misalignment of bytes and text. Furthermore, the builtin
encoding parameter is used, so this should work with all the encodings
(untested).

Furthermore, a newline parameter can be specified, as in open(). If it's
equal to the empty string, things are a little more complicated, but I
suppose the code is clear. It's untested too; I only tested it with a UTF-8
Linux file.

Do you think there's a chance of getting this function as a method of the
file object in CPython? The method for a file object opened in bytes mode
is simpler, since there's no encoding and the newline is only \n in that
case.


Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 20:31, Barry Scott  wrote:
>
> > On 8 May 2022, at 17:05, Marco Sulla  wrote:
> >
> > def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >n_chunk_size = n * chunk_size
>
> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as its typically 
> the smaller size the file system will allocate.
> I tend to read on multiple of MiB as its near instant.

Well, I tested on a little file, a list of my preferred pizzas, so

> >pos = os.stat(filepath).st_size
>
> You cannot mix POSIX API with text mode.
> pos is in bytes from the start of the file.
> Textmode will be in code points. bytes != code points.
>
> >chunk_line_pos = -1
> >lines_not_found = n
> >
> >with open(filepath, newline=newline, encoding=encoding) as f:
> >text = ""
> >
> >hard_mode = False
> >
> >if newline == None:
> >newline = _lf
> >elif newline == "":
> >hard_mode = True
> >
> >if hard_mode:
> >while pos != 0:
> >pos -= n_chunk_size
> >
> >if pos < 0:
> >pos = 0
> >
> >f.seek(pos)
>
> In text mode you can only seek to a value return from f.tell() otherwise the 
> behaviour is undefined.

Why? I don't see any recommendation about it in the docs:
https://docs.python.org/3/library/io.html#io.IOBase.seek

> >text = f.read()
>
> You have on limit on the amount of data read.

I explained that previously. Anyway, chunk_size is small, so it's not
a great problem.

> >lf_after = False
> >
> >for i, char in enumerate(reversed(text)):
>
> Simple use text.rindex('\n') or text.rfind('\n') for speed.

I can't use them when I have to find both \n and \r. So I preferred to
simplify the code and use the for loop every time. Keep in mind anyway that
this is a prototype for a Python C API implementation (builtin, I hope, or
a C extension if not).

> > Shortly, the file is always opened in text mode. File is read at the end in
> > bigger and bigger chunks, until the file is finished or all the lines are
> > found.
>
> It will fail if the contents is not ASCII.

Why?

> > Why? Because in encodings that have more than 1 byte per character, reading
> > a chunk of n bytes, then reading the previous chunk, can eventually split
> > the character between the chunks in two distinct bytes.
>
> No it cannot. text mode only knows how to return code points. Now if you
> are in binary it could be split, but you are not in binary mode so it
> cannot.

From the docs:

seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset.

> > Do you think there are chances to get this function as a method of the file
> > object in CPython? The method for a file object opened in bytes mode is
> > simpler, since there's no encoding and newline is only \n in that case.
>
> State your requirements. Then see if your implementation meets them.

The method should return the last n lines from a file object.
If the file object is in text mode, the newline parameter must be honored.
If the file object is in binary mode, a newline is always b"\n", to be
consistent with readline.

I suppose the current implementation of tail satisfies the
requirements for text mode. The previous one satisfied binary mode.

Anyway, apart from my implementation, I'm curious whether you think a tail
method is worth adding to the builtin file objects in CPython.


Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 22:02, Chris Angelico  wrote:
>
> Absolutely not. As has been stated multiple times in this thread, a
> fully general approach is extremely complicated, horrifically
> unreliable, and hopelessly inefficient.

Well, my implementation is quite general now. It's neither complicated nor
inefficient. About reliability, I can't say anything without a test case.

> The ONLY way to make this sort
> of thing any good whatsoever is to know your own use-case and code to
> exactly that. Given the size of files you're working with, for
> instance, a simple approach of just reading the whole file would make
> far more sense than the complex seeking you're doing. For reading a
> multi-gigabyte file, the choices will be different.

Apart from the fact that it's very, very simple to optimize for small
files: this is, IMHO, a premature optimization. The code is quite fast
even if the file is small. Can it be faster? Of course, but it depends
on the use case. Every optimization in CPython must pass the benchmark
suite test. If there's little or no gain, the optimization is usually
rejected.

> No, this does NOT belong in the core language.

I respect your opinion, but IMHO you think that the task is more
complicated than the reality. It seems to me that the method can be
quite simple and fast.


Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 22:34, Barry  wrote:
>
> > On 8 May 2022, at 20:48, Marco Sulla  wrote:
> >
> > On Sun, 8 May 2022 at 20:31, Barry Scott  wrote:
> >>
> >>>> On 8 May 2022, at 17:05, Marco Sulla  
> >>>> wrote:
> >>>
> >>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >>>   n_chunk_size = n * chunk_size
> >>
> >> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as its 
> >> typically the smaller size the file system will allocate.
> >> I tend to read on multiple of MiB as its near instant.
> >
> > Well, I tested on a little file, a list of my preferred pizzas, so
>
> Try it on a very big file.

I'm not saying it's a good idea, it's only the value that I needed for my tests.
Anyway, it's not a problem with big files. The problem is with files
with long lines.

> >> In text mode you can only seek to a value return from f.tell() otherwise 
> >> the behaviour is undefined.
> >
> > Why? I don't see any recommendation about it in the docs:
> > https://docs.python.org/3/library/io.html#io.IOBase.seek
>
> What does adding 1 to a pos mean?
> If it’s binary it mean 1 byte further down the file but in text mode it may 
> need to
> move the point 1, 2 or 3 bytes down the file.

Emh. I re-quote

seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset.

And so on. No mention of differences between text and binary mode.

> >> You have on limit on the amount of data read.
> >
> > I explained that previously. Anyway, chunk_size is small, so it's not
> > a great problem.
>
> Typo I meant you have no limit.
>
> You read all the data till the end of the file that might be mega bytes of 
> data.

Yes, I already explained why and how it could be optimized. I quote myself:

Shortly, the file is always opened in text mode. File is read at the
end in bigger and bigger chunks, until the file is finished or all the
lines are found.

Why? Because in encodings that have more than 1 byte per character,
reading a chunk of n bytes, then reading the previous chunk, can
eventually split the character between the chunks in two distinct
bytes.

I think one can read chunk by chunk and test the chunk junction
problem. I suppose the code will be faster this way. Anyway, it seems
that this trick is quite fast anyway and it's a lot simpler.


Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
>
> The point here is that text is a very different thing. Because you
> cannot seek to an absolute number of characters in an encoding with
> variable sized characters. _If_ you did a seek to an arbitrary number
> you can end up in the middle of some character. And there are encodings
> where you cannot inspect the data to find a character boundary in the
> byte stream.

Ooook, now I understand what you and Barry mean. I suppose there's no
reliable way to tail a big file opened in text mode with decent
performance.

Anyway, the previous-previous function I posted worked only for files
opened in binary mode, and I suppose it's reliable, since it searches only
for b"\n", as readline() in binary mode does.


Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 19:53, Chris Angelico  wrote:
>
> On Tue, 10 May 2022 at 03:47, Marco Sulla  
> wrote:
> >
> > On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
> > >
> > > The point here is that text is a very different thing. Because you
> > > cannot seek to an absolute number of characters in an encoding with
> > > variable sized characters. _If_ you did a seek to an arbitrary number
> > > you can end up in the middle of some character. And there are encodings
> > > where you cannot inspect the data to find a character boundary in the
> > > byte stream.
> >
> > Ooook, now I understand what you and Barry mean. I suppose there's no
> > reliable way to tail a big file opened in text mode with a decent 
> > performance.
> >
> > Anyway, the previous-previous function I posted worked only for files
> > opened in binary mode, and I suppose it's reliable, since it searches
> > only for b"\n", as readline() in binary mode do.
>
> It's still fundamentally impossible to solve this in a general way, so
> the best way to do things will always be to code for *your* specific
> use-case. That means that this doesn't belong in the stdlib or core
> language, but in your own toolkit.

Nevertheless, tail is a fundamental tool on *nix. It's fast and reliable.
Besides, can the tail command not handle different encodings?


Re: tail

2022-05-11 Thread Marco Sulla
On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber 
wrote:
>
> On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla
>  declaimed the following:
>
> >Nevertheless, tail is a fundamental tool in *nix. It's fast and
> >reliable. Also the tail command can't handle different encodings?
>
> Based upon
> https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY
> thing tail looks at is single byte "\n". It does not handle other line
> endings, and appears to perform BINARY I/O, not text I/O. It does nothing
> for bytes that are not "\n". Split multi-byte encodings are irrelevant
> since, if it does not find enough "\n" bytes in the buffer (chunk) it reads
> another binary chunk and seeks for additional "\n" bytes. Once it finds the
> desired amount, it is synchronized on the byte following the "\n" (which,
> for multi-byte encodings might be a NUL, but in any event, should be a safe
> location for subsequent I/O).
>
> Interpretation of encoding appears to fall to the console driver
> configuration when displaying the bytes output by tail.

Ok, I understand. This should be a Python implementation of *nix tail:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"

def tail(filepath, n=10, chunk_size=100):
    if (n <= 0):
        raise ValueError(_err_n)

    if (n % 1 != 0):
        raise ValueError(_err_n)

    if (chunk_size <= 0):
        raise ValueError(_err_chunk_size)

    if (chunk_size % 1 != 0):
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if chunk_line_pos != -1:
                    lines_not_found -= 1

                    if lines_not_found == 0:
                        break

                search_pos = chunk_line_pos

            if lines_not_found == 0:
                break

    return bytes(text[chunk_line_pos+1:])

The function opens the file in binary mode and searches only for b"\n". It
returns the last n lines of the file as bytes.

I suppose this function is fast. It reads the bytes from the file in chunks
and stores them in a bytearray, prepending them to it. The final result is
read from the bytearray and converted to bytes (to be consistent with the
read method).

I suppose the function is reliable. The file is opened in binary mode and
only b"\n" is searched for as the line end, as *nix tail (and Python's
readline in binary mode) do. And bytes are returned: the caller can use
them as is, or convert them to a string using the encoding it wants, or do
whatever their imagination can think of :)

Finally, it seems to me the function is quite simple.

If all my affirmations are true, the three obstacles raised by Chris
should be overcome.

I'd very much like to see a CPython implementation of that function. It
could be a method of a file object opened in binary mode, and *only* in
binary mode.

What do you think about it?
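As a test aid (not a replacement), a naive reference implementation that reads the whole file forward is handy to check a chunked version against; a sketch:

```python
from collections import deque

def tail_ref(filepath, n=10):
    """Naive reference: read the whole file, keep the last n lines."""
    with open(filepath, "rb") as f:
        # deque with maxlen discards all but the n most recent lines
        return b"".join(deque(f, maxlen=n))
```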


Re: tail

2022-05-11 Thread Marco Sulla
On Wed, 11 May 2022 at 22:09, Chris Angelico  wrote:
>
> Have you actually checked those three, or do you merely suppose them to be 
> true?

I only suppose, as I said. I should do some benchmarks and some other
tests and, frankly, I don't want to. I don't want to because I'm quite sure
the implementation is fast, since it reads by chunks and caches them. I'm
not sure it's 100% free of bugs, but the concept is very simple, since it
simply mimics *nix tail, so it should be reliable.

>
> > I'd very much like to see a CPython implementation of that function. It
> > could be a method of a file object opened in binary mode, and *only* in
> > binary mode.
> >
> > What do you think about it?
>
> Still not necessary. You can simply have it in your own toolkit. Why
> should it be part of the core language?

Why not?

> How much benefit would it be
> to anyone else?

I suppose that every programmer, at least once in their life, has written a tail.

> All the same assumptions are still there, so it still
> isn't general

It's general. It mimics the *nix tail. I can't think of a more general
way to implement a tail.

> I don't understand why this wants to be in the standard library.

Well, the answer is really simple: I needed it, and if I had found it in
the stdlib, I would have used it instead of writing my first horrible
function. Furthermore, tail is such a useful tool that I suppose many
others are interested, based on this quick Google search:

https://www.google.com/search?q=python+tail

A highly voted question on Stack Overflow, many other Stack Overflow
questions, a package that seems to do exactly the same thing, that is,
mimic *nix tail, and a blog post about how to tail in Python. Furthermore,
if you search for python tail pypi, you can find a bunch of other packages:

https://www.google.com/search?q=python+tail+pypi

It seems the subject is quite popular, and I can't imagine otherwise.


Re: tail

2022-05-12 Thread Marco Sulla
On Thu, 12 May 2022 at 00:50, Stefan Ram  wrote:
>
> Marco Sulla  writes:
> >def tail(filepath, n=10, chunk_size=100):
> >if (n <= 0):
> >raise ValueError(_err_n)
> ...
>
>   There's no spec/doc, so one can't even test it.

Excuse me, you're very right.

"""
A function that "tails" the file. If you don't know what that means,
google "man tail"

filepath: the file path of the file to be "tailed"
n: the numbers of lines "tailed"
chunk_size: oh don't care, use it as is
"""
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-12 Thread Marco Sulla
Thank you very much. This helped me to improve the function:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"

def tail(filepath, n=10, chunk_size=100):
    if (n <= 0):
        raise ValueError(_err_n)

    if (n % 1 != 0):
        raise ValueError(_err_n)

    if (chunk_size <= 0):
        raise ValueError(_err_chunk_size)

    if (chunk_size % 1 != 0):
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    newlines_to_find = n
    first_step = True

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if first_step and chunk_line_pos == search_pos - 1:
                    newlines_to_find += 1

                first_step = False

                if chunk_line_pos != -1:
                    newlines_to_find -= 1

                    if newlines_to_find == 0:
                        break

                search_pos = chunk_line_pos

            if newlines_to_find == 0:
                break

    return bytes(text[chunk_line_pos+1:])
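For cross-checking on small inputs, a naive reference that just reads the whole file should give the same result (tail_naive is my name for it, not code from the thread):

```python
import os
import tempfile

def tail_naive(filepath, n=10):
    # reference implementation: read everything, keep the last n lines;
    # only sensible for small files, but handy for cross-checking
    with open(filepath, "rb") as f:
        lines = f.read().splitlines(keepends=True)
    return b"".join(lines[-n:])

with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write(b"".join(b"line %d\n" % i for i in range(25)))
    path = f.name

result = tail_naive(path, 10)
os.unlink(path)
print(result.decode().splitlines()[0])  # line 15
print(len(result.splitlines()))         # 10
```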



On Thu, 12 May 2022 at 20:29, Stefan Ram  wrote:

>   I am not aware of a definition of "line" above,
>   but the PLR says:
>
> |A physical line is a sequence of characters terminated
> |by an end-of-line sequence.
>
>   . So 10 lines should have 10 end-of-line sequences.
>

Maybe. Maybe not. What if the file ends with no newline?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-13 Thread Marco Sulla
On Fri, 13 May 2022 at 00:31, Cameron Simpson  wrote:

> On 12May2022 19:48, Marco Sulla  wrote:
> >On Thu, 12 May 2022 at 00:50, Stefan Ram  wrote:
> >>   There's no spec/doc, so one can't even test it.
> >
> >Excuse me, you're very right.
> >
> >"""
> >A function that "tails" the file. If you don't know what that means,
> >google "man tail"
> >
> >filepath: the file path of the file to be "tailed"
> >n: the numbers of lines "tailed"
> >chunk_size: oh don't care, use it as is
>
> This is nearly the worst "specification" I have ever seen.
>

You're lucky. I've seen much worse (or none at all).
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-16 Thread Marco Sulla
On Fri, 13 May 2022 at 12:49, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-05-13 at 12:16:57 +0200,
> Marco Sulla  wrote:
>
> > On Fri, 13 May 2022 at 00:31, Cameron Simpson  wrote:
>
> [...]
>
> > > This is nearly the worst "specification" I have ever seen.
>
> > You're lucky. I've seen much worse (or none at all).
>
> At least with *no* documentation, the source code stands for itself.

So I did well not to include one the first time. I think that after
100 posts about tail, chunks etc. it was clear what that stuff was
about and how to use it.

Speaking about more serious things, so far I've done a test with:

* a file that does not end with \n
* a file that ends with \n (after Stefan test)
* a file with more than 10 lines
* a file with less than 10 lines

It seems to work. I only have to benchmark it now. I suppose I have to
test with at least a 1 GB file, a big lorem ipsum, and do an admittedly
unequal comparison with Linux tail. I'll do it when I have time, so
Chris will no longer be angry with me.
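The four cases can also be generated and checked mechanically. Here is a sketch that uses collections.deque as the reference "tail" (it streams the file and keeps only the last n lines), so it runs on its own:

```python
import collections
import os
import tempfile

def last_lines(path, n=10):
    # deque(maxlen=n) keeps only the final n items while streaming the file
    with open(path, "rb") as f:
        return b"".join(collections.deque(f, maxlen=n))

cases = {
    "no_trailing_nl": b"a\nb\nc",                               # no final \n
    "trailing_nl": b"a\nb\nc\n",                                # final \n
    "more_than_10": b"".join(b"%d\n" % i for i in range(30)),   # > 10 lines
    "less_than_10": b"x\ny\n",                                  # < 10 lines
}

line_counts = {}
for name, data in cases.items():
    with tempfile.NamedTemporaryFile("wb", delete=False) as f:
        f.write(data)
    line_counts[name] = len(last_lines(f.name).splitlines())
    os.unlink(f.name)

print(line_counts)
```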
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-18 Thread Marco Sulla
Well, I've done a benchmark.

>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=10)
1.5963431186974049
>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=10)
2.5240604374557734
>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=10)
1.8944984432309866

small.txt is a text file of 1.3 KB. lorem.txt is a lorem ipsum of 1.2
GB. It seems the performance is good, thanks to the chunk suggestion.

But the time of Linux tail surprises me:

marco@buzz:~$ time tail lorem.txt
[text]

real	0m0.004s
user	0m0.003s
sys	0m0.001s

It's strange that mine is so slow by comparison. I thought it was because
tail also decodes and prints the result, so I timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))",
globals={"tail":tail}, number=10)

and I got ~36 seconds. It seems quite strange to me. Maybe I got the
benchmarks wrong at some point?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-19 Thread Marco Sulla
On Wed, 18 May 2022 at 23:32, Cameron Simpson  wrote:
>
> On 17May2022 22:45, Marco Sulla  wrote:
> >Well, I've done a benchmark.
> >>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=10)
> >1.5963431186974049
> >>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=10)
> >2.5240604374557734
> >>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=10)
> >1.8944984432309866
>
> This suggests that the file size does not dominate uour runtime.

Yes, this is what I wanted to test and it seems good.

> Ah.
> _Or_ that there are similar numbers of newlines vs text in the files so
> reading similar amounts of data from the end. If the "line density" of
> the files were similar you would hope that the runtimes would be
> similar.

No, well, small.txt has very short lines. Lorem.txt is a lorem ipsum,
so it has really long lines. Indeed I get better results by tuning
chunk_size. Anyway, even with the default value the performance is not bad at all.

> >But the time of Linux tail surprise me:
> >
> >marco@buzz:~$ time tail lorem.txt
> >[text]
> >
> >real	0m0.004s
> >user	0m0.003s
> >sys	0m0.001s
> >
> >It's strange that it's so slow. I thought it was because it decodes
> >and print the result, but I timed
>
> You're measuring different things. timeit() tries hard to measure just
> the code snippet you provide. It doesn't measure the startup cost of the
> whole python interpreter. Try:
>
> time python3 your-tail-prog.py /home/marco/lorem.txt

Well, I'll try it, but isn't it a bit unfair to compare Python startup with C?
> BTW, does your `tail()` print output? If not, again not measuring the
> same thing.
> [...]
> Also: does tail(1) do character set / encoding stuff? Does your Python
> code do that? Might be apples and oranges.

Well, as I wrote I also timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))",
globals={"tail":tail}, number=10)

and I got ~36 seconds.

> If you have the source of tail(1) to hand, consider getting to the core
> and measuring `time()` immediately before and immediately after the
> central tail operation and printing the result.

IMHO this is a very good idea, but I have to find the time(). Ahah. Emh.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Subtract n months from datetime

2022-06-22 Thread Marco Sulla
The package arrow has a simple shift method for months, weeks etc

https://arrow.readthedocs.io/en/latest/#replace-shift
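For those who want to stay in the stdlib, the same kind of shift can be sketched by hand, clamping the day to the length of the target month (which is, if I remember correctly, what arrow's shift also does):

```python
import calendar
import datetime

def shift_months(d, months):
    # Shift a date by +/- months, clamping the day to the target month's
    # length (e.g. 31 Mar - 1 month -> 28/29 Feb), similar to arrow's shift().
    total = d.year * 12 + (d.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)

print(shift_months(datetime.date(2022, 3, 31), -1))  # 2022-02-28
print(shift_months(datetime.date(2022, 1, 15), -3))  # 2021-10-15
print(shift_months(datetime.date(2020, 1, 31), 1))   # 2020-02-29 (leap year)
```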
-- 
https://mail.python.org/mailman/listinfo/python-list


Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
I tried to check for memory leaks in a bunch of functions of mine using a
simple decorator. It works, but it fails with this code, returning a random
count_diff at every run. Why?

import tracemalloc
import gc
import functools
from uuid import uuid4
import pickle

def getUuid():
    return str(uuid4())

def trace(func):
    @functools.wraps(func)
    def inner():
        tracemalloc.start()

        snapshot1 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )

        for i in range(100):
            func()

        gc.collect()

        snapshot2 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )

        top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        tracemalloc.stop()

        for stat in top_stats:
            if stat.count_diff > 3:
                raise ValueError(f"count_diff: {stat.count_diff}")

    return inner

dict_1 = {getUuid(): i for i in range(1000)}

@trace
def func_76():
    pickle.dumps(iter(dict_1))

func_76()
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
On Thu, 21 Jul 2022 at 22:28, MRAB  wrote:
>
> It's something to do with pickling iterators because it still occurs
> when I reduce func_76 to:
>
> @trace
> def func_76():
>  pickle.dumps(iter([]))

It's very strange. I found a bunch of real memory leaks with this
decorator, so it seems to be reliable. It behaves correctly with pickle
alone and with iter alone, but not when pickling iterators.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
This naive code shows no leak:

import resource
import pickle

c = 0

while True:
    pickle.dumps(iter([]))

    if c % 10_000 == 0:  # report periodically
        max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"iteration: {c}, max rss: {max_rss} kb")

    c += 1
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
I've done this other simple test:

#!/usr/bin/env python3

import tracemalloc
import gc
import pickle

tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)

for i in range(10_000_000):
    pickle.dumps(iter([]))

gc.collect()

snapshot2 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)

top_stats = snapshot2.compare_to(snapshot1, 'lineno')
tracemalloc.stop()

for stat in top_stats:
    print(stat)

The result is:

/home/marco/sources/test.py:14: size=3339 B (+3339 B), count=63 (+63), average=53 B
/home/marco/sources/test.py:9: size=464 B (+464 B), count=1 (+1), average=464 B
/home/marco/sources/test.py:10: size=456 B (+456 B), count=1 (+1), average=456 B
/home/marco/sources/test.py:13: size=28 B (+28 B), count=1 (+1), average=28 B

It seems that, after 10 million loops, only 63 allocations survive,
totalling only ~3 KB. It seems to me that we can't call that a leak, no?
Probably pickle needs a lot more cycles before we can be sure there's
actually a real leak.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why I fail so bad to check for memory leak with this code?

2022-07-22 Thread Marco Sulla
On Fri, 22 Jul 2022 at 09:00, Barry  wrote:
> With code as complex as python’s there will be memory allocations that
occur that will not be directly related to the python code you test.
>
> To put it another way there is noise in your memory allocation signal.
>
> Usually the signal of a memory leak is very clear, as you noticed.
>
> For rare leaks I would use a tool like valgrind.

Thank you all, but I needed a simple decorator to automate the memory
leak (and segfault) tests. I think this version is good enough; I hope
it can be useful to someone:

import functools
import gc
import tracemalloc

def trace(iterations=100):
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            print(
                f"Loops: {iterations} - Evaluating: {func.__name__}",
                flush=True
            )

            tracemalloc.start()

            snapshot1 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )

            for i in range(iterations):
                func()

            gc.collect()

            snapshot2 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )

            top_stats = snapshot2.compare_to(snapshot1, 'lineno')
            tracemalloc.stop()

            for stat in top_stats:
                if stat.count_diff * 100 > iterations:
                    raise ValueError(f"stat: {stat}")

        return wrapper

    return decorator


If the decorated function fails, you can try increasing the iterations
parameter. I found that in my cases I sometimes needed a value of 200 or 300.
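As a self-contained illustration of the idea, here is a stripped-down variant that compares a deliberately leaky function with a clean one, using tracemalloc's totals instead of the decorator's per-file filtering:

```python
import gc
import tracemalloc

leaky = []

def leaker():
    leaky.append(bytearray(1024))  # retained forever: a deliberate leak

def clean():
    bytearray(1024)                # dropped immediately, collectable

def net_growth(func, iterations=100):
    # bytes still allocated after `iterations` calls (noise aside)
    tracemalloc.start()
    gc.collect()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        func()
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

leak_growth = net_growth(leaker)
clean_growth = net_growth(clean)
print(leak_growth > 50_000)   # True: roughly 100 KiB is retained
print(clean_growth < 20_000)  # True: no significant net growth
```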
-- 
https://mail.python.org/mailman/listinfo/python-list


How to generate a .pyi file for a C Extension using stubgen

2022-07-29 Thread Marco Sulla
I tried to follow the instructions here:

https://mypy.readthedocs.io/en/stable/stubgen.html

but the instructions about creating a stub for a C Extension are a little
mysterious. I tried to use it on the .so file without luck.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to generate a .pyi file for a C Extension using stubgen

2022-07-30 Thread Marco Sulla
On Fri, 29 Jul 2022 at 23:23, Barry  wrote:
>
>
>
> > On 29 Jul 2022, at 19:33, Marco Sulla  wrote:
> >
> > I tried to follow the instructions here:
> >
> > https://mypy.readthedocs.io/en/stable/stubgen.html
> >
> > but the instructions about creating a stub for a C Extension are a little
> > mysterious. I tried to use it on the .so file without luck.
>
> It says that stubgen works on .py files not .so files.
> You will need to write the .pyi for your .so manually.
>
> The docs could do with splitting the need for .pyi for .so
> away from the stubgen description.

But it says:

"Mypy includes the stubgen tool that can automatically generate stub
files (.pyi files) for Python modules and C extension modules."

I tried stubgen -m modulename, but it generates very little code.
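In my understanding, the .pyi for a C extension mostly has to be written by hand anyway. For a dict-like type it could look roughly like this (names and signatures here are illustrative, not taken from any real package):

```python
# frozendict.pyi -- hand-written stub sketch; names are illustrative only
from typing import Iterator, Mapping, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class frozendict(Mapping[K, V]):
    def __init__(self, *args, **kwargs) -> None: ...
    def __getitem__(self, key: K) -> V: ...
    def __iter__(self) -> Iterator[K]: ...
    def __len__(self) -> int: ...
    def copy(self) -> "frozendict[K, V]": ...
```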
-- 
https://mail.python.org/mailman/listinfo/python-list


Am I banned from Discuss forum?

2023-02-10 Thread Marco Sulla
I was banned from the mailing list and the Discuss forum for a very long
time. Too long IMHO, but I paid my dues.

Now this is my status on the forum:
- I haven't posted anything disrespectful in the last months
- I have a limit of three posts per thread, but only on some threads
- Some random posts of mine are obscured and must be restored manually by
moderators
- I opened a thread proposing a new section called Brainstorming. It was
closed without a reason given.
- I can't post links
- Two discussions I posted in the Ideas section were moved to Help, without a
single line of explanation.

If I'm not appreciated, I want to be publicly banned with a good reason, or
at least a reason.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: use set notation for repr of dict_keys?

2021-02-24 Thread Marco Sulla
On Wed, 24 Feb 2021 at 06:29, Random832  wrote:
> I was surprised, though, to find that you can't remove items directly from 
> the key set, or in general update it in place with &= or -= (these operators 
> work, but give a new set object).

This is because they are a view. Changing the keys object would mean
changing the underlying dict, which is probably not what you want or
expect. You can just "cast" them into a "real" set object.

There was a discussion about implementing the whole Set interface for
dicts. Currently, only `|` is supported.
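A quick demonstration of the behaviour under discussion: the set operators do work on the view, but they always build a brand-new set:

```python
d = {"a": 1, "b": 2, "c": 3}
keys = d.keys()

s = keys & {"a", "b"}     # set operators on the view return a plain set
print(type(s).__name__)   # set
s.discard("a")            # mutating that set does not touch the dict
print(sorted(d))          # ['a', 'b', 'c']

keys -= {"a"}             # even "in place": rebinds `keys` to a new set...
print(sorted(keys))       # ['b', 'c']
print(sorted(d))          # ['a', 'b', 'c'] ...while the dict is unchanged
```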
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: use set notation for repr of dict_keys?

2021-02-24 Thread Marco Sulla
On Wed, 24 Feb 2021 at 15:02, Random832  wrote:
> On Wed, Feb 24, 2021, at 02:59, Marco Sulla wrote:
> > On Wed, 24 Feb 2021 at 06:29, Random832  wrote:
> > > I was surprised, though, to find that you can't remove items directly 
> > > from the key set, or in general update it in place with &= or -= (these 
> > > operators work, but give a new set object).
> >
> > This is because they are a view. Changing the key object means you
> > will change the underlying dict. Probably not that you want or expect.
>
> Why wouldn't it be what I want or expect? Java allows exactly this

I didn't know this. I like Java, but IMHO it's quite confusing that
you can remove a key from a Map using the keys object. To my mind it's
more natural to think of views as read-only, with changes made only
through the original object. But maybe my mental model is too strict.

> [and it's the only way provided to, for example, remove all keys matching a
> predicate in a single pass... an operation that Python sets don't support 
> either]

I hope indeed that someday Python can do:

filtered_dict = a_dict - a_set
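Until such an operator exists, the usual spelling is a comprehension:

```python
a_dict = {"a": 1, "b": 2, "c": 3}
a_set = {"b", "c"}

filtered_dict = {k: v for k, v in a_dict.items() if k not in a_set}
print(filtered_dict)  # {'a': 1}
```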
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: editor recommendations?

2021-02-26 Thread Marco Sulla
I use the free version of Sublime Text for simple tasks. I like that
it's fast and saves to disk immediately; you don't even have to name the
file. I also use it for taking notes. It's probably not as powerful as
Vim, and it's proprietary.
For development I use PyCharm, but that's a full IDE.

I also used in past:
gedit: slow
atom: slow
notepad++: windows only
emacs: too much for my needs
scite: too minimalist
kate: not bad at all
visual studio: resource intensive
eclipse: slow (even if I continue to use it for non-Python coding)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: weirdness with list()

2021-02-28 Thread Marco Sulla
On Sun, 28 Feb 2021 at 01:19, Cameron Simpson  wrote:
> My object represents an MDAT box in an MP4 file: it is the ludicrously
> large data box containing the raw audiovideo data; for a TV episode it
> is often about 2GB and a movie is often 4GB to 6GB.
> [...]
> That length is presented via the object's __len__ method
> [...]
>
> I noticed that it was stalling, and investigation revealed it was
> stalling at this line:
>
> subboxes = list(self)
>
> when doing the MDAT box. That box (a) has no subboxes at all and (b) has
> a very large __len__ value.
>
> BUT... It also has a __iter__ value, which like any Box iterates over
> the subboxes. For MDAT that is implemented like this:
>
> def __iter__(self):
> yield from ()
>
> What I was expecting was pretty much instant construction of an empty
> list. What I was getting was a very time consuming (10 seconds or more)
> construction of an empty list.

I can't reproduce it. Am I missing something?

marco@buzz:~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     def __len__(self):
...         return 1024**3
...     def __iter__(self):
...         yield from ()
...
>>> a = A()
>>> len(a)
1073741824
>>> list(a)
[]
>>>

It takes milliseconds to run list(a)
-- 
https://mail.python.org/mailman/listinfo/python-list


Why assert is not a function?

2021-03-02 Thread Marco Sulla
I have a curiosity. Python, like many languages, has assert as a
keyword. Couldn't it be implemented as a function? Is there an advantage
to having it as a keyword?
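One practical consequence of assert being a statement rather than a function: under python -O the whole statement, condition included, is compiled away for free, and the syntax also explains the classic tuple pitfall:

```python
x = -1
msg = ""
try:
    assert x >= 0, "x must be non-negative"   # statement form
except AssertionError as e:
    msg = str(e)
    print(msg)                                # x must be non-negative

# Because it is NOT a call, wrapping the "arguments" in parentheses changes
# the meaning: a non-empty tuple is always truthy, so this never fires
# (recent CPythons emit a SyntaxWarning for exactly this reason).
assert (x >= 0, "x must be non-negative")
print("the parenthesised form passed silently")
```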
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: yield from () Was: Re: weirdness with list()

2021-03-02 Thread Marco Sulla
On Mon, 1 Mar 2021 at 19:51, Alan Gauld via Python-list
 wrote:
> Sorry, a bit OT but I'm curious. I haven't seen
> this before:
>
> yield from ()
>
> What is it doing?
> What do the () represent in this context?

It's the empty tuple.
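A quick demonstration: `yield from ()` delegates to an empty iterable, which makes the function a generator that yields nothing:

```python
import inspect

def no_subboxes():
    yield from ()   # delegate to the empty tuple: the generator yields nothing

print(list(no_subboxes()))                       # []
print(inspect.isgeneratorfunction(no_subboxes))  # True

# Any form of yield in the body is what makes this a generator function;
# `yield from ()` is just a compact way to have one that produces no items.
```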
-- 
https://mail.python.org/mailman/listinfo/python-list


How to create both a c extension and a pure python package

2021-03-09 Thread Marco Sulla
As per the title. Currently I ended up using this trick in my setup.py:


if len(argv) > 1 and argv[1] == "c":
    sys.argv = [sys.argv[0]] + sys.argv[2:]
    setuptools.setup(ext_modules = ext_modules, **common_setup_args)
else:
    setuptools.setup(**common_setup_args)


So if I pass "c" as the first argument of ./setup.py, the C extension
is built; otherwise the pure-Python version is packaged.

Is there not a better way to do this?
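A sketch of a less intrusive variant: select the build flavour through an environment variable instead of consuming sys.argv, so pip and other front-ends are not confused (the package and variable names below are made up):

```python
import os

# hypothetical placeholders standing in for the real setup() inputs
common_setup_args = {"name": "mypkg", "version": "1.0"}
ext_modules = ["<setuptools.Extension instances here>"]

def setup_kwargs(environ=os.environ):
    # return the setup() kwargs for the requested build flavour
    kwargs = dict(common_setup_args)
    if environ.get("MYPKG_BUILD_C", "0") == "1":
        kwargs["ext_modules"] = ext_modules
    return kwargs

print("ext_modules" in setup_kwargs({"MYPKG_BUILD_C": "1"}))  # True
print("ext_modules" in setup_kwargs({}))                      # False

# in setup.py the call would then be: setuptools.setup(**setup_kwargs())
```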
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to create both a c extension and a pure python package

2021-03-10 Thread Marco Sulla
On Wed, 10 Mar 2021 at 16:45, Thomas Jollans  wrote:
> Why are you doing this?
>
> If all you want is for it to be possible to install the package from
> source on a system that can't use the C part, you could just declare
> your extension modules optional

Because I want to provide (at least) two wheels: a wheel for Linux
users with the C extension compiled, and a generic wheel in pure Python
as a fallback for any other architecture.

If I make the extension optional, as far as I know, only one wheel is
produced: the wheel with the extension if everything succeeds, or the
pure-Python wheel otherwise.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why assert is not a function?

2021-03-12 Thread Marco Sulla
On Thu, 11 Mar 2021 at 23:11, Ethan Furman  wrote:
> Basically, you are looking at two different philosophies:
>
> - Always double check, get good error message when something fails
>
> vs
>
> - check during testing and QA, turn off double-checks for production for best 
> performance possible.

In a perfect world, I'd say the second option is the best. But for the
majority of projects I've contributed to, speed was not a critical issue. On
the contrary, it's very hard to get meaningful information about
problems in production, so I'm in favour of the first school :)
-- 
https://mail.python.org/mailman/listinfo/python-list


How to support annotations for a custom type in a C extension?

2021-09-17 Thread Marco Sulla
I created a custom dict in a C extension. Call it `promethea`. How can
I implement `promethea[str, str]`? Now I get:

TypeError: 'type' object is not subscriptable
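For reference, the pure-Python counterpart of the fix is a `__class_getitem__` that returns a `types.GenericAlias` (PEP 585); `Promethea` here is just an illustrative stand-in:

```python
from types import GenericAlias

class Promethea(dict):
    # the Python-level equivalent of what Py_GenericAlias provides in C
    __class_getitem__ = classmethod(GenericAlias)

alias = Promethea[str, str]
print(alias)                          # e.g. __main__.Promethea[str, str]
print(alias.__origin__ is Promethea)  # True
```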
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to support annotations for a custom type in a C extension?

2021-09-17 Thread Marco Sulla
Ooook. I have a question. Why is this code not present in
dictobject.c? Where are the dict annotations implemented?

On Sat, 18 Sept 2021 at 03:00, MRAB  wrote:
>
> On 2021-09-17 21:03, Marco Sulla wrote:
> > I created a custom dict in a C extension. Name it `promethea`. How can
> > I implement `promethea[str, str]`? Now I get:
> >
> > TypeError: 'type' object is not subscriptable
> >
> Somewhere you'll have a table of the class's methods. It needs an entry
> like this:
>
>
> static PyMethodDef customdict_methods[] = {
> ...
>  {"__class_getitem__", (PyCFunction)Py_GenericAlias, METH_CLASS |
> METH_O | METH_COEXIST, PyDoc_STR("See PEP 585")},
> ...
> };
>
>
> Note the flags: METH_CLASS says that it's a class method and
> METH_COEXIST says that it should use this method instead of the slot.
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Python C API: how to mark a type as subclass of another type

2021-10-31 Thread Marco Sulla
I have two types declared as

PyTypeObject PyX_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)

etc.

How can I mark one of the types as a subclass of the other one? I tried
to use tp_base but it didn't work.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python C API: how to mark a type as subclass of another type

2021-11-02 Thread Marco Sulla
I already added the address of the type to tp_base, but it does not work.

On Mon, 1 Nov 2021 at 17:18, Dieter Maurer  wrote:
>
> Marco Sulla wrote at 2021-10-31 23:59 +0100:
> >I have two types declared as
> >
> >PyTypeObject PyX_Type = {
> >PyVarObject_HEAD_INIT(&PyType_Type, 0)
> >
> >etc.
> >
> >How can I mark one of the types as subclass of the other one? I tried
> >to use tp_base but it didn't work.
>
> Read the "Python/C Api" documentation. Watch out for `tp_base`.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python C API: how to mark a type as subclass of another type

2021-11-02 Thread Marco Sulla
*ahem* Evidently I didn't check the right package. It works like a charm :D

On Tue, 2 Nov 2021 at 13:43, Marco Sulla  wrote:
>
> I already added the address of the type to tp_base, but it does not work.
>
> On Mon, 1 Nov 2021 at 17:18, Dieter Maurer  wrote:
> >
> > Marco Sulla wrote at 2021-10-31 23:59 +0100:
> > >I have two types declared as
> > >
> > >PyTypeObject PyX_Type = {
> > >PyVarObject_HEAD_INIT(&PyType_Type, 0)
> > >
> > >etc.
> > >
> > >How can I mark one of the types as subclass of the other one? I tried
> > >to use tp_base but it didn't work.
> >
> > Read the "Python/C Api" documentation. Watch out for `tp_base`.
-- 
https://mail.python.org/mailman/listinfo/python-list


Py_IS_TYPE(op, &PyDict_Type) does not work on MacOS

2021-11-07 Thread Marco Sulla
As you can read here:

https://github.com/Marco-Sulla/python-frozendict/issues/37

Py_IS_TYPE(op, &PyDict_Type) did not work on MacOS. I had to use PyDict_Check.

Why don't I have this problem on Linux?

PS: since I'm creating a modified version of dict, I copied the dict
internal sources and I link against them. Maybe the problem is
related.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Py_IS_TYPE(op, &PyDict_Type) does not work on MacOS

2021-11-10 Thread Marco Sulla
Indeed now I use PyDict_Check, but anyway it's very strange that
Py_IS_TYPE(op, &PyDict_Type) does not work only on MacOS.

On Mon, 8 Nov 2021 at 19:30, Barry  wrote:
>
>
>
> On 8 Nov 2021, at 07:45, Marco Sulla  wrote:
>
> As you can read here:
>
> https://github.com/Marco-Sulla/python-frozendict/issues/37
>
> Py_IS_TYPE(op, &PyDict_Type) did not work on MacOS. I had to use PyDict_Check.
>
> Why don't I have this problem with Linux?
>
> PS: since I'm creating a modified version of dict, I copied the dict
> internal source and I link against them. Maybe the problem is
> correlated.
>
>
> You can see what I did for PyCXX at 
> https://sourceforge.net/p/cxx/code/HEAD/tree/trunk/CXX/Src/IndirectPythonInterface.cxx
>
> See the _DictCheck and ends up using PyObject_IsInstance.
>
> My guess is that use PyDict_Check is a good and better for the future.
>
> Barry
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Unable to compile my C Extension on Windows: unresolved external link errors

2021-11-12 Thread Marco Sulla
I have no problem compiling my extension under Linux and MacOS. Under
Windows, I get a lot of

Error LNK2001: unresolved external symbol PyErr_SetObject

and so on.

I post the part of my setup.py about the C Extension:

extra_compile_args = ["-DPY_SSIZE_T_CLEAN", "-DPy_BUILD_CORE"]
undef_macros = []

setuptools.Extension(
    ext1_fullname,
    sources = cpython_sources,
    include_dirs = cpython_include_dirs,
    extra_compile_args = extra_compile_args,
    undef_macros = undef_macros,
)

Here is the full code:
https://github.com/Marco-Sulla/python-frozendict/blob/master/setup.py

Steps to reproduce: I installed python3.10 and VS compiler on my
Windows 10 machine, then I created a venv, activated it, run

pip install -U pip setuptools wheel

and then

python setup.py bdist_wheel
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unable to compile my C Extension on Windows: unresolved external link errors

2021-11-12 Thread Marco Sulla
On Fri, 12 Nov 2021 at 15:55, Gisle Vanem  wrote:
> Marco Sulla wrote:
> > Error LNK2001: unresolved external symbol PyErr_SetObject
> >
> > and so on.
> >
> > I post the part of my setup.py about the C Extension:
> >
> > extra_compile_args = ["-DPY_SSIZE_T_CLEAN", "-DPy_BUILD_CORE"]
>
> Shouldn't this be "-DPy_BUILD_CORE_MODULE"?

I tried it, but now I get three

error C2099: initializer is not a constant

when I try to compile dictobject.c. Yes, my extension needs dictobject.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unable to compile my C Extension on Windows: unresolved external link errors

2021-11-12 Thread Marco Sulla
Chris? Maybe I'm dreaming X-D

On Fri, 12 Nov 2021 at 17:38, Chris Angelico  wrote:
> Are you sure that you really need Py_BUILD_CORE?

Yes, because I need the internal functions of `dict`. So I also need to
compile dictobject.c and include it. Hence that flag.

This is the code:

https://github.com/Marco-Sulla/python-frozendict.git

On Linux and MacOS it works like a charm. On Windows, it seems it does not find
python3.lib, even though I also added its path to the PATH variable.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unable to compile my C Extension on Windows: unresolved external link errors

2021-11-12 Thread Marco Sulla
On Fri, 12 Nov 2021 at 21:09, Chris Angelico  wrote:
>
> On Sat, Nov 13, 2021 at 7:01 AM Marco Sulla
>  wrote:
> > On Fri, 12 Nov 2021 at 17:38, Chris Angelico  wrote:
> > > Are you sure that you really need Py_BUILD_CORE?
> >
> > Yes, because I need the internal functions of `dict`. So I need to
> > compile also dictobject.c and include it. So I need that flag.
> >
> > This is the code:
> >
> > https://github.com/Marco-Sulla/python-frozendict.git
> >
>
> Ah, gotcha.
>
> Unfortunately that does mean you're delving deep into internals, and a
> lot of stuff that isn't designed for extensions to use. So my best
> recommendation is: dig even deeper into internals, and duplicate how
> the core is doing things (maybe including another header or
> something). It may be that, by declaring Py_BUILD_CORE, you're getting
> a macro version of that instead of the normal exported function.

I haven't understood what I have to do in practice, but anyway, as I
said, Py_BUILD_CORE works on Linux and MacOS. And it partly works on
Windows too: indeed dictobject.c is compiled. The only problem is in the
linking phase, when the two objects should be linked in one library,
_the_ library. It seems that on Windows it doesn't find python3.lib,
even if I put it in the path. So I get the `unresolved external link`
errors.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unable to compile my C Extension on Windows: unresolved external link errors

2021-11-13 Thread Marco Sulla
Sorry, the problem is that I downloaded the 32-bit version of the VS
compiler and the 64-bit version of Python.

On Sat, 13 Nov 2021 at 11:10, Barry Scott  wrote:
>
>
>
> > On 13 Nov 2021, at 09:00, Barry  wrote:
> >
> >
> >
> >> On 12 Nov 2021, at 22:53, Marco Sulla  wrote:
> >>
> >> It seems that on Windows it doesn't find python3.lib,
> >> even if I put it in the path. So I get the `unresolved external link`
> >> errors.
> >
> > I think you need the python310.lib (not sure of file name) to get to the 
> > internal symbols.
>
> Another thing that you will need to check is that the symbols you are after 
> have been
> exposed in the DLL at all. Being external in the source is not enough they 
> also have to
> listed in the .DLL's def file ( is that the right term?) as well.
>
> If its not clear yet, you are going to have to read a lot or source code and 
> understand
> the tool chain used on Windows to solve this.
>
>
> >
> > You can use the objdump(?) utility to check that the symbols are in the lib.
> >
> > Barry
>
> Barry
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unable to compile my C Extension on Windows: unresolved external link errors

2021-11-14 Thread Marco Sulla
Okay, now the problem seems to be a different one: I get the same "unresolved
external link" errors, but only for internal functions.

This seems quite normal. The public .lib does not expose the internals
of Python.
The strange thing is: why can I compile it on Linux and MacOS? Do their
shared libraries expose the internal functions?

Anyway, is there a way to compile Python on Windows in such a way that
I get a shared library that exposes all the functions?

On Sat, 13 Nov 2021 at 12:17, Marco Sulla  wrote:
>
> Sorry, the problem is that I downloaded the 32-bit version of the VS
> compiler and the 64-bit version of Python.
>
> On Sat, 13 Nov 2021 at 11:10, Barry Scott  wrote:
> >
> >
> >
> > > On 13 Nov 2021, at 09:00, Barry  wrote:
> > >
> > >
> > >
> > >> On 12 Nov 2021, at 22:53, Marco Sulla  
> > >> wrote:
> > >>
> > >> It seems that on Windows it doesn't find python3.lib,
> > >> even if I put it in the path. So I get the `unresolved external link`
> > >> errors.
> > >
> > > I think you need the python310.lib (not sure of file name) to get to the 
> > > internal symbols.
> >
> > Another thing that you will need to check is that the symbols you are after 
> > have been
> > exposed in the DLL at all. Being external in the source is not enough they 
> > also have to
> > listed in the .DLL's def file ( is that the right term?) as well.
> >
> > If its not clear yet, you are going to have to read a lot or source code 
> > and understand
> > the tool chain used on Windows to solve this.
> >
> >
> > >
> > > You can use the objdump(?) utility to check that the symbols are in the 
> > > lib.
> > >
> > > Barry
> >
> > Barry
> >


Re: Unable to compile my C Extension on Windows: unresolved external link errors

2021-11-14 Thread Marco Sulla
On Sun, 14 Nov 2021 at 16:42, Barry Scott  wrote:
>
> Sorry iPad sent the message before it was complete...
>
> > On 14 Nov 2021, at 10:38, Marco Sulla  wrote:
> >
> > Okay, now the problem seems to be another: I get the same "unresolved
> > external link" errors, but only for internal functions.
> >
> > This seems quite normal. The public .lib does not expose the internals
> > of Python.
> > The strange fact is: why can I compile it on Linux and MacOS? Their
> > external libraries expose the internal functions?
>
> Windows is not Linux is not macOS,
> The toolchain on each OS has its own strengths, weaknesses and quirks.
>
> On Windows DLLs only allow access to the symbols that are explicitly listed 
> to be access.

Where are those symbols listed?

> On macOS .dynlib and Unix .so its being extern that does this.

And extern is the default. I understand now.

> Maybe you could copy the code that you want and add it to your code?
> Change any conflicting symbols of course.

It's quite hard. I have to compile dictobject.c, which needs a lot of
internal functions. And I suppose that every internal function may
require 1 or more other internal functions.

I have two other solutions:
* compile a whole python DLL with the symbols I need and link against
it. I have to put this DLL in my code, which is ugly.
* drop the support of the C Extension for Windows users and make for
them the slow, pure py version only.

Since my interest in Windows now is near to zero, I think I'll opt for
the third for now.


Re: How to support annotations for a custom type in a C extension?

2021-11-18 Thread Marco Sulla
It works. Thanks a lot.

On Sun, 19 Sept 2021 at 19:23, Serhiy Storchaka  wrote:
>
> 19.09.21 05:59, MRAB writes:
> > On 2021-09-18 16:09, Serhiy Storchaka wrote:
> >> "(PyCFunction)" is redundant, Py_GenericAlias already has the right
> >> type. Overuse of casting to PyCFunction can hide actual bugs.
> >>
> > I borrowed that from listobject.c, which does have the cast.
>
> Fixed. https://github.com/python/cpython/pull/28450
>
> --
> https://mail.python.org/mailman/listinfo/python-list
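For readers hitting the same question: the C-level recipe discussed here (a `__class_getitem__` slot entry pointing at `Py_GenericAlias`) has a documented Python-level equivalent. The `Bag` class below is a made-up illustration, not code from the thread:

```python
from types import GenericAlias

class Bag:
    """Toy container; Bag[int] now works in annotations."""
    # Python-level analogue of registering Py_GenericAlias as the
    # __class_getitem__ method of a C extension type.
    __class_getitem__ = classmethod(GenericAlias)

alias = Bag[int]
assert alias.__origin__ is Bag
assert alias.__args__ == (int,)
```

`types.GenericAlias` is available since Python 3.9.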


pytest segfault, not with -v

2021-11-19 Thread Marco Sulla
I have a battery of tests done with pytest. My tests break with a
segfault if I run them normally. If I run them using pytest -v, the
segfault does not happen.

What could cause this quantum-like phenomenon?


Re: getting source code line of error?

2021-11-19 Thread Marco Sulla
Have you tried the logger module and the format options?

On Fri, 19 Nov 2021 at 19:09, Ulli Horlacher
 wrote:
>
> I am trying to get the source code line of the last error.
> I know traceback.format_exc() but this contains much more information, e.g.:
>
> Traceback (most recent call last):
>   File "./error.py", line 18, in main
> x=1/0
> ZeroDivisionError: division by zero
>
> I could extract the source code line with re.search(), but is there an
> easier way?
>
>
> I have:
>
>   exc_type,exc_str,exc_tb = sys.exc_info()
>   fname = exc_tb.tb_frame.f_code.co_filename
>   line = exc_tb.tb_lineno
>   print('%s in %s line %d' % (exc_str,fname,line))
>
> But I also want to output the line itself, not only its number.
>
> --
> Ullrich Horlacher  Server und Virtualisierung
> Rechenzentrum TIK
> Universitaet Stuttgart E-Mail: horlac...@tik.uni-stuttgart.de
> Allmandring 30aTel:++49-711-68565868
> 70569 Stuttgart (Germany)  WWW:http://www.tik.uni-stuttgart.de/
> --
> https://mail.python.org/mailman/listinfo/python-list
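For the record, the stdlib already exposes what the original poster wants: `traceback.extract_tb()` returns `FrameSummary` objects whose `.line` attribute is the stripped source line itself, so no `re.search()` is needed. A sketch building on the `sys.exc_info()` snippet above (note that `.line` can be `None` when the source is unavailable, e.g. for code executed from a string):

```python
import sys
import traceback

def last_error_line():
    """Return (filename, lineno, source_line) of the innermost frame."""
    _exc_type, _exc_value, exc_tb = sys.exc_info()
    frame = traceback.extract_tb(exc_tb)[-1]
    # frame.line is the source line text, looked up via linecache.
    return frame.filename, frame.lineno, frame.line

try:
    x = 1 / 0
except ZeroDivisionError:
    fname, lineno, line = last_error_line()
    print('%s in %s line %d' % (line, fname, lineno))
```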


frozenset can be altered by |=

2021-11-19 Thread Marco Sulla
(venv_3_10) marco@buzz:~$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = frozenset((3, 4))
>>> a
frozenset({3, 4})
>>> a |= {5,}
>>> a
frozenset({3, 4, 5})


Re: frozenset can be altered by |=

2021-11-19 Thread Marco Sulla
Mh. Now I'm thinking that I've done

a = "Marco "
a += "Sulla"

many times without bothering.

On Fri, 19 Nov 2021 at 22:22, Chris Angelico  wrote:
>
> On Sat, Nov 20, 2021 at 8:16 AM Chris Angelico  wrote:
> >
> > On Sat, Nov 20, 2021 at 8:13 AM Marco Sulla
> >  wrote:
> > >
> > > (venv_3_10) marco@buzz:~$ python
> > > Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
> > > [GCC 10.1.1 20200718] on linux
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>> a = frozenset((3, 4))
> > > >>> a
> > > frozenset({3, 4})
> > > >>> a |= {5,}
> > > >>> a
> > > frozenset({3, 4, 5})
> >
> > That's the same as how "x = 4; x += 1" can "alter" four into five.
> >
> > >>> a = frozenset((3, 4))
> > >>> id(a), a
> > (140545764976096, frozenset({3, 4}))
> > >>> a |= {5,}
> > >>> id(a), a
> > (140545763014944, frozenset({3, 4, 5}))
> >
> > It's a different frozenset.
> >
>
> Oh, even better test:
>
> >>> a = frozenset((3, 4)); b = a
> >>> id(a), a, id(b), b
> (140602825123296, frozenset({3, 4}), 140602825123296, frozenset({3, 4}))
> >>> a |= {5,}
> >>> id(a), a, id(b), b
> (140602825254144, frozenset({3, 4, 5}), 140602825123296, frozenset({3, 4}))
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list


Re: pytest segfault, not with -v

2021-11-19 Thread Marco Sulla
On Fri, 19 Nov 2021 at 20:38, MRAB  wrote:
>
> On 2021-11-19 17:48, Marco Sulla wrote:
> > I have a battery of tests done with pytest. My tests break with a
> > segfault if I run them normally. If I run them using pytest -v, the
> > segfault does not happen.
> >
> > What could cause this quantical phenomenon?
> >
> Are you testing an extension that you're compiling? That kind of problem
> can occur if there's an uninitialised variable or incorrect reference
> counting (Py_INCREF/Py_DECREF).

Ok, I know. But why can't it be reproduced if I do pytest -v? This way
I don't know which test fails.
Furthermore I noticed that if I remove the __pycache__ dir of tests,
pytest does not crash until I re-run it with the __pycache__ dir
present.
This makes it very hard for me to understand what caused the segfault.
I'm starting to think pytest is not good for testing C extensions.


Re: pytest segfault, not with -v

2021-11-20 Thread Marco Sulla
Indeed I have introduced a command line parameter in my bench.py
script that simply specifies the number of times the benchmarks are
performed. This way I have a sort of segfault checker.

But I don't bench any part of the library. I suppose I have to create
a separate script that does a simple loop for all the cases, and
remove the optional parameter from bench. How boring.
PS: is there a way to monitor the Python consumed memory inside Python
itself? In this way I could also trap memory leaks.

On Sat, 20 Nov 2021 at 01:46, MRAB  wrote:
>
> On 2021-11-19 23:44, Marco Sulla wrote:
> > On Fri, 19 Nov 2021 at 20:38, MRAB  wrote:
> >>
> >> On 2021-11-19 17:48, Marco Sulla wrote:
> >> > I have a battery of tests done with pytest. My tests break with a
> >> > segfault if I run them normally. If I run them using pytest -v, the
> >> > segfault does not happen.
> >> >
> >> > What could cause this quantical phenomenon?
> >> >
> >> Are you testing an extension that you're compiling? That kind of problem
> >> can occur if there's an uninitialised variable or incorrect reference
> >> counting (Py_INCREF/Py_DECREF).
> >
> > Ok, I know. But why can't it be reproduced if I do pytest -v? This way
> > I don't know which test fails.
> > Furthermore I noticed that if I remove the __pycache__ dir of tests,
> > pytest does not crash, until I re-ran it with the __pycache__ dir
> > present.
> > This way is very hard for me to understand what caused the segfault.
> > I'm starting to think pytest is not good for testing C extensions.
> >
> If there are too few Py_INCREF or too many Py_DECREF, it'll free the
> object too soon, and whether or when that will cause a segfault will
> depend on whatever other code is running. That's the nature of the
> beast: it's unpredictable!
>
> You could try running each of the tests in a loop to see which one
> causes a segfault. (Trying several in a loop will let you narrow it down
> more quickly.)
>
> pytest et al. are good for testing behaviour, but not for narrowing down
> segfaults.
> --
> https://mail.python.org/mailman/listinfo/python-list
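MRAB's narrowing-down advice can be partially automated: run each test in its own interpreter process, so a crash kills only that child, and its exit status identifies the signal. The helper below is a hypothetical sketch (the `find_crashing_case` name and the pytest command line are illustrative, not from the thread); on POSIX a negative returncode -N means the child died from signal N (e.g. -11 for SIGSEGV):

```python
import subprocess
import sys

def find_crashing_case(cases, make_cmd):
    """Run one subprocess per case; return (case, signal) pairs for
    cases whose process was killed by a signal."""
    crashed = []
    for case in cases:
        proc = subprocess.run(make_cmd(case),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        if proc.returncode < 0:   # killed by signal -returncode
            crashed.append((case, -proc.returncode))
    return crashed

# e.g. one pytest node id per process (assumes pytest is installed):
# crashed = find_crashing_case(
#     node_ids, lambda c: [sys.executable, "-m", "pytest", "-x", c])
```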


No right operator in tp_as_number?

2021-11-20 Thread Marco Sulla
I checked the documentation:
https://docs.python.org/3/c-api/typeobj.html#number-structs
and it seems that, in the Python C API, the right operators do not exist.
For example, there is nb_add, that in Python is __add__, but there's
no nb_right_add, that in Python is __radd__

Am I missing something?
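The answer is that the C number protocol has no reflected slots by design. For `v + w` the interpreter tries `type(v)->nb_add(v, w)` and, if that returns `Py_NotImplemented`, `type(w)->nb_add(v, w)` with the operands in the same order; a single slot therefore has to cover both `__add__` and `__radd__` by checking both operand types. A Python-level sketch of the same contract (`Meters` is a made-up example, not from the thread):

```python
class Meters:
    """Python-level analogue of one nb_add slot covering both sides."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Like an nb_add slot, check the type of both operands.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented   # let the other operand's slot try

    # Addition is symmetric here, so the reflected form reuses it.
    __radd__ = __add__

assert (Meters(2) + 3).value == 5
assert (3 + Meters(2)).value == 5
```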


Re: pytest segfault, not with -v

2021-11-20 Thread Marco Sulla
I know how to check the refcounts, but I don't know how to check the
memory usage, since it's not a program, it's a simple library. Is
there not a way to check inside Python the memory usage? I have to use
a bash script (I'm on Linux)?

On Sat, 20 Nov 2021 at 19:00, MRAB  wrote:
>
> On 2021-11-20 17:40, Marco Sulla wrote:
> > Indeed I have introduced a command line parameter in my bench.py
> > script that simply specifies the number of times the benchmarks are
> > performed. This way I have a sort of segfault checker.
> >
> > But I don't bench any part of the library. I suppose I have to create
> > a separate script that does a simple loop for all the cases, and
> > remove the optional parameter from bench. How boring.
> > PS: is there a way to monitor the Python consumed memory inside Python
> > itself? In this way I could also trap memory leaks.
> >
> I'm on Windows 10, so I debug in Microsoft Visual Studio. I also have a
> look at the memory usage in Task Manager. If the program uses more
> memory when there are more iterations, then that's a sign of a memory
> leak. For some objects I'd look at the reference count to see if it's
> increasing or decreasing for each iteration when it should be constant
> over time.
>
> > On Sat, 20 Nov 2021 at 01:46, MRAB  wrote:
> >>
> >> On 2021-11-19 23:44, Marco Sulla wrote:
> >> > On Fri, 19 Nov 2021 at 20:38, MRAB  wrote:
> >> >>
> >> >> On 2021-11-19 17:48, Marco Sulla wrote:
> >> >> > I have a battery of tests done with pytest. My tests break with a
> >> >> > segfault if I run them normally. If I run them using pytest -v, the
> >> >> > segfault does not happen.
> >> >> >
> >> >> > What could cause this quantical phenomenon?
> >> >> >
> >> >> Are you testing an extension that you're compiling? That kind of problem
> >> >> can occur if there's an uninitialised variable or incorrect reference
> >> >> counting (Py_INCREF/Py_DECREF).
> >> >
> >> > Ok, I know. But why can't it be reproduced if I do pytest -v? This way
> >> > I don't know which test fails.
> >> > Furthermore I noticed that if I remove the __pycache__ dir of tests,
> >> > pytest does not crash, until I re-ran it with the __pycache__ dir
> >> > present.
> >> > This way is very hard for me to understand what caused the segfault.
> >> > I'm starting to think pytest is not good for testing C extensions.
> >> >
> >> If there are too few Py_INCREF or too many Py_DECREF, it'll free the
> >> object too soon, and whether or when that will cause a segfault will
> >> depend on whatever other code is running. That's the nature of the
> >> beast: it's unpredictable!
> >>
> >> You could try running each of the tests in a loop to see which one
> >> causes a segfault. (Trying several in a loop will let you narrow it down
> >> more quickly.)
> >>
> >> pytest et al. are good for testing behaviour, but not for narrowing down
> >> segfaults.
> >
> --
> https://mail.python.org/mailman/listinfo/python-list
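On the "monitor the consumed memory inside Python itself" question: the stdlib `tracemalloc` module can do this in-process by diffing snapshots; a diff that keeps growing across iterations is a leak indicator. A minimal sketch (the list comprehension stands in for one iteration of the suspect code):

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... run one iteration of the suspect code here ...
retained = [bytes(1000) for _ in range(1000)]   # simulated leak (~1 MB)

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)   # biggest allocation growth, grouped by source line

current, peak = tracemalloc.get_traced_memory()
print("current=%d bytes, peak=%d bytes" % (current, peak))
tracemalloc.stop()
```

Run the snapshot pair around each loop iteration and compare the totals over time.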


Re: frozenset can be altered by |=

2021-11-22 Thread Marco Sulla
Yes, and you do this regularly. Indeed integers, for example, are immutable, and

a = 0
a += 1

is something you do dozens of times, and you simply don't think that
another object is created and substituted for the variable named `a`.

On Mon, 22 Nov 2021 at 14:59, Chris Angelico  wrote:
>
> On Tue, Nov 23, 2021 at 12:52 AM David Raymond  
> wrote:
> > It is a little confusing since the docs list this in a section that says 
> > they don't apply to frozensets, and lists the two versions next to each 
> > other as the same thing.
> >
> > https://docs.python.org/3.9/library/stdtypes.html#set-types-set-frozenset
> >
> > The following table lists operations available for set that do not apply to 
> > immutable instances of frozenset:
> >
> > update(*others)
> > set |= other | ...
> >
> > Update the set, adding elements from all others.
>
> Yeah, it's a little confusing, but at the language level, something
> that doesn't support |= will implicitly support it using the expanded
> version:
>
> a |= b
> a = a | b
>
> and in the section above, you can see that frozensets DO support the
> Or operator.
>
> By not having specific behaviour on the |= operator, frozensets
> implicitly fall back on this default.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list


Re: frozenset can be altered by |=

2021-11-29 Thread Marco Sulla
I must say that I'm reading the documentation now, and it's a bit
confusing. In the docs, in-place operators such as |= should not work:
they are listed under the set-only functions and operators. But, as we
saw, this is not completely true: they work, but they don't mutate the
original object. The same goes for += and *=, which are listed under
`list` only.

On Mon, 22 Nov 2021 at 19:54, Marco Sulla  wrote:
>
> Yes, and you do this regularly. Indeed integers, for example, are immutables 
> and
>
> a = 0
> a += 1
>
> is something you do dozens of times, and you simply don't think that
> another object is created and substituted for the variable named `a`.
>
> On Mon, 22 Nov 2021 at 14:59, Chris Angelico  wrote:
> >
> > On Tue, Nov 23, 2021 at 12:52 AM David Raymond  
> > wrote:
> > > It is a little confusing since the docs list this in a section that says 
> > > they don't apply to frozensets, and lists the two versions next to each 
> > > other as the same thing.
> > >
> > > https://docs.python.org/3.9/library/stdtypes.html#set-types-set-frozenset
> > >
> > > The following table lists operations available for set that do not apply 
> > > to immutable instances of frozenset:
> > >
> > > update(*others)
> > > set |= other | ...
> > >
> > > Update the set, adding elements from all others.
> >
> > Yeah, it's a little confusing, but at the language level, something
> > that doesn't support |= will implicitly support it using the expanded
> > version:
> >
> > a |= b
> > a = a | b
> >
> > and in the section above, you can see that frozensets DO support the
> > Or operator.
> >
> > By not having specific behaviour on the |= operator, frozensets
> > implicitly fall back on this default.
> >
> > ChrisA
> > --
> > https://mail.python.org/mailman/listinfo/python-list
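The point made in this thread — that `|=` on a frozenset rebinds the name rather than mutating the object — can be checked directly:

```python
a = frozenset({3, 4})
b = a                     # second name for the same object
a |= {5}                  # sugar for: a = a | {5}

assert a == frozenset({3, 4, 5})
assert b == frozenset({3, 4})   # the original object is unchanged
assert a is not b

# and frozenset really has no in-place update method:
assert not hasattr(frozenset, "update")
```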


Re: pytest segfault, not with -v

2021-12-18 Thread Marco Sulla
Ok, I created the script:

https://github.com/Marco-Sulla/python-frozendict/blob/master/test/debug.py

The problem is it does _not_ crash, while I get a segfault using
pytest with python 3.9 on MacOS 10.15

Maybe it's because I'm using eval / exec in the script?

On Sat, 20 Nov 2021 at 18:40, Marco Sulla  wrote:
>
> Indeed I have introduced a command line parameter in my bench.py
> script that simply specifies the number of times the benchmarks are
> performed. This way I have a sort of segfault checker.
>
> But I don't bench any part of the library. I suppose I have to create
> a separate script that does a simple loop for all the cases, and
> remove the optional parameter from bench. How boring.
> PS: is there a way to monitor the Python consumed memory inside Python
> itself? In this way I could also trap memory leaks.
>
> On Sat, 20 Nov 2021 at 01:46, MRAB  wrote:
> >
> > On 2021-11-19 23:44, Marco Sulla wrote:
> > > On Fri, 19 Nov 2021 at 20:38, MRAB  wrote:
> > >>
> > >> On 2021-11-19 17:48, Marco Sulla wrote:
> > >> > I have a battery of tests done with pytest. My tests break with a
> > >> > segfault if I run them normally. If I run them using pytest -v, the
> > >> > segfault does not happen.
> > >> >
> > >> > What could cause this quantical phenomenon?
> > >> >
> > >> Are you testing an extension that you're compiling? That kind of problem
> > >> can occur if there's an uninitialised variable or incorrect reference
> > >> counting (Py_INCREF/Py_DECREF).
> > >
> > > Ok, I know. But why can't it be reproduced if I do pytest -v? This way
> > > I don't know which test fails.
> > > Furthermore I noticed that if I remove the __pycache__ dir of tests,
> > > pytest does not crash, until I re-ran it with the __pycache__ dir
> > > present.
> > > This way is very hard for me to understand what caused the segfault.
> > > I'm starting to think pytest is not good for testing C extensions.
> > >
> > If there are too few Py_INCREF or too many Py_DECREF, it'll free the
> > object too soon, and whether or when that will cause a segfault will
> > depend on whatever other code is running. That's the nature of the
> > beast: it's unpredictable!
> >
> > You could try running each of the tests in a loop to see which one
> > causes a segfault. (Trying several in a loop will let you narrow it down
> > more quickly.)
> >
> > pytest et al. are good for testing behaviour, but not for narrowing down
> > segfaults.
> > --
> > https://mail.python.org/mailman/listinfo/python-list


Re: pytest segfault, not with -v

2021-12-18 Thread Marco Sulla
Emh, maybe I was not clear. I created a C extension and it segfaults.
So I created that script to see where it segfaults. But the script
does not segfault. My doubt is: is that because I'm using eval and
exec in the script?

On Sat, 18 Dec 2021 at 18:33, Dieter Maurer  wrote:
>
> Marco Sulla wrote at 2021-12-18 14:10 +0100:
> >Ok, I created the script:
> >
> >https://github.com/Marco-Sulla/python-frozendict/blob/master/test/debug.py
> >
> >The problem is it does _not_ crash, while a get a segfault using
> >pytest with python 3.9 on MacOS 10.15
> >
> >Maybe it's because I'm using eval / exec in the script?
>
> Segfaults can result from C stack overflow which in turn can
> be caused in special cases by too deeply nested function calls
> (usually, Python's "maximal recursion depth exceeded" prevents
> this before a C stack overflow).
>
> Otherwise, whatever you do in Python (this includes "eval/exec")
> should not cause a segfault. The cause for it likely comes from
> a memory management bug in some C implemented part of your
> application.
>
> Note that memory management bugs may not show deterministic
> behavior. Minor changes (such as "with/without -v")
> can significantly change the outcome.


Py_TRASHCAN_SAFE_BEGIN/END in C extension?

2021-12-21 Thread Marco Sulla
In Python 3.7, must Py_TRASHCAN_SAFE_BEGIN - Py_TRASHCAN_SAFE_END be
used in a C extension?

I'm asking because in my C extension I use them in the deallocator
without problems, but users signalled me that they segfault in Python
3.7 on Debian 10. I checked and this is true.


Re: Py_TRASHCAN_SAFE_BEGIN/END in C extension?

2021-12-22 Thread Marco Sulla
Yes, it's deprecated, but I need it for Python 3.7, since
Py_TRASHCAN_BEGIN / Py_TRASHCAN_END did not exist yet.

On Tue, 21 Dec 2021 at 23:22, Barry  wrote:
>
>
>
> On 21 Dec 2021, at 22:08, Marco Sulla  wrote:
>
> In Python 3.7, must Py_TRASHCAN_SAFE_BEGIN - Py_TRASHCAN_SAFE_END be
> used in a C extension?
>
> I'm asking because in my C extension I use them in the deallocator
> without problems, but users signalled me that they segfault in Python
> 3.7 on Debian 10. I checked and this is true.
>
>
> I searched the web for Py_TRASHCAN_SAFE_BEGIN
> And that quickly lead me to this bug.
>
> https://bugs.python.org/issue40608.
>
> That gives lots of clues for what might be the problem.
> It seems that is a deprecated api.
>
> Barry
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>


Re: recover pickled data: pickle data was truncated

2021-12-26 Thread Marco Sulla
Use a semaphore.

On Sun, 26 Dec 2021 at 03:30, iMath  wrote:
>
> Normally, the shelve data should be read and write by only one process at a 
> time, but unfortunately it was simultaneously read and write by two 
> processes, thus corrupted it. Is there any way to recover all data in it ? 
> Currently I just get "pickle data was truncated" exception after reading a 
> portion of the data?
>
> Data and code here 
> :https://drive.google.com/file/d/137nJFc1TvOge88EjzhnFX9bXg6vd0RYQ/view?usp=sharing
> --
> https://mail.python.org/mailman/listinfo/python-list


What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-26 Thread Marco Sulla
I have to use _PyObject_GC_IS_TRACKED(). It can't be used unless you
define Py_BUILD_CORE. I want to avoid this. What macro or function can
substitute it?


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-27 Thread Marco Sulla
I need it since I'm developing an immutable dict. And in dict that
function is used.

I do not understand why there's no public API for that function. It
seems very useful.

On Sun, 26 Dec 2021 at 17:28, Barry Scott  wrote:
>
>
>
> > On 26 Dec 2021, at 13:48, Marco Sulla  wrote:
> >
> > I have to use _PyObject_GC_IS_TRACKED(). It can't be used unless you
> > define Py_BUILD_CORE. I want to avoid this. What macro or function can
> > substitute it?
>
> Why is this needed by your code? Surely the GC does its thing as an 
> implementation detail of python.
>
> Barry
>
>
> > --
> > https://mail.python.org/mailman/listinfo/python-list
> >
>


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-27 Thread Marco Sulla
Hi, Inada Senpai. So I do not need PyObject_GC_Track on cloning or
merging, or MAINTAIN_TRACKING on insert?

On Tue, 28 Dec 2021 at 07:58, Inada Naoki  wrote:
>
> On Tue, Dec 28, 2021 at 3:31 AM Marco Sulla
>  wrote:
> >
> > I need it since I'm developing an immutable dict. And in dict that
> > function is used.
> >
> > I do not understand why there's no public API for that function. It
> > seems very useful.
> >
>
> I think it is useful only for optimization based on *current* Python 
> internals.
> That's why it is not a public API. If we expose it as public API, it
> makes harder to change Python's GC internals.
>
>
> --
> Inada Naoki  


Option for venv to upgrade pip automatically?

2021-12-28 Thread Marco Sulla
I think it's very boring that, after creating a venv, you have
immediately to do every time:

pip install -U pip

Can't venv have an option for doing this automatically or, better, a
config file where you can put commands that will be launched every
time after you create a venv?


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-28 Thread Marco Sulla
On Tue, 28 Dec 2021 at 12:38, Inada Naoki  wrote:
> Your case is special.
> You want to create a frozendict which performance is same to builtin dict.
> Builtin dict has special optimization which tightly coupled with
> current CPython implementation.
> So you need to use private APIs for MAINTAIN_TRACKING.

I solved this problem with a hacky trick: I included a reduced and
slightly modified version of dictobject.c. Furthermore I copy / pasted
stringlib\eq.h and _Py_bit_length. I'm currently doing this in a
refactor branch.
(Yes, I know that including a .c is very bad... but I need to do this
to separate the code of dict from the code of frozendict. Putting it
all in the same files messes with my head.)

> But PyObject_GC_Track() is a public API.

The problem is I can't invoke PyObject_GC_Track() on an already
tracked object. I tried it and Python segfaulted. That's why CPython
uses _PyObject_GC_IS_TRACKED() before.

I'll try to copy/paste it too... :D but I do not understand why
there's not a public version of it.


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-28 Thread Marco Sulla
On Wed, 29 Dec 2021 at 00:03, Dieter Maurer  wrote:
> Why do you not derive from `dict` and override its mutating methods
> (to raise a type error after initialization is complete)?

I've done this for the pure py version, for speed. But in this way,
frozendict results to be a subclass of MutableMapping.


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-28 Thread Marco Sulla
On Wed, 29 Dec 2021 at 07:46, Inada Naoki  wrote:
> You are right. I thought PyObject_GC_Track() can be used to tracked
> objects because PyObject_GC_Untrack() can be used untracked object.
> I think there is no enough reason for this asymmetry.
>
> Additionally, adding PyObject_GC_IsTracked() to public API will not
> bother future Python improvements.
> If Python changed its GC to mark-and-sweep, PyObject_GC_IsTracked()
> can return true always.

I think you are right :)
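Worth noting: at the Python level this query has long been public as `gc.is_tracked()`, and it makes the dict optimization discussed in this thread visible. The behaviour below is CPython-specific (it is an implementation detail, per the `gc` docs):

```python
import gc

# CPython optimization: dicts holding only atomic (non-container)
# keys and values may be left untracked by the cyclic GC.
assert not gc.is_tracked({})
assert not gc.is_tracked({"a": 1})
assert gc.is_tracked({"a": []})   # container value -> tracked

d = {"a": 1}
d["b"] = []                       # inserting a container starts tracking
assert gc.is_tracked(d)
```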


Re: Option for venv to upgrade pip automatically?

2021-12-29 Thread Marco Sulla
Cool, thanks!

On Wed, 29 Dec 2021 at 07:10, Inada Naoki  wrote:
>
> You can use --upgrade-deps option. My alias is:
>
>   alias mkvenv='python3 -m venv --upgrade-deps --prompt . venv'
>
> On Wed, Dec 29, 2021 at 4:55 AM Marco Sulla
>  wrote:
> >
> > I think it's very boring that, after creating a venv, you have
> > immediately to do every time:
> >
> > pip install -U pip
> >
> > Can't venv have an option for doing this automatically or, better, a
> > config file where you can put commands that will be launched every
> > time after you create a venv?
> > --
> > https://mail.python.org/mailman/listinfo/python-list
>
>
>
> --
> Inada Naoki  


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 09:12, Dieter Maurer  wrote:
>
> Marco Sulla wrote at 2021-12-29 08:08 +0100:
> >On Wed, 29 Dec 2021 at 00:03, Dieter Maurer  wrote:
> >> Why do you not derive from `dict` and override its mutating methods
> >> (to raise a type error after initialization is complete)?
> >
> >I've done this for the pure py version, for speed. But in this way,
> >frozendict results to be a subclass of MutableMapping.
>
> `MutableMapping` is a so called abstract base class (--> `abc`).
>
> It uses the `__subclass_check__` (and `__instance_check__`) of
> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> Those can be customized by overriding `MutableMapping.__subclasshook__`
> to ensure that your `frozendict` class (and their subclasses)
> are not considered subclasses of `MutableMapping`.

Emh. Too hacky for me too, sorry :D


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On second thought, I think I'll do this for the pure py version. But I
will definitely not do this for the C extension, since it's anyway
strange that an immutable mapping inherits from a mutable one! I've
done it in the pure py version only for a matter of speed.

On Wed, 29 Dec 2021 at 09:24, Marco Sulla  wrote:
>
> On Wed, 29 Dec 2021 at 09:12, Dieter Maurer  wrote:
> >
> > Marco Sulla wrote at 2021-12-29 08:08 +0100:
> > >On Wed, 29 Dec 2021 at 00:03, Dieter Maurer  wrote:
> > >> Why do you not derive from `dict` and override its mutating methods
> > >> (to raise a type error after initialization is complete)?
> > >
> > >I've done this for the pure py version, for speed. But in this way,
> > >frozendict results to be a subclass of MutableMapping.
> >
> > `MutableMapping` is a so called abstract base class (--> `abc`).
> >
> > It uses the `__subclass_check__` (and `__instance_check__`) of
> > `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> > Those can be customized by overriding `MutableMapping.__subclasshook__`
> > to ensure that your `frozendict` class (and their subclasses)
> > are not considered subclasses of `MutableMapping`.
>
> Emh. Too hacky for me too, sorry :D


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 10:06, Dieter Maurer  wrote:
>
> Are you sure you need to implement your type in C at all?

It's already implemented, and, in some cases, is faster than dict:

https://github.com/Marco-Sulla/python-frozendict#benchmarks

PS: I'm doing a refactoring that speeds up creation even further,
making it almost as fast as dict.


How to implement freelists in dict 3.10 for previous versions?

2021-12-29 Thread Marco Sulla
I noticed that now freelists in dict use _Py_dict_state. I suppose
this is done for thread safety.

I would like to implement it also in a C extension that supports
CPython < 3.10. How can I achieve this?


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 09:12, Dieter Maurer  wrote:
> `MutableMapping` is a so called abstract base class (--> `abc`).
>
> It uses the `__subclass_check__` (and `__instance_check__`) of
> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> Those can be customized by overriding `MutableMapping.__subclasshook__`
> to ensure that your `frozendict` class (and their subclasses)
> are not considered subclasses of `MutableMapping`.

It does not work:

$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import frozendict
>>> frozendict.c_ext
False
>>> from frozendict import frozendict as fd
>>> from collections.abc import MutableMapping as Mm
>>> issubclass(fd, Mm)
True
>>> @classmethod
... def _my_subclasshook(klass, subclass):
... if subclass == fd:
... return False
... return NotImplemented
...
>>> @classmethod
... def _my_subclasshook(klass, subclass):
... print(subclass)
... if subclass == fd:
... return False
... return NotImplemented
...
>>> Mm.__subclasshook__ = _my_subclasshook
>>> issubclass(fd, Mm)
True
>>> issubclass(tuple, Mm)






False
>>>


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 12:11, Dieter Maurer  wrote:
>
> Marco Sulla wrote at 2021-12-29 11:59 +0100:
> >On Wed, 29 Dec 2021 at 09:12, Dieter Maurer  wrote:
> >> `MutableMapping` is a so called abstract base class (--> `abc`).
> >>
> >> It uses the `__subclass_check__` (and `__instance_check__`) of
> >> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> >> Those can be customized by overriding `MutableMapping.__subclasshook__`
> >> to ensure that your `frozendict` class (and their subclasses)
> >> are not considered subclasses of `MutableMapping`.
> >
> >It does not work:
> > ...
> >>>> issubclass(fd, Mm)
> >True
>
> There is a cache involved. The `issubclass` above
> brought your `fd` into `Mm`'s subclass cache.

It works, thank you! I had to put it before

Mapping.register(frozendict)
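
For readers who hit the same problem, here is a minimal self-contained sketch of the caching behaviour (the `Frozen` class is hypothetical, standing in for frozendict): the hook must be installed before the first `issubclass()` call, because `ABCMeta` caches the result of that first check.

```python
from collections.abc import MutableMapping

class Frozen(dict):
    """Hypothetical stand-in for a frozendict-like type."""

@classmethod
def _hook(cls, subclass):
    # Veto only our type; defer to the normal machinery otherwise.
    if subclass is Frozen:
        return False
    return NotImplemented

# Install the hook BEFORE any issubclass() call against MutableMapping:
# ABCMeta caches the first result, so a hook added later has no effect.
MutableMapping.__subclasshook__ = _hook

print(issubclass(Frozen, MutableMapping))  # False
print(issubclass(dict, MutableMapping))    # True (dict stays registered)
```

Note this mutates the shared ABC for the whole process; in real code you would restore the original hook or confine the trick to your own ABCs.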
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: recover pickled data: pickle data was truncated

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 18:33, iMath  wrote:
> But I found the size of the shelve data file didn't change much, so I
> guess the data is still in it; I just wonder if there is any way to recover my data.

I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling
it by hand is harsh work and maybe unreliable.

Is there any reason you can't simply add a semaphore to avoid writing
at the same time and re-run the code and regenerate the data?
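
For the record, a minimal sketch of what such locking between processes could look like, using an advisory `fcntl.flock` lock on POSIX (file names are hypothetical; Windows would need a different mechanism):

```python
import fcntl
import pickle

DATA_PATH = "data.pickle"       # hypothetical data file
LOCK_PATH = "data.pickle.lock"  # separate lock file shared by all processes

def locked_write(obj):
    with open(LOCK_PATH, "w") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_EX)  # exclusive: blocks other holders
        try:
            with open(DATA_PATH, "wb") as f:
                pickle.dump(obj, f)
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)

def locked_read():
    with open(LOCK_PATH, "w") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_SH)  # shared: readers may overlap
        try:
            with open(DATA_PATH, "rb") as f:
                return pickle.load(f)
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)

locked_write({"counter": 1})
print(locked_read())  # {'counter': 1}
```

The lock is advisory: it only protects the file if every process that touches it takes the same lock.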
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: builtins.TypeError: catching classes that do not inherit from BaseException is not allowed

2021-12-31 Thread Marco Sulla
It was already done: https://pypi.org/project/tail-recursive/
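
For completeness: the immediate error in the snippet below is that `TailRecurseException` is a plain class, and in Python 3 anything you `raise` or `except` must derive from `BaseException`. A sketch of the corrected decorator:

```python
import sys

class TailRecurseException(BaseException):  # the fix: derive from BaseException
    def __init__(self, args, kwargs):
        self.args = args
        self.kwargs = kwargs

def tail_call_optimized(g):
    """Fake tail-call optimization by unwinding the stack with an exception."""
    def func(*args, **kwargs):
        f = sys._getframe()
        if f.f_back and f.f_back.f_back \
                and f.f_back.f_back.f_code == f.f_code:
            # We are our own grandparent: unwind back to the trampoline.
            raise TailRecurseException(args, kwargs)
        while True:
            try:
                return g(*args, **kwargs)
            except TailRecurseException as e:
                args, kwargs = e.args, e.kwargs
    func.__doc__ = g.__doc__
    return func

@tail_call_optimized
def factorial(n, acc=1):
    if n == 0:
        return acc
    return factorial(n - 1, n * acc)

# Far beyond the default recursion limit (1000), yet no RecursionError:
print(len(str(factorial(5000))))
```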

On Thu, 30 Dec 2021 at 16:00, hongy...@gmail.com  wrote:
>
> I try to compute the factorial of a large number with a tail-recursion 
> optimization decorator in Python3. The following code snippet was converted 
> from the one given here [1] by the following steps:
>
> $ pyenv shell datasci
> $ python --version
> Python 3.9.1
> $ pip install 2to3
> $ 2to3 -w this-script.py
>
> ```
> # This program shows off a python decorator(
> # which implements tail call optimization. It
> # does this by throwing an exception if it is
> # its own grandparent, and catching such
> # exceptions to recall the stack.
>
> import sys
>
> class TailRecurseException:
>   def __init__(self, args, kwargs):
>     self.args = args
>     self.kwargs = kwargs
>
> def tail_call_optimized(g):
>   """
>   This function decorates a function with tail call
>   optimization. It does this by throwing an exception
>   if it is its own grandparent, and catching such
>   exceptions to fake the tail call optimization.
>
>   This function fails if the decorated
>   function recurses in a non-tail context.
>   """
>   def func(*args, **kwargs):
>     f = sys._getframe()
>     if f.f_back and f.f_back.f_back \
>         and f.f_back.f_back.f_code == f.f_code:
>       raise TailRecurseException(args, kwargs)
>     else:
>       while 1:
>         try:
>           return g(*args, **kwargs)
>         except TailRecurseException as e:
>           args = e.args
>           kwargs = e.kwargs
>   func.__doc__ = g.__doc__
>   return func
>
> @tail_call_optimized
> def factorial(n, acc=1):
>   "calculate a factorial"
>   if n == 0:
>     return acc
>   return factorial(n-1, n*acc)
>
> print(factorial(10000))
> # prints a big, big number,
> # but doesn't hit the recursion limit.
>
> @tail_call_optimized
> def fib(i, current = 0, next = 1):
>   if i == 0:
>     return current
>   else:
>     return fib(i - 1, next, current + next)
>
> print(fib(10000))
> # also prints a big number,
> # but doesn't hit the recursion limit.
> ```
> However, when I try to test the above script, the following error will be 
> triggered:
> ```
> $ python this-script.py
> Traceback (most recent call last):
>   File "/home/werner/this-script.py", line 32, in func
>     return g(*args, **kwargs)
>   File "/home/werner/this-script.py", line 44, in factorial
>     return factorial(n-1, n*acc)
>   File "/home/werner/this-script.py", line 28, in func
>     raise TailRecurseException(args, kwargs)
> TypeError: exceptions must derive from BaseException
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/home/werner/this-script.py", line 46, in <module>
>     print(factorial(10000))
>   File "/home/werner/this-script.py", line 33, in func
>     except TailRecurseException as e:
> TypeError: catching classes that do not inherit from BaseException is not
> allowed
> ```
>
> Any hints for fixing this problem will be highly appreciated.
>
> [1]  https://stackoverflow.com/q/27417874
>
> Regards,
> HZ
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: recover pickled data: pickle data was truncated

2022-01-01 Thread Marco Sulla
I agree with Barry. You can create a folder or a file with a
pseudo-random name. I recommend using str(uuid.uuid4())

On Sat, 1 Jan 2022 at 14:11, Barry  wrote:
>
>
>
> > On 31 Dec 2021, at 17:53, iMath  wrote:
> >
> > On Thursday, 30 December 2021 at 03:13:21 UTC+8,  wrote:
> >>> On Wed, 29 Dec 2021 at 18:33, iMath  wrote:
> >>> But I found the size of the file of the shelve data didn't change much, 
> >>> so I guess the data are still in it , I just wonder any way to recover my 
> >>> data.
> >> I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling
> >> it by hand is a harsh work and maybe unreliable.
> >>
> >> Is there any reason you can't simply add a semaphore to avoid writing
> >> at the same time and re-run the code and regenerate the data?
> >
> > Thanks for your replies! I wasn't aware I needed a semaphore around 
> > writes to the pickle data before, so I corrupted the data.
> > Since my data was collected during daily usage, I cannot re-run the code 
> > and regenerate the data.
> > In order to avoid corrupting my data again, and the complexity of using a 
> > semaphore, now I am using json text to store my data.
>
> That will not fix the problem. You will end up with corrupt json.
>
> If you have one writer and one reader then maybe you can use the fact that a 
> rename is atomic.
>
> Writer does this:
> 1. Create the new json file in the same folder but with a tmp name.
> 2. Rename the file from its tmp name to the public name.
>
> The reader will just read the public name.
>
> I am not sure what happens in your world if the writer runs a second time 
> before the data is read.
>
> In that case you need to create a queue of files to be read.
>
> But if the problem is two processes racing against each other you MUST use 
> locking.
> It cannot be avoided for robust operations.
>
> Barry
>
>
> > --
> > https://mail.python.org/mailman/listinfo/python-list
>
> --
> https://mail.python.org/mailman/listinfo/python-list
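
A minimal sketch of the write-to-a-temp-name-then-rename pattern Barry describes (paths hypothetical); `os.replace` is the atomic step:

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    """Write data as JSON to path so readers never see a partial file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem)
    # for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename over the public name
    except BaseException:
        os.unlink(tmp_path)
        raise

atomic_write_json("state.json", {"count": 3})
print(json.load(open("state.json")))  # {'count': 3}
```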
-- 
https://mail.python.org/mailman/listinfo/python-list


How to make a type of a C extension compatible with mypy

2022-01-01 Thread Marco Sulla
I created a type in a C extension that is an immutable dict. If I do:

a: mydict[str, str]

it works. But it doesn't work with mypy, as signalled to me by a user:

https://github.com/Marco-Sulla/python-frozendict/issues/39

How can I make it work? I don't know what he means by annotating
methods, and furthermore I suppose I can't do this in C.
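
As far as I know, the usual route for C extension types is to ship a stub file (`.pyi`) plus an empty `py.typed` marker inside the package (PEP 561), since mypy never looks inside compiled code. A hypothetical stub fragment for the type might look like:

```python
# mydict.pyi -- hypothetical stub; ship it (and an empty py.typed
# marker file) inside the package so mypy picks it up (PEP 561).
from typing import Iterator, Mapping, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class mydict(Mapping[K, V]):
    # Only the signatures matter to mypy; bodies are always "...".
    def __getitem__(self, key: K) -> V: ...
    def __iter__(self) -> Iterator[K]: ...
    def __len__(self) -> int: ...
```

With the stub in place, the annotation `a: mydict[str, str]` type-checks; the C code itself needs no changes.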
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to implement freelists in dict 3.10 for previous versions?

2022-01-01 Thread Marco Sulla
Ooookay, I suppose I have to study a little the thing :D

On Thu, 30 Dec 2021 at 07:59, Inada Naoki  wrote:
>
> On Wed, Dec 29, 2021 at 7:25 PM Marco Sulla
>  wrote:
> >
> > I noticed that now freelists in dict use _Py_dict_state. I suppose
> > this is done for thread safety.
> >
>
> Some core devs are working on a per-interpreter GIL, but it is not done yet.
> So you don't need to follow it soon. Your extension module will work
> well in Python 3.11.
>
> > I would implement it also for a C extension that uses CPython < 3.10.
> > How can I achieve this?
>
> See PyModule_GetState() to have per-interpreter module state instead
> of static variables.
> https://docs.python.org/3/c-api/module.html#c.PyModule_GetState
>
>
> --
> Inada Naoki  
-- 
https://mail.python.org/mailman/listinfo/python-list


Who wrote Py_UNREACHABLE?

2022-01-02 Thread Marco Sulla
#if defined(RANDALL_WAS_HERE)
#  define Py_UNREACHABLE() \
    Py_FatalError( \
        "If you're seeing this, the code is in what I thought was\n" \
        "an unreachable state.\n\n" \
        "I could give you advice for what to do, but honestly, why\n" \
        "should you trust me?  I clearly screwed this up.  I'm writing\n" \
        "a message that should never appear, yet I know it will\n" \
        "probably appear someday.\n\n" \
        "On a deep level, I know I'm not up to this task.\n" \
        "I'm so sorry.\n" \
        "https://xkcd.com/2200")
#elif defined(Py_DEBUG)
#  define Py_UNREACHABLE() \
    Py_FatalError( \
        "We've reached an unreachable state. Anything is possible.\n" \
        "The limits were in our heads all along. Follow your dreams.\n" \
        "https://xkcd.com/2200")

etc
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ModuleNotFoundError: No module named 'DistUtilsExtra'

2022-01-02 Thread Marco Sulla
https://askubuntu.com/questions/584857/distutilsextra-problem

On Sun, 2 Jan 2022 at 18:52, hongy...@gmail.com  wrote:
>
> On Ubuntu 20.04.3 LTS, I try to install pdfarranger [1] as follows but failed:
>
> $ sudo apt-get install python3-pip python3-distutils-extra \
>     python3-wheel python3-gi python3-gi-cairo \
>     gir1.2-gtk-3.0 gir1.2-poppler-0.18 python3-setuptools
> $ git clone https://github.com/pdfarranger/pdfarranger.git pdfarranger.git
> $ cd pdfarranger.git
> $ pyenv shell 3.8.3
> $ pyenv virtualenv --system-site-packages pdfarranger
> $ pyenv shell pdfarranger
> $ pip install -U pip
> $ ./setup.py build
> Traceback (most recent call last):
>   File "./setup.py", line 24, in <module>
>     from DistUtilsExtra.command import (
> ModuleNotFoundError: No module named 'DistUtilsExtra'
>
>
> See the following for the package list installed in this virtualenv:
>
> $ pip list
> Package    Version
> ---------- ------------
> pip        21.3.1
> pyfiglet   0.8.post1
> setuptools 41.2.0
> vtk        9.0.20200612
>
> Any hints for fixing this problem? Also see here [2-3] for relevant 
> discussions.
>
> [1] https://github.com/pdfarranger/pdfarranger
> [2] https://github.com/pdfarranger/pdfarranger/issues/604
> [3] 
> https://discuss.python.org/t/modulenotfounderror-no-module-named-distutilsextra/12834
>
> Regards,
> HZ
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list

