Re: mailbox misbehavior with non-ASCII

2022-07-30 Thread Barry


> On 30 Jul 2022, at 00:30, Peter Pearson wrote:
> 
> The following code produces a nonsense result with the input 
> described below:
> 
> import mailbox
> box = mailbox.Maildir("/home/peter/Temp/temp",create=False)
> x = box.values()[0]
> h = x.get("X-DSPAM-Factors")
> print(type(h))
> # 
> 
> The output is the desired "str" when the message file contains this:
> 
> To: recipi...@example.com
> Message-ID: <123>
> Date: Sun, 24 Jul 2022 15:31:19 +
> Subject: Blah blah
> From: f...@from.com
> X-DSPAM-Factors: a'b
> 
> xxx
> 
> ... but if the apostrophe in "a'b" is replaced with a
> RIGHT SINGLE QUOTATION MARK, the returned h is of type 
> "email.header.Header", and seems to contain inscrutable garbage.

Include in any bug report the exact bytes that are in the header.
It may not be UTF-8 encoded; it may be Windows cp1252, etc.
The repr of the header bytes will show this.
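
A minimal sketch of how those bytes could be captured, assuming the message
file can be opened directly (in a Maildir it sits under the new/ or cur/
subdirectory). The helper name raw_header_lines and the temporary test file
are illustrative only, and folded continuation lines are not handled.

```python
import os
import tempfile


def raw_header_lines(path, name):
    """Return the raw bytes of each header line in `path` that starts with `name`."""
    prefix = name.lower().encode("ascii") + b":"
    found = []
    with open(path, "rb") as f:            # binary mode: nothing is decoded
        for line in f:
            if line in (b"\n", b"\r\n"):   # blank line ends the header block
                break
            if line.lower().startswith(prefix):
                found.append(line.rstrip(b"\r\n"))
    return found


# Stand-in message file with a UTF-8 RIGHT SINGLE QUOTATION MARK in the header.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "msg")
    with open(path, "wb") as f:
        f.write(b"Subject: Blah blah\nX-DSPAM-Factors: a\xe2\x80\x99b\n\nxxx\n")
    lines = raw_header_lines(path, "X-DSPAM-Factors")
    print(repr(lines[0]))
```

In UTF-8 the quotation mark shows up as the three bytes e2 80 99; in cp1252
it would be the single byte 92.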

Barry

> 
> I realize that one should not put non-ASCII characters in
> message headers, but of course I didn't put it there, it
> just showed up, pretty much beyond my control.  And I realize
> that when software is given input that breaks the rules, one
> cannot expect optimal results, but I'd think an exception
> would be the right answer.
> 
> Is this worth a bug report?
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to generate a .pyi file for a C Extension using stubgen

2022-07-30 Thread Marco Sulla
On Fri, 29 Jul 2022 at 23:23, Barry wrote:
>
> > On 29 Jul 2022, at 19:33, Marco Sulla wrote:
> >
> > I tried to follow the instructions here:
> >
> > https://mypy.readthedocs.io/en/stable/stubgen.html
> >
> > but the instructions about creating a stub for a C Extension are a little
> > mysterious. I tried to use it on the .so file without luck.
>
> It says that stubgen works on .py files not .so files.
> You will need to write the .pyi for your .so manually.
>
> The docs could do with splitting the need for .pyi for .so
> away from the stubgen description.

But it says:

"Mypy includes the stubgen tool that can automatically generate stub
files (.pyi files) for Python modules and C extension modules."

I tried stubgen -m modulename, but it generates very little code.


Re: Simple TCP proxy

2022-07-30 Thread Roel Schroeven

Morten W. Petersen wrote on 29/07/2022 at 22:59:
> OK, sounds like sunshine is getting the best of you.

It has to be said: that is uncalled for.

Chris gave you good advice, with the best of intentions. Sometimes we
don't like good advice if it says something we don't like, but that's no
reason to take it out on the messenger.


--
"Iceland is the place you go to remind yourself that planet Earth is a
machine... and that all organic life that has ever existed amounts to a greasy
film that has survived on the exterior of that machine thanks to furious
improvisation."
-- Sam Hughes, Ra



Re: Simple TCP proxy

2022-07-30 Thread Barry Scott
Morten,

As Chris remarked, you need to learn a number of networking, Python, system
performance and other skills to turn your project into production code.

Using threads does not scale very well. It uses a lot of memory and raises the
CPU used just to do the context switches. Also, the GIL means that even if you
are doing blocking I/O the use of threads does not scale well.

It's rare to see multi-threaded code; rather, what you see is code that uses
async I/O.

At its heart, async code at the low level is using a kernel interface like
epoll (or, on old systems, select). What epoll allows you to do is wait on a
set of FDs for a range of I/O operations, like ready to read, ready to write
and other activity (like the socket closing).

You could write code to use epoll yourself, but while fun to write, you need
to know a lot about networking and Linux to cover all the corner cases.

Libraries like twisted, trio, uvloop and Python's selectors implement
production-quality versions of the required code with good APIs.

Do not judge these libraries by their size. They are not bloated and are only
as complex as the problem they are solving requires.

There is a simple example of async code using the Python selectors module here
that shows the style of programming:
https://docs.python.org/3/library/selectors.html#examples
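
To give a flavour of that style without opening the link, here is a minimal
sketch using the standard selectors module. It is not the example from the
documentation: a socketpair stands in for a real client connection, and the
callback name on_readable is an arbitrary choice.

```python
import selectors
import socket

sel = selectors.DefaultSelector()      # picks epoll on Linux, kqueue/select elsewhere
left, right = socket.socketpair()      # stand-in for a client and the proxy's end
left.setblocking(False)
right.setblocking(False)

received = []


def on_readable(sock):
    """Called when the selector reports that sock is ready to read."""
    data = sock.recv(4096)
    if data:
        received.append(data)
    else:                              # empty read: the peer closed the socket
        sel.unregister(sock)
        sock.close()


# Attach the callback as the registration's data so the loop can dispatch to it.
sel.register(right, selectors.EVENT_READ, on_readable)

left.send(b"hello")
left.close()

# Event loop: run until every registered socket has been closed.
while sel.get_map():
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)

sel.close()
print(b"".join(received))
```

One thread serves every registered socket here; the loop only wakes up when
the kernel reports that something can be read.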


The issues that you likely need to solve and test for include:
* handling unexpected socket close events.
* buffering and flow control from one socket's read to the other socket's
  write. What if one side is reading slower than the other is writing?
* timing out sockets that stop sending data and closing them.
At some point you will exceed the capacity for one process to handle the load.
The solution we used is to listen on the socket in a parent process and fork
enough child processes to handle the I/O load. This avoids issues with the GIL
and allows you to scale.

But I am still not sure why you need to do anything more than increase the
backlog on your listen socket in the main app. If you set the backlog to
1,000,000, does that fix your issue?

On Linux you will need to change kernel limits to allow that size. See man
listen for info on what you need to change.
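
A small sketch of the backlog experiment, assuming a Linux host: the kernel
silently caps the requested value at net.core.somaxconn, which can be read
from /proc. The helper name somaxconn and the loopback address are
illustrative.

```python
import socket


def somaxconn():
    """Return the Linux cap on listen backlogs, or None if it cannot be read."""
    try:
        with open("/proc/sys/net/core/somaxconn") as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


requested = 1_000_000
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(requested)               # accepted even if the kernel caps it lower
port = server.getsockname()[1]
cap = somaxconn()
print("requested backlog:", requested, "kernel cap:", cap)
server.close()
```

If the cap printed is smaller than the requested backlog, that limit has to be
raised before the larger value takes effect.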

Barry



Re: PEP about recommended project folder layout

2022-07-30 Thread c.buhtz
Isn't there a PEP?

On 2022-07-26 07:14 c.bu...@posteo.jp wrote:
> Hello,
> 
> I am not sure if I looked into the correct sources. I was looking in 
> "PEP 609 – Python Packaging Authority (PyPA) Governance" [1] and the 
> "PyPA specifications" [2].
> 
> My question in short: Is there an official document (e.g. a PEP)
> about a recommended layout for project folders.
> 
> Looking into the wild and past there are a lot of variations of such 
> layouts. I am far away from being a pro but depending on experience
> in my own projects and what I have learned from others (e.g. in 
> blog-posts/tutorials) I recommend to have the "test" folder and the 
> package folder side by side on the same level in the project folder
> (the root).
> 
> my_project
> |- tests
> |  └ test_*.py
> |- my_package
> |  └ __init__.py
> └-- README.md
> 
> I sometimes add to it the so called "src"-Layout where the package 
> folder is one level deeper in an extra "src" folder.
> 
> my_project
> |- tests
> |  └ test_*.py
> |- src
> |  └- my_package
> |     └ __init__.py
> └-- README.md
> 
> I don't want to discuss the pros and cons of all variations. What I
> need is an official document I can use in discussions with other
> maintainers. If there is a PEP/document against my current
> recommendation I am also fine with this. ;)
> 
> Kind
> Christian
> 
> [1] -- 
> [2] -- 



Re: How to generate a .pyi file for a C Extension using stubgen

2022-07-30 Thread Barry


> On 30 Jul 2022, at 10:30, Marco Sulla wrote:
> 
> On Fri, 29 Jul 2022 at 23:23, Barry wrote:
>> 
>>>> On 29 Jul 2022, at 19:33, Marco Sulla wrote:
>>> 
>>> I tried to follow the instructions here:
>>> 
>>> https://mypy.readthedocs.io/en/stable/stubgen.html
>>> 
>>> but the instructions about creating a stub for a C Extension are a little
>>> mysterious. I tried to use it on the .so file without luck.
>> 
>> It says that stubgen works on .py files not .so files.
>> You will need to write the .pyi for your .so manually.
>> 
>> The docs could do with splitting the need for .pyi for .so
>> away from the stubgen description
> 
> But it says:
> 
> "Mypy includes the stubgen tool that can automatically generate stub
> files (.pyi files) for Python modules and C extension modules."
> 
> I tried stubgen -m modulename, but it generates very little code.

Oh…

From the .so I am struggling to figure out how it could ever work reliably.
I cannot see that there is enough information in a useful form to allow
the tool to work.
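
As a rough illustration of how little a compiled function exposes at runtime:
at best there is a text signature with parameter names and no types. Whether
stubgen relies on exactly this mechanism is an assumption here; the helper
name runtime_signature is made up for the sketch.

```python
import inspect
import math


def runtime_signature(func):
    """Return the signature text recoverable at runtime, or None if there is none."""
    try:
        return str(inspect.signature(func))
    except (ValueError, TypeError):
        return None                    # the C function publishes no signature


# Built-in functions implemented in C serve as stand-ins for an extension module.
for func in (len, math.sqrt, dict.get):
    print(func.__name__, runtime_signature(func))

sig = runtime_signature(len)
```

Parameter names may come through, but argument and return types do not, which
would explain stubs that contain very little.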

Barry



Re: Simple TCP proxy

2022-07-30 Thread Morten W. Petersen
I thought it was a bit much.

I just did a bit more testing, and saw that the throughput of wget through
regular lighttpd was 1.3 GB/s, while through STP it was 122 MB/s, and using
quite a bit of CPU.

Then I increased the buffer size 8-fold for reading and writing in run.py,
and the CPU usage went way down, and the transfer speed went up to 449 MB/s.
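
A sketch of why a larger buffer lowers CPU use: the copy loop runs fewer
times for the same amount of data. In-memory streams stand in for the sockets,
and relay is a hypothetical function, not the code in run.py.

```python
import io


def relay(src, dst, bufsize):
    """Copy src to dst in bufsize chunks; return how many reads returned data."""
    reads = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:                  # end of stream
            break
        dst.write(chunk)
        reads += 1
    return reads


payload = bytes(64 * 1024)             # 64 KiB of zero bytes
small = relay(io.BytesIO(payload), io.BytesIO(), 4096)
large = relay(io.BytesIO(payload), io.BytesIO(), 8 * 4096)
print(small, large)                    # 16 iterations versus 2
```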

So it would require well more than a gigabit network interface to max out
STP throughput; CPU usage was around 30-40% max, on one processor core.

There is good enough, and then there's general practice and/or what is
regarded as an elegant solution.  I'm looking for good enough, and in the
process I don't mind pushing the envelope on Python threading.

-Morten

On Sat, Jul 30, 2022 at 12:59 PM Roel Schroeven wrote:

> Morten W. Petersen wrote on 29/07/2022 at 22:59:
> > OK, sounds like sunshine is getting the best of you.
> It has to be said: that is uncalled for.
>
> Chris gave you good advice, with the best of intentions. Sometimes we
> don't like good advice if it says something we don't like, but that's no
> reason to take it out on the messenger.


-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/


Re: mailbox misbehavior with non-ASCII

2022-07-30 Thread Peter J. Holzer
On 2022-07-29 23:24:57 +, Peter Pearson wrote:
> The following code produces a nonsense result with the input 
> described below:
> 
> import mailbox
> box = mailbox.Maildir("/home/peter/Temp/temp",create=False)
> x = box.values()[0]
> h = x.get("X-DSPAM-Factors")
> print(type(h))
> # 
> 
> The output is the desired "str" when the message file contains this:
> 
> To: recipi...@example.com
> Message-ID: <123>
> Date: Sun, 24 Jul 2022 15:31:19 +
> Subject: Blah blah
> From: f...@from.com
> X-DSPAM-Factors: a'b
> 
> xxx
> 
> ... but if the apostrophe in "a'b" is replaced with a
> RIGHT SINGLE QUOTATION MARK, the returned h is of type 
> "email.header.Header", and seems to contain inscrutable garbage.

It's not inscrutable to me, but then I remember when RFC 1522 was the
relevant RFC.

Calling h.encode() returns

=?unknown-8bit?b?YeKAmWI=?=

which is about the best result you can get. The character set is unknown
and the content (when decoded) is the bytes

61 e2 80 99 62

which is what your file contained (assuming you used UTF-8).

What would be nice is if you could get at that content directly. There
doesn't seem to be a documented method to do that. You can use h._chunks,
but as the _ in the name implies, that's an implementation detail which
might change in future versions (and it's not quite straightforward
either, although consistent with other parts of Python, I think).
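
Going through the encoded form is one way to reach the bytes using only
documented calls. A minimal sketch with email.header.decode_header, applied to
the string quoted above; treating the result as UTF-8 is an assumption about
the sender, not something the header states.

```python
from email.header import decode_header

encoded = "=?unknown-8bit?b?YeKAmWI=?="    # the h.encode() result quoted above

# decode_header() returns (bytes, charset) pairs without converting the charset.
pairs = decode_header(encoded)
raw, charset = pairs[0]

print(charset)
print(raw.hex(" "))                        # the bytes as stored in the file
print(raw.decode("utf-8"))                 # only valid if the file really was UTF-8
```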

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Simple TCP proxy

2022-07-30 Thread Barry



> On 30 Jul 2022, at 20:33, Morten W. Petersen wrote:
> I thought it was a bit much.
> 
> I just did a bit more testing, and saw that the throughput of wget through
> regular lighttpd was 1.3 GB/s, while through STP it was 122 MB/s, and using
> quite a bit of CPU.
> 
> Then I increased the buffer size 8-fold for reading and writing in run.py,
> and the CPU usage went way down, and the transfer speed went up to 449 MB/s.

You are trading latency for throughput.

> 
> So it would require well more than a gigabit network interface to max out
> STP throughput; CPU usage was around 30-40% max, on one processor core.

With how many connections?

> 
> There is good enough, and then there's general practice and/or what is
> regarded as an elegant solution.  I'm looking for good enough, and in the
> process I don't mind pushing the envelope on Python threading.

You never did answer my query on why a large backlog is not good enough.
Why do you need this program at all?

Barry
> 
> -Morten



Re: PEP about recommended project folder layout

2022-07-30 Thread Barry


> On 30 Jul 2022, at 13:52, c.bu...@posteo.jp wrote:
> 
> Isn't there a PEP?

PEPs are for improving Python. They are not for telling people how to use Python.
I would be surprised to find a PEP that addressed this.

Barry

