Re: ABC with abstractmethod: kwargs on Base, explicit names on implementation

2020-08-27 Thread Peter Otten
Samuel Marks wrote:

> The main thing I want is type safety. I want Python to complain if the
> callee uses the wrong argument types, and to provide suggestions on
> what's needed and info about it.
> 
> Without a base class I can just have docstrings and type annotations
> to achieve that.
> 
> What can I use that will require all implementers to have a minimum of
> the same properties and arguments, but also allow them to add new
> properties and arguments?

The clean way would be to give the variants with a different signature a 
different name.


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How do I do this in Python 3 (string.join())?

2020-08-27 Thread Chris Green
Cameron Simpson  wrote:
> On 26Aug2020 15:09, Chris Green  wrote:
> >2qdxy4rzwzuui...@potatochowder.com wrote:
> >> Join bytes objects with a byte object:
> >>
> >> b"\n".join(popmsg[1])
> >
> >Aaahhh!  Thank you (and the other reply).
> 
> But note: joining bytes like strings is uncommon, and may indicate that 
> you should be working in strings to start with. Eg you may want to 
> convert popmsg from bytes to str and do a str.join anyway. It depends on 
> exactly what you're dealing with: are you doing text work, or are you 
> doing "binary data" work?
> 
> I know many network protocols are "bytes-as-text", but that is 
> accomplished by implying an encoding of the text, eg as ASCII, where 
> characters all fit in single bytes/octets.
> 
Yes, I realise that making everything a string before I start might be
the 'right' way to do things but one is a bit limited by what the mail
handling modules in Python provide.

E.g. in this case the only (well the only ready made) way to get a
POP3 message is using poplib and this just gives you a list of lines
made up of "bytes as text" :-

popmsg = pop3.retr(i+1)

I join the lines to feed them into mailbox.mbox() to create a mbox I
can analyse and also a message which can be sent using SMTP.

Should I be converting to string somewhere?  I guess the POP3 and SMTP
libraries will cope with strings as input.  Can I convert to string
after the join for example?  If so, how?  Can I just do:-

msgbytes = b'\n'.join(popmsg[1])
msgstr = str(msgbytes)

(Yes, I know it can be one line, I was just being explicit).

... or do I need to stringify the lines returned by popmsg() before
joining them together?


Thank you for all your help and comments!

(I'm a C programmer at heart, preceded by being an assembler
programmer.  I started programming way back in the 1970s, I'm retired
now and Python is for relaxation (?) in my dotage)

-- 
Chris Green


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
Terry Reedy  wrote:
> On 8/26/2020 11:10 AM, Chris Green wrote:
> 
> > I have a simple[ish] local mbox mail delivery module as follows:-
> ...
> > It has run faultlessly for many years under Python 2.  I've now
> > changed the calling program to Python 3 and while it handles most
> > E-Mail OK I have just got the following error:-
> > 
> >  Traceback (most recent call last):
> >    File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
> >      mailLib.deliverMboxMsg(dest, msg, log)
> ...
> >    File "/usr/lib/python3.8/email/generator.py", line 406, in write
> >      self._fp.write(s.encode('ascii', 'surrogateescape'))
> >  UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in
> >  position 4: ordinal not in range(128)
> 
> '\ufeff' is the Unicode byte-order mark.  It should not be present in an 
> ascii-only 3.x string and would not normally be present in general 
> unicode except in messages like this that talk about it.  Read about it, 
> for instance, at
> https://en.wikipedia.org/wiki/Byte_order_mark
> 
> I would catch the error and print part or all of string s to see what is 
> going on with this particular message.  Does it have other non-ascii chars?
> 
I can provoke the error simply by sending myself an E-Mail with
accented characters in it.  I'm pretty sure my Linux system is set up
correctly for UTF8 characters, I certainly seem to be able to send and
receive these to others and I even get to see messages in other
scripts such as arabic, chinese, etc.

The code above works perfectly in Python 2 delivering messages with
accented (and other extended) characters with no problems at all.
Sending myself E-Mails with accented characters works OK with the code
running under Python 2.

While an E-Mail body possibly *shouldn't* have non-ASCII characters in
it one must be able to handle them without errors.  In fact haven't
the RFCs changed such that the message body should be 8-bit clean?
Anyway I think the Python 3 mail handling libraries need to be able to
pass extended characters through without errors.

-- 
Chris Green


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
Peter J. Holzer  wrote:
> The problem is that the message contains a '\ufeff' character (byte
> order mark) where email/generator.py expects only ASCII characters.
> 
> I see two possible reasons for this:
> 
>  * The mbox writing code assumes that all messages with non-ascii
>    characters are QP or base64 encoded, and some higher layer uses 8bit
>    instead.
> 
>  * A mime-part is declared as charset=us-ascii but contains really
>    Unicode characters.
> 
> Both reasons are weird.
> 
> The first would be an unreasonable assumption (8bit encoding has been
> common since the mid-1990s), but even if the code made that assumption,
> one would expect that other code from the same library honors it.
> 
> The second shouldn't be possible: If a message is mis-declared (that
> happens) one would expect that the error happens during parsing, not
> when trying to serialize the already parsed message. 
> 
> But then you haven't shown where msg comes from. How do you parse the
> message to get "msg"?
> 
> Can you construct a minimal test message which triggers the bug?
> 
Yes, simply sending myself an E-Mail with (for example) accented
characters triggers the error.

I'm pretty certain my system (and E-Mail in and out, and Usenet news)
handle these correctly as UTF8.  E.g.:-

àéçł

It's *only* when I switch the mail delivery to Python 3 that the error
appears.

-- 
Chris Green


Aw: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> Terry Reedy  wrote:
> > On 8/26/2020 11:10 AM, Chris Green wrote:
> >
> > > I have a simple[ish] local mbox mail delivery module as follows:-
> > ...
> > > It has run faultlessly for many years under Python 2.  I've now
> > > changed the calling program to Python 3 and while it handles most
> > > E-Mail OK I have just got the following error:-
> > >
> > >  Traceback (most recent call last):
> > >    File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
> > >      mailLib.deliverMboxMsg(dest, msg, log)
> > ...
> > >    File "/usr/lib/python3.8/email/generator.py", line 406, in write
> > >      self._fp.write(s.encode('ascii', 'surrogateescape'))
> > >  UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in
> > >  position 4: ordinal not in range(128)
> >
> > '\ufeff' is the Unicode byte-order mark.  It should not be present in an
> > ascii-only 3.x string and would not normally be present in general
> > unicode except in messages like this that talk about it.  Read about it,
> > for instance, at
> > https://en.wikipedia.org/wiki/Byte_order_mark
> >
> > I would catch the error and print part or all of string s to see what is
> > going on with this particular message.  Does it have other non-ascii chars?
> >
> I can provoke the error simply by sending myself an E-Mail with
> accented characters in it.  I'm pretty sure my Linux system is set up
> correctly for UTF8 characters, I certainly seem to be able to send and
> receive these to others and I even get to see messages in other
> scripts such as arabic, chinese, etc.
>
> The code above works perfectly in Python 2 delivering messages with
> accented (and other extended) characters with no problems at all.
> Sending myself E-Mails with accented characters works OK with the code
> running under Python 2.
>
> While an E-Mail body possibly *shouldn't* have non-ASCII characters in
> it one must be able to handle them without errors.  In fact haven't
> the RFCs changed such that the message body should be 8-bit clean?
> Anyway I think the Python 3 mail handling libraries need to be able to
> pass extended characters through without errors.

Well, '\ufeff' is not a *character* at all in much of any
sense of that word in unicode.

It's a marker. Whatever puts it into the stream is wrong. I guess the
best one can (and should) do is to catch the exception and dump
the offending stream somewhere binary-capable and pass on a notice. What
you are receiving there very much isn't a (well-formed) e-mail message.

I would then attempt to backwards-crawl the delivery chain to
find out where it came from.

Or so is my current understanding.

Karsten


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Peter Otten
Chris Green wrote:

> To add a little to this, the problem is definitely when I receive a
> message with UTF8 (or at least non-ASCII) characters in it.  My code
> is basically very simple, the main program reads an E-Mail message
> received from .forward on its standard input and makes it into an mbox
> message as follows:-
> 
> msg = mailbox.mboxMessage(sys.stdin.read())
> 
> it then does various tests (but doesn't change msg at all) and at the
> end delivers the message to my local mbox with:-
> 
> mbx.add(msg)
> 
> where mbx is an instance of mailbox.mbox.
> 
> 
> So, how is one supposed to handle this, should I encode the incoming
> message somewhere?
> 

This is what I'd try. Or just read the raw bytes:

data = sys.stdin.detach().read()
msg = mailbox.mboxMessage(data)
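
A minimal sketch of that raw-bytes route, assuming the message is available as a bytes object (a hard-coded sample here instead of stdin). The helper name deliver_raw and the sample message are made up for illustration; mailbox.mboxMessage and mbox.add are the calls already used in the thread. Parsing from bytes keeps the 8-bit body as undecoded data, which the mailbox writer should then be able to write back out unchanged.

```python
import mailbox
import os
import tempfile

# Hypothetical sample: a UTF-8 body sent with an 8bit transfer encoding.
RAW = (
    b"From: sender@example.invalid\n"
    b"To: me@example.invalid\n"
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    + "àéçł\n".encode("utf-8")
)


def deliver_raw(raw_bytes, mbox_path):
    """Parse raw message bytes and append the message to an mbox file."""
    msg = mailbox.mboxMessage(raw_bytes)  # bytes in, no str decoding step
    mbx = mailbox.mbox(mbox_path)
    try:
        # A real delivery agent should also call mbx.lock()/mbx.unlock().
        mbx.add(msg)
    finally:
        mbx.close()  # close() flushes pending writes
```

In a filter script the bytes would come from sys.stdin.buffer.read() (or sys.stdin.detach().read() as above) rather than from a constant.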



Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Cameron Simpson
On 27Aug2020 09:31, Chris Green  wrote:
>I can provoke the error simply by sending myself an E-Mail with
>accented characters in it.  I'm pretty sure my Linux system is set up
>correctly for UTF8 characters, I certainly seem to be able to send and
>receive these to others and I even get to see messages in other
>scripts such as arabic, chinese, etc.

See:


https://docs.python.org/3/library/email.generator.html#module-email.generator

While it conservatively writes ASCII (and email has extensive support 
for encoding other character sets into ASCII), you might profit by 
looking at the BytesGenerator in that module using the policy parameter, 
which looks like it tunes the behaviour of the flatten method.
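
A rough sketch of what that could look like, assuming the message was parsed from bytes. The function name flatten_to_bytes and the sample message are invented for the example; email.message_from_bytes, email.policy.default, policy.clone() and BytesGenerator.flatten() are standard library calls. The only thing tuned here is the line separator, as one visible effect of the policy.

```python
import email
import email.policy
import io
from email.generator import BytesGenerator

# Hypothetical sample message with an 8-bit UTF-8 body.
RAW = (
    b"From: sender@example.invalid\n"
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    + "àéçł\n".encode("utf-8")
)


def flatten_to_bytes(raw, policy=email.policy.default):
    """Parse raw bytes and serialise the message again under a policy."""
    msg = email.message_from_bytes(raw, policy=policy)
    buf = io.BytesIO()
    BytesGenerator(buf, policy=policy).flatten(msg)
    return buf.getvalue()


unix_style = flatten_to_bytes(RAW)
# Same message, but written with CRLF line endings as used on the wire.
crlf_style = flatten_to_bytes(RAW, email.policy.default.clone(linesep="\r\n"))
```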

I have a mailfiler of my own, which copes just fine.

It loads messages with email.parser.Parser, whose .parse() method 
returns a Message, and Message.as_string() seems to write happily into a 
text file for me. I run _all_ my messages through this stuff.

Cheers,
Cameron Simpson 


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
Karsten Hilbert  wrote:
> > Terry Reedy  wrote:
> > > On 8/26/2020 11:10 AM, Chris Green wrote:
> > >
> > > > I have a simple[ish] local mbox mail delivery module as follows:-
> > > ...
> > > > It has run faultlessly for many years under Python 2.  I've now
> > > > changed the calling program to Python 3 and while it handles most
> > > > E-Mail OK I have just got the following error:-
> > > >
> > > >  Traceback (most recent call last):
> > > >    File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
> > > >      mailLib.deliverMboxMsg(dest, msg, log)
> > > ...
> > > >    File "/usr/lib/python3.8/email/generator.py", line 406, in write
> > > >      self._fp.write(s.encode('ascii', 'surrogateescape'))
> > > >  UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in
> > > >  position 4: ordinal not in range(128)
> > >
> > > '\ufeff' is the Unicode byte-order mark.  It should not be present in an
> > > ascii-only 3.x string and would not normally be present in general
> > > unicode except in messages like this that talk about it.  Read about it,
> > > for instance, at
> > > https://en.wikipedia.org/wiki/Byte_order_mark
> > >
> > > I would catch the error and print part or all of string s to see what is
> > > going on with this particular message.  Does it have other non-ascii 
> > > chars?
> > >
> > I can provoke the error simply by sending myself an E-Mail with
> > accented characters in it.  I'm pretty sure my Linux system is set up
> > correctly for UTF8 characters, I certainly seem to be able to send and
> > receive these to others and I even get to see messages in other
> > scripts such as arabic, chinese, etc.
> >
> > The code above works perfectly in Python 2 delivering messages with
> > accented (and other extended) characters with no problems at all.
> > Sending myself E-Mails with accented characters works OK with the code
> > running under Python 2.
> >
> > While an E-Mail body possibly *shouldn't* have non-ASCII characters in
> > it one must be able to handle them without errors.  In fact haven't
> > the RFCs changed such that the message body should be 8-bit clean?
> > Anyway I think the Python 3 mail handling libraries need to be able to
> > pass extended characters through without errors.
> 
> Well, '\ufeff' is not a *character* at all in much of any
> sense of that word in unicode.
> 
> It's a marker. Whatever puts it into the stream is wrong. I guess the
> best one can (and should) do is to catch the exception and dump
> the offending stream somewhere binary-capable and pass on a notice. What
> you are receiving there very much isn't a (well-formed) e-mail message.
> 
> I would then attempt to backwards-crawl the delivery chain to
> find out where it came from.
> 
The error seems to occur with any non-7-bit-ASCII, e.g. my accented
characters gave:-

  File "/usr/lib/python3.8/email/generator.py", line 406, in write
  self._fp.write(s.encode('ascii', 'surrogateescape'))
  UnicodeEncodeError: 'ascii' codec can't encode character
  '\u2019' in position 34: ordinal not in
   range(128)

It just happened that the first example was an escape.
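
The failing call from the traceback can be reproduced on its own. This is a small sketch, not the library's code: writable_as_ascii is a made-up helper that wraps the same encode call, and the two decodings are my assumption about the difference between text read as str and text produced by a bytes parser.

```python
def writable_as_ascii(text):
    """Return True if the encode call from the traceback would succeed."""
    try:
        text.encode("ascii", "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


raw = "it\u2019s".encode("utf-8")  # the bytes as they arrive

# Decoded as real text: U+2019 is a genuine code point outside ASCII,
# and the surrogateescape handler does not help with it.
as_text = raw.decode("utf-8")

# Decoded with surrogateescape: each non-ASCII byte becomes a lone
# surrogate that can later be turned back into the original byte.
as_escaped = raw.decode("ascii", "surrogateescape")
```

So the same message content fails or succeeds depending on how the bytes were turned into a str before reaching the generator.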

-- 
Chris Green


Re: How do I do this in Python 3 (string.join())?

2020-08-27 Thread Cameron Simpson
On 27Aug2020 09:16, Chris Green  wrote:
>Cameron Simpson  wrote:
>> But note: joining bytes like strings is uncommon, and may indicate 
>> that
>> you should be working in strings to start with. Eg you may want to
>> convert popmsg from bytes to str and do a str.join anyway. It depends on
>> exactly what you're dealing with: are you doing text work, or are you
>> doing "binary data" work?
>>
>> I know many network protocols are "bytes-as-text", but that is
>> accomplished by implying an encoding of the text, eg as ASCII, where
>> characters all fit in single bytes/octets.
>>
>Yes, I realise that making everything a string before I start might be
>the 'right' way to do things but one is a bit limited by what the mail
>handling modules in Python provide.

I do ok, though most of my message processing happens to messages 
already landed in my "spool" Maildir by getmail. My setup uses getmail 
to get messages with POP into a single Maildir, and then I process the 
message files from there.

>E.g. in this case the only (well the only ready made) way to get a
>POP3 message is using poplib and this just gives you a list of lines
>made up of "bytes as text" :-
>
>popmsg = pop3.retr(i+1)

Ok, so you have bytes? You need to know.

>I join the lines to feed them into mailbox.mbox() to create a mbox I
>can analyse and also a message which can be sent using SMTP.
>
>Should I be converting to string somewhere?

I have not used poplib, but the Python email modules have a BytesParser, 
which gets you a Message object; I would feed the poplib bytes to that 
to parse the received message.  A Message object can then be transcribed 
as text via its .as_string method. Or you can do other things with it.
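
A sketch of that idea, assuming the POP3 lines are bytes objects without line terminators. The list pop_lines stands in for popmsg[1] and the helper parse_pop_lines is hypothetical; BytesParser.parsebytes() and the policy module are from the standard library. With policy.default the result is an EmailMessage, so get_content() returns the body as a str decoded with the declared charset.

```python
from email import policy
from email.parser import BytesParser

# Stand-in for popmsg[1]: a list of bytes lines (sample data, not real mail).
pop_lines = [
    b"From: sender@example.invalid",
    b"Subject: test",
    b"MIME-Version: 1.0",
    b"Content-Type: text/plain; charset=utf-8",
    b"Content-Transfer-Encoding: 8bit",
    b"",
    "àéçł".encode("utf-8"),
]


def parse_pop_lines(lines):
    """Join the bytes lines and parse them into a message object."""
    raw = b"\n".join(lines) + b"\n"
    return BytesParser(policy=policy.default).parsebytes(raw)


msg = parse_pop_lines(pop_lines)
subject = str(msg["Subject"])  # header value as text
body = msg.get_content()       # body decoded using charset=utf-8
```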

I think my main points are:

- know whether you're using bytes (uninterpreted data) or text (strings 
  of _characters_); treating bytes _as_ text implies an encoding, and 
  when that assumption is incorrect you get mojibake[1]

- look at the email modules' parsers, which return Messages, a 
  representation of the message in a structure (so that MIME subparts 
  etc are correctly broken out, and the character sets are _known_, post 
  parse)

[1] https://en.wikipedia.org/wiki/Mojibake
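
To make the bytes/text distinction concrete, here is a small sketch with made-up sample lines. It shows that str() on a bytes object gives its repr rather than the text, that .decode() with the right encoding gives the text, and that a wrong encoding gives mojibake.

```python
# Sample "bytes as text" lines, as a POP3 library might return them.
lines = [b"Subject: caf\xc3\xa9", b"", b"body"]

joined = b"\n".join(lines)

# str() does not decode: it returns the repr, starting with b'
not_what_you_want = str(joined)

# decode() with the correct encoding yields real text.
text = joined.decode("utf-8")

# decode() with the wrong encoding "works" but produces mojibake.
mojibake = joined.decode("latin-1")
```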

Cheers,
Cameron Simpson 


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Richard Damon
On 8/27/20 4:31 AM, Chris Green wrote:
> While an E-Mail body possibly *shouldn't* have non-ASCII characters in
> it one must be able to handle them without errors.  In fact haven't
> the RFCs changed such that the message body should be 8-bit clean?
> Anyway I think the Python 3 mail handling libraries need to be able to
> pass extended characters through without errors.

Email messages are fully allowed to use non-ASCII characters in them as
long as the headers indicate this. They can be encoded either as raw 8
bit bytes on systems that are 8-bit clean, or for systems that are not,
they will need to be encoded either as base-64 or using quoted-printable
encoding. These characters are to be interpreted in the character set
defined (or presumed) in the header, or even as some other binary object
like an image or executable if the content type isn't text.
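
As an illustration of those three forms, here is the same short text as raw 8-bit bytes, as quoted-printable and as base64. The sample word is arbitrary; quopri and base64 are the standard library modules for the two transfer encodings.

```python
import base64
import quopri

text = "café"

# Raw 8-bit form: just the UTF-8 bytes, usable on 8-bit clean paths.
eight_bit = text.encode("utf-8")

# 7-bit safe forms for paths that are not 8-bit clean.
as_qp = quopri.encodestring(eight_bit)
as_b64 = base64.b64encode(eight_bit)

# Both decode back to the same bytes; the charset from the headers
# (utf-8 here) then says how to turn those bytes into characters.
from_qp = quopri.decodestring(as_qp).decode("utf-8")
from_b64 = base64.b64decode(as_b64).decode("utf-8")
```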

Because of this, the Python 3 str type is not suitable to store an email
message, since it insists on the string being Unicode encoded, but the
Python 2 str class could hold it.

-- 
Richard Damon



Re: Python installer hangs in Windows 7

2020-08-27 Thread mikkow34ify
On Monday, 16 April 2018 at 10:51:37 UTC+2, jtsh...@gmail.com wrote:
> On Monday, February 6, 2017 at 10:46:24 AM UTC+5:30, Jean-Claude Roy wrote:
> > I am trying to install Python 3.6.0 on a Windows 7 computer.
> > The download of 29.1 MB is successful and I get the next window. I choose
> > the "install now" selection and that opens the Setup Program window.
> > Now the trouble starts: I get "Installing:" and the Initialization
> > progress...and nothing else.
> > There is no additional disk activity, no progress on initialization,
> > and everything appears dead. Even after 20 minutes there is zero progress.
> > I've repeated this as both a user and the administrator of this
> > Windows computer. I get the same results in either case.
> > If I go to the task manager it shows that Python 3.6.0 (32-bit) setup is
> > running. If I try to end the task I get the message that the program is not
> > responding.
> > Do you have any suggestions as to how I can get past this?
> > Thank you.
> Uncheck the "install for all users" option and it will work; then log into
> each user account individually and install there.
Unchecking also works for version 3.8.5. Thanks


Aw: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> Because of this, the Python 3 str type is not suitable to store an email
> message, since it insists on the string being Unicode encoded,

I should greatly appreciate to be enlightened as to what
a "string being Unicode encoded" is intended to say ?

Thanks,
Karsten


Re: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Angelico
On Thu, Aug 27, 2020 at 11:10 PM Karsten Hilbert
 wrote:
>
> > Because of this, the Python 3 str type is not suitable to store an email
> > message, since it insists on the string being Unicode encoded,
>
> I should greatly appreciate to be enlightened as to what
> a "string being Unicode encoded" is intended to say ?
>

A Python 3 "str" or a Python 2 "unicode" is an abstract sequence of
Unicode codepoints. As such, it's not suitable for transparently
round-tripping an email, as it would lose information about the way
that things were encoded. However, it is excellent for building and
processing emails - you deal with character encodings at the same
point where you deal with the RFC 822 header format. In the abstract,
your headers might be stored in a dict, but then you encode them to a
flat sequence of bytes by putting "Header: value", wrapping correctly
- and also encode the text into bytes at the same time.
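
A sketch of that workflow, under the assumption that the modern EmailMessage API is used. The function build_wire_message and the sample headers are invented for the example: headers live in a dict and the body is a str, and the encoding to bytes happens once, at the end.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_wire_message(headers, body_text):
    """Build a message from str data and encode it to bytes in one place."""
    msg = EmailMessage()
    for name, value in headers.items():
        msg[name] = value
    # str in; the library picks the charset and transfer encoding.
    msg.set_content(body_text)
    return bytes(msg)


wire = build_wire_message(
    {
        "From": "sender@example.invalid",
        "To": "me@example.invalid",
        "Subject": "accents",
    },
    "àéçł\n",
)

# Reading it back: parse the bytes, then ask for the text again.
parsed = message_from_bytes(wire, policy=policy.default)
```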

ChrisA


Aw: Re: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> > > Because of this, the Python 3 str type is not suitable to store an email
> > > message, since it insists on the string being Unicode encoded,
> >
> > I should greatly appreciate to be enlightened as to what
> > a "string being Unicode encoded" is intended to say ?
> >
>
> A Python 3 "str" or a Python 2 "unicode" is an abstract sequence of
> Unicode codepoints.

OK, I figured that much. So it was the "encoded" that threw me off.

Being a sequence of Unicode codepoints makes it en-Uni-coded at
a technically abstract level while I assumed the "encoded" is meant
to somehow reference ''.encode() and friends.

Karsten



Re: How do I do this in Python 3 (string.join())?

2020-08-27 Thread Chris Green
Cameron Simpson  wrote:
> On 27Aug2020 09:16, Chris Green  wrote:
> >Cameron Simpson  wrote:
> >> But note: joining bytes like strings is uncommon, and may indicate 
> >> that
> >> you should be working in strings to start with. Eg you may want to
> >> convert popmsg from bytes to str and do a str.join anyway. It depends on
> >> exactly what you're dealing with: are you doing text work, or are you
> >> doing "binary data" work?
> >>
> >> I know many network protocols are "bytes-as-text", but that is
> >> accomplished by implying an encoding of the text, eg as ASCII, where
> >> characters all fit in single bytes/octets.
> >>
> >Yes, I realise that making everything a string before I start might be
> >the 'right' way to do things but one is a bit limited by what the mail
> >handling modules in Python provide.
> 
> I do ok, though most of my message processing happens to messages 
> already landed in my "spool" Maildir by getmail. My setup uses getmail 
> to get messages with POP into a single Maildir, and then I process the 
> message files from there.
> 
Most of my mail is delivered by SMTP, I run a Postfix SMTP *server*
on my desktop machine which stays on permanently.

The POP3 processing is solely to collect E-Mail that ends up in the
'catchall' mailbox on my hosting provider.  It empties the POP3
catchall mailbox, checks for anything that *might* be for me or other
family members then just deletes the rest.

> >E.g. in this case the only (well the only ready made) way to get a
> >POP3 message is using poplib and this just gives you a list of lines
> >made up of "bytes as text" :-
> >
> >popmsg = pop3.retr(i+1)
> 
> Ok, so you have bytes? You need to know.
> 
The documentation says (and it's exactly the same for Python 2 and
Python 3):-

POP3.retr(which)
Retrieve whole message number which, and set its seen flag. Result
is in form (response, ['line', ...], octets).

Which isn't amazingly explicit unless 'line' implies a string.


> >I join the lines to feed them into mailbox.mbox() to create a mbox I
> >can analyse and also a message which can be sent using SMTP.
> >
> >Should I be converting to string somewhere?
> 
> I have not used poplib, but the Python email modules have a BytesParser, 
> which gets you a Message object; I would feed the poplib bytes to that 
> to parse the received message.  A Message object can then be transcribed 
> as text via its .as_string method. Or you can do other things with it.
> 
> I think my main points are:
> 
> - know whether you're using bytes (uninterpreted data) or text (strings 
>   of _characters_); treating bytes _as_ text implies an encoding, and 
>   when that assumption is incorrect you get mojibake[1]
> 
> - look at the email modules' parsers, which return Messages, a 
>   representation of the message in a structure (so that MIME subparts 
>   etc are correctly broken out, and the character sets are _known_, post 
>   parse)

OK, thanks Cameron.
 
-- 
Chris Green


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
Richard Damon  wrote:
> On 8/27/20 4:31 AM, Chris Green wrote:
> > While an E-Mail body possibly *shouldn't* have non-ASCII characters in
> > it one must be able to handle them without errors.  In fact haven't
> > the RFCs changed such that the message body should be 8-bit clean?
> > Anyway I think the Python 3 mail handling libraries need to be able to
> > pass extended characters through without errors.
> 
> Email messages are fully allowed to use non-ASCII characters in them as
> long as the headers indicate this. They can be encoded either as raw 8
> bit bytes on systems that are 8-bit clean, or for systems that are not,
> they will need to be encoded either as base-64 or using quoted-printable
> encoding. These characters are to be interpreted in the character set
> defined (or presumed) in the header, or even as some other binary object
> like an image or executable if the content type isn't text.
> 
> Because of this, the Python 3 str type is not suitable to store an email
> message, since it insists on the string being Unicode encoded, but the
> Python 2 str class could hold it.
> 
Which sounds like the core of my problem[s]! :-)

As I said my system (ignoring the Python issues) is all UTF8 and all
seems to work well so I think it's pretty much correctly configured.
When I send mail that has accented and other extended characters in it
the E-Mail headers have:-
Content-Type: text/plain; charset=utf-8

If I save a message like the above sent to myself it's stored using
the UTF8 characters directly, I can open it with my text editor (which
is also UTF8 aware) and see the characters as I entered them, there's
no encoding because my system is 8-bit clean and I'm talking to myself
as it were.

The above is using Python 2 to handle and filter my incoming mail
which, as you say, works fine.  However when I try switching to Python
3 I get the errors I've been asking about, even though this is
'talking to myself' and the E-Mail message is just UTF8.


-- 
Chris Green


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Barry


> On 27 Aug 2020, at 10:40, Chris Green wrote:
> 
> Karsten Hilbert wrote:
>>> Terry Reedy wrote:
>>>> On 8/26/2020 11:10 AM, Chris Green wrote:
>>>> 
>>>>> I have a simple[ish] local mbox mail delivery module as follows:-
>>>> ...
>>>>> It has run faultlessly for many years under Python 2.  I've now
>>>>> changed the calling program to Python 3 and while it handles most
>>>>> E-Mail OK I have just got the following error:-
>>>>> 
>>>>> Traceback (most recent call last):
>>>>>   File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
>>>>>     mailLib.deliverMboxMsg(dest, msg, log)
>>>> ...
>>>>>   File "/usr/lib/python3.8/email/generator.py", line 406, in write
>>>>>     self._fp.write(s.encode('ascii', 'surrogateescape'))
>>>>> UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in
>>>>> position 4: ordinal not in range(128)

I would guess the fix is to do s.encode('utf-8').

You might need to add a header to say that you are using utf-8 to the 
email/mime-part.

If you do that does your code work?

Barry




Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Barry Scott



> On 26 Aug 2020, at 16:10, Chris Green  wrote:
> 
>  UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in 
> position 4: ordinal not in range(128)
> 
> So what do I need to do to the message I'm adding with mbx.add(msg) to
> fix this?  (I assume that's what I need to do).

>>> import unicodedata
>>> unicodedata.name('\ufeff')
'ZERO WIDTH NO-BREAK SPACE'

I guess the editor you use to compose the text is adding that to your message.

Barry



Video file to subtitles file

2020-08-27 Thread Muskan Sanghai
I would be really thankful if someone could suggest how I can generate a
subtitle file (SRT format) from a video or audio file without using Google
Cloud or AWS.


Re: ABC with abstractmethod: kwargs on Base, explicit names on implementation

2020-08-27 Thread Dieter Maurer
Samuel Marks wrote at 2020-8-27 15:58 +1000:
>The main thing I want is type safety. I want Python to complain if the
>callee uses the wrong argument types, and to provide suggestions on
>what's needed and info about it.
>
>Without a base class I can just have docstrings and type annotations
>to achieve that.
>
>What can I use that will require all implementers to have a minimum of
>the same properties and arguments, but also allow them to add new
>properties and arguments?

A main paradigm of object oriented programming is the
ability to use a derived class instance with knowledge only
about the base class. This implies that in many cases, you
need not know the concrete class because any instance of a derived
class must have the essential behavior of the base class instances.

This paradigm imposes limitations on the allowable signature changes.
An overriding method may add further parameters but all those
must have default values - otherwise, the use with base class knowledge
only would cause errors.
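
A minimal sketch of that rule, assuming a hypothetical Base/Impl pair (the
class, method, and parameter names are illustrative only, not taken from the
original question). The override adds a parameter, but gives it a default, so
a caller that knows only the base signature still works:

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def run(self, name: str) -> str:
        """Minimal signature every implementation must accept."""


class Impl(Base):
    # The extra parameter has a default value, so calls written against
    # Base.run(name) remain valid for Impl instances.
    def run(self, name: str, greeting: str = "hello") -> str:
        return f"{greeting} {name}"


def call_with_base_knowledge(obj: Base) -> str:
    # This caller only knows the base signature and never passes `greeting`.
    return obj.run("world")


result = call_with_base_knowledge(Impl())
```

If `greeting` had no default, the call inside call_with_base_knowledge()
would raise a TypeError at run time; static checkers such as mypy generally
report that kind of incompatible override as well.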

> Preferably I would like this all to happen before compile/interpret
> time.

Use a "lint"-style tool to check compatibility.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Video file to subtitles file

2020-08-27 Thread Barry Scott



> On 27 Aug 2020, at 18:00, Muskan Sanghai  wrote:
> 
> I would be really thankful if someone can suggest me how can I generate 
> subtitles file (srt format) from a video or audio without using Google cloud  
> and AWS. 

What do you know about how subtitles work with video?  Do you mean you want to 
extract the bitmap subtitle data from an MPEG video?

Barry




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread MRAB

On 2020-08-27 17:29, Barry Scott wrote:
>> On 26 Aug 2020, at 16:10, Chris Green  wrote:
>>
>>  UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in
>> position 4: ordinal not in range(128)
>>
>> So what do I need to do to the message I'm adding with mbx.add(msg) to
>> fix this?  (I assume that's what I need to do).
>
> >>> import unicodedata
> >>> unicodedata.name('\ufeff')
> 'ZERO WIDTH NO-BREAK SPACE'
>
> I guess the editor you use to compose the text is adding that to your message.

That's used as a BOM (Byte Order Mark) at the start of UTF-16 and UTF-32 text.
It's also used at the start of UTF-8-SIG.
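
As a rough illustration (the sample header text is made up), decoding with
"utf-8-sig" drops a leading BOM, whereas plain "utf-8" keeps it as U+FEFF,
which then cannot be encoded as ASCII:

```python
import codecs

# Hypothetical input: UTF-8 text that an editor has prefixed with a BOM.
raw = codecs.BOM_UTF8 + "Subject: hello".encode("utf-8")

kept = raw.decode("utf-8")          # the BOM survives as '\ufeff'
stripped = raw.decode("utf-8-sig")  # the BOM is removed while decoding

try:
    kept.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # Same class of failure as in the traceback quoted above.
    ascii_ok = False
```

Whether "utf-8-sig" is the right codec for the mail in question depends on
how the text was actually encoded, which this sketch simply assumes.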
--
https://mail.python.org/mailman/listinfo/python-list


Python 3 how to convert a list of bytes objects to a list of strings?

2020-08-27 Thread Chris Green
This sounds quite an easy thing to do but I can't find how to do it
elegantly.

I have a list of bytes class objects (i.e. a list containing sequences
of bytes, which are basically text) and I want to convert it to a list
of string objects.

One of the difficulties of finding out how to do this is that 'list of
bytes' tends to mean a bytes object with a sequence of bytes in it
which is *not* what I'm after converting. :-)

Obviously I can do:-

bbb = [b'aaa', b'bbb', b'ccc']
sss = []
for i in range(len(bbb)):
    sss.append(str(bbb[i]))

but that does seem a bit clumsy.  Is there a better way?


-- 
Chris Green
·
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3 how to convert a list of bytes objects to a list of strings?

2020-08-27 Thread Chris Angelico
On Fri, Aug 28, 2020 at 6:36 AM Chris Green  wrote:
>
> This sounds quite an easy thing to do but I can't find how to do it
> elegantly.
>
> I have a list of bytes class objects (i.e. a list containing sequences
> of bytes, which are basically text) and I want to convert it to a list
> of string objects.
>
> One of the difficulties of finding out how to do this is that 'list of
> bytes' tends to mean a bytes object with a sequence of bytes in it
> which is *not* what I'm after converting. :-)
>
> Obviously I can do:-
>
> bbb = [b'aaa', b'bbb', b'ccc']
> sss = []
> for i in range(len(bbb)):
>     sss.append(str(bbb[i]))
>
> but that does seem a bit clumsy.  Is there a better way?
>

Firstly, you shouldn't iterate over the range, but over the items themselves:

for word in bbb:

But this is a really good job for a list comprehension:

sss = [str(word) for word in bbb]

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3 how to convert a list of bytes objects to a list of strings?

2020-08-27 Thread Marco Sulla
Are you sure you want `str()`?

>>> str(b'aaa')
"b'aaa'"

Probably you want:

map(lambda x: x.decode(), bbb)
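
A small sketch of the difference, using the sample list from the question.
Note that map() returns an iterator in Python 3, so it is wrapped in list()
here; a list comprehension is an equivalent spelling:

```python
bbb = [b"aaa", b"bbb", b"ccc"]

# str() on a bytes object gives its repr, including the b'' wrapper.
via_str = [str(word) for word in bbb]

# decode() gives the text; with no argument it assumes UTF-8.
via_map = list(map(lambda x: x.decode(), bbb))
via_comprehension = [word.decode() for word in bbb]
```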
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3 how to convert a list of bytes objects to a list of strings?

2020-08-27 Thread Cameron Simpson
On 27Aug2020 23:54, Marco Sulla  wrote:
>Are you sure you want `str()`?
>
>>>> str(b'aaa')
>"b'aaa'"
>
>Probably you want:
>
>map(lambda x: x.decode(), bbb)

_And_ you need to know the encoding of the text in the bytes. The above 
_assumes_ UTF-8 because that is the default for bytes.decode, and if 
that is _not_ what is in the bytes objects you will get mojibake.

Because a lot of stuff is "mostly ASCII", this is the kind of bug which 
can lurk until much later when you have less usual data.
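
A minimal sketch of how that goes wrong, with made-up sample data: the same
bytes decoded with the wrong codec produce mojibake, while pure ASCII data
decodes identically either way and so hides the mistake.

```python
# Hypothetical data: text that really was encoded as UTF-8.
data = "café".encode("utf-8")

right = data.decode("utf-8")    # the intended text
wrong = data.decode("latin-1")  # wrong codec: no error, just mojibake

# ASCII-only data looks the same under both codecs, so the bug stays hidden.
ascii_only = b"cafe"
same_either_way = ascii_only.decode("utf-8") == ascii_only.decode("latin-1")
```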

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How do I do this in Python 3 (string.join())?

2020-08-27 Thread Cameron Simpson
On 27Aug2020 14:36, Chris Green  wrote:
>Cameron Simpson  wrote:
>> I do ok, though most of my message processing happens to messages
>> already landed in my "spool" Maildir by getmail. My setup uses getmail
>> to get messages with POP into a single Maildir, and then I process the
>> message files from there.
>>
>Most of my mail is delivered by SMTP, I run a Postfix SMTP *server*
>on my desktop machine which stays on permanently.

I run postfix on my machines too, including my laptop, but mostly for 
sending - it means I can queue messages while offline, and they'll go 
out later.

I don't receive SMTP on my laptop (which is where my mail lives); I 
receive elsewhere, such as the machine hosting my email domain (which 
also runs postfix) and the various external addresses I have (one for 
each ISP of course, and a couple of external email addresses such as a 
GMail one, largely to interact with stuff like Google Groups, which is 
pretty parochial).

So I use getmail to fetch from most of these (GMail just forwards a copy 
of everything "personal" to my primary address) and deliver to a spool 
Maildir on my laptop, and the mailfiler processes the spool Maildir.

>The POP3 processing is solely to collect E-Mail that ends up in the
>'catchall' mailbox on my hosting provider.  It empties the POP3
>catchall mailbox, checks for anything that *might* be for me or other
>family members then just deletes the rest.

Very strong email policy, that one. Personally I fear data loss, and 
process everything; anything which doesn't match a rule lands in my 
"UNKNOWN" mail folder for manual consideration when I'm bored. It is 
largely spam, but sometimes has a message wanting a new filing rule.

>> >E.g. in this case the only (well the only ready made) way to get a
>> >POP3 message is using poplib and this just gives you a list of lines
>> >made up of "bytes as text" :-
>> >
>> >popmsg = pop3.retr(i+1)
>>
>> Ok, so you have bytes? You need to know.
>>
>The documentation says (and it's exactly the same for Python 2 and
>Python 3):-
>
>POP3.retr(which)
>Retrieve whole message number which, and set its seen flag. Result
>is in form (response, ['line', ...], octets).
>
>Which isn't amazingly explicit unless 'line' implies a string.

Aye. But "print(repr(a_pop_line))" will tell you. Almost certainly a 
string-of-bytes, so I would expect bytes. The docs are probably 
unchanged during the Python2->3 move.
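
A rough sketch of what to expect in Python 3, using a stand-in tuple shaped
like the documented (response, lines, octets) result rather than a real
server connection; the message content and octet count are invented:

```python
import email

# Stand-in for pop3.retr(i + 1); no POP3 server is contacted here.
popmsg = (
    b"+OK message follows",
    [b"From: a@example.com", b"Subject: hello", b"", b"body line"],
    58,  # made-up octet count
)

response, lines, octets = popmsg
line_type = type(lines[0])           # bytes in Python 3

raw = b"\n".join(lines)              # join bytes with a bytes separator
msg = email.message_from_bytes(raw)  # parse without guessing an encoding
subject = msg["Subject"]
```

Parsing from bytes leaves the charset handling to the email package instead
of forcing a decode up front.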

>> >I join the lines to feed them into mailbox.mbox() to create a mbox I
>> >can analyse and also a message which can be sent using SMTP.

Ah. I like Maildirs for analysis; every message has its own file, which 
makes adding and removing messages easy, and avoids contention with 
other things using the Maildir.

My mailfiler can process Maildirs (scan, add, remove) and add to 
Maildirs and mboxes.
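
For anyone who wants to try the one-file-per-message layout, here is a small
sketch using the standard library's mailbox.Maildir in a temporary directory
(the "spool" name and the message are made up, and this is not my mailfiler):

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    spool = Path(tmp) / "spool"  # hypothetical Maildir location
    box = mailbox.Maildir(str(spool), create=True)

    msg = EmailMessage()
    msg["From"] = "a@example.com"
    msg["Subject"] = "hello"
    msg.set_content("body line")

    key = box.add(msg)  # writes one new file under spool/new
    count_after_add = len(box)
    stored_subject = box[key]["Subject"]
    files_in_new = len(list((spool / "new").iterdir()))

    box.remove(key)  # deletes just that message's file
    count_after_remove = len(box)
```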

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Output showing "None" in Terminal

2020-08-27 Thread Py Noob
Thank you so much for the help.

I'm self-studying and watching tutorials on YouTube. The problem was given
as an exercise after the tutorial.
I did modify my code based on the suggestions here and it helps.

Thank you!

On Tue, Aug 25, 2020 at 4:31 PM Schachner, Joseph <
joseph.schach...@teledyne.com> wrote:

> The very first line of your function km_mi(): ends it:
> def km_mi():
>     return answer
>
> answer has not been assigned, so it returns None.
>
> Advice: remove that "return" line from there.  Also get rid of the last
> line, answer = km_mi which makes answer refer to the function km_mi().
> Put the "return answer" line at the end, where the "answer=km_mi" used to
> be.
>
> That should help.  The code calculates "answer".   It prints "answer".
>  You should return "answer" at the end, after it has been calculated.
>
> --- Joseph S.
>
> -----Original Message-----
> From: Py Noob 
> Sent: Monday, August 24, 2020 9:12 AM
> To: python-list@python.org
> Subject: Output showing "None" in Terminal
>
> Hi!
>
> I'm new to Python and would like some help with something I was working on
> from a tutorial. I'm using VS Code with Python 3.7.0 on Windows 7. Below is
> my code; the terminal shows the word "None" every time I execute it.
>
> Many thanks!
>
> print("Conversion")
>
> def km_mi():
>     return answer
>
> selection = input("Type mi for miles or km for kilometers: ")
>
> if selection == "mi":
>     n = int(input(print("Please enter distance in miles: ")))
>     answer = (1.6*n)
>     print("%.2f" % answer, "miles")
>
> else:
>     n = float(input(print("Please enter distance in kilometers: ")))
>     answer = (n/1.6)
>     print("%.2f" % answer, "kilometers")
>
> answer = km_mi
>
>
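
For reference, a possible corrected sketch of the quoted script, following
the advice above. It rests on a few assumptions of its own: the function
takes the selection and distance as parameters, the result is labelled with
the unit converted to, and the prompt text goes straight to input(). The
last point matters because print() returns None, so input(print(...)) shows
"None" as the prompt.

```python
def km_mi(selection, distance):
    """Convert miles to kilometers ("mi") or kilometers to miles (otherwise)."""
    if selection == "mi":
        answer = 1.6 * distance
        unit = "kilometers"
    else:
        answer = distance / 1.6
        unit = "miles"
    # Return only after the answer has been calculated.
    return answer, unit


def main():
    print("Conversion")
    selection = input("Type mi for miles or km for kilometers: ")
    # Pass the prompt directly to input(); do not wrap it in print().
    distance = float(input("Please enter distance: "))
    answer, unit = km_mi(selection, distance)
    print("%.2f" % answer, unit)
```

Call main() to run it interactively.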
-- 
https://mail.python.org/mailman/listinfo/python-list