imaplib: is this really so unwieldy?

2021-05-25 Thread hw



Hi,

I'm about to do stuff with emails on an IMAP server and wrote a program 
using imaplib which, so far, gets the UIDs of the messages in the inbox:



#!/usr/bin/python

import imaplib
import re

imapsession = imaplib.IMAP4_SSL('imap.example.com', port = 993)

status, data = imapsession.login('user', 'password')
if status != 'OK':
    print('Login failed')
    exit

messages = imapsession.select(mailbox = 'INBOX', readonly = True)
typ, msgnums = imapsession.search(None, 'ALL')
message_uuids = []
for number in str(msgnums)[3:-2].split():
    status, data = imapsession.fetch(number, '(UID)')
    if status == 'OK':
        match = re.match('.*\(UID (\d+)\)', str(data))
        message_uuids.append(match.group(1))
for uid in message_uuids:
    print('UID %5s' % uid)

imapsession.close()
imapsession.logout()


It's working (with Cyrus), but I have the feeling I'm doing it all wrong 
because it seems so unwieldy.  Apparently the functions of imaplib 
return some kind of bytes while expecting strings as arguments, like 
message numbers must be strings.  The documentation doesn't seem to say 
if message UIDs are supposed to be integers or strings.


So I'm forced to convert stuff from bytes to strings (which is weird 
because bytes are bytes) and to use regular expressions to extract the 
message-uids from what the functions return (which I shouldn't have to 
because when I'm asking a function to give me a uid, I expect it to 
return a uid).


This is so totally awkward and unwieldy and involves so much overhead that 
I must be doing this wrong.  But am I?  How would I do this right?

--
https://mail.python.org/mailman/listinfo/python-list


Re: learning python ...

2021-05-25 Thread Chris Angelico
On Tue, May 25, 2021 at 1:00 PM hw  wrote:
>
> On 5/24/21 3:54 PM, Chris Angelico wrote:
> > You keep using that word "unfinished". I do not think it means what
> > you think it does.
>
> What do you think I think it means?

I think it means that the language is half way through development,
doesn't have enough features to be usable, isn't reliable enough for
production, and might at some point in the future become ready to use.

None of which is even slightly supported by evidence.

> > Python has keywords. C has keywords. In Python, "None" is a keyword,
> > so you can't assign to it; in C, "int" is a keyword, so you can't
> > assign to it. There is no fundamental difference here, and the two
> > languages even have roughly the same number of keywords (35 in current
> > versions of Python; about 40 or 50 in C, depending on the exact
> > specification you're targeting). The only difference is that, in
> > Python, type names aren't keywords. You're getting caught up on a
> > trivial difference that has happened to break one particular test that
> > you did, and that's all.
>
> Then what is 'float' in the case of isinstance() as the second
> parameter, and why can't python figure out what 'float' refers to in
> this case?  Perhaps type names should be keywords to avoid confusion.

It's a name. In Python, any name reference is just a name reference.
There's no magic about the language "knowing" that the isinstance()
function should take a keyword, especially since there's no keywords
for these things.
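
A minimal sketch of what "it's a name" means in practice. The variable names below are illustrative and not taken from the original test; the point is only that isinstance() receives whatever object the name "float" currently refers to:

```python
import builtins

value = 1.2

# Rebinding the name "float" shadows the builtin type in this scope.
float = 3.4

try:
    isinstance(value, float)   # the second argument is now the number 3.4
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

# The real type is still reachable through the builtins module.
still_a_float = isinstance(value, builtins.float)

del float                      # drop the shadowing name again
print(outcome, still_a_float)
```

Nothing about the type system changes here; only the lookup of one name does.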

> >> Maybe you can show how this is a likeable feature.  I already understood
> >> that you can somehow re-define functions in python and I can see how
> >> that can be useful.  You can do things like that in elisp as well.  But
> >> easily messing up built-in variable types like that is something else.
> >> Why would I want that in a programming language, and why would I want to
> >> use one that allows it?
> >
> > Because all you did was mess with the *name* of the type. It's not
> > breaking the type system at all.
>
> And how is it a likeable feature?

You can complain about whether it's likeable or not, but all you're
doing is demonstrating the Blub Paradox.

> > The C language never says that Python is "unfinished". I'm not making
> > assumptions, I'm reading your posts.
>
> I never said it is unfinished, I said it /seems/ unfinished.  In any
> case, there is nothing insulting about it.  Python is still being worked
> on (which is probably a good thing), and the switch from version 2 to
> version 3 has broken a lot of software, which doesn't help in making it
> appear as finished or mature.

It's been around for thirty years. Quit fudding. You're getting very
close to my killfile.

Python 3 has been around since 2009. Are you really telling me that
Python looks unfinished because of a breaking change more than a
decade ago? The Go language didn't even *exist* before Python 3 - does
that mean that Go is also unfinished?

> Just look at what the compiler says when you try to compile these
> examples.  In the first example, you can't defeat a built-in data type
> by assigning something to it, and in the second one, you declare
> something as an instance of a build-in data type and then try to use it
> as a function.  That is so because the language is designed as it is.

Yes, because C uses keywords for types. That's the only difference
you're seeing here. You keep getting caught up on this one thing, one
phenomenon that comes about because of YOUR expectations that Python
and C should behave the same way. If you weren't doing isinstance
checks, you wouldn't even have noticed this! It is *NOT* a fundamental
difference.

Also, you keep arguing against the language, instead of just using it
the way it is. It really sounds to me like you'd do better to just
give up on Python and go use some language that fits your brain
better. If you won't learn how a language works, it's not going to
work well for you.

ChrisA


Shadowing, was Re: learning python ...

2021-05-25 Thread Peter Otten

On 25/05/2021 05:20, hw wrote:

>> We're talking about many different things. If it's simply "num = ..."
>> followed by "num = ...", then it's not a new variable or anything,
>> it's simply rebinding the same name. But when you do "int = ...", it's
>> shadowing the builtin name.
>
> Why?  And how is that "shadowing"?
>
> What if I wanted to re-define the built-in thing?


When you write

foo

in a module the name "foo" is looked up in the module's global 
namespace. If it's not found it is looked up in builtins. If that lookup 
fails a NameError exception is raised.


>>> import builtins
>>> builtins.foo = "built-in foo"
>>> foo
'built-in foo'
>>> foo = "module-global-foo"  # at this point builtins.foo is shadowed
>>> foo
'module-global-foo'
>>> del foo # delete the global to make the built-in visible again:
>>> foo
'built-in foo'

That mechanism allows newbies who don't know the builtins to write

list = [1, 2, 3]

without affecting other modules they may use. It also allows old scripts 
that were written when a builtin name did not yet exist to run without 
error.


The problem you ran into, using a name in two roles

float = float("1.2")

could be circumvented by writing

float = builtins.float("1.2")

but most of the time it is more convenient to think of builtins as names 
that are predefined in your current module and act accordingly.


As you see, redefining a builtin is as easy as importing builtins and 
setting the respective attribute:


>>> builtins.float = int
>>> float(1.23)
1



Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Cameron Simpson
On 25May2021 10:23, hw  wrote:
>I'm about to do stuff with emails on an IMAP server and wrote a program 
>using imaplib which, so far, gets the UIDs of the messages in the 
>inbox:
>
>
>#!/usr/bin/python

I'm going to assume you're using Python 3.

>import imaplib
>import re
>
>imapsession = imaplib.IMAP4_SSL('imap.example.com', port = 993)
>
>status, data = imapsession.login('user', 'password')
>if status != 'OK':
>print('Login failed')
>exit

Your "exit" won't do what you want. I expect this code to raise a 
NameError exception here (you've not defined "exit"). That _will_ abort 
the programme, but in a manner indicating that you've used an unknown 
name.  You probably want:

sys.exit(1)

You'll need to import "sys".
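
A small sketch of the sys.exit() suggestion; check_login is a hypothetical helper, not part of the original program. One caveat, stated as an assumption about the usual setup: in a standard interpreter the site module normally provides "exit", so a bare "exit" is typically a silent no-op rather than a NameError (it can be missing, for example under "python -S"). Either way it does not end the program, because the name is never called.

```python
import sys

def check_login(status):
    """Exit with a non-zero code unless the server answered 'OK'."""
    if status != 'OK':
        print('Login failed', file=sys.stderr)
        sys.exit(1)   # raises SystemExit, which ends the program
    return True
```

sys is part of the standard library and is already loaded by the interpreter, so the import costs practically nothing.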

>messages = imapsession.select(mailbox = 'INBOX', readonly = True)
>typ, msgnums = imapsession.search(None, 'ALL')

I've done little with IMAP. What's in msgnums here? Eg:

print(type(msgnums), repr(msgnums))

just so we all know what we're dealing with here.

>message_uuids = []
>for number in str(msgnums)[3:-2].split():

This is very strange. Did you see the example at the end of the module 
docs? It has this example code:

import getpass, imaplib

M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    typ, data = M.fetch(num, '(RFC822)')
    print('Message %s\n%s\n' % (num, data[0][1]))
M.close()
M.logout()

It is just breaking apart data[0] into strings which were separated by 
whitespace in the response. And then using those same strings as keys 
for the .fetch() call. That doesn't seem complex, and in fact is blind 
to the format of the "message numbers" returned. It just takes what it 
is handed and uses those to fetch each message.
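
The splitting step can be tried without a server. The sample value below is an assumption shaped like the search() result reported elsewhere in this thread:

```python
# Sample value shaped like a search() result for a five-message mailbox.
data = [b'1 2 3 4 5']

# data is a list holding one bytes object; split() breaks it on whitespace
# and returns a list of bytes objects, one per message number.
nums = data[0].split()
print(nums)

# An empty mailbox yields an empty bytes object, which splits to an empty list.
empty = [b''][0].split()
print(empty)
```

Each element of nums can be handed straight back to fetch() without converting it to str first.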

>status, data = imapsession.fetch(number, '(UID)')
>if status == 'OK':
>match = re.match('.*\(UID (\d+)\)', str(data))
[...]
>It's working (with Cyrus), but I have the feeling I'm doing it all 
>wrong because it seems so unwieldy.

IMAP's quite complex. Have you read RFC2060?

https://datatracker.ietf.org/doc/html/rfc2060.html

The imaplib library is probably a fairly basic wrapper for the 
underlying protocol which provides methods for the basic client requests 
and conceals the asynchronicity from the user for ease of (basic) use.

>Apparently the functions of imaplib return some kind of bytes while 
>expecting strings as arguments, like message numbers must be strings.  
>The documentation doesn't seem to say if message UIDs are supposed to 
>be integers or strings.

You can go a long way by pretending that they are opaque strings. That 
they may be numeric in content can be irrelevant to you. Treat them as 
strings.

>So I'm forced to convert stuff from bytes to strings (which is weird 
>because bytes are bytes)

"bytes are bytes" is tautological. You're getting bytes for a few 
reasons:

- the imap protocol largely talks about octets (bytes), but says they're
  text. For this reason a lot of stuff you pass as client parameters are
  strings, because strings are text.

- text may be encoded as bytes in many ways, and without knowing the
  encoding, you can't extract text (strings) from bytes

- the imaplib library may date from Python 2, where the str type was
  essentially a byte sequence. In Python 3 a str is a sequence of
  Unicode code points, and you translate to/from bytes if you need to
  work with bytes.

Anyway, the IMAP responses are bytes containing text. You get a lot of 
bytes.

When you go:

text = str(data)

that is _assuming_ a particular text encoding stored in the data. You 
really ought to specify an encoding here. If you've not specified the 
CHARSET for things, 'ascii' would be a conservative choice. The IMAP RFC 
talks about what to expect in section 4 (Data Formats). There's quite a 
lot of possible response formats and I can understand imaplib not 
getting deeply into decoding these.

>and to use regular expressions to extract the message-uids from what 
>the functions return (which I shouldn't have to because when I'm asking 
>a function to give me a uid, I expect it to return a uid).

No, you're asking the IMAP _protocol_ to return you UIDs. The module 
itself doesn't parse what you ask for in the fetch results, and 
therefore it can't decode the response (data bytes) into some higher 
level thing (such as UIDs in your case, but you can ask for all sorts of 
weird stuff with IMAP).

So having passed '(UID)' to the FETCH request, you now need to parse 
the response.
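
A sketch of parsing such a response while staying in bytes. extract_uid is a hypothetical helper, and the sample response in the usage line is an assumption about what a server sends back for fetch(num, '(UID)'):

```python
import re

# Bytes pattern, so the response never has to pass through str().
UID_PATTERN = re.compile(rb'\(UID (\d+)\)')

def extract_uid(fetch_data):
    """Return the UID from a FETCH (UID) response as a str, or None."""
    for item in fetch_data:
        if isinstance(item, bytes):
            match = UID_PATTERN.search(item)
            if match:
                return match.group(1).decode('ascii')
    return None

print(extract_uid([b'1 (UID 4827)']))
```

Searching the bytes directly avoids the slicing and str() round trip used in the original program.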

>This so totally awkward and unwieldy and involves so much overhead 
>that I must be doing this wrong.  But am I?  How would I do this right?

Well, you _could_ get immersed in the nitty gritty of the IMAP protocol 
and the imaplib module, _or_ you could see if someone else has done some 
work to make this easier by writing a higher level library. A search at 
pypi.org for "imap" found a lot of stuff. 

Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Chris Angelico
On Tue, May 25, 2021 at 8:21 PM Cameron Simpson  wrote:
> When you go:
>
> text = str(data)
>
> that is _assuming_ a particular text encoding stored in the data. You
> really ought to specify an encoding here. If you've not specified the
> CHARSET for things, 'ascii' would be a conservative choice. The IMAP RFC
> talks about what to expect in section 4 (Data Formats). There's quite a
> lot of possible response formats and I can understand imaplib not
> getting deeply into decoding these.

Worse than that: what you actually get is the repr of the bytes. That
might happen to look a lot like an ASCII decode, but if the string
contains unprintable characters, quotes, or anything outside of the
ASCII range, it's going to represent it as an escape code.

The best way to turn bytes into text is the decode method:

data.decode("UTF-8")
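
The difference is easy to see on a sample value (the bytes below are made up for illustration). It also explains why the original program needed the [3:-2] slice:

```python
data = b'1 2 3'

as_repr = str(data)             # the repr, including the b'' wrapper
as_text = data.decode('ascii')  # the actual text

print(as_repr)
print(as_text)

# str() of the whole list gives "[b'1 2 3']"; slicing [3:-2] strips the
# leading "[b'" and the trailing "']", which only works by accident.
sliced = str([data])[3:-2]
```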

ChrisA


Re: learning python ...

2021-05-25 Thread Richard Damon
On 5/25/21 12:08 AM, hw wrote:
>
> Are all names references?  When I pass a name as a parameter to a
> function, does the object the name is referring to, when altered by
> the function, still appear altered after the function has returned?  I
> wouldn't expect that ...

If you mutate the object the parameter was bound to, the calling
function will see the changed object. (This requires the object to BE
mutateable, like a list, not an int)

If you rebind that parameter to a new object, the calling function
doesn't see the change, as its name wasn't rebound.
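
A short sketch of the two cases; the function names are illustrative:

```python
def mutate(items):
    items.append('added')        # changes the object the caller also sees

def rebind(items):
    items = ['brand', 'new']     # only rebinds the local name

original = ['one', 'two']

mutate(original)
after_mutate = list(original)

rebind(original)
after_rebind = list(original)

print(after_mutate, after_rebind)
```

Both results are the same list contents: the mutation is visible to the caller, the rebinding is not.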

-- 
Richard Damon



Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Skip Montanaro
> It's working (with Cyrus), but I have the feeling I'm doing it all wrong
> because it seems so unwieldy.

I have a program, Polly, which I
wrote to generate XKCD 936 passphrases. (I got the idea - and the name -
from Chris Angelico. See the README.) It builds its dictionary from emails
in my Gmail account which are tagged "polly" by a Gmail filter. I had put
it away for a few years, at which time it was still using Python 2. When I
came back to it, I wanted to update it to Python 3. As with so many 2-to-3
ports, the whole bytes/str problem was my stumbling block. Imaplib's API
(as you've discovered) is not the most Pythonic. I didn't spend much time
horsing around with it. Instead, I searched for higher-level packages,
eventually landing on IMAPClient.
Once I made the switch, things came together pretty quickly, due in large
part, I think, to its more sane API.

YMMV, but you're more than welcome to steal code from Polly.

Skip


Re: learning python ...

2021-05-25 Thread Michael Torrie
On 5/24/21 9:53 PM, hw wrote:
> That seems like an important distinction.  I've always been thinking of 
> variables that get something assigned to them, not as something that is 
> being assigned to something.

Your thinking is not incorrect.  Assignment is how you set a variable to
something.  For the most part the details of how the variables work
doesn't matter all that much.  An expression in Python works about the
same as it does in other languages.  Where it becomes important to
understand the name binding mechanism is in situations like you found
yourself.  What happens, for example, when you do something like
float=5? Hence the discussion about name shadowing.

The reason I brought up the distinction of how python's variables work
compared to a language like C is because under the hood Python's
assignment doesn't "alter" the variable.  Assignment replaces it
entirely in the name space.  This is consistent with a more formal
definition of variable found in lambda calculus.  I learned in uni there
are some formal languages that don't allow any variable names to be
rebound at all, which makes formal proofs and analysis easier. But I
digress.

There are also implications for parameter passing.  All of this is in
the language reference documentation of course.  But even still there
have been many arguments about whether Python is pass by value or pass
by reference.  Consider:

def foo(bar):
    bar += 1

a = 5
foo(a)
print(a)

or

def frob(foo):
    foo.append('bar')

a = [ 'one', 'two' ]
frob(a)
print(a)

The truth is Python might be said to "pass by object."  In other words
when you call a function, it goes through the names table and extracts
references to all the objects involved with the arguments and passes
those objects to the function.  Objects that are mutable can be changed
by a function, and those changes are visible in the code that called it,
since both caller and callee are dealing with the *same object*, just by
different names (aliases).  Strings and other values like ints are
*immutable*.  They cannot be changed.  Assignment will not change them,
only overwrite the names in the locals table.

> I would think of it as assigning a string to a variable and then 
> changing the content of the variable by assigning something else to the 
> same variable.  When variables are typeless, it doesn't matter if a 
> string or an integer is assigned to one (which is weird but can be very 
> useful).

Yes that's how it's done in many lower-level languages.  Python does not
assign that way, though. It's not clearing the contents and placing
something else there. Instead assignment overwrites the binding in the
name table, connecting the name to the new string object that was
created. The old object is dereferenced, and the garbage collector will
eventually remove it.
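
A sketch of rebinding as observed from outside, assuming CPython, where an object is normally freed as soon as its last reference goes away. Payload is a hypothetical stand-in class, used because plain str and int objects cannot be weakly referenced:

```python
import gc
import weakref

class Payload:
    """Stand-in object that supports weak references."""

name = Payload()
watcher = weakref.ref(name)      # does not keep the object alive
was_alive = watcher() is not None

name = 'a different object'      # rebinding drops the last strong reference
gc.collect()                     # not needed on CPython, harmless elsewhere
is_gone = watcher() is None

print(was_alive, is_gone)
```

The name now refers to a new object; the old one was not overwritten in place, it was simply let go.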

> It seems much more practical to assign different strings to the same 
> variable rather than assigning a different variable to each string, or 
> to assign a string to a variable and then to assign an integer to it.

How exactly would one overwrite an integer in memory with a string,
though?  You would have to either preallocate a lot of memory for it in
case something large were to be written to the variable, or you'd
allocate it on the heap on demand and use a reference for it.  Under the
hood, Python does the second.  How else would you do it?

> Isn't that what variables are for?

In the formal sense, variables are just names that stand in for values.
 Don't get too hung up on the mechanics of how one implements that as
being a formal part of the definition, and don't think that one
language's implementation of variables is the only way to do it.



Re: learning python ...

2021-05-25 Thread Greg Ewing

On 25/05/21 2:59 pm, hw wrote:
> Then what is 'float' in the case of isinstance() as the second
> parameter, and why can't python figure out what 'float' refers to in
> this case?


You seem to be asking for names to be interpreted differently
when they are used as parameters to certain functions.

Python doesn't do that sort of thing. The way it evaluates
expressions is very simple and consistent, and that's a good
thing. It means there aren't any special cases to learn and
remember.

Maybe you're not aware that isinstance is just a function,
and not any kind of special syntax?


> Perhaps type names should be keywords to avoid confusion.


Python has quite a lot of built-in types, some of them in
the builtin namespace, some elsewhere. Making them all keywords
would be impractical, even if it were desirable.

And what about user-defined types? Why should they be treated
differently to built-in types? Or are you suggesting there
should be a special syntax for declaring type names?

--
Greg


Re: learning python ...

2021-05-25 Thread Alan Gauld via Python-list
On 25/05/2021 00:41, Jon Ribbens via Python-list wrote:

> What would you call the argument to a function that
> returns, say, an upper-cased version of its input?

Probably 'candidate' or 'original' or 'initial' or
somesuch.  Or even just 's'. Single character names
are OK when there is no significant meaning to convey!

But never a type name since the type could change or
be extended (like bytes or even a user defined string
subclass.)

The exception being where it's a teaching exercise
where the type is important, but even there I'd precede
it with an article: aString, the_string or similar.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Grant Edwards
On 2021-05-25, hw  wrote:

> I'm about to do stuff with emails on an IMAP server and wrote a program 
> using imaplib

My recollection of using imaplib a few years ago is that yes, it is
unwieldy, oddly low-level, and rather un-Pythonic (excuse my
presumption in declaring what is and isn't "Pythonic").

I switched to using imaplib2 and found it much easier to use. It's a
higher-level wrapper for imaplib.

I think this is the currently maintained fork:

  https://github.com/jazzband/imaplib2

I haven't actively used either for several years, so things may have
changed...

--
Grant



Re: learning python ...

2021-05-25 Thread Greg Ewing

On 25/05/21 5:56 pm, Avi Gross wrote:

> Var = read in something from a file and make some structure like a data.frame
> Var = remove some columns from the above thing pointed to by Var
> Var = make some new calculated columns ditto
> Var = remove some rows ...
> Var = set some kind of grouping on the above or sort it and so on.


As long as all the values are of the same type, this isn't too bad,
although it might interfere with your ability to give the intermediate
results names that help the reader understand what they refer to.

A variable that refers to things of different *types* at different
times is considerably more confusing, both for a human reader and
for any type checking software you might want to use.


> How can you write a recursive function without this kind of variable shadowing?
> Each invocation of a function places the internal namespace in front of the
> parent so the meaning of a variable name used within is always backed by all
> the iterations before it.


Um, no. What you're describing is called "dynamic scoping", and
Python doesn't have it. Python is *lexically* scoped, meaning that
only scopes that textually enclose the function in the source are
searched for names. Frames on the call stack don't come into it.
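
A minimal sketch of lexical scoping; the function names are illustrative. show() finds "label" in the scope that textually encloses it, not in the scope of whoever happens to call it:

```python
def demo():
    label = 'enclosing scope'

    def show():
        # Resolved through the scopes that enclose show() in the source.
        return label

    def caller():
        label = 'local to caller'   # a separate local; show() never sees it
        return show()

    return caller()

result = demo()
print(result)
```

Under dynamic scoping the call would have picked up the caller's local instead.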


> So what if you suggest we allow re-use of names but WARN you. ... The first 50
> places may be in other instances of the recursive function and you have already
> been warned this way 49 times


If this were to be done, the shadowing would be detected at compile
time, so you would only be warned once.

--
Greg


Re: learning python ...

2021-05-25 Thread Grant Edwards
On 2021-05-24, Alan Gauld via Python-list  wrote:
> On 24/05/2021 19:48, Grant Edwards wrote:
>
>>> Traceback (  File "", line 1
>>> if = 1.234
>>>^
>>> SyntaxError: invalid syntax
>> 
>> I must admit it might be nice if the compiler told you _why_ the
>> syntax is invalid (e.g. "expected conditional expression while parsing
>> 'if' statement").
>
> Although wouldn't it be "expected boolean expression" rather than
> conditional expression? Python doesn't care how the argument  to 'if'
> is arrived at so long as it's a boolean.

Indeed -- after posting that I realized that "conditional expression"
was not the best phrase to choose because that's often used to refer
to an expression involving the new ternary operator. I should have
said "boolean valued expression".  Though in the syntax for the
ternary operator expression I've seen such a boolean valued expression
called a "conditional expression".

https://realpython.com/python-conditional-statements/

if  else 




Re: learning python ...

2021-05-25 Thread Grant Edwards
On 2021-05-25, Greg Ewing  wrote:
> On 25/05/21 5:56 pm, Avi Gross wrote:
>> Var = read in something from a file and make some structure like a data.frame
>> Var = remove some columns from the above thing pointed to by Var
>> Var = make some new calculated columns ditto
>> Var = remove some rows ...
>> Var = set some kind of grouping on the above or sort it and so on.
>
> As long as all the values are of the same type, this isn't too bad,
> although it might interfere with your ability to give the intermediate
> results names that help the reader understand what they refer to.

I do vaguely recall 20+ years ago when I first started writing Python
I recoiled at it, but now I don't find it to be a problem if all of
the assignments are close together as above (so that it's not possible
to see one and miss the others) and there's only one execution path
through that chunk of code.

I try to avoid it if they're spread out over hundreds of lines of code
or if there are paths that result in different types at the end.

> A variable that refers to things of different *types* at different
> times is considerably more confusing, both for a human reader and
> for any type checking software you might want to use.

Ah, I've never tried any type checking software, so that may explain
my lax attitude.

--
Grant



Re: learning python ...

2021-05-25 Thread Michael F. Stemper

On 24/05/2021 23.08, hw wrote:
> On 5/25/21 12:37 AM, Greg Ewing wrote:
>
>> Python does have references to *objects*. All objects live on
>> the heap and are kept alive as long as there is at least one
>> reference to them.
>>
>> If you rebind a name, and it held the last reference to an
>> object, there is no way to get that object back.
>
> Are all names references?  When I pass a name as a parameter to a
> function, does the object the name is referring to, when altered by the
> function, still appear altered after the function has returned?  I
> wouldn't expect that ...


I just ran a quick check and java (Ack, spit) does the same thing.


--
Michael F. Stemper
Isaiah 10:1-2


Re: imaplib: is this really so unwieldy?

2021-05-25 Thread hw

On 5/25/21 11:38 AM, Cameron Simpson wrote:

On 25May2021 10:23, hw  wrote:

I'm about to do stuff with emails on an IMAP server and wrote a program
using imaplib which, so far, gets the UIDs of the messages in the
inbox:


#!/usr/bin/python


I'm going to assume you're using Python 3.


Python 3.9.5


import imaplib
import re

imapsession = imaplib.IMAP4_SSL('imap.example.com', port = 993)

status, data = imapsession.login('user', 'password')
if status != 'OK':
    print('Login failed')
    exit


Your "exit" won't do what you want. I expect this code to raise a
NameError exception here (you've not defined "exit"). That _will_ abort
the programme, but in a manner indicating that you've used an unknown
name.  You probably want:

 sys.exit(1)

You'll need to import "sys".


Oh ok, it seemed to be fine.  Would it be the right way to do it with 
sys.exit()?  Having to import another library just to end a program 
might not be ideal.



messages = imapsession.select(mailbox = 'INBOX', readonly = True)
typ, msgnums = imapsession.search(None, 'ALL')


I've done little with IMAP. What's in msgnums here? Eg:

 print(type(msgnums), repr(msgnums))

just so we all know what we're dealing with here.


 [b'']


message_uuids = []
for number in str(msgnums)[3:-2].split():


This is very strange. Did you see the example at the end of the module
docs? It has this example code:

import getpass, imaplib

 M = imaplib.IMAP4()
 M.login(getpass.getuser(), getpass.getpass())
 M.select()
 typ, data = M.search(None, 'ALL')
 for num in data[0].split():
     typ, data = M.fetch(num, '(RFC822)')
     print('Message %s\n%s\n' % (num, data[0][1]))
 M.close()
 M.logout()


Yes, and I don't understand it.  'print(msgnums)' prints:

[b'']

when there are no messages and

[b'1 2 3 4 5']

So I was guessing that it might be an array containing a single string 
and that referring to the first element of the array turns into a string 
with which split() can be used.  But 'print(msgnums[0].split())' prints


[b'1', b'2', b'3', b'4', b'5']

so I can only guess what that's supposed to mean: maybe an array of many 
bytes?  The documentation[1] clearly says: "The message_set options to 
commands below is a string [...]"


I also need to work with message uids rather than message numbers 
because the numbers can easily change.  There doesn't seem to be a way 
to do that with this library in python.
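
For what it's worth, imaplib does expose the IMAP UID commands through the uid() method of the session object. A hedged sketch: list_uids is a hypothetical helper, and "session" is assumed to be an imaplib.IMAP4 or IMAP4_SSL object that has already logged in and selected a mailbox:

```python
def list_uids(session, criteria='ALL'):
    """Return message UIDs as strings, using the IMAP 'UID SEARCH' command."""
    status, data = session.uid('SEARCH', None, criteria)
    if status != 'OK':
        raise RuntimeError('UID SEARCH failed: %r' % (data,))
    # data[0] is a bytes object of space-separated UIDs.
    return [uid.decode('ascii') for uid in data[0].split()]
```

A later session.uid('FETCH', uid, '(RFC822)') would then address a message by UID rather than by sequence number.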


So it's all guesswork, and I gave up after a while and programmed what I 
wanted in perl.  The documentation of this library sucks, and there are 
worlds between it and the documentation for the libraries I used with perl.


That doesn't mean I don't want to understand why this is so unwieldy. 
It's all nice and smooth in perl.


[1]: https://docs.python.org/3/library/imaplib.html


It is just breaking apart data[0] into strings which were separated by
whitespace in the response. And then using those same strings as keys
for the .fetch() call. That doesn't seem complex, and in fact is blind
to the format of the "message numbers" returned. It just takes what it
is handed and uses those to fetch each message.


That's not what the documentation says.


status, data = imapsession.fetch(number, '(UID)')
if status == 'OK':
    match = re.match('.*\(UID (\d+)\)', str(data))

[...]

It's working (with Cyrus), but I have the feeling I'm doing it all
wrong because it seems so unwieldy.


IMAP's quite complex. Have you read RFC2060?

 https://datatracker.ietf.org/doc/html/rfc2060.html


Yes, I referred to it and it didn't become any more clear in combination 
with the documentation of the python library.



The imaplib library is probably a fairly basic wrapper for the
underlying protocol which provides methods for the basic client requests
and conceals the asynchronicity from the user for ease of (basic) use.


Skip Montanaro seems to say that the byte problem comes from the change 
from python 2 to 3 and there is a better library now: 
https://pypi.org/project/IMAPClient/


But the documentation seems even more sparse than the one for imaplib. 
Is it a general thing with python that libraries are not well documented?



Apparently the functions of imaplib return some kind of bytes while
expecting strings as arguments, like message numbers must be strings.
The documentation doesn't seem to say if message UIDs are supposed to
be integers or strings.


You can go a long way by pretending that they are opaque strings. That
they may be numeric in content can be irrelevant to you. Treat them as
strings.


That's what I ended up doing.


So I'm forced to convert stuff from bytes to strings (which is weird
because bytes are bytes)


"bytes are bytes" is tautological.


which is a good thing


You're getting bytes for a few
reasons:

- the imap protocol largely talks about octets (bytes), but says they're
   text. For this reason a lot of stuff you pass as client parameters are
   strings, because strings are t

Re: imaplib: is this really so unwieldy?

2021-05-25 Thread MRAB

On 2021-05-25 16:41, Dennis Lee Bieber wrote:
> On Tue, 25 May 2021 10:23:41 +0200, hw declaimed the following:
>
>> So I'm forced to convert stuff from bytes to strings (which is weird
>> because bytes are bytes) and to use regular expressions to extract the
>> message-uids from what the functions return (which I shouldn't have to
>> because when I'm asking a function to give me a uid, I expect it to
>> return a uid).
>
> In Python 3, strings are UNICODE, using 1, 2, or 4 bytes PER CHARACTER
> (I don't recall if there is a 3-byte version). If your input bytes are all
> 7-bit ASCII, then they map directly to a 1-byte per character string. If
> they contain any 8-bit upper half character they may map into a 2-byte per
> character string.


In CPython 3.3+:

U+0000..U+00FF are stored in 1 byte.
U+0100..U+FFFF are stored in 2 bytes.
U+010000..U+10FFFF are stored in 4 bytes.


Bytes in Python 3 are just a binary stream, which needs an encoding to
produce characters. Use the wrong encoding (say ISO-Latin-1) when the data
is really UTF-8 will result in garbage.
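
A small sketch of the point about wrong encodings. The sample word is made up for the illustration; the same bytes are decoded once as UTF-8 and once as ISO-8859-1 (Latin-1):

```python
# The word "café" encoded as UTF-8: the "é" becomes two bytes.
raw = "café".encode("utf-8")

right = raw.decode("utf-8")     # correct encoding: round-trips cleanly
wrong = raw.decode("latin-1")   # wrong encoding: no error, just garbage

print(raw)
print(right)
print(wrong)
```

Note that the wrong decode does not raise anything, because every byte value is a valid Latin-1 character. The damage only shows up when someone reads the text.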





Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Grant Edwards
On 2021-05-25, MRAB  wrote:
> On 2021-05-25 16:41, Dennis Lee Bieber wrote:

>> In Python 3, strings are UNICODE, using 1, 2, or 4 bytes PER
>> CHARACTER (I don't recall if there is a 3-byte version). If your
>> input bytes are all 7-bit ASCII, then they map directly to a 1-byte
>> per character string. If they contain any 8-bit upper half
>> character they may map into a 2-byte per character string.
>> 
> In CPython 3.3+:
>
> U+0000..U+00FF are stored in 1 byte.
> U+0100..U+FFFF are stored in 2 bytes.
> U+010000..U+10FFFF are stored in 4 bytes.

Are all characters in a string stored with the same "width"? IOW, does
the presence of one Unicode character in the range U+010000..U+10FFFF
in a string that is otherwise all 7-bit ASCII values result in the
entire string being stored 4-bytes per character? Or is the storage
width variable within a single string?

--
Grant






Re: learning python ...

2021-05-25 Thread Grant Edwards
On 2021-05-25, Michael F. Stemper  wrote:
> On 24/05/2021 23.08, hw wrote:
>> On 5/25/21 12:37 AM, Greg Ewing wrote:
>> 
>> Are all names references?  When I pass a name as a parameter to a 
>> function, does the object the name is referring to, when altered by the 
>> function, still appear altered after the function has returned?  I 
>> wouldn't expect that ...
>
> I just ran a quick check and java (Ack, spit) does the same thing.
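
For the Python side of the question quoted above, a minimal sketch (the function names are invented for the illustration): a function receives a reference to the caller's object, so mutating it is visible afterwards, while rebinding the parameter name is not.

```python
def append_marker(items):
    # Mutates the object the caller passed in; no copy is made.
    items.append("changed")

def rebind(items):
    # Rebinds only the local name; the caller's object is untouched.
    items = ["new list"]

data = ["original"]
append_marker(data)
after_mutation = list(data)   # snapshot after the mutating call
rebind(data)
after_rebind = list(data)     # snapshot after the rebinding call
print(after_mutation, after_rebind)
```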

PHP might or might not do the same thing. There might or might not be
extra syntax to specify which you want. Where that syntax goes and how
it works varies depending on the version of PHP. There might also be a
global config file where you can change the "default" behavior. Or
not.

In PHP, there's nothing _but_ special cases to try to remember. And
the details for those cases change from one version to the next.

If ever there was a language that's perpetually unfinished, it's PHP.







Re: learning python ...

2021-05-25 Thread Michael F. Stemper

On 24/05/2021 18.30, Alan Gauld wrote:

On 24/05/2021 16:54, Michael F. Stemper wrote:


In my early days of writing python, I created lists named "list",
dictionaries named "dict", and strings named "str". I mostly know better
now, but sometimes still need to restrain my fingers.


I think most newbie programmers make that mistake. I certainly
did when learning Pascal back in the 80's.

But I was lucky, the tutorials were run by a guy who penalized
bad variable names severely and took a half-mark off for every
bad name. We very quickly learned to choose names that were
descriptive of the purpose rather than the type.
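
A short sketch of why a variable named "list" bites, assuming nothing beyond the standard library. The shadowing is confined to a function here so the rest of the program is unaffected:

```python
import builtins

def shadowing_demo():
    list = [1, 2, 3]          # shadows the built-in inside this function
    try:
        list("abc")           # now calls a list object, not the type
    except TypeError as err:
        return str(err)

message = shadowing_demo()
# The built-in itself is untouched outside the shadowing scope.
still_works = builtins.list("abc")
print(message)
```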


And when I write code that models something physical, I'll create
an object with attributes named after the real-world attributes
that such an object has. For instance, a generator (NOT in the
python sense) might have attributes such as:
RealPower
ReactivePower
IncrementalCostCurve (an object all on its own)
DispatchedPower

But, when I mess around with number theory, if I need a dict
that has naturals as keys and their aliquot sums as values, it's
easy enough to fall into that trap; especially if I already have
a function AliquotSum() that populates the dictionary as it grows.


--
Michael F. Stemper
Deuteronomy 10:18-19


Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Chris Angelico
On Wed, May 26, 2021 at 8:27 AM Grant Edwards  wrote:
>
> On 2021-05-25, MRAB  wrote:
> > On 2021-05-25 16:41, Dennis Lee Bieber wrote:
>
> >> In Python 3, strings are UNICODE, using 1, 2, or 4 bytes PER
> >> CHARACTER (I don't recall if there is a 3-byte version). If your
> >> input bytes are all 7-bit ASCII, then they map directly to a 1-byte
> >> per character string. If they contain any 8-bit upper half
> >> character they may map into a 2-byte per character string.
> >>
> > In CPython 3.3+:
> >
> > U+0000..U+00FF are stored in 1 byte.
> > U+0100..U+FFFF are stored in 2 bytes.
> > U+010000..U+10FFFF are stored in 4 bytes.
>
> Are all characters in a string stored with the same "width"? IOW, does
> the presence of one Unicode character in the range U+010000..U+10FFFF
> in a string that is otherwise all 7-bit ASCII values result in the
> entire string being stored 4-bytes per character? Or is the storage
> width variable within a single string?
>

Yes, any given string has a single width, which makes indexing fast.
The memory cost you're describing can happen, but apart from a BOM
widening an otherwise-ASCII string to 16-bit, there aren't many cases
where you'll get a single wide character in a narrow string. Usually,
if there are any wide characters, there'll be a good number of them
(for instance, text in any particular language will often have a lot
of characters from a block of characters allocated to it).

As an added benefit, keeping all characters the same width simplifies
string searching algorithms, if I'm reading the code correctly. Checks
like >>"foo" in some_string<< can widen the string "foo" to the width
of the target string and then search efficiently.
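
The single-width behaviour can be observed with sys.getsizeof. This is a sketch that assumes CPython 3.3 or later; other implementations may store strings differently. Sizes are compared as differences so the fixed per-object overhead, which varies between versions, cancels out:

```python
import sys

def bytes_per_char(ch):
    # Compare two lengths so the fixed per-object overhead cancels out.
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)

narrow = bytes_per_char("a")           # fits the 1-byte representation
medium = bytes_per_char("\u0100")      # needs the 2-byte representation
wide = bytes_per_char("\U00010000")    # needs the 4-byte representation

# One wide character widens every character in the string: ten extra
# ASCII characters cost 4 bytes each once the string is 4 bytes wide.
mixed_cost = sys.getsizeof("a" * 10 + "\U00010000") - sys.getsizeof("\U00010000")
print(narrow, medium, wide, mixed_cost)
```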

ChrisA


Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Terry Reedy

On 5/25/2021 1:25 PM, MRAB wrote:

On 2021-05-25 16:41, Dennis Lee Bieber wrote:


In Python 3, strings are UNICODE, using 1, 2, or 4 bytes PER 
CHARACTER


This is CPython 3.3+ specific.  Before then, it depended on the OS.  I 
believe MicroPython uses utf-8 for strings.



(I don't recall if there is a 3-byte version).


There isn't.  It would save space but cost time.


If your input bytes are all
7-bit ASCII, then they map directly to a 1-byte per character string.


If your input bytes all have the upper bit 0 and they are interpreted as 
encoding ascii characters then they map to overhead + 1 byte per char


>>> sys.getsizeof(b''.decode('ascii'))
49
>>> sys.getsizeof(b'a'.decode('ascii'))
50
>>> sys.getsizeof(11*b'a'.decode('ascii'))
60


If
they contain any 8-bit upper half character they may map into a 2-byte 
per character string.


See below.


In CPython 3.3+:

U+0000..U+00FF are stored in 1 byte.
U+0100..U+FFFF are stored in 2 bytes.
U+010000..U+10FFFF are stored in 4 bytes.


In CPython's Flexible String Representation all characters in a string 
are stored with the same number of bytes, depending on the largest 
codepoint.


>>> sys.getsizeof('\U00010000')
80
>>> sys.getsizeof('\U00010000'*2)
84
>>> sys.getsizeof('a\U00010000')
84

Bytes in Python 3 are just a binary stream, which needs an 
encoding to produce characters.


Or any other Python object.

Use the wrong encoding (say ISO-Latin-1) when the 
data is really UTF-8 will result in garbage.


So does decoding bytes as text when the bytes encode something else,
such as an image ;-).


--
Terry Jan Reedy




Re: learning python ...

2021-05-25 Thread Manfred Lotz
On Mon, 24 May 2021 08:14:39 +1200
Ron.Lauzon@f192.n1.z21.fsxnet (Ron Lauzon) wrote:

> -=> hw wrote to All <=-  
> 
>  hw> Traceback (most recent call last):
>  hw>File "[...]/hworld.py", line 18, in <module>
>  hw>  print(isinstance(int, float))
>  hw> TypeError: isinstance() arg 2 must be a type or tuple of types  
> 
>  hw> I would understand to get an error message in line 5 but not in
>  hw> 18.  Is this a bug or a feature?  
> 
> Python is case sensitive.  Is that supposed to be "isInstance"?
> 
> 

This is easy to check

$ python3
Python 3.9.5 (default, May 14 2021, 00:00:00) 
[GCC 11.1.1 20210428 (Red Hat 11.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> isinstance
<built-in function isinstance>
>>> isInstance
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'isInstance' is not defined




Re: learning python ...

2021-05-25 Thread Greg Ewing

On 26/05/21 3:33 am, Dennis Lee Bieber wrote:

the OBJECTS have a type and can not change type.


Well... built-in types can't, but...

>>> class A:
...  pass
...
>>> class B:
...  pass
...
>>> a = A()
>>> type(a)
<class '__main__.A'>
>>> a.__class__ = B
>>> type(a)
<class '__main__.B'>

--
Greg



Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Greg Ewing

On 26/05/21 5:21 am, hw wrote:

On 5/25/21 11:38 AM, Cameron Simpson wrote:

You'll need to import "sys".


Having to import another library just to end a program 
might not be ideal.


The sys module is built-in, so the import isn't really
loading anything, it's just giving you access to a
namespace.

But if you prefer, you can get the same result without
needing an import using

   raise SystemExit(1)
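
A sketch showing that the two spellings are equivalent in effect: sys.exit() raises SystemExit, which can be caught like any other exception. The exit code 3 is arbitrary and chosen only for the illustration:

```python
import sys

def run():
    sys.exit(3)   # raises SystemExit(3) rather than killing the process outright

try:
    run()
except SystemExit as exc:
    caught_code = exc.code

try:
    raise SystemExit(3)   # same effect, no import needed
except SystemExit as exc:
    raised_code = exc.code

print(caught_code, raised_code)
```

If nothing catches the exception, the interpreter exits with that code as the process status.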

--
Greg


Re: learning python ...

2021-05-25 Thread Greg Ewing

On 2021-05-24, Alan Gauld via Python-list  wrote:

Although wouldn't it be "expected boolean expression" rather than
conditional expression? Python doesn't care how the argument  to 'if'
is arrived at so long as it's a boolean.


This isn't really true either. Almost every object in Python has
an interpretation as true or false, and can be used wherever a
boolean value is needed.
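
A quick sketch of that truth-value interpretation, using a handful of arbitrary sample objects: zero, None and empty containers count as false, most other things as true.

```python
samples = [0, 1, "", "text", [], [0], None, {}, {"k": None}]

# "if value" uses each object's truth value; no explicit bool is required.
truthy = [value for value in samples if value]
falsy = [value for value in samples if not value]

print(truthy)
print(falsy)
```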

--
Greg


Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Cameron Simpson
On 25May2021 15:53, Dennis Lee Bieber  wrote:
>On Tue, 25 May 2021 19:21:39 +0200, hw  declaimed the
>following:
>>Oh ok, it seemed to be fine.  Would it be the right way to do it with
>>sys.exit()?  Having to import another library just to end a program
>>might not be ideal.
>
>   I've never had to use sys. for exit...
>
>C:\Users\Wulfraed>python
>Python ActivePython 3.8.2 (ActiveState Software Inc.) based on
> on win32
>Type "help", "copyright", "credits" or "license" for more information.
>>>> exit()



I have learned a new thing today.

Regardless, hw didn't call it, just named it :-)
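
A sketch of the "named, not called" point. It uses sys.exit rather than the site-provided exit so that it does not depend on the site module being loaded; the principle is the same for any callable, and the check() function is invented for the illustration:

```python
import sys

def check(status):
    if status != 'OK':
        sys.exit          # bare name: looked up, then discarded; nothing is called
    return 'still running'

result = check('NO')
print(result)
```

Adding the parentheses, sys.exit(1), is what actually ends the program.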

Cheers,
Cameron Simpson 


Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Cameron Simpson
On 25May2021 19:21, hw  wrote:
>On 5/25/21 11:38 AM, Cameron Simpson wrote:
>>On 25May2021 10:23, hw  wrote:
>>>if status != 'OK':
>>>    print('Login failed')
>>>    exit
>>
>>Your "exit" won't do what you want. I expect this code to raise a
>>NameError exception here (you've not defined "exit"). That _will_ abort
>>the programme, but in a manner indicating that you've used an unknown
>>name.  You probably want:
>>
>> sys.exit(1)
>>
>>You'll need to import "sys".
>
>Oh ok, it seemed to be fine.  Would it be the right way to do it with 
>sys.exit()?  Having to import another library just to end a program 
>might not be ideal.

To end a programme early, yes. (sys.exit() actually just raises a 
particular exception, BTW.)

I usually write a distinct main function, so in that case one can just 
"return". After all, what seems an end-of-life circumstance in a 
standalone script like yours is just an "end this function" circumstance 
when viewed as a function, and that also lets you _call_ the main 
programme from some outer thing. Wouldn't want that outer thing 
cancelled, if it exists.

My usual boilerplate for a module with a main programme looks like this:

    import sys
    ...
    def main(argv):
        ... main programme, return like any other function ...
    ... other code for the module - functions, classes etc ...
    if __name__ == '__main__':
        sys.exit(main(sys.argv))

which (a) puts main() up the top where it can be seen, (b) makes main() 
an ordinary function like any other (c) lets me just import that module 
elsewhere and (d) no globals - everything's local to main().

The __name__ boilerplate at the bottom is the magic which figures out if 
the module was imported (__name__ will be the import module name) or 
invoked from the command line like:

python -m my_module cmd-line-args...

in which case __name__ has the special value '__main__'. A historic 
mechanism which you will convince nobody to change.

You'd be surprised how useful it is to make almost any standalone 
programme a module like this - in the medium term it almost always pays 
off for me. Even just the discipline of shoving all the formerly-global 
variables in the main function brings lower-bugs benefits.

>>I've done little with IMAP. What's in msgnums here? Eg:
>> print(type(msgnums), repr(msgnums))
>>just so we all know what we're dealing with here.
>
><class 'list'> [b'']
>
>>>message_uuids = []
>>>for number in str(msgnums)[3:-2].split():
>>
>>This is very strange. [...]
>Yes, and I don't understand it.  'print(msgnums)' prints:
>
>[b'']
>
>when there are no messages and
>
>[b'1 2 3 4 5']

Chris has addressed this. msgnums is a list of the data components of the 
IMAP response.  By going str(msgnums) you're not getting "the message 
numbers as text", you're getting what printing a list prints. Which is 
roughly Python code: the brackets and the repr() of each list member.

Notice that the example code accessed msgnums[0] - that is the first 
data component, a bytes. That you _can_ convert to a string (under 
assumptions about the encoding).

By getting the "str" form of a list, you're forced into the weird [3:-2] 
hack to trim the ends. But it is just a hack for a transcription 
mistake, not a sane parse.

>So I was guessing that it might be an array containing a single 
>string and that referring to the first element of the array turns into 
>a string with which split() can be used.  But 'print(msgnums[0].split())' 
>prints
>
>[b'1', b'2', b'3', b'4', b'5']

msgnums[0] is bytes. You can do most str things with bytes (because that 
was found to be often useful) but you get bytes back from those 
operations as you'd hope.
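
Putting that together, a sketch of parsing the search result without the str()/[3:-2] hack. The sample values are the ones reported in the thread; decoding as ASCII is an assumption, reasonable here because message sequence numbers are plain digits:

```python
# Sample value in the shape reported in the thread for a non-empty mailbox.
msgnums = [b'1 2 3 4 5']

# Split the bytes directly, then decode each piece to get ordinary strings.
numbers = [part.decode('ascii') for part in msgnums[0].split()]

# The empty-mailbox case from the thread needs no special handling.
empty = [part.decode('ascii') for part in [b''][0].split()]

print(numbers, empty)
```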

>so I can only guess what that's supposed to mean: maybe an array of 
>many bytes?  The documentation[1] clearly says: "The message_set 
>options to commands below is a string [...]"

But that is the parameter to the _call_: your '(UID)' parameter.

>I also need to work with message uids rather than message numbers 
>because the numbers can easily change.  There doesn't seem to be a way 
>to do that with this library in python.

By asking for UIDs you're getting uids. Do they not work in subsequent 
calls?
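
On working with UIDs directly: imaplib documents an IMAP4.uid(command, arg, ...) method that runs a command with messages identified by UID instead of sequence number. A hedged sketch follows. list_uids() and FakeSession are hypothetical names invented here, and the fake response is only modelled on the search output shown earlier in the thread, so a real server may need more careful handling:

```python
def list_uids(session):
    """Return message UIDs as strings, using the UID variant of SEARCH."""
    status, data = session.uid('SEARCH', None, 'ALL')
    if status != 'OK':
        raise RuntimeError('UID SEARCH failed: %r' % (data,))
    return [uid.decode('ascii') for uid in data[0].split()]


class FakeSession:
    """Hypothetical stand-in for an imaplib session, for offline illustration."""
    def uid(self, command, *args):
        # Assumed response shape: a status plus a list holding one bytes value.
        return 'OK', [b'101 102 117']


uids = list_uids(FakeSession())
print(uids)
```

With a real connection one would pass the logged-in session after select(), and the resulting UIDs can then be handed to further uid() calls, for example session.uid('FETCH', uid, '(RFC822)').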

>So it's all guesswork, and I gave up after a while and programmed what 
>I wanted in perl.  The documentation of this library sucks, and there 
>are worlds between it and the documentation for the libraries I used 
>with perl.

I think you're better off looking for another Python imap library. The 
imaplib was basic functionality to (a) access the protocol in basic form 
and (b) conceal the async stuff, since IMAP is an asynchronous protocol.

You can in fact subclass it to do better things. Other libraries might do 
that, or they might have written their own protocol implementations.

>That doesn't mean I don't want to understand why this is so unwieldy. 
>It's all nice and smooth in perl.

But using what library? Something out of CPAN? Those are third party 
libraries, not Perl's presupplied stuff

Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Grant Edwards
On 2021-05-25, Dennis Lee Bieber  wrote:
> On Tue, 25 May 2021 19:21:39 +0200, hw  declaimed the
> following:
>
>
>>
>>Oh ok, it seemed to be fine.  Would it be the right way to do it with 
>>sys.exit()?  Having to import another library just to end a program 
>>might not be ideal.
>
>   I've never had to use sys. for exit...
>
> C:\Users\Wulfraed>python
> Python ActivePython 3.8.2 (ActiveState Software Inc.) based on
>  on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> exit()
>
> C:\Users\Wulfraed>python

According to the docs (and various other sources), the global variable
"exit" is provided by the site module and is only for use at the
interactive prompt -- it should not be used in programs.

  https://docs.python.org/3/library/constants.html#exit

I get the impression that real programs should not assume that the
site module has been pre-loaded during startup.

--
Grant



Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Grant Edwards
On 2021-05-25, Dennis Lee Bieber  wrote:

>>Oh ok, it seemed to be fine.  Would it be the right way to do it with 
>>sys.exit()?  Having to import another library just to end a program 
>>might not be ideal.
>
>   I've never had to use sys. for exit...

I would have sworn you used to have to import sys to use exit(). Am I
misremembering?

Apparently exit() and sys.exit() aren't the same, so what is the
difference between the builtin exit and sys.exit?



Re: imaplib: is this really so unwieldy?

2021-05-25 Thread Chris Angelico
On Wed, May 26, 2021 at 4:21 PM Grant Edwards  wrote:
>
> On 2021-05-25, Dennis Lee Bieber  wrote:
>
> >>Oh ok, it seemed to be fine.  Would it be the right way to do it with
> >>sys.exit()?  Having to import another library just to end a program
> >>might not be ideal.
> >
> >   I've never had to use sys. for exit...
>
> I would have sworn you used to have to import sys to use exit(). Am I
> misremembering?
>
> Apparently exit() and sys.exit() aren't the same, so what is the
> difference between the builtin exit and sys.exit?
>

exit() is designed to be used interactively, so, among other things,
it also has a helpful repr:

>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit

ChrisA