Re: best way to remove leading zeros from a tuple like string

2018-05-22 Thread Thomas Jollans
On 2018-05-20 23:54, Paul wrote:
> you will find several useful sites where you can test regexes.  Regex
> errors are very common, even after you have experience with them.

What's the benefit of those compared to simply trying out the regex in a
Python console?
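
As an illustration of trying a regex directly in Python, here is a minimal sketch for the thread's topic (removing leading zeros). The sample string and the pattern are assumptions made for this example; the original poster's actual input is not shown in this message.

```python
import re

# Hypothetical input; the real string from the thread is not quoted here.
s = "(0128, 0042, 0000, 7)"

# Remove zeros that start a number and are followed by at least one more
# digit, so a field consisting only of zeros keeps a single "0".
pattern = re.compile(r"\b0+(?=\d)")
result = pattern.sub("", s)

print(result)
```

Pasting the same lines into an interactive console behaves the same way. Note that this pattern would also strip the zero after a decimal point (as in 1.05), so it only suits integer fields.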
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Issue

2018-05-22 Thread Abdur-Rahmaan Janhangeer
greetings,

did you send a log file attached?

Abdur-Rahmaan Janhangeer
https://github.com/Abdur-rahmaanJ

On Tue, 22 May 2018, 10:28 sujith.j Sjk,  wrote:

> > Hi,
> >
> > I am facing the issue below when starting Python.
> >
> >
> >
> --
> https://mail.python.org/mailman/listinfo/python-list
>


Re: best way to remove leading zeros from a tuple like string

2018-05-22 Thread Paul
Thomas Jollans wrote:

> On 2018-05-20 23:54, Paul wrote:
> > you will find several useful sites where you can test regexes.
>
> What's the benefit of those compared to simply trying out the regex in a
> Python console?
>

Possibly nothing.  But there are obvious benefits compared to trying to
write and test such an expression in one's head, which was the situation
which led me to suggest it.  :)



Re: Issue

2018-05-22 Thread sujith.j Sjk
yes

On Tue, May 22, 2018 at 3:04 PM, Abdur-Rahmaan Janhangeer <
arj.pyt...@gmail.com> wrote:

> greetings,
>
> did you send a log file attached?
>
> Abdur-Rahmaan Janhangeer
> https://github.com/Abdur-rahmaanJ
>
> On Tue, 22 May 2018, 10:28 sujith.j Sjk,  wrote:
>
>> > Hi,
>> >
>> > I am facing the issue below when starting Python.
>> >
>> >
>> >
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>


Problem of writing long list of lists file to csv

2018-05-22 Thread subhabangalore
I have a list of lists (177 lists). 

I am trying to write them as file.

I used the following code to write it in a .csv file.

import csv
def word2vec_preprocessing():
    a1=open("/python27/EngText1.txt","r")
    list1=[]
    for line in a1:
        line1=line.lower().replace(".","").split()
        #print line1
        list1.append(line1)
    lst1=list1
    lst2=lst1[:4]
    with open("my_csv.csv","wb") as f:
        writer = csv.writer(f)
        writer.writerows(lst2)

Here it is writing only the first four lists. 

I have searched for help, and it seems to be a known issue
without much of a fix.
Please see the following link. 
https://stackoverflow.com/questions/30711899/python-how-to-write-list-of-lists-to-file

I have now tried pandas and json as follows, but with the same result.

import json
import pandas as pd

my_df = pd.DataFrame(lst2)
my_df.to_csv('sbb_csv.csv', index=False, header=False)

with open('sbb1.json', 'w') as F:
    # Use the json dumps method to write the list to disk
    F.write(json.dumps(lst2))
with open('sbb1.json', 'r') as F:
    B = json.loads(F.read())

print B


I am using Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:22:17) [MSC 
v.1500 32 bit (Intel)] on win32
in MS-Windows.

Please suggest what error I may be making.

Thanks in advance.






Re: Problem of writing long list of lists file to csv

2018-05-22 Thread Peter Otten
subhabangal...@gmail.com wrote:

> lst2=lst1[:4]
> with open("my_csv.csv","wb") as f:
>     writer = csv.writer(f)
>     writer.writerows(lst2)
> 
> Here it is writing only the first four lists. 

Hint: look at the first line in the quotation above. 
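
To spell the hint out: lst1[:4] is a slice that keeps only the first four rows, so only four rows reach the writer. A minimal sketch, assuming Python 3 and made-up rows in place of the contents of EngText1.txt (the in-memory buffer stands in for my_csv.csv):

```python
import csv
import io

# Hypothetical stand-in for the 177 token lists read from the text file.
rows = [["word%d" % i, "token"] for i in range(177)]

first_four = rows[:4]               # the slice keeps rows 0..3 only

buf = io.StringIO()                 # in-memory stand-in for the output file
csv.writer(buf).writerows(rows)     # pass the full list, not the slice
written = buf.getvalue().splitlines()

print(len(first_four), len(written))
```

With a real file under Python 3, open it with open("my_csv.csv", "w", newline=""); the "wb" mode in the original is the Python 2 idiom.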




Re: "Data blocks" syntax specification draft

2018-05-22 Thread bartc

On 22/05/2018 03:49, Mikhail V wrote:

On Mon, May 21, 2018 at 3:48 PM, bartc  wrote:



But I have to say it looks pretty terrible, and I can't see that it buys
much over normal syntax.




# t
# t
   11  22  33



Is this example complete? Presumably it means ((11,22,33),).


You get the point?
So basically all nice chars are already occupied.


You mean for introducing tuple, list and dict literals? Python already 
uses (, [ and { for those, with the advantage of having a closing ), ] 
and } to make it easier to see where each ends.


The only advantage of your proposal is that it resembles Python block 
syntax a little more, but I don't know if it follows the same rules of 
indentation and for inlining content.



Proposing Unicode symbols -- that will probably be
dead on arrival (just remembering some such past proposals).
Leaving out symbols could be an option as well.
Still the structure needs a syntactical entry point.


Note that Python tuples don't always need a start symbol:

   a = 10,20,30

assigns a tuple to a.


E.g.

data = ///
t
t
   11  22  33

Hmm. not bad. But I must think about parsing as well.


Have you tried writing a parser for this? It can be stand-alone, not a 
full parser for Python code. That could help reveal any problems.


But think about when t could be the name of a variable, and you want to 
construct the tuple (t,t,t):


 ///t t t t

That already looks a little odd. And when the /// is omitted:

 t t t t

Is that one tuple of (t,t,t), or a tuple of (t,(t))?


Also, is ///t ///t ///t a b c allowed, or does it have to be split 
across lines? If it is allowed, then it's not clear which tuple b and 
c belong to, or even a, if an empty tuple is allowed.


I think this syntax is ambiguous; you need a more rigorous 
specification. (How will it parse ///.3 4 5 for example?)



So I can change types of all child nodes with one keystroke.


Suppose you only wanted to change the top one?





The ///d dictionary example is ambiguous: can you have more than one
key:value per line or not? If so, it would look like this:

   ///d "a" "b" "c" "d" "e" "f"


///d   "a" "b""c" "d""e" "f"

Now better? :-)


Not really. Suppose one got accidentally missed out, and there was some 
spurious name at the end, so that you had this (dispensing with quotes, 
these are variables):


   ///d a b c e f x

The pairing is a:b, c:e, f:x rather than the a:b, c:d, e:f that was 
intended, with the x being an error. Use of : and , adds useful 
redundancy. It's not clear whether:


  ///d a b
c e
f x

is allowed (I don't know what the terminating conditions are), but in a 
very long dict literal, it's easy to get confused.
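
The pairing problem can be made concrete with a small sketch. It assumes the draft pairs consecutive items as key and value, which the proposal does not actually spell out; the helper name pair_up is made up for this example.

```python
def pair_up(items):
    """Group a flat sequence into (key, value) pairs, two items at a time."""
    if len(items) % 2:
        raise ValueError("odd number of items: a key has no value")
    return list(zip(items[0::2], items[1::2]))

intended = pair_up(["a", "b", "c", "d", "e", "f"])
mistyped = pair_up(["a", "b", "c", "e", "f", "x"])   # "d" dropped, stray "x" added

print(intended)
print(mistyped)
```

Because the item count is still even, nothing can flag the second call as wrong; with explicit : and , separators the same slip would usually be a syntax error.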



I think this is an interesting first draft of an idea, but it doesn't 
seem rigorous. And people don't like that triple stroke prefix, or those 
single letter codes (why not just use 'tuple', 'list', 'dict')?


For example, here is a proposal I've just made up for a similar idea, 
but to make such constructors obey similar rules to Python blocks:


 tuple:
 10
 20
 30

 list:
 list:
 10
 tuple: 5,6,7
 30
 "forty"
 "fifty"

So, the keyword prefixes are followed by ":"; entities can follow on the 
same line, but using "," rather than ";", and the end of a sequence is 
just like the end of a 'suite'.


But even here there is ambiguity: the '5,6,7' forms a tuple of its own 
in normal syntax, so these could be a tuple of one tuple of 3, rather 
than a tuple of 3. (I may need ";" here rather than ",".)


--
bartc



Re: Issue

2018-05-22 Thread Rhodri James

[Re-ordered for comprehensibility.]

On 22/05/18 11:08, sujith.j Sjk wrote:

On Tue, May 22, 2018 at 3:04 PM, Abdur-Rahmaan Janhangeer <
arj.pyt...@gmail.com> wrote:

On Tue, 22 May 2018, 10:28 sujith.j Sjk,  wrote:

Hi,

I am facing the issue below when starting Python.



>> greetings,
>>
>> did you send a log file attached?
>
> yes

I'm afraid it didn't make it to us.  The mailing list strips off 
attachments.  Please copy and paste the error into your message, and 
we'll do our best to decode it for you.


--
Rhodri James *-* Kynesim Ltd


Re: Spam levels.

2018-05-22 Thread C W Rose via Python-list
m  wrote:
> W dniu 10.02.2018 o 15:57, C W Rose pisze:
>> No other groups (in the limited set which I read) have the problem,
>> and I don't understand why the spammers neither spam a range of
>> groups, nor change their addresses more frequently.  It may be
>> that destroying comp.lang.python is their actual objective.
>> 
>> Either way, a depressing state of affairs.
> 
> The sad thing is, that your post is unseen, because of spam :S
> 
> I also almost stopped reading c.l.python, because of enormous spam
> levels. Do I have any option to read it without spam, other than launch
> my own filtering NNTP server and do whack the mole game for myself?
> 
> Maybe join forces and establish such server for public use?
> 

The situation is getting worse:

comp.lang.python messages 29 Jan - 14 May 2018
Fetched: 3081
Killed: 6616
Valid: 31.77 %

Almost all of the garbage is coming from the "Case Solutions" poster,
with a hotmail address.  He's said himself that he doesn't read the
group, and there's really no point to endless reposting in a newsgroup
with no relevance to the posts, so it's just mindless vandalism.
He doesn't change addresses or headers much, so the filter seldom needs
updating; however, I think comp.lang.python is reaching the end of the line.

comp.lang.c has less overwhelming problems, due to a single obsessive:
comp.lang.c messages 29 Jan - 14 May 2018
Fetched: 3969
Killed: 618
Valid: 86.52 %

If you are using Linux, leafnode is easy to set up, and has enough filtering
to keep comp.lang.python readable.  I pull from news.eternal-september.org
and news.gmane.org (though I don't know how much longer gmane will last).
Both are free.

Will

-- 
"It is very disappointing that mindless individuals are vandalising
 the Larkin toads in Hull."
A police spokesman



Re: "Data blocks" syntax specification draft

2018-05-22 Thread Chris Angelico
On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
> Note that Python tuples don't always need a start symbol:
>
>a = 10,20,30
>
> assigns a tuple to a.

The tuple has nothing to do with the parentheses, except for the
special case of the empty tuple. It's the comma.

ChrisA
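
The claim is easy to check in a console. A minimal sketch; the variable names are arbitrary:

```python
a = 10, 20, 30      # commas, no parentheses: a tuple
b = (10)            # parentheses, no comma: just the int 10
c = 10,             # trailing comma: a one-element tuple
d = ()              # the special case: the empty tuple needs the parentheses

print(type(a).__name__, type(b).__name__, type(c).__name__, type(d).__name__)
```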


Re: best way to remove leading zeros from a tuple like string

2018-05-22 Thread Grant Edwards
On 2018-05-22, Thomas Jollans  wrote:
> On 2018-05-20 23:54, Paul wrote:
>> you will find several useful sites where you can test regexes.  Regex
>> errors are very common, even after you have experience with them.
>
> What's the benefit of those compared to simply trying out the regex in a
> Python console?

Doesn't everybody have an executable file in their home directory
named "testit.py" which is continually morphed to test different
Python features?

-- 
Grant Edwards               grant.b.edwards        Yow! What's the MATTER
                                  at               Sid? ... Is your BEVERAGE
                              gmail.com            unsatisfactory?



Re: best way to remove leading zeros from a tuple like string

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 12:50 AM, Grant Edwards
 wrote:
> On 2018-05-22, Thomas Jollans  wrote:
>> On 2018-05-20 23:54, Paul wrote:
>>> you will find several useful sites where you can test regexes.  Regex
>>> errors are very common, even after you have experience with them.
>>
>> What's the benefit of those compared to simply trying out the regex in a
>> Python console?
>
> Doesn't everybody have an executable file in their home directory
> named "testit.py" which is continually morphed to test different
> Python features?
>

Certainly not! Mine is ~/tmp/runme.py in case I need other files with it.

:)

ChrisA


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Ian Kelly
On Tue, May 22, 2018 at 8:25 AM, Chris Angelico  wrote:
> On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
>> Note that Python tuples don't always need a start symbol:
>>
>>a = 10,20,30
>>
>> assigns a tuple to a.
>
> The tuple has nothing to do with the parentheses, except for the
> special case of the empty tuple. It's the comma.

Although, if the rule were really as simple as "commas make tuples",
then this would be a list containing a tuple: [1, 2, 3].

Curiously, parentheses are also sometimes required for iterable
unpacking. For example:

py> 1, 2, *range(3,5)
(1, 2, 3, 4)
py> d = {}
py> d[1, 2] = 42
py> d[1, 2, *range(3,5)] = 43
  File "<stdin>", line 1
    d[1, 2, *range(3,5)] = 43
            ^
SyntaxError: invalid syntax

py> def foo():
...   return 1, 2
...
py> foo()
(1, 2)

py> def foo():
...   return 1, 2, *range(3, 5)
  File "<stdin>", line 2
    return 1, 2, *range(3, 5)
                 ^
SyntaxError: invalid syntax

py> def foo():
...   yield 1, 2
...
py> list(foo())
[(1, 2)]
py> def foo():
...   yield 1, 2, *range(3, 5)
  File "<stdin>", line 2
    yield 1, 2, *range(3, 5)
                ^
SyntaxError: invalid syntax

py> for x in 1, 2: print(x)
...
1
2
py> for x in 1, 2, *range(3, 5): print(x)
  File "<stdin>", line 1
    for x in 1, 2, *range(3, 5): print(x)
                   ^
SyntaxError: invalid syntax


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Ian Kelly
On Tue, May 22, 2018 at 9:22 AM, Ian Kelly  wrote:
> On Tue, May 22, 2018 at 8:25 AM, Chris Angelico  wrote:
>> On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
>>> Note that Python tuples don't always need a start symbol:
>>>
>>>a = 10,20,30
>>>
>>> assigns a tuple to a.
>>
>> The tuple has nothing to do with the parentheses, except for the
>> special case of the empty tuple. It's the comma.
>
> Although, if the rule were really as simple as "commas make tuples",
> then this would be a list containing a tuple: [1, 2, 3].
>
> Curiously, parentheses are also sometimes required for iterable
> unpacking. For example:

[SNIP]

> py> def foo():
> ...   yield 1, 2
> ...
> py> list(foo())
> [(1, 2)]
> py> def foo():
> ...   yield 1, 2, *range(3, 5)
>   File "", line 2
> yield 1, 2, *range(3, 5)
> ^
> SyntaxError: invalid syntax

Here's another case where parentheses are always required:

py> def foo():
...   yield from 1, 2
  File "<stdin>", line 2
    yield from 1, 2
                ^
SyntaxError: invalid syntax

This one might be explained by noting that "yield from" is actually an
expression and so it could be confusing as to whether this should be
equivalent to "yield from (1, 2)" or "(yield from 1), 2". But "yield"
has the same issue and does allow an unparenthesized tuple.
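
For comparison, the two forms that do parse behave differently. A minimal sketch; the generator names are made up for this example:

```python
def separate_items():
    yield from (1, 2)     # delegates to the tuple: yields 1, then 2

def one_tuple():
    yield 1, 2            # yields the single tuple (1, 2)

print(list(separate_items()))
print(list(one_tuple()))
```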


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 1:22 AM, Ian Kelly  wrote:
> On Tue, May 22, 2018 at 8:25 AM, Chris Angelico  wrote:
>> On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
>>> Note that Python tuples don't always need a start symbol:
>>>
>>>a = 10,20,30
>>>
>>> assigns a tuple to a.
>>
>> The tuple has nothing to do with the parentheses, except for the
>> special case of the empty tuple. It's the comma.
>
> Although, if the rule were really as simple as "commas make tuples",
> then this would be a list containing a tuple: [1, 2, 3].

In an arbitrary expression, a comma between two expressions creates a
tuple. In other contexts, the comma has other meanings, which take
precedence:

* Separating a function's arguments (both at definition and call)
* Enumerating import targets and global/nonlocal names
* Separating an assertion from its message
* Listing multiple context managers
* And probably some that I've forgotten.

In those contexts, you can override the normal interpretation and
force the tuple by using parentheses, preventing it from being parsed
as something else, and making it instead a single expression:

print((1, 2)) # prints a tuple
print(1, 2) # prints two items

The comma is what makes the tuple, though, not the parentheses. The
parentheses merely prevent this from being something else.
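
The same distinction can be seen with any function call, not only print. A minimal sketch; count_args is a hypothetical helper written for this example:

```python
def count_args(*args):
    """Return how many positional arguments the call received."""
    return len(args)

two_arguments = count_args(1, 2)      # the comma separates two arguments
one_tuple = count_args((1, 2))        # the parentheses force a single tuple

print(two_arguments, one_tuple)
```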

> Curiously, parentheses are also sometimes required for iterable
> unpacking. For example:
>
> py> 1, 2, *range(3,5)
> (1, 2, 3, 4)
> py> d = {}
> py> d[1, 2] = 42
> py> d[1, 2, *range(3,5)] = 43
>   File "", line 1
> d[1, 2, *range(3,5)] = 43
> ^
> SyntaxError: invalid syntax

I'm not sure what you mean about the parentheses here. AIUI iterable
unpacking simply isn't supported inside subscripting. If that's an
actual problem anywhere, I'm sure it could be added :)

> py> def foo():
> ...   return 1, 2
> ...
> py> foo()
> (1, 2)
>
> py> def foo():
> ...   return 1, 2, *range(3, 5)
>   File "", line 2
> return 1, 2, *range(3, 5)
>  ^
> SyntaxError: invalid syntax

That's a slightly curious case, since it's definitely being parsed the
same way. PEP 448 gives precedent for adding this sort of thing, if
anyone feels like digging into it. You may find that there's some
ambiguity somewhere in the unparenthesized version.

> py> def foo():
> ...   yield 1, 2
> ...
> py> list(foo())
> [(1, 2)]
> py> def foo():
> ...   yield 1, 2, *range(3, 5)
>   File "", line 2
> yield 1, 2, *range(3, 5)
> ^
> SyntaxError: invalid syntax

That's the exact same thing as the 'return' example, so it'll behave
the same way.

> py> for x in 1, 2: print(x)
> ...
> 1
> 2
> py> for x in 1, 2, *range(3, 5): print(x)
>   File "", line 1
> for x in 1, 2, *range(3, 5): print(x)
>^
> SyntaxError: invalid syntax

In fact, I think probably all four of your examples would behave the
same way. So if you want to push for the change, go for it :)

ChrisA


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Ian Kelly
On Tue, May 22, 2018 at 9:34 AM, Chris Angelico  wrote:
> On Wed, May 23, 2018 at 1:22 AM, Ian Kelly  wrote:
>> On Tue, May 22, 2018 at 8:25 AM, Chris Angelico  wrote:
>>> On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
>>>> Note that Python tuples don't always need a start symbol:
>>>>
>>>> a = 10,20,30
>>>>
>>>> assigns a tuple to a.
>>>
>>> The tuple has nothing to do with the parentheses, except for the
>>> special case of the empty tuple. It's the comma.
>>
>> Although, if the rule were really as simple as "commas make tuples",
>> then this would be a list containing a tuple: [1, 2, 3].
>
> In an arbitrary expression, a comma between two expressions creates a
> tuple. In other contexts, the comma has other meanings, which take
> precedence:
>
> * Separating a function's arguments (both at definition and call)
> * Enumerating import targets and global/nonlocal names
> * Separating an assertion from its message
> * Listing multiple context managers
> * And probably some that I've forgotten.
>
> In those contexts, you can override the normal interpretation and
> force the tuple by using parentheses, preventing it from being parsed
> as something else, and making it instead a single expression:
>
> print((1, 2)) # prints a tuple
> print(1, 2) # prints two items
>
> The comma is what makes the tuple, though, not the parentheses. The
> parentheses merely prevent this from being something else.

In other words, the rule is not really as simple as "commas make
tuples". I stand by what I wrote.

>> Curiously, parentheses are also sometimes required for iterable
>> unpacking. For example:
>>
>> py> 1, 2, *range(3,5)
>> (1, 2, 3, 4)
>> py> d = {}
>> py> d[1, 2] = 42
>> py> d[1, 2, *range(3,5)] = 43
>>   File "", line 1
>> d[1, 2, *range(3,5)] = 43
>> ^
>> SyntaxError: invalid syntax
>
> I'm not sure what you mean about the parentheses here. AIUI iterable
> unpacking simply isn't supported inside subscripting. If that's an
> actual problem anywhere, I'm sure it could be added :)

Of course it's supported:

py> d = {}
py> d[(1, 2, *range(3, 5))] = 43
py> d
{(1, 2, 3, 4): 43}

Works just fine. But take out the parentheses and you get the SyntaxError.
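
The parenthesized forms can be checked directly. A minimal sketch, assuming Python 3.5 or later (where PEP 448 unpacking in tuple displays is available); the function name is made up for this example. Later releases relaxed some of the unparenthesized cases (Python 3.8 accepts them in return and yield statements), so the SyntaxErrors shown in the thread depend on the interpreter version.

```python
d = {}
d[(1, 2, *range(3, 5))] = 43          # parenthesized tuple used as the key

def unpacked():
    return (1, 2, *range(3, 5))       # parenthesized, so fine on 3.5+

print(d)
print(unpacked())
```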


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 1:43 AM, Ian Kelly  wrote:
> On Tue, May 22, 2018 at 9:34 AM, Chris Angelico  wrote:
>> On Wed, May 23, 2018 at 1:22 AM, Ian Kelly  wrote:
>>> On Tue, May 22, 2018 at 8:25 AM, Chris Angelico  wrote:
>>>> On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
>>>>> Note that Python tuples don't always need a start symbol:
>>>>>
>>>>>a = 10,20,30
>>>>>
>>>>> assigns a tuple to a.
>>>>
>>>> The tuple has nothing to do with the parentheses, except for the
>>>> special case of the empty tuple. It's the comma.
>>>
>>> Although, if the rule were really as simple as "commas make tuples",
>>> then this would be a list containing a tuple: [1, 2, 3].
>>
>> In an arbitrary expression, a comma between two expressions creates a
>> tuple. In other contexts, the comma has other meanings, which take
>> precedence:
>>
>> * Separating a function's arguments (both at definition and call)
>> * Enumerating import targets and global/nonlocal names
>> * Separating an assertion from its message
>> * Listing multiple context managers
>> * And probably some that I've forgotten.
>>
>> In those contexts, you can override the normal interpretation and
>> force the tuple by using parentheses, preventing it from being parsed
>> as something else, and making it instead a single expression:
>>
>> print((1, 2)) # prints a tuple
>> print(1, 2) # prints two items
>>
>> The comma is what makes the tuple, though, not the parentheses. The
>> parentheses merely prevent this from being something else.
>
> In other words, the rule is not really as simple as "commas make
> tuples". I stand by what I wrote.

Neither of us is wrong here. "Commas make tuples" is a useful
oversimplification in the same way that "asterisk means
multiplication" is. The asterisk has other meanings in specific
contexts (eg unpacking), but outside of those contexts, it means
multiplication.

ChrisA


Re: Problem of writing long list of lists file to csv

2018-05-22 Thread subhabangalore
On Tuesday, May 22, 2018 at 3:55:58 PM UTC+5:30, Peter Otten wrote:
> 
> 
> > lst2=lst1[:4]
> > with open("my_csv.csv","wb") as f:
> >     writer = csv.writer(f)
> >     writer.writerows(lst2)
> > 
> > Here it is writing only the first four lists. 
> 
> Hint: look at the first line in the quotation above.

Thank you Sir. Sorry to disturb you. 


Re: "Data blocks" syntax specification draft

2018-05-22 Thread bartc

On 22/05/2018 15:25, Chris Angelico wrote:

> On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
>
>> Note that Python tuples don't always need a start symbol:
>>
>>     a = 10,20,30
>>
>> assigns a tuple to a.
>
> The tuple has nothing to do with the parentheses, except for the
> special case of the empty tuple. It's the comma.


No? Take these:

 a = (10,20,30)
 a = [10,20,30]
 a = {10,20,30}

If you print type(a) after each, only one of them is a tuple - the one 
with the round brackets.


The 10,20,30 in those other contexts doesn't create a tuple, nor does it 
here:


  f(10,20,30)

Or here:

  def g(a,b,c):

Or here in Python 2:

  print 10,20,30

and no doubt in a few other cases. It's just that special case I 
highlighted where an unbracketed sequence of expressions yields a tuple.


The comma is just generally used to separate expressions, it's not 
specific to tuples.


--
bart


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 3:51 AM, bartc  wrote:
> On 22/05/2018 15:25, Chris Angelico wrote:
>>
>> On Tue, May 22, 2018 at 8:25 PM, bartc  wrote:
>>>
>>> Note that Python tuples don't always need a start symbol:
>>>
>>> a = 10,20,30
>>>
>>> assigns a tuple to a.
>>
>>
>> The tuple has nothing to do with the parentheses, except for the
>> special case of the empty tuple. It's the comma.
>
>
> No? Take these:
>
>  a = (10,20,30)
>  a = [10,20,30]
>  a = {10,20,30}
>
> If you print type(a) after each, only one of them is a tuple - the one with
> the round brackets.

And this isn't a tuple either:

import os, sys, math

If you've actually read the other emails in this thread, you'll see
that this has already been said.

ChrisA


Target WSGI script cannot be loaded as Python module.

2018-05-22 Thread Νίκος
Hello all,

I am trying to run a Bottle script I wrote as a WSGI app and I am getting the 
following error.

===
[Tue May 22 06:49:45.763808 2018] [:error] [pid 24298] [client 
46.103.59.37:14500] mod_wsgi (pid=24298): Target WSGI script 
'/home/nikos/public_html/app.py' cannot be loaded as Python module.
[Tue May 22 06:49:45.763842 2018] [:error] [pid 24298] [client 
46.103.59.37:14500] mod_wsgi (pid=24298): Exception occurred processing WSGI 
script '/home/nikos/public_html/app.py'.
[Tue May 22 06:49:45.763872 2018] [:error] [pid 24298] [client 
46.103.59.37:14500] Traceback (most recent call last):
[Tue May 22 06:49:45.763911 2018] [:error] [pid 24298] [client 
46.103.59.37:14500]   File "/home/nikos/public_html/app.py", line 4, in <module>
[Tue May 22 06:49:45.763951 2018] [:error] [pid 24298] [client 
46.103.59.37:14500] import re, os, sys, socket, time, datetime, locale, 
codecs, random, smtplib, subprocess, geoip2.database, bottle_pymysql
[Tue May 22 06:49:45.763976 2018] [:error] [pid 24298] [client 
46.103.59.37:14500] ImportError: No module named geoip2.database
===

Here is the relevant httpd-vhosts.conf:


ServerName superhost.gr

WSGIDaemonProcess public_html user=nikos group=nikos processes=1 threads=5
WSGIScriptAlias / /home/nikos/public_html/app.py

ProxyPass / http://superhost.gr:5000/
ProxyPassReverse / http://superhost:5000/



WSGIProcessGroup public_html
WSGIApplicationGroup %{GLOBAL}
WSGIScriptReloading On

Options -Indexes +IncludesNOEXEC +SymLinksIfOwnerMatch +ExecCGI

AddHandler cgi-script .cgi .py
AddHandler wsgi-script .wsgi .py

AllowOverride None
Require all granted




Any ideas as to why I am getting the above error, although I have python36 
installed along with all modules? Why can't it find it?


Re: Target WSGI script cannot be loaded as Python module.

2018-05-22 Thread Alexandre Brault
On 2018-05-22 02:29 PM, Νίκος wrote:
> Hello all,
>
> I am trying to run a Bottle script I wrote as a WSGI app and I am getting the 
> following error.
>
> ===
> [Tue May 22 06:49:45.763808 2018] [:error] [pid 24298] [client 
> 46.103.59.37:14500] mod_wsgi (pid=24298): Target WSGI script 
> '/home/nikos/public_html/app.py' cannot be loaded as Python module.
> [Tue May 22 06:49:45.763842 2018] [:error] [pid 24298] [client 
> 46.103.59.37:14500] mod_wsgi (pid=24298): Exception occurred processing WSGI 
> script '/home/nikos/public_html/app.py'.
> [Tue May 22 06:49:45.763872 2018] [:error] [pid 24298] [client 
> 46.103.59.37:14500] Traceback (most recent call last):
> [Tue May 22 06:49:45.763911 2018] [:error] [pid 24298] [client 
> 46.103.59.37:14500]   File "/home/nikos/public_html/app.py", line 4, in 
> <module>
> [Tue May 22 06:49:45.763951 2018] [:error] [pid 24298] [client 
> 46.103.59.37:14500] import re, os, sys, socket, time, datetime, locale, 
> codecs, random, smtplib, subprocess, geoip2.database, bottle_pymysql
> [Tue May 22 06:49:45.763976 2018] [:error] [pid 24298] [client 
> 46.103.59.37:14500] ImportError: No module named geoip2.database
> ===
>
> Here is the relevant httpd-vhosts.conf:
>
> 
> ServerName superhost.gr
>
> WSGIDaemonProcess public_html user=nikos group=nikos processes=1 threads=5
> WSGIScriptAlias / /home/nikos/public_html/app.py
>
> ProxyPass / http://superhost.gr:5000/
> ProxyPassReverse / http://superhost:5000/
>
>
> 
> WSGIProcessGroup public_html
> WSGIApplicationGroup %{GLOBAL}
> WSGIScriptReloading On
>
> Options -Indexes +IncludesNOEXEC +SymLinksIfOwnerMatch +ExecCGI
>
> AddHandler cgi-script .cgi .py
> AddHandler wsgi-script .wsgi .py
>
> AllowOverride None
> Require all granted
> 
> 
>
>
> Any ideas as to why I am getting the above error, although I have python36 
> installed along with all modules? Why can't it find it?
How did you install geoip2? Was it by any chance in a virtual
environment? If it was, you need to tell mod_wsgi to use this virtual
environment; otherwise, it'll use the global environment that probably
doesn't have geoip2 installed

Alex
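
One way to see which interpreter and module search path the WSGI process really uses is to log them from inside the application. A minimal sketch; the helper name and the returned keys are made up for this example, and it deliberately avoids anything that exists only on Python 3:

```python
import sys

def describe_environment(module_name="geoip2"):
    """Report which interpreter is running and whether a module imports."""
    try:
        __import__(module_name)
        found = True
    except ImportError:
        found = False
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        "path": list(sys.path),
        "module_found": found,
    }

info = describe_environment("json")   # a stdlib module, so this one is found
print(info["version"], info["module_found"])
```

Calling this near the top of app.py and writing the result to the error log would show whether mod_wsgi runs the python36 interpreter at all. The unquoted module name in "No module named geoip2.database" looks like a Python 2 style message, but that is an inference from the log, not something the thread confirms.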


Re: Spam levels.

2018-05-22 Thread Peter J. Holzer
On 2018-05-21 15:42:28 +, Grant Edwards wrote:
> I switched from Usenet to Gmane mainly because references headers are
> a bit more consistent on Gmane, so threading works somewhat better.

This is interesting, because Gmane was the reason I switched from
reading on usenet to reading the mailinglist: Every article coming
through the Gmane gateway had broken headers, which completely messed up
threading (It looked like Gmane was replacing Message-Ids with their
own). I haven't checked recently whether that is still the case.

On the mailing-list threading seems to work.

hp


-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson




Re: what does := means simply?

2018-05-22 Thread Peter J. Holzer
On 2018-05-20 11:37:14 -0400, Dennis Lee Bieber wrote:
> On Sun, 20 May 2018 12:38:59 +0100, bartc  declaimed the
> following:
> >Then the /same software/ probably wouldn't work anywhere else. I mean 
> >taking source which doesn't know or care about what system its on, and 
> >that operates on a ppm file downloaded from the internet.
> 
>   And software that handles binary PPM on a big-endian system probably
> won't run on a little-endian system, if said PPM is using a maxval >255
> (meaning each R,G,B takes up two bytes, and said bytes would be seen in
> reverse order when moving from one system to the other).

The byte order in PPM files is well-defined.

It is trivial to write a portable C program which reads raw PPM files.
And by "portable" I mean that the same program works on big- or
little-endian systems, on ASCII or EBCDIC systems, on systems where 1 byte
is more than 8 bits (as long as the file still stores one octet per byte),
etc.

hp
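
The point about byte order can be shown in a few lines. The message talks about C; this is a minimal Python sketch of the same idea, assuming the Netpbm rule that samples with a maxval above 255 are stored as two bytes, most significant byte first. The function name is made up, and header parsing is left out.

```python
import struct

def decode_samples(raw, maxval):
    """Turn raw PPM sample bytes into a list of ints, on any host."""
    if maxval < 256:
        return list(bytearray(raw))    # one byte per sample
    count = len(raw) // 2
    # ">" forces big-endian decoding regardless of the machine's byte order.
    return list(struct.unpack(">%dH" % count, raw[:count * 2]))

print(decode_samples(b"\x01\x02\x00\xff", 65535))
```

Because the byte order is named explicitly in the format string, the result does not depend on the endianness of the machine running the code.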




Re: Spam levels.

2018-05-22 Thread Grant Edwards
On 2018-05-22, Peter J. Holzer  wrote:
> On 2018-05-21 15:42:28 +, Grant Edwards wrote:
>> I switched from Usenet to Gmane mainly because references headers are
>> a bit more consistent on Gmane, so threading works somewhat better.
>
> This is interesting, because Gmane was the reason I switched from
> reading on usenet to reading the mailinglist: Every article coming
> through the Gmane gateway had broken headers, which completely messed up
> threading (It looked like Gmane was replacing Message-Ids with their
> own). I haven't checked recently whether that is still the case.
>
> On the mailing-list threading seems to work.

I've never tried reading the mailing list directly (I'm not willing to
give up slrn), but the last time I ran NNTP threading tests (I refuse
to admit how much time I spent writing a Python app to do that), my
Usenet feed was noticeably worse than Gmane.  Gmane had a fair amount
of breakage as well, but was better than Usenet.

-- 
Grant Edwards               grant.b.edwards        Yow! Did YOU find a
                                  at               DIGITAL WATCH in YOUR box
                              gmail.com            of VELVEETA?



Re: what does := means simply?

2018-05-22 Thread Peter J. Holzer
On 2018-05-20 16:36:12 -0400, Richard Damon wrote:
> 2) Try to maximize portability by not only looking at the specs, but
> also common implementations, and choosing the options that maximize the
> acceptability of your output to tools that don't fully meet the specs.
> Also, if a common implementation generates something not quite to the
> standard, try to make it so you can accept that output too.

This is the well-known "be conservative in what you send and liberal in
what you accept" principle. 

It has fallen into disfavour over the last decade or so. There are
several reasons:

* Being liberal in what you accept is problematic because you are
  accepting input which has no specified meaning and interpreting it as
  you see fit - but there is no guarantee that this is the interpretation
  that the sender intended. This may result in silent data loss. In many
  cases an error message is better.
* Accepting non-standard input is also problematic because such input is
  probably not well-tested. The code is much more likely to contain
  bugs, maybe even security-critical bugs.
* If some features of a spec are rarely used, programmers may not
  implement them. When they are needed, they won't work. A recent
  example is that the TLS working group found that they can't use the
  version number field to signal the version number because too many
  implementations got it wrong (I have no idea how that happened. We are
  already on the 6th version and all previous upgrades used the version
  field).
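
The first point can be illustrated with Python's own codec error handlers. This is only a sketch: the sample bytes and the helper name `decode_both_ways` are made up for illustration and do not come from the thread.

```python
def decode_both_ways(data):
    """Decode `data` as UTF-8 leniently and strictly.

    Returns (lenient_text, strict_error); strict_error is None if the
    strict decode succeeded.
    """
    # Liberal: undecodable bytes are silently replaced by U+FFFD, so the
    # original information is lost without any warning.
    lenient = data.decode("utf-8", errors="replace")
    # Strict: the same input is rejected with an error naming the problem.
    try:
        data.decode("utf-8", errors="strict")
        strict_error = None
    except UnicodeDecodeError as exc:
        strict_error = exc
    return lenient, strict_error


# "Grüße" encoded as ISO-8859-1, which is not valid UTF-8.
lenient, strict_error = decode_both_ways("Grüße".encode("iso-8859-1"))
print(repr(lenient))
print(strict_error)
```

The lenient call returns a string that looks usable but no longer contains the umlaut or the sharp s; the strict call fails loudly instead.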

Of course, if a popular implementation has known bugs you may have no
choice but to make concessions.

hp




Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

2018-05-22 Thread Peter J. Holzer
On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote:
> On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanada...@gmail.com wrote:
> 
> > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro  wrote:
> > > As Chris indicated, you'll have to figure out the correct encoding. You
> > > might want to check out the chardet module (available on PyPI, I believe)
> > > and see if it can come up with a better guess. I imagine there are other
> > > encoding guessers out there. That's just one I'm familiar with.
> > 
> > > thank you for the reply, but how exactly am I supposed to find out what
> > > the correct encoding is?
> 
> One CAN NOT.
> 
> The best you can do is to go ask the canonical source of the
> file what encoding the file is _supposed_ to be in.

I disagree on both counts.

1) For any given file it is almost always possible to find the correct
   encoding (or *a* correct encoding, as there may be more than one).

   This may require domain-specific knowledge (e.g. it may be necessary
   to recognize the human language and know at least some distinctive
   words, or to know some special symbols likely to be used in a data
   file), and it almost always takes a bit of detective work and trial
   and error. But I don't think I ever encountered a file where I
   couldn't figure out the encoding.

   (If you have several files in the same encoding, it may not be
   possible to figure out the encoding from a subset of them. For
   example, the files may all be in ISO-8859-2, but the subset you have
   contains only characters <= 0x7F. But if you have several files, they
   may not all be in the same encoding, either).

2) The canonical source of the file may not know. This is quite frequent
   when the source is some non-technical person. Then you get answers
   like "it's ASCII" (although the file contains umlauts, which aren't
   in ASCII) or "it's ANSI" (which isn't an encoding, although Windows
   pretends it is). Or they may not be aware that the file is converted
   somewhere in the pipeline, so that the file they generated isn't
   actually the file you received. So ask (or check the docs), but
   verify!
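
The trial-and-error approach in point 1 can be sketched in a few lines. The function name `plausible_encodings`, the candidate list and the expected word are all assumptions chosen for illustration; the "domain knowledge" is reduced here to a word that should appear in the decoded text.

```python
def plausible_encodings(data, candidates, expected_words):
    """Return the candidate encodings that decode `data` without error
    and produce text containing at least one of `expected_words`."""
    hits = []
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes at all
        if any(word in text for word in expected_words):
            hits.append(name)
    return hits


sample = "Liebe Grüße\n".encode("iso-8859-1")
candidates = ["utf-8", "iso-8859-1", "iso-8859-2", "iso-8859-5", "cp1252"]
print(plausible_encodings(sample, candidates, ["Grüße"]))
```

More than one candidate can survive the check, which matches the remark that there may be several correct encodings for a given file.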

hp



Re: Target WSGI script cannot be loaded as Python module.

2018-05-22 Thread Νίκος
On Tuesday, 22 May 2018 at 10:55:54 PM UTC+3, Alexandre Brault wrote:
> > Any ideas as to why I am getting the above error although I have python36
> > installed along with all modules? Why can't it find it?
> How did you install geoip2? Was it by any chance in a virtual
> environment? If it was, you need to tell mod_wsgi to use this virtual
> environment; otherwise, it'll use the global environment that probably
> doesn't have geoip2 installed

I have both Pythons installed in parallel:
python2.7 and python3.6

I have installed the modules with

pip3.6 install bottle bottle-pymysql geopip2

and they were installed successfully.

I don't know why the error log is complaining that it cannot see the modules.
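
One way to narrow this down is to ask the interpreter itself which modules it can see. The sketch below assumes the import names are `bottle` and `geoip2`; the helper name `module_report` is made up. To be meaningful it has to run under the same interpreter that mod_wsgi uses, not just the one behind `pip3.6`.

```python
import importlib.util
import sys


def module_report(names):
    """Map each top-level module name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(sys.executable)            # which Python binary is running
print(sys.version_info[:2])      # e.g. (2, 7) versus (3, 6)
print(module_report(["bottle", "geoip2"]))
```

If the report differs between the command line and the web server, the two are not using the same environment.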



Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 7:23 AM, Peter J. Holzer  wrote:
>> The best you can do is to go ask the canonical source of the
>> file what encoding the file is _supposed_ to be in.
>
> I disagree on both counts.
>
> 1) For any given file it is almost always possible to find the correct
>    encoding (or *a* correct encoding, as there may be more than one).

You can find an encoding which is capable of decoding a file. That's
not the same thing.

>    This may require domain-specific knowledge (e.g. it may be necessary
>    to recognize the human language and know at least some distinctive
>    words, or to know some special symbols likely to be used in a data
>    file), and it almost always takes a bit of detective work and trial
>    and error. But I don't think I ever encountered a file where I
>    couldn't figure out the encoding.

Look up the old classic "bush hid the facts" hack with Windows
Notepad. A pure ASCII file that got misdetected based on the byte
patterns in it.
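
The effect is easy to reproduce in Python. This is only a sketch of the ambiguity itself, not of Notepad's detection heuristic, and the exact capitalisation of the sentence is an assumption:

```python
data = b"Bush hid the facts"          # 18 bytes of pure ASCII

as_ascii = data.decode("ascii")
as_utf16 = data.decode("utf-16-le")   # also succeeds: 9 two-byte code units

print(as_ascii)
print(as_utf16)
print([hex(ord(ch)) for ch in as_utf16])
```

Both decodes succeed without error; only one of them yields readable English, while the other yields nine CJK ideographs.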

If you restrict yourself to ASCII-compatible eight-bit encodings, you
MAY be able to figure out what something is. (I have that exact
situation when parsing subtitles files.) Bizarre constructs like
"Tuuleen jδiseen mδ nostan pδδn" are a strong indication that the
encoding is wrong - if most of a word is ASCII, it's likely that the
non-ASCII bytes represent accented characters, not characters from a
completely different alphabet. But there are a number of annoyingly
similar encodings around, where a large number of the mappings are the
same, but you're left with just a few ambiguous bytes.

And if you're considering non-ASCII-compatible encodings, things get a
lot harder. UTF-16 can represent large slabs of Chinese text using the
same bytes that would represent alphanumeric characters; so how can
you distinguish it from base-64?

I have encountered MANY files where I couldn't figure out the
encoding. Some of them were quite possibly in ancient encodings (some
had CR line endings), some were ambiguous, and on multiple occasions,
I've had to deal with files that had more than one encoding in the
same block of content. (Or more frequently, not files but socket
connections. Same difference.) So no, you cannot always figure out a
file's encoding from its contents. Because that will, on some
occasions, violate the laws of physics - granted, that's merely a
misdemeanour in some states.

ChrisA


Tkinter and root vs. Wayland

2018-05-22 Thread Grant Edwards
For a couple decades now, I've been distributing a couple smallish
Tkinter applications that need to run as root for a variety of reasons
(raw Ethernet access, starting/stopping daemons, loading and unloading
kernel modules, reading and writing config files that are owned by
root).

As part of RedHat's switch to Wayland, they've decided that GUI X11
apps running as root will no longer be allowed to connect to the
Wayland desktop server/compositor/whatever-it's-called.  When it was
pointed out to RedHat that this will break lots of applications, the
official word from on high is that all GUI apps requiring root
privileges need to be redesigned so that their GUI is running as a
normal user.

How does one do that in a Tkinter app?  Do I need to start as root and
fork a process that drops privileges and starts Tkinter and then the
two processes communicate via sockets or POSIX queues or whatnot?

Can Python multiprocessing be used in this way?
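
For illustration, the split described above could look roughly like the sketch below. It is not a tested design: the command names, the `exampled` daemon and the use of the `SUDO_UID`/`SUDO_GID` variables (set by sudo) are assumptions, the Tkinter code is left out, and the "fork" start method restricts it to POSIX systems.

```python
import multiprocessing
import os


def drop_privileges():
    """Switch to the invoking user's IDs if started via sudo; else do nothing."""
    if os.geteuid() != 0 or "SUDO_UID" not in os.environ:
        # Not root, or no target user known. A real program should refuse
        # to start a GUI as root here instead of carrying on.
        return
    os.setgroups([])                        # drop supplementary groups first
    os.setgid(int(os.environ["SUDO_GID"]))
    os.setuid(int(os.environ["SUDO_UID"]))  # must come last


def gui_process(conn):
    """Unprivileged side: a real application would run Tkinter here."""
    drop_privileges()
    conn.send(("restart-daemon", "exampled"))   # hypothetical request
    reply = conn.recv()
    conn.send(("gui-saw", reply))
    conn.close()


def privileged_handler(request):
    """Privileged side: accept only a fixed set of commands."""
    command, argument = request
    if command == "restart-daemon":
        return "would restart " + argument      # placeholder for the real work
    return "refused: " + command


def run_demo():
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    child = ctx.Process(target=gui_process, args=(child_conn,))
    child.start()
    child_conn.close()                          # the parent keeps only its own end
    parent_conn.send(privileged_handler(parent_conn.recv()))
    result = parent_conn.recv()
    child.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

The privileged parent never touches the display, and the GUI child can only ask for operations that the handler explicitly allows.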

-- 
Grant Edwards               grant.b.edwards        Yow! If our behavior is
                                  at               strict, we do not need fun!
                              gmail.com



Re: Spam levels.

2018-05-22 Thread Peter J. Holzer
On 2018-05-22 20:42:43 +, Grant Edwards wrote:
> On 2018-05-22, Peter J. Holzer  wrote:
> > On 2018-05-21 15:42:28 +, Grant Edwards wrote:
> > >> I switched from Usenet to Gmane mainly because references headers are
> > >> a bit more consistent on Gmane, so threading works somewhat better.
> >
> > This is interesting, because Gmane was the reason I switched from
> > reading on usenet to reading the mailinglist: Every article coming
> > through the Gmane gateway had broken headers, which completely messed up
> > threading (It looked like Gmane was replacing Message-Ids with their
> > own). I haven't checked recently whether that is still the case.
> >
> > On the mailing-list threading seems to work.
> 
> I've never tried reading the mailing list directly (I'm not willing to
> give up slrn), but the last time I ran NNTP threading tests (I refuse
> to admit how much time I spent writing a Python app to do that), my
> Usenet feed was noticeably worse than Gmane.  Gmane had a fair amount
> of breakage as well, but was better than Usenet.

I didn't read on Gmane. I read on my usenet server. But the broken
messages were all coming from Gmane. It is possible that the breakage
only occurs when Gmane passes the message to other Usenet servers,
although I have no idea how that could happen (frankly, I have no idea
why Gmane should replace message-ids at all - it just doesn't make
sense).

hp



Re: Spam levels.

2018-05-22 Thread Grant Edwards
On 2018-05-22, Peter J. Holzer  wrote:

> I didn't read on Gmane. I read on my usenet server. But the broken
> messages were all coming from Gmane. It is possible that the breakage
> only occurs when Gmane passes the message to other Usenet servers,
> although I have no idea how that could happen (frankly, I have no idea
> why Gmane should replace message-ids at all - it just doesn't make
> sense).

I never figured out exactly what the broken scenarios were nor did I
try to figure out which gateway was causing them.

Ignoring Google Groups, there are 9 possible combinations:

   Usenet <---[gateway]---> M-List <---[gateway]---> Gmane

 1. Usenet followup to M-List posting
 2. Usenet followup to Gmane posting
 3. Usenet followup to Usenet posting
 4. M-List followup to Usenet posting
 5. M-List followup to Gmane posting
 6. M-List followup to M-List posting
 7. Gmane  followup to Usenet posting
 8. Gmane  followup to M-List posting
 9. Gmane  followup to Gmane posting

Most of the combinations seem to work most of the time.  It looked
like there was at least 1 broken scenario when subscribed either via
Gmane or via "real" Usenet, but it's pretty difficult to glean the
signal from the noise created by people with broken MUAs and/or NNTP
clients.

It's actually pretty impressive it all works as well as it does...

In any case, ignoring all postings from Google Groups is recommended.

-- 
Grant Edwards               grant.b.edwards        Yow! Today, THREE WINOS
                                  at               from DETROIT sold me a
                              gmail.com            framed photo of TAB HUNTER
                                                   before his MAKEOVER!



Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

2018-05-22 Thread Peter J. Holzer
On 2018-05-23 07:38:27 +1000, Chris Angelico wrote:
> On Wed, May 23, 2018 at 7:23 AM, Peter J. Holzer  wrote:
> >> The best you can do is to go ask the canonical source of the
> >> file what encoding the file is _supposed_ to be in.
> >
> > I disagree on both counts.
> >
> > 1) For any given file it is almost always possible to find the correct
> >    encoding (or *a* correct encoding, as there may be more than one).
> 
> You can find an encoding which is capable of decoding a file. That's
> not the same thing.

If the result is correct, it is the same thing.

If I have an input file 

4c 69 65 62 65 20 47 72 fc df 65 0a

and I decode it correctly to

Liebe Grüße

it doesn't matter whether I used ISO-8859-1 or ISO-8859-2. The mapping
for all bytes in the input file is the same in both encodings.


> >    This may require domain-specific knowledge (e.g. it may be necessary
> >    to recognize the human language and know at least some distinctive
> >    words, or to know some special symbols likely to be used in a data
> >    file), and it almost always takes a bit of detective work and trial
> >    and error. But I don't think I ever encountered a file where I
> >    couldn't figure out the encoding.
> 
> Look up the old classic "bush hid the facts" hack with Windows
> Notepad. A pure ASCII file that got misdetected based on the byte
> patterns in it.

And would you have made the same mistake as notepad? Nope, I'm quite
sure that you are able to recognize an ASCII file with an English
sentence as ASCII. You wouldn't even consider that it could be UTF-16LE.


> If you restrict yourself to ASCII-compatible eight-bit encodings, you
> MAY be able to figure out what something is.
[...]
> But there are a number of annoyingly similar encodings around, where a
> large number of the mappings are the same, but you're left with just a
> few ambiguous bytes.

They are rarely ambiguous if you speak the language.

> And if you're considering non-ASCII-compatible encodings, things get a
> lot harder. UTF-16 can represent large slabs of Chinese text using the
> same bytes that would represent alphanumeric characters; so how can
> you distinguish it from base-64?

I'll ask my Chinese colleague to read it. If he can read it, it's almost
certainly Chinese and not base-64.

As I said, domain knowledge may be necessary. If you are decoding a file
which may contain a Chinese text, you may have to know Chinese to check
whether the decoded text makes sense.

If your job is to figure out the encoding of files which you don't
understand (and hence can't check whether your results are correct) I
will concede that this is impossible.

> I have encountered MANY files where I couldn't figure out the
> encoding. Some of them were quite possibly in ancient encodings (some
> had CR line endings), some were ambiguous, and on multiple occasions,
> I've had to deal with files that had more than one encoding in the
> same block of content.

Well, files with multiple encodings break the assumption that there is
*one* correct encoding. While I have encountered such files, too (as
well as multi-encodings and random errors), I don't think we were
talking about that.

hp



Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 8:31 AM, Peter J. Holzer  wrote:
> On 2018-05-23 07:38:27 +1000, Chris Angelico wrote:
>> On Wed, May 23, 2018 at 7:23 AM, Peter J. Holzer  wrote:
>> >> The best you can do is to go ask the canonical source of the
>> >> file what encoding the file is _supposed_ to be in.
>> >
>> > I disagree on both counts.
>> >
>> > 1) For any given file it is almost always possible to find the correct
>> >    encoding (or *a* correct encoding, as there may be more than one).
>>
>> You can find an encoding which is capable of decoding a file. That's
>> not the same thing.
>
> If the result is correct, it is the same thing.
>
> If I have an input file
>
> 4c 69 65 62 65 20 47 72 fc df 65 0a
>
> and I decode it correctly to
>
> Liebe Grüße
>
> it doesn't matter whether I used ISO-8859-1 or ISO-8859-2. The mapping
> for all bytes in the input file is the same in both encodings.

Sure, but if you try it as ISO-8859-5 or -7, you won't get an error,
but you also won't get that string. So it DOES matter.
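
Both observations can be checked against the bytes quoted above. A small sketch, using only the standard codecs:

```python
data = bytes.fromhex("4c 69 65 62 65 20 47 72 fc df 65 0a")

# Decode the same bytes with four ISO-8859 variants; none of them raises.
decoded = {
    name: data.decode(name)
    for name in ("iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7")
}

for name, text in decoded.items():
    print(name, repr(text))
```

ISO-8859-1 and ISO-8859-2 agree on every byte in this input; ISO-8859-5 and ISO-8859-7 also decode it without complaint but map 0xfc and 0xdf to Cyrillic and Greek letters respectively.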

ChrisA


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Mikhail V
On Tue, May 22, 2018 at 9:01 AM, Christian Gollwitzer  wrote:
> Am 22.05.18 um 04:17 schrieb Mikhail V:
>>> YAML comes to mind
>>
>>
>> Actually plugging a data syntax in existing language is not a new idea.
>> Though I don't know real success stories.
>>
>
> Thing is, you can do it already now in the script, without modifying the
> Python interpreter, by parsing a triple-quoted string. See the examples
> right here: http://pyyaml.org/wiki/PyYAMLDocumentation
>


Yes. That is exactly what I wanted to discuss actually.
So the feature which makes it possible in this case is the
triple-quoted string (TQS).

I think it would be appropriate to propose an alternative
to TQS for this specific purpose, namely for making it
easier to implement parsers and embedded syntaxes.

So what do I have now with triple quoted strings -
a simple example:

if 1:
    s = """\
print ("\n") \\
foo = 5
"""

So there is a _possibility_ in the sense that it is possible to do, so
let's say I have a lib with a parser, etc. Though now a developer
and a user will face quite real issues:

- TQS itself already has its specific purpose in many contexts,
  which may mean for example hard-coded syntax highlighting
- there are a lot of things happening here: e.g. in the above example
  I use "\n", which I assume to be a part of the string, or \\ - but both
  are interpreted. Maybe some other things regarding escaping too. This
  particular issue may be a blocker for making use of TQS in some data
  cases, say if the target source text needs these very characters.

- indentation is part of the TQS. That is of course by design
  and it's quite logical, though it is hard-coded behaviour and thus
  does not make the presentation a natural part of the blocks containing
  this string.
- appearance: imagine you have some small chunks of embedded
  code and you will still have the closing """ everywhere -
  that would be really hairy.
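
The escaping issue from the list above can be seen by comparing the example string with a raw-string variant. The raw prefix is existing Python syntax, shown here only as a point of comparison and not as part of the proposal:

```python
# Plain triple-quoted string: "\n" becomes a real newline, "\\" becomes a
# single backslash, and the backslash after the opening quotes swallows
# the first line break.
plain = """\
print ("\n") \\
foo = 5
"""

# Raw triple-quoted string: every backslash is kept as typed.
raw = r"""print ("\n") \\
foo = 5
"""

print(plain.splitlines())
print(raw.splitlines())
```

The plain string ends up with three lines because the embedded "\n" was interpreted, while the raw string keeps the two lines that were written.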


The alternative proposal therefore comes down to a "data block" syntax,
without much assumption about the contents of the block.

This should be simpler to implement, because it should not need a lot
of parsing rules - only some basic options. At the same time it enables
the 'embedding' of user-defined blocks/syntax more naturally
looking than TQS.

My thoughts on possible solution.
---------------------------------

Problem one: make it look natural inside Python source.
Current Python behaviour: very simply speaking, the leading whitespace
on a line is taken and compared with that of the next line.
It's okay for statements, but not okay for raw text data -
because I probably want custom leading whitespace:

string =
 abc
   abc

(So the TQS takes it simple - grabs it from the line beginning)

So the idea:
- add such a block to the syntax
- *force an explicit parameter for the indent characters.*

Explanation:
[Here I'll use the same symbol /// for the data entry point, but of course it
can be changed if a better idea comes later. Also, for now, just for
simplicity, the rule is that the contents of a block always start on a new
line.]

So, e.g. this:

data = /// s4
    first line
    last line
the rest python code

- will parse the block and knock out the leading 4 spaces,
i.e. if the first line has 5 leading spaces then 1 space will be left
in the string. Block parsing terminates when the next line does not
satisfy the indent sequence (4 spaces in this case).
Another obvious type: tabs:

data = /// t1
first line
last line
the rest python code

Will do the same but with one tabstop character.
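
The rule described for `s4` and `t1` can be emulated on a list of lines in today's Python. This is a hypothetical sketch of the proposed behaviour, not an implementation of any real syntax; the function name `read_data_block` is invented, and blank lines inside a block are not treated specially because the draft does not say how they should behave.

```python
def read_data_block(lines, indent):
    """Collect consecutive lines that start with `indent`, strip that prefix
    once from each, and stop at the first line that does not start with it."""
    block = []
    for line in lines:
        if not line.startswith(indent):
            break  # the block ends; the rest is ordinary code
        block.append(line[len(indent):])
    return "\n".join(block)


source = [
    "    first line",
    "     five spaces here",
    "    last line",
    "the rest python code",
]
print(read_data_block(source, " " * 4))                        # the "s4" case
print(read_data_block(["\tfirst line", "\tlast line"], "\t"))  # the "t1" case
```

Only the declared indent is removed, so the fifth space on the second line survives in the result.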

Actually that's it!
Some further ideas:

data = /// ts
- "any whitespace" (mimic current Python behaviour)

data = /// s# or
data = /// t
- simply count amount of spaces (tabs) from first
  line and proceed, otherwise terminate.

data = /// "???"
??? abc foo bar
???

- defines indent character by string: crazy idea but why not.

Language  parameter, e.g.:
data = /// t1."yaml"

-this can be reserved for future usage by code analysis tools
or dynamic syntax highlighting.

That's just a rough specification.

What should it give as result:

1. No clash with current TQS rules - less worries
  about reserved characters.

2. Built-in indentation parsing parameter makes it more or
  less natural continuation of Python blocks and is char-precise,
  which is very important here.

3. Independent of the indent of containing block!

4. Parameter descriptor can be developed in such manner
   that it allows more customisation and additions in the future.


This does seem to be a more generalized way of solving the problem.

One problem, as usual: tabs may be implicitly converted
to spaces by some software. That obviously could break
something, but that is the case with any tabs, and it's not a
Python-specific problem.


Is there something I miss here?
What caveats can be with such approach?



M


Getting Unicode decode error using lxml.iterparse

2018-05-22 Thread digitig
I'm trying to read my iTunes library in Python using iterparse. My current stub 
is:

 Snip 

import sys
import base64
import datetime
import xml.etree.ElementTree as ET
import argparse
import re

class Library:

    # Maps a plist element tag to a function that converts the element.
    unmarshallers = {
        # collections
        "array": lambda x: [v.text for v in x],
        "dict": lambda x:
            dict((x[i].text, x[i+1].text) for i in range(0, len(x), 2)),
        "key": lambda x: x.text or "",

        # simple types
        "string": lambda x: x.text or "",
        # b64decode accepts str; base64.decodestring is gone in newer Pythons
        "data": lambda x: base64.b64decode(x.text or ""),
        "date": lambda x: datetime.datetime(*map(int, re.findall(r"\d+", x.text))),
        "true": lambda x: True,
        "false": lambda x: False,
        "real": lambda x: float(x.text),
        "integer": lambda x: int(x.text)
    }

    def load(self, file):
        print('Starting...')
        parser = ET.iterparse(file)
        for action, elem in parser:
            unmarshal = self.unmarshallers.get(elem.tag)
            if unmarshal:
                data = unmarshal(elem)
                elem.clear()
                elem.text = data
                print(elem.text)
            elif elem.tag != "plist":
                raise IOError("unknown plist type: %r" % elem.tag)
        return parser.root[0].text

    def __init__(self, infile):
        self.root = self.load(infile)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Parse an iTunes library file to a set of CSV files suitable for import to a database.")
    parser.add_argument('infile', nargs='?', type=argparse.FileType('r'), default=sys.stdin)
    args = parser.parse_args()
    print('Infile = ', args.infile)
    library = Library(args.infile)


My input file (reduced to home in on the error) is:


 snip -





15078

NamePart 2. The Death Of Enkidu. 
Skon Přitele Mého Mne Zdeptal Težče





 snip 













I'm getting an error on one part of the XML:


 File "C:\Users\digit\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 202:
character maps to <undefined>


I suspect the issue is that it's using cp1252.py, which I don't think is UTF-8 
as specified in the XML prolog. Is this an iterparse problem, or am I using it 
wrongly?
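
One likely explanation, offered as a sketch rather than a confirmed diagnosis: a file opened in text mode is decoded with the platform default (cp1252 here) before the XML parser ever sees it, so the encoding named in the prolog is never consulted. Handing iterparse the raw bytes lets the parser do the decoding. The sample document below is invented and only mimics the track name; 0x8d happens to be the second byte of "č" in UTF-8.

```python
import io
import xml.etree.ElementTree as ET

XML_BYTES = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<plist><dict><key>Name</key>"
    "<string>Part 2. The Death Of Enkidu. Skon Přitele Mého Mne Zdeptal Težče</string>"
    "</dict></plist>"
).encode("utf-8")


def string_values(binary_file):
    """Return the text of every <string> element; the parser decodes the bytes."""
    return [elem.text for _, elem in ET.iterparse(binary_file) if elem.tag == "string"]


def cp1252_error(data):
    """Return the error raised when `data` is decoded as cp1252, or None."""
    try:
        data.decode("cp1252")
    except UnicodeDecodeError as exc:
        return exc
    return None


print(string_values(io.BytesIO(XML_BYTES)))
print(cp1252_error(XML_BYTES))
```

In the script above this would mean passing a binary file object, for example one created with `argparse.FileType('rb')`, or simply passing the file name, which iterparse opens in binary mode itself.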


Thanks.



RE: "Data blocks" syntax specification draft

2018-05-22 Thread Dan Strohl via Python-list

> -Original Message-
> 
> I think it would be appropriate to propose an alternative to TQS for this
> specific purpose, namely for making it easier to implement parsers and
> embedded syntaxes.
> 
> So what do I have now with triple quoted strings - a simple example:
> 
> if 1:
>     s = """\
> print ("\n") \\
> foo = 5
> """
> 
> So there is a _possibility_ in the sense that it is possible to do, so let's
> say I have a lib with a parser, etc. Though now a developer and a user will
> face quite real issues:
>
> - TQS itself already has its specific purpose in many contexts,
>   which may mean for example hard-coded syntax highlighting
> - there are a lot of things happening here: e.g. in the above example
>   I use "\n", which I assume to be a part of the string, or \\ - but both
>   are interpreted. Maybe some other things regarding escaping too. This
>   particular issue may be a blocker for making use of TQS in some data
>   cases, say if the target source text needs these very characters.
> 

Yup, I can see this, I do use """ in a number of ways, often to comment out 
large chunks of code. (OK, I probably should not, but I do).

> - indentation is part of the TQS. That is of course by design
>   and it's quite logical, though it is hard-coded behaviour and thus
>   does not make the presentation a natural part of the blocks containing
>   this string.
> - appearance: imagine you have some small chunks of embedded
>   code and you will still have the closing """ everywhere -
>   that would be really hairy.
> 
> 

And yup, that does cause some challenges sometimes.

> 
> Explanation:
> [Here I'll use the same symbol /// for the data entry point, but of course it
> can be changed if a better idea comes later. Also, for now, just for
> simplicity, the rule is that the contents of a block always start on a new
> line.]
>
> So, e.g. this:
>
> data = /// s4
>     first line
>     last line
> the rest python code
>
> - will parse the block and knock out the leading 4 spaces,
> i.e. if the first line has 5 leading spaces then 1 space will be left in the
> string. Block parsing terminates when the next line does not satisfy the
> indent sequence (4 spaces in this case).
> Another obvious type: tabs:

OK, I CAN see this as a potentially useful suggestion.  There are a number of 
times where I would like to define a large chunk of text, but using tqs and 
having it suddenly move to the left is painful visually.  Right now, I tend to 
either a) do it anyway, b) do it in a separate module and import the variables, 
or c) do it and parse the string to remove the extra spaces.
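
Option c) can already be done with the standard library. A minimal sketch, assuming a raw string is acceptable for the data; the function name `embedded_text` and the sample lines are made up:

```python
import textwrap


def embedded_text():
    """Return an indented block of raw text with its common indentation removed."""
    block = r"""
        print ("\n") \\
        foo = 5
          indented two extra spaces
    """
    # The r prefix keeps backslashes literal; dedent() strips the 8 spaces
    # shared by every non-blank line; lstrip() drops the leading newline.
    return textwrap.dedent(block).lstrip("\n")


print(embedded_text())
```

The text stays visually aligned with the surrounding code, and indentation beyond the common prefix is preserved in the result.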

Personally though, I would not hard code it to knock out 4 leading spaces.   I 
would have it handle spaces the same way that the existing parser does: if 
there are 4 spaces indenting the next line, then it removes 4 spaces, if there 
are 6 spaces, it removes 6 spaces, etc., ignoring additional spaces within the 
data-string object.  Once it hits a line that has the same number of indenting 
spaces as the initial token, the data-string object is finished.

> 
> data = /// t1
> first line
> last line
> the rest python code
> 
> Will do the same but with one tabstop character.
> 

Tabs / spaces should be handled as normal (up to where the data-string object 
starts, after which it pulls off the first x tabs or spaces and leaves anything 
else).

> Actually that's it!
> Some further ideas:
> 
> data = /// ts
> - "any whitespace" (mimic current Python behaviour)
> 
> data = /// s# or
> data = /// t
> - simply count amount of spaces (tabs) from first
>   line and proceed, otherwise terminate.
> 
> data = /// "???"
> ??? abc foo bar
> ???
> 
> - defines indent character by string: crazy idea but why not.
> 

Nope, don't like this one... It's far enough from normal Python that it seems 
unlikely to get through, and (personally at least), I struggle to see the 
benefit.

> Language  parameter, e.g.:
> data = /// t1."yaml"
> 
> -this can be reserved for future usage by code analysis tools or dynamic
> syntax highlighting.
> 

I can see where this might be interesting, but again, I just don't see the 
need, if the spec returns a string, you can use that string in any parser you 
want. If you want to customize how it's handled, then you can always create a 
custom object for it.

> That's just a rough specification.
> 
> What should it give as result:
> 

To me, this seems like simply an additional specification for a TQS, with the 
only enhancement being that it's basically an indented TQS, so the return is a 
string.

> 1. No clash with current TQS rules - less worries
>   about reserved characters.
> 
> 2. Built-in indentation parsing parameter makes it more or
>   less natural continuation of Python blocks and is char-precise,
>   which is very important here.
> 
> 3. Independent of the indent of containing block!
> 
> 4. Parameter descriptor can be developed in such manner
>that it allows more customisation and additions

Re: "Data blocks" syntax specification draft

2018-05-22 Thread bartc

On 22/05/2018 16:57, Chris Angelico wrote:
> On Wed, May 23, 2018 at 1:43 AM, Ian Kelly  wrote:
>> In other words, the rule is not really as simple as "commas make
>> tuples". I stand by what I wrote.
>
> Neither of us is wrong here.

Sorry, but I don't think you're right at all, unless the official
references for the language specifically say that commas are primarily
for constructing tuples, and all other uses are exceptions to that rule.

AFAICS, commas are used just like commas everywhere - used as
separators. The context tells Python what the resulting sequence is.

> "Commas make tuples" is a useful
> oversimplification in the same way that "asterisk means
> multiplication" is. The asterisk has other meanings in specific
> contexts (eg unpacking), but outside of those contexts, it means
> multiplication.

I don't think that's quite right either. Asterisk is just an overloaded
token but you will know what it's for as soon as it's encountered.

Comma seems to be used only as a separator.
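
For reference, a few standard Python constructs show what both sides are pointing at. The variable names are arbitrary:

```python
a = 1, 2            # bare comma-separated expressions: a tuple
b = [1, 2]          # the same commas inside brackets: a list
c = {1, 2}          # inside braces: a set
d = {1: 2, 3: 4}    # with colons: a dict
e = max(1, 2)       # in a call: two separate arguments
f = (1)             # parentheses alone do not make a tuple
g = (1,)            # the trailing comma does

first, *rest = [1, 2, 3]   # asterisk used for unpacking
product = 2 * 3            # asterisk used for multiplication

print(type(a), type(b), type(c), type(d), e, type(f), type(g), rest, product)
```

Outside brackets, braces and calls the comma produces a tuple; inside them it separates the items of whatever the enclosing syntax builds.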


--
bartc


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Mikhail V
On Tue, May 22, 2018 at 1:25 PM, bartc  wrote:
> On 22/05/2018 03:49, Mikhail V wrote:
>>
>> On Mon, May 21, 2018 at 3:48 PM, bartc  wrote:
>>
>> # t
>> # t
>>11  22  33
>>
>
> Is this example complete? Presumably it means ((11,22,33),).

Yep.

>
>> You get the point?
>> So basically all nice chars are already occupied.
>
> You mean for introducing tuple, list and dict literals?

No, I've meant the node (or whole data block) entry point chars ///.

So in such a language full of various operators, it is
just hard to find anything that does not clash with some
current meaning yet looks adequate and is ASCII.

A quick note: thanks for your comments. I'll just note that I've
moved to a more promising related proposal (see the last posts in this
thread), so the original one will be left for later consideration, and
the nuances will be reconsidered anyway.


> Python already uses
> (, [ and { for those, with the advantage of having a closing ), ] and } to
> make it easier to see where each ends.

Ehm, for inline usage - i.e. anywhere on a line - closing tags are a necessity.
My whole idea was about indentation-based data, so there they are redundant noise.
You may disagree as well, since I've noticed in some previous thread
that you prefer Fortran-like termination tags. So that's quite personal, I think.

As for more objective points: currently many brackets are 'overloaded'
and moreover, they are all almost "homoglyphs" - very similar characters.

For an inline solution I'd prefer it e.g.
for a tuple:

{t  11  22  33}

nested:
{t  11  {t  11  22}  22}

Yes, it IS worse than:
{11  {11  22}  22}

But still, one might need _various types_, so a brackets-only approach has
limited potential in this sense.
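
To make the inline form concrete, here is a rough sketch of how such text could be read into nested tuples. It is only an illustration under assumptions: the function name parse_inline is made up, only the "t" type code and integer items are handled, and malformed input is not reported gracefully.

```python
import re


def parse_inline(text):
    """Read the hypothetical '{t 11 {t 11 22} 22}' notation into nested tuples."""
    # Tokens: an opening '{t', a closing '}', or an integer literal.
    tokens = re.findall(r"\{t|\}|-?\d+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "{t":
            items = []
            while tokens[pos] != "}":
                items.append(parse_node())
            pos += 1  # step over the closing brace
            return tuple(items)
        return int(token)

    return parse_node()


flat = parse_inline("{t  11  22  33}")
nested = parse_inline("{t  11  {t  11  22}  22}")
print(flat, nested)
```

Supporting further type codes would mean extending the token pattern and the branch on the opening token, which is where the type letter earns its keep compared with bare brackets.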


>>> The ///d dictionary example is ambiguous: can you have more than one
>>> key:value per line or not? If so, it would look like this:
>>>
>>>///d "a" "b" "c" "d" "e" "f"
>>
>>
>> ///d   "a" "b""c" "d""e" "f"
>>
>> Now better? :-)
>
>
> Not really.
> Suppose one got accidentally missed out, and there was some
> spurious name at the end,

Not sure about your case, but it is up to you to format it to make it
more readable - whitespace / newline separation gives enough possibilities to
do so. And for me the lack of commas is of great benefit.
_Some_ punctuation can be added in some cases (see e.g. the "node
stacking" section in the original document - dicts may adopt this as well,
but I see no necessity, at least for this case).


> It's not clear whether:
>
>   ///d a b
> c e
> f x
>
> is allowed ?

Allowed, but I'd say discouraged for industrial usage. Better to start
on a new line, at least for multiline contents.

> I think this is an interesting first draft of an idea, but it doesn't seem
> rigorous. And people don't like that triple stroke prefix, or those single
> letter codes (why not just use 'tuple', 'list', 'dict')?
>
> For example, here is a proposal I've just made up for a similar idea, but to
> make such constructors obey similar rules to Python blocks:
>
>  tuple:
>  10
>  20
>  30
>
>  list:
>  list:
>  10
>  tuple: 5,6,7
>  30
>  "forty"
>  "fifty"
>

Cool, I've actually commented on a similar option in an earlier post -
it might be OK with type codes grayed out and good coloring, but in black
and white it is a bit of a "wall of text" - not much emphasis on
_structure_. But it's OK - a good option.


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Mikhail V
On Wed, May 23, 2018 at 2:25 AM, Dan Strohl  wrote:
>

>>
>> Explanation:
>> [Here I'll use the same symbol /// for the data entry point, but of course
>> it can be changed if a better idea comes later. Also, for now, just for
>> simplicity, the rule is that the contents of a block always start on a
>> new line.]
>>
>> So, e.g. this:
>>
>> data = /// s4
>>     first line
>>     last line
>> the rest python code
>>
>> - will parse the block and knock out the leading 4 spaces,
>> i.e. if the first line has 5 leading spaces then 1 space will be left in
>> the string.
>> Block parsing terminates when the next line does not satisfy the indent
>> sequence (4 spaces in this case).
>
>
> Personally though, I would not hard code it to knock out 4 leading spaces.
> I would have it handle spaces the same way that the existing parser does,

If I understand you correctly, then I think I have already described
this option, i.e. these:

>> data = /// ts
>>
>> - "any whitespace" (mimic current Python behaviour)
>>
>> data = /// s# or
>> data = /// t
>>
>> - simply count amount of spaces (tabs) from first
>>   line and proceed, otherwise terminate.


Though hard-coded knock-out is also very useful, e.g. for this:

data = /// s4
        First line indented more (8 spaces)
    second - less (4 spaces)
rest code

So this will preserve formatting.
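
The hard-coded knock-out can be emulated in today's Python to see what string it would produce. This is only a sketch: parse_block is a made-up helper, not part of the proposal, and it treats blank lines as the end of the block for simplicity.

```python
def parse_block(lines, indent=4):
    """Collect lines starting with `indent` spaces, stripping exactly that many."""
    prefix = " " * indent
    block = []
    for line in lines:
        if not line.startswith(prefix):
            break  # the first line that fails the indent sequence ends the block
        block.append(line[indent:])
    return "\n".join(block)


source = [
    "        First line indented more (8 spaces)",
    "    second - less (4 spaces)",
    "rest code",
]
data = parse_block(source)
print(repr(data))
```

The first line keeps 4 of its 8 spaces, the second keeps none, and "rest code" is left for the normal parser.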


>> data = /// "???"
>> ??? abc foo bar
>> ???
>>
>> - defines indent character by string: crazy idea but why not.
>>
>
> Nope, don't like this one... It's far enough from normal Python that it seems
> unlikely to get through, and (personally at least), I struggle to see the
> benefit.

Heh, that was merely a joke - but OTOH one could use it for hard-coded
indent sequences:

data = /// "    "
        First line indented more (8 spaces)
    second - less (4 spaces)
rest code

A bit sloppy-looking, but it generalizes some uses. But granted - I don't
see many real applications besides space- and tab-indented blocks
anyway - so it's in question.


>> Language  parameter, e.g.:
>> data = /// t1."yaml"
>>
>> -this can be reserved for future usage by code analysis tools or dynamic
>> syntax highlighting.
>>
>
> I can see where this might be interesting, but again, I just don't see the 
> need,

I think you're right - if one needs a directive for some analysis tool, then
one can, for example, precede the whole statement
with a directive, say in a comment:

# lang "yaml"
data = /// t
first line
last line
rest




Also, I am thinking about this - there might be one useful "hack".
One could even allow single-line usage (with a colon), e.g.:

data = /// s2:  first line

- so this would start parsing just after the colon,
pretending it is a block.
This may not be so fat-finger-proof and is 'inconsistent',
but at the end of the day, it might actually be a win.



M


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 9:51 AM, bartc  wrote:
> On 22/05/2018 16:57, Chris Angelico wrote:
>>
>> On Wed, May 23, 2018 at 1:43 AM, Ian Kelly  wrote:
>
>
>>> In other words, the rule is not really as simple as "commas make
>>> tuples". I stand by what I wrote.
>>
>>
>> Neither of us is wrong here.
>
>
> Sorry, but I don't think you're right at all. unless the official references
> for the language specifically say that commas are primarily for constructing
> tuples, and all other uses are exceptions to that rule.

"A tuple consists of a number of values separated by commas"
https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences

"Separating items with commas"
https://docs.python.org/3/library/stdtypes.html#tuple

"Note that tuples are not formed by the parentheses, but rather by use
of the comma operator."
https://docs.python.org/3/reference/expressions.html#parenthesized-forms

Enough examples? Commas make tuples, unless context specifies otherwise.

ChrisA


Re: how to handle captcha through machanize module or any module

2018-05-22 Thread SACHIN CHAVAN
On Wednesday, December 18, 2013 at 6:26:17 PM UTC+5:30, Jai wrote:
> please do reply how to handle captcha through the mechanize module

I have the same issue and have not found a solution yet!


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Christian Gollwitzer

On 23.05.18 at 07:22, Chris Angelico wrote:
> On Wed, May 23, 2018 at 9:51 AM, bartc  wrote:
>> Sorry, but I don't think you're right at all. unless the official references
>> for the language specifically say that commas are primarily for constructing
>> tuples, and all other uses are exceptions to that rule.
>
> "A tuple consists of a number of values separated by commas"
> https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
>
> "Separating items with commas"
> https://docs.python.org/3/library/stdtypes.html#tuple
>
> "Note that tuples are not formed by the parentheses, but rather by use
> of the comma operator."
> https://docs.python.org/3/reference/expressions.html#parenthesized-forms
>
> Enough examples? Commas make tuples, unless context specifies otherwise.


I'd think that the definitive answer is in the grammar, because that is 
what is used to build the Python parser:


https://docs.python.org/3/reference/grammar.html

Actually, I'm a bit surprised that tuple, list etc. does not appear 
there as a non-terminal. It is a bit hard to find, and it seems that 
"atom:" is the starting point for parsing tuples, lists etc.


Christian


Re: Tkinter and root vs. Wayland

2018-05-22 Thread Terry Reedy

On 5/22/2018 5:52 PM, Grant Edwards wrote:
> For a couple decades now, I've been distributing a couple smallish
> Tkinter applications that need to run as root for a variety of reasons
> (raw Ethernet access, starting/stopping daemons, loading and unloading
> kernel modules, reading and writing config files that are owned by
> root).
>
> As part of RedHat's switch to Wayland, they've decided that GUI X11
> apps running as root will no longer be allowed to connect to the
> Wayland desktop server/compositor/whatever-it's-called.  When it was
> pointed out to RedHat that this will break lots of applications, the
> official word from on high is that all GUI apps requiring root
> privileges need to be redesigned so that their GUI is running as a
> normal user.
>
> How does one do that in a Tkinter app?  Do I need to start as root and
> fork a process that drops privileges and starts Tkinter and then the
> two processes communicate via sockets or Posix queues or whatnot?


IDLE starts as a GUI process.  It uses subprocess to start a user-code 
execution process.  They communicate via a socket.  This usually works, 
but not always, so I may someday look into using multiprocessing.
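
A very rough sketch of that two-process layout, not IDLE's actual code: the names GUI_CODE and serve_one_request and the "restart-daemon" message are invented for illustration, and the child here is a stand-in that only sends one request instead of opening a Tkinter window.

```python
import socket
import subprocess
import sys

# Stand-in for the unprivileged GUI process (hypothetical). A real one would
# build its Tkinter widgets here and send requests from button callbacks.
GUI_CODE = r'''
import socket, sys
with socket.create_connection(("127.0.0.1", int(sys.argv[1]))) as conn:
    conn.sendall(b"restart-daemon\n")
    reply = conn.makefile("rb").readline()
print(reply.decode().strip())
'''


def serve_one_request():
    """Privileged side: start the GUI process and answer a single request."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        server.settimeout(10)
        port = server.getsockname()[1]
        # On POSIX with Python 3.9+, Popen's user=/group= arguments could be
        # passed here to start the GUI under an unprivileged account.
        gui = subprocess.Popen(
            [sys.executable, "-c", GUI_CODE, str(port)],
            stdout=subprocess.PIPE,
        )
        conn, _ = server.accept()
        with conn:
            conn.settimeout(10)
            request = conn.makefile("rb").readline().strip()
            # A real privileged side would validate the request before acting.
            conn.sendall(b"done: " + request + b"\n")
        gui_output, _ = gui.communicate(timeout=10)
    return request.decode(), gui_output.decode().strip()


request, gui_output = serve_one_request()
print(request, "->", gui_output)
```

A loopback TCP socket is reachable by any local user, so a real application would need some form of authentication or a Unix-domain socket with restricted permissions.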


--
Terry Jan Reedy



Re: "Data blocks" syntax specification draft

2018-05-22 Thread Ian Kelly
On Tue, May 22, 2018 at 11:32 PM, Christian Gollwitzer  wrote:
> On 23.05.18 at 07:22, Chris Angelico wrote:
>>
>> On Wed, May 23, 2018 at 9:51 AM, bartc  wrote:
>>>
>>> Sorry, but I don't think you're right at all. unless the official
>>> references
>>> for the language specifically say that commas are primarily for
>>> constructing
>>> tuples, and all other uses are exceptions to that rule.
>>
>>
>> "A tuple consists of a number of values separated by commas"
>>
>> https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
>>
>> "Separating items with commas"
>> https://docs.python.org/3/library/stdtypes.html#tuple
>>
>> "Note that tuples are not formed by the parentheses, but rather by use
>> of the comma operator."
>> https://docs.python.org/3/reference/expressions.html#parenthesized-forms
>>
>> Enough examples? Commas make tuples, unless context specifies otherwise.
>
>
> I'd think that the definitive answer is in the grammar, because that is what
> is used to build the Python parser:
>
> https://docs.python.org/3/reference/grammar.html
>
> Actually, I'm a bit surprised that tuple, list etc. does not appear there as
> a non-terminal. It is a bit hard to find, and it seems that "atom:" is the
> starting point for parsing tuples, lists etc.

For enclosed tuples, yes. I believe that tuples without parentheses
can be produced by either 'exprlist' or 'testlist' (which is why some
cases permit iterable unpacking and some don't).


Re: Spam levels.

2018-05-22 Thread dieter
"Peter J. Holzer"  writes:
> ...
> I didn't read on Gmane. I read on my usenet server. But the broken
> messages were all coming from Gmane.

I am reading with an NNTP client connected to the Gmane NNTP server
and threading works - with very rare exceptions.
The exceptions are so rare that they might have been caused by the
posters (sometimes someone opens an issue by answering to an existing
discussion; or starts a new thread for answering to a post in another thread).

Maybe something went wrong with the integration of your NNTP server
with the Gmane one?



Re: "Data blocks" syntax specification draft

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 3:32 PM, Christian Gollwitzer  wrote:
> On 23.05.18 at 07:22, Chris Angelico wrote:
>>
>> On Wed, May 23, 2018 at 9:51 AM, bartc  wrote:
>>>
>>> Sorry, but I don't think you're right at all. unless the official
>>> references
>>> for the language specifically say that commas are primarily for
>>> constructing
>>> tuples, and all other uses are exceptions to that rule.
>>
>>
>> "A tuple consists of a number of values separated by commas"
>>
>> https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
>>
>> "Separating items with commas"
>> https://docs.python.org/3/library/stdtypes.html#tuple
>>
>> "Note that tuples are not formed by the parentheses, but rather by use
>> of the comma operator."
>> https://docs.python.org/3/reference/expressions.html#parenthesized-forms
>>
>> Enough examples? Commas make tuples, unless context specifies otherwise.
>
>
> I'd think that the definitive answer is in the grammar, because that is what
> is used to build the Python parser:
>
> https://docs.python.org/3/reference/grammar.html
>
> Actually, I'm a bit surprised that tuple, list etc. does not appear there as
> a non-terminal. It is a bit hard to find, and it seems that "atom:" is the
> starting point for parsing tuples, lists etc.
>

The grammar's a bit hard to read for this sort of thing, as the only
hint of semantic meaning is in the labels at the beginning. For
example, there's a "dictorsetmaker" entry that grammatically could be
a dict comp or a set comp; distinguishing them is the job of other
parts of the code.
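
One way to watch that later stage at work is to ask the standard ast module what it makes of each brace form. This is just an illustrative sketch; display_kind is a made-up helper name.

```python
import ast


def display_kind(source):
    """Return the AST node type the compiler front end picks for an expression."""
    return type(ast.parse(source, mode="eval").body).__name__


samples = ["{}", "{1: 2}", "{1, 2}", "{k: k for k in 'ab'}", "{k for k in 'ab'}"]
kinds = {src: display_kind(src) for src in samples}
for src, kind in kinds.items():
    print(f"{src:24} {kind}")
```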

ChrisA


Re: "Data blocks" syntax specification draft

2018-05-22 Thread Marko Rauhamaa
Christian Gollwitzer :

> I'd think that the definitive answer is in the grammar, because that is
> what is used to build the Python parser:
>
>   https://docs.python.org/3/reference/grammar.html
>
> Actually, I'm a bit surprised that tuple, list etc. does not appear
> there as a non-terminal. It is a bit hard to find, and it seems that
> "atom:" is the starting point for parsing tuples, lists etc.

testlist and testlist_comp are the interesting entities.

The syntax definition does not help you understand the semantics. For
example, omitting yield_expr and testlist_comp in

   atom: ('(' [yield_expr|testlist_comp] ')' |

evaluates to a tuple and nothing in

   testlist: test (',' test)* [',']

suggests what effect the presence or absence of the final ',' could
have on the evaluation.
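
The two cases, spelled out as a quick interpreter check (variable names are arbitrary):

```python
empty = ()        # nothing between the brackets: the empty tuple
grouped = (1)     # brackets only group: this is the int 1
single = (1,)     # trailing comma: a one-element tuple
bare = 1,         # same tuple without any brackets
pair = 1, 2       # for two or more items the final comma changes nothing
pair_trailing = 1, 2,

print(type(empty), type(grouped), type(single), bare == single, pair == pair_trailing)
```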


Marko


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

2018-05-22 Thread Steven D'Aprano
On Wed, 23 May 2018 00:31:03 +0200, Peter J. Holzer wrote:

> On 2018-05-23 07:38:27 +1000, Chris Angelico wrote:
[...]
>> You can find an encoding which is capable of decoding a file. That's
>> not the same thing.
> 
> If the result is correct, it is the same thing.

But how do you know what is correct and what isn't? In the most general 
case, even if you know the language nominally being used, you might not 
be able to recognise good output from bad:

Max Steele strained his mighty thews against his bonds, but
the §-rays had left him as weak as a kitten. The evil Galactic
Emperor, Giµx-Õƒin The Terrible of the planet Œe∂¥, laughed: "I 
have you now, Steele, and by this time tomorrow my armies will
have overrun your pitiful Earth defences!"

If this text is encoded using MacRoman, then decoded as Latin-1, it 
works, and looks barely any more stupid than the original:

Max Steele strained his mighty thews against his bonds, but
the ¤-rays had left him as weak as a kitten. The evil Galactic
Emperor, Giµx-ÍÄin The Terrible of the planet Îe¶´, laughed: "I
have you now, Steele, and by this time tomorrow my armies will
have overrun your pitiful Earth defences!"

but it clearly isn't the original text.
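
The round trip is easy to reproduce with Python's own codecs ("mac_roman" and "latin-1"); the variable names below are only for illustration.

```python
original = "Giµx-Õƒin The Terrible of the planet Œe∂¥"

raw = original.encode("mac_roman")     # the bytes as written on the old Mac
garbled = raw.decode("latin-1")        # decodes without any error...
restored = garbled.encode("latin-1").decode("mac_roman")

print(garbled)                         # ...but is not the original text
print(restored == original)
```

Latin-1 assigns a character to every byte value, so the decode step can never fail, which is exactly why success tells you nothing about correctness.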

Mojibake is especially difficult to deal with when you are dealing with 
short text snippets like file names or user names which can contain 
arbitrary characters, where there is rarely any way to recognise the 
"correct" string. If you think Giµx-Õƒin The Terrible is a ludicrous 
example of text, you ought to look at user names on web forums.



-- 
Steve



Re: "Data blocks" syntax specification draft

2018-05-22 Thread Ian Kelly
On Wed, May 23, 2018 at 12:01 AM, Ian Kelly  wrote:
> On Tue, May 22, 2018 at 11:32 PM, Christian Gollwitzer  wrote:
>> On 23.05.18 at 07:22, Chris Angelico wrote:
>>>
>>> On Wed, May 23, 2018 at 9:51 AM, bartc  wrote:
>>>>
>>>> Sorry, but I don't think you're right at all. unless the official
>>>> references
>>>> for the language specifically say that commas are primarily for
>>>> constructing
>>>> tuples, and all other uses are exceptions to that rule.
>>>
>>>
>>> "A tuple consists of a number of values separated by commas"
>>>
>>> https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
>>>
>>> "Separating items with commas"
>>> https://docs.python.org/3/library/stdtypes.html#tuple
>>>
>>> "Note that tuples are not formed by the parentheses, but rather by use
>>> of the comma operator."
>>> https://docs.python.org/3/reference/expressions.html#parenthesized-forms
>>>
>>> Enough examples? Commas make tuples, unless context specifies otherwise.
>>
>>
>> I'd think that the definitive answer is in the grammar, because that is what
>> is used to build the Python parser:
>>
>> https://docs.python.org/3/reference/grammar.html
>>
>> Actually, I'm a bit surprised that tuple, list etc. does not appear there as
>> a non-terminal. It is a bit hard to find, and it seems that "atom:" is the
>> starting point for parsing tuples, lists etc.
>
> For enclosed tuples, yes. I believe that tuples without parentheses
> can be produced by either 'exprlist' or 'testlist' (which is why some
> cases permit iterable unpacking and some don't).

Er, that should be "either 'testlist_star_expr' or 'testlist'".


Re: Getting Unicode decode error using lxml.iterparse

2018-05-22 Thread dieter
digi...@gmail.com writes:

> I'm trying to read my iTunes library in Python using iterparse. My current 
> stub is:
> ...
> My input file (reduced to home in on the error) is:
>
>  snip -
>
> 
> 
> 
>   
>   15078
>   
>   NamePart 2. The Death Of Enkidu. 
> Skon Přitele Mého Mne Zdeptal Težče
>   
>   
> 
> 
> ...
> I'm getting an error on one part of the XML:
>
>
>  File "C:\Users\digit\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
>
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 202: 
> character maps to <undefined>
>
>
> I suspect the issue is that it's using cp1252.py, which I don't think is 
> UTF-8 as specified in the XML prolog. Is this an iterparse problem, or am I 
> using it wrongly?

You can tell "lxml" which encoding it should use. Maybe, you did
and it was the wrong one.

If the encoding is not specified, "lxml" will try to determine it
and finally defaults to "utf-8" (which seems to be the correct encoding
for your case).

"lxml" sits on top of the C library "libxml2". It may be possible
that "libxml2" allows an envvar to specify the default encoding
and - maybe - this envvar has an unfortunate value in your case.

As a workaround, you can tell "lxml" explicitly to use "utf-8"
for your parsing.
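
A sketch of the underlying effect, using the standard library's xml.etree.ElementTree.iterparse as a stand-in rather than lxml itself, and a made-up two-element document in place of the real iTunes file. The assumption here is that the traceback comes from the bytes being decoded as cp1252 before the XML parser sees them, for example because the file was opened in text mode on Windows.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature of the library file, stored as UTF-8 bytes.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<dict><key>Name</key>"
    "<string>Skon Přitele Mého Mne Zdeptal Težče</string></dict>"
).encode("utf-8")

# Decoding the bytes as cp1252 fails: byte 0x8d has no mapping there.
try:
    xml_bytes.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Handing the parser the raw bytes lets it honour the declared UTF-8 encoding.
titles = [
    elem.text
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes))
    if elem.tag == "string"
]
print(cp1252_failed, titles)
```

With a real file the equivalent step would be opening it in binary mode, open(path, "rb"), or passing the file name to the parser, so that no locale-dependent text decoding happens first.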



Re: "Data blocks" syntax specification draft

2018-05-22 Thread Steven D'Aprano
On Tue, 22 May 2018 18:51:30 +0100, bartc wrote:

> On 22/05/2018 15:25, Chris Angelico wrote:
[...]
>> The tuple has nothing to do with the parentheses, except for the
>> special case of the empty tuple. It's the comma.
> 
> No? Take these:
> 
>   a = (10,20,30)
>   a = [10,20,30]
>   a = {10,20,30}
> 
> If you print type(a) after each, only one of them is a tuple - the one
> with the round brackets.

You haven't done enough testing. All you have done is found that "round 
brackets give a tuple, other brackets don't". But you need to test what 
happens if you take away the brackets to be sure that it is the round 
brackets which create the tuple:

a = 10, 20, 30  # take away the ()

You still get a tuple. Taking away the [] and {} also gives tuples.

What happens if you add extra brackets?

a = ((10, 20, 30))  # tuple
b = ([10, 20, 30])  # list
c = ({10, 20, 30})  # set


The round brackets are just used for grouping. In fact, the bytecode 
generated is identical:

py> import dis
py> dis.dis("(99, x)")
  1           0 LOAD_CONST               0 (99)
              3 LOAD_NAME                0 (x)
              6 BUILD_TUPLE              2
              9 RETURN_VALUE
py> dis.dis("99, x")
  1           0 LOAD_CONST               0 (99)
              3 LOAD_NAME                0 (x)
              6 BUILD_TUPLE              2
              9 RETURN_VALUE

What if we take away the commas but leave the brackets? If "brackets make 
tuples", then the commas ought to be optional.

a = (10 20 30)  # SyntaxError


What if we simply add a single comma to the end, outside of the brackets?

a = (10, 20, 30),  # tuple inside tuple
b = [10, 20, 30],  # list inside tuple
c = {10, 20, 30},  # set inside tuple


Conclusion: it is the comma, not the round brackets, which defines tuples 
(aside from the empty tuple case).



> The 10,20,30 in those other contexts doesn't create a tuple, nor does it
> here:
> 
>f(10,20,30)

That's okay. There's no rule that says commas are ONLY used for tuples, 
just as there is no rule that says "if" is ONLY used for if 
statements. (It is also used in the ternary if operator, elif, and any 
valid identifier which happens to include the letters "if" in that order).
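
A quick check of the function-call context, using a throwaway helper (count_args is a made-up name):

```python
def count_args(*args):
    """Report how many positional arguments the call actually passed."""
    return len(args)


separate = count_args(10, 20, 30)      # commas separate three arguments
as_tuple = count_args((10, 20, 30))    # extra brackets: one tuple argument
print(separate, as_tuple)
```

Inside the call's own brackets the commas belong to the argument list; it takes an extra pair of round brackets to get back to an expression context where the commas build a tuple.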


> It's just that special case I
> highlighted where an unbracketed sequence of expressions yields a tuple.

You have that backwards. An unbracketed sequence of expressions yielding 
a tuple is not the special case, it is the base case. If you want 
something which is not a tuple (a list, a set) you have to use square or 
curly brackets.

The round brackets are only necessary for tuples to group items in case 
of ambiguity with other grammar rules. (And the empty tuple.)



> The comma is just generally used to separate expressions, it's not
> specific to tuples.

Nobody said it was specific to tuples. That would be an absurd thing to 
say. What was said is that the comma is what makes tuples, not the 
brackets.


-- 
Steve



Re: "Data blocks" syntax specification draft

2018-05-22 Thread Steven D'Aprano
On Tue, 22 May 2018 09:43:55 -0600, Ian Kelly wrote:

> In other words, the rule is not really as simple as "commas make
> tuples". I stand by what I wrote.

Being pedantic is great, but if you're going to be pedantic, it pays to 
be *absolutely correctly* pedantic *wink*

Chris is right to say "commas make tuples", and never implied that making 
tuples is *all* that commas do. That would be an absurd thing for him to 
say. Fortunately he didn't :-)

If your comment had been posed as an addition to Chris' comment ("By the 
by, commas also do this that and the other...") then it would have been 
unobjectionable. But by posing it as a correction ("Although, if the rule 
were really as simple ...") you left yourself wide-open to be criticised 
in turn for failing to be pedantic *enough*.

Of course commas can be used elsewhere, just as the use of "if" in if 
statements doesn't prevent us from using that same token in the ternary 
if operator.

In the context of the discussion (namely, the mistaken belief that tuples 
are created by parentheses, which we can often leave out) pointing out 
that commas are *also* used as separators in import statements, lists, 
dicts, function parameter lists etc adds noise but no insight to the 
understanding of tuples.


-- 
Steve
