Re: Problem with OrderedDict - progress report

2018-05-31 Thread Frank Millman

"Frank Millman"  wrote in message news:pemchs$r12$1...@blaine.gmane.org...


> So working backwards, I have solved the first problem. I am no nearer to
> figuring out why it fails intermittently in my live program. The message
> from INADA Naoki suggests that it could be inherent in CPython, but I am not
> ready to accept that as an answer yet. I will keep plugging away and report
> back with any findings.




Ok, I have not found the root cause yet, but I have moved the problem to a 
different place, which is progress.


From the interpreter session below, you will see that adding a key while 
processing the *last* key in an OrderedDict does not give rise to an 
exception. Adding a key while processing any prior key in an OrderedDict 
does raise the exception. I have checked this fairly thoroughly and it 
behaves the same way every time.



>>> from collections import OrderedDict as OD
>>> d = OD()
>>> d[1] = 'one'
>>> d[2] = 'two'
>>> for k in d:
...     if k == 2:
...         d[3] = 'three'
...
>>> d = OD()
>>> d[1] = 'one'
>>> d[2] = 'two'
>>> for k in d:
...     if k == 1:
...         d[3] = 'three'
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: OrderedDict mutated during iteration




The intermittent nature of my problem stems from the above - sometimes I add 
a key while processing the last key, sometimes a prior one. I don't know why 
this is happening, so I am still investigating, but it has moved into the 
realm of normal debugging, not chasing shadows.
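
One common way to sidestep the error altogether, offered here as a minimal sketch and not as a diagnosis of the live program, is to iterate over a snapshot of the keys. New keys can then be added at any point in the loop, because the loop never consults the OrderedDict's own iterator. The keys and values are the ones from the session above.

```python
from collections import OrderedDict

d = OrderedDict([(1, "one"), (2, "two")])

# list(d) copies the keys up front, so the loop walks the copy
# and the OrderedDict itself is free to grow.
for k in list(d):
    if k == 1:
        d[3] = "three"

print(list(d))
```

The key added inside the loop is not visited by that loop, which may or may not be what the program wants.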


Frank


--
https://mail.python.org/mailman/listinfo/python-list


Re: Pink Floyd: is there anybody in here?

2018-05-31 Thread Peter J. Holzer
On 2018-05-31 00:06:37 +, Steven D'Aprano wrote:
> On Wed, 30 May 2018 21:53:05 +0100, Ben Bacarisse wrote:
> > Rob Gaddi  writes:
> >> On 05/30/2018 09:34 AM, Paul Rubin wrote:
> >>> I think Usenet posts are no longer getting forwarded to the mailing
> >>> list, but now I wonder if this is getting out at all, even to usenet.
> >>>
> >>> Does anyone see it?
> >>
> >> Can't speak for the mailing list, but this came out to Usenet just
> >> fine.
> > 
> > Snap.  The original and the reply.
> 
> You shouldn't generalise about Usenet, as individual news servers can and 
> do drop messages.

True.

> In this case, gmane seems to be dropping Paul Rubin's posts.

As has been recently discussed, gmane isn't directly connected to
Usenet. It gets all its messages from the mailing list. Since Paul's
message didn't make it to the mailing list, it didn't make it to gmane
either.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




EuroPython 2018: First list of accepted sessions available

2018-05-31 Thread Alexander C. S. Hendorf
We have received an amazing collection of 376 proposals.
Thank you all for your contributions!

Given the overwhelming quality of the proposals,
we had some very difficult decisions to make.
Nonetheless we are happy to announce
we have published the first 120+ sessions.
https://ep2018.europython.eu/en/events/sessions/

Here’s what we have on offer so far:
* 12 Trainings (complete)
* 98 Talks (some more will follow)
* 6 help desks (complete)
* 10 posters (complete)

More sessions to come
=====================

We have informed all speakers with accepted submissions by email.
We are further selecting a second wave of talks, that will be announced soon.

Please see the session list for details and abstracts.
In case you wonder what poster, interactive and help desk sessions are,
please check the call for proposals.
https://ep2018.europython.eu/en/call-for-proposals/

Enjoy,
–
EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

Alexander Hendorf

as EuroPython vice chair & chair of the program work group

Twitter: @hendorf
LinkedIn: https://www.linkedin.com/in/hendorf

EuroPython:
https://www.europython.eu/
https://twitter.com/europython
https://www.facebook.com/europython

EuroPython Society:
http://www.europython-society.org/


Why exception from os.path.exists()?

2018-05-31 Thread Marko Rauhamaa


This surprising exception can even be a security issue:

   >>> os.path.exists("\0")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib64/python3.6/genericpath.py", line 19, in exists
       os.stat(path)
   ValueError: embedded null byte

Most other analogous reasons *don't* generate an exception, nor is that
possibility mentioned in the specification:

   https://docs.python.org/3/library/os.path.html?#os.path.exists

Is the behavior a bug? Shouldn't it be:

   >>> os.path.exists("\0")
   False


Marko


Re: Why exception from os.path.exists()?

2018-05-31 Thread Chris Angelico
On Thu, May 31, 2018 at 10:03 PM, Marko Rauhamaa  wrote:
>
> This surprising exception can even be a security issue:
>
>    >>> os.path.exists("\0")
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>      File "/usr/lib64/python3.6/genericpath.py", line 19, in exists
>        os.stat(path)
>    ValueError: embedded null byte
>
> Most other analogous reasons *don't* generate an exception, nor is that
> possibility mentioned in the specification:
>
>https://docs.python.org/3/library/os.path.html?#os.path.exists
>
> Is the behavior a bug? Shouldn't it be:
>
>    >>> os.path.exists("\0")
>    False

A Unix path name cannot contain a null byte, so what you have is a
fundamentally invalid name. ValueError is perfectly acceptable.

ChrisA


Re: Problem with OrderedDict - progress report

2018-05-31 Thread Steven D'Aprano
On Thu, 31 May 2018 10:05:43 +0200, Frank Millman wrote:

> "Frank Millman"  wrote in message news:pemchs$r12$1...@blaine.gmane.org...
>>
>> So working backwards, I have solved the first problem. I am no nearer
>> to figuring out why it fails intermittently in my live program. The
>> message from INADA Naoki suggests that it could be inherent in CPython,
>> but I am not ready to accept that as an answer yet. I will keep plugging
>> away and report back with any findings.
>
> Ok, I have not found the root cause yet, but I have moved the problem to
> a different place, which is progress.
> 
> From the interpreter session below, you will see that adding a key while
> processing the *last* key in an OrderedDict does not give rise to an
> exception.

If you mutate the dict, and then stop iterating over it, there is no 
check that the dict was mutated.

It isn't an error to mutate the dict. It is an error to mutate it while 
it is being iterated over. If you stop the iteration, there's no problem.

py> d = dict(zip(range(5), "abcde"))
py> for x in d:
... d[999] = 'mutation!'
... break
...
py> d  # no error occurred
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 999: 'mutation!'}


To be more precise, the checks against mutation occur when next() is 
called. If you don't call next(), the checks don't run.

py> it = iter(d)
py> next(it)
0
py> del d[4]
py> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-05-31 Thread Marko Rauhamaa
Chris Angelico :

> On Thu, May 31, 2018 at 10:03 PM, Marko Rauhamaa  wrote:
>>
>> This surprising exception can even be a security issue:
>>
>>    >>> os.path.exists("\0")
>>    Traceback (most recent call last):
>>      File "<stdin>", line 1, in <module>
>>      File "/usr/lib64/python3.6/genericpath.py", line 19, in exists
>>        os.stat(path)
>>    ValueError: embedded null byte
>
> [...]
>
> A Unix path name cannot contain a null byte, so what you have is a
> fundamentally invalid name. ValueError is perfectly acceptable.

At the very least, that should be emphasized in the documentation. The
pathname may come from an external source. It is routine to check for
"/", "." and ".." but most developers (!?) would not think of checking
for "\0". That means few test suites would catch this issue and few
developers would think of catching ValueError here. The end result is
unpredictable.
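
For code that receives path names from an external source, one defensive option is a small wrapper that folds the ValueError into False. This is only a sketch: the name safe_exists is hypothetical, and whether False is the right answer for an invalid name is exactly what this thread is debating. Note also that from Python 3.8 onward os.path.exists() itself returns False for such paths, so the except branch only matters on older versions.

```python
import os


def safe_exists(path):
    """Hypothetical helper: like os.path.exists(), but never raises
    ValueError for names the OS cannot represent."""
    try:
        return os.path.exists(path)
    except ValueError:
        # Python 3.7 and earlier raise here for an embedded NUL.
        return False


print(safe_exists("spam\0ham"))
print(safe_exists(os.curdir))
```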


Marko


Re: Why exception from os.path.exists()?

2018-05-31 Thread Chris Angelico
On Thu, May 31, 2018 at 11:03 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> On Thu, May 31, 2018 at 10:03 PM, Marko Rauhamaa  wrote:
>>>
>>> This surprising exception can even be a security issue:
>>>
>>>    >>> os.path.exists("\0")
>>>    Traceback (most recent call last):
>>>      File "<stdin>", line 1, in <module>
>>>      File "/usr/lib64/python3.6/genericpath.py", line 19, in exists
>>>        os.stat(path)
>>>    ValueError: embedded null byte
>>
>> [...]
>>
>> A Unix path name cannot contain a null byte, so what you have is a
>> fundamentally invalid name. ValueError is perfectly acceptable.
>
> At the very least, that should be emphasized in the documentation. The
> pathname may come from an external source. It is routine to check for
> "/", "." and ".." but most developers (!?) would not think of checking
> for "\0". That means few test suites would catch this issue and few
> developers would think of catching ValueError here. The end result is
> unpredictable.

The rules for paths come from the underlying system. You'll get quite
different results on Windows than you do on Unix. What should be
documented? Should it also be documented that you can get strange
errors when your path involves three different operating systems and
five different file systems? Is that Python's responsibility, or
should it be generally accepted that invalid values can cause
ValueError?

Do you have an actual use-case where it is correct for an invalid path
to be treated as not existing?

ChrisA


Re: Why exception from os.path.exists()?

2018-05-31 Thread Marko Rauhamaa
Chris Angelico :
> Do you have an actual use-case where it is correct for an invalid path
> to be treated as not existing?

Note that os.path.exists() returns False for other types of errors
including:

 * File might exist but you have no access rights

 * The pathname is too long for the file system

 * The pathname is a broken symbolic link

 * The pathname is a circular symbolic link

 * The hard disk ball bearings are chipped

I'm not aware of any other kind of a string argument that would trigger
an exception except the presence of a NUL byte.

The reason for the different treatment is that the former errors are
caught by the kernel and converted to False by os.path.exists(). The NUL
byte check is carried out by Python's standard library.


Marko


Re: Why exception from os.path.exists()?

2018-05-31 Thread Chris Angelico
On Thu, May 31, 2018 at 11:38 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>> Do you have an actual use-case where it is correct for an invalid path
>> to be treated as not existing?
>
> Note that os.path.exists() returns False for other types of errors
> including:
>
>  * File might exist but you have no access rights
>
>  * The pathname is too long for the file system
>
>  * The pathname is a broken symbolic link
>
>  * The pathname is a circular symbolic link
>
>  * The hard disk ball bearings are chipped

All of those are conceptually valid filenames, and it's perfectly
reasonable to ask if the file exists. Running the same program inside
a chroot might result in a True.

> I'm not aware of any other kind of a string argument that would trigger
> an exception except the presence of a NUL byte.

With a zero byte in the file name, it is not a valid file name under
any Unix-based OS. Regardless of the file system, "\0" is not valid.

> The reason for the different treatment is that the former errors are
> caught by the kernel and converted to False by os.path.exists(). The NUL
> byte check is carried out by Python's standard library.

That's because the kernel, having declared that zero bytes are
invalid, uses ASCIIZ filenames. It's way simpler that way. So the
Python string cannot validly be turned into input for the kernel. It's
on par with trying to represent 2**53+1.0 - it's not representable and
will behave differently. With floats, you get something close to the
requested value; with strings, they'd be truncated. But either way,
you absolutely cannot represent the file name "spam\0ham" to any Unix
kernel, because the file name is fundamentally invalid.
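
Both halves of that comparison can be tried from Python. The sketch below is illustrative only: it uses a ctypes buffer as a stand-in for what any NUL-terminated (ASCIIZ) consumer would see, and does not call into the kernel at all.

```python
import ctypes

# Floats: 2**53 + 1 has no exact double representation,
# so it rounds to the same value as 2**53.
big = 2**53
print(float(big + 1) == float(big))

# Strings: a C-style reader stops at the first NUL byte.
buf = ctypes.create_string_buffer(b"spam\0ham")
print(buf.raw)    # every byte in the buffer, including the NULs
print(buf.value)  # the bytes up to the first NUL, as C code would read them
```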

Can someone on Windows see if there are other path names that raise
ValueError there? Windows has a whole lot more invalid characters, and
invalid names as well.

ChrisA


Re: Why exception from os.path.exists()?

2018-05-31 Thread Gregory Ewing

Chris Angelico wrote:

> A Unix path name cannot contain a null byte, so what you have is a
> fundamentally invalid name. ValueError is perfectly acceptable.

It would also make sense for it to simply return False, since
a file with such a name can't exist.

This is analogous to the way comparing objects of different types
for equality returns False instead of raising an exception.
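
A short illustration of that analogy, using an int and a str as arbitrary examples: equality across unrelated types quietly answers False, while ordering the same two objects raises.

```python
# Equality between unrelated types is simply False.
print(1 == "1")

# Ordering the same pair is an error in Python 3.
try:
    1 < "1"
    ordering_raised = False
except TypeError as exc:
    ordering_raised = True
    print("TypeError:", exc)
```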

--
Greg


Sorting and spaces.

2018-05-31 Thread Tobiah

I had a case today where I needed to sort two strings:

['Awards', 'Award Winners']

I consulted a few sources to get a suggestion as to
what would be correct.  My first idea was to throw them
through a Linux command line sort:

Awards
Award Winners

Then I did some Googling, and found that most US systems seem
to prefer that one ignore spaces when alphabetizing.  The sort
program seemed to agree.

I put the items into the database that way, but I had forgotten
that my applications used python to sort them anyway.  The result
was different:

>>> a = ['Awards', 'Award Winners']
>>> sorted(a)
['Award Winners', 'Awards']

So python evaluated the space as a lower ASCII value.

Thoughts?  Are there separate tools for alphabetizing
rather than sorting?
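
If the goal is simply the "ignore spaces" ordering described above, a sort key can do it without any special tool. This is a minimal sketch; alphabetize_key is a hypothetical name, and real alphabetizing rules (punctuation, accents, leading articles) would need more than this.

```python
def alphabetize_key(s):
    """Hypothetical key: compare case-insensitively, ignoring spaces."""
    return s.replace(" ", "").casefold()


a = ['Awards', 'Award Winners']
print(sorted(a))                       # plain code-point order
print(sorted(a, key=alphabetize_key))  # spaces ignored
```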


Thanks,


Tobiah


Re: Problem with OrderedDict - progress report

2018-05-31 Thread Frank Millman

"Steven D'Aprano"  wrote in message news:peorib$1f4$2...@blaine.gmane.org...


> On Thu, 31 May 2018 10:05:43 +0200, Frank Millman wrote:
>
> > From the interpreter session below, you will see that adding a key while
> > processing the *last* key in an OrderedDict does not give rise to an
> > exception.
>
> If you mutate the dict, and then stop iterating over it, there is no
> check that the dict was mutated.
>
> It isn't an error to mutate the dict. It is an error to mutate it while
> it is being iterated over. If you stop the iteration, there's no problem.



Agreed, but my gut feel, and the following example, suggest that when 
processing the last key in a dictionary while iterating over it, you have 
not yet stopped iterating.



>>> d = {}
>>> d[1] = 'one'
>>> d[2] = 'two'
>>> for k in d:
...     if k == 2:
...         d[3] = 'three'
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration




OrderedDict seems to behave differently in this regard -


>>> from collections import OrderedDict as OD
>>> d = OD()
>>> d[1] = 'one'
>>> d[2] = 'two'
>>> for k in d:
...     if k == 2:
...         d[3] = 'three'
...
>>> d
OrderedDict([(1, 'one'), (2, 'two'), (3, 'three')])




Frank





RE: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Dan Strohl via Python-list
> This is of course not a problem if the *trailing* quote determines the
> indentation:
> 
> a_multi_line_string = i'''
>Py-
>   thon
> '''

I get the point, but it feels like it would be a pain to use, and it "feels"
different from Python's other indentation rules, which is something that I
would want to stay away from changing.

> > In any case, Chris made a good point that I agree with. This doesn't
> > really need to be syntax at all, but could just be implemented as a
> > new string method.
> 
> Depending on the details, not quite. A method wouldn't get the horizontal
> position of the leading quote. It could infer the position of the trailing
> quote, though.
> 

What about if we used Chris's approach, but added a parameter to the method to 
handle the indent? 

For example, 

Test = """
    Hello, this is a
     Multiline indented
    String
    """.outdent(4)


The outdent method could look like:

string.outdent(size=None)
"""
:param size: The number of spaces to remove from the beginning of each
line in the string.  Non-space characters will not be removed.  If this is
None, the number of leading spaces in the first line of the string will be
used.  If this is an iterable, the numbers returned from each iteration will
be used for their respective lines.  If there are more lines than iterations,
the last iteration will be used for subsequent lines.
"""

This solves the problem in a very pythonic way, while allowing the flexibility 
to handle different needs.



Re: Why exception from os.path.exists()?

2018-05-31 Thread MRAB

On 2018-05-31 14:38, Marko Rauhamaa wrote:
> Chris Angelico :
>> Do you have an actual use-case where it is correct for an invalid path
>> to be treated as not existing?
>
> Note that os.path.exists() returns False for other types of errors
> including:
>
>   * File might exist but you have no access rights
>
>   * The pathname is too long for the file system
>
>   * The pathname is a broken symbolic link
>
>   * The pathname is a circular symbolic link
>
>   * The hard disk ball bearings are chipped
>
> I'm not aware of any other kind of a string argument that would trigger
> an exception except the presence of a NUL byte.
>
> The reason for the different treatment is that the former errors are
> caught by the kernel and converted to False by os.path.exists(). The NUL
> byte check is carried out by Python's standard library.

On Windows, the path '<' is invalid, but os.path.exists('<') returns 
False, not an error.


The path '' is also invalid, but os.path.exists('') returns False, not 
an error.


I don't see why '\0' should behave any differently.


Re: Why exception from os.path.exists()?

2018-05-31 Thread Paul Moore
On 31 May 2018 at 15:01, Chris Angelico  wrote:
> Can someone on Windows see if there are other path names that raise
> ValueError there? Windows has a whole lot more invalid characters, and
> invalid names as well.

On Windows:

>>> os.path.exists('\0')
ValueError: stat: embedded null character in path

>>> os.path.exists('?')
False

>>> os.path.exists('\u77412')
False

>>> os.path.exists('\t')
False

Honestly, I think the OP's point is correct. os.path.exists should
simply return False if the filename has an embedded \0 - at least on
Unix. I don't know if Windows allows \0 in filenames, but if it does,
then os.path.exists should respect that...

Although I wouldn't consider this as anything even remotely like a
significant issue...

Paul


Override built in types... possible? or proposal.

2018-05-31 Thread Dan Strohl via Python-list
Is it possible to override the assignment of built in types to the shorthand 
representations?   And if not, is it a reasonable thought to consider adding?

For example, right now, if I do:

test = "this is a string",

I get back str("this is a string").  What if I want to return this as 
my_string("this is a string")  (OK, I know I have a recursive issue in my 
example, but hopefully you get the point).

Or;

Test = ['item1', 'item2', 'item3'] returns a list, what if I want to add 
functionality to all lists in my module?  (And yes, I know I could simply not 
use [] and always write my_list(['item1', 'item2', 'item3']).)

I am envisioning something in the header like an import statement where I could 
do;

override str=my_string
override list=my_list

This would only be scoped to the current module and would not be imported when 
that module was imported.
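
For comparison, this is what is possible without any new syntax: a subclass adds the behaviour, but it has to be constructed explicitly, because a literal always produces the built-in type. The class name MyList and its method are hypothetical, chosen only to illustrate the point.

```python
class MyList(list):
    """Hypothetical list subclass with one extra method."""

    def second(self):
        return self[1]


items = MyList(['item1', 'item2', 'item3'])   # explicit construction
plain = ['item1', 'item2', 'item3']           # literal: always a plain list

print(type(items).__name__, items.second())
print(type(plain).__name__)
```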

Thoughts?

Dan Strohl


Re: Why exception from os.path.exists()?

2018-05-31 Thread Steven D'Aprano
On Thu, 31 May 2018 22:46:35 +1000, Chris Angelico wrote:
[...]
>> Most other analogous reasons *don't* generate an exception, nor is that
>> possibility mentioned in the specification:
>>
>>https://docs.python.org/3/library/os.path.html?#os.path.exists
>>
>> Is the behavior a bug? Shouldn't it be:
>>
>>>>> os.path.exists("\0")
>>False
> 
> A Unix path name cannot contain a null byte, so what you have is a
> fundamentally invalid name. ValueError is perfectly acceptable.

It should still be documented.

What does it do on Windows if the path is illegal?



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-05-31 Thread Paul Moore
On 31 May 2018 at 16:11, Steven D'Aprano
 wrote:
> On Thu, 31 May 2018 22:46:35 +1000, Chris Angelico wrote:
> [...]
>>> Most other analogous reasons *don't* generate an exception, nor is that
>>> possibility mentioned in the specification:
>>>
>>>https://docs.python.org/3/library/os.path.html?#os.path.exists
>>>
>>> Is the behavior a bug? Shouldn't it be:
>>>
>>>>>> os.path.exists("\0")
>>>False
>>
>> A Unix path name cannot contain a null byte, so what you have is a
>> fundamentally invalid name. ValueError is perfectly acceptable.
>
> It should still be documented.
>
> What does it do on Windows if the path is illegal?

Returns False (confirmed with paths of '?' and ':', among others).

Paul


Re: Sorting and spaces.

2018-05-31 Thread MRAB

On 2018-05-31 15:18, Tobiah wrote:
> I had a case today where I needed to sort two strings:
>
> ['Awards', 'Award Winners']
>
> I consulted a few sources to get a suggestion as to
> what would be correct.  My first idea was to throw them
> through a Linux command line sort:
>
> Awards
> Award Winners
>
> Then I did some Googling, and found that most US systems seem
> to prefer that one ignore spaces when alphabetizing.  The sort
> program seemed to agree.
>
> I put the items into the database that way, but I had forgotten
> that my applications used python to sort them anyway.  The result
> was different:
>
> >>> a = ['Awards', 'Award Winners']
> >>> sorted(a)
> ['Award Winners', 'Awards']
>
> So python evaluated the space as a lower ASCII value.
>
> Thoughts?  Are there separate tools for alphabetizing
> rather than sorting?


You could split the string first:
>>> a = ['Awards', 'Award Winners']
>>> sorted(a, key=str.split)
['Award Winners', 'Awards']

If you want it to be case-insensitive:

>>> sorted(a, key=lambda s: s.lower().split())
['Award Winners', 'Awards']


Re: Indented multi-line strings

2018-05-31 Thread MRAB

On 2018-05-31 15:39, Dan Strohl via Python-list wrote:

This is of course not a problem if the *trailing* quote determines the
indentation:

a_multi_line_string = i'''
   Py-
  thon
'''


I get the point, but it feels like it would be a pain to use, and it "Feels" 
different from the other python indenting, which is something that I would want to stay 
away from changing.


> In any case, Chris made a good point that I agree with. This doesn't
> really need to be syntax at all, but could just be implemented as a
> new string method.

Depending on the details, not quite. A method wouldn't get the horizontal
position of the leading quote. It could infer the position of the trailing 
quote,
though.



What about if we used Chris's approach, but added a parameter to the method to 
handle the indent?

For example,

Test = """
 Hello, this is a
  Multiline indented
 String
 """.outdent(4)


The outdent method could look like:

string.outdent(size=None)
 """
 :param size : The number of spaces to remove from the beginning of each 
line in the string.  Non space characters will not be removed.  IF this is 
None, the number of characters in the first line of the string will be used.  
If this is an iterable, the numbers returned from each iteration will be used 
for their respective lines.  If there are more lines than iterations, the last 
iteration will be used for subsequent lines.

This solves the problem in a very pythonic way, while allowing the flexibility 
to handle different needs.


That string starts with a blank line, after the initial quotes.

I was also thinking that it could take the indentation from the first 
line, but that if you wanted the first line to have a larger indent than 
the remaining lines, you could replace the first space that you want to 
keep with a non-whitespace character and then pass that character to the 
method.


For example:

Test = """\
 _   Hello, this is a
  Multiline indented
 String
 """.outdent(padding='_')

Outdent so that the first line is flush to the margin:

_   Hello, this is a
 Multiline indented
String

The padding argument tells it to replace the initial '_':

Hello, this is a
 Multiline indented
String
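
A possible reading of that padding idea, again as a standalone function and only a sketch: the name outdent_padded is hypothetical, the indent is taken from the first line, and the padding character is turned back into a space once the outdent has been applied. The sample text uses its own indentation, since the exact spacing in the example above did not survive the archive.

```python
def outdent_padded(text, padding=None):
    """Hypothetical sketch: outdent by the first line's indent, then
    replace a leading padding marker on that line with spaces."""
    lines = text.split("\n")
    first = lines[0]
    indent = len(first) - len(first.lstrip(" "))
    out = []
    for line in lines:
        leading = len(line) - len(line.lstrip(" "))
        out.append(line[min(indent, leading):])
    if padding and out[0].startswith(padding):
        out[0] = " " * len(padding) + out[0][len(padding):]
    return "\n".join(out)


text = (
    "    _   Hello, this is a\n"
    "     Multiline indented\n"
    "    String\n"
)
print(outdent_padded(text, padding="_"))
```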


Re: Re: Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-31 Thread Paul St George
That's what I wanted! But, I didn't know the question because I didn't 
know the answer.



On 30/05/2018 23:09, Karsten Hilbert wrote:

On Wed, May 30, 2018 at 11:01:17PM +0200, Peter J. Holzer wrote:


On 2018-05-30 22:08:45 +0200, Paul St George wrote:

Ha! No, my question was clumsy.

If I know the name of the viewer that I want to use (say for example:
‘ImageMagick’), where do I find the argument that should be used in a line
of code such as this:

ImageShow.register(MyViewer("gwenview"), -1)

$> man -k ImageMagick
$> man whatever_you_found_with_the_above

Karsten


--
Paul St George
http://www.paulstgeorge.com
http://www.devices-of-wonder.com

+44(0)7595 37 1302



Re: Sorting and spaces.

2018-05-31 Thread Peter Otten
Tobiah wrote:

> I had a case today where I needed to sort two strings:
> 
> ['Awards', 'Award Winners']
> 
> I consulted a few sources to get a suggestion as to
> what would be correct.  My first idea was to throw them
> through a Linux command line sort:
> 
> Awards
> Award Winners
> 
> Then I did some Googling, and found that most US systems seem
> to prefer that one ignore spaces when alphabetizing.  The sort
> program seemed to agree.
> 
> I put the items into the database that way, but I had forgotten
> that my applications used python to sort them anyway.  The result
> was different:
> 
> >>> a = ['Awards', 'Award Winners']
> >>> sorted(a)
> ['Award Winners', 'Awards']
> 
> So python evaluated the space as a lower ASCII value.
> 
> Thoughts?  Are there separate tools for alphabetizing
> rather than sorting?

>>> items = ["Awards", "Award Winners", "awards"]
>>> sorted(items)
['Award Winners', 'Awards', 'awards']
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
'en_US.UTF-8'
>>> sorted(items, key=locale.strxfrm)
['awards', 'Awards', 'Award Winners']
>>> locale.setlocale(locale.LC_ALL, "C")
'C'
>>> sorted(items, key=locale.strxfrm)
['Award Winners', 'Awards', 'awards']




Re: Override built in types... possible? or proposal.

2018-05-31 Thread Rob Gaddi

On 05/31/2018 07:49 AM, Dan Strohl wrote:

Is it possible to override the assignment of built in types to the shorthand 
representations?   And if not, is it a reasonable thought to consider adding?

For example, right now, if I do:

test = "this is a string",

I get back str("this is a string").  What if I want to return this as 
my_string("this is a string")  (OK, I know I have a recursive issue in my example, but 
hopefully you get the point).

Or;

Test = ['item1', 'item2', 'item3'] returns a list, what if I want to add 
functionality to all lists in my module?  (and yes, I know I could simply not 
do [] and always do my_list('item1', 'item2', 'item3']

I am envisioning something in the header like an import statement where I could 
do;

override str=my_string
override list=my_list

This would only be scoped to the current module and would not be imported when 
that module was imported.

Thoughts?

Dan Strohl



My problem with this idea is that it breaks expectations.  If I know one 
thing as a Python programmer, it's that 'Bob' is a str.  Each time and 
every time.  If you could override the meaning of basic constant 
identifiers to where I have no idea how they behave, that creates an 
easy thing to miss that changes the entire meaning of the things you've 
written.


What's the use case here?  And why is that use case better than, for 
instance, simply defining a function in the module that does the things 
you want done to strings?  Not everything has to be an object method.


--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.


Re: Why exception from os.path.exists()?

2018-05-31 Thread Terry Reedy

On 5/31/2018 8:03 AM, Marko Rauhamaa wrote:

> This surprising exception can even be a security issue:
>
>     >>> os.path.exists("\0")
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "/usr/lib64/python3.6/genericpath.py", line 19, in exists
>         os.stat(path)
>     ValueError: embedded null byte
>
> Most other analogous reasons *don't* generate an exception, nor is that
> possibility mentioned in the specification:
>
>     https://docs.python.org/3/library/os.path.html?#os.path.exists
>
> Is the behavior a bug? Shouldn't it be:
>
>     >>> os.path.exists("\0")
>     False


Please open an issue on the tracker if there is not one for this already.


--
Terry Jan Reedy



Re: Indented multi-line strings

2018-05-31 Thread Terry Reedy

On 5/31/2018 10:39 AM, Dan Strohl via Python-list wrote:

This is of course not a problem if the *trailing* quote determines the
indentation:

 a_multi_line_string = i'''
Py-
   thon
 '''


I get the point, but it feels like it would be a pain to use, and it "Feels" 
different from the other python indenting, which is something that I would want to stay 
away from changing.


In any case, Chris made a good point that I agree with. This doesn't
really need to be syntax at all, but could just be implemented as a
new string method.


Depending on the details, not quite. A method wouldn't get the horizontal
position of the leading quote. It could infer the position of the trailing 
quote,
though.



What about if we used Chris's approach, but added a parameter to the method to 
handle the indent?

For example,

Test = """
 Hello, this is a
  Multiline indented
 String
 """.outdent(4)


The outdent method could look like:

string.outdent(size=None)
 """
 :param size : The number of spaces to remove from the beginning of each 
line in the string.  Non space characters will not be removed.  IF this is 
None, the number of characters in the first line of the string will be used.  
If this is an iterable, the numbers returned from each iteration will be used 
for their respective lines.  If there are more lines than iterations, the last 
iteration will be used for subsequent lines.

This solves the problem in a very pythonic way, while allowing the flexibility 
to handle different needs.


string = (
" Hello, this is a concatenated   \n"
"   multiline variably indented   \n"
"string with variable trailing blanks.\n"
"It works now and always has! ")

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-05-31 Thread Marko Rauhamaa
Terry Reedy :

> On 5/31/2018 8:03 AM, Marko Rauhamaa wrote:
>> Is the behavior a bug? Shouldn't it be:
>>
>> >>> os.path.exists("\0")
>> False
>
> Please open an issue on the tracker if there is not one for this
> already.

issue 33721 created


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Override built in types... possible? or proposal.

2018-05-31 Thread Dan Strohl via Python-list
> >
> > I am envisioning something in the header like an import statement
> > where I could do;
> >
> > override str=my_string
> > override list=my_list
> >
> > This would only be scoped to the current module and would not be
> imported when that module was imported.
> >
> > Thoughts?
> >
> > Dan Strohl
> >
> 
> My problem with this idea is that it breaks expectations.  If I know one 
> thing as
> a Python programmer, it's that 'Bob' is a str.  Each time and every time.  If 
> you
> could override the meaning of basic constant identifiers to where I have no
> idea how they behave, that creates an easy thing to miss that changes the
> entire meaning of the things you've written.
> 

True, though, to determine what almost anything is, you should look at the 
imports anyway, just in case I happened to do something like;

import my_sys as sys

> What's the use case here?  And why is that use case better than, for instance,
> simply defining a function in the module that does the things you want done
> to strings?  Not everything has to be an object method.

It's not necessarily better, it simply provides more flexibility in how things 
are approached.  In most cases I would probably define a function for something 
as you suggested, or define a new class and just instantiate that  object 
instead when needed, but I can see a time when it would be nice to be able to 
simply say, "I really want to handle all of my dictionaries in this module in a 
certain way", then not have to worry about it.

To me, one of the things I like about Python is that I can override many of the 
ways things are handled via sub-classes, magic methods, importing "as" etc... 
this is simply an extension of that existing flexibility.

And yes, it gives developers another tool they can shoot themselves pretty 
easily in the foot with if they aren't careful in how they use it, but so many 
of the good tools do that already.

Honestly, I am not so locked into this that I would scream about it not 
working, but there have been times when it would have been helpful in the past, 
so I figured I would bring it up and see what others thought.

Dan



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-05-31 Thread Chris Angelico
On Fri, Jun 1, 2018 at 12:51 AM, MRAB  wrote:
> On 2018-05-31 14:38, Marko Rauhamaa wrote:
>>
>> Chris Angelico :
>>>
>>> Do you have an actual use-case where it is correct for an invalid path
>>> to be treated as not existing?
>>
>>
>> Note that os.path.exists() returns False for other types of errors
>> including:
>>
>>   * File might exist but you have no access rights
>>
>>   * The pathname is too long for the file system
>>
>>   * The pathname is a broken symbolic link
>>
>>   * The pathname is a circular symbolic link
>>
>>   * The hard disk ball bearings are chipped
>>
>> I'm not aware of any other kind of a string argument that would trigger
>> an exception except the presence of a NUL byte.
>>
>> The reason for the different treatment is that the former errors are
>> caught by the kernel and converted to False by os.path.exists(). The NUL
>> byte check is carried out by Python's standard library.
>>
> On Windows, the path '<' is invalid, but os.path.exists('<') returns False,
> not an error.
>
> The path '' is also invalid, but os.path.exists('') returns False, not an
> error.
>
> I don't see why '\0' should behave any differently.

Okay, if it's just returning False for all the Windows invalid paths,
then sure, the Unix invalid paths can behave the same way.
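
For anyone who needs the proposed behaviour on an interpreter where the NUL check still raises, a thin wrapper is enough. This is only a sketch: exists_or_false is a made-up name, not part of os.path, and it assumes ValueError is the only exception you want turned into False.

```python
import os


def exists_or_false(path):
    """Like os.path.exists(), but treat an unrepresentable path as missing.

    Hypothetical helper, not a standard-library function.
    """
    try:
        return os.path.exists(path)
    except ValueError:
        # Raised by some Python versions for an embedded NUL character.
        return False


print(exists_or_false("\0"))
print(exists_or_false(os.curdir))
```

On interpreters where os.path.exists() already returns False for such paths, the wrapper simply passes the result through.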

Thanks for checking that (you and Paul equally).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Problem with OrderedDict - progress report

2018-05-31 Thread Chris Angelico
On Fri, Jun 1, 2018 at 12:37 AM, Frank Millman  wrote:
> "Steven D'Aprano"  wrote in message news:peorib$1f4$2...@blaine.gmane.org...
>>
>>
>> On Thu, 31 May 2018 10:05:43 +0200, Frank Millman wrote:
>>
>> > From the interpreter session below, you will see that adding a key while
>> > processing the *last* key in an OrderedDict does not give rise to an
>> > exception.
>>
>> If you mutate the dict, and then stop iterating over it, there is no
>> check that the dict was mutated.
>>
>> It isn't an error to mutate the dict. It is an error to mutate it while
>> it is being iterated over. If you stop the iteration, there's no problem.
>>
>
> Agreed, but my gut feel, and the following example, suggest that when
> processing the last key in a dictionary while iterating over it, you have
> not yet stopped iterating.

If it's easier to understand, here's an alternative wording:

It is an error to mutate the dictionary *and then continue to iterate over it*.
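
One common way to stay on the safe side of that rule is to iterate over a snapshot of the keys, so the loop never resumes a live iterator after the mutation. A minimal sketch, where add_while_iterating and its parameters are illustrative names rather than anything from Frank's program:

```python
from collections import OrderedDict


def add_while_iterating(d, trigger, new_key, new_value):
    """Add a key while looping, without touching a live dict iterator."""
    # list(d) copies the keys up front; mutating d afterwards is harmless.
    for k in list(d):
        if k == trigger:
            d[new_key] = new_value
    return d


d = OrderedDict([(1, 'one'), (2, 'two')])
add_while_iterating(d, 1, 3, 'three')
print(list(d))
```

Note that the loop will not visit the newly added key, because it was not in the snapshot.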

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-05-31 Thread Grant Edwards
On 2018-05-31, Paul Moore  wrote:
> On 31 May 2018 at 15:01, Chris Angelico  wrote:
>> Can someone on Windows see if there are other path names that raise
>> ValueError there? Windows has a whole lot more invalid characters, and
>> invalid names as well.
>
> On Windows:
>
> >>> os.path.exists('\0')
> ValueError: stat: embedded null character in path
>
> >>> os.path.exists('?')
> False
>
> >>> os.path.exists('\u77412')
> False
>
> >>> os.path.exists('\t')
> False
>
> Honestly, I think the OP's point is correct. os.path.exists should
> simply return False if the filename has an embedded \0 - at least on
> Unix.

Except on the platform in question filenames _don't_ contain an
embedded \0.  What was passed was _not_ a path/filename.

You might as well have passed a floating point number or a dict.

> Although I wouldn't consider this as anything even remotely like a
> significant issue...

Agreed, but the thread will continue for months and generate hundreds
of followups.

-- 
Grant Edwards   grant.b.edwardsYow! You were s'posed
  at   to laugh!
  gmail.com

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Chris Angelico
On Fri, Jun 1, 2018 at 12:39 AM, Dan Strohl via Python-list
 wrote:
>> This is of course not a problem if the *trailing* quote determines the
>> indentation:
>>
>> a_multi_line_string = i'''
>>Py-
>>   thon
>> '''
>
> I get the point, but it feels like it would be a pain to use, and it "Feels" 
> different from the other python indenting, which is something that I would 
> want to stay away from changing.
>
>> > In any case, Chris made a good point that I agree with. This doesn't
>> > really need to be syntax at all, but could just be implemented as a
>> > new string method.
>>
>> Depending on the details, not quite. A method wouldn't get the horizontal
>> position of the leading quote. It could infer the position of the trailing 
>> quote,
>> though.
>>
>
> What about if we used Chris's approach, but added a parameter to the method 
> to handle the indent?
>
> For example,
>
> Test = """
> Hello, this is a
>  Multiline indented
> String
> """.outdent(4)
>
>
> The outdent method could look like:
>
> string.outdent(size=None)
> """
> :param size : The number of spaces to remove from the beginning of each 
> line in the string.  Non space characters will not be removed.  IF this is 
> None, the number of characters in the first line of the string will be used.  
> If this is an iterable, the numbers returned from each iteration will be used 
> for their respective lines.  If there are more lines than iterations, the 
> last iteration will be used for subsequent lines.
>
> This solves the problem in a very pythonic way, while allowing the 
> flexibility to handle different needs.
>

Sure! Though I'd drop the iterable option - YAGNI. Keep the basic API
simple. Just an integer or None, where None's meaning is defined in terms of
the string itself.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Override built in types... possible? or proposal.

2018-05-31 Thread Terry Reedy

On 5/31/2018 10:49 AM, Dan Strohl via Python-list wrote:

Is it possible to override the assignment of built in types to the shorthand 
representations?


By which I presume you mean literals and overt (non-comprehension) 
displays.  So you wish that Python should be even more dynamic.  (Some 
wish it were less so ;-)


The CPython parser is generated by a parser-generator program from the 
python grammar and a table of non-default functions to call when the 
parser recognizes certain grammatical productions.  For instance, 
*stringliteral* is mapped to str.  I assume that the mapping is 
currently direct, and not routed through the builtins dict.  I don't 
know what other implementations do.


To change this, I believe you would have to introduce indirect mapping 
functions.  One possibility would be C equivalents of functions like


def get_str(): return builtins['str']

Then you could change future parsing by executing
  builtins['str'] = mystr

However, this would affect the parsing of imported modules, if and when 
they are parsed, and exec and eval calls made within imported functions. 
 This would not be good.


So I believe you would also need to introduce a module copy of a subset 
of builtins, call it '__modclass__', whose keys would be the classes 
called by the parser, and use that in the get_xyz functions.  Then I 
believe you could change parsing within a module by executing

   __modclass__['str'] = mystr


and yes, I know I could simply not do [] and always do my_list('item1', 
'item2', 'item3')


I think we should stick with this.


This would only be scoped to the current module and would not be imported when 
that module was imported.


The harder part, I think, is "and not affect parsing of imported modules 
if they are not already parsed and not affect exec and eval calls in 
imported modules."



--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


How do I list only the methods I define in a class?

2018-05-31 Thread bruceg113355
How do I list only the methods I define in a class?

For example:

class Produce():
    def __init__ (self):
        print (dir (Produce))

    def apples(self):
        pass

    def peaches(self):
        pass

    def pumpkin (self):
        pass

The print (dir(Produce)) statement displays:
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', 
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', 
'__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', 
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', 
'__str__', '__subclasshook__', '__weakref__', 'apples', 'peaches', 'pumpkin']

I am only interested in 'apples', 'peaches', 'pumpkin'

The above is only an example.
In my real code there are methods with and without leading "__". 

Can I assume methods after __weakref__ are the methods I defined?
Is there a Python function to do what I need?

Thanks,
Bruce
-- 
https://mail.python.org/mailman/listinfo/python-list


Python library to break text into words

2018-05-31 Thread beliavsky--- via Python-list
I bought some e-books in a Humble Bundle. The file names are shown below. I 
would like to hyphenate words within the file names, so that the first three 
titles are

a_devils_chaplain.pdf
atomic_accidents.pdf
chaos_making_a_new_science.pdf

Is there a Python library that uses intelligent guesses to break sequences of 
characters into words? The general strategy would be to break strings into the 
longest words possible. The library would need to "know" a sizable subset of 
words in English.

adevilschaplain.pdf
atomicaccidents.pdf
chaos_makinganewscience.pdf
dinosaurswithoutbones.pdf
essaysinscience.pdf
genius_thelifeandscienceofrichardfeynman.pdf
louisagassiz_creatorofamericanscience.pdf
martiansummer.pdf
mind_aunifiedtheoryoflifeandintelligence.pdf
noturningback.pdf
onshakyground.pdf
scienceandphilosophy.pdf
sevenelementsthatchangedtheworld.pdf
strangeangel.pdf
theboywhoplayedwithfusion.pdf
thecanon.pdf
theedgeofphysics.pdf
thegenome.pdf
thegoldilocksenigma.pdf
thesphinxatdawn.pdf
unnaturalselection.pdf
water_thefateofourmostpreciousresource.pdf
x-15diary.pdf
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python library to break text into words

2018-05-31 Thread Chris Angelico
On Fri, Jun 1, 2018 at 6:26 AM, beliavsky--- via Python-list
 wrote:
> I bought some e-books in a Humble Bundle. The file names are shown below. I 
> would like to hyphenate words within the file names, so that the first three 
> titles are
>
> a_devils_chaplain.pdf
> atomic_accidents.pdf
> chaos_making_a_new_science.pdf
>
> Is there a Python library that uses intelligent guesses to break sequences of 
> characters into words? The general strategy would be to break strings into 
> the longest words possible. The library would need to "know" a sizable subset 
> of words in English.
>
> adevilschaplain.pdf
> atomicaccidents.pdf
> chaos_makinganewscience.pdf

Let's start with the easy bit. On many MANY Unix-like systems, you can
find a dictionary of words in the user's language (not necessarily
English, but that's appropriate here - it means your script will work
on a French or German or Turkish or Russian system as well) at
/usr/share/dict/words. All you have to do is:

with open("/usr/share/dict/words") as f:
    words = f.read().strip().split("\n")

Tada! That'll give you somewhere between 50K and 650K words, for
English. (I have eight English dictionaries installed, ranging from
american-english-small and british-english-small at 51K all the way up
to their corresponding -insane variants at 650K.) Most likely you'll
have about 100K words, which is a good number to be working with. If
you're on Windows, see if you can just download something from
wordlist.sourceforge.net or similar; it should be in the same format.

So! Now for the next step. You need to split a pile of letters such
that each of the resulting pieces is a word. You're probably going to
find some that just don't work ("x-15diary" seems dubious), but for
the most part, you should get at least _some_ result. You suggested a
general strategy of breaking strings into the longest words possible,
which would be easy enough to code. A basic algorithm of "take as many
letters as you can while still finding a word" is likely to give you
fairly decent results. You'll need a way of backtracking in the event
that the rest of the letters don't work ("theedgeofphysics" will take
a first word of "thee", but then "dgeofphysics" isn't going to work
out well), but otherwise, I think your basic idea is sound.
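
The greedy-with-backtracking idea described above can be sketched roughly like this. It is an illustration only: split_words is a made-up name, and SAMPLE_WORDS is a tiny hand-written stand-in for the real word list you would load from the dictionary file.

```python
def split_words(text, words, longest=None):
    """Split text into dictionary words, preferring the longest first word.

    Returns a list of words, or None if no complete split exists.
    """
    if not text:
        return []
    if longest is None:
        longest = max((len(w) for w in words), default=0)
    # Try the longest candidate prefix first, then progressively shorter ones.
    for end in range(min(len(text), longest), 0, -1):
        head = text[:end]
        if head in words:
            rest = split_words(text[end:], words, longest)
            if rest is not None:
                return [head] + rest
            # Otherwise backtrack: this prefix left an unsplittable remainder.
    return None


# Hypothetical miniature word set, just enough to show the backtracking.
SAMPLE_WORDS = {"the", "thee", "edge", "of", "physics"}
print(split_words("theedgeofphysics", SAMPLE_WORDS))
```

Pass the words as a set so the membership test stays fast. With a full dictionary, long titles may benefit from memoising the results per suffix, since plain recursion can revisit the same remainder many times.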

Should be a fun project!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Sorting and spaces.

2018-05-31 Thread Paul
In the US, at least, spaces should sort before letters.

MRAB brought up an important point. It depends on your purpose, of course,
but having all the capitalized-beginning items appear separately from all
of the lower-cased-beginning items can be very annoying to a user.
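
As a small illustration of both points, here is how plain code-point sorting compares with a case-insensitive key. The sample titles are made up; str.casefold is the standard method for caseless comparison.

```python
# Hypothetical sample data.
titles = ["apple pie", "Banana", "apple", "cherry", "applesauce"]

# Default sort compares code points: every capital letter sorts before
# every lowercase one, and a space sorts before any letter.
plain = sorted(titles)

# Case-insensitive sort: compare the casefolded form, keep the original text.
folded = sorted(titles, key=str.casefold)

print(plain)
print(folded)
```

In both orderings "apple pie" comes before "applesauce", because the space compares lower than the letter "s".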
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Peter J. Holzer
[Strange: I didn't get this mail through the list, only directly]

On 2018-05-31 14:39:17 +, Dan Strohl wrote:
> > This is of course not a problem if the *trailing* quote determines the
> > indentation:
> > 
> > a_multi_line_string = i'''
> >Py-
> >   thon
> > '''
> 
> I get the point, but it feels like it would be a pain to use, and it
> "Feels" different from the other python indenting, which is something
> that I would want to stay away from changing.

Yes, it's the wrong way around. The indentation should be determined by
the start quote. That's why I initially wrote that the quotes must
line up vertically.

Unfortunately you can't write 

a_multi_line_string = 
i'''
Py-
   thon
 '''

although you can write 

a_multi_line_string = \
i'''
Py-
   thon
 '''

which is visually not much worse.

> > > In any case, Chris made a good point that I agree with. This doesn't
> > > really need to be syntax at all, but could just be implemented as a
> > > new string method.
> > 
> > Depending on the details, not quite. A method wouldn't get the horizontal
> > position of the leading quote. It could infer the position of the trailing 
> > quote,
> > though.
> > 
> 
> What about if we used Chris's approach, but added a parameter to the
> method to handle the indent? 
> 
> For example, 
> 
> Test = """
> Hello, this is a
>  Multiline indented
> String
> """.outdent(4)

Eek! No, I don't think that's a good idea. It means that the programmer
has to count spaces and has to remember to adjust the parameter if the
indentation changes (e.g. because the block is wrapped in a loop or
factored out to a function).


> The outdent method could look like:
> 
> string.outdent(size=None)
> """
> :param size : The number of spaces to remove from the beginning of
> each line in the string.  Non space characters will not be
> removed.  IF this is None, the number of characters in the first
> line of the string will be used.

The default should be the minimum number of leading spaces on non-empty
lines, I think. This is compatible with PEP 257. And in fact it allows
all lines to start with whitespace if the string ends with a newline
(which is a weird dependency, but probably not much of a restriction in
practice).
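
To make that default concrete, here is a rough sketch written as a plain function rather than a str method (no such method exists today). The name outdent and the size parameter follow the proposal in this thread; the details are my assumptions.

```python
def outdent(text, size=None):
    """Remove up to `size` leading spaces from every line of text.

    If size is None, use the smallest indentation found on any
    non-blank line. Non-space characters are never removed.
    """
    lines = text.split("\n")
    if size is None:
        indents = [len(line) - len(line.lstrip(" "))
                   for line in lines if line.strip()]
        size = min(indents, default=0)
    result = []
    for line in lines:
        # Count how many of the first `size` characters are spaces.
        n = 0
        while n < size and n < len(line) and line[n] == " ":
            n += 1
        result.append(line[n:])
    return "\n".join(result)


sample = "\n    Hello, this is a\n     Multiline indented\n    String\n    "
print(outdent(sample))
```

For the default case the standard library's textwrap.dedent() already does something very similar, which is part of why a method seems feasible without new syntax.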


> If this is an iterable, the numbers returned from each iteration
> will be used for their respective lines.  If there are more lines
> than iterations, the last iteration will be used for subsequent
> lines.

This looks like overkill to me. What would be the use case?

> This solves the problem in a very pythonic way,

Everybody has their own definition of "pythonic", I guess.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


signature.asc
Description: PGP signature
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Indented multi-line strings

2018-05-31 Thread Peter J. Holzer
On 2018-05-31 16:44:10 +0100, MRAB wrote:
> I was also thinking that it could take the indentation from the first line,
> but that if you wanted the first line to have a larger indent than the
> remaining lines, you could replace the first space that you want to keep
> with a non-whitespace character and then pass that character to the method.
> 
> For example:
> 
> Test = """\
>  _   Hello, this is a
>   Multiline indented
>  String
>  """.outdent(padding='_')
> 
> Outdent so that the first line is flush to the margin:
> 
> _   Hello, this is a
>  Multiline indented
> String
> 
> The padding argument tells it to replace the initial '_':
> 
> Hello, this is a
>  Multiline indented
> String

I would prefer to remove the padding, like this:

Test = """
|Hello, this is a
| Multiline indented
|String
""".outdent(padding='|')

Or write it like this?

Test = """|Hello, this is a
  | Multiline indented
  |String
  """.outdent(padding='|')

Hmm, the sign of Zorro! :-)

I'm starting to like outdent(), but that may be my TIMTOWTDIism
speaking.
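
A quick sketch of the marker variant, again as a free function since str has no such method. outdent_padding is a made-up name, and the handling of lines without a marker (blank lines become empty, anything else is left alone) is my own assumption.

```python
def outdent_padding(text, padding="|"):
    """Drop leading spaces up to and including the padding marker."""
    result = []
    for line in text.split("\n"):
        stripped = line.lstrip(" ")
        if stripped.startswith(padding):
            # Keep everything after the marker, including inner indentation.
            result.append(stripped[len(padding):])
        elif not stripped:
            # Whitespace-only line, e.g. the one holding the closing quotes.
            result.append("")
        else:
            # No marker: leave the line untouched.
            result.append(line)
    return "\n".join(result)


sample = "\n    |Hello, this is a\n    | Multiline indented\n    |String\n    "
print(outdent_padding(sample))
```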

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


signature.asc
Description: PGP signature
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Peter J. Holzer
On 2018-05-31 23:05:35 +0200, Peter J. Holzer wrote:
> [Strange: I didn't get this mail through the list, only directly]

Found it. For some reason "Avoid duplicate copies of messages" was
enabled. I normally always disable this when I subscribe to a
mailinglist and I'm surprised that I haven't noticed it before.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


signature.asc
Description: PGP signature
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python library to break text into words

2018-05-31 Thread Dietmar Schwertberger

On 5/31/2018 10:26 PM, beliavsky--- via Python-list wrote:

Is there a Python library that uses intelligent guesses to break sequences of characters 
into words? The general strategy would be to break strings into the longest words 
possible. The library would need to "know" a sizable subset of words in English.


No need to re-invent the wheel:

import webbrowser
webbrowser.open( 
"https://www.google.com/search?q=%s"%"atomicaccidents.pdf"+"+amazon", new=0)



Copy the title from the browser window and paste it into your script's 
window which will read it with input() and rename the file.


Regards,

Dietmar


--
https://mail.python.org/mailman/listinfo/python-list


Re: Sorting and spaces.

2018-05-31 Thread Chris Angelico
On Fri, Jun 1, 2018 at 6:51 AM, Paul  wrote:
> In the US, at least, spaces should sort before letters.
>
> MRAB brought up an important point. It depends on your purpose, of course,
> but having all the capitalized-beginning items appear separately from all
> of the lower-cased-beginning items can be very annoying to a user.

And that's why locale-based sorting exists. You can't set a single
definition of sorting and expect it to work for everyone. In fact,
even within a language, there can be different sorting rules for
different types of data (a list of names might be sorted one way, but
a list of book titles differently). Peter's recommendation covers most
of that, modulo the types-of-data complexity; you should be able to
sort German text according to German rules, and Dutch text according
to Dutch rules.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Chris Angelico
On Fri, Jun 1, 2018 at 7:05 AM, Peter J. Holzer  wrote:
> [Strange: I didn't get this mail through the list, only directly]
>
> On 2018-05-31 14:39:17 +, Dan Strohl wrote:
>> The outdent method could look like:
>>
>> string.outdent(size=None)
>> """
>> :param size : The number of spaces to remove from the beginning of
>> each line in the string.  Non space characters will not be
>> removed.  IF this is None, the number of characters in the first
>> line of the string will be used.
>
> The default should be the minimum number of leading spaces on non-empty
> lines, I think. This is compatible with PEP 257. And in fact it allows
> all lines to start with whitespace if the string ends with a newline
> (which is a weird dependency, but probably not much of a restriction in
> practice).

Exactly. The default will be the most commonly used option when
working with string literals; explicitly setting it is there if you
need it, but won't be the normal way you do things.

Either way, if attached to a string literal, with either no parameter
or a literal integer, this would be a valid candidate for constant
folding. (There's no way to monkeypatch or shadow anything.) At that
point, we would have all the benefits of a new string literal type,
with no syntactic changes, just the creation of the method.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python library to break text into words

2018-05-31 Thread Chris Angelico
On Fri, Jun 1, 2018 at 7:09 AM, Dietmar Schwertberger
 wrote:
> On 5/31/2018 10:26 PM, beliavsky--- via Python-list wrote:
>>
>> Is there a Python library that uses intelligent guesses to break sequences
>> of characters into words? The general strategy would be to break strings
>> into the longest words possible. The library would need to "know" a sizable
>> subset of words in English.
>
>
> No need to re-invent the wheel:
>
> import webbrowser
> webbrowser.open(
> "https://www.google.com/search?q=%s"%"atomicaccidents.pdf"+"+amazon", new=0)
>
>
> Copy the title from the browser window and paste it into your script's
> window which will read it with input() and rename the file.

10/10 for grin-worthy solutions :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Attachments? Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Paul
I have heard that attachments to messages are not allowed on this list,
which makes sense. However I notice that messages from Peter do have an
attachment, i.e., a signature.asc file.

I'm just curious; why and how do those particular attachments get through?
And should they get through, I guess? E.G., what if I attach a malicious
file labeled as .asc?

[Peter, I am not suggesting anything about you!  ;). ]

Paul C.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python library to break text into words

2018-05-31 Thread beliavsky--- via Python-list
On Thursday, May 31, 2018 at 5:31:48 PM UTC-4, Dietmar Schwertberger wrote:
> On 5/31/2018 10:26 PM, beliavsky--- via Python-list wrote:
> > Is there a Python library that uses intelligent guesses to break sequences 
> > of characters into words? The general strategy would be to break strings 
> > into the longest words possible. The library would need to "know" a sizable 
> > subset of words in English.
> 
> No need to re-invent the wheel:
> 
> import webbrowser
> webbrowser.open( 
> "https://www.google.com/search?q=%s"%"atomicaccidents.pdf"+"+amazon", new=0)
> 
> 
> Copy the title from the browser window and paste it into your script's 
> window which will read it with input() and rename the file.
> 
> Regards,
> 
> Dietmar

Thanks to both of you.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Override built in types... possible? or proposal.

2018-05-31 Thread Steven D'Aprano
On Thu, 31 May 2018 09:51:30 -0700, Rob Gaddi wrote:

> On 05/31/2018 07:49 AM, Dan Strohl wrote:
>> Is it possible to override the assignment of built in types to the
>> shorthand representations?   And if not, is it a reasonable thought to
>> consider adding?
[...]

> My problem with this idea is that it breaks expectations.

No worse than shadowing builtin names.



> If I know one
> thing as a Python programmer, it's that 'Bob' is a str.

True. But if there were such a feature as Dan asked for, you would learn 
that "Bob" might be anything, depending on the current state of the 
module. That's not much worse than the idea that int("123") might return 
anything, depending on the current state of the module and builtins.

Still, I agree that this is probably a tad too much dynamism even for a 
language as dynamic as Python.
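
To illustrate the comparison: rebinding a builtin name changes what a call like str(...) produces, while a literal keeps producing the real builtin type. LoudStr and demo are made-up names for this sketch, and the rebinding is done inside a function so it cannot leak into the rest of the module.

```python
import builtins


class LoudStr(str):
    """Hypothetical str subclass used only for this demonstration."""

    def shout(self):
        return self.upper() + "!"


def demo():
    str = LoudStr              # shadows the builtin name in this scope only
    made_by_call = str("Bob")  # goes through the shadowed name
    made_by_literal = "Bob"    # a literal is unaffected by the rebinding
    return type(made_by_call), type(made_by_literal)


print(demo())
print(LoudStr("Bob").shout())
```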


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How do I list only the methods I define in a class?

2018-05-31 Thread bob gailer

On 5/31/2018 3:49 PM, bruceg113...@gmail.com wrote:
> How do I list only the methods I define in a class?
Here's a class with some methods, defined in various ways:

>>> class x():
...     a=3
...     def f():pass
...     g = lambda: None
...

>>> l=[v for v in x.__dict__.items()]; print(l)
[('a', 3), ('f', <function x.f at 0x...>), ('g', <function x.<lambda>
at 0x01DEDD693620>), ('__module__', '__main__'),
('__dict__', <attribute '__dict__' of 'x' objects>),
('__doc__', None), ('__weakref__', <attribute '__weakref__' of 'x' objects>)]


>>> import inspect
>>> [(key, value) for key, value in l if inspect.isfunction(value)]
[('f', <function x.f at 0x...>), ('g', <function x.<lambda>
at 0x01DEDD693620>)]
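
Applied to the class from the original question, the same idea might look like the sketch below. own_methods and its include_dunder flag are names I made up; vars() and inspect.isfunction() are standard.

```python
import inspect


class Produce:
    def __init__(self):
        pass

    def apples(self):
        pass

    def peaches(self):
        pass

    def pumpkin(self):
        pass


def own_methods(cls, include_dunder=False):
    """Names of plain functions defined directly in cls (not inherited)."""
    names = []
    for name, value in vars(cls).items():
        if not inspect.isfunction(value):
            continue
        is_dunder = name.startswith("__") and name.endswith("__")
        if include_dunder or not is_dunder:
            names.append(name)
    return names


print(own_methods(Produce))
```

Because vars(cls) only looks at the class's own namespace, nothing inherited from object shows up, so there is no need to rely on where '__weakref__' falls in the dir() output. Methods wrapped in staticmethod or classmethod are not plain functions and would need an extra check.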


HTH

--
https://mail.python.org/mailman/listinfo/python-list


... (ellipsis)

2018-05-31 Thread Mike McClain
I'm having trouble understanding the use of the ellipsis.
I keep reading that it is used in slices but every time I use it I get
'Syntax error' in 2.7 or 'Type error' in 3.2.

In python2.7:
l=range(15)
l[...:11]
Syntax error
l[3:...]
Syntax error
l[3:...:11]
Syntax error

In python3.2 it becomes 'Type error'  but still doesn't give me
anything usable.

Is the ellipsis really useable in anything other than documentation or
does it actually have a function in python?
If the latter would someone please provide an example showing how it
is used?

Thanks,
Mike
--
I Don't care how little your country is, you got a right to run it like
you want to. When big nations quit meddling then the world will have peace.
- Will Rogers
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Attachments? Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Abdur-Rahmaan Janhangeer
as this sig file is a common occurrence, attaching the topic to the data
blocks thread is not really necessary

Abdur-Rahmaan Janhangeer
https://github.com/Abdur-rahmaanJ

On Fri, 1 Jun 2018, 01:49 Paul,  wrote:

> I have heard that attachments to messages are not allowed on this list,
> which makes sense. However I notice that messages from Peter do have an
> attachment, i.e., a signature.asc file.
>
> I'm just curious; why and how do those particular attachments get through?
> And should they get through, I guess? E.G., what if I attach a malicious
> file labeled as .asc?
>
> [Peter, I am not suggesting anything about you!  ;). ]
>
> Paul C.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


version

2018-05-31 Thread Mike McClain
OK so I installed python 3.2, which is the latest available as a
package in Debian Wheezy, because I've seen so many folks say it's a
waste of time to play with Py2.7.
Immediately my python playground 'my.python.py' failed as soon as
I changed the '#!' line to python3.2.
Most of the errors were because I had used 'print' without parens
which 2.7 liked but 3.2 doesn't.
Is there a way in a script to know which version of python is being
run so I can write:
If (version == 2.7):
do it this way
elsif (version == 3.2):
do it another way

Thanks,
Mike
--
I Don't care how little your country is, you got a right to run it like
you want to. When big nations quit meddling then the world will have peace.
- Will Rogers
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ... (ellipsis)

2018-05-31 Thread Terry Reedy

On 5/31/2018 10:26 PM, Mike McClain wrote:

 I'm having trouble understanding the use of the ellipsis.
I keep reading that it is used in slices


By numpy for numpy multidimensional arrays, which have their own 
__getitem__, which recognizes and gives meaning to ...
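
A small Python 3 illustration without numpy: the ellipsis is just an object (Ellipsis) handed to __getitem__, and it only means something if the class chooses to interpret it. Probe and list_accepts_ellipsis are made-up names for this sketch.

```python
class Probe:
    """Hypothetical class that returns whatever index it is given."""

    def __getitem__(self, key):
        return key


def list_accepts_ellipsis():
    """Built-in lists give ... no meaning, so indexing raises TypeError."""
    items = [0, 1, 2]
    try:
        items[...]
    except TypeError:
        return False
    return True


p = Probe()
print(p[...] is Ellipsis)
print(p[1, ..., 2])
print(list_accepts_ellipsis())
```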


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: version

2018-05-31 Thread Jorge Gimeno
Look at the six module
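
six is a third-party package; for a simple check the standard library is enough. A minimal sketch, assuming all you need is to branch on the major version (the greeting variable is just for illustration):

```python
from __future__ import print_function  # makes print() a function on 2.7 too

import sys

# sys.version_info is a tuple-like object, so it compares against tuples.
if sys.version_info >= (3, 0):
    greeting = "running on Python 3"
else:
    greeting = "running on Python 2"

print(greeting)
```

With the __future__ import in place, print(...) with parentheses behaves the same on 2.7 and 3.x, which removes most of the errors described below without any version test at all.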

On Thu, May 31, 2018, 7:57 PM Mike McClain  wrote:

> OK so I installed python 3.2, which is the latest available as a
> package in Debian Wheezy, because I've seen so many folks say it's a
> waste of time to play with Py2.7.
> Immediately my python playground 'my.python.py' failed as soon as
> I changed the '#!' line to python3.2.
> Most of the errors were because I had used 'print' without parens
> which 2.7 liked but 3.2 doesn't.
> Is there a way in a script to know which version of python is being
> run so I can write:
> If (version == 2.7):
> do it this way
> elsif (version == 3.2):
> do it another way
>
> Thanks,
> Mike
> --
> I Don't care how little your country is, you got a right to run it like
> you want to. When big nations quit meddling then the world will have peace.
> - Will Rogers
> --
> https://mail.python.org/mailman/listinfo/python-list
>


Re: Python library to break text into words

2018-05-31 Thread Abdur-Rahmaan Janhangeer
1-> search in a dictionary and identify all the words, for example:

meaningsofoffers

.. identified words :

me
an
mean
in
meaning
meanings
so
of
of
offer
offers

2-> next, filter duplicates (i.e. the second "of" above) into a new list, as
the original list serves as the chronological reference

3-> next, choose the words whose lengths add up to the length of the string

4-> if there are several solutions, choose the non-overlapping and
chronologically sound ones

5-> unused letters are treated as words when non-natural words are
included; that can be problematic if sub-words are found in them, and point 7
might be the way to go

6-> where non-regular words are included, the program returns the best
solutions for the user to choose from

i have branded the above 6 points algorithm as the Arj.mu Algorithm of Word
Extraction in Connected Letters

7-> if machine learning is enacted, the above point (6) serves as training
(on an everyday usage app) or it can directly train on predefined examples

8-> if typos are assumed to be found in titles, then the title should be
assumed to have the corrected words and a new search is done on this
assumed title, in which case the results are added to those of the
non-corrected version and then point 6 above is executed

8.1-> for assumptions in 8, Natural Language modules might be used

9-> titles can contain numbers, dates, author names and others, and as such
are not covered by the points above
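
Points 1 to 4 above can be sketched roughly as follows. This is only an illustration under assumptions: the tiny vocabulary set stands in for a real dictionary, and the function names find_words and segmentations are made up for the example. It does not attempt points 5 to 9 (unused letters, typos, machine learning).

```python
def find_words(text, vocabulary):
    """Point 1: list every dictionary word found in text, with its start index."""
    hits = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in vocabulary:
                hits.append((start, text[start:end]))
    return hits


def segmentations(text, vocabulary):
    """Points 3 and 4: every in-order, non-overlapping split that uses all letters."""
    results = []

    def extend(position, chosen):
        if position == len(text):
            results.append(list(chosen))
            return
        for end in range(position + 1, len(text) + 1):
            word = text[position:end]
            if word in vocabulary:
                chosen.append(word)
                extend(end, chosen)
                chosen.pop()

    extend(0, [])
    return results


# Stand-in vocabulary; a real run would load a full word list.
vocabulary = {"me", "an", "mean", "in", "meaning", "meanings",
              "so", "of", "offer", "offers"}

hits = find_words("meaningsofoffers", vocabulary)
splits = segmentations("meaningsofoffers", vocabulary)
```

Keeping the start index with each hit plays the role of the chronological reference in point 2, so the duplicate "of" entries stay distinguishable.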


Abdur-Rahmaan Janhangeer
https://github.com/Abdur-rahmaanJ


Re: Attachments? Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-31 Thread Paul
I gave it a different subject line.

On Fri, Jun 1, 2018 at 2:45 AM, Abdur-Rahmaan Janhangeer <
arj.pyt...@gmail.com> wrote:

> as this sig file is a common occurrence, attaching the topic to the data
> blocks thread is not really necessary
>
> Abdur-Rahmaan Janhangeer
> https://github.com/Abdur-rahmaanJ
>
> On Fri, 1 Jun 2018, 01:49 Paul,  wrote:
>
>> I have heard that attachments to messages are not allowed on this list,
>> which makes sense. However I notice that messages from Peter do have an
>> attachment, i.e., a signature.asc file.
>>
>> I'm just curious; why and how do those particular attachments get through?
>> And should they get through, I guess? E.G., what if I attach a malicious
>> file labeled as .asc?
>>
>> [Peter, I am not suggesting anything about you!  ;). ]
>>
>> Paul C.
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>


Re: Python library to break text into words

2018-05-31 Thread Abdur-Rahmaan Janhangeer
Dietmar's answer is the best, piggybacking on search engines' algorithms

and probably instead of a dictionary of english words, we'd need a
dictionary of titles, making search much more efficient

regards,

Abdur-Rahmaan Janhangeer
https://github.com/Abdur-rahmaanJ

> No need to re-invent the wheel:
>
> import webbrowser
> webbrowser.open(
> "https://www.google.com/search?q=%s" % "atomicaccidents.pdf" + "+amazon",
> new=0)
>
> Copy the title from the browser window and paste it into your script's
> window which will read it with input() and rename the file.
>
> Regards,
>
> Dietmar
>
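
A slightly fuller sketch of the quoted approach, for Python 3 only. The helper names build_search_url and look_up_title are made up for this example, and "amazon" as the extra search term is simply carried over from the quoted snippet. urlencode takes care of escaping the query, so file names containing spaces or other special characters do not break the URL.

```python
import webbrowser
from urllib.parse import urlencode


def build_search_url(filename, extra_term="amazon"):
    """Build a Google search URL for the file name plus an extra term."""
    query = "{} {}".format(filename, extra_term)
    return "https://www.google.com/search?" + urlencode({"q": query})


def look_up_title(filename):
    """Open the search in a browser, then read the pasted title from stdin."""
    webbrowser.open(build_search_url(filename), new=0)
    return input("Paste the title here: ").strip()
```

Calling look_up_title("atomicaccidents.pdf") would open the browser and wait for the pasted title; the renaming step is left out here.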


Re: version

2018-05-31 Thread Ralf Schoenian

Hi Mike,

you can check for the major version with

import sys
sys.version_info.major
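
For the if/else in the question, a minimal sketch could look like the following. The function name choose_path and the returned strings are placeholders made up for this example. Note that sys.version_info is a tuple-like object, so it is compared against tuples such as (3, 2) and not against a float like 2.7.

```python
import sys


def choose_path(version_info=sys.version_info):
    """Hypothetical helper: pick a code path from a version tuple."""
    if version_info[0] == 2:
        return "do it this way"
    if version_info[:2] >= (3, 2):
        return "do it another way"
    return "unsupported 3.x version"


# Uses the running interpreter's version by default.
print(choose_path())
```

For the specific problem with print, putting "from __future__ import print_function" at the top of the script makes print(...) behave as a function under 2.7 as well, so that particular difference needs no version check.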


On 01.06.2018 04:44, Mike McClain wrote:

 OK so I installed python 3.2, which is the latest available as a
package in Debian Wheezy, because I've seen so many folks say it's a
waste of time to play with Py2.7.
 Immediately my python playground 'my.python.py' failed as soon as
I changed the '#!' line to python3.2.
 Most of the errors were because I had used 'print' without parens
which 2.7 liked but 3.2 doesn't.
Is there a way in a script to know which version of python is being
run so I can write:
 If (version == 2.7):
 do it this way
 elsif (version == 3.2):
 do it another way

Thanks,
Mike
--
I Don't care how little your country is, you got a right to run it like
you want to. When big nations quit meddling then the world will have peace.
 - Will Rogers

