splunk_handler and logbook

2015-02-26 Thread jvarghese

Hi,
I am trying to use Splunk (via splunk_handler). My question is: is there any
way to integrate logbook with splunk_handler?


The examples for splunk_handler use Python's logging module.

Thanks
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GDAL Installation in Enthought Python Distribution

2015-02-26 Thread Mark Lawrence

On 26/02/2015 07:47, Leo Kris Palao wrote:

Hi Python Users,

Would like to request how to install GDAL in my Enthought Python
Distribution (64-bit). I am having some problems making GDAL work. Or
can you point me into a blog that describes how to set up GDAL in
Enthought Python Distribution.

Thanks for any help.
-Leo




Was it really necessary to start a new thread one day after asking this 
question in a slightly different format?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Mark Lawrence

On 26/02/2015 02:57, Steven D'Aprano wrote:

Mark Lawrence wrote:


On 25/02/2015 20:45, Mark Lawrence wrote:

http://www.slideshare.net/pydanny/python-worst-practices

Any that should be added to this list?  Any that should be removed as not
that bad?



Throwing in my own, how about built-in functions should not use "object"
as the one and only argument, and a keyword argument at that.



Which built-in function is that?



memoryview; see http://bugs.python.org/issue20408
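(For reference, the oddity being flagged is that memoryview's sole parameter is spelled "object" and is accepted as a keyword:)

```python
# memoryview's only parameter is named "object" (see bpo-20408),
# so this slightly odd keyword call is legal:
mv = memoryview(object=b'spam')
print(bytes(mv))
```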

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Mark Lawrence

On 26/02/2015 03:05, Dave Angel wrote:

On 02/25/2015 08:44 PM, Mark Lawrence wrote:

On 25/02/2015 20:45, Mark Lawrence wrote:

http://www.slideshare.net/pydanny/python-worst-practices

Any that should be added to this list?  Any that should be removed as not
that bad?



Throwing in my own, how about built-in functions should not use "object"
as the one and only argument, and a keyword argument at that.



def marry(object = False)...


if anybody has any cause to object, let him speak now...



Fell off me chair larfing.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread cl
Ben Finney  wrote:
> Chris Angelico  writes:
> 
> > I'd really like to see a lot more presentations done in pure text.
> 
> Maybe so. My request at the moment, though, is not for people to change
> what's on their slides; rather, if they want people to retrieve them,
> the slides should be downloadable easily (i.e. without a web app,
> without a registration to some specific site).
> 
... and having downloaded them what do you view them with if they're
not plain text?

-- 
Chris Green
·
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread cl
Ian Kelly  wrote:
> On Wed, Feb 25, 2015 at 1:45 PM, Mark Lawrence  
> wrote:
> > http://www.slideshare.net/pydanny/python-worst-practices
> >
> > Any that should be added to this list?  Any that should be removed as not that bad?
> 
> Using XML for configuration is a good example of a worst practice, but
> using Python instead isn't best practice. There are good arguments
> that a configuration language shouldn't be Turing-complete. See for
> instance this blog post: http://taint.org/2011/02/18/001527a.html
> 
I agree wholeheartedly about XML; it's just not designed for what half
the world seems to be using it for.  Rather like HTML in a way, which
should have been a proper mark-up language.
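(A sketch of the alternative: a declarative format such as INI, read via configparser, can be parsed and validated without executing arbitrary code — exactly what a non-Turing-complete configuration language buys you. Section and key names here are made up:)

```python
import configparser

text = """
[server]
host = example.com
port = 8080
"""

cfg = configparser.ConfigParser()
cfg.read_string(text)            # parsing only; nothing is executed
host = cfg['server']['host']
port = cfg.getint('server', 'port')
print(host, port)
```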

-- 
Chris Green
·
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-26 Thread Cem Karan

On Feb 26, 2015, at 12:36 AM, Gregory Ewing  wrote:

> Cem Karan wrote:
>> I think I see what you're talking about now.  Does WeakMethod
>> (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod) solve
>> this problem?
> 
> Yes, that looks like it would work.


Cool!  

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


asyncio POLLHUP question

2015-02-26 Thread Chris Laws
I have a system scenario where thousands of applications are running and,
via a service discovery mechanism, they all get notified that a service they
are all interested in has come online. They all attempt to connect a TCP
socket to the service. This happens virtually instantly.

The problem that I see is that many of the applications that try to connect
to the server get themselves into a state where they are consuming a lot of
CPU.

I am using Python 3.4.2, asyncio, and have set the server backlog to
4000 in an effort to accommodate the connection request backlog. I am
actually using an event loop from aiozmq (but no ZMQ sockets in this
scenario) but under the covers this is just using epoll so it should really
be the same as using the DefaultSelector.

Using strace on the apps exhibiting issues I see that a socket is
continuously triggering a POLLERR|POLLHUP event. This is the cause of the
large CPU usage. The socket is the one that was attempting to connect to
the new service that was just brought up.

I am guessing that the POLLHUP is caused by the server having issues
processing the volume of connect requests.

I think I need to drop/close the socket causing the POLLHUP. However, from
looking through the asyncio source code I don't see how I can do that from
within the _selector.select() or _process_events() functions with only the
knowledge of which fd is causing the issue.

How do poll errors propagate up from the select loop?

I can potentially unregister the fd but I don't think this will trigger the
transport/protocol getting closed (as far as I can tell) which prevents my
normal error handling scenarios from attempting to reconnect to the
service. The asyncio select functions seem to ignore events other than
EVENT_READ and EVENT_WRITE.
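(One way to decide whether a given fd deserves closing, independent of the selector loop, is to read the socket's pending error directly. The helper name is mine, and this sidesteps rather than answers the transport-closing question:)

```python
import socket

def pending_error(sock):
    # Returns the pending error on the socket, 0 if none; a socket that
    # keeps reporting POLLERR|POLLHUP will usually have a nonzero value here.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)

a, b = socket.socketpair()
print(pending_error(a))   # healthy socket pair: no pending error
a.close()
b.close()
```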

Any help would be appreciated.

Regards,
Chris
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Steven D'Aprano
Chris Angelico wrote:

> On Thu, Feb 26, 2015 at 10:54 AM, Steven D'Aprano
>  wrote:
>> - Violating the Rule of Demeter: don't talk to the dog's leg, talk to
>>   the dog. Or another way to put it: don't let the paper boy reach
>>   into your pocket for money.
> 
> I'd call that code smell, rather than an automatic worst practice.


Well, I did end my post with:


And very possibly the worst practice of all:

- Failing to understand when it is, and isn't, appropriate to break
  the rules and do what would otherwise be a bad practice.


:-)


> Suppose this:
> 
> class Shelf:
> def __init__(self):
> self.items = [] # Empty shelf
> 
> bookshelf = Shelf()
> bookshelf.items.append(book)
> 
> To enforce Demeter, you'd have to add a bunch of methods to the Shelf
> whose sole purpose is to pass word along to the items list. Sure, it
> makes some design sense to say "Add this book to the shelf" rather
> than "Add this book to the items on the shelf", but all those lines of
> code are potential bugs, and if you have to reimplement huge slabs of
> functionality, that too is code smell. So there are times when it's
> correct to reach into another object.

Yes, well this comes down to the question of encapsulation and
information-hiding. The advantage of exposing the list of items to the
public is that anyone can add or remove items, sort them, reverse them,
etc. The disadvantage is that you are now committed to keeping that list as
part of the public API and you can't easily change the implementation.

In this specific example, I'd probably keep the list as part of the Shelf
API, although I'd be tempted to make self.items a read-only property. That
will allow you to call list mutator methods, but prevent you from doing
something silly like:

bookshelf.items = 23
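(Sketched out, the read-only property looks like this — class name reused from Chris's example:)

```python
class Shelf:
    def __init__(self):
        self._items = []      # real storage kept private

    @property
    def items(self):          # read-only: no setter defined
        return self._items

bookshelf = Shelf()
bookshelf.items.append('book')    # mutator methods still work
try:
    bookshelf.items = 23          # rebinding is rejected
except AttributeError as e:
    print('rejected:', e)
```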


> But the times to use two dots are much rarer than the times to use one
> dot (the paper boy shouldn't reach into your pocket for money, but
> ThinkGeek has your credit card number on file so you can order more
> conveniently), and I can't think of any example off-hand where you
> would want more than three dots.

The Law of Demeter is not really about counting dots. Ruby encourages
chaining methods. Python doesn't, since built-ins typically don't return
self. But in your own classes, you can have methods return self so you can
chain them like this:

mylist.append(spam).insert(1, eggs).append(cheese).sort().index(ham)

Five dots or not, this is not a violation of Demeter. Likewise for long
package names:

from mylibrary.audiovisual.image.jpeg import Handler


The Law of Demeter is more about information hiding. Clearly you don't hide
*public* attributes of your class, otherwise they aren't public, so it's
perfectly acceptable to say:

myshelf.items.append(spam).insert(1, eggs).append(cheese).sort().index(ham)

if items is public. But if the Shelf designer decides that the user
shouldn't know anything about how the shelf stores its items, then the
*first* dot violates the Law of Demeter.



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Rustom Mody
On Wednesday, February 25, 2015 at 2:12:09 AM UTC+5:30, Dave Angel wrote:
> On 02/24/2015 02:57 PM, Laura Creighton wrote:
> > Dave Angel
> > are you another Native English speaker living in a world where ASCII
> > is enough?
> 
> I'm a native English speaker, and 7 bits is not nearly enough.  Even if 
> I didn't currently care, I have some history:
> 
> No.  CDC display code is enough. Who needs lowercase?
> 
> No.  Baudot code is enough.
> 
> No, EBCDIC is good enough.  Who cares about other companies.
> 
> No, the "golf-ball" only holds this many characters.  If we need more, 
> we can just get the operator to switch balls in the middle of printing.
> 
> No. 2 digit years is enough.  This world won't last till the millennium 
> anyway.
> 
> No.  2k is all the EPROM you can have.  Your code HAS to fit in it, and 
> only 1.5k RAM.
> 
> No.  640k is more than anyone could need.
> 
> No, you cannot use a punch card made on a model 26 keypunch in the same 
> deck as one made on a model 29.  Too bad, many of the codes are 
> different.  (This one cost me travel back and forth between two 
> different locations with different model keypunches)
> 
> No. 8 bits is as much as we could ever use for characters.  Who could 
> possibly need names or locations outside of this region?  Or from 
> multiple places within it?
> 
> 35 years ago I helped design a serial terminal that "spoke" Chinese, 
> using a two-byte encoding.  But a single worldwide standard didn't come 
> until much later, and I cheered Unicode when it was finally unveiled.
> 
> I've worked with many printers that could only print 70 or 80 unique 
> characters.  The laser printer, and even the matrix printer are 
> relatively recent inventions.

Wrote something up on why we should stop using ASCII:
http://blog.languager.org/2015/02/universal-unicode.html

(Yeah the world is a bit larger than a small bunch of islands off a 
half-continent.
But this is not that discussion!)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Chris Angelico
On Thu, Feb 26, 2015 at 11:26 PM, Steven D'Aprano
 wrote:
> Chris Angelico wrote:
>
>> But the times to use two dots are much rarer than the times to use one
>> dot (the paper boy shouldn't reach into your pocket for money, but
>> ThinkGeek has your credit card number on file so you can order more
>> conveniently), and I can't think of any example off-hand where you
>> would want more than three dots.
>
> The Law of Demeter is not really about counting dots. Ruby encourages
> chaining methods. Python doesn't, since built-ins typically don't return
> self. But in your own classes, you can have methods return self so you can
> chain them like this:
>
> mylist.append(spam).insert(1, eggs).append(cheese).sort().index(ham)
>
> Five dots or not, this is not a violation of Demeter. Likewise for long
> package names:
>
> from mylibrary.audiovisual.image.jpeg import Handler

Yes, there are other places where you have lots of dots... I'm talking
about the "rule of thumb" shorthand for describing the Law of Demeter,
which is that you shouldn't have more than one dot before your method
call. The chaining isn't that, because each one is a separate entity;
but if you say "fred.house.bookshelf.items.append(book)", you're
reaching in far too deep - you should be giving Fred the book to place
on his own shelf. That's the only way where "counting dots" is a valid
shorthand. It's mentioned in the Wikipedia article for the law:

https://en.wikipedia.org/wiki/Law_of_Demeter#In_object-oriented_programming

Can you offer a less ambiguous way to describe Demeter violations? By
the "counting dots" style, Demeter demands one, I would be happy with
two, and three or more strongly suggests a flawed API or overly-tight
coupling - with the exception that module references don't get counted
("import sys; sys.path.append(p)" is one dot, because there's no point
having a "sys.add_path()" function). In some languages, those module
references would be notated differently (sys::path.append(p)), so
simply counting dots would be closer to accurate. Is there a better
Python description?
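(A runnable sketch of the one-dot alternative described above — the classes are invented for the example:)

```python
class Shelf:
    def __init__(self):
        self.items = []

class House:
    def __init__(self):
        self.bookshelf = Shelf()

class Person:
    def __init__(self):
        self.house = House()

    def shelve(self, book):
        # One public method instead of a three-dot reach-in.
        self.house.bookshelf.items.append(book)

fred = Person()
fred.house.bookshelf.items.append('atlas')   # the reach-in being criticized
fred.shelve('novel')                          # the Demeter-friendly spelling
print(fred.house.bookshelf.items)
```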

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread m

On 25.02.2015 21:45, Mark Lawrence wrote:

http://www.slideshare.net/pydanny/python-worst-practices

Any that should be added to this list?  Any that should be removed as not that bad?




I disagree with slide 16. If I wanted to use long variable names, I 
would still code in Java.


regards

m.
--
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Rustom Mody
On Thursday, February 26, 2015 at 6:10:25 PM UTC+5:30, Rustom Mody wrote:
> On Wednesday, February 25, 2015 at 2:12:09 AM UTC+5:30, Dave Angel wrote:
> > On 02/24/2015 02:57 PM, Laura Creighton wrote:
> > > Dave Angel
> > > are you another Native English speaker living in a world where ASCII
> > > is enough?
> > 
> > I'm a native English speaker, and 7 bits is not nearly enough.  Even if 
> > I didn't currently care, I have some history:
> > 
> > No.  CDC display code is enough. Who needs lowercase?
> > 
> > No.  Baudot code is enough.
> > 
> > No, EBCDIC is good enough.  Who cares about other companies.
> > 
> > No, the "golf-ball" only holds this many characters.  If we need more, 
> > we can just get the operator to switch balls in the middle of printing.
> > 
> > No. 2 digit years is enough.  This world won't last till the millennium 
> > anyway.
> > 
> > No.  2k is all the EPROM you can have.  Your code HAS to fit in it, and 
> > only 1.5k RAM.
> > 
> > No.  640k is more than anyone could need.
> > 
> > No, you cannot use a punch card made on a model 26 keypunch in the same 
> > deck as one made on a model 29.  Too bad, many of the codes are 
> > different.  (This one cost me travel back and forth between two 
> > different locations with different model keypunches)
> > 
> > No. 8 bits is as much as we could ever use for characters.  Who could 
> > possibly need names or locations outside of this region?  Or from 
> > multiple places within it?
> > 
> > 35 years ago I helped design a serial terminal that "spoke" Chinese, 
> > using a two-byte encoding.  But a single worldwide standard didn't come 
> > until much later, and I cheered Unicode when it was finally unveiled.
> > 
> > I've worked with many printers that could only print 70 or 80 unique 
> > characters.  The laser printer, and even the matrix printer are 
> > relatively recent inventions.
> 
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

Dave's list above of instances of 'poverty is a good idea' turning out stupid 
and narrow-minded in hindsight is neat.  Thought I'd ack that explicitly.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Chris Angelico
On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody  wrote:
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

From that post:

"""
5.1 Gibberish

When going from the original 2-byte unicode (around version 3?) to the
one having supplemental planes, the unicode consortium added blocks
such as

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

To me (a layman) it looks unprofessional – as though they are playing
games – that billions of computing devices, each having billions of
storage words should have their storage wasted on blocks such as
these.
"""

The shift from Unicode as a 16-bit code to having multiple planes came
in with Unicode 2.0, but the various blocks were assigned separately:
* Egyptian hieroglyphs: Unicode 5.2
* Cuneiform: Unicode 5.0
* Shavian: Unicode 4.0
* Deseret: Unicode 3.1
* Mahjong Tiles: Unicode 5.1
* Klingon: Not part of any current standard

However, I don't think historians will appreciate you calling all of
these "gibberish". To adequately describe and discuss old texts
without these Unicode blocks, we'd have to either do everything with
images, or craft some kind of reversible transliteration system and
have dedicated software to render the texts on screen. Instead, what
we have is a well-known and standardized system for transliterating
all of these into numbers (code points), and rendering them becomes a
simple matter of installing an appropriate font.

Also, how does assigning meanings to codepoints "waste storage"? As
soon as Unicode 2.0 hit and 16-bit code units stopped being
sufficient, everyone needed to allocate storage - either 32 bits per
character, or some other system - and the fact that some codepoints
were unassigned had absolutely no impact on that. This is decidedly
NOT unprofessional, and it's not wasteful either.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Chris Angelico
On Fri, Feb 27, 2015 at 12:12 AM, m  wrote:
> On 25.02.2015 21:45, Mark Lawrence wrote:
>>
>> http://www.slideshare.net/pydanny/python-worst-practices
>>
>> Any that should be added to this list?  Any that should be removed as
>> not that bad?
>>
>
>
> I disagree with slide 16. If I wanted to use long variable names, I would
> still code in Java.

Clearly you aren't bothered by ambiguities, given that your name is
"m". You're lower-case m, and the James Bond character is upper-case
M... yeah, this isn't going to be a problem, with seven billion people
on the planet!

In case it's not obvious from slide 17, the author is advocating
neither the ridiculously short, nor the ridiculously long. This is a
topic that you could go into great detail on, but a general rule of
thumb is that short names go with short-lived variables, and longer
names go with large-scope variables. [1] So your function names
shouldn't be single letters, but your loop counters can and should be
short:

def discard_all_spam():
    for msg in self.messages:
        if msg.is_spam(): msg.discard()

And of course, the use of "i" as an integer loop index dates back so
far and is so well known that you don't need anything else:

def get_password():
    for i in range(4):
        if i: print("%d wrong tries..." % i)
        s = input("What's the password? ")
        if validate_password(s): return s
    print("Too many wrong tries, go away.")

This isn't Java coding.

ChrisA

[1] Yes, Python doesn't have variables per se. But how else am I
supposed to differentiate between the name and the concept of a name
binding?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Grant Edwards
On 2015-02-26, Ben Finney  wrote:
> Chris Angelico  writes:
>
>> IMO the whole system of boolean logic in shell scripts is a massive
>> pile of hacks.
>
> Agreed. It bears all the hallmarks of a system which has been
> extended to become a complete programming language only with extreme
> reluctance on its part.
>
> I continue to be impressed by how capable and powerful Unix shell is
> as a full programming language. Especially it is sorely missed on
> other OSes which lack a capable shell.
>
> But it could never be called “elegant”.

Unless you've spent all day working on PHP code.

-- 
Grant Edwards   grant.b.edwardsYow! All of life is a blur
  at   of Republicans and meat!
  gmail.com
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is anyone else unable to log into the bug tracker?

2015-02-26 Thread Malik Rumi
On Friday, January 9, 2015 at 7:49:09 PM UTC-6, Steven D'Aprano wrote:
> I'm having trouble logging into the bug tracker. Is anyone else having the
> same problem, that is, your user name and password worked earlier but
> doesn't work now?
> 
> http://bugs.python.org/
> 
> (Yes, I've checked the capslock key.)
> 
> 
> 
> Before I request a new password, I want to check whether it is me or
> everyone.
> 
> 
> -- 
> Steven



I am having this problem, even after I requested a new password. All I get is 
'invalid login'. How did you resolve? Thx.
-- 
https://mail.python.org/mailman/listinfo/python-list


Windows permission error, 64 bit, psycopg2, python 3.4.2

2015-02-26 Thread Malik Rumi
I am one of those struggling with compile issues with Python on 64-bit Windows.
I have not been able to get the solutions mentioned on Stack Overflow to work
because installing Windows SDK 7.1 fails for me.

So I stumbled across a precompiled psycopg2, and that reported that it worked,
but then I got two permission errors. Then I read that this was a bug in Python
(issue 14252) that had been fixed, but I don't think this is the same error:
that one specifically refers to subprocess.py, and I don't have that in my
traceback. I have v3.4.2. On top of everything else, despite requesting a new
password, all I get from the bug tracker is 'invalid login'.

In any event, running "import psycopg2" returns 'import error, no module named
psycopg2'.


Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

C:\Users\Semantic>pip install git+https://github.com/nwcell/psycopg2-windows.git@win64-py34#egg=psycopg2
Downloading/unpacking psycopg2 from git+https://github.com/nwcell/psycopg2-windows.git@win64-py34
  Cloning https://github.com/nwcell/psycopg2-windows.git (to win64-py34) to c:\users\semantic\appdata\local\temp\pip_build_semantic\psycopg2
  Running setup.py (path:C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic\psycopg2\setup.py) egg_info for package psycopg2
    C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
      warnings.warn(msg)

Installing collected packages: psycopg2
  Running setup.py install for psycopg2
    C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
      warnings.warn(msg)

Successfully installed psycopg2
Cleaning up...
Exception:
Traceback (most recent call last):
  File "C:\Python34\lib\shutil.py", line 370, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Semantic\\AppData\\Local\\Temp\\pip_build_Semantic\\psycopg2\\.git\\objects\\pack\\pack-be4d3da4a06b4c9ec4c06040dbf6685eeccca068.idx'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "C:\Python34\lib\site-packages\pip\commands\install.py", line 302, in run
    requirement_set.cleanup_files(bundle=self.bundle)
  File "C:\Python34\lib\site-packages\pip\req.py", line 1333, in cleanup_files
    rmtree(dir)
  File "C:\Python34\lib\site-packages\pip\util.py", line 43, in rmtree
    onerror=rmtree_errorhandler)
  File "C:\Python34\lib\shutil.py", line 477, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 372, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Python34\lib\site-packages\pip\util.py", line 53, in rmtree_errorhandler
    (exctype is PermissionError and value.args[3] == 5) #python3.3
IndexError: tuple index out of range
-- 
https://mail.python.org/mailman/listinfo/python-list


EuroPython 2015: Launch preparations are underway

2015-02-26 Thread M.-A. Lemburg
The EuroPython Workgroups are busy preparing the launch of the
website. Launched in mid-January, all workgroups (WGs) are running at
full steam by now, working hard to make EuroPython 2015 a fabulous
event.

http://ep2015.europython.eu/


Community building the conference
---------------------------------

The *On-site Team WG* is doing a wonderful job getting us the best
possible deals in Bilbao, the *Web WG* is knee deep in code and
docker containers setting up the website, the *Marketing & Design WG*
is working with the designers to create wonderful logos and brochures,
the *Program WG* is contacting keynote speakers and creating the call
for proposals, the *Finance WG* is building the budget and making sure
the conference stays affordable for everyone, the *Support WG* is
setting up the online help desk to answer your questions, the
*Communications WG* is preparing to create a constant stream of
exciting news updates, and the *Administration WG* is managing the
many accounts, contracts and services needed to run the organization.

The *Financial Aid WG* and *Media WG* are preparing to start their
part of the conference organization later in March.

http://www.europython-society.org/workgroups

The WGs are all staffed with members from the ACPySS on-site team, the
EuroPython Society and volunteers from the EuroPython community to
drive the organization forward and we’re getting a lot done in a very
short time frame.


More help needed
----------------

We are very happy with the help we are getting from the community, but
there still is a lot more to be done. If you want to help us build a
great EuroPython conference, please consider joining one of the above
workgroups:

http://www.europython-society.org/workgroups

Stay tuned and be sure to follow the EuroPython Blog for updates on
the conference:

http://blog.europython.eu/

Enjoy,
-
EuroPython Society (EPS)
http://www.europython-society.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Installing PIL without internet access

2015-02-26 Thread Larry Martell
I have a host that has no access to the internet and I need to install
PIL on it. I have an identical host that is on the internet and I have
installed it there (with pip). Is there a way I can copy files from
the connected host to a flash drive and then copy them to the
unconnected host and have PIL working there? Which files would I copy
for that?
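(One approach that usually works when both hosts match in OS, architecture, and Python version: stage the packages with pip on the connected host, then install from the local directory offline. Paths below are hypothetical, and the `--download` spelling matches pip of that era — newer pips use `pip download`:)

```shell
# On the connected host: fetch PIL/Pillow plus dependencies into a
# directory you can copy to the flash drive.
PKG_DIR=/tmp/pil_pkgs
mkdir -p "$PKG_DIR"
pip install --download "$PKG_DIR" Pillow   # older pip; newer: pip download -d

# Copy $PKG_DIR to the flash drive, then on the unconnected host:
pip install --no-index --find-links "$PKG_DIR" Pillow

echo "staged in $PKG_DIR"
```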

This is on CentOS 6.5, Python 2.7.

Thanks!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: EuroPython 2015: Launch preparations are underway

2015-02-26 Thread Chris Angelico
On Fri, Feb 27, 2015 at 2:16 AM, M.-A. Lemburg  wrote:
> [ a whole lot of relatively sane text ]

Sadly, this was not what I wanted to see, based on the subject line. I
wanted to know about the snake you guys were about to send into space!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parallelization of Python on GPU?

2015-02-26 Thread Jason Swails
On Wed, 2015-02-25 at 18:35 -0800, John Ladasky wrote:
> I've been working with machine learning for a while.  Many of the
> standard packages (e.g., scikit-learn) have fitting algorithms which
> run in single threads.  These algorithms are not themselves
> parallelized.  Perhaps, due to their unique mathematical requirements,
> they cannot be parallelized.  
> 
> When one is investigating several potential models of one's data with
> various settings for free parameters, it is still sometimes possible
> to speed things up.  On a modern machine, one can use Python's
> multiprocessing.Pool to run separate instances of scikit-learn fits.
> I am currently using ten of the twelve 3.3 GHz CPU cores on my machine
> to do just that.  And I can still browse the web with no observable
> lag.  :^)
> 
> Still, I'm waiting hours for jobs to finish.  Support vector
> regression fitting is hard.
> 
> What I would REALLY like to do is to take advantage of my GPU.  My
> NVidia graphics card has 1152 cores and a 1.0 GHz clock.  I wouldn't
> mind borrowing a few hundred of those GPU cores at a time, and see
> what they can do.  In theory, I calculate that I can speed up the job
> by another five-fold.
> 
> The trick is that each process would need to run some PYTHON code, not
> CUDA or OpenCL.  The child process code isn't particularly fancy.  (I
> should, for example, be able to switch that portion of my code to
> static typing.)
> 
> What is the most effective way to accomplish this task?

GPU computing is a lot more than simply saying "run this on a GPU".  To
realize the performance gains promised by a GPU, you need to tailor your
algorithms to take advantage of their hardware... SIMD reigns supreme
where thread divergence and branching are far more expensive than they
are in CPU computing.  So even if you decide to somehow translate your
Python code into a CUDA kernel, there is a good chance that you will be
woefully disappointed in the resulting speedup (or even moreso if you
actually get a slowdown :)).  For example, a simple reduction is more
expensive on a GPU than it is on a CPU for small arrays.  A dot product,
for example, has a part that's super fast on the GPU (element-by-element
multiplication), and then a part that gets a lot slower (summing up all
elements of the resulting multiplication).  Each core on the GPU is a
lot slower than a CPU (which is why a 1000-CUDA-core GPU doesn't run
anywhere near 1000x faster than a CPU), so you really only get gains
when they can all work efficiently together.
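(The dot-product decomposition above, in NumPy terms — the two stages that parallelize very differently on a GPU:)

```python
import numpy as np

a = np.arange(4.0)        # [0., 1., 2., 3.]
b = np.ones(4)

elementwise = a * b       # embarrassingly parallel: fast on a GPU
total = elementwise.sum() # reduction: needs cross-core coordination

assert total == a.dot(b)  # same answer as the fused dot product
print(total)
```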

Another example -- matrix multiplies are *fast*.  Diagonalizations are
slow (which is why in my field where diagonalizations are common
requirements, they are often done on the CPU while *building* the matrix
is done on the GPU).
> 
> I came across a reference to a package called "Urutu" which may be
> what I need, however it doesn't look like it is widely supported.

Urutu seems to be built on PyCUDA and PyOpenCL (which are both written
by the same person; Andreas Kloeckner at UIUC in the United States).

Another package I would suggest looking into is numba, from Continuum
Analytics: https://github.com/numba/numba.  Unlike Urutu, their package
is built on LLVM and Python bindings they've written to implement
numpy-aware JIT capabilities.  I believe they also permit compiling down
to a GPU kernel through LLVM.  One downside I've experienced with that
package is that LLVM does not yet have a stable API (as I understand
it), so they often lag behind support for the latest versions of LLVM.
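As a rough illustration of numba's numpy-aware JIT (a sketch only: it assumes the third-party `numba` and `numpy` packages, and falls back to pure Python when they are absent):

```python
# Sketch of numba's JIT: the same loop, compiled to machine code on
# first call.  Assumes the third-party `numba`/`numpy` packages and
# falls back to plain Python if they are not installed.
try:
    import numpy as np
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

def dot_py(a, b):
    # plain-Python reference implementation of a dot product
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

if HAVE_NUMBA:
    @njit
    def dot_jit(a, b):
        total = 0.0
        for i in range(a.shape[0]):
            total += a[i] * b[i]
        return total
    result = dot_jit(np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]))
else:
    result = dot_py([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])

print(result)  # 32.0
```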
> 
> I would love it if the Python developers themselves added the ability
> to spawn GPU processes to the Multiprocessing module!

I would be stunned if this actually happened.  If you're worried about
performance, you get at least an order of magnitude performance boost by
going to numpy or writing the kernel directly in C or Fortran.  CPython
itself just isn't structured to run on a GPU... maybe pypy will tackle
that at some point in the probably-distant future.

All the best,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

-- 
https://mail.python.org/mailman/listinfo/python-list


ANN: Wing IDE 5.1.2 released

2015-02-26 Thread Wingware

Hi,

Wingware has released version 5.1.2 of Wing IDE, our cross-platform 
integrated development environment for the Python programming language.


Wing IDE features a professional code editor with vi, emacs, visual 
studio, and other key bindings, auto-completion, call tips, 
context-sensitive auto-editing, goto-definition, find uses, refactoring, 
a powerful debugger, version control, unit testing, search, project 
management, and many other features.


This minor release includes the following improvements:

  Support for recent Google App Engine versions
  Expanded and improved static analysis for PyQt
  Added class and instance attributes to the Find Symbol dialog
  Support recursive invocation of snippets, auto-invocation arg entry, 
and field-based auto-editing operations (e.g. :try applied to a selected 
range)

  Support for python3-pylint
  Code sign all exe, dll, and pyd files on Windows
  Fix a number of child process debugging scenarios
  Fix source assistant formatting of PEP287 fields with long fieldname
  Fix indent level for pasted text after single undo for indent adjustment
  Fix introduce variable refactoring and if (exp): statements
  About 12 other bug fixes; see 
http://wingware.com/pub/wingide/5.1.2/CHANGELOG.txt


What's New in Wing 5.1:

Wing IDE 5.1 adds multi-process and child process debugging, syntax 
highlighting in the shells, persistent time-stamped unit test results, 
auto-conversion of indents on paste, an XCode keyboard personality, 
support for Flask, Django 1.7 & recent Google App Engine versions, 
improved auto-completion for PyQt, recursive snippet invocation, and 
many other minor features and improvements.  For details see 
http://wingware.com/news/2015-02-25


Free trial: http://wingware.com/wingide/trial
Downloads: http://wingware.com/downloads
Feature list: http://wingware.com/wingide/features
Sales: http://wingware.com/store/purchase
Upgrades: https://wingware.com/store/upgrade

Questions?  Don't hesitate to email us at supp...@wingware.com.

Thanks,

--

Stephan Deibel
Wingware | Python IDE

The Intelligent Development Environment for Python Programmers

wingware.com




Re: Parallelization of Python on GPU?

2015-02-26 Thread Jason Swails
On Thu, 2015-02-26 at 14:02 +1100, Steven D'Aprano wrote:
> John Ladasky wrote:
> 
> 
> > What I would REALLY like to do is to take advantage of my GPU.
> 
> I can't help you with that, but I would like to point out that GPUs 
> typically don't support IEEE-754 maths, which means that while they are 
> likely significantly faster, they're also likely significantly less 
> accurate. And any two different brands/models of GPU are likely to give 
> different results. (Possibly not *very* different, but considering the mess 
> that floating point maths was prior to IEEE-754, possibly *very* different.)

This hasn't been true in NVidia GPUs manufactured since ca. 2008.

> Personally, I wouldn't trust GPU floating point for serious work. Maybe for 
> quick and dirty exploration of the data, but I'd then want to repeat any 
> calculations using the main CPU before using the numbers anywhere :-)

There is a *huge* dash toward GPU computing in the scientific computing
sector.  Since I started as a graduate student in computational
chemistry/physics in 2008, I watched as state-of-the-art supercomputers
running tens of thousands to hundreds of thousands of cores were
overtaken in performance by a $500 GPU (today the GTX 780 or 980) you
can put in a desktop.  I went from running all of my calculations on a
CPU cluster in 2009 to running 90% of my calculations on a GPU by the
time I graduated in 2013... and for people without as ready access to
supercomputers as myself the move was even more pronounced.

This work is very serious, and numerical precision is typically of
immense importance.  See, e.g.,
http://www.sciencedirect.com/science/article/pii/S0010465512003098 and
http://pubs.acs.org/doi/abs/10.1021/ct400314y

In our software, we can run simulations on a GPU or a CPU and the
results are *literally* indistinguishable.  The transition to GPUs was
accompanied by a series of studies that investigated precisely your
concerns... we would never have started using GPUs if we didn't trust
GPU numbers as much as we did from the CPU.

And NVidia is embracing this revolution (obviously) -- they are putting
a lot of time, effort, and money into ensuring the success of GPU high
performance computing.  It is here to stay in the immediate future, and
refusing to use the technology will leave those that *could* benefit
from it at a severe disadvantage. (That said, GPUs aren't good at
everything, and CPUs are also here to stay.)

And GPU performance gains are outpacing CPU performance gains -- I've
seen about two orders of magnitude improvement in computational
throughput over the past 6 years through the introduction of GPU
computing and improvements in GPU hardware.

All the best,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher



Building C++ modules for python using GNU autotools, automake, whatever

2015-02-26 Thread af300wsm
Hi,

I'm a complete neophyte to the whole use of GNU autotools/automake/auto... .  
(I'm not sure what it should be called anymore.)  Regardless, I'm porting a 
library project, for which I'm a team member, to using this toolset for 
building in Linux.  I'm to the point now of writing the Makefile.am file for 
the actual library.  (There are several other static libraries compiled first 
that are sucked into this shared object file.)  

I found some references here: 
http://www.gnu.org/savannah-checkouts/gnu/automake/manual/html_node/Python.html,
 which seemed to be just what I was after.  However, I've got a big question 
about a file named "module.la" instead of "module.so" which is what we compile 
it to now.

I guess I should have mentioned some background.  Currently, we build this tool 
through some homegrown makefiles.  This has worked, but distribution is 
difficult and our product must now run on an embedded platform (so building it 
cleanly requires the use of autotools).  

Basically, I need this thing to install to /usr/lib/python2.6/site-packages 
when the user invokes "make install".  I thought the variables and primaries 
discussed at the link above were what I needed.  However, what is a "*.la"?  
I'm reading up on libtool now, but will it function the same way as a *.so?  

I need pointers on where to go from here.

Thanks,
Andy


Re: Is anyone else unable to log into the bug tracker?

2015-02-26 Thread Steven D'Aprano
Malik Rumi wrote:

> On Friday, January 9, 2015 at 7:49:09 PM UTC-6, Steven D'Aprano wrote:
>> I'm having trouble logging into the bug tracker. Is anyone else having
>> the same problem, that is, your user name and password worked earlier but
>> doesn't work now?
>> 
>> http://bugs.python.org/
>> 
>> (Yes, I've checked the capslock key.)
[...]
> I am having this problem, even after I requested a new password. All I get
> is 'invalid login'. How did you resolve? Thx.

I was suffering from a PEBCAK error, and was using the wrong password. Once
I started using the right one, it just worked for me.

I seem to recall that you need to accept cookies for the bugtracker to log
you in. Try that and see if it helps.

Sorry that I can't be of more help.


-- 
Steven



Re: Building C++ modules for python using GNU autotools, automake, whatever

2015-02-26 Thread Jason Swails
On Thu, 2015-02-26 at 07:57 -0800, af300...@gmail.com wrote:
> Hi,
> 
> I'm a complete neophyte to the whole use of GNU
> autotools/automake/auto... .  (I'm not sure what it should be called
> anymore.)  Regardless, I'm porting a library project, for which I'm a
> team member, to using this toolset for building in Linux.  I'm to the
> point now of writing the Makefile.am file for the actual library.
> (There are several other static libraries compiled first that are
> sucked into this shared object file.)  
> 
> I found some references here:
> http://www.gnu.org/savannah-checkouts/gnu/automake/manual/html_node/Python.html,
>  which seemed to be just what I was after.  However, I've got a big question 
> about a file named "module.la" instead of "module.so" which is what we 
> compile it to now.

I certainly hope module.la is not what it gets compiled to.  Open it up
with a text editor :).  It's just basically a description of the library
that libtool makes use of.  In the projects that I build, the .la files
are all associated with a .a archive or a .so (/.dylib for Macs).
Obviously, static archives won't work for Python (and, in particular, I
believe you need to compile all of the objects as position independent
code, so you need to make sure the appropriate PIC flag is given to the
compiler... for g++ that would be -fPIC).
> 
> I guess I should have mentioned some background.  Currently, we build
> this tool through some homegrown makefiles.  This has worked, but
> distribution is difficult and our product must now run on an embedded
> platform (so building it cleanly requires the use of autotools).  
> 
> Basically, I need this thing to install
> to /usr/lib/python2.6/site-packages when the user invokes "make
> install".  I thought the variables and primaries discussed at the link
> above were what I needed.  However, what is a "*.la"?  I'm reading up
> on libtool now, but will it function the same way as a *.so?

To libtool, yes... provided that you *also* have the .so with the same
base name as the .la.  I don't think compilers themselves make any use
of .la files, though.
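
A minimal Makefile.am for such a libtool-built Python extension might look like the following (a sketch only; the module and library names are placeholders, and it assumes AM_PATH_PYTHON is invoked in configure.ac):

```makefile
# Sketch of a Makefile.am for a libtool-built Python extension module.
# AM_PATH_PYTHON in configure.ac defines $(pyexecdir), which defaults to
# the detected interpreter's site-packages directory, so "make install"
# puts module.so there.
pyexec_LTLIBRARIES = module.la

module_la_SOURCES = module.cpp
# -module: build a dlopen-able module (no "lib" prefix);
# -avoid-version: install module.so rather than module.so.0.0.0
module_la_LDFLAGS = -module -avoid-version
# Convenience (noinst) libtool libraries built earlier get folded in
# here; libtool compiles their objects as PIC automatically.
module_la_LIBADD = libfirst.la libsecond.la
```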

HTH,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher



Re: Parallelization of Python on GPU?

2015-02-26 Thread Sturla Molden
If you are doing SVM regression with scikit-learn you are using libSVM.
There is a CUDA accelerated version of this C library here:
http://mklab.iti.gr/project/GPU-LIBSVM

You can presumably reuse the wrapping code from scikit-learn.

Sturla


John Ladasky  wrote:
> I've been working with machine learning for a while.  Many of the
> standard packages (e.g., scikit-learn) have fitting algorithms which run
> in single threads.  These algorithms are not themselves parallelized. 
> Perhaps, due to their unique mathematical requirements, they cannot be 
> parallelized.  
> 
> When one is investigating several potential models of one's data with
> various settings for free parameters, it is still sometimes possible to
> speed things up.  On a modern machine, one can use Python's
> multiprocessing.Pool to run separate instances of scikit-learn fits.  I
> am currently using ten of the twelve 3.3 GHz CPU cores on my machine to
> do just that.  And I can still browse the web with no observable lag.  :^)
> 
> Still, I'm waiting hours for jobs to finish.  Support vector regression 
> fitting is hard.
> 
> What I would REALLY like to do is to take advantage of my GPU.  My NVidia
> graphics card has 1152 cores and a 1.0 GHz clock.  I wouldn't mind
> borrowing a few hundred of those GPU cores at a time, and see what they
> can do.  In theory, I calculate that I can speed up the job by another 
> five-fold.
> 
> The trick is that each process would need to run some PYTHON code, not
> CUDA or OpenCL.  The child process code isn't particularly fancy.  (I
> should, for example, be able to switch that portion of my code to static 
> typing.)
> 
> What is the most effective way to accomplish this task?
> 
> I came across a reference to a package called "Urutu" which may be what I
> need, however it doesn't look like it is widely supported.
> 
> I would love it if the Python developers themselves added the ability to
> spawn GPU processes to the Multiprocessing module!
> 
> Thanks for any advice and comments.



Re: GDAL Installation in Enthought Python Distribution

2015-02-26 Thread Terry Reedy

On 2/26/2015 2:47 AM, Leo Kris Palao wrote:


Would like to request how to install GDAL in my Enthought Python
Distribution (64-bit).


The best place to ask about the Enthought Python Distribution is a list 
devoted to the E. P. D.



--
Terry Jan Reedy



Re: Is anyone else unable to log into the bug tracker?

2015-02-26 Thread Skip Montanaro
I have not had problems, but I use the Google login (Open ID, I presume)
option.

Skip


Re: Newbie question about text encoding

2015-02-26 Thread Sam Raker
I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps 
solve the problems affecting speakers of "underserved" languages--access and 
language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all 
deserve to be able to interact with technology in their native languages as 
much as we speakers of ASCII-friendly languages do. Unicode support also makes 
writing papers on, dictionaries of, and new texts in such languages much 
easier, which helps the fight against language extinction, which is a sadly 
pressing issue.

Also, like, computers are big. Get an external drive for your high-resolution 
PDF collection of Medieval manuscripts if you feel like you're running out of 
space. A few extra codepoints aren't going to be the straw that breaks the 
camel's back.


On Thursday, February 26, 2015 at 8:24:34 AM UTC-5, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody  wrote:
> > Wrote something up on why we should stop using ASCII:
> > http://blog.languager.org/2015/02/universal-unicode.html
> 
> >From that post:
> 
> """
> 5.1 Gibberish
> 
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
> 
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
> 
> To me (a layman) it looks unprofessional - as though they are playing
> games - that billions of computing devices, each having billions of
> storage words should have their storage wasted on blocks such as
> these.
> """
> 
> The shift from Unicode as a 16-bit code to having multiple planes came
> in with Unicode 2.0, but the various blocks were assigned separately:
> * Egyptian hieroglyphs: Unicode 5.2
> * Cuneiform: Unicode 5.0
> * Shavian: Unicode 4.0
> * Deseret: Unicode 3.1
> * Mahjong Tiles: Unicode 5.1
> * Klingon: Not part of any current standard
> 
> However, I don't think historians will appreciate you calling all of
> these "gibberish". To adequately describe and discuss old texts
> without these Unicode blocks, we'd have to either do everything with
> images, or craft some kind of reversible transliteration system and
> have dedicated software to render the texts on screen. Instead, what
> we have is a well-known and standardized system for transliterating
> all of these into numbers (code points), and rendering them becomes a
> simple matter of installing an appropriate font.
> 
> Also, how does assigning meanings to codepoints "waste storage"? As
> soon as Unicode 2.0 hit and 16-bit code units stopped being
> sufficient, everyone needed to allocate storage - either 32 bits per
> character, or some other system - and the fact that some codepoints
> were unassigned had absolutely no impact on that. This is decidedly
> NOT unprofessional, and it's not wasteful either.
> 
> ChrisA



Re: Parallelization of Python on GPU?

2015-02-26 Thread Sturla Molden
GPU computing is great if you have the following:

1. Your data structures are arrays of floating point numbers.
2. You have a data-parallel problem.
3. You are happy with single precision.
4. You have time to code everything in CUDA or OpenCL.
5. You have enough video RAM to store your data.

For Python the easiest solution is to use Numba Pro.

Sturla


Jason Swails  wrote:
> On Thu, 2015-02-26 at 14:02 +1100, Steven D'Aprano wrote:
>> John Ladasky wrote:
>> 
>> 
>>> What I would REALLY like to do is to take advantage of my GPU.
>> 
>> I can't help you with that, but I would like to point out that GPUs 
>> typically don't support IEEE-754 maths, which means that while they are 
>> likely significantly faster, they're also likely significantly less 
>> accurate. And any two different brands/models of GPU are likely to give 
>> different results. (Possibly not *very* different, but considering the mess 
>> that floating point maths was prior to IEEE-754, possibly *very* different.)
> 
> This hasn't been true in NVidia GPUs manufactured since ca. 2008.
> 
>> Personally, I wouldn't trust GPU floating point for serious work. Maybe for 
>> quick and dirty exploration of the data, but I'd then want to repeat any 
>> calculations using the main CPU before using the numbers anywhere :-)
> 
> There is a *huge* dash toward GPU computing in the scientific computing
> sector.  Since I started as a graduate student in computational
> chemistry/physics in 2008, I watched as state-of-the-art supercomputers
> running tens of thousands to hundreds of thousands of cores were
> overtaken in performance by a $500 GPU (today the GTX 780 or 980) you
> can put in a desktop.  I went from running all of my calculations on a
> CPU cluster in 2009 to running 90% of my calculations on a GPU by the
> time I graduated in 2013... and for people without as ready access to
> supercomputers as myself the move was even more pronounced.
> 
> This work is very serious, and numerical precision is typically of
> immense importance.  See, e.g.,
> http://www.sciencedirect.com/science/article/pii/S0010465512003098 and
> http://pubs.acs.org/doi/abs/10.1021/ct400314y
> 
> In our software, we can run simulations on a GPU or a CPU and the
> results are *literally* indistinguishable.  The transition to GPUs was
> accompanied by a series of studies that investigated precisely your
> concerns... we would never have started using GPUs if we didn't trust
> GPU numbers as much as we did from the CPU.
> 
> And NVidia is embracing this revolution (obviously) -- they are putting
> a lot of time, effort, and money into ensuring the success of GPU high
> performance computing.  It is here to stay in the immediate future, and
> refusing to use the technology will leave those that *could* benefit
> from it at a severe disadvantage. (That said, GPUs aren't good at
> everything, and CPUs are also here to stay.)
> 
> And GPU performance gains are outpacing CPU performance gains -- I've
> seen about two orders of magnitude improvement in computational
> throughput over the past 6 years through the introduction of GPU
> computing and improvements in GPU hardware.
> 
> All the best,
> Jason



Re: Installing PIL without internet access

2015-02-26 Thread MRAB

On 2015-02-26 15:23, Larry Martell wrote:

I have a host that has no access to the internet and I need to install
PIL on it. I have an identical host that is on the internet and I have
installed it there (with pip). Is there a way I can copy files from
the connected host to a flash drive and then copy them to the
unconnected host and have PIL working there? Which files would I copy
for that?

This is on CentOS 6.5, python 2.7


Have a look here:

https://pip.pypa.io/en/latest/reference/pip_install.html#pip-install-options

It says that you can install from a downloaded file, e.g.:

pip install ./downloads/SomePackage-1.0.4.tar.gz
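
A sketch of the whole round trip, using Pillow (the maintained PIL fork) as the package name; note that the `pip download` subcommand only appeared in later pip releases, pip of this era used `pip install --download` instead:

```shell
# On the connected host: fetch the package and its dependencies as files
pip install --download ./pkgs Pillow    # newer pip: pip download -d ./pkgs Pillow

# Copy ./pkgs to the flash drive, then on the offline host:
pip install --no-index --find-links=./pkgs Pillow
```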



Re: Newbie question about text encoding

2015-02-26 Thread Terry Reedy

On 2/26/2015 8:24 AM, Chris Angelico wrote:

On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody  wrote:

Wrote something up on why we should stop using ASCII:
http://blog.languager.org/2015/02/universal-unicode.html


I think that the main point of the post, that many Unicode chars are 
truly planetary rather than just national/regional, is excellent.



 From that post:

"""
5.1 Gibberish

When going from the original 2-byte unicode (around version 3?) to the
one having supplemental planes, the unicode consortium added blocks
such as

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

To me (a layman) it looks unprofessional – as though they are playing
games – that billions of computing devices, each having billions of
storage words should have their storage wasted on blocks such as
these.
"""

The shift from Unicode as a 16-bit code to having multiple planes came
in with Unicode 2.0, but the various blocks were assigned separately:
* Egyptian hieroglyphs: Unicode 5.2
* Cuneiform: Unicode 5.0
* Shavian: Unicode 4.0
* Deseret: Unicode 3.1
* Mahjong Tiles: Unicode 5.1
* Klingon: Not part of any current standard


You should add emoticons, but not call them or the above 'gibberish'.
I think that this part of your post is more 'unprofessional' than the 
character blocks.  It is very jarring and seems contrary to your main point.



However, I don't think historians will appreciate you calling all of
these "gibberish". To adequately describe and discuss old texts
without these Unicode blocks, we'd have to either do everything with
images, or craft some kind of reversible transliteration system and
have dedicated software to render the texts on screen. Instead, what
we have is a well-known and standardized system for transliterating
all of these into numbers (code points), and rendering them becomes a
simple matter of installing an appropriate font.

Also, how does assigning meanings to codepoints "waste storage"? As
soon as Unicode 2.0 hit and 16-bit code units stopped being
sufficient, everyone needed to allocate storage - either 32 bits per
character, or some other system - and the fact that some codepoints
were unassigned had absolutely no impact on that. This is decidedly
NOT unprofessional, and it's not wasteful either.


I agree.

--
Terry Jan Reedy




Re: Newbie question about text encoding

2015-02-26 Thread Rustom Mody
On Thursday, February 26, 2015 at 10:16:11 PM UTC+5:30, Sam Raker wrote:
> I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so 
> helps solve the problems affecting speakers of "underserved" 
> languages--access and language preservation. Speakers of Mongolian, Cherokee, 
> Georgian, etc. all deserve to be able to interact with technology in their 
> native languages as much as we speakers of ASCII-friendly languages do. 
> Unicode support also makes writing papers on, dictionaries of, and new texts 
> in such languages much easier, which helps the fight against language 
> extinction, which is a sadly pressing issue.


Agreed -- Correcting the inequities caused by ASCII-bias is a good thing.

In fact, the whole point of my post was to say just that: by carving out and 
focussing on a 'universal' subset of Unicode that is considerably larger than 
ASCII but smaller than all of Unicode, we stand to reduce ASCII-bias.

As do other posts of mine, like:
http://blog.languager.org/2014/04/unicoded-python.html
http://blog.languager.org/2014/05/unicode-in-haskell-source.html

However my example listed

> > * Egyptian hieroglyphs
> > * Cuneiform
> > * Shavian
> > * Deseret
> > * Mahjong
> > * Klingon

OK, Chris has corrected me re. Klingon-in-Unicode, so let's drop that.
Of the others, which do you think is in the 'underserved' category?

More generally, which of
http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Supplementary_Multilingual_Plane
are underserved?


Re: Installing PIL without internet access

2015-02-26 Thread Terry Reedy

On 2/26/2015 10:23 AM, Larry Martell wrote:

I have a host that has no access to the internet and I need to install
PIL on it. I have an identical host that is on the internet and I have
installed it there (with pip). Is there a way I can copy files from
the connected host to a flash drive and then copy them to the
unconnected host and have PIL working there? Which files would I copy
for that?

This is on CentOS 6.5, python 2.7


On Windows, I would
look in python27/Lib/site-packages for PIL and pil-dist-info directories 
and copy.

look in python27/script for pil*.py and copy


--
Terry Jan Reedy



Re: Parallelization of Python on GPU?

2015-02-26 Thread Terry Reedy

On 2/26/2015 10:06 AM, Jason Swails wrote:

On Thu, 2015-02-26 at 14:02 +1100, Steven D'Aprano wrote:

John Ladasky wrote:



What I would REALLY like to do is to take advantage of my GPU.


I can't help you with that, but I would like to point out that GPUs
typically don't support IEEE-754 maths, which means that while they are
likely significantly faster, they're also likely significantly less
accurate. And any two different brands/models of GPU are likely to give
different results. (Possibly not *very* different, but considering the mess
that floating point maths was prior to IEEE-754, possibly *very* different.)


This hasn't been true in NVidia GPUs manufactured since ca. 2008.


Personally, I wouldn't trust GPU floating point for serious work. Maybe for
quick and dirty exploration of the data, but I'd then want to repeat any
calculations using the main CPU before using the numbers anywhere :-)


There is a *huge* dash toward GPU computing in the scientific computing
sector.  Since I started as a graduate student in computational
chemistry/physics in 2008, I watched as state-of-the-art supercomputers
running tens of thousands to hundreds of thousands of cores were
overtaken in performance by a $500 GPU (today the GTX 780 or 980) you
can put in a desktop.  I went from running all of my calculations on a
CPU cluster in 2009 to running 90% of my calculations on a GPU by the
time I graduated in 2013... and for people without as ready access to
supercomputers as myself the move was even more pronounced.

This work is very serious, and numerical precision is typically of
immense importance.  See, e.g.,
http://www.sciencedirect.com/science/article/pii/S0010465512003098 and
http://pubs.acs.org/doi/abs/10.1021/ct400314y

In our software, we can run simulations on a GPU or a CPU and the
results are *literally* indistinguishable.  The transition to GPUs was
accompanied by a series of studies that investigated precisely your
concerns... we would never have started using GPUs if we didn't trust
GPU numbers as much as we did from the CPU.

And NVidia is embracing this revolution (obviously) -- they are putting
a lot of time, effort, and money into ensuring the success of GPU high
performance computing.  It is here to stay in the immediate future, and
refusing to use the technology will leave those that *could* benefit
from it at a severe disadvantage. (That said, GPUs aren't good at
everything, and CPUs are also here to stay.)

And GPU performance gains are outpacing CPU performance gains -- I've
seen about two orders of magnitude improvement in computational
throughput over the past 6 years through the introduction of GPU
computing and improvements in GPU hardware.


Thanks for the update.

--
Terry Jan Reedy



Re: Newbie question about text encoding

2015-02-26 Thread Chris Angelico
On Fri, Feb 27, 2015 at 4:02 AM, Terry Reedy  wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
>>
>> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody 
>> wrote:
>>>
>>> Wrote something up on why we should stop using ASCII:
>>> http://blog.languager.org/2015/02/universal-unicode.html
>
>
> I think that the main point of the post, that many Unicode chars are truly
> planetary rather than just national/regional, is excellent.

Agreed. Like you, though, I take exception at the "Gibberish" section.

Unicode offers us a number of types of character needed by linguists:

1) Letters[1] common to many languages, such as the unadorned Latin
and Cyrillic letters
2) Letters specific to one or very few languages, such as the Turkish dotless i
3) Diacritical marks, ready to be combined with various letters
4) Precomposed forms of various common "letter with diacritical" combinations
5) Other precomposed forms, eg ligatures and Hangul syllables
6) Symbols, punctuation, and various other marks
7) Spacing of various widths and attributes

Apart from #4 and #5, which could be avoided by using the decomposed
forms everywhere, each of these character types is vital. You can't
typeset a document without being able to adequately represent every
part of it. Then there are additional characters that aren't strictly
necessary, but are extremely convenient, such as the emoticon
sections. You can talk in text and still put in a nice little picture
of a globe, or the monkey-no-evil set, etc.

Most of these characters - in fact, all except #2 and maybe a few of
the diacritical marks - are used in multiple places/languages. Unicode
isn't about taking everyone's separate character sets and numbering
them all so we can reference characters from anywhere; if you wanted
that, you'd be much better off with something that lets you specify a
code page in 16 bits and a character in 8, which is roughly the same
size as Unicode anyway. What we have is, instead, a system that brings
them all together - LATIN SMALL LETTER A is U+0061 no matter whether
it's being used to write English, French, Malaysian, Turkish,
Croatian, Vietnamese, or Icelandic text. Unicode is truly planetary.
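
The points above can be checked directly with Python's stdlib `unicodedata` module:

```python
import unicodedata

# One code point, one name, regardless of which language the text is in.
assert unicodedata.name('a') == 'LATIN SMALL LETTER A'
assert ord('a') == 0x61

# Precomposed vs. decomposed forms (types 3/4 above): NFC normalization
# maps 'e' + COMBINING ACUTE ACCENT onto the single code point for 'é'.
decomposed = 'e\u0301'
precomposed = '\u00e9'
assert unicodedata.normalize('NFC', decomposed) == precomposed
print(unicodedata.name(precomposed))  # LATIN SMALL LETTER E WITH ACUTE
```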

ChrisA

[1] I use the word "letter" loosely here; Chinese and Japanese don't
have a concept of letters as such, but their glyphs are still
represented.


Re: Parallelization of Python on GPU?

2015-02-26 Thread John Ladasky
On Thursday, February 26, 2015 at 8:41:26 AM UTC-8, Sturla Molden wrote:
> If you are doing SVM regression with scikit-learn you are using libSVM.
> There is a CUDA accelerated version of this C library here:
> http://mklab.iti.gr/project/GPU-LIBSVM
> 
> You can presumably reuse the wrapping code from scikit-learn.
> 
> Sturla

Hi Sturla,  I recognize your name from the scikit-learn mailing list.  

If you look a few posts above yours in this thread, I am aware of gpu-libsvm.  
I don't know if I'm up to the task of reusing the scikit-learn wrapping code, 
but I am giving that option some serious thought.  It isn't clear to me that 
gpu-libsvm can handle both SVM and SVR, and I have need of both algorithms. 

My training data sets are around 5000 vectors long.  IF that graph on the 
gpu-libsvm web page is any indication of what I can expect from my own data (I 
note that they didn't specify the GPU card they're using), I might realize a 
20x increase in speed.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parallelization of Python on GPU?

2015-02-26 Thread Jason Swails
On Thu, 2015-02-26 at 16:53 +, Sturla Molden wrote:
> GPU computing is great if you have the following:
> 
> 1. Your data structures are arrays of floating point numbers.

It actually works equally great, if not better, for integers.

> 2. You have a data-parallel problem.

This is the biggest one, IMO. ^^^

> 3. You are happy with single precision.

NVidia GPUs have had double-precision maths in hardware since compute
capability 1.3 (GTX 280).  That's ca. 2008.  In optimized CPU code, you
can still get up to ~50% benefit going from double to single precision
(it's rarely that high, but 20-30% is commonplace in my experience of
optimized code).  It's admittedly a bigger hit on most GPUs, but there
are ways to work around it (e.g., fixed precision), and you can still do
double precision work where it's needed.  One of the articles I linked
previously demonstrates that a hybrid precision model (based on fixed
precision) provides exactly the same numerical stability as double
precision (which is much better than pure single precision) for that
application.

Double precision can often be avoided in many parts of a calculation,
using it only where those bits matter (like accumulators with
potentially small contributions, subtractions of two numbers of similar
magnitude, etc.).
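A quick sketch of the accumulator effect (assuming NumPy is installed):

```python
import numpy as np

# A small contribution added to a large single-precision accumulator
# simply vanishes; in double precision it survives.
acc32 = np.float32(1e8)
acc64 = np.float64(1e8)
print(acc32 + np.float32(1.0) - acc32)   # 0.0 -- lost in float32
print(acc64 + np.float64(1.0) - acc64)   # 1.0 -- kept in float64
```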

> 4. You have time to code everything in CUDA or OpenCL.

This is the second biggest one, IMO. ^^^

> 5. You have enough video RAM to store your data.

Again, it can be worked around, but the frequent GPU->CPU transfers
involved if you can't fit everything on the GPU can devastate
performance, and limiting them can be painstaking.

> 
> For Python the easiest solution is to use Numba Pro.

Agreed, although I've never actually tried PyCUDA before...

All the best,
Jason

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Rustom Mody
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
> > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> >> Wrote something up on why we should stop using ASCII:
> >> http://blog.languager.org/2015/02/universal-unicode.html
> 
> I think that the main point of the post, that many Unicode chars are 
> truly planetary rather than just national/regional, is excellent.
> 
> >  From that post:
> >
> > """
> > 5.1 Gibberish
> >
> > When going from the original 2-byte unicode (around version 3?) to the
> > one having supplemental planes, the unicode consortium added blocks
> > such as
> >
> > * Egyptian hieroglyphs
> > * Cuneiform
> > * Shavian
> > * Deseret
> > * Mahjong
> > * Klingon
> >
> > To me (a layman) it looks unprofessional – as though they are playing
> > games – that billions of computing devices, each having billions of
> > storage words should have their storage wasted on blocks such as
> > these.
> > """
> >
> > The shift from Unicode as a 16-bit code to having multiple planes came
> > in with Unicode 2.0, but the various blocks were assigned separately:
> > * Egyptian hieroglyphs: Unicode 5.2
> > * Cuneiform: Unicode 5.0
> > * Shavian: Unicode 4.0
> > * Deseret: Unicode 3.1
> > * Mahjong Tiles: Unicode 5.1
> > * Klingon: Not part of any current standard
> 
> You should add emoticons, but not call them or the above 'gibberish'.


Emoticons (or is it emoji?) seem to have some (regional?) uptake. Dunno…
In any case I'd like to stay clear of political(izable) questions.


> I think that this part of your post is more 'unprofessional' than the 
> character blocks.  It is very jarring and seems contrary to your main point.

OK, I need a word for:
1. I have no need for this
2. 99.9% of the (living) people on this planet also have no need for this

> 
> > However, I don't think historians will appreciate you calling all of
> > these "gibberish". To adequately describe and discuss old texts
> > without these Unicode blocks, we'd have to either do everything with
> > images, or craft some kind of reversible transliteration system and
> > have dedicated software to render the texts on screen. Instead, what
> > we have is a well-known and standardized system for transliterating
> > all of these into numbers (code points), and rendering them becomes a
> > simple matter of installing an appropriate font.
> >
> > Also, how does assigning meanings to codepoints "waste storage"? As
> > soon as Unicode 2.0 hit and 16-bit code units stopped being
> > sufficient, everyone needed to allocate storage - either 32 bits per
> > character, or some other system - and the fact that some codepoints
> > were unassigned had absolutely no impact on that. This is decidedly
> > NOT unprofessional, and it's not wasteful either.
> 
> I agree.

I clearly am more enthusiastic than knowledgeable about unicode.
But I know my basic CS well enough (as I am sure you and Chris also do)

So I don't get how 4 bytes is not more expensive than 2.
Yes, I know you can squeeze a unicode char into 3 bytes or even 21 bits,
or use a clever representation like UTF-8 or the FSR.
But I don't see how you can get around the conclusion that full unicode
costs more than exclusive-BMP.

eg consider the case of 32 vs 64 bit executables.
The 64 bit executable is generally larger than the 32 bit one
Now consider the case of a machine that has say 2GB RAM and a 64-bit processor.
You could -- I think -- make a reasonable case that all those all-zero 
hi-address-words are 'waste'.
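CPython 3.3+'s flexible string representation (FSR) makes the cost measurable with sys.getsizeof; exact byte counts vary by interpreter version, so only the ordering matters here:

```python
import sys

ascii_s = "a" * 100                    # 1 byte per char under the FSR
bmp_s   = "a" * 99 + "\u20ac"          # one BMP char: 2 bytes per char
full_s  = "a" * 99 + "\U0001F600"      # one astral char: 4 bytes per char

for s in (ascii_s, bmp_s, full_s):
    print(len(s), sys.getsizeof(s))
# Same length in characters, but roughly 1x/2x/4x the storage: a single
# astral character quadruples the cost of the whole string.
```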

And you've got the general sense best so far:
> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional, 

If the general tone/tenor of what I have written is not getting across
because of some word choices (like 'gibberish'?), I'll try to reword.

However let me try and clarify that the whole of section 5 is 'iffy' with 5.1 
being only more extreme.  I've not written these in because the point of that
post is not to criticise unicode but to highlight the universal(isable) parts.

Still if I were to expand on the criticisms here are some examples:

Math-Greek: Consider the math-alpha block
http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block

Now imagine a beginning student not getting the difference between font, glyph,
character.  To me this block represents this same error cast into concrete and
dignified by the (supposed) authority of the unicode consortium.

There are probably dozens of other such stupidities like distinguishing kelvin 
K from latin K, as if that were the business of the unicode consortium.

My real reservations about unicode come from their work in areas that I happen 
to know something about.

Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is 
perhaps OK.
However, all this stuff http://xahlee.info/comp/unicode_music_symbols.html
makes no sense (to me) given tha

Gaussian process regression

2015-02-26 Thread jaykim . huijae
Hi,

I am trying to use Gaussian process regression for near-infrared spectra. I
have reference data (spectra), concentrations of the reference data, and
sample data, and I am trying to predict the concentrations of the sample
data. Here is my code.

from sklearn.gaussian_process import GaussianProcess

gp = GaussianProcess()

gp.fit(reference, concentration)

concentration_pred = gp.predict(sample)


The results always give me the same concentration even though I use different
sample data. When I used parts of the reference data as sample data, it
predicted the concentrations well. But whenever I use data other than the
reference data, it always gives me the same concentration.
Can I get some help with this problem? What am I doing wrong?
I would appreciate any help.

Thanks,
Jay
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Building C++ modules for python using GNU autotools, automake, whatever

2015-02-26 Thread af300wsm
On Thursday, February 26, 2015 at 9:35:12 AM UTC-7, Jason Swails wrote:
> On Thu, 2015-02-26 at 07:57 -0800, af300wsm wrote:
> > Hi,
> > 
> > I'm a complete neophyte to the whole use of GNU
> > autotools/automake/auto... .  (I'm not sure what it should be called
> > anymore.)  Regardless, I'm porting a library project, for which I'm a
> > team member, to using this toolset for building in Linux.  I'm to the
> > point now of writing the Makefile.am file for the actual library.
> > (There are several other static libraries compiled first that are
> > sucked into this shared object file.)  
> > 
> > I found some references here:
> > http://www.gnu.org/savannah-checkouts/gnu/automake/manual/html_node/Python.html,
> >  which seemed to be just what I was after.  However, I've got a big 
> > question about a file named "module.la" instead of "module.so" which is 
> > what we compile it to now.
> 
> I certainly hope module.la is not what it gets compiled to.  Open it up
> with a text editor :).  It's just basically a description of the library

Fascinating!  This is all new territory for me.  I've used these tools for a 
number of years, of course, as I've run "./configure && make && make install" 
many times.  Now things are starting to make more sense.

> that libtool makes use of.  In the projects that I build, the .la files
> are all associated with a .a archive or a .so (/.dylib for Macs).
> Obviously, static archives won't work for Python (and, in particular, I
> believe you need to compile all of the objects as position independent
> code, so you need to make sure the appropriate PIC flag is given to the
> compiler... for g++ that would be -fPIC).

We are compiling all of our code with -fPIC.  I looked over the final build 
line and I see that a module.so was placed in .libs.  I looked in that 
directory and actually the module is named "module.so.0.0.0" and there is a 
symbolic link "module.so" which points to that.  This is cool stuff.


Thanks for the clarification on things.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ANN: Wing IDE 5.1.2 released

2015-02-26 Thread Jim Mooney
Hey, can I run Py 2.7 and 3.4 side by side without a lot of hassle, using Wing? 
I run both since I'm migrating, and so far the free IDEs just seem to choke on 
that.
-- 
https://mail.python.org/mailman/listinfo/python-list


requesting you all to please guide me , which tutorials is best to learn redis database

2015-02-26 Thread Jai
Hello all,

I want to learn the Redis database and its use via Python. Please guide me
on which tutorials I should study, so that I can learn it in a good way.


I searched for this on Google but I am a little confused, so please help me.

Thank you, Jai
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Installing PIL without internet access

2015-02-26 Thread Larry Martell
On Thu, Feb 26, 2015 at 11:57 AM, MRAB  wrote:
> On 2015-02-26 15:23, Larry Martell wrote:
>>
>> I have a host that has no access to the internet and I need to install
>> PIL on it. I have an identical host that is on the internet and I have
>> installed it there (with pip). Is there a way I can copy files from
>> the connected host to a flash drive and then copy them to the
>> unconnected host and have PIL working there? Which files would I copy
>> for that?
>>
>> This is on CentOS 6.5, python 2.7
>>
> Have a look here:
>
> https://pip.pypa.io/en/latest/reference/pip_install.html#pip-install-options
>
> It says that you can install from a downloaded file, e.g.:
>
> pip install ./downloads/SomePackage-1.0.4.tar.gz

Thanks for the reply. This is very useful info. But I have another
issue I didn't mention. The system python is 2.6,
but I need the 2.7 version. So anything I install with pip will get
installed to 2.6. To get around that on my connected hosts I've done:

easy_install-2.7 pip

and then I install with pip2.7.

But this unconnected host doesn't have easy_install-2.7, so I'd have
to figure out how to get that first.

I think it will work if I just copy
/usr/lib64/python2.7/site-packages/PIL. That worked on a test system I
tried it on. I'll try on the real system tonight.
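A sketch of the offline workflow (paths here are placeholders; `pip install --download` is the pip 1.x-era spelling, on newer pip the first step is `pip download -d`):

```shell
# On the connected host: fetch the package and its dependencies as files.
pip2.7 install --download /media/flash/pkgs Pillow

# On the unconnected host: install using only the copied files.
pip2.7 install --no-index --find-links=/media/flash/pkgs Pillow
```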
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Ben Finney
c...@isbd.net writes:

> Ben Finney  wrote:
> > My request at the moment, though, is not for people to change what's
> > on their slides; rather, if they want people to retrieve them, the
> > slides should be downloadable easily (i.e. without a web app,
> > without a registration to some specific site).
>
> ... and having downloaded them what do you view them with if they're
> not plain text?

Again, I was not the one asking for plain text. So I don't really
understand why you ask me that. But, here goes:

Presentations documents, the overwhelming majority, are in a very small
number of formats.

If they're PDF: any PDF viewer <https://pdfreaders.org/>.

If they're a format produced by some widespread presentation tool:
LibreOffice Impress <https://www.libreoffice.org/discover/impress/>.

Why do you ask?

-- 
 \  “The way to build large Python applications is to componentize |
  `\ and loosely-couple the hell out of everything.” —Aahz |
_o__)  |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread alister
On Wed, 25 Feb 2015 23:34:29 +, MRAB wrote:

> On 2015-02-25 22:59, Joel Goldstick wrote:
>  > On Wed, Feb 25, 2015 at 4:28 PM, MRAB 
>  > wrote:
>  > > On 2015-02-25 20:45, Mark Lawrence wrote:
>  > >>
>  > >> http://www.slideshare.net/pydanny/python-worst-practices
>  > >>
>  > >> Any that should be added to this list?  Any that be removed as not
> that
>  > >> bad?
>  > >>
>  > > We don't have numeric ZIP codes in the UK, but the entire world has
>  > > numeric telephone numbers, so that might be a better example of
>  > > numbers that aren't really numbers.
>  >
>  > US zip codes get messed up with ints because many have a leading
>  > zero.
>  > I use strings
> 
> Telephone numbers can also start with zero.

Unless you are performing maths on it, data that is made up of numbers
(zip code, tel number, house number, etc.) is still only text & should be
stored as a string.

> 
 >  > > Numeric dates can be ambiguous: dd/mm/yyyy or mm/dd/yyyy? The ISO
 >  > > standard is clearer: yyyy-mm-dd.
>  > >
>  > > Handling text: "Unicode sandwich".
>  > >
>  > > UTF-8 is better than legacy encodings.
>  > >





-- 
To save a single life is better than to build a seven story pagoda.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: splunk_handler and logbook

2015-02-26 Thread jvarghese
I got the solution:
Use RedirectLoggingHandler to redirect the logs to logbook.

from logging import getLogger
from splunk_handler import SplunkHandler
from logbook.compat import RedirectLoggingHandler

mylog = getLogger('My Log')

splunk = SplunkHandler(
    host='',
    port='',
    username='',
    password='',
    index='',
    verify=,
    source=""
)

mylog.addHandler(RedirectLoggingHandler())
mylog.addHandler(splunk)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ANN: Wing IDE 5.1.2 released

2015-02-26 Thread William Ray Wing

> On Feb 26, 2015, at 2:04 PM, Jim Mooney  wrote:
> 
> Hey, can I run Py 2.7 and 3.4 side by side without a lot of hassle, using 
> Wing? I run both since I'm migranting and so far the free IDEs just seem to 
> choke on that.
> -- 
> https://mail.python.org/mailman/listinfo/python-list

I assume you just mean that you would like to have different Python projects 
that open in Wing with the correct associated version of Python.
Yes, you can specify a python executable in the Project Properties - 
Environment tab.  Click on the "Custom" button in the Python Executable entry 
and enter the path to the version of Python you want. 

If this isn’t what you are after, let us know.

-Bill

PS: I’ve found that the Wing e-mail support is VERY responsive.  No relation, 
just a happy user.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-26 Thread Ian Kelly
On Feb 26, 2015 4:00 AM, "Cem Karan"  wrote:
>
>
> On Feb 26, 2015, at 12:36 AM, Gregory Ewing 
wrote:
>
> > Cem Karan wrote:
> >> I think I see what you're talking about now.  Does WeakMethod
> >> (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod)
solve
> >> this problem?
> >
> > Yes, that looks like it would work.
>
>
> Cool!

Sometimes I wonder whether anybody reads my posts. I suggested a solution
involving WeakMethod four days ago that additionally extends the concept to
non-method callbacks (requiring a small amount of extra effort from the
client in those cases, but I think that is unavoidable. There is no way
that the framework can determine the appropriate lifetime for a
closure-based callback.)
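A minimal sketch of that WeakMethod approach (the class names here are illustrative, not from any particular framework):

```python
import gc
import weakref

events = []

class Emitter:
    """Holds callbacks weakly, so registering one keeps nothing alive."""
    def __init__(self):
        self._refs = []

    def register(self, cb):
        # A bound method object is transient: a plain weakref.ref to it
        # dies immediately, so bound methods need weakref.WeakMethod.
        ref_type = weakref.WeakMethod if hasattr(cb, "__self__") else weakref.ref
        self._refs.append(ref_type(cb))

    def fire(self, *args):
        for ref in list(self._refs):
            cb = ref()
            if cb is None:
                self._refs.remove(ref)   # the listener has been collected
            else:
                cb(*args)

class Listener:
    def on_event(self, x):
        events.append(x)

emitter = Emitter()
listener = Listener()
emitter.register(listener.on_event)
emitter.fire(1)
del listener      # dropping the last reference unregisters the callback
gc.collect()      # immediate in CPython anyway; explicit for other GCs
emitter.fire(2)   # the dead callback is silently skipped and pruned
print(events)     # [1]
```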
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-26 Thread Ethan Furman
On 02/26/2015 11:54 AM, Ian Kelly wrote:

> Sometimes I wonder whether anybody reads my posts.

It's entirely possible the OP wasn't ready to understand your solution four 
days ago, but two days later the OP was.

--
~Ethan~



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Rustom Mody
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> You should add emoticons, but not call them or the above 'gibberish'.

Done -- and of course not under gibberish.
I don't really know much about how emoji are used, but I understand they are.
JFTR I consider it necessary to be respectful to all (living) people.
For that matter even dead people(s): no need to be disrespectful to the
Egyptians who created the hieroglyphs or the Sumerians who wrote cuneiform.

I only find it crosses a line when creations two millennia dead are made to
take the space of the living.

Chris wrote:
> * Klingon: Not part of any current standard 

Thanks Removed.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Grant Edwards
On 2015-02-26, alister  wrote:
> On Wed, 25 Feb 2015 23:34:29 +, MRAB wrote:
>> On 2015-02-25 22:59, Joel Goldstick wrote:
>>> On Wed, Feb 25, 2015 at 4:28 PM, MRAB  wrote:
 On 2015-02-25 20:45, Mark Lawrence wrote:
>
> http://www.slideshare.net/pydanny/python-worst-practices
>
> Any that should be added to this list?  Any that be removed as not
> that bad?

 We don't have numeric ZIP codes in the UK, but the entire world has
 numeric telephone numbers, so that might be a better example of
 numbers that aren't really numbers.
>>>
>>> US zip codes get messed up with ints because many have a leading
>>> zero.
>>> I use strings

I should hope so, because US zip codes can also contain a hyphen.

>> Telephone numbers can also start with zero.
>
> unless you are performing maths on it data that is made up of numbers 
> (zip code, tel number, house number etc) is still only text & should be 
> stored as a string.

And if you _are_ performing maths on postal codes, telephone numbers
and house numbers, something is seriously wrong and it probably
doesn't matter how you represent things.
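A quick sketch of the leading-zero problem (the ZIP code is just an example value):

```python
zip_code = "08540"

as_int = int(zip_code)   # the leading zero is silently lost
print(as_int)            # 8540

# An extended ZIP+4 code is not even a valid int:
try:
    int("08540-1234")
except ValueError as exc:
    print("not a number:", exc)

# Kept as strings, both survive untouched.
print(zip_code, "08540-1234")
```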

-- 
Grant Edwards   grant.b.edwardsYow! I am covered with
  at   pure vegetable oil and I am
  gmail.comwriting a best seller!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Simon Ward


On 25 February 2015 21:24:37 GMT+00:00, Chris Angelico  wrote:
>On Thu, Feb 26, 2015 at 7:45 AM, Mark Lawrence
> wrote:
>> http://www.slideshare.net/pydanny/python-worst-practices
>>
>> Any that should be added to this list?  Any that be removed as not
>that bad?
>
>Remove the complaint about id. It's an extremely useful variable name,
>and you hardly ever need the function.

You can add one character, using "id_", to avoid the conflict without requiring 
anyone else maintaining the code to think about it. As rare as the conflict is, 
I think the ease of avoiding it makes the extra character a practical defensive 
technique. I agree it is not a worst case. 
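That trailing-underscore spelling is PEP 8's own convention; a trivial sketch:

```python
# PEP 8: a single trailing underscore dodges a builtin name.
id_ = "user-42"
print(id_)             # the value we wanted to store
print(callable(id))    # True: the builtin id() is still unshadowed
```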

Simon
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Simon Ward


On 26 February 2015 00:11:24 GMT+00:00, Ben Finney  
wrote:
>> Yes, but my point is: You shouldn't need to rebind those names (or
>> have names "true" and "false" for 0 and 1).
>
>That's not what you asked, though. You asked “When would 0 mean true
>and
>1 mean false?” My answer: in all Unix shell contexts.
>
>> Instead, use "success" and "failure".
>
>You'd better borrow the time machine and tell the creators of Unix. The
>meme is already established for decades now.

0 = success and non-zero = failure is the meme established, rather than 0 = 
true, non-zero = false.

It's not just used by UNIX, and is not necessarily defined by the shell either 
(bash was mentioned elsewhere in the thread). There is probably a system that 
pre-dates UNIX that uses/used this too, but I don't know.

C stdlib defines EXIT_SUCCESS = 0, yet C99 stdbool.h defines false = 0. That 
shells handle 0 as true and non-zero as false probably stems from this (or 
similar in older languages). The "true" command is defined to have an exit 
status of 0, and "false" an exit status of 1.

The value is better thought of an error level, where 0 is no error and non-zero 
is some error. The AmigaOS shell conventionally takes this further with higher 
values indicating more critical errors, there's even a "failat N" command that 
means exit the script if the error level is higher than N.

None of the above is a good reason to use error *or* success return values in 
Python--use exceptions!--but may be encountered when running other processes.
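The convention is easy to check at any POSIX shell prompt; a sketch:

```shell
true;  echo "exit status of true:  $?"   # 0, i.e. success
false; echo "exit status of false: $?"   # 1, i.e. failure

# The shell's `if` takes the branch on status 0:
if true; then echo "status 0 takes the branch"; fi
```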

Simon
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parallelization of Python on GPU?

2015-02-26 Thread Sturla Molden

On 26/02/15 18:34, John Ladasky wrote:


> Hi Sturla,  I recognize your name from the scikit-learn mailing list.
>
> If you look a few posts above yours in this thread, I am aware of gpu-libsvm.
> I don't know if I'm up to the task of reusing the scikit-learn wrapping code,
> but I am giving that option some serious thought.  It isn't clear to me that
> gpu-libsvm can handle both SVM and SVR, and I have need of both algorithms.
>
> My training data sets are around 5000 vectors long.  If that graph on the
> gpu-libsvm web page is any indication of what I can expect from my own data (I
> note that they didn't specify the GPU card they're using), I might realize a
> 20x increase in speed.



A GPU is a "floating point monster", not a CPU. It is not designed to 
run things like CPython. It is also only designed to run threads in 
parallel on its cores, not processes. And as you know, in Python there 
is something called GIL. Further the GPU has hard-wired fine-grained 
load scheduling for data-parallel problems (e.g. matrix multiplication 
for vertex processing in 3D graphics). It is not like a thread on a GPU 
is comparable to a thread on a CPU. It is more like a parallel work 
queue, with the kind of abstraction you find in Apple's GCD.


I don't think it is really doable to make something like CPython run with 
thousands of parallel instances on a GPU. A GPU is not designed for 
that. A GPU is great if you can pass millions of floating point vectors 
as items to the work queue, with a tiny amount of computation per item. 
It would be crippled if you passed a thousand CPython interpreters and 
expect them to do a lot of work.


Also, as it is libSVM that does the math in your case, you need to get 
libSVM to run on the GPU, not CPython.


In most cases the best hardware for parallel scientific computing 
(taking economy and flexibility into account) is a Linux cluster which 
supports MPI. You can then use mpi4py or Cython to use MPI from your 
Python code.


Sturla



--
https://mail.python.org/mailman/listinfo/python-list


Re: Windows permission error, 64 bit, psycopg2, python 3.4.2

2015-02-26 Thread Mark Lawrence

On 26/02/2015 15:10, Malik Rumi wrote:

I am one of those struggling with compile issues with python on 64 bit windows. 
I have not been able to get the solutions mentioned on Stack Overflow to work 
because installing Windows SDK 7.1 fails for me.

So I stumbled across a precompiled psycopg2, and that reported that it worked, 
but then I got two permission errors. Then I read that this was a bug in python 
(issue 14252) that had been fixed, but I don't think this is the same error. 
That one specifically refers to subprocess.py and I don't have that in my 
traceback.  I have v3.4.2. On top of everything else, despite requesting a new 
password, all I get from the bug tracker is 'invalid login'.

In any event, running "import psycopg2" returns 'import error, no module named 
psycopg2'.


Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

C:\Users\Semantic>pip install git+https://github.com/nwcell/psycopg2-windows.git@win64-py34#egg=psycopg2
Downloading/unpacking psycopg2 from git+https://github.com/nwcell/psycopg2-windows.git@win64-py34
  Cloning https://github.com/nwcell/psycopg2-windows.git (to win64-py34) to c:\users\semantic\appdata\local\temp\pip_build_semantic\psycopg2
  Running setup.py (path:C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic\psycopg2\setup.py) egg_info for package psycopg2
    C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
      warnings.warn(msg)

Installing collected packages: psycopg2
  Running setup.py install for psycopg2
    C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
      warnings.warn(msg)

Successfully installed psycopg2
Cleaning up...
  Exception:
Traceback (most recent call last):
  File "C:\Python34\lib\shutil.py", line 370, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Semantic\\AppData\\Local\\Temp\\pip_build_Semantic\\psycopg2\\.git\\objects\\pack\\pack-be4d3da4a06b4c9ec4c06040dbf6685eeccca068.idx'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "C:\Python34\lib\site-packages\pip\commands\install.py", line 302, in run
    requirement_set.cleanup_files(bundle=self.bundle)
  File "C:\Python34\lib\site-packages\pip\req.py", line 1333, in cleanup_files
    rmtree(dir)
  File "C:\Python34\lib\site-packages\pip\util.py", line 43, in rmtree
    onerror=rmtree_errorhandler)
  File "C:\Python34\lib\shutil.py", line 477, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 372, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Python34\lib\site-packages\pip\util.py", line 53, in rmtree_errorhandler
    (exctype is PermissionError and value.args[3] == 5) #python3.3
IndexError: tuple index out of range



The above clearly shows "Successfully installed psycopg2" and that it's 
a permission error on cleanup that's gone wrong, so what is there to 
report on the bug tracker?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Albert-Jan Roskam


- Original Message -

> From: Simon Ward 
> To: 
> Cc: "python-list@python.org" 
> Sent: Thursday, February 26, 2015 8:36 PM
> Subject: Re: Python Worst Practices
> 
> 
> 
> On 25 February 2015 21:24:37 GMT+00:00, Chris Angelico  
> wrote:
>> On Thu, Feb 26, 2015 at 7:45 AM, Mark Lawrence
>>  wrote:
>>>  http://www.slideshare.net/pydanny/python-worst-practices
>>> 
>>>  Any that should be added to this list?  Any that be removed as not
>> that bad?
>> 
>> Remove the complaint about id. It's an extremely useful variable name,
>> and you hardly ever need the function.
> 
> You can add one character and avoid the conflict with "id_" and not 
> require anyone else maintaining the code to think about it. As rare as the 
> conflict is, I think the ease of avoiding it makes the extra character a 
> practical defensive technique. I agree it is not a worst case. 
> 


I sometimes do:

import sys, functools
if sys.version_info.major > 2:
    bytez = functools.partial(bytes, encoding="utf-8")
else:
    bytez = bytes  # no encoding param in Python 2


It bites you when you shadow 'bytes' (I can't remember a case where I couldn't
use the functools.partial object, though it often works). Much easier to use
'bytez' or 'bytes_'. It is annoying to 'unshadow' your code, and confusing for
others who might read your code.

Albert-Jan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parallelization of Python on GPU?

2015-02-26 Thread Sturla Molden

On 26/02/15 18:48, Jason Swails wrote:

> On Thu, 2015-02-26 at 16:53 +, Sturla Molden wrote:
>> GPU computing is great if you have the following:
>>
>> 1. Your data structures are arrays of floating point numbers.

It actually works equally great, if not better, for integers.


Right, but not complicated data structures with a lot of references or 
pointers. It requires data are laid out in regular arrays, and then it 
acts on these arrays in a data-parallel manner. It is designed to 
process vertices in parallel for computer graphics, and that is a 
limitation which is always there. It is not a CPU with 1024 cores. It is 
a "floating point monster" which can process 1024 vectors in parallel. 
You write a tiny kernel in a C-like language (CUDA, OpenCL) to process 
one vector, and then it will apply the kernel to all the vectors in an 
array of vectors. It is very comparable to how GLSL and Direct3D vertex 
and fragment shaders work. (The reason for which should be obvious.) The 
GPU is actually great for a lot of things in science, but it is not a 
CPU. The biggest mistake in the GPGPU hype is the idea that the GPU will 
behave like a CPU with many cores.
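That "one tiny kernel, many vectors" model can be mimicked in plain NumPy terms (a CPU-side analogy only, assuming NumPy is installed):

```python
import numpy as np

# A GPU-style "kernel": pure elementwise arithmetic on one work item,
# with no pointers or cross-item branching.
def kernel(x):
    return 0.5 * x * x + 1.0

# The data: one flat array, the layout the hardware scheduler expects.
vectors = np.linspace(0.0, 1.0, 1024, dtype=np.float32)

# The hardware's job: apply the same kernel to every element at once.
result = kernel(vectors)
print(result.shape, result.dtype)   # (1024,) float32
```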


Sturla









--
https://mail.python.org/mailman/listinfo/python-list


Fix for no module named _sysconfigdata while compiling

2015-02-26 Thread Raymond Cote
Thought I might help someone else address a problem I ran into this afternoon.
While compiling Python 2.7.9 on CentOS 6, I received the error: no module named 
_sysconfigdata

Googling found a number of other people having this problem — but the other 
issues were all after the Python was installed — not while building. In digging 
through their advice, I saw a number of them spoke about having multiple 
versions of Python installed. In my case, I already had a custom Python 2.7.3 
installed on this machine — and I was upgrading over it to Python 2.7.9.

I found that after renaming my custom /opt/python2.7 directory and then building
the new release in the same directory, the problem went away.

Summary:

Compiling Python 2.7.9 resulted in error: no module named _sysconfigdata while 
compiling.
My configuration: ./configure --prefix=/opt/python2.7 --enable-unicode=ucs4 
--enable-shared  LDFLAGS="-Wl,-rpath /opt/python2.7/lib"

make; make altinstall

Remove the existing /opt/python2.7 directory which had Python 2.7.3.
Now all builds and installs properly.
—Ray


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Worst Practices

2015-02-26 Thread Ben Finney
Simon Ward  writes:

> On 26 February 2015 00:11:24 GMT+00:00, Ben Finney 
>  wrote:
> >You'd better borrow the time machine and tell the creators of Unix. The
> >meme is already established for decades now.
>
> 0 = success and non-zero = failure is the meme established, rather
> than 0 = true, non-zero = false.

That is not the case: the commands ‘true’ (returns value 0) and ‘false’
(returns value 1) are long established in Unix. So that *is* part of the
meme I'm describing.

> None of the above is a good reason to use error *or* success return
> values in Python--use exceptions!--but may be encountered when running
> other processes.

Right. But likewise, don't deny that “true == 0” and “false == non-zero”
has a wide acceptance in the programming community too.
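
On a Unix system the convention is easy to check from Python (this assumes
the standard `true`/`false` utilities are on the PATH):

```python
import subprocess

# The classic Unix `true` utility exits with status 0 ("success"),
# while `false` exits with status 1 -- the shell treats 0 as truthy.
print(subprocess.run(['true']).returncode)   # 0
print(subprocess.run(['false']).returncode)  # 1
```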

-- 
 \“Program testing can be a very effective way to show the |
  `\presence of bugs, but is hopelessly inadequate for showing |
_o__)  their absence.” —Edsger W. Dijkstra |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Windows permission error, 64 bit, psycopg2, python 3.4.2

2015-02-26 Thread Malik Rumi
On Thursday, February 26, 2015 at 2:55:07 PM UTC-6, Mark Lawrence wrote:
> On 26/02/2015 15:10, Malik Rumi wrote:
> > I am one of those struggling with compile issues with python on 64 bit 
> > windows. I have not been able to get the solutions mentioned on Stack 
> > Overflow to work because installing Windows SDK 7.1 fails for me.
> >
> > So I stumbled across a precompiled psycopg2, and that reported that it 
> > worked, but then I got two permission errors. Then I read that this was a 
> > bug in python (issue 14252) that had been fixed, but I don't think this is 
> > the same error. That one specifically refers to subprocess.py and I don't 
> > have that in my traceback.  I have v3.4.2. On top of everything else, 
> > despite requesting a new password, all I get from the bug tracker is 
> > 'invalid login'.
> >
> > In any event, running "import psycopg2" returns 'import error, no module 
> > named psycopg2'.
> >
> >
> > Microsoft Windows [Version 6.3.9600]
> > (c) 2013 Microsoft Corporation. All rights reserved.
> >
> > C:\Users\Semantic>pip install 
> > git+https://github.com/nwcell/psycopg2-windows.git
> > @win64-py34#egg=psycopg2
> > Downloading/unpacking psycopg2 from 
> > git+https://github.com/nwcell/psycopg2-windo
> > ws.git@win64-py34
> >Cloning https://github.com/nwcell/psycopg2-windows.git (to win64-py34) 
> > to c:\u
> > sers\semantic\appdata\local\temp\pip_build_semantic\psycopg2
> >Running setup.py 
> > (path:C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic
> > \psycopg2\setup.py) egg_info for package psycopg2
> >  C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown 
> > distribution opt
> > ion: 'summary'
> >warnings.warn(msg)
> >
> > Installing collected packages: psycopg2
> >Running setup.py install for psycopg2
> >  C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown 
> > distribution opt
> > ion: 'summary'
> >warnings.warn(msg)
> >
> > Successfully installed psycopg2
> > Cleaning up...
> >Exception:
> > Traceback (most recent call last):
> >File "C:\Python34\lib\shutil.py", line 370, in _rmtree_unsafe
> >  os.unlink(fullname)
> > PermissionError: [WinError 5] Access is denied: 
> > 'C:\\Users\\Semantic\\AppData\\L
> > ocal\\Temp\\pip_build_Semantic\\psycopg2\\.git\\objects\\pack\\pack-be4d3da4a06b
> > 4c9ec4c06040dbf6685eeccca068.idx'
> >
> > During handling of the above exception, another exception occurred:
> >
> > Traceback (most recent call last):
> >File "C:\Python34\lib\site-packages\pip\basecommand.py", line 122, in 
> > main
> >  status = self.run(options, args)
> >File "C:\Python34\lib\site-packages\pip\commands\install.py", line 302, 
> > in run
> >
> >  requirement_set.cleanup_files(bundle=self.bundle)
> >File "C:\Python34\lib\site-packages\pip\req.py", line 1333, in 
> > cleanup_files
> >  rmtree(dir)
> >File "C:\Python34\lib\site-packages\pip\util.py", line 43, in rmtree
> >  onerror=rmtree_errorhandler)
> >File "C:\Python34\lib\shutil.py", line 477, in rmtree
> >  return _rmtree_unsafe(path, onerror)
> >File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >  _rmtree_unsafe(fullname, onerror)
> >File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >  _rmtree_unsafe(fullname, onerror)
> >File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >  _rmtree_unsafe(fullname, onerror)
> >File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >  _rmtree_unsafe(fullname, onerror)
> >File "C:\Python34\lib\shutil.py", line 372, in _rmtree_unsafe
> >  onerror(os.unlink, fullname, sys.exc_info())
> >File "C:\Python34\lib\site-packages\pip\util.py", line 53, in 
> > rmtree_errorhand
> > ler
> >  (exctype is PermissionError and value.args[3] == 5) #python3.3
> > IndexError: tuple index out of range
> >
> 
> The above clearly shows "Successfully installed psycopg2" and that it's 
> a permission error on cleanup that's gone wrong, so what is there to 
> report on the bug tracker?
> 
> -- 
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
> 
> Mark Lawrence

1. I did not mean to confuse you by reference to the bug tracker. My log in 
difficulties are not related to this issue.

2. The other reference to the bug tracker was to indicate that I don't think 
this is the same error as mentioned there. 

3. Despite the report of success, I do not have psycopg2. Since that failure 
was followed by the permission errors, I assume they are related. This is why I 
posted, to get help with this problem. I look forward to any assistance you or 
anyone else can render on this issue. Thanks. 

4. Going back to bug #14252, it is structurally very similar. I forget the 
program at issue there, but at first it reported success and that was followed 
by win error 5, and in fact the program had not installed correctly. The 
difference is that #14252 involved

Re: Is anyone else unable to log into the bug tracker?

2015-02-26 Thread Malik Rumi
On Thursday, February 26, 2015 at 10:49:19 AM UTC-6, Skip Montanaro wrote:
> I have not had problems, but I use the Google login (Open ID, I presume) 
> option.
> 
> 
> Skip

Ok, I got it. In short, capitalization (or not) matters. Thanks to all. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Chris Angelico
On Fri, Feb 27, 2015 at 4:59 AM, Rustom Mody  wrote:
> On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
>> I think that this part of your post is more 'unprofessional' than the
>> character blocks.  It is very jarring and seems contrary to your main point.
>
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this

So what, seven million people need it? Sounds pretty useful to me. And
your figure is an exaggeration; a lot more people than that use
emoji/emoticons.

>> > Also, how does assigning meanings to codepoints "waste storage"? As
>> > soon as Unicode 2.0 hit and 16-bit code units stopped being
>> > sufficient, everyone needed to allocate storage - either 32 bits per
>> > character, or some other system - and the fact that some codepoints
>> > were unassigned had absolutely no impact on that. This is decidedly
>> > NOT unprofessional, and it's not wasteful either.
>>
>> I agree.
>
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I dont get how 4 bytes is not more expensive than 2.
> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more than
> exclusive BMP.

Sure, UCS-2 is cheaper than the current Unicode spec. But Unicode 2.0
was when that changed, and the change was because 65536 characters
clearly wouldn't be enough - and that was due to the number of
characters needed for other things than those you're complaining
about. Every spec since then has not changed anything that affects
storage. There are still, today, quite a lot of unallocated blocks of
characters (we're really using only about two planes' worth so far,
maybe three), but even if Unicode specified just two planes of 64K
characters each, you wouldn't be able to save much on transmission
(UTF-8 is already flexible and uses only what you need; if a future
Unicode spec allows 64K planes, UTF-8 transmission will cost exactly
the same for all existing characters), and on an eight-bit-byte
system, the very best you'll be able to do is three bytes - which you
can do today, too; you already know 21 bits will do. So since the BMP
was proven insufficient (back in 1996), no subsequent changes have had
any costs in storage.
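
That pay-for-what-you-use property is easy to see from Python, where
`str.encode` shows the per-code-point cost directly:

```python
# UTF-8 spends only the bytes each code point needs: one for ASCII,
# up to four for the astral planes added after Unicode 2.0.
for ch in 'A', '\u00e9', '\u20ac', '\U0001F600':   # A, é, €, grinning face
    print(hex(ord(ch)), len(ch.encode('utf-8')))
```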

> Still if I were to expand on the criticisms here are some examples:
>
> Math-Greek: Consider the math-alpha block
> http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font, 
> glyph,
> character.  To me this block represents this same error cast into concrete and
> dignified by the (supposed) authority of the unicode consortium.
>
> There are probably dozens of other such stupidities like distinguishing 
> kelvin K from latin K as if that is the business of the unicode consortium

A lot of these kinds of characters come from a need to unambiguously
transliterate text stored in other encodings. I don't personally
profess to understand the reasoning behind the various
indistinguishable characters, but I'm aware that there are a lot of
tricky questions to be decided; and if once the Consortium decides to
allocate a character, that character must remain forever allocated.

> My real reservations about unicode come from their work in areas that I 
> happen to know something about
>
> Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ 
> is perhaps ok
> However all this stuff http://xahlee.info/comp/unicode_music_symbols.html
> makes no sense (to me) given that music (ie standard western music written in 
> staff notation) is inherently 2 dimensional --  multi-voiced, multi-staff, 
> chordal

The placement on the page is up to the display library. You can
produce a PDF that places the note symbols at their correct positions,
and requires no images to render sheet music.

> Sanskrit/Devanagari:
> Consists of bogus letters that dont exist in devanagari
> The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
> But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
> So I call it bogus-devanagari
>
> Contrariwise an important letter in vedic pronunciation the double-udatta is 
> missing
> http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html
>
> All of which adds up to the impression that the unicode consortium 
> occasionally fails to do due diligence

Which proves that they're not perfect. Don't forget, they can always
add more characters later.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parallelization of Python on GPU?

2015-02-26 Thread Jason Swails
On Thu, Feb 26, 2015 at 4:10 PM, Sturla Molden 
wrote:

> On 26/02/15 18:48, Jason Swails wrote:
>
>> On Thu, 2015-02-26 at 16:53 +, Sturla Molden wrote:
>>
>>> GPU computing is great if you have the following:
>>>
>>> 1. Your data structures are arrays floating point numbers.
>>>
>>
>> It actually works equally great, if not better, for integers.
>>
>
> Right, but not complicated data structures with a lot of references or
> pointers. It requires data are laid out in regular arrays, and then it acts
> on these arrays in a data-parallel manner. It is designed to process
> vertices in parallel for computer graphics, and that is a limitation which
> is always there. It is not a CPU with 1024 cores. It is a "floating point
> monster" which can process 1024 vectors in parallel. You write a tiny
> kernel in a C-like language (CUDA, OpenCL) to process one vector, and then
> it will apply the kernel to all the vectors in an array of vectors. It is
> very comparable to how GLSL and Direct3D vertex and fragment shaders work.
> (The reason for which should be obvious.) The GPU is actually great for a
> lot of things in science, but it is not a CPU. The biggest mistake in the
> GPGPU hype is the idea that the GPU will behave like a CPU with many cores.


Very well summarized.  At least in my field, though, it is well-known that
GPUs are not 'uber-fast CPUs'.  Algorithms have been redesigned, programs
rewritten to take advantage of their architecture.  It has been a *massive*
investment of time and resources, but (unlike the Xeon Phi coprocessor [1])
has reaped most of its promised rewards.

​--Jason

[1] I couldn't resist the jab.  At several times the cost of the top of the
line NVidia gaming card, the GPU is about 15-20x faster...
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Steven D'Aprano
Chris Angelico wrote:

> Unicode
> isn't about taking everyone's separate character sets and numbering
> them all so we can reference characters from anywhere; if you wanted
> that, you'd be much better off with something that lets you specify a
> code page in 16 bits and a character in 8, which is roughly the same
> size as Unicode anyway.

Well, except for the approximately 25% of people in the world whose native
language has more than 256 characters.

It sounds like you are referring to some sort of "shift code" system. Some
legacy East Asian encodings use a similar scheme, and depending on how they
are implemented they have great disadvantages. For example, Shift-JIS
suffers from a number of weaknesses including that a single byte corrupted
in transmission can cause large swaths of the following text to be
corrupted. With Unicode, a single corrupted byte can only corrupt a single
code point.
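
A quick Python sketch of that resynchronisation: flip one byte of a UTF-8
stream and only the code point it belonged to is lost.

```python
data = bytearray('héllo'.encode('utf-8'))   # b'h\xc3\xa9llo', six bytes
data[1] = 0x41                              # corrupt one byte "in transit"
text = bytes(data).decode('utf-8', errors='replace')
print(text)   # the é is damaged, but the following 'llo' survives intact
```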


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Chris Angelico
On Fri, Feb 27, 2015 at 10:09 AM, Steven D'Aprano
 wrote:
> Chris Angelico wrote:
>
>> Unicode
>> isn't about taking everyone's separate character sets and numbering
>> them all so we can reference characters from anywhere; if you wanted
>> that, you'd be much better off with something that lets you specify a
>> code page in 16 bits and a character in 8, which is roughly the same
>> size as Unicode anyway.
>
> Well, except for the approximately 25% of people in the world whose native
> language has more than 256 characters.

You could always allocate multiple code pages to one language. But
since I'm not advocating this system, I'm only guessing at solutions
to its problems.

> It sounds like you are referring to some sort of "shift code" system. Some
> legacy East Asian encodings use a similar scheme, and depending on how they
> are implemented they have great disadvantages. For example, Shift-JIS
> suffers from a number of weaknesses including that a single byte corrupted
> in transmission can cause large swaths of the following text to be
> corrupted. With Unicode, a single corrupted byte can only corrupt a single
> code point.

That's exactly what I was hinting at. There are plenty of systems like
that, and they are badly flawed compared to a simple universal system
for a number of reasons. One is the corruption issue you mention;
another is that a simple memory-based text search becomes utterly
useless (to locate text in a document, you'd need to do a whole lot of
stateful parsing - not to mention the difficulties of doing
"similar-to" searches across languages); concatenation of text also
becomes a stateful operation, and so do all sorts of other simple
manipulations. Unicode may demand a bit more storage in certain
circumstances (where an eight-bit encoding might have handled your
entire document), but it's so much easier for the general case.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-26 Thread Fabio Zadrozny
On Wed, Feb 25, 2015 at 9:46 AM, Cem Karan  wrote:

>
> On Feb 24, 2015, at 8:23 AM, Fabio Zadrozny  wrote:
>
> > Hi Cem,
> >
> > I didn't read the whole long thread, but I thought I'd point you to what
> I'm using in PyVmMonitor (http://www.pyvmmonitor.com/) -- which may
> already cover your use-case.
> >
> > Take a look at the callback.py at
> https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py
> >
> > And its related test (where you can see how to use it):
> https://github.com/fabioz/pyvmmonitor-core/blob/master/_pyvmmonitor_core_tests/test_callback.py
> (note that it falls back to a strong reference on simple functions -- i.e.:
> usually top-level methods or methods created inside a scope -- but
> otherwise uses weak references).
>
> That looks like a better version of what I was thinking about originally.
> However, various people on the list have convinced me to stick with strong
> references everywhere.  I'm working out a possible API right now, once I
> have some code that I can use to illustrate what I'm thinking to everyone,
> I'll post it to the list.
>
> Thank you for showing me your code though, it is clever!
>
> Thanks,
> Cem Karan


​Hi Cem,

Well, I decided to elaborate a bit on the use-case I have and how I use it
(on a higher level):
http://pydev.blogspot.com.br/2015/02/design-for-client-side-applications-in.html

So, you can see if it may be worth for you or not (I agree that sometimes
you should keep strong references, but for my use-cases, weak references
usually work better -- with the only exception being closures, which is
handled different anyways but with the gotcha of having to manually
unregister it).
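
As a side note, the bound-method gotcha that motivates the strong-reference
fallback can be sketched in a few lines (CPython's reference counting makes
collection immediate here):

```python
import weakref

class Listener:
    def on_event(self, msg):
        return 'got ' + msg

obj = Listener()

# A plain weakref to a bound method dies immediately: the method object
# is created fresh on each attribute access and discarded at once.
dead = weakref.ref(obj.on_event)
print(dead())          # None

# WeakMethod re-binds on demand, so the callback works while obj lives...
cb = weakref.WeakMethod(obj.on_event)
print(cb()('ping'))    # got ping

del obj                # ...and expires once obj is collected
print(cb())            # None
```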

Best Regards,

Fabio​
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ANN: Wing IDE 5.1.2 released

2015-02-26 Thread Chris Angelico
On Fri, Feb 27, 2015 at 6:42 AM, William Ray Wing  wrote:
> PS: I’ve found that the Wing e-mail support is VERY responsive.  No relation, 
> just a happy user.

You should totally get involved with the project. With your name,
everyone would think you started it!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Steven D'Aprano
Rustom Mody wrote:

> Emoticons (or is it emoji) seems to have some (regional?) takeup?? Dunno…
> In any case I'd like to stay clear of political(izable) questions

Emoji is the term used in Japan, gradually spreading to the rest of the
word. Emoticons, I believe, should be restricted to the practice of using
ASCII-only digraphs and trigraphs such as :-) (colon, hyphen, right-parens)
to indicate "smileys".

I believe that emoji will eventually lead to Unicode's victory. People will
want smileys and piles of poo on their mobile phones, and from there it
will gradually spread to everywhere. All they need to do to make victory
inevitable is add cartoon genitals...


>> I think that this part of your post is more 'unprofessional' than the
>> character blocks.  It is very jarring and seems contrary to your main
>> point.
> 
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this

0.1% of the living is seven million people. I'll tell you what, you tell me
which seven million people should be relegated to second-class status, and
I'll tell them where you live.

:-)


[...]
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
> 
> So I dont get how 4 bytes is not more expensive than 2.

Obviously it is. But it's only twice as expensive, and in computer science
terms that counts as "close enough". It's quite common for data structures
to "waste" space by using "no more than twice as much space as needed",
e.g. Python dicts and lists.

The whole Unicode range U+0000 to U+10FFFF needs only 21 bits, which fits
into three bytes. Nevertheless, there's no three-byte UTF encoding, because
on modern hardware it is more efficient to "waste" an entire extra byte per
code point and deal with an even multiple of bytes.
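
The trade-off in concrete terms:

```python
s = 'A\u00e9\U0001F600'            # three code points: A, é, emoji
print(len(s.encode('utf-32-le')))  # 12 -- always four bytes per code point
print(len(s.encode('utf-8')))      # 7  -- 1 + 2 + 4, only what's needed
```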


> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more
> than exclusive BMP.

Are you missing a word there? Costs "no more" perhaps?


> eg consider the case of 32 vs 64 bit executables.
> The 64 bit executable is generally larger than the 32 bit one
> Now consider the case of a machine that has say 2GB RAM and a 64-bit
> processor. You could -- I think -- make a reasonable case that all those
> all-zero hi-address-words are 'waste'.

Sure. The whole point of 64-bit processors is to enable the use of more than
2GB of RAM. One might as well say that using 32-bit processors is wasteful
if you only have 64K of memory. Yes it is, but the only things which use
16-bit or 8-bit processors these days are embedded devices.


[...] 
> Math-Greek: Consider the math-alpha block
>
http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
> 
> Now imagine a beginning student not getting the difference between font,
> glyph,
> character.  To me this block represents this same error cast into concrete 
> and dignified by the (supposed) authority of the unicode consortium.

Not being privy to the internal deliberations of the Consortium, it is
sometimes difficult to tell why two symbols are sometimes declared to be
mere different glyphs for the same character, and other times declared to
be worthy of being separate characters.

E.g. I think we should all agree that the English "A" and the French "A"
shouldn't count as separate characters, although the Greek "Α" and
Russian "А" do.

In the case of the maths symbols, it isn't obvious to me what the deciding
factors were. I know that one of the considerations they use is to consider
whether or not users of the symbols have a tradition of treating the
symbols as mere different glyphs, i.e. stylistic variations. In this case,
I'm pretty sure that mathematicians would *not* consider:

U+2115 DOUBLE-STRUCK CAPITAL N "ℕ"
U+004E LATIN CAPITAL LETTER N "N"

as mere stylistic variations. If you defined a matrix called ℕ, you would
probably be told off for using the wrong symbol, not for using the wrong
formatting.

On the other hand, I'm not so sure about 

U+210E PLANCK CONSTANT "ℎ"

versus a mere lowercase h (possibly in italic).
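
Python's `unicodedata` module records the Consortium's verdicts: the
compatibility decompositions show which characters are considered "the same
letter in fancy dress" (this also covers the kelvin K mentioned below):

```python
import unicodedata as ud

print(ud.name('\u2115'))               # DOUBLE-STRUCK CAPITAL N
print(ud.normalize('NFKC', '\u2115'))  # N -- compatibility-equivalent
print(ud.normalize('NFKC', '\u210e'))  # h -- PLANCK CONSTANT folds to h
print(ud.normalize('NFKC', '\u212a'))  # K -- KELVIN SIGN folds to K
```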


> There are probably dozens of other such stupidities like distinguishing
> kelvin K from latin K as if that is the business of the unicode consortium

But it *is* the business of the Unicode consortium. They have at least two
important aims:

- to be able to represent every possible human-language character;

- to allow lossless round-trip conversion to all existing legacy encodings
  (for the subset of Unicode handled by that encoding).


The second reason is why Unicode includes code points for degree-Celsius and
degree-Fahrenheit, rather than just using °C and °F like sane people.
Because some idiot^W code-page designer back in the 1980s or 90s decided to
add single character ℃ and ℉. So now Unicode

Re: Newbie question about text encoding

2015-02-26 Thread Dave Angel

On 02/26/2015 08:05 PM, Steven D'Aprano wrote:
> Rustom Mody wrote:
>
>> eg consider the case of 32 vs 64 bit executables.
>> The 64 bit executable is generally larger than the 32 bit one
>> Now consider the case of a machine that has say 2GB RAM and a 64-bit
>> processor. You could -- I think -- make a reasonable case that all those
>> all-zero hi-address-words are 'waste'.
>
> Sure. The whole point of 64-bit processors is to enable the use of more than
> 2GB of RAM. One might as well say that using 32-bit processors is wasteful
> if you only have 64K of memory. Yes it is, but the only things which use
> 16-bit or 8-bit processors these days are embedded devices.


But the 2gig means electrical address lines out of the CPU are wasted, 
not address space.  A 64 bit processor and 64bit OS means you can have 
more than 4gig in a process space, even if over half of it has to be in 
the swap file.  Linear versus physical makes a big difference.


(Although I believe Seymour Cray was quoted as saying that virtual 
memory is a crock, because "you can't fake what you ain't got.")





--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list


Picking apart a text line

2015-02-26 Thread memilanuk
So... okay.  I've got a bunch of PDFs of tournament reports that I want 
to sift thru for information.  Ended up using 'pdftotext -layout 
file.pdf file.txt' to extract the text from the PDF.  Still have a few 
little glitches to iron out there, but I'm getting decent enough results 
for the moment to move on.


I've got my script to where it opens the file, ignores the header lines 
at the top, then goes through the rest of the file line by line, 
skipping lines if they don't match (don't need the separator lines) and 
adding them to a list if they do (and stripping whitespace off the right 
side along the way).  So far, so good.


#  rstatPDF2csv.py

import sys
import re


def convert(path):
    """Return the data lines of one report, header lines skipped."""
    lines = []
    with open(path) as data:

        # Skip first n lines of headers
        for i in range(9):
            next(data)

        # Read remaining lines one at a time
        for line in data:

            # If the line begins with a capital letter...
            if re.match(r'^[A-Z]', line):

                # Strip any trailing whitespace and then add to the list
                lines.append(line.rstrip())

    return lines

if __name__ == '__main__':
    print(convert(sys.argv[1]))



What I'm ending up with is a list full of strings that look something 
like this:


['JOHN DOEC   T   HM   445-20*MW*   199-11*MW* 
194-5 1HM 393-16*MW*   198-9 1HM198-11*MW*396-20*MW* 
789-36*MW* 1234-56 *MW*',


Basically... a certain number of characters allotted for competitor 
name, then four or five 1-2 char columns for things like classification, 
age group, special categories, etc., then a score ('445-20'), then up to 
4 char for award (if any), then another score, another award, etc. etc. etc.


Right now (in the PDF) the scores are batched by one criterion, then 
sorted within those groups.  Makes life easier for the person giving out 
awards at the end of the tournament, not so much for someone trying to 
see how their individual score ranks against the whole field, not just 
their group or sub-group.  I want to be able to pull all the scores out 
and then re-sort based on score - mainly the final aggregate score, but 
potentially also on stage or daily scores.  Eventually I'd like to be 
able to calculate standardized z-scores so as to be able to compare 
scores from one event/location against another.


So back to the lines of text I have stored as strings in a list.  I 
think I want to convert that to a list of lists, i.e. split each line 
up, store that info in another list and ditch the whitespace.  Or would 
I be better off using dicts?  Originally I was thinking of how to 
process each line and split them up based on what information was 
where - some sort of nested for/if mess.  Now I'm starting to think that 
the lines of text are pretty uniform in structure i.e. the same field is 
always in the same location, and that list slicing might be the way to 
go, if a bit tedious to set up initially...?


Any thoughts or suggestions from people who've gone down this particular 
path would be greatly appreciated.  I think I have a general 
idea/direction, but I'm open to other ideas if the path I'm on is just 
blatantly wrong.
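
For what it's worth, here's roughly the slicing direction I'm leaning
towards -- the field names and offsets below are invented and would need
matching against the real report columns:

```python
# One editable table of column positions instead of scattered indexes.
FIELDS = {
    'name':      slice(0, 20),
    'class':     slice(20, 24),
    'aggregate': slice(24, 34),
}

def parse_line(line):
    """Slice one fixed-width line into a dict of stripped fields."""
    return {key: line[sl].strip() for key, sl in FIELDS.items()}

def score_key(row):
    # '445-20' -> (445, 20): sort numerically, X-count breaks ties
    points, xs = row['aggregate'].split('-')
    return int(points), int(xs)

# Two made-up lines padded to the layout above
lines = [
    'JOHN DOE'.ljust(20) + 'HM'.ljust(4) + '445-20',
    'JANE ROE'.ljust(20) + 'MW'.ljust(4) + '447-18',
]
rows = sorted((parse_line(l) for l in lines), key=score_key, reverse=True)
print([r['name'] for r in rows])   # JANE ROE first: 447 beats 445
```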




Thanks,

Monte


--
Shiny!  Let's be bad guys.

Reach me @ memilanuk (at) gmail dot com

--
https://mail.python.org/mailman/listinfo/python-list


Re: Are threads bad? - was: Future of Pypy?

2015-02-26 Thread Paul Rubin
Ryan Stuart  writes:
> My point is malloc, something further up (down?) the stack, is making
> modifications to shared state when threads are involved.  Modifying
> shared state makes it infinitely more difficult to reason about the
> correctness of your software.

If you're saying the libc malloc might have bugs that affect
multithreaded apps but not single threaded ones, then sure, but the
Linux kernel might also have such bugs and it's inherently
multithreaded, so there's no escape.  Even if your app is threaded
you're still susceptible to threading bugs in the kernel.  

If malloc works properly then it's thread-safe and you can use it
without worrying about how your app's state interacts with malloc's
internal state.

> We clearly got completely different things from the article. My
> interpretation was that it was making *the exact opposite* point to
> what you stated mainly because non-threading approaches don't share
> state.

It gave the example of asyncio, which is non-threaded but (according to
the article) was susceptible to shared state bugs because you could
accidentally insert yield points in critical sections, by doing things
like logging.

> It states that quite clearly. For example "it is – literally –
> exponentially more difficult to reason about a routine that may be
> executed from an arbitrary number of threads concurrently".

I didn't understand what it was getting at with that n**n claim.  Of
course arbitrary code (even single threaded) is incalculably difficult
to reason about (halting problem, Rice's theorem).  But we're talking
about code following a particular set of conventions, not arbitrary
code.  The conventions are supposed to facilitate reasoning and
verification.  Again there's tons of solid theory in the OS literature
about this stuff.

> by default Haskell looks to use lightweight threads where only 1
> thread can be executing at a time [1]... That doesn't seem to be
> shared state multithreading, which is what the article is referring to.

Haskell uses lightweight, shared state threads with synchronization
primitives that do the usual things (the API is somewhat different than
Posix threads though).  You have to use the +RTS command line option to
run on multiple cores: I don't know why the default is to stay on a
single core.  There might be a performance hit if you use the multicore
runtime with a single-threaded program, or something like that.

There is a book about Haskell concurrency and parallelism that I've been
wanting to read (full text online):

http://chimera.labs.oreilly.com/books/123000929/index.html

> 2) it has a weird story about the brass cockroach, that basically
> signified that they didn't have a robust enough testing system to
> be able to reproduce the bug. 
>
> The point was that it wasn't feasible to have a robust testing suite
> because, you guessed it,

No really, they observed this bug happening repeatedly under what
sounded like fairly light load with real users.  So a stress testing
framework should have been able to reproduce it.  Do you really think
it's impossible to debug this kind of problem?  OS developers do it all
the time.  There is no getting around it.  

> This is probably correct. Is there any STM implementations out that
> that don't significantly compromise performance?

STM is fast as long as there's not much contention for shared data
between threads.  In the "account balance" example that should almost
always be the case.  The slowdown is when multiple threads are fighting
over the same data and transactions keep having to be rolled back and
restarted.

> multiprocessing module looks pretty nice and I should try it 
> It's 1 real advantage is that it side-steps the GIL. So, if you need
> to utilise multiple cores for CPU bound tasks, then it might well be
> the only option.

It's 1 real advantage compared to what?  I thought you were saying it
avoids shared data hazards of threads.  The 4 alternatives in that
article were threads, multiprocessing, old-fashioned async (callback
hell), and asyncio (still contorted and relies on Python 3 coroutines).
If you eliminate threads because of data sharing and asyncio because you
need Python 2 compatibility, you're left with multiprocessing if you
want to avoid the control inversion of callback style.
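For the CPU-bound case, a minimal multiprocessing sketch (the pool size
and workload here are arbitrary) looks like:

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-CPU work: sum of squares below n
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    pool = Pool(processes=4)  # four worker processes, each with its own GIL
    try:
        results = pool.map(cpu_bound, [10 ** 5] * 8)
    finally:
        pool.close()
        pool.join()
    print(results)
```

The close()/join() dance keeps this runnable on Python 2 as well; on
3.3+ a Pool also works as a context manager.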

It's true though, this started out about the GIL in PyPy (was Laura
going to post about that?) so using multicores is indeed maybe relevant.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Steven D'Aprano
Dave Angel wrote:

> (Although I believe Seymour Cray was quoted as saying that virtual
> memory is a crock, because "you can't fake what you ain't got.")

If I recall correctly, disk access is about 10,000 times slower than RAM, so
virtual memory is *at least* that much slower than real memory.



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Python in The Economist

2015-02-26 Thread Frank Millman
Hi all

From a recent article in The Economist -

"A recovering economy in America and an explosion of entrepreneurial 
activity are driving up demand for tech talent. [...] Bidding battles are 
breaking out, with salaries and bonuses rising fast for experts in popular 
computer languages such as Python and Ruby on Rails."

The author seems to have obtained his information from "a recent dinner 
party in Silicon Valley", so it may not be very representative. But to be 
mentioned in such a high-profile newspaper with its international readership 
can only be good for Python.

Here is a link to the full article -

http://www.economist.com/news/business/21644150-battle-software-talent-other-industries-can-learn-silicon-valley-how-bag

Frank Millman



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie question about text encoding

2015-02-26 Thread Dave Angel

On 02/27/2015 12:58 AM, Steven D'Aprano wrote:

Dave Angel wrote:


(Although I believe Seymour Cray was quoted as saying that virtual
memory is a crock, because "you can't fake what you ain't got.")


If I recall correctly, disk access is about 10,000 times slower than RAM, so
virtual memory is *at least* that much slower than real memory.



It's so much more complicated than that, that I hardly know where to 
start.  I'll describe a generic processor/OS/memory/disk architecture; 
there will be huge differences between processor models even from a 
single manufacturer.


First, as soon as you add swapping logic to your 
processor/memory-system, you theoretically slow it down.  And in the 
days of that quote, Cray's memory was maybe 50 times as fast as the 
memory used by us mortals.  So adding swapping logic would have slowed 
it down quite substantially, even when it was not swapping.  But that 
logic is inside the CPU chip these days, and presumably thoroughly 
optimized.


Next, statistically, a program uses a small subset of its total program 
& data space in its working set, and the working set should reside in 
real memory.  But when the program greatly increases that working set, 
and it approaches the amount of physical memory, then swapping becomes 
more frenzied, and we say the program is thrashing.  Simple example, try 
sorting an array that's about the size of available physical memory.


Next, even physical memory is divided into a few levels of caching, some 
on-chip and some off.  And the caching is done in what I call strips, 
where accessing just one byte causes the whole strip to be loaded from 
non-cached memory.  I forget the current size for that, but it's maybe 
64 to 256 bytes or so.


If there are multiple processors (not multicore, but actual separate 
processors), then each one has such internal caches, and any writes on 
one processor may have to trigger flushes of all the other processors 
that happen to have the same strip loaded.


The processor not only prefetches the next few instructions, but decodes 
and tentatively executes them, subject to being discarded if a 
conditional branch doesn't go the way the processor predicted.  So some 
instructions execute in zero time, some of the time.


Every address of instruction fetch, or of data fetch or store, goes 
through a couple of layers of translation.  Segment register plus offset 
gives linear address.  Lookup those in tables to get physical address, 
and if table happens not to be in on-chip cache, swap it in.  If 
physical address isn't valid, a processor exception causes the OS to 
potentially swap something out, and something else in.


Once we're paging from the swapfile, the size of the read is perhaps 4k. 
And that read happens regardless of whether we're only going to use one 
byte or all of it.


The ratio between an access which was in the L1 cache and one which 
required a page to be swapped in from disk?  Much bigger than your 
10,000 figure.  But hopefully it doesn't happen a big percentage of the 
time.


Many, many other variables, like the fact that RAM chips are not 
directly addressable by bytes, but are instead organized in rows and 
columns.  So if you access many bytes in the same row, it can be much 
quicker than random access.  Simple access-time specifications therefore 
don't mean as much as they would seem; the controller has to balance the 
RAM spec with the various cache requirements.

--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list


Re: Picking apart a text line

2015-02-26 Thread Dave Angel

On 02/26/2015 10:53 PM, memilanuk wrote:

So... okay.  I've got a bunch of PDFs of tournament reports that I want
to sift thru for information.  Ended up using 'pdftotext -layout
file.pdf file.txt' to extract the text from the PDF.  Still have a few
little glitches to iron out there, but I'm getting decent enough results
for the moment to move on.

I've got my script to where it opens the file, ignores the header lines
at the top, then goes through the rest of the file line by line,
skipping lines if they don't match (don't need the separator lines) and
adding them to a list if they do (and stripping whitespace off the right
side along the way).  So far, so good.

#  rstatPDF2csv.py

import sys
import re


def convert(path):
    lines = []
    with open(path) as data:

        # Skip first n lines of headers
        for i in range(9):
            next(data)

        # Read remaining lines one at a time
        for line in data:

            # If the line begins with a capital letter...
            if re.match(r'^[A-Z]', line):

                # Strip any trailing whitespace and then add to the list
                lines.append(line.rstrip())

    return lines

if __name__ == '__main__':
 print(convert(sys.argv[1]))



What I'm ending up with is a list full of strings that look something
like this:

['JOHN DOEC   T   HM   445-20*MW*   199-11*MW* 194-5
1HM 393-16*MW*   198-9 1HM198-11*MW*396-20*MW*
789-36*MW* 1234-56 *MW*',

Basically... a certain number of characters allotted for competitor
name, then four or five 1-2 char columns for things like classification,
age group, special categories, etc., then a score ('445-20'), then up to
4 char for award (if any), then another score, another award, etc. etc.
etc.

Right now (in the PDF) the scores are batched by one criterion, then
sorted within those groups.  Makes life easier for the person giving out
awards at the end of the tournament, not so much for someone trying to
see how their individual score ranks against the whole field, not just
their group or sub-group.  I want to be able to pull all the scores out
and then re-sort based on score - mainly the final aggregate score, but
potentially also on stage or daily scores.  Eventually I'd like to be
able to calculate standardized z-scores so as to be able to compare
scores from one event/location against another.

So back to the lines of text I have stored as strings in a list.  I
think I want to convert that to a list of lists, i.e. split each line
up, store that info in another list and ditch the whitespace.  Or would
I be better off using dicts?  Originally I was thinking of how to
process each line and split it them up based on what information was
where - some sort of nested for/if mess.  Now I'm starting to think that
the lines of text are pretty uniform in structure i.e. the same field is
always in the same location, and that list slicing might be the way to
go, if a bit tedious to set up initially...?

Any thoughts or suggestions from people who've gone down this particular
path would be greatly appreciated.  I think I have a general
idea/direction, but I'm open to other ideas if the path I'm on is just
blatantly wrong.



Maintaining a list of lists is a big pain.  If the data is truly very 
uniform, you might want to do it, but I'd find it much more reasonable 
to have names for the fields of each line.  You can either do that with 
a named-tuple, or with instances of a custom class of your own.


See 
https://docs.python.org/3.4/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields


You read a line, do some sanity checking on it, and construct an object. 
 Go to the next line, do the same, another object.  Those objects are 
stored in a list.


Everything else accesses the fields of the object something like:


for row in mylist:
    print(row.name, row.classification, row.age)
    if row.name == "Doe":
        ...
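Concretely, a sketch combining namedtuple with fixed-width slicing; the
field names and slice boundaries below are guesses from the sample line
and would need adjusting to the real report layout:

```python
from collections import namedtuple

Competitor = namedtuple('Competitor', 'name classification score award')

def parse_line(line):
    # Pad so short lines don't break the slices
    line = line.ljust(80)
    return Competitor(
        name=line[0:20].strip(),
        classification=line[20:24].strip(),
        score=line[24:36].strip(),
        award=line[36:44].strip(),
    )

# Hypothetical fixed-width sample matching the slices above
sample = 'JOHN DOE'.ljust(20) + 'C'.ljust(4) + '445-20'.ljust(12) + '*MW*'
row = parse_line(sample)
print(row.name, row.score)
```

Re-sorting the whole field then becomes e.g.
sorted(rows, key=lambda r: r.score, reverse=True), though scores like
'445-20' would first need splitting into numeric (total, X-count) pairs
to sort properly rather than lexicographically.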




--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list