splunk_handler and logbook
Hi, I am trying to use Splunk (splunk_handler). My question is: is there any way to integrate logbook with splunk_handler? The examples for splunk_handler use Python's logging module. Thanks
--
https://mail.python.org/mailman/listinfo/python-list
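For concreteness, the kind of bridge in question might look like this -- an untested sketch that pushes logbook records into stdlib logging (via logbook.compat.LoggingHandler), where splunk_handler's stdlib handler could pick them up. The SplunkHandler arguments are invented placeholders, not its real signature:

    # Untested sketch: logbook -> stdlib logging -> splunk_handler.
    import logging
    import logbook
    from logbook.compat import LoggingHandler  # forwards logbook records to logging
    from splunk_handler import SplunkHandler   # a stdlib logging handler

    # Placeholder arguments -- check the splunk_handler README for the
    # real constructor signature.
    splunk = SplunkHandler(host='splunk.example.com', port=8089,
                           username='admin', password='changeme',
                           index='main')
    logging.getLogger().addHandler(splunk)

    log = logbook.Logger('myapp')
    with LoggingHandler().applicationbound():
        log.warn('this record should end up in Splunk')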
Re: GDAL Installation in Enthought Python Distribution
On 26/02/2015 07:47, Leo Kris Palao wrote: Hi Python Users, Would like to request how to install GDAL in my Enthought Python Distribution (64-bit). I am having some problems making GDAL work. Or can you point me to a blog that describes how to set up GDAL in the Enthought Python Distribution? Thanks for any help. -Leo

Was it really necessary to start a new thread one day after asking this question in a slightly different format?

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
On 26/02/2015 02:57, Steven D'Aprano wrote:
> Mark Lawrence wrote:
>> On 25/02/2015 20:45, Mark Lawrence wrote:
>>> http://www.slideshare.net/pydanny/python-worst-practices
>>>
>>> Any that should be added to this list? Any that should be removed as
>>> not that bad?
>>
>> Throwing in my own, how about built-in functions should not use
>> "object" as the one and only argument, and a keyword argument at that.
>
> Which built-in function is that?

memoryview; see http://bugs.python.org/issue20408

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list
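For anyone wondering what the complaint looks like in practice, here is a quick illustration based on the thread's own description of the signature (checked against issue 20408):

    # memoryview's one and only parameter is literally named "object",
    # and it is accepted as a keyword argument:
    mv = memoryview(object=b"abc")  # works, but reads terribly
    print(mv[0])                    # 97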
Re: Python Worst Practices
On 26/02/2015 03:05, Dave Angel wrote:
> On 02/25/2015 08:44 PM, Mark Lawrence wrote:
>> On 25/02/2015 20:45, Mark Lawrence wrote:
>>> http://www.slideshare.net/pydanny/python-worst-practices
>>>
>>> Any that should be added to this list? Any that should be removed as
>>> not that bad?
>>
>> Throwing in my own, how about built-in functions should not use
>> "object" as the one and only argument, and a keyword argument at that.
>
> def marry(object = False)...
>
> if anybody has any cause to object, let him speak now...

Fell off me chair larfing.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
Ben Finney wrote:
> Chris Angelico writes:
>
> > I'd really like to see a lot more presentations done in pure text.
>
> Maybe so. My request at the moment, though, is not for people to change
> what's on their slides; rather, if they want people to retrieve them,
> the slides should be downloadable easily (i.e. without a web app,
> without a registration to some specific site).

... and having downloaded them, what do you view them with if they're not plain text?

--
Chris Green

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
Ian Kelly wrote:
> On Wed, Feb 25, 2015 at 1:45 PM, Mark Lawrence wrote:
> > http://www.slideshare.net/pydanny/python-worst-practices
> >
> > Any that should be added to this list? Any that should be removed as not that bad?
>
> Using XML for configuration is a good example of a worst practice, but
> using Python instead isn't best practice. There are good arguments
> that a configuration language shouldn't be Turing-complete. See for
> instance this blog post: http://taint.org/2011/02/18/001527a.html

I agree wholeheartedly about XML; it's just not designed for what half the world seems to be using it for. Rather like HTML in a way, which should have been a proper mark-up language.

--
Chris Green

--
https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 26, 2015, at 12:36 AM, Gregory Ewing wrote:
> Cem Karan wrote:
>> I think I see what you're talking about now. Does WeakMethod
>> (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod)
>> solve this problem?
>
> Yes, that looks like it would work.

Cool! Thanks,
Cem Karan

--
https://mail.python.org/mailman/listinfo/python-list
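For the archives, a minimal sketch of the pattern under discussion -- callbacks held via weakref.WeakMethod so the registry does not keep listeners alive (all names here are invented for illustration):

    import weakref

    class Button:
        def __init__(self):
            self._callbacks = []

        def register(self, method):
            # Hold only a weak reference to the bound method, so the
            # button does not keep the listener object alive.
            self._callbacks.append(weakref.WeakMethod(method))

        def click(self):
            for ref in list(self._callbacks):
                cb = ref()  # the bound method, or None if collected
                if cb is None:
                    self._callbacks.remove(ref)  # drop stale entry
                else:
                    cb()

    class Listener:
        def on_click(self):
            print("clicked")

    b = Button()
    l = Listener()
    b.register(l.on_click)
    b.click()   # prints "clicked"
    del l
    b.click()   # listener collected; stale reference dropped silently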
asyncio POLLHUP question
I have a system scenario where thousands of applications are running, and via a service discovery mechanism they all get notified that a service they are all interested in has come online. They all attempt to connect a TCP socket to the service. This happens virtually instantly. The problem that I see is that many of the applications that try to connect to the server get themselves into a state where they are consuming a lot of CPU.

I am using Python 3.4.2 and asyncio, and have set the server backlog to 4000 in an effort to accommodate the connection request backlog. I am actually using an event loop from aiozmq (but no ZMQ sockets in this scenario), but under the covers this is just using epoll, so it should really be the same as using the DefaultSelector.

Using strace on the apps exhibiting issues, I see that a socket is continuously triggering a POLLERR|POLLHUP event. This is the cause of the large CPU usage. The socket is the one that was attempting to connect to the new service that was just brought up. I am guessing that the POLLHUP is caused by the server having issues processing the volume of connect requests.

I think I need to drop/close the socket causing the POLLHUP. However, from looking through the asyncio source code I don't see how I can do that from within the _selector.select() or _process_events() functions with only the knowledge of which fd is causing the issue. How do poll errors propagate up from the select loop? I can potentially unregister the fd, but I don't think this will trigger the transport/protocol getting closed (as far as I can tell), which prevents my normal error handling scenarios from attempting to reconnect to the service. The asyncio select functions seem to ignore events other than EVENT_READ and EVENT_WRITE.

Any help would be appreciated.

Regards,
Chris

--
https://mail.python.org/mailman/listinfo/python-list
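To frame what "normal error handling" means here: the reconnect path relies on connection_lost() firing, roughly as in this sketch (names and endpoint invented; asyncio.async() is the 3.4 spelling of what later became ensure_future()):

    import asyncio

    class ServiceClient(asyncio.Protocol):
        """Sketch of the client side; reconnection only works if the
        transport is actually closed when the fd starts reporting
        POLLERR|POLLHUP."""

        def __init__(self, loop):
            self.loop = loop

        def connection_lost(self, exc):
            # This is the hook the error handling depends on -- it never
            # fires while the selector keeps spinning on the bad fd.
            self.loop.call_later(1.0, self.reconnect)

        def reconnect(self):
            coro = self.loop.create_connection(
                lambda: ServiceClient(self.loop), 'service.example', 12345)
            asyncio.async(coro)  # ensure_future() on newer Pythons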
Re: Python Worst Practices
Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 10:54 AM, Steven D'Aprano wrote:
>> - Violating the Rule of Demeter: don't talk to the dog's leg, talk to
>> the dog. Or another way to put it: don't let the paper boy reach
>> into your pocket for money.
>
> I'd call that code smell, rather than an automatic worst practice.

Well, I did end my post with:

    And very possibly the worst practice of all:
    - Failing to understand when it is, and isn't, appropriate to break
      the rules and do what would otherwise be a bad practice.

:-)

> Suppose this:
>
> class Shelf:
>     def __init__(self):
>         self.items = []  # Empty shelf
>
> bookshelf = Shelf()
> bookshelf.items.append(book)
>
> To enforce Demeter, you'd have to add a bunch of methods to the Shelf
> whose sole purpose is to pass word along to the items list. Sure, it
> makes some design sense to say "Add this book to the shelf" rather
> than "Add this book to the items on the shelf", but all those lines of
> code are potential bugs, and if you have to reimplement huge slabs of
> functionality, that too is code smell. So there are times when it's
> correct to reach into another object.

Yes, well this comes down to the question of encapsulation and information-hiding. The advantage of exposing the list of items to the public is that anyone can add or remove items, sort them, reverse them, etc. The disadvantage is that you are now committed to keeping that list as part of the public API and you can't easily change the implementation.

In this specific example, I'd probably keep the list as part of the Shelf API, although I'd be tempted to make self.items a read-only property. That will allow you to call list mutator methods, but prevent you from doing something silly like:

    bookshelf.items = 23

> But the times to use two dots are much rarer than the times to use one
> dot (the paper boy shouldn't reach into your pocket for money, but
> ThinkGeek has your credit card number on file so you can order more
> conveniently), and I can't think of any example off-hand where you
> would want more than three dots.

The Law of Demeter is not really about counting dots. Ruby encourages chaining methods. Python doesn't, since built-ins typically don't return self. But in your own classes, you can have methods return self so you can chain them like this:

    mylist.append(spam).insert(1, eggs).append(cheese).sort().index(ham)

Five dots or not, this is not a violation of Demeter. Likewise for long package names:

    from mylibrary.audiovisual.image.jpeg import Handler

The Law of Demeter is more about information hiding. Clearly you don't hide *public* attributes of your class, otherwise they aren't public, so it's perfectly acceptable to say:

    myshelf.items.append(spam).insert(1, eggs).append(cheese).sort().index(ham)

if items is public. But if the Shelf designer decides that the user shouldn't know anything about how the shelf stores its items, then the *first* dot violates the Law of Demeter.

--
Steven

--
https://mail.python.org/mailman/listinfo/python-list
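To pin that down, here is a minimal sketch of the read-only property (an invented two-method class, not code from the slide deck):

    class Shelf:
        def __init__(self):
            self._items = []

        @property
        def items(self):
            # Mutator calls like bookshelf.items.append(book) still work,
            # but rebinding, e.g. "bookshelf.items = 23", now raises
            # AttributeError because no setter is defined.
            return self._items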
Re: Newbie question about text encoding
On Wednesday, February 25, 2015 at 2:12:09 AM UTC+5:30, Dave Angel wrote:
> On 02/24/2015 02:57 PM, Laura Creighton wrote:
> > Dave Angel
> > are you another Native English speaker living in a world where ASCII
> > is enough?
>
> I'm a native English speaker, and 7 bits is not nearly enough. Even if
> I didn't currently care, I have some history:
>
> No. CDC display code is enough. Who needs lowercase?
>
> No. Baudot code is enough.
>
> No, EBCDIC is good enough. Who cares about other companies.
>
> No, the "golf-ball" only holds this many characters. If we need more,
> we can just get the operator to switch balls in the middle of printing.
>
> No. 2 digit years is enough. This world won't last till the millennium
> anyway.
>
> No. 2k is all the EPROM you can have. Your code HAS to fit in it, and
> only 1.5k RAM.
>
> No. 640k is more than anyone could need.
>
> No, you cannot use a punch card made on a model 26 keypunch in the same
> deck as one made on a model 29. Too bad, many of the codes are
> different. (This one cost me travel back and forth between two
> different locations with different model keypunches)
>
> No. 8 bits is as much as we could ever use for characters. Who could
> possibly need names or locations outside of this region? Or from
> multiple places within it?
>
> 35 years ago I helped design a serial terminal that "spoke" Chinese,
> using a two-byte encoding. But a single worldwide standard didn't come
> until much later, and I cheered Unicode when it was finally unveiled.
>
> I've worked with many printers that could only print 70 or 80 unique
> characters. The laser printer, and even the matrix printer are
> relatively recent inventions.

Wrote something up on why we should stop using ASCII:
http://blog.languager.org/2015/02/universal-unicode.html

(Yeah the world is a bit larger than a small bunch of islands off a half-continent. But this is not that discussion!)

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
On Thu, Feb 26, 2015 at 11:26 PM, Steven D'Aprano wrote:
> Chris Angelico wrote:
>
>> But the times to use two dots are much rarer than the times to use one
>> dot (the paper boy shouldn't reach into your pocket for money, but
>> ThinkGeek has your credit card number on file so you can order more
>> conveniently), and I can't think of any example off-hand where you
>> would want more than three dots.
>
> The Law of Demeter is not really about counting dots. Ruby encourages
> chaining methods. Python doesn't, since built-ins typically don't return
> self. But in your own classes, you can have methods return self so you
> can chain them like this:
>
> mylist.append(spam).insert(1, eggs).append(cheese).sort().index(ham)
>
> Five dots or not, this is not a violation of Demeter. Likewise for long
> package names:
>
> from mylibrary.audiovisual.image.jpeg import Handler

Yes, there are other places where you have lots of dots... I'm talking about the "rule of thumb" shorthand for describing the Law of Demeter, which is that you shouldn't have more than one dot before your method call. The chaining isn't that, because each one is a separate entity; but if you say "fred.house.bookshelf.items.append(book)", you're reaching in far too deep - you should be giving Fred the book to place on his own shelf. That's the only way where "counting dots" is a valid shorthand. It's mentioned in the Wikipedia article for the law:

https://en.wikipedia.org/wiki/Law_of_Demeter#In_object-oriented_programming

Can you offer a less ambiguous way to describe Demeter violations? By the "counting dots" style, Demeter demands one, I would be happy with two, and three or more strongly suggests a flawed API or overly-tight coupling - with the exception that module references don't get counted ("import sys; sys.path.append(p)" is one dot, because there's no point having a "sys.add_path()" method). In some languages, those module references would be notated differently (sys::path.append(p)), so simply counting dots would be closer to accurate. Is there a better Python description?

ChrisA

--
https://mail.python.org/mailman/listinfo/python-list
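A runnable before/after of that shorthand, with invented classes:

    class Shelf:
        def __init__(self):
            self.items = []

    class House:
        def __init__(self):
            self.bookshelf = Shelf()

    class Person:
        def __init__(self):
            self.house = House()

        def receive_book(self, book):
            # The Demeter-friendly entry point: callers talk to Fred,
            # not to Fred's house's bookshelf's item list.
            self.house.bookshelf.items.append(book)

    fred = Person()
    fred.house.bookshelf.items.append("SICP")  # reaching in far too deep
    fred.receive_book("SICP")                  # one dot: give Fred the book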
Re: Python Worst Practices
On 25.02.2015 21:45, Mark Lawrence wrote:
> http://www.slideshare.net/pydanny/python-worst-practices
>
> Any that should be added to this list? Any that should be removed as
> not that bad?

I disagree with slide 16. If I wanted to use long variable names, I would still code in Java.

regards
m.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On Thursday, February 26, 2015 at 6:10:25 PM UTC+5:30, Rustom Mody wrote:
> On Wednesday, February 25, 2015 at 2:12:09 AM UTC+5:30, Dave Angel wrote:
> > [Dave's list of "X is enough" examples snipped; quoted in full above]
>
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

Dave's list above of instances of 'poverty is a good idea' turning out stupid and narrow-minded in hindsight is neat. Thought I'd ack that explicitly.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

From that post:

"""
5.1 Gibberish

When going from the original 2-byte unicode (around version 3?) to the one having supplemental planes, the unicode consortium added blocks such as

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

To me (a layman) it looks unprofessional – as though they are playing games – that billions of computing devices, each having billions of storage words, should have their storage wasted on blocks such as these.
"""

The shift from Unicode as a 16-bit code to having multiple planes came in with Unicode 2.0, but the various blocks were assigned separately:

* Egyptian hieroglyphs: Unicode 5.2
* Cuneiform: Unicode 5.0
* Shavian: Unicode 4.0
* Deseret: Unicode 3.1
* Mahjong Tiles: Unicode 5.1
* Klingon: Not part of any current standard

However, I don't think historians will appreciate you calling all of these "gibberish". To adequately describe and discuss old texts without these Unicode blocks, we'd have to either do everything with images, or craft some kind of reversible transliteration system and have dedicated software to render the texts on screen. Instead, what we have is a well-known and standardized system for transliterating all of these into numbers (code points), and rendering them becomes a simple matter of installing an appropriate font.

Also, how does assigning meanings to codepoints "waste storage"? As soon as Unicode 2.0 hit and 16-bit code units stopped being sufficient, everyone needed to allocate storage - either 32 bits per character, or some other system - and the fact that some codepoints were unassigned had absolutely no impact on that. This is decidedly NOT unprofessional, and it's not wasteful either.

ChrisA

--
https://mail.python.org/mailman/listinfo/python-list
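One way to see the storage point concretely (BOM-free encodings used so the byte counts aren't padded):

    # Bytes per character depend only on the encoding chosen, not on
    # how many Unicode blocks happen to be assigned.
    for ch in ("a", "\u20ac", "\U00013000"):
        # LATIN SMALL LETTER A, EURO SIGN, EGYPTIAN HIEROGLYPH A001
        sizes = [len(ch.encode(e)) for e in ("utf-8", "utf-16-le", "utf-32-le")]
        print(hex(ord(ch)), sizes)
    # 0x61 [1, 2, 4]
    # 0x20ac [3, 2, 4]
    # 0x13000 [4, 4, 4]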
Re: Python Worst Practices
On Fri, Feb 27, 2015 at 12:12 AM, m wrote:
> On 25.02.2015 21:45, Mark Lawrence wrote:
>>
>> http://www.slideshare.net/pydanny/python-worst-practices
>>
>> Any that should be added to this list? Any that should be removed as
>> not that bad?
>
> I disagree with slide 16. If I wanted to use long variable names, I
> would still code in Java.

Clearly you aren't bothered by ambiguities, given that your name is "m". You're lower-case m, and the James Bond character is upper-case M... yeah, this isn't going to be a problem, with seven billion people on the planet!

In case it's not obvious from slide 17, the author is advocating neither the ridiculously short, nor the ridiculously long. This is a topic that you could go into great detail on, but a general rule of thumb is that short names go with short-lived variables, and longer names go with large-scope variables. [1] So your function names shouldn't be single letters, but your loop counters can and should be short:

    def discard_all_spam(self):
        for msg in self.messages:
            if msg.is_spam():
                msg.discard()

And of course, the use of "i" as an integer loop index dates back so far and is so well known that you don't need anything else:

    def get_password():
        for i in range(4):
            if i:
                print("%d wrong tries..." % i)
            s = input("What's the password? ")
            if validate_password(s):
                return s
        print("Too many wrong tries, go away.")

This isn't Java coding.

ChrisA

[1] Yes, Python doesn't have variables per se. But how else am I supposed to differentiate between the name and the concept of a name binding?

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
On 2015-02-26, Ben Finney wrote: > Chris Angelico writes: > >> IMO the whole system of boolean logic in shell scripts is a massive >> pile of hacks. > > Agreed. It bears all the hallmarks of a system which has been > extended to become a complete programming language only with extreme > reluctance on its part. > > I continue to be impressed by how capable and powerful Unix shell is > as a full programming language. Especially it is sorely missed on > other OSes which lack a capable shell. > > But it could never be called “elegant”. Unless you've spent all day working on PHP code. -- Grant Edwards grant.b.edwardsYow! All of life is a blur at of Republicans and meat! gmail.com -- https://mail.python.org/mailman/listinfo/python-list
Re: Is anyone else unable to log into the bug tracker?
On Friday, January 9, 2015 at 7:49:09 PM UTC-6, Steven D'Aprano wrote:
> I'm having trouble logging into the bug tracker. Is anyone else having the
> same problem, that is, your user name and password worked earlier but
> doesn't work now?
>
> http://bugs.python.org/
>
> (Yes, I've checked the capslock key.)
>
> Before I request a new password, I want to check whether it is me or
> everyone.
>
> --
> Steven

I am having this problem, even after I requested a new password. All I get is 'invalid login'. How did you resolve? Thx.

--
https://mail.python.org/mailman/listinfo/python-list
Windows permission error, 64 bit, psycopg2, python 3.4.2
I am one of those struggling with compile issues with Python on 64-bit Windows. I have not been able to get the solutions mentioned on Stack Overflow to work because installing Windows SDK 7.1 fails for me. So I stumbled across a precompiled psycopg2, and that reported that it worked, but then I got two permission errors. Then I read that this was a bug in Python (issue 14252) that had been fixed, but I don't think this is the same error. That one specifically refers to subprocess.py and I don't have that in my traceback. I have v3.4.2. On top of everything else, despite requesting a new password, all I get from the bug tracker is 'invalid login'. In any event, running "import psycopg2" returns 'import error, no module named psycopg2'.

Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

C:\Users\Semantic>pip install git+https://github.com/nwcell/psycopg2-windows.git@win64-py34#egg=psycopg2
Downloading/unpacking psycopg2 from git+https://github.com/nwcell/psycopg2-windows.git@win64-py34
  Cloning https://github.com/nwcell/psycopg2-windows.git (to win64-py34) to c:\users\semantic\appdata\local\temp\pip_build_semantic\psycopg2
  Running setup.py (path:C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic\psycopg2\setup.py) egg_info for package psycopg2
    C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
      warnings.warn(msg)
Installing collected packages: psycopg2
  Running setup.py install for psycopg2
    C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
      warnings.warn(msg)
Successfully installed psycopg2
Cleaning up...
Exception:
Traceback (most recent call last):
  File "C:\Python34\lib\shutil.py", line 370, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Semantic\\AppData\\Local\\Temp\\pip_build_Semantic\\psycopg2\\.git\\objects\\pack\\pack-be4d3da4a06b4c9ec4c06040dbf6685eeccca068.idx'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "C:\Python34\lib\site-packages\pip\commands\install.py", line 302, in run
    requirement_set.cleanup_files(bundle=self.bundle)
  File "C:\Python34\lib\site-packages\pip\req.py", line 1333, in cleanup_files
    rmtree(dir)
  File "C:\Python34\lib\site-packages\pip\util.py", line 43, in rmtree
    onerror=rmtree_errorhandler)
  File "C:\Python34\lib\shutil.py", line 477, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Python34\lib\shutil.py", line 372, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Python34\lib\site-packages\pip\util.py", line 53, in rmtree_errorhandler
    (exctype is PermissionError and value.args[3] == 5)  # python3.3
IndexError: tuple index out of range

--
https://mail.python.org/mailman/listinfo/python-list
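The immediate failure looks like pip's cleanup tripping over read-only files under the cloned .git directory, compounded by a buggy rmtree_errorhandler in that pip version. A possible workaround (untested; the path is taken straight from the traceback above) is to clear the read-only bits and delete the leftover build tree by hand before retrying:

    :: Windows cmd sketch -- clear read-only flags, then remove the
    :: stale pip build directory left behind by the failed cleanup.
    attrib -R /S /D "C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic\*"
    rmdir /S /Q "C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic"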
EuroPython 2015: Launch preparations are underway
The EuroPython workgroups are busy preparing the launch of the conference website. Launched only in mid-January, all workgroups (WGs) are fully up to steam by now, working hard to make EuroPython 2015 a fabulous event.

http://ep2015.europython.eu/

Community building the conference
---------------------------------
The *On-site Team WG* is doing a wonderful job getting us the best possible deals in Bilbao, the *Web WG* is knee-deep in code and docker containers setting up the website, the *Marketing & Design WG* is working with the designers to create wonderful logos and brochures, the *Program WG* is contacting keynote speakers and creating the call for proposals, the *Finance WG* is building the budget and making sure the conference stays affordable for everyone, the *Support WG* is setting up the online help desk to answer your questions, the *Communications WG* is preparing to create a constant stream of exciting news updates, and the *Administration WG* is managing the many accounts, contracts and services needed to run the organization. The *Financial Aid WG* and *Media WG* are preparing to start their part of the conference organization later in March.

http://www.europython-society.org/workgroups

The WGs are all staffed with members from the ACPySS on-site team, the EuroPython Society and volunteers from the EuroPython community to drive the organization forward, and we're getting a lot done in a very short time frame.

More help needed
----------------
We are very happy with the help we are getting from the community, but there still is a lot more to be done. If you want to help us build a great EuroPython conference, please consider joining one of the above workgroups:

http://www.europython-society.org/workgroups

Stay tuned and be sure to follow the EuroPython Blog for updates on the conference:

http://blog.europython.eu/

Enjoy,
-
EuroPython Society (EPS)
http://www.europython-society.org/

--
https://mail.python.org/mailman/listinfo/python-list
Installing PIL without internet access
I have a host that has no access to the internet and I need to install PIL on it. I have an identical host that is on the internet and I have installed it there (with pip). Is there a way I can copy files from the connected host to a flash drive and then copy them to the unconnected host and have PIL working there? Which files would I copy for that? This is on CentOS 6.5, python 2.7 Thanks! -- https://mail.python.org/mailman/listinfo/python-list
Re: EuroPython 2015: Launch preparations are underway
On Fri, Feb 27, 2015 at 2:16 AM, M.-A. Lemburg wrote: > [ a whole lot of relatively sane text ] Sadly, this was not what I wanted to see, based on the subject line. I wanted to know about the snake you guys were about to send into space! ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Parallelization of Python on GPU?
On Wed, 2015-02-25 at 18:35 -0800, John Ladasky wrote:
> I've been working with machine learning for a while. Many of the
> standard packages (e.g., scikit-learn) have fitting algorithms which
> run in single threads. These algorithms are not themselves
> parallelized. Perhaps, due to their unique mathematical requirements,
> they cannot be parallelized.
>
> When one is investigating several potential models of one's data with
> various settings for free parameters, it is still sometimes possible
> to speed things up. On a modern machine, one can use Python's
> multiprocessing.Pool to run separate instances of scikit-learn fits.
> I am currently using ten of the twelve 3.3 GHz CPU cores on my machine
> to do just that. And I can still browse the web with no observable
> lag. :^)
>
> Still, I'm waiting hours for jobs to finish. Support vector
> regression fitting is hard.
>
> What I would REALLY like to do is to take advantage of my GPU. My
> NVidia graphics card has 1152 cores and a 1.0 GHz clock. I wouldn't
> mind borrowing a few hundred of those GPU cores at a time, and see
> what they can do. In theory, I calculate that I can speed up the job
> by another five-fold.
>
> The trick is that each process would need to run some PYTHON code, not
> CUDA or OpenCL. The child process code isn't particularly fancy. (I
> should, for example, be able to switch that portion of my code to
> static typing.)
>
> What is the most effective way to accomplish this task?

GPU computing is a lot more than simply saying "run this on a GPU". To realize the performance gains promised by a GPU, you need to tailor your algorithms to take advantage of their hardware... SIMD reigns supreme, and thread divergence and branching are far more expensive than they are in CPU computing. So even if you decide to somehow translate your Python code into a CUDA kernel, there is a good chance that you will be woefully disappointed in the resulting speedup (or even more so if you actually get a slowdown :)).

For example, a simple reduction is more expensive on a GPU than it is on a CPU for small arrays. A dot product, for example, has a part that's super fast on the GPU (element-by-element multiplication), and then a part that gets a lot slower (summing up all elements of the resulting multiplication). Each core on the GPU is a lot slower than a CPU (which is why a 1000-CUDA-core GPU doesn't run anywhere near 1000x faster than a CPU), so you really only get gains when they can all work efficiently together. Another example -- matrix multiplies are *fast*. Diagonalizations are slow (which is why in my field, where diagonalizations are common requirements, they are often done on the CPU while *building* the matrix is done on the GPU).

> I came across a reference to a package called "Urutu" which may be
> what I need, however it doesn't look like it is widely supported.

Urutu seems to be built on PyCUDA and PyOpenCL (which are both written by the same person, Andreas Kloeckner at UIUC in the United States).

Another package I would suggest looking into is numba, from Continuum Analytics: https://github.com/numba/numba. Unlike Urutu, their package is built on LLVM and Python bindings they've written to implement numpy-aware JIT capabilities. I believe they also permit compiling down to a GPU kernel through LLVM. One downside I've experienced with that package is that LLVM does not yet have a stable API (as I understand it), so they often lag behind support for the latest versions of LLVM.
> I would love it if the Python developers themselves added the ability
> to spawn GPU processes to the Multiprocessing module!

I would be stunned if this actually happened. If you're worried about performance, you get at least an order of magnitude performance boost by going to numpy or writing the kernel directly in C or Fortran. CPython itself just isn't structured to run on a GPU... maybe pypy will tackle that at some point in the probably-distant future.

All the best,
Jason

--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher

--
https://mail.python.org/mailman/listinfo/python-list
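To give a flavor of the numba approach mentioned above, a minimal sketch (CPU target shown; numba also exposes a CUDA backend via numba.cuda, and none of this is Urutu-specific):

    # Minimal numba sketch: JIT-compile a numpy-aware kernel to machine
    # code on first call. Illustrative only -- the GPU story needs
    # numba.cuda and a data-parallel rewrite, per the caveats above.
    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def dot(a, b):
        s = 0.0
        for i in range(a.shape[0]):
            s += a[i] * b[i]
        return s

    x = np.random.rand(1000)
    y = np.random.rand(1000)
    print(dot(x, y))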
ANN: Wing IDE 5.1.2 released
Hi,

Wingware has released version 5.1.2 of Wing IDE, our cross-platform integrated development environment for the Python programming language.

Wing IDE features a professional code editor with vi, emacs, visual studio, and other key bindings, auto-completion, call tips, context-sensitive auto-editing, goto-definition, find uses, refactoring, a powerful debugger, version control, unit testing, search, project management, and many other features.

This minor release includes the following improvements:

* Support for recent Google App Engine versions
* Expanded and improved static analysis for PyQt
* Added class and instance attributes to the Find Symbol dialog
* Support recursive invocation of snippets, auto-invocation arg entry, and field-based auto-editing operations (e.g. :try applied to a selected range)
* Support for python3-pylint
* Code sign all exe, dll, and pyd files on Windows
* Fix a number of child process debugging scenarios
* Fix source assistant formatting of PEP287 fields with long fieldname
* Fix indent level for pasted text after single undo for indent adjustment
* Fix introduce variable refactoring and if (exp): statements
* About 12 other bug fixes; see http://wingware.com/pub/wingide/5.1.2/CHANGELOG.txt

What's New in Wing 5.1:

Wing IDE 5.1 adds multi-process and child process debugging, syntax highlighting in the shells, persistent time-stamped unit test results, auto-conversion of indents on paste, an XCode keyboard personality, support for Flask, Django 1.7 & recent Google App Engine versions, improved auto-completion for PyQt, recursive snippet invocation, and many other minor features and improvements. For details see http://wingware.com/news/2015-02-25

Free trial: http://wingware.com/wingide/trial
Downloads: http://wingware.com/downloads
Feature list: http://wingware.com/wingide/features
Sales: http://wingware.com/store/purchase
Upgrades: https://wingware.com/store/upgrade

Questions? Don't hesitate to email us at supp...@wingware.com.

Thanks,

--
Stephan Deibel
Wingware | Python IDE
The Intelligent Development Environment for Python Programmers
wingware.com

--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallelization of Python on GPU?
On Thu, 2015-02-26 at 14:02 +1100, Steven D'Aprano wrote:
> John Ladasky wrote:
>
>> What I would REALLY like to do is to take advantage of my GPU.
>
> I can't help you with that, but I would like to point out that GPUs
> typically don't support IEEE-754 maths, which means that while they are
> likely significantly faster, they're also likely significantly less
> accurate. And any two different brands/models of GPU are likely to give
> different results. (Possibly not *very* different, but considering the
> mess that floating point maths was prior to IEEE-754, possibly *very*
> different.)

This hasn't been true in NVidia GPUs manufactured since ca. 2008.

> Personally, I wouldn't trust GPU floating point for serious work. Maybe
> for quick and dirty exploration of the data, but I'd then want to repeat
> any calculations using the main CPU before using the numbers anywhere :-)

There is a *huge* dash toward GPU computing in the scientific computing sector. Since I started as a graduate student in computational chemistry/physics in 2008, I watched as state-of-the-art supercomputers running tens of thousands to hundreds of thousands of cores were overtaken in performance by a $500 GPU (today the GTX 780 or 980) you can put in a desktop. I went from running all of my calculations on a CPU cluster in 2009 to running 90% of my calculations on a GPU by the time I graduated in 2013... and for people without as ready access to supercomputers as myself the move was even more pronounced.

This work is very serious, and numerical precision is typically of immense importance. See, e.g., http://www.sciencedirect.com/science/article/pii/S0010465512003098 and http://pubs.acs.org/doi/abs/10.1021/ct400314y

In our software, we can run simulations on a GPU or a CPU and the results are *literally* indistinguishable. The transition to GPUs was accompanied by a series of studies that investigated precisely your concerns... we would never have started using GPUs if we didn't trust GPU numbers as much as we did from the CPU.

And NVidia is embracing this revolution (obviously) -- they are putting a lot of time, effort, and money into ensuring the success of GPU high performance computing. It is here to stay in the immediate future, and refusing to use the technology will leave those that *could* benefit from it at a severe disadvantage. (That said, GPUs aren't good at everything, and CPUs are also here to stay.)

And GPU performance gains are outpacing CPU performance gains -- I've seen about two orders of magnitude improvement in computational throughput over the past 6 years through the introduction of GPU computing and improvements in GPU hardware.

All the best,
Jason

--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher

--
https://mail.python.org/mailman/listinfo/python-list
Building C++ modules for python using GNU autotools, automake, whatever
Hi,

I'm a complete neophyte to the whole use of GNU autotools/automake/auto... (I'm not sure what it should be called anymore.) Regardless, I'm porting a library project, on which I'm a team member, to this toolset for building on Linux. I'm to the point now of writing the Makefile.am file for the actual library. (There are several other static libraries compiled first that are sucked into this shared object file.)

I found some references here: http://www.gnu.org/savannah-checkouts/gnu/automake/manual/html_node/Python.html, which seemed to be just what I was after. However, I've got a big question about a file named "module.la" instead of "module.so", which is what we compile it to now.

I guess I should have mentioned some background. Currently, we build this tool through some homegrown makefiles. This has worked, but distribution is difficult and our product must now run on an embedded platform (so building it cleanly requires the use of autotools).

Basically, I need this thing to install to /usr/lib/python2.6/site-packages when the user invokes "make install". I thought the variables and primaries discussed at the link above were what I needed. However, what is a "*.la"? I'm reading up on libtool now, but will it function the same way as a *.so? I need pointers on where to go from here.

Thanks,
Andy

--
https://mail.python.org/mailman/listinfo/python-list
Re: Is anyone else unable to log into the bug tracker?
Malik Rumi wrote: > On Friday, January 9, 2015 at 7:49:09 PM UTC-6, Steven D'Aprano wrote: >> I'm having trouble logging into the bug tracker. Is anyone else having >> the same problem, that is, your user name and password worked earlier but >> doesn't work now? >> >> http://bugs.python.org/ >> >> (Yes, I've checked the capslock key.) [...] > I am having this problem, even after I requested a new password. All I get > is 'invalid login'. How did you resolve? Thx. I was suffering from a PEBCAK error, and was using the wrong password. Once I started using the right one, it just worked for me. I seem to recall that you need to accept cookies for the bugtracker to log you in. Try that and see if it helps. Sorry that I can't be of more help. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Building C++ modules for python using GNU autotools, automake, whatever
On Thu, 2015-02-26 at 07:57 -0800, af300...@gmail.com wrote:
> Hi,
>
> I'm a complete neophyte to the whole use of GNU
> autotools/automake/auto... [snip]
> However, I've got a big question about a file named "module.la" instead
> of "module.so" which is what we compile it to now.

I certainly hope module.la is not what it gets compiled to. Open it up with a text editor :). It's just basically a description of the library that libtool makes use of. In the projects that I build, the .la files are all associated with a .a archive or a .so (/.dylib for Macs). Obviously, static archives won't work for Python (and, in particular, I believe you need to compile all of the objects as position independent code, so you need to make sure the appropriate PIC flag is given to the compiler... for g++ that would be -fPIC).

> Basically, I need this thing to install
> to /usr/lib/python2.6/site-packages when the user invokes "make
> install". I thought the variables and primaries discussed at the link
> above were what I needed. However, what is a "*.la"? I'm reading up
> on libtool now, but will it function the same way as a *.so?

To libtool, yes... provided that you *also* have the .so with the same base name as the .la. I don't think compilers themselves make any use of .la files, though.

HTH,
Jason

--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher

--
https://mail.python.org/mailman/listinfo/python-list
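For the install-location question, the automake manual's Python node boils down to something like this sketch of a Makefile.am (names invented; PYTHON_CPPFLAGS assumes an AX_PYTHON_DEVEL-style macro in configure.ac):

    ## Sketch only. AM_PATH_PYTHON in configure.ac defines pyexecdir,
    ## which resolves to something like /usr/lib/python2.6/site-packages.
    pyexec_LTLIBRARIES = module.la
    module_la_SOURCES = module.cpp
    module_la_CPPFLAGS = $(PYTHON_CPPFLAGS)
    ## -module -avoid-version tells libtool this is a dlopen()ed plugin,
    ## so "make install" installs a plain module.so, not a versioned lib.
    module_la_LDFLAGS = -module -avoid-version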
Re: Parallelization of Python on GPU?
If you are doing SVM regression with scikit-learn you are using libSVM. There is a CUDA-accelerated version of this C library here:

http://mklab.iti.gr/project/GPU-LIBSVM

You can presumably reuse the wrapping code from scikit-learn.

Sturla

John Ladasky wrote:
> I've been working with machine learning for a while. [rest of the
> original post snipped; it is quoted in full in Jason Swails' reply
> above]

--
https://mail.python.org/mailman/listinfo/python-list
Re: GDAL Installation in Enthought Python Distribution
On 2/26/2015 2:47 AM, Leo Kris Palao wrote: Would like to request how to install GDAL in my Enthought Python Distribution (64-bit). The best place to ask about the Enthought Python Distribution is a list devoted to the E. P. D. -- Terry Jan Reedy -- https://mail.python.org/mailman/listinfo/python-list
Re: Is anyone else unable to log into the bug tracker?
I have not had problems, but I use the Google login (Open ID, I presume) option. Skip -- https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps solve the problems affecting speakers of "underserved" languages -- access and language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all deserve to be able to interact with technology in their native languages as much as we speakers of ASCII-friendly languages do. Unicode support also makes writing papers on, dictionaries of, and new texts in such languages much easier, which helps the fight against language extinction, which is a sadly pressing issue.

Also, like, computers are big. Get an external drive for your high-resolution PDF collection of Medieval manuscripts if you feel like you're running out of space. A few extra codepoints aren't going to be the straw that breaks the camel's back.

On Thursday, February 26, 2015 at 8:24:34 AM UTC-5, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> > Wrote something up on why we should stop using ASCII:
> > http://blog.languager.org/2015/02/universal-unicode.html
>
> [rest of Chris's reply snipped; it appears in full earlier in the thread]

--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallelization of Python on GPU?
GPU computing is great if you have the following:

1. Your data structures are arrays of floating point numbers.
2. You have a data-parallel problem.
3. You are happy with single precision.
4. You have time to code everything in CUDA or OpenCL.
5. You have enough video RAM to store your data.

For Python the easiest solution is to use Numba Pro.

Sturla

Jason Swails wrote:
> On Thu, 2015-02-26 at 14:02 +1100, Steven D'Aprano wrote:
> [Jason's reply on GPU precision and the move to GPUs in scientific
> computing snipped; quoted in full above]

--
https://mail.python.org/mailman/listinfo/python-list
Re: Installing PIL without internet access
On 2015-02-26 15:23, Larry Martell wrote:
> I have a host that has no access to the internet and I need to install
> PIL on it. I have an identical host that is on the internet and I have
> installed it there (with pip). Is there a way I can copy files from the
> connected host to a flash drive and then copy them to the unconnected
> host and have PIL working there? Which files would I copy for that?
> This is on CentOS 6.5, python 2.7

Have a look here:
https://pip.pypa.io/en/latest/reference/pip_install.html#pip-install-options

It says that you can install from a downloaded file, e.g.:

    pip install ./downloads/SomePackage-1.0.4.tar.gz

--
https://mail.python.org/mailman/listinfo/python-list
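Putting that together for the flash-drive scenario, a sketch using the pip options of that era (newer pips spell step one "pip download"); the paths are invented, and substitute whatever name you pip-installed on the connected host (e.g. PIL or Pillow):

    # On the connected host: fetch the package and its dependencies
    # into a directory on the flash drive.
    pip install --download /media/usb/pkgs Pillow

    # On the offline host: install strictly from the flash drive,
    # never touching the network.
    pip install --no-index --find-links=/media/usb/pkgs Pillow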
Re: Newbie question about text encoding
On 2/26/2015 8:24 AM, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
>> Wrote something up on why we should stop using ASCII:
>> http://blog.languager.org/2015/02/universal-unicode.html

I think that the main point of the post, that many Unicode chars are truly planetary rather than just national/regional, is excellent.

> [the blog's "5.1 Gibberish" passage and Chris's list of when each block
> was actually assigned are quoted in full earlier in this thread]

You should add emoticons, but not call them or the above 'gibberish'. I think that this part of your post is more 'unprofessional' than the character blocks. It is very jarring and seems contrary to your main point.

> However, I don't think historians will appreciate you calling all of
> these "gibberish". [...] This is decidedly NOT unprofessional, and it's
> not wasteful either.

I agree.

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On Thursday, February 26, 2015 at 10:16:11 PM UTC+5:30, Sam Raker wrote:
> I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so
> helps solve the problems affecting speakers of "underserved"
> languages -- access and language preservation. Speakers of Mongolian,
> Cherokee, Georgian, etc. all deserve to be able to interact with
> technology in their native languages as much as we speakers of
> ASCII-friendly languages do. Unicode support also makes writing papers
> on, dictionaries of, and new texts in such languages much easier, which
> helps the fight against language extinction, which is a sadly pressing
> issue.

Agreed -- correcting the inequities caused by ASCII-bias is a good thing. In fact the whole point of my post was to say just that: by carving out and focussing on a 'universal' subset of unicode that is considerably larger than ASCII but smaller than unicode, we stand to reduce ASCII-bias. See also other posts like:

http://blog.languager.org/2014/04/unicoded-python.html
http://blog.languager.org/2014/05/unicode-in-haskell-source.html

However my example listed:

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

Ok, Chris has corrected me re. Klingon-in-unicode, so let's drop that. Of the others, which do you think is in the 'underserved' category? More generally, which of http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Supplementary_Multilingual_Plane are underserved?

--
https://mail.python.org/mailman/listinfo/python-list
Re: Installing PIL without internet access
On 2/26/2015 10:23 AM, Larry Martell wrote:
> I have a host that has no access to the internet and I need to install
> PIL on it. [snip]

On Windows, I would look in python27/Lib/site-packages for the PIL and pil-dist-info directories and copy them, and look in python27/Scripts for pil*.py and copy those.

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallelization of Python on GPU?
On 2/26/2015 10:06 AM, Jason Swails wrote:
> [Jason's post on GPU precision and scientific computing, quoted in
> full above, snipped]

Thanks for the update.

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On Fri, Feb 27, 2015 at 4:02 AM, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
>> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
>>> Wrote something up on why we should stop using ASCII:
>>> http://blog.languager.org/2015/02/universal-unicode.html
>
> I think that the main point of the post, that many Unicode chars are truly
> planetary rather than just national/regional, is excellent.

Agreed. Like you, though, I take exception at the "Gibberish" section.

Unicode offers us a number of types of character needed by linguists:
1) Letters[1] common to many languages, such as the unadorned Latin and Cyrillic letters
2) Letters specific to one or very few languages, such as the Turkish dotless i
3) Diacritical marks, ready to be combined with various letters
4) Precomposed forms of various common "letter with diacritical" combinations
5) Other precomposed forms, eg ligatures and Hangul syllables
6) Symbols, punctuation, and various other marks
7) Spacing of various widths and attributes

Apart from #4 and #5, which could be avoided by using the decomposed forms everywhere, each of these character types is vital. You can't typeset a document without being able to adequately represent every part of it. Then there are additional characters that aren't strictly necessary, but are extremely convenient, such as the emoticon sections. You can talk in text and still put in a nice little picture of a globe, or the monkey-no-evil set, etc.

Most of these characters - in fact, all except #2 and maybe a few of the diacritical marks - are used in multiple places/languages. Unicode isn't about taking everyone's separate character sets and numbering them all so we can reference characters from anywhere; if you wanted that, you'd be much better off with something that lets you specify a code page in 16 bits and a character in 8, which is roughly the same size as Unicode anyway. What we have is, instead, a system that brings them all together - LATIN SMALL LETTER A is U+0061 no matter whether it's being used to write English, French, Malaysian, Turkish, Croatian, Vietnamese, or Icelandic text. Unicode is truly planetary.

ChrisA

[1] I use the word "letter" loosely here; Chinese and Japanese don't have a concept of letters as such, but their glyphs are still represented.
-- 
https://mail.python.org/mailman/listinfo/python-list
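Points #3-#5 are easy to see from Python itself; here is a minimal sketch using the standard unicodedata module (the code points chosen are just illustrative):

import unicodedata

precomposed = '\u00e9'   # U+00E9 LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = 'e\u0301'   # LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)   # False: different code point sequences...
print(unicodedata.normalize('NFD', precomposed) == decomposed)   # ...but True once
print(unicodedata.normalize('NFC', decomposed) == precomposed)   # both are normalized

This is why searching and comparing Unicode text usually starts with normalization: the precomposed and combining forms render identically but are distinct sequences.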
Re: Parallelization of Python on GPU?
On Thursday, February 26, 2015 at 8:41:26 AM UTC-8, Sturla Molden wrote:
> If you are doing SVM regression with scikit-learn you are using libSVM.
> There is a CUDA accelerated version of this C library here:
> http://mklab.iti.gr/project/GPU-LIBSVM
>
> You can presumably reuse the wrapping code from scikit-learn.
>
> Sturla

Hi Sturla, I recognize your name from the scikit-learn mailing list. If you look a few posts above yours in this thread, I am aware of gpu-libsvm. I don't know if I'm up to the task of reusing the scikit-learn wrapping code, but I am giving that option some serious thought. It isn't clear to me that gpu-libsvm can handle both SVM and SVR, and I have need of both algorithms.

My training data sets are around 5000 vectors long. If that graph on the gpu-libsvm web page is any indication of what I can expect from my own data (I note that they didn't specify the GPU card they're using), I might realize a 20x increase in speed.
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Parallelization of Python on GPU?
On Thu, 2015-02-26 at 16:53 +0000, Sturla Molden wrote:
> GPU computing is great if you have the following:
>
> 1. Your data structures are arrays of floating point numbers.

It actually works equally great, if not better, for integers.

> 2. You have a data-parallel problem.

This is the biggest one, IMO. ^^^

> 3. You are happy with single precision.

NVidia GPUs have had double-precision maths in hardware since compute capability 1.3 (GTX 280). That's ca. 2008. In optimized CPU code you can get up to ~50% benefit going from double to single precision (it's rarely ever that high, but 20-30% is commonplace in my experience of optimized code). It's admittedly a bigger hit on most GPUs, but there are ways to work around it (e.g., fixed precision), and you can still do double precision work where it's needed. One of the articles I linked previously demonstrates that a hybrid precision model (based on fixed precision) provides exactly the same numerical stability as double precision (which is much better than pure single precision) for that application. Double precision can often be avoided in many parts of a calculation, using it only where those bits matter (like accumulators with potentially small contributions, subtractions of two numbers of similar magnitude, etc.).

> 4. You have time to code everything in CUDA or OpenCL.

This is the second biggest one, IMO. ^^^

> 5. You have enough video RAM to store your data.

Again, it can be worked around, but the frequent GPU->CPU transfers involved if you can't fit everything on the GPU can have devastating effects on performance, and limiting them is painstaking work.

> For Python the easiest solution is to use Numba Pro.

Agreed, although I've never actually tried PyCUDA before...

All the best,
Jason
-- 
https://mail.python.org/mailman/listinfo/python-list
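Both of the failure modes named above (lost small contributions to an accumulator, and cancellation when subtracting similar magnitudes) are easy to reproduce on the CPU with NumPy; this is a generic sketch of the phenomena, not of any GPU code:

import numpy as np

# A small contribution vanishes into a large single-precision accumulator:
print(np.float32(1e8) + np.float32(1.0))   # 1e+08 -- the 1.0 is lost entirely
print(np.float64(1e8) + np.float64(1.0))   # 100000001.0 -- double keeps it

# Catastrophic cancellation when subtracting numbers of similar magnitude:
a, b = np.float32(1.0 + 1e-7), np.float32(1.0)
print(a - b)                   # 1.1920929e-07 -- one float32 ulp, not 1e-07
print((1.0 + 1e-7) - 1.0)      # ~1.0000000116860974e-07 in double precision

This is why the hybrid schemes mentioned above confine double (or fixed) precision to the accumulation steps while doing the bulk arithmetic in single precision.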
Re: Newbie question about text encoding
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
> > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> >> Wrote something up on why we should stop using ASCII:
> >> http://blog.languager.org/2015/02/universal-unicode.html
>
> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional, is excellent.
>
> > From that post:
> >
> > """
> > 5.1 Gibberish
> >
> > When going from the original 2-byte unicode (around version 3?) to the
> > one having supplemental planes, the unicode consortium added blocks
> > such as
> >
> > * Egyptian hieroglyphs
> > * Cuneiform
> > * Shavian
> > * Deseret
> > * Mahjong
> > * Klingon
> >
> > To me (a layman) it looks unprofessional – as though they are playing
> > games – that billions of computing devices, each having billions of
> > storage words should have their storage wasted on blocks such as these.
> > """
> >
> > The shift from Unicode as a 16-bit code to having multiple planes came
> > in with Unicode 2.0, but the various blocks were assigned separately:
> > * Egyptian hieroglyphs: Unicode 5.2
> > * Cuneiform: Unicode 5.0
> > * Shavian: Unicode 4.0
> > * Deseret: Unicode 3.1
> > * Mahjong Tiles: Unicode 5.1
> > * Klingon: Not part of any current standard
>
> You should add emoticons, but not call them or the above 'gibberish'.

Emoticons (or is it emoji) seems to have some (regional?) takeup?? Dunno… In any case I'd like to stay clear of political(izable) questions.

> I think that this part of your post is more 'unprofessional' than the
> character blocks. It is very jarring and seems contrary to your main point.

Ok, I need a word for:
1. I have no need for this
2. 99.9% of the (living) on this planet also have no need for this

> > However, I don't think historians will appreciate you calling all of
> > these "gibberish". To adequately describe and discuss old texts
> > without these Unicode blocks, we'd have to either do everything with
> > images, or craft some kind of reversible transliteration system and
> > have dedicated software to render the texts on screen. Instead, what
> > we have is a well-known and standardized system for transliterating
> > all of these into numbers (code points), and rendering them becomes a
> > simple matter of installing an appropriate font.
> >
> > Also, how does assigning meanings to codepoints "waste storage"? As
> > soon as Unicode 2.0 hit and 16-bit code units stopped being
> > sufficient, everyone needed to allocate storage - either 32 bits per
> > character, or some other system - and the fact that some codepoints
> > were unassigned had absolutely no impact on that. This is decidedly
> > NOT unprofessional, and it's not wasteful either.
>
> I agree.

I clearly am more enthusiastic than knowledgeable about unicode. But I know my basic CS well enough (as I am sure you and Chris also do).

So I don't get how 4 bytes is not more expensive than 2. Yeah, I know you can squeeze a unicode char into 3 bytes or even 21 bits. You could use a clever representation like UTF-8 or FSR. But I dont see how you can get out of this that full-unicode costs more than exclusive BMP.

e.g. consider the case of 32 vs 64 bit executables. The 64-bit executable is generally larger than the 32-bit one. Now consider the case of a machine that has, say, 2GB RAM and a 64-bit processor. You could -- I think -- make a reasonable case that all those all-zero hi-address-words are 'waste'.
And you've got the general sense best so far:

> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional,

And if the general tone/tenor of what I have written is not getting across because of some words (like 'gibberish'?) then I'll try and reword. However let me try and clarify that the whole of section 5 is 'iffy', with 5.1 being only more extreme. I've not written these in because the point of that post is not to criticise unicode but to highlight the universal(isable) parts.

Still, if I were to expand on the criticisms, here are some examples:

Math-Greek: Consider the math-alpha block
http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block

Now imagine a beginning student not getting the difference between font, glyph, character. To me this block represents this same error cast into concrete and dignified by the (supposed) authority of the unicode consortium. There are probably dozens of other such stupidities, like distinguishing kelvin K from latin K, as if that is the business of the unicode consortium.

My real reservations about unicode come from their work in areas that I happen to know something about.

Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is perhaps ok. However all this stuff http://xahlee.info/comp/unicode_music_symbols.html makes no sense (to me) given that music (ie standard western music written in staff notation) is inherently 2 dimensional -- multi-voiced, multi-staff, chordal.

Sanskrit/Devanagari: Consists of bogus letters that dont exist in devanagari. The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf but not here http://en.wikipedia.org/wiki/Devanagari#Vowels. So I call it bogus-devanagari. Contrariwise, an important letter in vedic pronunciation, the double-udatta, is missing
http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html

All of which adds up to the impression that the unicode consortium occasionally fails to do due diligence.
-- 
https://mail.python.org/mailman/listinfo/python-list
Gaussian process regression
Hi, I am trying to use Gaussian process regression for Near Infrared spectra. I have reference data (spectra), concentrations of the reference data, and sample data, and I am trying to predict concentrations of the sample data. Here is my code:

from sklearn.gaussian_process import GaussianProcess
gp = GaussianProcess()
gp.fit(reference, concentration)
concentration_pred = gp.predict(sample)

The results always gave me the same concentration even though I used different sample data. When I used some parts of the reference data as sample data, it predicted concentration well. But whenever I use data different from the reference data, it always gives me the same concentration. Can I get some help with this problem? What am I doing wrong? I would appreciate any help.

Thanks,
Jay
-- 
https://mail.python.org/mailman/listinfo/python-list
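For anyone hitting the same symptom: a constant prediction from a GP is typically the model falling back to its prior mean, which suggests the correlation length-scale was never tuned for the data. One hedged first experiment, not a diagnosis, is to let the old GaussianProcess optimize its length-scale and allow a little noise via the nugget (the parameter values below are pure guesses; reference, concentration, and sample are as defined in the post):

from sklearn.gaussian_process import GaussianProcess

# Give the optimizer a range for the correlation parameter instead of a
# fixed theta0, and regularize slightly with a nugget for noisy spectra.
gp = GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e1, nugget=1e-8)
gp.fit(reference, concentration)
concentration_pred = gp.predict(sample)

If predictions still collapse to a constant far from the training spectra, that is the GP reverting to its mean where it has no data, which may indicate the samples genuinely lie outside the span of the reference set.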
Re: Building C++ modules for python using GNU autotools, automake, whatever
On Thursday, February 26, 2015 at 9:35:12 AM UTC-7, Jason Swails wrote:
> On Thu, 2015-02-26 at 07:57 -0800, af300wsm wrote:
> > Hi,
> >
> > I'm a complete neophyte to the whole use of GNU
> > autotools/automake/auto... . (I'm not sure what it should be called
> > anymore.) Regardless, I'm porting a library project, for which I'm a
> > team member, to using this toolset for building in Linux. I'm to the
> > point now of writing the Makefile.am file for the actual library.
> > (There are several other static libraries compiled first that are
> > sucked into this shared object file.)
> >
> > I found some references here:
> > http://www.gnu.org/savannah-checkouts/gnu/automake/manual/html_node/Python.html,
> > which seemed to be just what I was after. However, I've got a big
> > question about a file named "module.la" instead of "module.so" which is
> > what we compile it to now.
>
> I certainly hope module.la is not what it gets compiled to. Open it up
> with a text editor :). It's just basically a description of the library

Fascinating! This is all new territory for me. I've used these tools for a number of years, of course, as I've run "./configure && make && make install" many times. Now things are starting to make more sense.

> that libtool makes use of. In the projects that I build, the .la files
> are all associated with a .a archive or a .so (/.dylib for Macs).
> Obviously, static archives won't work for Python (and, in particular, I
> believe you need to compile all of the objects as position independent
> code, so you need to make sure the appropriate PIC flag is given to the
> compiler... for g++ that would be -fPIC).

We are compiling all of our code with -fPIC. I looked over the final build line and I see that a module.so was placed in .libs. I looked in that directory and actually the module is named "module.so.0.0.0" and there is a symbolic link "module.so" which points to that. This is cool stuff. Thanks for the clarification on things.
-- 
https://mail.python.org/mailman/listinfo/python-list
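For others porting extension modules, the automake manual page linked above boils down to a Makefile.am fragment along these lines (a sketch only; "module" is a placeholder name, the source file list is invented, and AM_PATH_PYTHON is assumed to be in configure.ac):

pyexec_LTLIBRARIES = module.la
module_la_SOURCES = module.cpp support.cpp support.h
module_la_LDFLAGS = -avoid-version -module

The -module flag tells libtool to build a dlopen-able module rather than an ordinary linkable library, and -avoid-version suppresses exactly the module.so.0.0.0-style version suffix observed above, so Python can import a plain module.so.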
Re: ANN: Wing IDE 5.1.2 released
Hey, can I run Py 2.7 and 3.4 side by side without a lot of hassle, using Wing? I run both since I'm migrating, and so far the free IDEs just seem to choke on that.
-- 
https://mail.python.org/mailman/listinfo/python-list
requesting you all to please guide me , which tutorials is best to learn redis database
hello all, i want to learn the redis database and its use via Python. Please guide me on which tutorials I should study so that I can learn it properly. I searched on Google but I am a little confused, so please help me.
thank you
jai
-- 
https://mail.python.org/mailman/listinfo/python-list
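For what it's worth, the basic usage pattern with the redis-py client (installable with "pip install redis"; the host and port below assume a default local Redis server) is only a few lines:

import redis

# Connect to a Redis server running locally on the default port
r = redis.StrictRedis(host='localhost', port=6379, db=0)

r.set('greeting', 'hello')         # store a string key
print(r.get('greeting'))           # b'hello' -- values come back as bytes
r.lpush('queue', 'job1', 'job2')   # Redis lists/hashes/sets map to methods
print(r.lrange('queue', 0, -1))    # [b'job2', b'job1']

Each Redis command maps to a like-named method on the client, so the command reference at redis.io doubles as the Python API documentation.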
Re: Installing PIL without internet access
On Thu, Feb 26, 2015 at 11:57 AM, MRAB wrote: > On 2015-02-26 15:23, Larry Martell wrote: >> >> I have a host that has no access to the internet and I need to install >> PIL on it. I have an identical host that is on the internet and I have >> installed it there (with pip). Is there a way I can copy files from >> the connected host to a flash drive and then copy them to the >> unconnected host and have PIL working there? Which files would I copy >> for that? >> >> This is on CentOS 6.5, python 2.7 >> > Have a look here: > > https://pip.pypa.io/en/latest/reference/pip_install.html#pip-install-options > > It says that you can install from a downloaded file, e.g.: > > pip install ./downloads/SomePackage-1.0.4.tar.gz Thanks for the reply. This is very useful info. But I have another issue I didn't mention. The system python is 2.6, but I need the 2.7 version. So anything I install with pip will get installed to 2.6. To get around that on my connected hosts I've done: easy_install-2.7 pip and then I install with pip2.7. But this unconnected host doesn't have easy_install-2.7, so I'd have to figure out how to get that first. I think it will work if I just copy /usr/lib64/python2.7/site-packages/PIL. That worked on a test system I tried it on. I'll try on the real system tonight. -- https://mail.python.org/mailman/listinfo/python-list
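For the archives, the flash-drive round trip can be done entirely with pip (a sketch; the exact flags depend on the pip version -- older pips spell the first step "pip install --download", newer ones "pip download" -- and the flash-drive path is hypothetical):

# On the connected host: download the package plus dependencies as files
pip2.7 install --download /media/usb/pkgs PIL

# On the unconnected host: install purely from the copied directory
pip2.7 install --no-index --find-links=/media/usb/pkgs PIL

The --no-index flag guarantees pip never tries to reach the internet, so this avoids hand-copying site-packages directories.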
Re: Python Worst Practices
c...@isbd.net writes:
> Ben Finney wrote:
> > My request at the moment, though, is not for people to change what's
> > on their slides; rather, if they want people to retrieve them, the
> > slides should be downloadable easily (i.e. without a web app, without
> > a registration to some specific site).
>
> ... and having downloaded them what do you view them with if they're
> not plain text?

Again, I was not the one asking for plain text. So I don't really understand why you ask me that. But, here goes:

Presentation documents, in the overwhelming majority, are in a very small number of formats. If they're PDF: any PDF viewer <https://pdfreaders.org/>. If they're a format produced by some widespread presentation tool: LibreOffice Impress <https://www.libreoffice.org/discover/impress/>.

Why do you ask?

-- 
 \   "The way to build large Python applications is to componentize
  `\  and loosely-couple the hell out of everything." —Aahz
_o__)
Ben Finney
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
On Wed, 25 Feb 2015 23:34:29 +0000, MRAB wrote:
> On 2015-02-25 22:59, Joel Goldstick wrote:
> > On Wed, Feb 25, 2015 at 4:28 PM, MRAB wrote:
> > > On 2015-02-25 20:45, Mark Lawrence wrote:
> > >>
> > >> http://www.slideshare.net/pydanny/python-worst-practices
> > >>
> > >> Any that should be added to this list? Any that should be removed as
> > >> not that bad?
> > >>
> > > We don't have numeric ZIP codes in the UK, but the entire world has
> > > numeric telephone numbers, so that might be a better example of
> > > numbers that aren't really numbers.
> >
> > US zip codes get messed up with ints because many have a leading zero.
> > I use strings
>
> Telephone numbers can also start with zero.

Unless you are performing maths on it, data that is made up of numbers (zip code, tel number, house number etc) is still only text & should be stored as a string.

> > > Numeric dates can be ambiguous: dd/mm/yyyy or mm/dd/yyyy? The ISO
> > > standard is clearer: yyyy-mm-dd.
> > >
> > > Handling text: "Unicode sandwich".
> > >
> > > UTF-8 is better than legacy encodings.

--
To save a single life is better than to build a seven story pagoda.
-- 
https://mail.python.org/mailman/listinfo/python-list
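The leading-zero problem is trivial to demonstrate in any Python shell (the ZIP code here is just an example value):

# Storing a US ZIP code as an int silently destroys information:
zip_int = int("02134")
print(zip_int)            # 2134 -- the leading zero is gone
print("%05d" % zip_int)   # you can re-pad, but only if you *know* it was 5 digits

zip_str = "02134"
print(zip_str)            # '02134' -- as a string it survives intact

And since nobody should be doing arithmetic on ZIP codes anyway (as noted later in the thread), nothing is lost by keeping them as strings.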
Re: splunk_handler and logbook
I got the solution: use RedirectLoggingHandler to redirect the logs to logbook.

from logging import getLogger
mylog = getLogger('My Log')

from splunk_handler import SplunkHandler
splunk = SplunkHandler(
    host='',        # connection details elided in the original post
    port='',
    username='',
    password='',
    index='',
    verify=,        # value elided in the original post
    source=""
)

from logbook.compat import RedirectLoggingHandler
mylog.addHandler(RedirectLoggingHandler())
mylog.addHandler(splunk)
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: ANN: Wing IDE 5.1.2 released
> On Feb 26, 2015, at 2:04 PM, Jim Mooney wrote:
>
> Hey, can I run Py 2.7 and 3.4 side by side without a lot of hassle, using
> Wing? I run both since I'm migrating and so far the free IDEs just seem to
> choke on that.

I assume you just mean that you would like to have different Python projects that open in Wing with the correct associated version of Python. Yes, you can specify a Python executable in the Project Properties - Environment tab. Click on the "Custom" button in the Python Executable entry and enter the path to the version of Python you want.

If this isn't what you are after, let us know.

-Bill

PS: I've found that the Wing e-mail support is VERY responsive. No relation, just a happy user.
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 26, 2015 4:00 AM, "Cem Karan" wrote:
>
> On Feb 26, 2015, at 12:36 AM, Gregory Ewing wrote:
> > Cem Karan wrote:
> >> I think I see what you're talking about now. Does WeakMethod
> >> (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod)
> >> solve this problem?
> >
> > Yes, that looks like it would work.
>
> Cool!

Sometimes I wonder whether anybody reads my posts. I suggested a solution involving WeakMethod four days ago that additionally extends the concept to non-method callbacks (requiring a small amount of extra effort from the client in those cases, but I think that is unavoidable: there is no way that the framework can determine the appropriate lifetime for a closure-based callback).
-- 
https://mail.python.org/mailman/listinfo/python-list
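For readers following along, the WeakMethod approach looks roughly like this (a minimal sketch of the idea, not Ian's actual code; the register/fire framework names are invented):

import weakref

class Listener:
    def on_event(self, data):
        print('got', data)

callbacks = []   # the "framework" stores weak references, not the methods

def register(bound_method):
    callbacks.append(weakref.WeakMethod(bound_method))

def fire(data):
    for ref in callbacks:
        cb = ref()            # None once the owning object is collected
        if cb is not None:
            cb(data)

listener = Listener()
register(listener.on_event)
fire('hello')    # prints: got hello
del listener
fire('world')    # prints nothing; the callback died with its object

Registration does not keep the listener alive, which is exactly the lifetime property being debated in this thread; a plain closure has no owning object, so the framework cannot infer its lifetime the same way.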
Re: Design thought for callbacks
On 02/26/2015 11:54 AM, Ian Kelly wrote:
> Sometimes I wonder whether anybody reads my posts.

It's entirely possible the OP wasn't ready to understand your solution four days ago, but two days later the OP was.

--
~Ethan~
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> You should add emoticons, but not call them or the above 'gibberish'.

Done -- and of course not under gibberish. I don't really know much about how emoji are used, but I understand they are. JFTR I consider it necessary to be respectful to all (living) people. For that matter even dead people(s) - no need to be disrespectful to the Egyptians who created the hieroglyphs or the Sumerians who wrote cuneiform. I only find it crosses a line when the two-millennia-dead creations are made to take the space of the living.

Chris wrote:
> * Klingon: Not part of any current standard

Thanks. Removed.
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
On 2015-02-26, alister wrote:
> On Wed, 25 Feb 2015 23:34:29 +0000, MRAB wrote:
>> On 2015-02-25 22:59, Joel Goldstick wrote:
>>> On Wed, Feb 25, 2015 at 4:28 PM, MRAB wrote:
>>>> On 2015-02-25 20:45, Mark Lawrence wrote:
>>>>>
>>>>> http://www.slideshare.net/pydanny/python-worst-practices
>>>>>
>>>>> Any that should be added to this list? Any that be removed as not
>>>>> that bad?
>>>>
>>>> We don't have numeric ZIP codes in the UK, but the entire world has
>>>> numeric telephone numbers, so that might be a better example of
>>>> numbers that aren't really numbers.
>>>
>>> US zip codes get messed up with ints because many have a leading
>>> zero. I use strings

I should hope so, because US zip codes can also contain a hyphen.

>> Telephone numbers can also start with zero.
>
> unless you are performing maths on it, data that is made up of numbers
> (zip code, tel number, house number etc) is still only text & should be
> stored as a string.

And if you _are_ performing maths on postal codes, telephone numbers and house numbers, something is seriously wrong and it probably doesn't matter how you represent things.

--
Grant Edwards               grant.b.edwards        Yow! I am covered with
                            at                     pure vegetable oil and I am
                            gmail.com              writing a best seller!
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
On 25 February 2015 21:24:37 GMT+00:00, Chris Angelico wrote: >On Thu, Feb 26, 2015 at 7:45 AM, Mark Lawrence > wrote: >> http://www.slideshare.net/pydanny/python-worst-practices >> >> Any that should be added to this list? Any that be removed as not >that bad? > >Remove the complaint about id. It's an extremely useful variable name, >and you hardly ever need the function. You can add one character and avoid the conflict with "id_" and not require anyone else maintaining the code to think about it. As rare as the conflict is, I think the ease of avoiding it makes the extra character a practical defensive technique. I agree it is not a worst case. Simon -- https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
On 26 February 2015 00:11:24 GMT+00:00, Ben Finney wrote:
>> Yes, but my point is: You shouldn't need to rebind those names (or
>> have names "true" and "false" for 0 and 1).
>
> That's not what you asked, though. You asked "When would 0 mean true and
> 1 mean false?" My answer: in all Unix shell contexts.
>
>> Instead, use "success" and "failure".
>
> You'd better borrow the time machine and tell the creators of Unix. The
> meme is already established for decades now.

0 = success and non-zero = failure is the meme established, rather than 0 = true, non-zero = false. It's not just used by UNIX, and is not necessarily defined by the shell either (bash was mentioned elsewhere in the thread). There is probably a system that pre-dates UNIX that uses/used this too, but I don't know.

C stdlib defines EXIT_SUCCESS = 0, yet C99 stdbool.h defines false = 0. That shells handle 0 as true and non-zero as false probably stems from this (or similar in older languages). The "true" command is defined to have an exit status of 0, and "false" an exit status of 1.

The value is better thought of as an error level, where 0 is no error and non-zero is some error. The AmigaOS shell conventionally takes this further, with higher values indicating more critical errors; there's even a "failat N" command that means exit the script if the error level is higher than N.

None of the above is a good reason to use error *or* success return values in Python--use exceptions!--but they may be encountered when running other processes.

Simon
-- 
https://mail.python.org/mailman/listinfo/python-list
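The inversion is easy to see from Python when running other processes (this assumes a Unix-like system with the true/false commands on PATH):

import subprocess

# Exit statuses follow the Unix convention: 0 = success, non-zero = failure.
print(subprocess.call('true'))    # 0
print(subprocess.call('false'))   # 1

# Python's own truth rules are the other way around:
print(bool(0), bool(1))           # False True

Hence the usual advice: compare return codes against 0 explicitly, or use subprocess.check_call, which raises CalledProcessError on any non-zero status rather than leaving a truthiness trap.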
Re: Parallelization of Python on GPU?
On 26/02/15 18:34, John Ladasky wrote:
> Hi Sturla, I recognize your name from the scikit-learn mailing list. If
> you look a few posts above yours in this thread, I am aware of gpu-libsvm.
> I don't know if I'm up to the task of reusing the scikit-learn wrapping
> code, but I am giving that option some serious thought. It isn't clear to
> me that gpu-libsvm can handle both SVM and SVR, and I have need of both
> algorithms. My training data sets are around 5000 vectors long. If that
> graph on the gpu-libsvm web page is any indication of what I can expect
> from my own data (I note that they didn't specify the GPU card they're
> using), I might realize a 20x increase in speed.

A GPU is a "floating point monster", not a CPU. It is not designed to run things like CPython. It is also only designed to run threads in parallel on its cores, not processes. And as you know, in Python there is something called the GIL. Further, the GPU has hard-wired fine-grained load scheduling for data-parallel problems (e.g. matrix multiplication for vertex processing in 3D graphics). It is not like a thread on a GPU is comparable to a thread on a CPU. It is more like a parallel work queue, with the kind of abstraction you find in Apple's GCD.

I don't think it is really doable to make something like CPython run with thousands of parallel instances on a GPU. A GPU is not designed for that. A GPU is great if you can pass millions of floating point vectors as items to the work queue, with a tiny amount of computation per item. It would be crippled if you passed a thousand CPython interpreters and expected them to do a lot of work.

Also, as it is libSVM that does the math in your case, you need to get libSVM to run on the GPU, not CPython.

In most cases the best hardware for parallel scientific computing (taking economy and flexibility into account) is a Linux cluster which supports MPI. You can then use mpi4py or Cython to use MPI from your Python code.

Sturla
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Windows permission error, 64 bit, psycopg2, python 3.4.2
On 26/02/2015 15:10, Malik Rumi wrote:
> I am one of those struggling with compile issues with python on 64 bit
> windows. I have not been able to get the solutions mentioned on Stack
> Overflow to work because installing Windows SDK 7.1 fails for me.
>
> So I stumbled across a precompiled psycopg2, and that reported that it
> worked, but then I got two permission errors. Then I read that this was a
> bug in python (issue 14252) that had been fixed, but I don't think this is
> the same error. That one specifically refers to subprocess.py and I don't
> have that in my traceback. I have v3.4.2. On top of everything else,
> despite requesting a new password, all I get from the bug tracker is
> 'invalid login'.
>
> In any event, running "import psycopg2" returns 'import error, no module
> named psycopg2'.
>
> Microsoft Windows [Version 6.3.9600]
> (c) 2013 Microsoft Corporation. All rights reserved.
>
> C:\Users\Semantic>pip install git+https://github.com/nwcell/psycopg2-windows.git@win64-py34#egg=psycopg2
> Downloading/unpacking psycopg2 from git+https://github.com/nwcell/psycopg2-windows.git@win64-py34
>   Cloning https://github.com/nwcell/psycopg2-windows.git (to win64-py34) to c:\users\semantic\appdata\local\temp\pip_build_semantic\psycopg2
>   Running setup.py (path:C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic\psycopg2\setup.py) egg_info for package psycopg2
>     C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
>       warnings.warn(msg)
>
> Installing collected packages: psycopg2
>   Running setup.py install for psycopg2
>     C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
>       warnings.warn(msg)
>
> Successfully installed psycopg2
> Cleaning up...
>
> Exception:
> Traceback (most recent call last):
>   File "C:\Python34\lib\shutil.py", line 370, in _rmtree_unsafe
>     os.unlink(fullname)
> PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Semantic\\AppData\\Local\\Temp\\pip_build_Semantic\\psycopg2\\.git\\objects\\pack\\pack-be4d3da4a06b4c9ec4c06040dbf6685eeccca068.idx'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "C:\Python34\lib\site-packages\pip\basecommand.py", line 122, in main
>     status = self.run(options, args)
>   File "C:\Python34\lib\site-packages\pip\commands\install.py", line 302, in run
>     requirement_set.cleanup_files(bundle=self.bundle)
>   File "C:\Python34\lib\site-packages\pip\req.py", line 1333, in cleanup_files
>     rmtree(dir)
>   File "C:\Python34\lib\site-packages\pip\util.py", line 43, in rmtree
>     onerror=rmtree_errorhandler)
>   File "C:\Python34\lib\shutil.py", line 477, in rmtree
>     return _rmtree_unsafe(path, onerror)
>   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
>     _rmtree_unsafe(fullname, onerror)
>   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
>     _rmtree_unsafe(fullname, onerror)
>   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
>     _rmtree_unsafe(fullname, onerror)
>   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
>     _rmtree_unsafe(fullname, onerror)
>   File "C:\Python34\lib\shutil.py", line 372, in _rmtree_unsafe
>     onerror(os.unlink, fullname, sys.exc_info())
>   File "C:\Python34\lib\site-packages\pip\util.py", line 53, in rmtree_errorhandler
>     (exctype is PermissionError and value.args[3] == 5) #python3.3
> IndexError: tuple index out of range

The above clearly shows "Successfully installed psycopg2" and that it's a permission error on cleanup that's gone wrong, so what is there to report on the bug tracker?
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
- Original Message -
> From: Simon Ward
> To:
> Cc: "python-list@python.org"
> Sent: Thursday, February 26, 2015 8:36 PM
> Subject: Re: Python Worst Practices
>
> On 25 February 2015 21:24:37 GMT+00:00, Chris Angelico wrote:
>> On Thu, Feb 26, 2015 at 7:45 AM, Mark Lawrence wrote:
>>> http://www.slideshare.net/pydanny/python-worst-practices
>>>
>>> Any that should be added to this list? Any that be removed as not
>>> that bad?
>>
>> Remove the complaint about id. It's an extremely useful variable name,
>> and you hardly ever need the function.
>
> You can add one character and avoid the conflict with "id_" and not
> require anyone else maintaining the code to think about it. As rare as the
> conflict is, I think the ease of avoiding it makes the extra character a
> practical defensive technique. I agree it is not a worst case.

I sometimes do:

import sys, functools

if sys.version_info.major > 2:
    bytez = functools.partial(bytes, encoding="utf-8")
else:
    bytez = bytes  # no encoding param in Python 2

It bitez you when you shadow 'bytes' (I can't remember when I couldn't use the functools.partial object), though it often works. Much easier to use 'bytez' or 'bytes_'. It is annoying to 'unshadow' your code and confusing for others who might read your code.

Albert-Jan
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Parallelization of Python on GPU?
On 26/02/15 18:48, Jason Swails wrote:
> On Thu, 2015-02-26 at 16:53 +0000, Sturla Molden wrote:
>> GPU computing is great if you have the following:
>>
>> 1. Your data structures are arrays of floating point numbers.
>
> It actually works equally great, if not better, for integers.

Right, but not complicated data structures with a lot of references or pointers. It requires data to be laid out in regular arrays, and then it acts on these arrays in a data-parallel manner. It is designed to process vertices in parallel for computer graphics, and that is a limitation which is always there. It is not a CPU with 1024 cores. It is a "floating point monster" which can process 1024 vectors in parallel.

You write a tiny kernel in a C-like language (CUDA, OpenCL) to process one vector, and then it will apply the kernel to all the vectors in an array of vectors. It is very comparable to how GLSL and Direct3D vertex and fragment shaders work. (The reason for which should be obvious.) The GPU is actually great for a lot of things in science, but it is not a CPU. The biggest mistake in the GPGPU hype is the idea that the GPU will behave like a CPU with many cores.

Sturla
-- 
https://mail.python.org/mailman/listinfo/python-list
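The "tiny kernel applied to an array of vectors" model can even be driven from Python with Numba's CUDA support (related to the Numba Pro recommendation quoted earlier in the thread; a minimal illustration assuming a CUDA-capable GPU and the numba package, not production code):

import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y):
    i = cuda.grid(1)           # this thread's global index: one thread, one element
    if i < x.shape[0]:
        y[i] = a * x[i] + y[i]

n = 1048576
x = np.ones(n, dtype=np.float32)
y = np.ones(n, dtype=np.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](np.float32(2.0), x, y)  # arrays shipped to/from the GPU
print(y[:4])   # [ 3.  3.  3.  3.]

The kernel body reads like scalar code; the hardware's scheduler applies it across the whole array, which is exactly the work-queue abstraction Sturla describes.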
Fix for no module named _sysconfigdata while compiling
Thought I might help someone else address a problem I ran into this afternoon. While compiling Python 2.7.9 on CentOS 6, I received the error:

no module named _sysconfigdata

Googling found a number of other people having this problem — but the other issues all arose after Python was installed — not while building. In digging through their advice, I saw a number of them spoke about having multiple versions of Python installed. In my case, I already had a custom Python 2.7.3 installed on this machine — and I was upgrading over it to Python 2.7.9. I found that by renaming my custom /opt/python2.7 directory and then building the new release into the same directory, the problem went away.

Summary: Compiling Python 2.7.9 resulted in the error "no module named _sysconfigdata" while compiling.

My configuration:

./configure --prefix=/opt/python2.7 --enable-unicode=ucs4 --enable-shared LDFLAGS="-Wl,-rpath /opt/python2.7/lib"
make; make altinstall

Remove the existing /opt/python2.7 directory which had Python 2.7.3. Now everything builds and installs properly.

—Ray
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Python Worst Practices
Simon Ward writes:
> On 26 February 2015 00:11:24 GMT+00:00, Ben Finney wrote:
> > You'd better borrow the time machine and tell the creators of Unix. The
> > meme is already established for decades now.
>
> 0 = success and non-zero = failure is the meme established, rather
> than 0 = true, non-zero = false.

That is not the case: the commands 'true' (returns value 0) and 'false' (returns value 1) are long established in Unix. So that *is* part of the meme I'm describing.

> None of the above is a good reason to use error *or* success return
> values in Python--use exceptions!--but may be encountered when running
> other processes.

Right. But likewise, don't deny that "true == 0" and "false == non-zero" have wide acceptance in the programming community too.

-- 
 \   "Program testing can be a very effective way to show the
  `\  presence of bugs, but is hopelessly inadequate for showing
_o__) their absence." —Edsger W. Dijkstra

Ben Finney
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Windows permission error, 64 bit, psycopg2, python 3.4.2
On Thursday, February 26, 2015 at 2:55:07 PM UTC-6, Mark Lawrence wrote:
> On 26/02/2015 15:10, Malik Rumi wrote:
> > I am one of those struggling with compile issues with python on 64 bit
> > windows. I have not been able to get the solutions mentioned on Stack
> > Overflow to work because installing Windows SDK 7.1 fails for me.
> >
> > So I stumbled across a precompiled psycopg2, and that reported that it
> > worked, but then I got two permission errors. Then I read that this was
> > a bug in python (issue 14252) that had been fixed, but I don't think
> > this is the same error. That one specifically refers to subprocess.py
> > and I don't have that in my traceback. I have v3.4.2. On top of
> > everything else, despite requesting a new password, all I get from the
> > bug tracker is 'invalid login'.
> >
> > In any event, running "import psycopg2" returns 'import error, no
> > module named psycopg2'.
> >
> > Microsoft Windows [Version 6.3.9600]
> > (c) 2013 Microsoft Corporation. All rights reserved.
> >
> > C:\Users\Semantic>pip install git+https://github.com/nwcell/psycopg2-windows.git@win64-py34#egg=psycopg2
> > Downloading/unpacking psycopg2 from git+https://github.com/nwcell/psycopg2-windows.git@win64-py34
> >   Cloning https://github.com/nwcell/psycopg2-windows.git (to win64-py34) to c:\users\semantic\appdata\local\temp\pip_build_semantic\psycopg2
> >   Running setup.py (path:C:\Users\Semantic\AppData\Local\Temp\pip_build_Semantic\psycopg2\setup.py) egg_info for package psycopg2
> >     C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
> >       warnings.warn(msg)
> >
> > Installing collected packages: psycopg2
> >   Running setup.py install for psycopg2
> >     C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'summary'
> >       warnings.warn(msg)
> >
> > Successfully installed psycopg2
> > Cleaning up...
> >
> > Exception:
> > Traceback (most recent call last):
> >   File "C:\Python34\lib\shutil.py", line 370, in _rmtree_unsafe
> >     os.unlink(fullname)
> > PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Semantic\\AppData\\Local\\Temp\\pip_build_Semantic\\psycopg2\\.git\\objects\\pack\\pack-be4d3da4a06b4c9ec4c06040dbf6685eeccca068.idx'
> >
> > During handling of the above exception, another exception occurred:
> >
> > Traceback (most recent call last):
> >   File "C:\Python34\lib\site-packages\pip\basecommand.py", line 122, in main
> >     status = self.run(options, args)
> >   File "C:\Python34\lib\site-packages\pip\commands\install.py", line 302, in run
> >     requirement_set.cleanup_files(bundle=self.bundle)
> >   File "C:\Python34\lib\site-packages\pip\req.py", line 1333, in cleanup_files
> >     rmtree(dir)
> >   File "C:\Python34\lib\site-packages\pip\util.py", line 43, in rmtree
> >     onerror=rmtree_errorhandler)
> >   File "C:\Python34\lib\shutil.py", line 477, in rmtree
> >     return _rmtree_unsafe(path, onerror)
> >   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >     _rmtree_unsafe(fullname, onerror)
> >   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >     _rmtree_unsafe(fullname, onerror)
> >   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >     _rmtree_unsafe(fullname, onerror)
> >   File "C:\Python34\lib\shutil.py", line 367, in _rmtree_unsafe
> >     _rmtree_unsafe(fullname, onerror)
> >   File "C:\Python34\lib\shutil.py", line 372, in _rmtree_unsafe
> >     onerror(os.unlink, fullname, sys.exc_info())
> >   File "C:\Python34\lib\site-packages\pip\util.py", line 53, in rmtree_errorhandler
> >     (exctype is PermissionError and value.args[3] == 5) #python3.3
> > IndexError: tuple index out of range
>
> The above clearly shows "Successfully installed psycopg2" and that it's a
> permission error on cleanup that's gone wrong, so what is there to report
> on the bug tracker?
>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence

1. I did not mean to confuse you by reference to the bug tracker. My log in difficulties are not related to this issue.
2. The other reference to the bug tracker was to indicate that I don't think this is the same error as mentioned there.
3. Despite the report of success, I do not have psycopg2. Since that failure was followed by the permission errors, I assume they are related. This is why I posted, to get help with this problem. I look forward to any assistance you or anyone else can render on this issue. Thanks.
4. Going back to bug #14252, it is structurally very similar. I forget the program at issue there, but at first it reported success and that was followed by win error 5, and in fact the program had not installed correctly. The difference is that #14252 involved
Re: Is anyone else unable to log into the bug tracker?
On Thursday, February 26, 2015 at 10:49:19 AM UTC-6, Skip Montanaro wrote: > I have not had problems, but I use the Google login (Open ID, I presume) > option. > > > Skip Ok, I got it. In short, capitalization (or not) matters. Thanks to all. -- https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On Fri, Feb 27, 2015 at 4:59 AM, Rustom Mody wrote: > On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote: >> I think that this part of your post is more 'unprofessional' than the >> character blocks. It is very jarring and seems contrary to your main point. > > Ok I need a word for > 1. I have no need for this > 2. 99.9% of the (living) on this planet also have no need for this So what, seven million people need it? Sounds pretty useful to me. And your figure is an exaggeration; a lot more people than that use emoji/emoticons. >> > Also, how does assigning meanings to codepoints "waste storage"? As >> > soon as Unicode 2.0 hit and 16-bit code units stopped being >> > sufficient, everyone needed to allocate storage - either 32 bits per >> > character, or some other system - and the fact that some codepoints >> > were unassigned had absolutely no impact on that. This is decidedly >> > NOT unprofessional, and it's not wasteful either. >> >> I agree. > > I clearly am more enthusiastic than knowledgeable about unicode. > But I know my basic CS well enough (as I am sure you and Chris also do) > > So I dont get how 4 bytes is not more expensive than 2. > Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits > You could use a clever representation like UTF-8 or FSR. > But I dont see how you can get out of this that full-unicode costs more than > exclusive BMP. Sure, UCS-2 is cheaper than the current Unicode spec. But Unicode 2.0 was when that changed, and the change was because 65536 characters clearly wouldn't be enough - and that was due to the number of characters needed for other things than those you're complaining about. Every spec since then has not changed anything that affects storage. There are still, today, quite a lot of unallocated blocks of characters (we're really using only about two planes' worth so far, maybe three), but even if Unicode specified just two planes of 64K characters each, you wouldn't be able to save much on transmission (UTF-8 is already flexible and uses only what you need; if a future Unicode spec allows 64K planes, UTF-8 transmission will cost exactly the same for all existing characters), and on an eight-bit-byte system, the very best you'll be able to do is three bytes - which you can do today, too; you already know 21 bits will do. So since the BMP was proven insufficient (back in 1996), no subsequent changes have had any costs in storage. > Still if I were to expand on the criticisms here are some examples: > > Math-Greek: Consider the math-alpha block > http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block > > Now imagine a beginning student not getting the difference between font, > glyph, > character. To me this block represents this same error cast into concrete and > dignified by the (supposed) authority of the unicode consortium. > > There are probably dozens of other such stupidities like distinguishing > kelvin K from latin K as if that is the business of the unicode consortium A lot of these kinds of characters come from a need to unambiguously transliterate text stored in other encodings. I don't personally profess to understand the reasoning behind the various indistinguishable characters, but I'm aware that there are a lot of tricky questions to be decided; and if once the Consortium decides to allocate a character, that character must remain forever allocated. 
> My real reservations about unicode come from their work in areas that I > happen to know something about > > Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ > is perhaps ok > However all this stuff http://xahlee.info/comp/unicode_music_symbols.html > makes no sense (to me) given that music (ie standard western music written in > staff notation) is inherently 2 dimensional -- multi-voiced, multi-staff, > chordal The placement on the page is up to the display library. You can produce a PDF that places the note symbols at their correct positions, and requires no images to render sheet music. > Sanskrit/Devanagari: > Consists of bogus letters that dont exist in devanagari > The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf > But not here http://en.wikipedia.org/wiki/Devanagari#Vowels > So I call it bogus-devanagari > > Contrariwise an important letter in vedic pronunciation the double-udatta is > missing > http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html > > All of which adds up to the impression that the unicode consortium > occasionally fails to do due diligence Which proves that they're not perfect. Don't forget, they can always add more characters later. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
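The storage-cost question upthread can be checked directly from Python (standard Python 3; the Deseret character is one of the very blocks under discussion):

# Bytes needed per character in the common Unicode encodings:
for ch in ('A', '\u00e9', '\u20ac', '\U00010437'):   # ASCII, Latin-1, BMP, Deseret
    print(repr(ch),
          len(ch.encode('utf-8')),
          len(ch.encode('utf-16-le')),
          len(ch.encode('utf-32-le')))
# 'A' 1 2 4
# 'é' 2 2 4
# '€' 3 2 4
# '𐐷' 4 4 4

UTF-8 charges only for what you actually use: ASCII text stays at one byte per character whether or not the supplementary planes exist, which is the point about allocation having no storage cost.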
Re: Parallelization of Python on GPU?
On Thu, Feb 26, 2015 at 4:10 PM, Sturla Molden wrote:
> On 26/02/15 18:48, Jason Swails wrote:
>> On Thu, 2015-02-26 at 16:53 +0000, Sturla Molden wrote:
>>> GPU computing is great if you have the following:
>>>
>>> 1. Your data structures are arrays of floating point numbers.
>>
>> It actually works equally great, if not better, for integers.
>
> Right, but not complicated data structures with a lot of references or
> pointers. It requires data to be laid out in regular arrays, and then it
> acts on these arrays in a data-parallel manner. It is designed to process
> vertices in parallel for computer graphics, and that is a limitation which
> is always there. It is not a CPU with 1024 cores. It is a "floating point
> monster" which can process 1024 vectors in parallel. You write a tiny
> kernel in a C-like language (CUDA, OpenCL) to process one vector, and then
> it will apply the kernel to all the vectors in an array of vectors. It is
> very comparable to how GLSL and Direct3D vertex and fragment shaders work.
> (The reason for which should be obvious.) The GPU is actually great for a
> lot of things in science, but it is not a CPU. The biggest mistake in the
> GPGPU hype is the idea that the GPU will behave like a CPU with many cores.

Very well summarized. At least in my field, though, it is well-known that GPUs are not 'uber-fast CPUs'. Algorithms have been redesigned, programs rewritten to take advantage of their architecture. It has been a *massive* investment of time and resources, but (unlike the Xeon Phi coprocessor [1]) has reaped most of its promised rewards.

--Jason

[1] I couldn't resist the jab. The Xeon Phi costs several times as much as a top-of-the-line NVidia gaming card, yet the GPU is about 15-20x faster...
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
Chris Angelico wrote: > Unicode > isn't about taking everyone's separate character sets and numbering > them all so we can reference characters from anywhere; if you wanted > that, you'd be much better off with something that lets you specify a > code page in 16 bits and a character in 8, which is roughly the same > size as Unicode anyway. Well, except for the approximately 25% of people in the world whose native language has more than 256 characters. It sounds like you are referring to some sort of "shift code" system. Some legacy East Asian encodings use a similar scheme, and depending on how they are implemented they have great disadvantages. For example, Shift-JIS suffers from a number of weaknesses including that a single byte corrupted in transmission can cause large swaths of the following text to be corrupted. With Unicode, a single corrupted byte can only corrupt a single code point. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On Fri, Feb 27, 2015 at 10:09 AM, Steven D'Aprano wrote: > Chris Angelico wrote: > >> Unicode >> isn't about taking everyone's separate character sets and numbering >> them all so we can reference characters from anywhere; if you wanted >> that, you'd be much better off with something that lets you specify a >> code page in 16 bits and a character in 8, which is roughly the same >> size as Unicode anyway. > > Well, except for the approximately 25% of people in the world whose native > language has more than 256 characters. You could always allocate multiple code pages to one language. But since I'm not advocating this system, I'm only guessing at solutions to its problems. > It sounds like you are referring to some sort of "shift code" system. Some > legacy East Asian encodings use a similar scheme, and depending on how they > are implemented they have great disadvantages. For example, Shift-JIS > suffers from a number of weaknesses including that a single byte corrupted > in transmission can cause large swaths of the following text to be > corrupted. With Unicode, a single corrupted byte can only corrupt a single > code point. That's exactly what I was hinting at. There are plenty of systems like that, and they are badly flawed compared to a simple universal system for a number of reasons. One is the corruption issue you mention; another is that a simple memory-based text search becomes utterly useless (to locate text in a document, you'd need to do a whole lot of stateful parsing - not to mention the difficulties of doing "similar-to" searches across languages); concatenation of text also becomes a stateful operation, and so do all sorts of other simple manipulations. Unicode may demand a bit more storage in certain circumstances (where an eight-bit encoding might have handled your entire document), but it's so much easier for the general case. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
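The corruption-locality property is a few lines to demonstrate (a sketch; the exact replacement output depends on the error handler and which byte is hit):

# Corrupt one byte of UTF-8 text: the damage stays local to one code point.
text = 'héllo wörld'
data = bytearray(text.encode('utf-8'))
data[1] ^= 0xFF                      # flip every bit of a single byte
print(bytes(data).decode('utf-8', errors='replace'))
# something like 'h<\ufffdllo wörld' -- everything after the damaged
# character still decodes correctly, unlike a shift-code encoding where
# the rest of the stream can be thrown out of sync

UTF-8's self-synchronizing design (lead bytes and continuation bytes occupy disjoint ranges) is what confines the damage.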
Re: Design thought for callbacks
On Wed, Feb 25, 2015 at 9:46 AM, Cem Karan wrote:
>
> On Feb 24, 2015, at 8:23 AM, Fabio Zadrozny wrote:
>
> > Hi Cem,
> >
> > I didn't read the whole long thread, but I thought I'd point you to what
> > I'm using in PyVmMonitor (http://www.pyvmmonitor.com/) -- which may
> > already cover your use-case.
> >
> > Take a look at the callback.py at
> > https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py
> >
> > And its related test (where you can see how to use it):
> > https://github.com/fabioz/pyvmmonitor-core/blob/master/_pyvmmonitor_core_tests/test_callback.py
> > (note that it falls back to a strong reference on simple functions --
> > i.e.: usually top-level methods or methods created inside a scope -- but
> > otherwise uses weak references).
>
> That looks like a better version of what I was thinking about originally.
> However, various people on the list have convinced me to stick with strong
> references everywhere. I'm working out a possible API right now; once I
> have some code that I can use to illustrate what I'm thinking to everyone,
> I'll post it to the list.
>
> Thank you for showing me your code though, it is clever!
>
> Thanks,
> Cem Karan

Hi Cem,

Well, I decided to elaborate a bit on the use-case I have and how I use it (at a higher level):

http://pydev.blogspot.com.br/2015/02/design-for-client-side-applications-in.html

So, you can see if it may be worth it for you or not (I agree that sometimes you should keep strong references, but for my use-cases, weak references usually work better -- with the only exception being closures, which are handled differently anyway, but with the gotcha of having to manually unregister them).

Best Regards,
Fabio
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: ANN: Wing IDE 5.1.2 released
On Fri, Feb 27, 2015 at 6:42 AM, William Ray Wing wrote: > PS: I’ve found that the Wing e-mail support is VERY responsive. No relation, > just a happy user. You should totally get involved with the project. With your name, everyone would think you started it! ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
Rustom Mody wrote:
> Emoticons (or is it emoji) seems to have some (regional?) takeup?? Dunno…
> In any case I'd like to stay clear of political(izable) questions

Emoji is the term used in Japan, gradually spreading to the rest of the world. Emoticons, I believe, should be restricted to the practice of using ASCII-only digraphs and trigraphs such as :-) (colon, hyphen, right-parens) to indicate "smileys".

I believe that emoji will eventually lead to Unicode's victory. People will want smileys and piles of poo on their mobile phones, and from there it will gradually spread to everywhere. All they need to do to make victory inevitable is add cartoon genitals...

>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main
>> point.
>
> Ok, I need a word for:
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this

0.1% of the living is seven million people. I'll tell you what, you tell me which seven million people should be relegated to second-class status, and I'll tell them where you live. :-)

[...]
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I don't get how 4 bytes is not more expensive than 2.

Obviously it is. But it's only twice as expensive, and in computer science terms that counts as "close enough". It's quite common for data structures to "waste" space by using "no more than twice as much space as needed", e.g. Python dicts and lists.

The whole Unicode range U+0000 to U+10FFFF needs only 21 bits, which fits into three bytes. Nevertheless, there's no three-byte UTF encoding, because on modern hardware it is more efficient to "waste" an entire extra byte per code point and deal with an even multiple of bytes.

> Yeah, I know you can squeeze a unicode char into 3 bytes or even 21 bits.
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more
> than exclusive BMP.

Are you missing a word there? Costs "no more" perhaps?

> e.g. consider the case of 32 vs 64 bit executables.
> The 64-bit executable is generally larger than the 32-bit one.
> Now consider the case of a machine that has, say, 2GB RAM and a 64-bit
> processor. You could -- I think -- make a reasonable case that all those
> all-zero hi-address-words are 'waste'.

Sure. The whole point of 64-bit processors is to enable the use of more than 2GB of RAM. One might as well say that using 32-bit processors is wasteful if you only have 64K of memory. Yes it is, but the only things which use 16-bit or 8-bit processors these days are embedded devices.

[...]
> Math-Greek: Consider the math-alpha block
> http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font,
> glyph, character. To me this block represents this same error cast into
> concrete and dignified by the (supposed) authority of the unicode
> consortium.

Not being privy to the internal deliberations of the Consortium, it is sometimes difficult to tell why two symbols are sometimes declared to be mere different glyphs for the same character, and other times declared to be worthy of being separate characters. E.g. I think we should all agree that the English "A" and the French "A" shouldn't count as separate characters, although the Greek "Α" and Russian "А" do.
In the case of the maths symbols, it isn't obvious to me what the deciding factors were. I know that one of the considerations is whether or not users of the symbols have a tradition of treating them as mere different glyphs, i.e. stylistic variations. In this case, I'm pretty sure that mathematicians would *not* consider:

U+2115 DOUBLE-STRUCK CAPITAL N "ℕ"
U+004E LATIN CAPITAL LETTER N "N"

as mere stylistic variations. If you defined a matrix called ℕ, you would probably be told off for using the wrong symbol, not for using the wrong formatting. On the other hand, I'm not so sure about U+210E PLANCK CONSTANT "ℎ" versus a mere lowercase h (possibly in italic).

> There are probably dozens of other such stupidities like distinguishing
> kelvin K from latin K as if that is the business of the unicode consortium

But it *is* the business of the Unicode consortium. They have at least two important aims:

- to be able to represent every possible human-language character;
- to allow lossless round-trip conversion to all existing legacy encodings (for the subset of Unicode handled by that encoding).

The second reason is why Unicode includes code points for degree-Celsius and degree-Fahrenheit, rather than just using °C and °F like sane people. Because some idiot^W code-page designer back in the 1980s or 90s decided to add single character ℃ and ℉. So now Unicode has to carry single-character versions too, so that text using them can round-trip.
Re: Newbie question about text encoding
On 02/26/2015 08:05 PM, Steven D'Aprano wrote:
> Rustom Mody wrote:
>> eg consider the case of 32 vs 64 bit executables.
>> The 64 bit executable is generally larger than the 32 bit one
>> Now consider the case of a machine that has say 2GB RAM and a 64-bit
>> processor. You could -- I think -- make a reasonable case that all those
>> all-zero hi-address-words are 'waste'.
>
> Sure. The whole point of 64-bit processors is to enable the use of more
> than 2GB of RAM. One might as well say that using 32-bit processors is
> wasteful if you only have 64K of memory. Yes it is, but the only things
> which use 16-bit or 8-bit processors these days are embedded devices.

But the 2 gig means electrical address lines out of the CPU are wasted, not address space. A 64-bit processor and 64-bit OS mean you can have more than 4 gig in a process space, even if over half of it has to be in the swap file. Linear versus physical makes a big difference.

(Although I believe Seymour Cray was quoted as saying that virtual memory is a crock, because "you can't fake what you ain't got.")

-- DaveA -- https://mail.python.org/mailman/listinfo/python-list
Picking apart a text line
So... okay. I've got a bunch of PDFs of tournament reports that I want to sift thru for information. Ended up using 'pdftotext -layout file.pdf file.txt' to extract the text from the PDF. Still have a few little glitches to iron out there, but I'm getting decent enough results for the moment to move on.

I've got my script to where it opens the file, ignores the header lines at the top, then goes through the rest of the file line by line, skipping lines if they don't match (don't need the separator lines) and adding them to a list if they do (and stripping whitespace off the right side along the way). So far, so good.

    # rstatPDF2csv.py
    import sys
    import re

    def convert(path):
        lines = []
        with open(path) as data:
            # Skip the first n lines of headers
            for i in range(9):
                next(data)
            # Read the remaining lines one at a time
            for line in data:
                # If the line begins with a capital letter...
                if re.match(r'^[A-Z]', line):
                    # ...strip trailing whitespace and add it to the list
                    lines.append(line.rstrip())
        return lines

    if __name__ == '__main__':
        print(convert(sys.argv[1]))

What I'm ending up with is a list full of strings that look something like this:

['JOHN DOEC T HM 445-20*MW* 199-11*MW* 194-5 1HM 393-16*MW* 198-9 1HM198-11*MW*396-20*MW* 789-36*MW* 1234-56 *MW*',

Basically... a certain number of characters allotted for competitor name, then four or five 1-2 char columns for things like classification, age group, special categories, etc., then a score ('445-20'), then up to 4 char for award (if any), then another score, another award, etc. etc. etc.

Right now (in the PDF) the scores are batched by one criterion, then sorted within those groups. Makes life easier for the person giving out awards at the end of the tournament, not so much for someone trying to see how their individual score ranks against the whole field, not just their group or sub-group. I want to be able to pull all the scores out and then re-sort based on score - mainly the final aggregate score, but potentially also on stage or daily scores. Eventually I'd like to be able to calculate standardized z-scores so as to be able to compare scores from one event/location against another.

So back to the lines of text I have stored as strings in a list. I think I want to convert that to a list of lists, i.e. split each line up, store that info in another list and ditch the whitespace. Or would I be better off using dicts?

Originally I was thinking of how to process each line and split them up based on what information was where - some sort of nested for/if mess. Now I'm starting to think that the lines of text are pretty uniform in structure, i.e. the same field is always in the same location, and that list slicing might be the way to go, if a bit tedious to set up initially...?

Any thoughts or suggestions from people who've gone down this particular path would be greatly appreciated. I think I have a general idea/direction, but I'm open to other ideas if the path I'm on is just blatantly wrong.

Thanks, Monte

-- Shiny! Let's be bad guys. Reach me @ memilanuk (at) gmail dot com -- https://mail.python.org/mailman/listinfo/python-list
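A hedged sketch of the slicing idea mooted at the end there. The column offsets below are invented for illustration; the real ones would have to be measured from the actual pdftotext output.

    def parse_line(line):
        # All offsets here are hypothetical -- measure them from a real report.
        name = line[0:20].rstrip()
        categories = line[20:30].split()    # classification, age group, etc.
        total, _, x_count = line[30:37].strip().partition("-")
        return name, categories, int(total), int(x_count)

    # Example with a made-up fixed-width row:
    row = "JOHN DOE".ljust(20) + "C T HM".ljust(10) + "445-20"
    print(parse_line(row))   # ('JOHN DOE', ['C', 'T', 'HM'], 445, 20)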
Re: Are threads bad? - was: Future of Pypy?
Ryan Stuart writes:
> My point is malloc, something further up (down?) the stack, is making
> modifications to shared state when threads are involved. Modifying
> shared state makes it infinitely more difficult to reason about the
> correctness of your software.

If you're saying the libc malloc might have bugs that affect multithreaded apps but not single-threaded ones, then sure, but the Linux kernel might also have such bugs and it's inherently multithreaded, so there's no escape. Even if your app is single-threaded you're still susceptible to threading bugs in the kernel. If malloc works properly then it's thread-safe and you can use it without worrying about how your app's state interacts with malloc's internal state.

> We clearly got completely different things from the article. My
> interpretation was that it was making *the exact opposite* point to
> what you stated mainly because non-threading approaches don't share
> state.

It gave the example of asyncio, which is non-threaded but (according to the article) was susceptible to shared-state bugs because you could accidentally insert yield points in critical sections, by doing things like logging.

> It states that quite clearly. For example "it is – literally –
> exponentially more difficult to reason about a routine that may be
> executed from an arbitrary number of threads concurrently".

I didn't understand what it was getting at with that n**n claim. Of course arbitrary code (even single-threaded) is incalculably difficult to reason about (halting problem, Rice's theorem). But we're talking about code following a particular set of conventions, not arbitrary code. The conventions are supposed to facilitate reasoning and verification. Again, there's tons of solid theory in the OS literature about this stuff.

> by default Haskell looks to use lightweight threads where only 1
> thread can be executing at a time [1]... That doesn't seem to be
> shared state multithreading, which is what the article is referring to.

Haskell uses lightweight, shared-state threads with synchronization primitives that do the usual things (the API is somewhat different from POSIX threads though). You have to use the +RTS command line option to run on multiple cores: I don't know why the default is to stay on a single core. There might be a performance hit if you use the multicore runtime with a single-threaded program, or something like that. There is a book about Haskell concurrency and parallelism that I've been wanting to read (full text online): http://chimera.labs.oreilly.com/books/123000929/index.html

>> 2) it has a weird story about the brass cockroach, that basically
>> signified that they didn't have a robust enough testing system to
>> be able to reproduce the bug.
>
> The point was that it wasn't feasible to have a robust testing suite
> because, you guessed it,

No really, they observed this bug happening repeatedly under what sounded like fairly light load with real users. So a stress-testing framework should have been able to reproduce it. Do you really think it's impossible to debug this kind of problem? OS developers do it all the time. There is no getting around it.

> This is probably correct. Are there any STM implementations out there
> that don't significantly compromise performance?

STM is fast as long as there's not much contention for shared data between threads. In the "account balance" example that should almost always be the case. The slowdown is when multiple threads are fighting over the same data and transactions keep having to be rolled back and restarted.
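As an aside, the asyncio hazard mentioned above is easy to sketch. A minimal, self-contained example (using today's async/await spelling of coroutines; audit_log is a hypothetical stand-in for any logging call that awaits):

    import asyncio

    async def audit_log(amount):
        # Stands in for real I/O; the point is only that it yields control.
        await asyncio.sleep(0)

    async def withdraw(account, amount):
        if account["balance"] >= amount:     # check...
            await audit_log(amount)          # ...a yield point sneaks in...
            account["balance"] -= amount     # ...then update: a race window

    async def main():
        account = {"balance": 100}
        await asyncio.gather(withdraw(account, 100),
                             withdraw(account, 100))
        print(account["balance"])            # -100: both passed the check

    asyncio.run(main())

No threads anywhere, yet the "atomic" check-then-update is broken exactly as the article warns, because the await lets another task run between the check and the update.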
> multiprocessing module looks pretty nice and I should try it
> Its one real advantage is that it side-steps the GIL. So, if you need
> to utilise multiple cores for CPU bound tasks, then it might well be
> the only option.

Its one real advantage compared to what? I thought you were saying it avoids the shared-data hazards of threads. The 4 alternatives in that article were threads, multiprocessing, old-fashioned async (callback hell), and asyncio (still contorted and relies on Python 3 coroutines). If you eliminate threads because of data sharing and asyncio because you need Python 2 compatibility, you're left with multiprocessing if you want to avoid the control inversion of callback style.

It's true though, this started out about the GIL in PyPy (was Laura going to post about that?) so using multicores is indeed maybe relevant.

-- https://mail.python.org/mailman/listinfo/python-list
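A minimal sketch of that multiprocessing use case: CPU-bound work fanned out across cores with no shared state, each worker in its own interpreter (and hence with its own GIL). The workload here is made up purely for illustration.

    from multiprocessing import Pool

    def count_primes(n):
        # Deliberately naive CPU-bound work (trial division).
        return sum(all(i % d for d in range(2, i)) for i in range(2, n))

    if __name__ == "__main__":
        # Four independent worker processes; arguments and results
        # cross the process boundary by pickling, not by sharing.
        with Pool(processes=4) as pool:
            print(pool.map(count_primes, [20000] * 4))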
Re: Newbie question about text encoding
Dave Angel wrote:
> (Although I believe Seymour Cray was quoted as saying that virtual
> memory is a crock, because "you can't fake what you ain't got.")

If I recall correctly, disk access is about 10,000 times slower than RAM, so virtual memory is *at least* that much slower than real memory.

-- Steven -- https://mail.python.org/mailman/listinfo/python-list
Python in The Economist
Hi all

From a recent article in The Economist -

"A recovering economy in America and an explosion of entrepreneurial activity are driving up demand for tech talent. [...] Bidding battles are breaking out, with salaries and bonuses rising fast for experts in popular computer languages such as Python and Ruby on Rails."

The author seems to have obtained his information from "a recent dinner party in Silicon Valley", so it may not be very representative. But to be mentioned in such a high-profile newspaper with its international readership can only be good for Python.

Here is a link to the full article -

http://www.economist.com/news/business/21644150-battle-software-talent-other-industries-can-learn-silicon-valley-how-bag

Frank Millman

-- https://mail.python.org/mailman/listinfo/python-list
Re: Newbie question about text encoding
On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
> Dave Angel wrote:
>> (Although I believe Seymour Cray was quoted as saying that virtual
>> memory is a crock, because "you can't fake what you ain't got.")
>
> If I recall correctly, disk access is about 10,000 times slower than
> RAM, so virtual memory is *at least* that much slower than real memory.

It's so much more complicated than that, that I hardly know where to start. I'll describe a generic processor/OS/memory/disk architecture; there will be huge differences between processor models even from a single manufacturer.

First, as soon as you add swapping logic to your processor/memory-system, you theoretically slow it down. And in the days of that quote, Cray's memory was maybe 50 times as fast as the memory used by us mortals. So adding swapping logic would have slowed it down quite substantially, even when it was not swapping. But that logic is inside the CPU chip these days, and presumably thoroughly optimized.

Next, statistically, a program uses a small subset of its total program & data space in its working set, and the working set should reside in real memory. But when the program greatly increases that working set, and it approaches the amount of physical memory, then swapping becomes more frenzied, and we say the program is thrashing. Simple example: try sorting an array that's about the size of available physical memory.

Next, even physical memory is divided into a few levels of caching, some on-chip and some off. And the caching is done in what I call strips, where accessing just one byte causes the whole strip to be loaded from non-cached memory. I forget the current size for that, but it's maybe 64 to 256 bytes or so. If there are multiple processors (not multicore, but actual separate processors), then each one has such internal caches, and any writes on one processor may have to trigger flushes of all the other processors that happen to have the same strip loaded.

The processor not only prefetches the next few instructions, but decodes and tentatively executes them, subject to being discarded if a conditional branch doesn't go the way the processor predicted. So some instructions execute in zero time, some of the time.

Every address of instruction fetch, or of data fetch or store, goes through a couple of layers of translation. Segment register plus offset gives a linear address. Look those up in tables to get the physical address, and if the table happens not to be in the on-chip cache, swap it in. If the physical address isn't valid, a processor exception causes the OS to potentially swap something out, and something else in.

Once we're paging from the swapfile, the size of the read is perhaps 4k. And that read happens regardless of whether we're only going to use one byte or all of it. The ratio between an access which was in the L1 cache and one which required a page to be swapped in from disk? Much bigger than your 10,000 figure. But hopefully it doesn't happen a big percentage of the time.

Many, many other variables, like the fact that RAM chips are not directly addressable by bytes, but instead count on rows and columns. So if you access many bytes in the same row, it can be much quicker than random access. So simple access time specifications don't mean as much as it would seem; the controller has to balance the RAM spec with the various cache requirements.

-- DaveA -- https://mail.python.org/mailman/listinfo/python-list
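The "hopefully it doesn't happen a big percentage of the time" point is worth making concrete. A back-of-envelope sketch, where the 100 ns RAM latency and the 10,000x disk penalty are assumed round numbers, not measurements:

    # Effective access time under a given page-fault rate.
    ram_ns = 100                 # assumed RAM latency
    disk_ns = ram_ns * 10000     # assumed 10,000x penalty for a fault

    for miss_rate in (0.0, 0.0001, 0.001, 0.01):
        effective = (1 - miss_rate) * ram_ns + miss_rate * disk_ns
        print("fault rate %6.2f%%: %8.0f ns effective" %
              (miss_rate * 100, effective))

Even a 0.01% fault rate roughly doubles the effective access time, and 1% makes memory behave a hundred times slower, which is why thrashing feels like falling off a cliff.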
Re: Picking apart a text line
On 02/26/2015 10:53 PM, memilanuk wrote:
> So... okay. I've got a bunch of PDFs of tournament reports that I want
> to sift thru for information. [snip]
> Any thoughts or suggestions from people who've gone down this
> particular path would be greatly appreciated. I think I have a general
> idea/direction, but I'm open to other ideas if the path I'm on is just
> blatantly wrong.

Maintaining a list of lists is a big pain. If the data is truly very uniform, you might want to do it, but I'd find it much more reasonable to have names for the fields of each line. You can either do that with a named tuple, or with instances of a custom class of your own. See https://docs.python.org/3.4/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields

You read a line, do some sanity checking on it, and construct an object. Go to the next line, do the same, another object.
Those objects are stored in a list. Everything else accesses the fields of the object, something like:

    for row in mylist:
        print(row.name, row.classification, row.age)
        if row.name == "Doe":
            ...

-- DaveA -- https://mail.python.org/mailman/listinfo/python-list
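A minimal sketch combining that advice with the fixed-width slicing idea from the original post. The field names and column offsets here are made up; the real report layout decides both.

    from collections import namedtuple

    Result = namedtuple("Result",
                        "name classification age_group score x_count")

    def parse_line(line):
        # Offsets are hypothetical placeholders for the real columns.
        total, _, x = line[30:37].strip().partition("-")
        return Result(name=line[0:20].rstrip(),
                      classification=line[20:23].strip(),
                      age_group=line[23:26].strip(),
                      score=int(total),
                      x_count=int(x))

    # Given 'lines' as returned by convert() in the original script,
    # re-sorting the whole field by aggregate score is then one line:
    # rows = [parse_line(line) for line in lines]
    # rows.sort(key=lambda r: (r.score, r.x_count), reverse=True)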