"A Fundamental Turn Toward Concurrency in Software"
Hello! Just went through an article via Slashdot titled "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" [http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the continuous CPU performance gains we've seen are finally over, and that future gains will primarily come from software concurrency taking advantage of hyperthreading and multicore architectures. Perhaps something the Python interpreter team can ponder.
-- http://mail.python.org/mailman/listinfo/python-list
Re: "A Fundamental Turn Toward Concurrency in Software"
Of course there are many performance bottlenecks: CPU, memory, I/O, network, all the way up to the software design and implementation. As a software guy myself I would say that, by far, better software design leads to the greatest performance gains. But that doesn't mean hardware engineers can sit back and declare this "software's problem". Even if we are not writing CPU-intensive applications, we will certainly welcome the "free performance gain" coming from a faster CPU or a more optimized compiler.

I think this is significant because it might signify a paradigm shift. This might well be hype, but let's just assume this is the future direction of CPU design. Then we might as well start experimenting now. To throw out some random ideas: parallel execution at the statement level, looking up symbols and attributes predictively, parallelized hash functions, dictionary lookup, sorting, list comprehensions, background just-in-time compilation, etc., etc.

One of the author's points is that many of today's mainstream technologies (like OO) did not come about suddenly but accumulated years of research before becoming widely used. A lot of these ideas may not work or may not seem to matter much today. But in 10 years we might be really glad that we tried.

aurora <[EMAIL PROTECTED]> writes:
> Just went through an article via Slashdot titled "The Free Lunch Is Over: A
> Fundamental Turn Toward Concurrency in Software"
> [http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
> continuous CPU performance gain we've seen is finally over. And that future
> gain would primarily be in the area of software concurrency taking
> advantage of hyperthreading and multicore architectures.

Well, another gain could be had in making the software less wasteful of CPU cycles. I'm a pretty experienced programmer by most people's standards but I see a lot of systems where I can't for the life of me figure out how they manage to be so slow.
-- http://mail.python.org/mailman/listinfo/python-list
It might be caused by environmental pollutants emanating from Redmond. -- http://mail.python.org/mailman/listinfo/python-list
list unpack trick?
I find that I use this list unpacking construct very often:

    name, value = s.split('=', 1)

So that 'a=1' unpacks as name='a' and value='1', and 'a=b=c' unpacks as name='a' and value='b=c'. The only issue is when s does not contain the character '=', say it is 'xyz': the result list has a len of 1 and the unpacking fails. Is there some really handy trick to pad the result list to len of 2 so that it unpacks as name='xyz' and value=''? More generally, is there an easy way to pad a list to length n with filler items appended at the end?
-- http://mail.python.org/mailman/listinfo/python-list
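One inline possibility is sketched below (later Python versions also grew str.partition for exactly this two-way split); parse_pair is just a hypothetical name for illustration:

```python
def parse_pair(s):
    # Pad the split result with a filler before slicing back to length 2,
    # so the unpacking never fails even when '=' is absent.
    name, value = (s.split('=', 1) + [''])[:2]
    return name, value

parse_pair('a=1')    # ('a', '1')
parse_pair('a=b=c')  # ('a', 'b=c')
parse_pair('xyz')    # ('xyz', '')
```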
Re: list unpack trick?
Thanks. I'm just trying to see if there is some concise syntax available without getting into obscurity. For my purpose Siegmund's suggestion works quite well. The few forms you have suggested work, but as they refer to the list multiple times, they need a separate assignment statement like

    list = s.split('=', 1)

I am thinking more along the lines of string.ljust(). If we had a list.ljust(length, filler), we could do something like

    name, value = s.split('=', 1).ljust(2, '')

I can always break it down into multiple lines. The good thing about list unpacking is that it's a really compact and obvious syntax.

On Sat, 22 Jan 2005 08:34:27 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>> ... So more generally, is there an easy way to pad a list into length of n
>> with filler items appended at the end?
>
> some variants (with varying semantics):
>
>     list = (list + n*[item])[:n]
>
> or
>
>     list += (n - len(list)) * [item]
>
> or (readable):
>
>     if len(list) < n:
>         list.extend((n - len(list)) * [item])
>
> etc.
-- http://mail.python.org/mailman/listinfo/python-list
Re: list unpack trick?
On Sat, 22 Jan 2005 10:03:27 -0800, aurora <[EMAIL PROTECTED]> wrote:
> I am thinking more along the lines of string.ljust(). If we had a
> list.ljust(length, filler), we could do something like
>
>     name, value = s.split('=', 1).ljust(2, '')
>
> I can always break it down into multiple lines. The good thing about list
> unpacking is that it's a really compact and obvious syntax.

Just to clarify, the ljust() is a feature wish; it probably should be named something like pad(). Also there is another thread from a few hours before this asking about essentially the same thing: "default value in a list" http://groups-beta.google.com/group/comp.lang.python/browse_frm/thread/f3affefdb4272270
-- http://mail.python.org/mailman/listinfo/python-list
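The wished-for pad() can be sketched as a free function (pad is a hypothetical name; note this version leaves lists already at length n or longer untouched):

```python
def pad(lst, n, filler=''):
    # Extend lst with filler items until it reaches length n.
    # A negative multiplier produces [], so longer lists pass through unchanged.
    return lst + [filler] * (n - len(lst))

name, value = pad('xyz'.split('=', 1), 2)  # name='xyz', value=''
```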
Re: limited python virtual machine (WAS: Another scripting language implemented into Python itself?)
Is it really necessary to build a VM from the ground up that includes OS abilities? What about JavaScript?

On Wed, Jan 26, 2005 at 05:18:59PM +0100, Alexander Schremmer wrote:
> On Tue, 25 Jan 2005 22:08:01 +0100, I wrote:
>> sys.safecall(func, maxcycles=1000) could enter the safe mode and call the
>> func.
>
> This might be even enhanced like this:
>
>     >>> import sys
>     >>> sys.safecall(func, maxcycles=1000,
>     ...     allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
>     ...     allowed_modules=['_sre'])
>
> Any comments about this from someone who already hacked CPython?

Yes, this comes up every couple of months and there is only one answer: this is the job of the OS. Java largely succeeds at doing sandboxy things because it was written that way from the ground up (to behave both like a program interpreter and an OS). Python the language was not, and the CPython interpreter definitely was not. Search groups.google.com for previous discussions of this on c.l.py.

-Jack
-- http://mail.python.org/mailman/listinfo/python-list
Re: Transparent (redirecting) proxy with BaseHTTPServer
If you actually want the IP, resolving the Host header would give you that. In the redirect case you should get a Host header like

    Host: www.python.org

From that you can reconstruct the original URL as http://www.python.org/ftp/python/contrib/. With that you can open it using urllib and proxy the data to the client.

The second form of HTTP request, without the host part, is for compatibility with the pre-HTTP/1.1 standard. All modern web browsers should send the Host header.

> Hi list,
>
> My ultimate goal is to have a small HTTP proxy which is able to show a
> message specific to a client's name/ip/status, then handle the original
> request normally, either by redirecting the client or acting as a proxy. I
> started with a modified[1] version of TinyHTTPProxy posted by Suzuki Hisao
> somewhere in 2003 to this list and tried to extend it to my needs. It works
> quite well if I configure my client to use it, but using the iptables
> REDIRECT feature to point the clients transparently to the proxy caused
> some issues. Precisely, the "self.path" member variable of
> BaseHTTPRequestHandler is missing the scheme and the host (i.e.
> www.python.org) part of the request line for REDIRECTed connections:
>
> without iptables REDIRECT:
>     self.path -> GET http://www.python.org/ftp/python/contrib/ HTTP/1.1
> with REDIRECT:
>     self.path -> GET /ftp/python/contrib/ HTTP/1.1
>
> I asked about this on the squid mailing list and was told this is normal
> and I have to reconstruct the request line from the real destination IP,
> the URL-path and the Host header (if any). If the Host header is sent it's
> an (unsafe) no-brainer, but I cannot for the life of me figure out where to
> get the "real destination IP". Any ideas?
>
> thanks
> Paul
>
> [1] HTTP Debugging Proxy Modified by Xavier Defrang (http://defrang.com/)
-- http://mail.python.org/mailman/listinfo/python-list
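The Host-header reconstruction can be sketched like this (rebuild_url is a hypothetical helper; headers behaves like BaseHTTPRequestHandler's self.headers mapping):

```python
def rebuild_url(path, headers):
    # Rebuild the absolute URL a transparently-redirected client asked for.
    # 'path' is the origin-form request path, e.g. '/ftp/python/contrib/'.
    host = headers.get('Host')
    if host is None:
        return None  # pre-HTTP/1.1 client: nothing to recover from
    return 'http://%s%s' % (host, path)

rebuild_url('/ftp/python/contrib/', {'Host': 'www.python.org'})
# -> 'http://www.python.org/ftp/python/contrib/'
```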
Re: Transparent (redirecting) proxy with BaseHTTPServer
It should be very safe to count on the Host header. Maybe some really, really old browsers would not support it, but they probably won't work on today's WWW anyway. The majority of today's web sites are likely virtually hosted; one Apache server may be hosting 50 web addresses. If a client stripped the host name and did not send the Host header either, the web server wouldn't know what address it is really looking for. If you catch a request that doesn't have a Host header, it's a good idea to redirect it to a browser upgrade page.

> Thanks, aurora ;)
>
> aurora wrote:
>> If you actually want the IP, resolving the Host header would give you
>> that.
>
> I'm only interested in the hostname.
>
>> The second form of HTTP request, without the host part, is for
>> compatibility with the pre-HTTP/1.1 standard. All modern web browsers
>> should send the Host header.
>
> How safe is the assumption that the Host header will be there? Is it part
> of the HTTP/1.1 spec? And does it mean all "pre 1.1" clients will fail?
> Hmm, maybe I should look on the wire at what's really happening...
>
> thanks again
> Paul
-- http://mail.python.org/mailman/listinfo/python-list
Go visit Xah Lee's home page
Let's stop discussing the perl-python nonsense. It is so boring. For a break, just visit Mr. Xah Lee's personal page (http://xahlee.org/PageTwo_dir/Personal_dir/xah.html). You'll find a lot of funny information and quotes from this queer personality. Thankfully no perl-python stuff there.

Don't miss Mr. Xah Lee's recent pictures at http://xahlee.org/PageTwo_dir/Personal_dir/mi_pixra.html. My favorite is the last picture: long-haired Xah Lee sitting contemplatively in the living room. The caption says "my beautiful hair, fails to resolve the problems of humanity. And, it is falling apart by age."
-- http://mail.python.org/mailman/listinfo/python-list
Re: Next step after pychecker
A frequent error I encounter:

    try:
        ...do something...
    except IOError:
        log('encounter an error %s line %d' % filename)

Here in the string interpolation I should supply (filename, lineno). Usually I have a lot of unit testing to catch syntax errors in the main code. But it is very difficult to run into the exception handlers, some of which are added defensively. Unfortunately those untested exception handlers sometimes fail precisely when we need them for diagnostic information. pychecker sometimes gives false alarms here, since the argument of a string interpolation may be a valid tuple. It would be great if we could somehow unit test the exception handlers (without building an extensive library of mock objects).
-- http://mail.python.org/mailman/listinfo/python-list
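One lightweight way to exercise such a handler in a unit test is to inject a failing opener, sketched below (report, read_config and FailingOpener are hypothetical names, not from any real codebase):

```python
def report(filename, lineno):
    # Correct form: the interpolation arguments are wrapped in a tuple.
    return 'encounter an error %s line %d' % (filename, lineno)

def read_config(path, opener=open):
    try:
        return opener(path).read()
    except IOError:
        return report(path, 0)

class FailingOpener:
    # Minimal stand-in that forces the except branch; no mock library needed.
    def __call__(self, path):
        raise IOError('simulated disk failure')

read_config('/etc/app.conf', opener=FailingOpener())
# -> 'encounter an error /etc/app.conf line 0'
```

If report() had the bug in the post (a bare `% filename`), this test would fail with a TypeError at test time instead of in production.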
Re: Printing Filenames with non-Ascii-Characters
On Tue, 01 Feb 2005 20:28:11 +0100, Marian Aldenhövel <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am very new to Python and have run into the following problem. If I do
> something like
>
>     dir = os.listdir(somepath)
>     for d in dir:
>         print d
>
> the program fails for filenames that contain non-ascii characters:
>
>     'ascii' codec can't encode characters in position 33-34
>
> I have noticed that this seems to be a very common problem. I have read a
> lot of postings regarding it but not really found a solution. Is there a
> simple one?

The English Windows command prompt uses the cp437 charset. To print it, use

    print d.encode('cp437')

The issue is that a terminal only understands certain character sets. If you have a unicode string, like d in your case, you have to encode it before it can be printed. (We really need native unicode terminals!!!)

If you don't encode, Python will do it for you. The default encoding is ASCII. Any string that contains a non-ASCII character will give you trouble. In my opinion Python is too conservative in using the 'strict' encoding, which gives users unaware of unicode a lot of woes.

So how did you get a unicode d to start with? If 'somepath' is unicode, os.listdir returns a list of unicode strings. So why is somepath unicode? Either you have entered a unicode literal or it comes from some other source. One possible source is an XML parser, which returns strings in unicode. Windows NT supports unicode filenames. I'm not sure about Linux; the results may differ slightly.

> What I specifically do not understand is why Python wants to interpret the
> string as ASCII at all. Where is this setting hidden? I am running Python
> 2.3.4 on Windows XP and I want to run the program on Debian sarge later.
>
> Ciao, MM
-- http://mail.python.org/mailman/listinfo/python-list
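The encode step can also be made non-fatal by passing an error handler; a small sketch (the 'replace' handler substitutes a placeholder character instead of raising):

```python
d = u'caf\xe9'  # a filename containing a non-ASCII character

# Strict ASCII would raise; 'replace' degrades to garbled-but-printable output.
safe = d.encode('ascii', 'replace')      # b'caf?'

# cp437 does contain common Western-European accented characters,
# so on a cp437 console the round trip is lossless.
console = d.encode('cp437', 'replace')
```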
hotspot profiler experience and accuracy?
I have a parser I need to optimize. It has some disk IO and a lot of looping over characters, so I used the hotspot profiler to gain insight into optimization options. The methods showing up at the top of the list seem fairly trivial and do not look like CPU hoggers. Nevertheless I optimized them and got a 25% performance gain according to hotspot's numbers.

But the numbers look suspect. Hotspot claims 71.166 CPU seconds but the actual elapsed time is only 54s. When measuring elapsed time instead of CPU time, the performance gain is only 13% with the profiler running, and down to 10% when not using the profiler. Is there something I misunderstood in reading the numbers?
-- http://mail.python.org/mailman/listinfo/python-list
Re: hotspot profiler experience and accuracy?
Thanks for pointing me to your analysis. Now I know it wasn't me doing something wrong.

hotspot did lead me to knock down a major performance bottleneck one time. I found that zipfile.ZipFile() basically reads the entire zip file at instantiation time, even though you may only need one file from it subsequently. In any case the number of function calls seems to make sense and it should give some insight into the runtime behaviour. The CPU time is just so misleading.

aurora wrote:
> But the numbers look suspect. Hotspot claims 71.166 CPU seconds but the
> actual elapsed time is only 54s. When measuring elapsed time instead of CPU
> time the performance gain is only 13% with the profiler running and down to
> 10% when not using the profiler. Is there something I misunderstood in
> reading the numbers?

Well, I'm confused too. Look at my post from a few months ago: http://tinyurl.com/6awzj (note that my code contained a few errors and that you need to use the fixed code that I posted a few replies later). Perhaps somebody can explain a bit more about this this time? :-) At the moment, frankly, hotspot seems rather useless.

--Irmen
-- http://mail.python.org/mailman/listinfo/python-list
Re: Printing Filenames with non-Ascii-Characters
>> print d.encode('cp437')
>
> So I would have to specify the encoding on every call to print? I am sure
> to forget, and I don't like the program dying. In my case garbled output
> would be much more acceptable.
>
> Marian

I'm with you. You never know whether you have put enough encode() calls in all the right places, and there is no static type checking to help you. So the short answer is to set a different default in sitecustomize.py. I'm trying to write up something about unicode in Python, once I understand what's going on inside...
-- http://mail.python.org/mailman/listinfo/python-list
Re: OT: why are LAMP sites slow?
Slow compared to what? For a large commercial site with a bigger budget, better infrastructure, better implementation, it is not surprising that they come out ahead compared to hobbyist sites.

Putting implementation aside, is LAMP inherently performing worse than commercial alternatives like IIS, ColdFusion, Sun ONE or DB2? That sounds like your presupposition. I don't know of any numbers to support it. Note that many of the largest sites have open source components in them. Google, Amazon, and Yahoo all run on unix variants. Ebay is the notable exception, which uses IIS. Can you really say ebay is performing better than amazon (or vice versa)?

I think the chief factor in a site performing poorly is the implementation. It is really easy to throw big money into expensive software and hardware and come out with a performance dog. Google's infrastructure relies on a large distributed network of commodity hardware, not a few expensive boxes. LAMP-based infrastructure, if used right, can support the most demanding applications.

> LAMP = Linux/Apache/MySQL/P{ython,erl,HP}. Refers to the general class of
> database-backed web sites built using those components. This being c.l.py,
> if you want, you can limit your interest to the case where the P stands for
> Python.
>
> I notice that lots of the medium-largish sites (from hobbyist BBS's to
> sites like Slashdot, Wikipedia, etc.) built using this approach are
> painfully slow even using seriously powerful server hardware. Yet compared
> to a really large site like Ebay or Hotmail (to say nothing of Google), the
> traffic levels on those sites is just chickenfeed.
>
> I wonder what the webheads here see as the bottlenecks. Is it the
> application code? Disk bandwidth at the database side, that could be cured
> with more ram caches or solid state disks? SQL just inherently slow?
>
> I've only worked on one serious site of this type and it was "SAJO"
> (Solaris Apache Java Oracle) rather than LAMP, but the concepts are the
> same.
> I just feel like something bogus has to be going on. I think even sites
> like Slashdot handle fewer TPS than a 1960's airline reservation system
> that ran on hardware with a fraction of the power of one of today's
> laptops.
>
> How would you go about building such a site? Is LAMP really the right
> approach?
-- http://mail.python.org/mailman/listinfo/python-list
Re: OT: why are LAMP sites slow?
aurora <[EMAIL PROTECTED]> writes:
> Slow compared to what? For a large commercial site with a bigger budget,
> better infrastructure, better implementation, it is not surprising that
> they come out ahead compared to hobbyist sites.

Hmm, as mentioned, I'm not sure what the commercial sites do that's different. I take the view that the free software world is capable of anything that the commercial world is capable of, so I'm not awed just because a site is commercial. And sites like Slashdot have pretty big budgets by hobbyist standards.

> Putting implementation aside, is LAMP inherently performing worse than
> commercial alternatives like IIS, ColdFusion, Sun ONE or DB2? That sounds
> like your presupposition.

I wouldn't say that. I don't think Apache is a bottleneck compared with other web servers. Similarly I don't see an inherent reason for Python (or whatever) to be seriously slower than Java servlets. I have heard that MySQL doesn't handle concurrent updates nearly as well as DB2 or Oracle, or for that matter PostgreSQL, so I wonder if busier LAMP sites might benefit from switching to PostgreSQL (LAMP => LAPP?).

I'm lost. So what do you compare against when you say LAMP is slow? What is the reference point? Is it just a general observation that slashdot is slower than we would like it to be?

If you are talking about slashdot, there are many ideas to make it faster. For example they could send all 600 comments to the client and let the user do the querying with DHTML on the client side. This leaves the server serving mostly static files and will certainly boost the performance tremendously.

If you mean MySQL, or SQL databases in general, are slow, there is truth in it. The best things about a SQL database are concurrent access, transactional semantics and versatile querying. It turns out a lot of applications can really live without them. If you can rearchitect the application to use flat files instead of a database, it can often be a big win. A lot of this is just implementation.
Find the right tool and the right design for the job. I still don't see a case that a LAMP-based solution is inherently slow.
-- http://mail.python.org/mailman/listinfo/python-list
Re: executing VBScript from Python and vice versa
Go to the bookstore and get a copy of Python Programming on Win32 by Mark Hammond and Andy Robinson today. http://www.oreilly.com/catalog/pythonwin32/ It has everything you need.

> Is there a way to make programs written in these two languages communicate
> with each other? I am pretty sure that VBScript can access a Python script
> because Python is COM compliant. On the other hand, Python might be able to
> call a VBScript through WSH. Can somebody provide a simple example? I have
> exactly 4 days of experience in Python (and fortunately, much more in VB6).
> Thanks.
-- http://mail.python.org/mailman/listinfo/python-list
performance of recursive generator
I love generators and I use them a lot. Lately I've been writing some recursive generators to traverse tree structures. After taking a closer look I have some concerns about performance.

Let's take the inorder traversal from http://www.python.org/peps/pep-0255.html as an example:

    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

Consider a 4-level-deep tree that has only a right child at each node:

    1
     \
      2
       \
        3
         \
          4

Using the recursive generator, the flow would go like this:

    main           gen1     gen2     gen3     gen4
    inorder(1..4)
                   yield 1
                            inorder(2..4)
                   yield 2  yield 2
                                     inorder(3..4)
                   yield 3  yield 3  yield 3
                                              inorder(4)
                   yield 4  yield 4  yield 4  yield 4

Note that there are 4 calls to inorder() and 10 yields. Indeed the complexity of traversing this kind of tree would be O(n^2)!

Compare that with a similar recursive function using a callback instead of a generator:

    def inorder(t, foo):
        if t:
            inorder(t.left, foo)
            foo(t.label)
            inorder(t.right, foo)

The flow would go like this:

    main           stack1   stack2   stack3   stack4
    inorder(1..4)
                   foo(1)
                            inorder(2..4)
                            foo(2)
                                     inorder(3..4)
                                     foo(3)
                                              inorder(4)
                                              foo(4)

There will be 4 calls to inorder() and 4 calls to foo(), giving a reasonable O(n) performance.

Is it an inherent issue in the use of recursive generators? Is there any compiler optimization possible?
-- http://mail.python.org/mailman/listinfo/python-list
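The quadratic growth can be confirmed empirically by instrumenting the yields; a sketch (Node, chain and the counter are hypothetical scaffolding, not from the PEP):

```python
class Node:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

yield_count = [0]  # total number of yield statements executed

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield_count[0] += 1
            yield x
        yield_count[0] += 1
        yield t.label
        for x in inorder(t.right):
            yield_count[0] += 1
            yield x

def chain(n):
    # Build the degenerate tree 1 -> 2 -> ... -> n of right children.
    root = None
    for label in range(n, 0, -1):
        root = Node(label, right=root)
    return root

list(inorder(chain(4)))   # [1, 2, 3, 4]
yield_count[0]            # 10 = 4 + 3 + 2 + 1: each value is re-yielded
                          # once per enclosing generator level
```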
Re: performance of recursive generator
On Thu, 11 Aug 2005 01:18:11 -0700, Matt Hammond <[EMAIL PROTECTED]> wrote:
>> Is it an inherent issue in the use of recursive generators? Is there any
>> compiler optimization possible?
>
> Hi, I could be misunderstanding it myself, but I think the short answer
> to your question is that it's an inherent limitation. ...
>
> Perhaps if there existed some kind of syntax to hint this to python it
> could optimise it away, eg:
>
>     yield *inorder(t.left)
>
> ... but AFAIK there isn't :-( so I guess you'll have to avoid recursive
> generators for this app!

That would be unfortunate. I think generators are most elegant for traversing recursive structures; it is non-trivial to use most other methods. But the O(n^2) price tag is a big caveat to keep in mind.

Of course I agree we should not optimize prematurely. I'm not about to rewrite my recursive generators just yet. But O(n^2) complexity is something important to bear in mind. It doesn't necessarily cause problems in practice. But it might.
-- http://mail.python.org/mailman/listinfo/python-list
Re: performance of recursive generator
> You seem to be assuming that a yield statement and a function call are
> equivalent. I'm not sure that's a valid assumption.

I don't know. I was hoping the compiler could optimize away the chain of yields.

> Anyway, here's some data to consider:
>
> test.py
>
>     def gen(n):
>         if n:
>             for i in gen(n/2):
>                 yield i
>             yield n
>             for i in gen(n/2):
>                 yield i
>
>     def gen_wrapper(n):
>         return list(gen(n))
>
>     def nongen(n, func):
>         if n:
>             nongen(n/2, func)
>             func(n)
>             nongen(n/2, func)
>
>     def nongen_wrapper(n):
>         result = []
>         nongen(n, result.append)
>         return result

This test somewhat waters down the n^2 issue. The problem is in the depth of recursion, and in this case it is only log(n). It is probably more interesting to test:

    def gen(n):
        if n:
            yield n
            for i in gen(n-1):
                yield i
-- http://mail.python.org/mailman/listinfo/python-list
Re: Python Binary and Windows
Thanks for making me aware of the difflib module. I didn't know such a cool module existed.

You can make it available to other Windows programs as a COM object. The win32 API should be all you need. It might be slightly challenging because some parameters are lists of strings, which might need a little work to translate into COM parameters.

> Hi. I'd like to compile (?) the DiffLib Python code into a binary form that
> can be called by other Windows apps - like, I'd like to compile it into a
> DLL. Is this possible? Many thanks!
-- http://mail.python.org/mailman/listinfo/python-list
Re: DHTML control from Python?
IE should be able to do that. Install the win32 modules. Then you should simply embed Python using
unicode encoding usablilty problem
I have long found the Python default encoding of strict ASCII frustrating. For one thing I prefer to get garbage characters rather than an exception. But the biggest issue is that a Unicode exception often pops up in unexpected places, and only when a non-ASCII or unicode character first finds its way into the system.

Below is an example. The program may run fine at the beginning. But as soon as a unicode character u'b' is introduced, the program blows up unexpectedly:

    >>> sys.getdefaultencoding()
    'ascii'
    >>> a = '\xe5'      # can print, you think you're ok ...
    >>> print a
    å
    >>> b = u'b'
    >>> a == b
    Traceback (most recent call last):
      File "", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0:
    ordinal not in range(128)

One may suggest the correct way to do it is to use decode, such as

    a.decode('latin-1') == b

This brings up another issue. Most references and books focus exclusively on entering unicode literals and using the encode/decode methods. The fallacy is that the string is such a basic data type, used throughout a program, that you really don't want to make an individual decision every time you use a string (and take a penalty for any negligence). Java has a much more usable model, with unicode used internally and encoding/decoding decisions needed in only two places: input and output.

I am sure these errors are a nuisance to those who are only half conscious of unicode. Even for those who choose to use unicode, it is almost impossible to ensure their programs work correctly.
-- http://mail.python.org/mailman/listinfo/python-list
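The explicit-decode version can be sketched as below. (In Python 3 the comparison no longer raises, since bytes and str simply compare unequal, but the decode-at-the-boundary discipline is the same; 'latin-1' here is an assumption about where the byte string came from.)

```python
a = b'\xe5'   # a byte string from some external source
b = u'b'

# Decode at the boundary, then compare character strings only.
a_text = a.decode('latin-1')   # u'\xe5', i.e. 'å'
same = (a_text == b)           # False, and no exception
```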
Re: Newbie CGI problem
Not sure about the repeated 'hi'. But you are supposed to use \r\n\r\n, not just \n\n, according to the HTTP specification.

> #!/usr/bin/python
> import cgi
> print "Content-type: text/html\n\n"
> print "hi"
>
> Gives me the following in my browser:
>
> '''
> hi
>
> Content-type: text/html
>
> hi
> '''
>
> Why are there two 'hi's?
>
> Thanks,
> Rory
-- http://mail.python.org/mailman/listinfo/python-list
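The header block the spec asks for can be sketched like this (response is a hypothetical helper; in practice most CGI gateways tolerate bare \n as well):

```python
import sys

def response(body):
    # Each header line ends with CRLF; a blank CRLF line separates
    # the headers from the body.
    return 'Content-type: text/html\r\n\r\n' + body

sys.stdout.write(response('hi'))  # write() avoids print's extra newline
```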
Re: Newbie CGI problem
On Fri, 18 Feb 2005 18:36:10 +0100, Peter Otten <[EMAIL PROTECTED]> wrote:
> Rory Campbell-Lange wrote:
>> #!/usr/bin/python
>> import cgi
>> print "Content-type: text/html\n\n"
>> print "hi"
>>
>> Gives me the following in my browser:
>>
>> '''
>> hi
>>
>> Content-type: text/html
>>
>> hi
>> '''
>>
>> Why are there two 'hi's?
>
> You have chosen a bad name for your script: cgi.py. It is now
> self-importing. Rename it to something that doesn't clash with the
> standard library, and all should be OK.
>
> Peter

You are a genius.
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode encoding usablilty problem
On Fri, 18 Feb 2005 19:24:10 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> that's how you should do things in Python too, of course. a unicode string
> uses unicode internally. decode on the way in, encode on the way out, and
> things just work.
>
> the fact that you can mess things up by mixing unicode strings with binary
> strings doesn't mean that you have to mix unicode strings with binary
> strings in your program.

I don't want to mix them. But how could I find them? How do I know this statement can be a potential problem

    if a == b:

where a and b can be instantiated individually, far away from this line of code that puts them together? In Java they are distinct data types and the compiler would catch all incorrect usage. In Python, the interpreter seems to 'help' us by promoting binary strings to unicode. Things work fine, unit tests pass, all until the first non-ASCII characters come in, and then the program breaks. Is there a scheme for Python developers to use so that they are safe from incorrect mixing?
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode and socket
You could not. Unicode is an abstract data type. It must be encoded into octets in order to be sent via a socket, and the other end must decode the octets to retrieve the unicode string. Needless to say, the encoding scheme must be consistent and understood by both ends.

On 18 Feb 2005 11:03:46 -0800, <[EMAIL PROTECTED]> wrote:
> hello all,
>
> I am new in Python. And I have got a problem about unicode. I have got a
> unicode string, and when I was going to send it out through a socket by
> send(), I got an exception. How can I send the unicode string to the
> remote end of the socket as it is, without any encode conversion, so the
> remote end of the socket will receive a unicode string? Thanks
-- http://mail.python.org/mailman/listinfo/python-list
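The encode-then-send, recv-then-decode pattern can be sketched without a live socket ('utf-8' is an assumed encoding that both ends agree on):

```python
text = u'h\xe9llo'

# Sender side: encode to octets before sock.send(...)
wire = text.encode('utf-8')        # b'h\xc3\xa9llo'

# Receiver side: decode the octets obtained from sock.recv(...)
received = wire.decode('utf-8')

received == text  # True: the unicode string survives the round trip
```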
Re: unicode encoding usablilty problem
On Fri, 18 Feb 2005 20:18:28 +0100, Walter Dörwald <[EMAIL PROTECTED]> wrote:
> aurora wrote:
>> [...] In Java they are distinct data types and the compiler would catch
>> all incorrect usage. In Python, the interpreter seems to 'help' us by
>> promoting binary strings to unicode. Things work fine, unit tests pass,
>> all until the first non-ASCII characters come in and then the program
>> breaks. Is there a scheme for Python developers to use so that they are
>> safe from incorrect mixing?
>
> Put the following:
>
>     import sys
>     sys.setdefaultencoding("undefined")
>
> in a file named sitecustomize.py somewhere in your Python path and Python
> will complain whenever there's an implicit conversion between str and
> unicode.
>
> HTH,
> Walter Dörwald

That helps! Running the unit tests caught quite a few potential problems (as well as a lot of safe ASCII string promotions).
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode encoding usablilty problem
On Fri, 18 Feb 2005 21:16:01 +0100, Martin v. Löwis <[EMAIL PROTECTED]> wrote:
> I'd like to point out the historical reason: Python predates Unicode, so
> the byte string type has many convenience operations that you would only
> expect of a character string.
>
> We have come up with a transition strategy, allowing existing libraries to
> widen their support from byte strings to character strings. This isn't a
> simple task, so many libraries still expect and return byte strings, when
> they should process character strings. Instead of breaking the libraries
> right away, we have defined a transitional mechanism, which allows adding
> Unicode support to libraries as the need arises. This transition is still
> in progress.

I understand. So I wasn't yelling "why can't Python be more like Java". On the other hand I also want to point out that making an individual decision for each string isn't practical and is very error prone. The fact that unicode and 8-bit strings look alike and work alike in common situations, but only run into problems with non-ASCII data, is very confusing for most people.

> Eventually, the primary string type should be the Unicode string. If you
> are curious how far we are still off that goal, just try running your
> program with the -U option.

Lots of errors. Among them are gzip (binary?!) and strftime??

I actually quite appreciate Python's power in processing binary data as 8-bit strings. But perhaps we should transition to using unicode as the text string and treat binary strings as the exception. Right now we have

    ''  - 8-bit string; u'' - unicode string

How about

    b'' - 8-bit string; ''  - unicode string

and no automatic conversion? Perhaps this could be activated by something like the encoding declarations, so that the transition can happen module by module.

> Regards,
> Martin
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode and socket
On 18 Feb 2005 19:10:36 -0800, <[EMAIL PROTECTED]> wrote:
> It's really funny, I cannot send a unicode stream through a socket with
> python while all the other languages such as perl, c and java can do it.
> Then, how about converting the unicode string to a binary stream? Is it
> possible to send binary through a socket with python?

I was answering your specific question: "How can I send the unicode string to the remote end of the socket as it is without any conversion of encode?" The answer is you could not. Not that you cannot send unicode, but that you have to encode it. The same applies to perl, c or Java; the only difference is the detail of how strings get encoded. There are a few posts suggesting various means. Or you can check out codecs.getwriter(), which more closely resembles Java's way.
-- http://mail.python.org/mailman/listinfo/python-list
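codecs.getwriter() wraps a byte stream so unicode strings can be written directly, much like Java's OutputStreamWriter; a sketch using an in-memory stream in place of a socket file object:

```python
import codecs
import io

raw = io.BytesIO()   # stands in for e.g. sock.makefile('wb')

# getwriter returns a StreamWriter class for the codec; instantiating it
# around the byte stream gives a file-like object that accepts unicode.
writer = codecs.getwriter('utf-8')(raw)
writer.write(u'h\xe9llo')

raw.getvalue()  # the UTF-8 octets that would go over the wire
```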
Re: unicode encoding usablilty problem
On Sat, 19 Feb 2005 18:44:27 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote: "aurora" <[EMAIL PROTECTED]> wrote: I don't want to mix them. But how could I find them? How do I know this statement can be a potential problem if a==b: where a and b can be instantiated individually, far away from this line of code that puts them together? if you don't know where a and b come from, how can you be sure that your program works at all? how can you be sure they're both strings? ("a op b" can fail in many ways, depending on what "a", "b", and "op" are) a and b are both strings. The issue is 8-bit string versus unicode string. Things work fine and unit tests pass, all until the first non-ASCII characters come in, and then the program breaks. if you have unit tests, why don't they include Unicode tests? How do I structure the test cases to guarantee coverage? It is not practical to test every combination of unicode/8-bit strings. Adding non-ASCII characters to the test data probably makes problems pop up earlier. But it is arduous, and it is hard to spot whether you have left any out. -- http://mail.python.org/mailman/listinfo/python-list
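One practical way to structure the test cases is to make every string fixture include non-ASCII samples, so encoding bugs surface in ordinary unit tests instead of in production. A minimal sketch; normalize() is an illustrative stand-in for whatever string-handling code is under test:

```python
import unittest

# Shared fixtures: ASCII, Latin-1, and CJK samples, so every test
# exercises the non-ASCII path as a matter of course.
SAMPLES = ['ascii only', 'caf\u00e9', '\u65e5\u672c\u8a9e']

def normalize(s):
    # toy function under test: strips surrounding whitespace
    return s.strip()

class TestNormalize(unittest.TestCase):
    def test_all_samples(self):
        for s in SAMPLES:
            self.assertEqual(normalize('  %s  ' % s), s)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalize)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```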
Re: unicode encoding usablilty problem
On Sun, 20 Feb 2005 15:01:09 +0100, Martin v. Löwis <[EMAIL PROTECTED]> wrote: Nick Coghlan wrote: Having "", u"", and r"" be immutable, while b"" was mutable would seem rather inconsistent. Yes. However, this inconsistency might be desirable. It would, of course, mean that the literal cannot be a singleton. Instead, it has to be a display (?), similar to list or dict displays: each execution of the byte string literal creates a new object. An alternative would be to have "bytestr" be the immutable type corresponding to the current str (with b"" literals producing bytestr's), while reserving the "bytes" name for a mutable byte sequence. Indeed. This maze of options has caused the process to get stuck. People also argue that with such an approach, we could as well tell users to use array.array for the mutable type. But then, people complain that it doesn't have all the library support that strings have. The main point being, the replacement for 'str' needs to be immutable or the upgrade process is going to be a serious PITA. Somebody really needs to take this in his hands, completing the PEP, writing a patch, checking applications to find out what breaks. Regards, Martin What is the process of getting a PEP worked out? Does the work and discussion take place on the python-dev mailing list? I would be glad to help out, especially on this particular issue. -- http://mail.python.org/mailman/listinfo/python-list
Re: running a shell command from a python program
In Python 2.4, use the new subprocess module for this. It subsumes the popen* functions. Hi, I'm a newbie, so please be gentle :-) How would I run a shell command in Python? Here is what I want to do: I want to run a shell command that outputs some stuff, save it into a list and do stuff with the contents of that list. I started with a BASH script actually, until I realized I really needed better data structures :-) Is popen the answer? Also, where online would I get access to good sample code that I could peruse? I'm running 2.2.3 on Linux, and going strictly by online doc so far. Thanks! S C -- http://mail.python.org/mailman/listinfo/python-list
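A short sketch of the "run a command, collect its output lines into a list" pattern. check_output appeared in Python 2.7; on 2.4 the equivalent is subprocess.Popen([...], stdout=subprocess.PIPE).communicate()[0]. sys.executable is used here just to keep the example portable; any command works:

```python
import subprocess
import sys

# Run a command and split its captured output into a list of lines.
out = subprocess.check_output([sys.executable, '-c', "print('hello world')"])
lines = out.decode().splitlines()
assert lines == ['hello world']
```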
Re: Python and "Ajax technology collaboration"
It was discussed in the last Bay Area Python Interest Group meeting. Thursday, February 10, 2005 Agenda: Developing Responsive GUI Applications Using HTML and HTTP Speakers: Donovan Preston http://www.baypiggies.net/ The author has a component, LivePage, for this. You may find it at http://nevow.com/. It is a similar idea to the JavaScript stuff but very Python centric. Interesting GUI developments, it seems. Has anyone developed an "Ajax" application using Python? Very curious thx (Ajax stands for: XHTML and CSS; dynamic display and interaction using the Document Object Model; data interchange and manipulation using XML and XSLT; asynchronous data retrieval using XMLHttpRequest; and JavaScript binding everything together. E.g. Google has used these technologies to build Gmail, Google Maps etc. More info: http://www.adaptivepath.com/publications/essays/archives/000385.php) -- http://mail.python.org/mailman/listinfo/python-list
Re: Unit testing - one test class/method, or test class/class
I do something more or less like your option b. I don't think there is any orthodox structure to follow. You should use a style that fits your taste. What I really want to bring up is that you might want to look at refactoring your module in the first place. 348 test cases for one module sounds like a large number. That suggests you have a fairly complex module to be tested to start with. Often the biggest benefit of doing automated unit testing is that it forces the developers to modularize and decouple their code in order to make it testable. This action alone improves code quality a lot. If breaking up the module makes sense in your case, the test structure will follow. Hi, I just found py.test[1] and converted a large unit test module to py.test format (which is actually almost-no-format-at-all, but I won't get there now). Having 348 test cases in the module and huge test classes, I started to think about splitting classes. Basically you have at least three obvious choices, if you are going for consistency in your test modules: Choice a: Create a single test class for the whole module to be tested, whether it contains multiple classes or not. ...I don't think this method deserves closer inspection. It's probably a rather poor method to begin with. With py.test, where no subclassing is required (unlike Python unittest, where you have to subclass unittest.TestCase), you'd probably be better off with just writing a test method for each class and each class method in the module. Choice b: Create a test class for each class in the module, plus one class for any non-class methods defined in the module.
+ Feels clean, because each test class is mapped to one class in the module
+ It is rather easy to find all tests for a given class
+ Relatively easy to create a class skeleton automatically from the test module and the other way round
- Test classes get huge easily
- Missing test methods are not very easy to find[2]
- A test method may depend on other tests in the same class

Choice c: Create a test class for each non-class method and class method in the tested module.

+ Test classes are small, easy to find all tests for a given method
+ Helps in test isolation - having a separate test class for a single method makes the tested class less dependent on any other methods/classes
+ Relatively easy to create a test module from an existing class (but then you are not doing TDD!) but not vice versa
- A large number of classes results in more overhead; more typing, probably requires subclassing because of common test class setup methods etc.

What do you think, any important points I'm missing?

Footnotes:
[1] In reality, this is a secret plot to advertise py.test, see http://codespeak.net/py/current/doc/test.html
[2] However, this problem disappears if you start with writing your tests first: with TDD, you don't have untested methods, because you start by writing the tests first, and end up with a module that passes the tests

-- # Edvard Majakari Software Engineer # PGP PUBLIC KEY available Soli Deo Gloria! One day, when he was naughty, Mr Bunnsy looked over the hedge into Farmer Fred's field and it was full of fresh green lettuces. Mr Bunnsy, however, was not full of lettuces. This did not seem fair. --Mr Bunnsy has an adventure -- http://mail.python.org/mailman/listinfo/python-list
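A minimal sketch of "choice b" in py.test style: one test class per class in the module under test, plus one class for the module-level functions. Stack and top_level_helper are illustrative stand-ins, not anything from the original post:

```python
class Stack:
    """Class under test."""
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()

def top_level_helper(x):
    """Module-level function under test."""
    return x * 2

class TestStack:
    """Maps one-to-one to the Stack class (py.test needs no subclassing)."""
    def test_push_pop(self):
        s = Stack()
        s.push(1)
        assert s.pop() == 1

class TestModuleFunctions:
    """One catch-all class for the non-class functions."""
    def test_top_level_helper(self):
        assert top_level_helper(3) == 6

# py.test would collect these automatically; they also run by hand:
TestStack().test_push_pop()
TestModuleFunctions().test_top_level_helper()
```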
decode unicode string using 'unicode_escape' codecs
I have a unicode string with some characters encoded using Python notation, like '\n' for LF. I need to convert that to the actual LF character. There is a 'unicode_escape' codec that seems to suit my purpose.

>>> encoded = u'A\\nA'
>>> decoded = encoded.decode('unicode_escape')
>>> print len(decoded)
3

Note that both encoded and decoded are unicode strings. I'm trying to use the builtin codec because I assume it has better performance than writing pure Python decoding. But I'm not converting between byte strings and unicode strings. However it runs into problems in some cases:

encoded = u'€\\n€'
decoded = encoded.decode('unicode_escape')

Traceback (most recent call last):
  File "g:\bin\py_repos\mindretrieve\trunk\minds\x.py", line 9, in ?
    decoded = encoded.decode('unicode_escape')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

Reading the document more carefully, I found out what happened. decode('unicode_escape') takes a byte string as its operand and converts it into a unicode string. Since encoded is already unicode, it is first implicitly converted to a byte string using the 'ascii' encoding. In this case the conversion fails because of the '€' character. So I resigned myself to the fact that 'unicode_escape' doesn't do what I want.

But thinking about it more deeply, I came up with this Python source code. It runs OK and outputs 3.

# -*- coding: utf-8 -*-
print len(u'€\n€')   # 3

Think about what happens in the second line. First the parser decodes the bytes into a unicode string with the UTF-8 encoding. Then it applies the escape-sequence syntax rules to decode the characters '\n' into LF. The second step is what I want. There must be something available to the Python interpreter that is not available to the user. So is there something I have overlooked? Anyway, I just want to leverage the builtin codecs for performance. I figure this would be faster than encoded.replace('\\n', '\n') ...and so on...
If there are other suggestions they would be greatly appreciated :) wy -- http://mail.python.org/mailman/listinfo/python-list
Re: decode unicode string using 'unicode_escape' codecs
Cool, it works! I have also done some due diligence to check that the utf-8 encoding would not accidentally introduce any Python escapes. I have written a recipe in the Python cookbook: Efficient character escapes decoding http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/466293 wy

> Does this do what you want?
>
> >>> u'€\\n€'
> u'\x80\\n\x80'
> >>> len(u'€\\n€')
> 4
> >>> u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8')
> u'\x80\n\x80'
> >>> len(u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8'))
> 3
>
> Basically, I convert the unicode string to bytes, escape the bytes using
> the 'string_escape' codec, and then convert the bytes back into a
> unicode string.
>
> HTH,
>
> STeVe
-- http://mail.python.org/mailman/listinfo/python-list
ANN: HTMLTestRunner - generates HTML test report for unittest
Greetings, HTMLTestRunner is an extension to the Python standard library's unittest module. It generates easy-to-use HTML test reports. See a sample report at http://tungwaiyip.info/software/sample_test_report.html. Find more information and download it from http://tungwaiyip.info/software/#htmltestrunner Wai Yip Tung -- http://mail.python.org/mailman/listinfo/python-list
Re: HTMLTestRunner - generates HTML test report for unittest
On Fri, 27 Jan 2006 06:35:46 -0800, Paul McGuire <[EMAIL PROTECTED]> wrote:

> Nice! I just adapted my pyparsing unit tests to use this tool - took me
> about 3 minutes, and now it's much easier to run and review my unit test
> results. I especially like the pass/fail color coding, and the "drill-down"
> to the test output.
>
> -- Paul

Thank you! I'm glad that it is helpful to you :) -- http://mail.python.org/mailman/listinfo/python-list
Problem redirecting stdin on Windows
On Windows (XP) with the win32 extension installed, a Python script can be launched from the command line directly, since the .py extension is associated with python. However it fails if stdin is piped or redirected. Assume there is an echo.py that reads from stdin and echoes the input. Launched from the command line directly, this echoes input from the keyboard: echo.py But it causes an error if stdin is redirected: echo.py < input.txt -- http://mail.python.org/mailman/listinfo/python-list
win32clipboard.GetClipboardData() return string with null characters
I was using win32clipboard.GetClipboardData() to retrieve the Windows clipboard, using code similar to the message below: http://groups-beta.google.com/group/comp.lang.python/msg/3722ba3afb209314?hl=en Somehow I noticed the data returned includes \0 and some characters that shouldn't be there after the null character. It is easy enough to truncate them. But why do they get there in the first place? Is the data length somehow calculated wrong? I'm using Windows XP SP2 with Python 2.4 and pywin32-203. aurora -- http://mail.python.org/mailman/listinfo/python-list
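CF_TEXT clipboard data is NUL-terminated, and anything after the first '\0' is leftover buffer contents, which likely explains the stray characters. A defensive truncation sketch (clip_text is a hypothetical helper, raw stands for the string returned by win32clipboard.GetClipboardData()):

```python
def clip_text(raw):
    # Keep everything up to (but not including) the first NUL; data
    # beyond it is garbage from the underlying clipboard buffer.
    i = raw.find('\0')
    return raw if i < 0 else raw[:i]

assert clip_text('hello\0junk') == 'hello'
assert clip_text('clean') == 'clean'
```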
Design mini-language for data input
This is an entry I just added to ASPN. It is a somewhat novel technique I have employed quite successfully in my code. I repost it here for more exposure and discussion. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/475158 wy

Title: Design mini-language for data input

Description: Many programs need a set of initial data. For ease of use and flexibility, design a mini-language for your input data. Use Python's superb text handling capability to parse and build the data structure from the input text.

Source:

import string

# this is an example to demonstrate the programming technique
DATA = """
# data source: http://www.mongabay.com/igapo/world_statistics_by_pop.htm
# Country / Capital / Area [sq. km] / 2002 Population Estimate
China         / Beijing       /  9,596,960 / 1,284,303,705
India         / New Delhi     /  3,287,590 / 1,045,845,226
United States / Washington DC /  9,629,091 /   280,562,489
Indonesia     / Jakarta       /  1,919,440 /   231,328,092
Russia        / Moscow        / 17,075,200 /   144,978,573
"""

def initData():
    """ parse and return a country list of (name, capital, area, population) """
    countries = []
    for line in DATA.splitlines():
        # filter out blank lines/comment lines
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # 4 fields separated by '/'
        parts = map(string.strip, line.split('/'))
        country, capital, area, population = parts
        # remove commas in numbers
        area = int(area.replace(',',''))
        population = int(population.replace(',',''))
        countries.append((country, capital, area, population))
    return countries

def findLargestCountry(countries):
    # your algorithm here
    pass

def main():
    countries = initData()
    print findLargestCountry(countries)

Discussion:

Problem

Many programs need a set of initial data. The simplest way is to construct a Python data structure directly, as shown below. This is often not ideal. Algorithms and data structures tend to change, and Python program statements are likely to differ literally from the data source, which might be text pulled from web pages or elsewhere.
This means a great deal of effort is often needed to format and maintain the input as Python statements. This is a sample program that initializes some geographical data.

# map of country -> (capital, area, population)
COUNTRIES = {}
COUNTRIES['China'] = ('Beijing', 9596960, 1284303705)
COUNTRIES['India'] = ('New Delhi', 3287590, 1045845226)
COUNTRIES['United States'] = ('Washington DC', 9629091, 280562489)
COUNTRIES['Indonesia'] = ('Jakarta', 1919440, 231328092)
COUNTRIES['Russia'] = ('Moscow', 17075200, 144978573)

Mini-language

A more flexible approach is to define a mini-language to describe the data. This can be as simple as formatting the data into a multiple-line string.

1. Define the data format in text. It should mirror the data source and be designed for ease of human editing.
2. Define the data structure.
3. Write glue code to parse the input data and initialize the data structure.

In the example above we use one line for each record. Each record has four fields (country, capital, area and population) separated by slashes. One of the immediate benefits is that we no longer need to type so many quotes for every string literal. This concise data format is much easier to read and edit than Python statements. The parser simply breaks down the input text using splitlines() and then loops through it line by line. It is useful to tolerate some extra white space so that the format is more forgiving for human editors. In this case the numbers (area, population) from the data source contain commas. Rather than manually editing them out, they are copied into the text as is. Then they are parsed into integers using area = int(area.replace(',','')) Slash is chosen as the separator (rather than the more common comma) because it does not otherwise appear in the data.
A record is parsed into fields using line.split('/') Don't forget to remove extra white space using string.strip(). Finally it builds a list of country records, each a tuple of (country, capital, area, population). It is just as easy to turn them into objects or any other data structure as desired. The mini-language technique can be refined to represent more complex, more structured input. It makes transformation and maintenance of input data much easier. -- http://mail.python.org/mailman/listinfo/python-list
Re: datetime iso8601 string input
I agree. I just keep rewriting the parse method again and again. wy

def parse_iso8601_date(s):
    """ Parse a date in iso8601 format, e.g. 2003-09-15T10:34:54,
        and return a datetime object.
    """
    y = m = d = hh = mm = ss = 0
    if len(s) not in [10, 19, 20]:
        raise ValueError('Invalid timestamp length - "%s"' % s)
    if s[4] != '-' or s[7] != '-':
        raise ValueError('Invalid separators - "%s"' % s)
    if len(s) > 10 and (s[13] != ':' or s[16] != ':'):
        raise ValueError('Invalid separators - "%s"' % s)
    try:
        y = int(s[0:4])
        m = int(s[5:7])
        d = int(s[8:10])
        if len(s) >= 19:
            hh = int(s[11:13])
            mm = int(s[14:16])
            ss = int(s[17:19])
    except Exception, e:
        raise ValueError('Invalid timestamp - "%s": %s' % (s, str(e)))
    return datetime(y, m, d, hh, mm, ss)

> I was a little surprised to recently discover
> that datetime has no method to input a string
> value. PEP 321 does not appear to convey much
> information, but a timbot post from a couple
> years ago clarifies things:
>
> http://tinyurl.com/epjqc
>
>> You can stop looking: datetime doesn't
>> support any kind of conversion from string.
>> The number of bottomless pits in any datetime
>> module is unbounded, and Guido declared this
>> particular pit out-of-bounds at the start so
>> that there was a fighting chance to get
>> *anything* done for 2.3.
>
> I can understand why datetime can't handle
> arbitrary string inputs, but why not just
> simple iso8601 format -- i.e. the default
> output format for datetime?
>
> Given a datetime-generated string:
>
> >>> now = str(datetime.datetime.now())
> >>> print now
> '2006-02-23 11:03:36.762172'
>
> Why can't we have a function to accept it
> as string input and return a datetime object?
>
> datetime.parse_iso8601(now)
>
> Jeff Bauer
> Rubicon, Inc.
-- http://mail.python.org/mailman/listinfo/python-list
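For comparison, in later Python versions this whole hand-rolled parser collapses to a library call:

```python
from datetime import datetime

s = '2003-09-15T10:34:54'

# datetime.strptime was added in Python 2.5
dt = datetime.strptime(s, '%Y-%m-%dT%H:%M:%S')
assert dt == datetime(2003, 9, 15, 10, 34, 54)

# datetime.fromisoformat was added in Python 3.7
assert datetime.fromisoformat(s) == dt
```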
Re: Design mini-lanugage for data input
Yes. But they have different motivations. The mini-language concept is to design an input format that is convenient for human editors and that is close to the semi-structured data source. I think the benefit from ease of editing and flexibility would justify writing a little parsing code. JSON is mainly designed for data exchange between programs. You can hand edit JSON data (as well as XML or Python statements) but it is not the most convenient. Just consider: not having to enter two quotes for every string object is almost liberating. These quotes are only artifacts of a structured data format. The idea is to design a format convenient for humans and let code parse and build the data structure. wy > Hmm, > Do you know about JSON and YAML? > http://en.wikipedia.org/wiki/JSON > http://en.wikipedia.org/wiki/YAML > > They have the advantage of being maintained by a group of people and > being available for a number of languages. (as well as NOT being XML > :-) > > - Cheers, Paddy. > -- > http://paddy3118.blogspot.com/ > -- http://mail.python.org/mailman/listinfo/python-list
Re: Design mini-lanugage for data input
P.S. Also, it is a 'mini-language' because it is an ad-hoc design that is good enough and can be easily implemented for a given problem. This is as opposed to a general purpose solution like XML, which is one translation away from the original data format and carries too much baggage. > Just consider you don't have to enter two quotes for every string object > is almost liberating. These quotes are only artifacts for structured > data format. The idea to design a format convenient for human and let > code to parse and built the data structure. > > wy -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode question : turn "José" into u"José"
First of all, if you run this on the console, find out your console's encoding. In my case it is English Windows XP, which uses 'cp437'.

C:\>chcp
Active code page: 437

Then:

>>> s = "José"
>>> u = u"Jos\u00e9"        # same thing in unicode escape
>>> s.decode('cp437') == u  # use the encoding that matches your console
True
>>>

wy

> This is probably stupid and/or misguided but supposing I'm passed a
> byte-string value that I want to be unicode, this is what I do. I'm sure
> I'm missing something very important.
>
> Short version :
> s = "José"  # Start with non-unicode string
> unicoded = eval("u'%s'" % "José")
>
> Long version :
> s = "José"  # Start with non-unicode string
> s           # Let's look at it
> 'Jos\xe9'
> escaped = s.encode('string_escape')
> escaped
> 'Jos\\xe9'
> unicoded = eval("u'%s'" % escaped)
> unicoded
> u'Jos\xe9'
> test = u"José"    # What they should have passed me
> test == unicoded  # Am I really getting the same thing?
> True              # Yay!
>
-- http://mail.python.org/mailman/listinfo/python-list