"A Fundamental Turn Toward Concurrency in Software"
Hello! Just went through an article via Slashdot titled "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" [http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the continuous CPU performance gains we've seen are finally over, and that future gains will primarily come from software concurrency taking advantage of hyperthreading and multicore architectures. Perhaps something the Python interpreter team can ponder.
-- http://mail.python.org/mailman/listinfo/python-list
Re: "A Fundamental Turn Toward Concurrency in Software"
Of course there are many performance bottlenecks: CPU, memory, I/O, network, all the way up to the software design and implementation. As a software guy myself I would say that, by far, better software design leads to the greatest performance gains. But that doesn't mean hardware engineers can sit back and declare this "software's problem". Even if we are not writing CPU-intensive applications, we will certainly welcome the "free performance gain" coming from a faster CPU or a more optimized compiler.

I think this is significant because it might signify a paradigm shift. This might well be hype, but let's just assume this is the future direction of CPU design. Then we might as well start experimenting now. To throw out some random ideas: parallel execution at the statement level, looking up symbols and attributes predictively, parallelized hash functions, dictionary lookup, sorting, list comprehensions, background just-in-time compilation, etc., etc.

One of the author's points is that many of today's mainstream technologies (like OO) did not come about suddenly but accumulated years of research before becoming widely used. A lot of these ideas may not work or may not seem to matter much today. But in 10 years we might be really glad that we tried.

aurora <[EMAIL PROTECTED]> writes:
> Just went through an article via Slashdot titled "The Free Lunch Is Over: A
> Fundamental Turn Toward Concurrency in Software"
> [http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
> continuous CPU performance gain we've seen is finally over. And that future
> gain would primarily be in the area of software concurrency taking
> advantage of hyperthreading and multicore architectures.

Well, another gain could be had in making the software less wasteful of CPU cycles. I'm a pretty experienced programmer by most people's standards but I see a lot of systems where I can't for the life of me figure out how they manage to be so slow.
-- http://mail.python.org/mailman/listinfo/python-list
It might be caused by environmental pollutants emanating from Redmond. -- http://mail.python.org/mailman/listinfo/python-list
list unpack trick?
I find that I use this list unpacking construct very often:

    name, value = s.split('=', 1)

So that 'a=1' unpacks as name='a' and value='1', and 'a=b=c' unpacks as name='a' and value='b=c'. The only issue is when s does not contain the character '=', say it is 'xyz': the result list has a len of 1 and the unpacking fails. Is there some really handy trick to pad the result list to len of 2 so that it unpacks as name='xyz' and value=''? More generally, is there an easy way to pad a list to length n with filler items appended at the end?
-- http://mail.python.org/mailman/listinfo/python-list
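One inline possibility is sketched below (later Python versions also grew str.partition for exactly this two-way split); parse_pair is just a hypothetical name for illustration:

```python
def parse_pair(s):
    # Pad the split result with a filler before slicing back to length 2,
    # so the unpacking never fails even when '=' is absent.
    name, value = (s.split('=', 1) + [''])[:2]
    return name, value

parse_pair('a=1')    # ('a', '1')
parse_pair('a=b=c')  # ('a', 'b=c')
parse_pair('xyz')    # ('xyz', '')
```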
Re: list unpack trick?
Thanks. I'm just trying to see if there is some concise syntax available without getting into obscurity. For my purpose Siegmund's suggestion works quite well. The few forms you have suggested work, but as they refer to the list multiple times, they need a separate assignment statement like

    list = s.split('=', 1)

I am thinking more along the lines of string.ljust(). If we had a list.ljust(length, filler), we could do something like

    name, value = s.split('=', 1).ljust(2, '')

I can always break it down into multiple lines. The good thing about list unpacking is that it's a really compact and obvious syntax.

On Sat, 22 Jan 2005 08:34:27 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>> ... So more generally, is there an easy way to pad a list into length of n
>> with filler items appended at the end?
>
> some variants (with varying semantics):
>
>     list = (list + n*[item])[:n]
>
> or
>
>     list += (n - len(list)) * [item]
>
> or (readable):
>
>     if len(list) < n:
>         list.extend((n - len(list)) * [item])
>
> etc.
-- http://mail.python.org/mailman/listinfo/python-list
Re: list unpack trick?
On Sat, 22 Jan 2005 10:03:27 -0800, aurora <[EMAIL PROTECTED]> wrote:
> I am thinking more along the lines of string.ljust(). If we had a
> list.ljust(length, filler), we could do something like
>
>     name, value = s.split('=', 1).ljust(2, '')
>
> I can always break it down into multiple lines. The good thing about list
> unpacking is that it's a really compact and obvious syntax.

Just to clarify, the ljust() is a feature wish; it probably should be named something like pad(). Also there is another thread from a few hours before this asking about essentially the same thing: "default value in a list" http://groups-beta.google.com/group/comp.lang.python/browse_frm/thread/f3affefdb4272270
-- http://mail.python.org/mailman/listinfo/python-list
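The wished-for pad() can be sketched as a free function (pad is a hypothetical name; note this version leaves lists already at length n or longer untouched):

```python
def pad(lst, n, filler=''):
    # Extend lst with filler items until it reaches length n.
    # A negative multiplier produces [], so longer lists pass through unchanged.
    return lst + [filler] * (n - len(lst))

name, value = pad('xyz'.split('=', 1), 2)  # name='xyz', value=''
```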
Re: limited python virtual machine (WAS: Another scripting language implemented into Python itself?)
Is it really necessary to build a VM from the ground up that includes OS abilities? What about JavaScript?

On Wed, Jan 26, 2005 at 05:18:59PM +0100, Alexander Schremmer wrote:
> On Tue, 25 Jan 2005 22:08:01 +0100, I wrote:
>> sys.safecall(func, maxcycles=1000) could enter the safe mode and call the
>> func.
>
> This might be even enhanced like this:
>
>     >>> import sys
>     >>> sys.safecall(func, maxcycles=1000,
>     ...     allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
>     ...     allowed_modules=['_sre'])
>
> Any comments about this from someone who already hacked CPython?

Yes, this comes up every couple of months and there is only one answer: this is the job of the OS. Java largely succeeds at doing sandboxy things because it was written that way from the ground up (to behave both like a program interpreter and an OS). Python the language was not, and the CPython interpreter definitely was not. Search groups.google.com for previous discussions of this on c.l.py.

-Jack
-- http://mail.python.org/mailman/listinfo/python-list
Re: Transparent (redirecting) proxy with BaseHTTPServer
If you actually want the IP, resolving the Host header would give you that. In the redirect case you should get a Host header like

    Host: www.python.org

From that you can reconstruct the original URL as http://www.python.org/ftp/python/contrib/. With that you can open it using urllib and proxy the data to the client.

The second form of HTTP request, without the host part, is for compatibility with the pre-HTTP/1.1 standard. All modern web browsers should send the Host header.

> Hi list,
>
> My ultimate goal is to have a small HTTP proxy which is able to show a
> message specific to a client's name/ip/status, then handle the original
> request normally, either by redirecting the client or acting as a proxy. I
> started with a modified[1] version of TinyHTTPProxy posted by Suzuki Hisao
> somewhere in 2003 to this list and tried to extend it to my needs. It works
> quite well if I configure my client to use it, but using the iptables
> REDIRECT feature to point the clients transparently to the proxy caused
> some issues. Precisely, the "self.path" member variable of
> BaseHTTPRequestHandler is missing the scheme and the host (i.e.
> www.python.org) part of the request line for REDIRECTed connections:
>
> without iptables REDIRECT:
>     self.path -> GET http://www.python.org/ftp/python/contrib/ HTTP/1.1
> with REDIRECT:
>     self.path -> GET /ftp/python/contrib/ HTTP/1.1
>
> I asked about this on the squid mailing list and was told this is normal
> and I have to reconstruct the request line from the real destination IP,
> the URL-path and the Host header (if any). If the Host header is sent it's
> an (unsafe) no-brainer, but I cannot for the life of me figure out where to
> get the "real destination IP". Any ideas?
>
> thanks
> Paul
>
> [1] HTTP Debugging Proxy Modified by Xavier Defrang (http://defrang.com/)
-- http://mail.python.org/mailman/listinfo/python-list
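The Host-header reconstruction can be sketched like this (rebuild_url is a hypothetical helper; headers behaves like BaseHTTPRequestHandler's self.headers mapping):

```python
def rebuild_url(path, headers):
    # Rebuild the absolute URL a transparently-redirected client asked for.
    # 'path' is the origin-form request path, e.g. '/ftp/python/contrib/'.
    host = headers.get('Host')
    if host is None:
        return None  # pre-HTTP/1.1 client: nothing to recover from
    return 'http://%s%s' % (host, path)

rebuild_url('/ftp/python/contrib/', {'Host': 'www.python.org'})
# -> 'http://www.python.org/ftp/python/contrib/'
```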
Re: Transparent (redirecting) proxy with BaseHTTPServer
It should be very safe to count on the Host header. Maybe some really, really old browsers would not support it, but they probably won't work on today's WWW anyway. The majority of today's web sites are likely virtually hosted; one Apache server may be hosting 50 web addresses. If a client stripped the host name and did not send the Host header either, the web server wouldn't know what address it is really looking for. If you catch a request that doesn't have a Host header, it's a good idea to redirect it to a browser upgrade page.

> Thanks, aurora ;)
>
> aurora wrote:
>> If you actually want the IP, resolving the Host header would give you
>> that.
>
> I'm only interested in the hostname.
>
>> The second form of HTTP request, without the host part, is for
>> compatibility with the pre-HTTP/1.1 standard. All modern web browsers
>> should send the Host header.
>
> How safe is the assumption that the Host header will be there? Is it part
> of the HTTP/1.1 spec? And does it mean all "pre 1.1" clients will fail?
> Hmm, maybe I should look on the wire at what's really happening...
>
> thanks again
> Paul
-- http://mail.python.org/mailman/listinfo/python-list
Go visit Xah Lee's home page
Let's stop discussing the perl-python nonsense. It is so boring. For a break, just visit Mr. Xah Lee's personal page (http://xahlee.org/PageTwo_dir/Personal_dir/xah.html). You'll find a lot of funny information and quotes from this queer personality. Thankfully no perl-python stuff there.

Don't miss Mr. Xah Lee's recent pictures at http://xahlee.org/PageTwo_dir/Personal_dir/mi_pixra.html. My favorite is the last picture: long-haired Xah Lee sitting contemplatively in the living room. The caption says "my beautiful hair, fails to resolve the problems of humanity. And, it is falling apart by age."
-- http://mail.python.org/mailman/listinfo/python-list
Re: Next step after pychecker
A frequent error I encounter:

    try:
        ...do something...
    except IOError:
        log('encounter an error %s line %d' % filename)

Here in the string interpolation I should supply (filename, lineno). Usually I have a lot of unit testing to catch syntax errors in the main code. But it is very difficult to run into the exception handlers, some of which are added defensively. Unfortunately those untested exception handlers sometimes fail precisely when we need them for diagnostic information. pychecker sometimes gives false alarms here, since the argument of a string interpolation may be a valid tuple. It would be great if we could somehow unit test the exception handlers (without building an extensive library of mock objects).
-- http://mail.python.org/mailman/listinfo/python-list
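One lightweight way to exercise such a handler in a unit test is to inject a failing opener, sketched below (report, read_config and FailingOpener are hypothetical names, not from any real codebase):

```python
def report(filename, lineno):
    # Correct form: the interpolation arguments are wrapped in a tuple.
    return 'encounter an error %s line %d' % (filename, lineno)

def read_config(path, opener=open):
    try:
        return opener(path).read()
    except IOError:
        return report(path, 0)

class FailingOpener:
    # Minimal stand-in that forces the except branch; no mock library needed.
    def __call__(self, path):
        raise IOError('simulated disk failure')

read_config('/etc/app.conf', opener=FailingOpener())
# -> 'encounter an error /etc/app.conf line 0'
```

If report() had the bug in the post (a bare `% filename`), this test would fail with a TypeError at test time instead of in production.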
Re: Printing Filenames with non-Ascii-Characters
On Tue, 01 Feb 2005 20:28:11 +0100, Marian Aldenhövel <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am very new to Python and have run into the following problem. If I do
> something like
>
>     dir = os.listdir(somepath)
>     for d in dir:
>         print d
>
> the program fails for filenames that contain non-ascii characters:
>
>     'ascii' codec can't encode characters in position 33-34
>
> I have noticed that this seems to be a very common problem. I have read a
> lot of postings regarding it but not really found a solution. Is there a
> simple one?

The English Windows command prompt uses the cp437 charset. To print it, use

    print d.encode('cp437')

The issue is that a terminal only understands certain character sets. If you have a unicode string, like d in your case, you have to encode it before it can be printed. (We really need native unicode terminals!!!)

If you don't encode, Python will do it for you. The default encoding is ASCII. Any string that contains a non-ASCII character will give you trouble. In my opinion Python is too conservative in using the 'strict' encoding, which gives users unaware of unicode a lot of woes.

So how did you get a unicode d to start with? If 'somepath' is unicode, os.listdir returns a list of unicode strings. So why is somepath unicode? Either you have entered a unicode literal or it comes from some other source. One possible source is an XML parser, which returns strings in unicode. Windows NT supports unicode filenames. I'm not sure about Linux; the results may differ slightly.

> What I specifically do not understand is why Python wants to interpret the
> string as ASCII at all. Where is this setting hidden? I am running Python
> 2.3.4 on Windows XP and I want to run the program on Debian sarge later.
>
> Ciao, MM
-- http://mail.python.org/mailman/listinfo/python-list
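The encode step can also be made non-fatal by passing an error handler; a small sketch (the 'replace' handler substitutes a placeholder character instead of raising):

```python
d = u'caf\xe9'  # a filename containing a non-ASCII character

# Strict ASCII would raise; 'replace' degrades to garbled-but-printable output.
safe = d.encode('ascii', 'replace')      # b'caf?'

# cp437 does contain common Western-European accented characters,
# so on a cp437 console the round trip is lossless.
console = d.encode('cp437', 'replace')
```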
hotspot profiler experience and accuracy?
I have a parser I need to optimize. It has some disk IO and a lot of looping over characters, so I used the hotspot profiler to gain insight into optimization options. The methods showing up at the top of the list seem fairly trivial and do not look like CPU hoggers. Nevertheless I optimized them and got a 25% performance gain according to hotspot's numbers.

But the numbers look suspect. Hotspot claims 71.166 CPU seconds but the actual elapsed time is only 54s. When measuring elapsed time instead of CPU time, the performance gain is only 13% with the profiler running, and down to 10% when not using the profiler. Is there something I misunderstood in reading the numbers?
-- http://mail.python.org/mailman/listinfo/python-list
Re: hotspot profiler experience and accuracy?
Thanks for pointing me to your analysis. Now I know it wasn't me doing something wrong.

hotspot did lead me to knock down a major performance bottleneck one time. I found that zipfile.ZipFile() basically reads the entire zip file at instantiation time, even though you may only need one file from it subsequently. In any case the number of function calls seems to make sense and it should give some insight into the runtime behaviour. The CPU time is just so misleading.

aurora wrote:
> But the numbers look suspect. Hotspot claims 71.166 CPU seconds but the
> actual elapsed time is only 54s. When measuring elapsed time instead of CPU
> time the performance gain is only 13% with the profiler running and down to
> 10% when not using the profiler. Is there something I misunderstood in
> reading the numbers?

Well, I'm confused too. Look at my post from a few months ago: http://tinyurl.com/6awzj (note that my code contained a few errors and that you need to use the fixed code that I posted a few replies later). Perhaps somebody can explain a bit more about this this time? :-) At the moment, frankly, hotspot seems rather useless.

--Irmen
-- http://mail.python.org/mailman/listinfo/python-list
Re: Printing Filenames with non-Ascii-Characters
>> print d.encode('cp437')
>
> So I would have to specify the encoding on every call to print? I am sure
> to forget, and I don't like the program dying. In my case garbled output
> would be much more acceptable.
>
> Marian

I'm with you. You never know whether you have put enough encode() calls in all the right places, and there is no static type checking to help you. So the short answer is to set a different default in sitecustomize.py. I'm trying to write up something about unicode in Python, once I understand what's going on inside...
-- http://mail.python.org/mailman/listinfo/python-list
Re: OT: why are LAMP sites slow?
Slow compared to what? For a large commercial site with a bigger budget, better infrastructure, better implementation, it is not surprising that they come out ahead compared to hobbyist sites.

Putting implementation aside, is LAMP inherently performing worse than commercial alternatives like IIS, ColdFusion, Sun ONE or DB2? That sounds like your presupposition. I don't know of any numbers to support it. Note that many of the largest sites have open source components in them. Google, Amazon, and Yahoo all run on unix variants. Ebay is the notable exception, which uses IIS. Can you really say ebay is performing better than amazon (or vice versa)?

I think the chief factor in a site performing poorly is the implementation. It is really easy to throw big money into expensive software and hardware and come out with a performance dog. Google's infrastructure relies on a large distributed network of commodity hardware, not a few expensive boxes. LAMP-based infrastructure, if used right, can support the most demanding applications.

> LAMP = Linux/Apache/MySQL/P{ython,erl,HP}. Refers to the general class of
> database-backed web sites built using those components. This being c.l.py,
> if you want, you can limit your interest to the case where the P stands for
> Python.
>
> I notice that lots of the medium-largish sites (from hobbyist BBS's to
> sites like Slashdot, Wikipedia, etc.) built using this approach are
> painfully slow even using seriously powerful server hardware. Yet compared
> to a really large site like Ebay or Hotmail (to say nothing of Google), the
> traffic levels on those sites is just chickenfeed.
>
> I wonder what the webheads here see as the bottlenecks. Is it the
> application code? Disk bandwidth at the database side, that could be cured
> with more ram caches or solid state disks? SQL just inherently slow?
>
> I've only worked on one serious site of this type and it was "SAJO"
> (Solaris Apache Java Oracle) rather than LAMP, but the concepts are the
> same.
> I just feel like something bogus has to be going on. I think even sites
> like Slashdot handle fewer TPS than a 1960's airline reservation system
> that ran on hardware with a fraction of the power of one of today's
> laptops.
>
> How would you go about building such a site? Is LAMP really the right
> approach?
-- http://mail.python.org/mailman/listinfo/python-list
Re: OT: why are LAMP sites slow?
aurora <[EMAIL PROTECTED]> writes:
> Slow compared to what? For a large commercial site with a bigger budget,
> better infrastructure, better implementation, it is not surprising that
> they come out ahead compared to hobbyist sites.

Hmm, as mentioned, I'm not sure what the commercial sites do that's different. I take the view that the free software world is capable of anything that the commercial world is capable of, so I'm not awed just because a site is commercial. And sites like Slashdot have pretty big budgets by hobbyist standards.

> Putting implementation aside, is LAMP inherently performing worse than
> commercial alternatives like IIS, ColdFusion, Sun ONE or DB2? That sounds
> like your presupposition.

I wouldn't say that. I don't think Apache is a bottleneck compared with other web servers. Similarly I don't see an inherent reason for Python (or whatever) to be seriously slower than Java servlets. I have heard that MySQL doesn't handle concurrent updates nearly as well as DB2 or Oracle, or for that matter PostgreSQL, so I wonder if busier LAMP sites might benefit from switching to PostgreSQL (LAMP => LAPP?).

I'm lost. So what do you compare against when you say LAMP is slow? What is the reference point? Is it just a general observation that slashdot is slower than we would like it to be?

If you are talking about slashdot, there are many ideas to make it faster. For example they could send all 600 comments to the client and let the user do the querying with DHTML on the client side. This leaves the server serving mostly static files and will certainly boost the performance tremendously.

If you mean MySQL, or SQL databases in general, are slow, there is truth in it. The best things about a SQL database are concurrent access, transactional semantics and versatile querying. It turns out a lot of applications can really live without them. If you can rearchitect the application to use flat files instead of a database, it can often be a big win. A lot of this is just implementation.
Find the right tool and the right design for the job. I still don't see a case that a LAMP-based solution is inherently slow.
-- http://mail.python.org/mailman/listinfo/python-list
Re: executing VBScript from Python and vice versa
Go to the bookstore and get a copy of Python Programming on Win32 by Mark Hammond and Andy Robinson today. http://www.oreilly.com/catalog/pythonwin32/ It has everything you need.

> Is there a way to make programs written in these two languages communicate
> with each other? I am pretty sure that VBScript can access a Python script
> because Python is COM compliant. On the other hand, Python might be able to
> call a VBScript through WSH. Can somebody provide a simple example? I have
> exactly 4 days of experience in Python (and fortunately, much more in VB6).
> Thanks.
-- http://mail.python.org/mailman/listinfo/python-list
performance of recursive generator
I love generators and I use them a lot. Lately I've been writing some recursive generators to traverse tree structures. After taking a closer look I have some concerns about performance.

Let's take the inorder traversal from http://www.python.org/peps/pep-0255.html as an example:

    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

Consider a 4-level-deep tree that has only a right child at each node:

    1
     \
      2
       \
        3
         \
          4

Using the recursive generator, the flow would go like this:

    main           gen1     gen2     gen3     gen4
    inorder(1..4)
                   yield 1
                            inorder(2..4)
                   yield 2  yield 2
                                     inorder(3..4)
                   yield 3  yield 3  yield 3
                                              inorder(4)
                   yield 4  yield 4  yield 4  yield 4

Note that there are 4 calls to inorder() and 10 yields. Indeed the complexity of traversing this kind of tree would be O(n^2)!

Compare that with a similar recursive function using a callback instead of a generator:

    def inorder(t, foo):
        if t:
            inorder(t.left, foo)
            foo(t.label)
            inorder(t.right, foo)

The flow would go like this:

    main           stack1   stack2   stack3   stack4
    inorder(1..4)
                   foo(1)
                            inorder(2..4)
                            foo(2)
                                     inorder(3..4)
                                     foo(3)
                                              inorder(4)
                                              foo(4)

There will be 4 calls to inorder() and 4 calls to foo(), giving a reasonable O(n) performance.

Is it an inherent issue in the use of recursive generators? Is there any compiler optimization possible?
-- http://mail.python.org/mailman/listinfo/python-list
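The quadratic growth can be confirmed empirically by instrumenting the yields; a sketch (Node, chain and the counter are hypothetical scaffolding, not from the PEP):

```python
class Node:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

yield_count = [0]  # total number of yield statements executed

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield_count[0] += 1
            yield x
        yield_count[0] += 1
        yield t.label
        for x in inorder(t.right):
            yield_count[0] += 1
            yield x

def chain(n):
    # Build the degenerate tree 1 -> 2 -> ... -> n of right children.
    root = None
    for label in range(n, 0, -1):
        root = Node(label, right=root)
    return root

list(inorder(chain(4)))   # [1, 2, 3, 4]
yield_count[0]            # 10 = 4 + 3 + 2 + 1: each value is re-yielded
                          # once per enclosing generator level
```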
Re: performance of recursive generator
On Thu, 11 Aug 2005 01:18:11 -0700, Matt Hammond <[EMAIL PROTECTED]> wrote:
>> Is it an inherent issue in the use of recursive generators? Is there any
>> compiler optimization possible?
>
> Hi, I could be misunderstanding it myself, but I think the short answer
> to your question is that it's an inherent limitation. ...
>
> Perhaps if there existed some kind of syntax to hint this to python it
> could optimise it away, eg:
>
>     yield *inorder(t.left)
>
> ... but AFAIK there isn't :-( so I guess you'll have to avoid recursive
> generators for this app!

That would be unfortunate. I think generators are most elegant for traversing recursive structures; it is non-trivial to use most other methods. But the O(n^2) price tag is a big caveat to keep in mind.

Of course I agree we should not optimize prematurely. I'm not about to rewrite my recursive generators just yet. But O(n^2) complexity is something important to bear in mind. It doesn't necessarily cause problems in practice. But it might.
-- http://mail.python.org/mailman/listinfo/python-list
Re: performance of recursive generator
> You seem to be assuming that a yield statement and a function call are
> equivalent. I'm not sure that's a valid assumption.

I don't know. I was hoping the compiler could optimize away the chain of yields.

> Anyway, here's some data to consider:
>
> test.py
>
>     def gen(n):
>         if n:
>             for i in gen(n/2):
>                 yield i
>             yield n
>             for i in gen(n/2):
>                 yield i
>
>     def gen_wrapper(n):
>         return list(gen(n))
>
>     def nongen(n, func):
>         if n:
>             nongen(n/2, func)
>             func(n)
>             nongen(n/2, func)
>
>     def nongen_wrapper(n):
>         result = []
>         nongen(n, result.append)
>         return result

This test somewhat waters down the n^2 issue. The problem is in the depth of recursion, and in this case it is only log(n). It is probably more interesting to test:

    def gen(n):
        if n:
            yield n
            for i in gen(n-1):
                yield i
-- http://mail.python.org/mailman/listinfo/python-list
Re: Python Binary and Windows
Thanks for making me aware of the difflib module. I didn't know such a cool module existed.

You can make it available to other Windows programs as a COM object. The win32 API should be all you need. It might be slightly challenging because some parameters are lists of strings, which might need a little work to translate into COM parameters.

> Hi. I'd like to compile (?) the DiffLib Python code into a binary form that
> can be called by other Windows apps - like, I'd like to compile it into a
> DLL. Is this possible? Many thanks!
-- http://mail.python.org/mailman/listinfo/python-list
Re: DHTML control from Python?
IE should be able to do that. Install the win32 modules. Then you should simply embed Python using
unicode encoding usablilty problem
I have long found the Python default encoding of strict ASCII frustrating. For one thing I prefer to get garbage characters rather than an exception. But the biggest issue is that a Unicode exception often pops up in unexpected places, and only when a non-ASCII or unicode character first finds its way into the system.

Below is an example. The program may run fine at the beginning. But as soon as a unicode character u'b' is introduced, the program blows up unexpectedly:

    >>> sys.getdefaultencoding()
    'ascii'
    >>> a = '\xe5'      # can print, you think you're ok ...
    >>> print a
    å
    >>> b = u'b'
    >>> a == b
    Traceback (most recent call last):
      File "", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0:
    ordinal not in range(128)

One may suggest the correct way to do it is to use decode, such as

    a.decode('latin-1') == b

This brings up another issue. Most references and books focus exclusively on entering unicode literals and using the encode/decode methods. The fallacy is that the string is such a basic data type, used throughout a program, that you really don't want to make an individual decision every time you use a string (and take a penalty for any negligence). Java has a much more usable model, with unicode used internally and encoding/decoding decisions needed in only two places: input and output.

I am sure these errors are a nuisance to those who are only half conscious of unicode. Even for those who choose to use unicode, it is almost impossible to ensure their programs work correctly.
-- http://mail.python.org/mailman/listinfo/python-list
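The explicit-decode version can be sketched as below. (In Python 3 the comparison no longer raises, since bytes and str simply compare unequal, but the decode-at-the-boundary discipline is the same; 'latin-1' here is an assumption about where the byte string came from.)

```python
a = b'\xe5'   # a byte string from some external source
b = u'b'

# Decode at the boundary, then compare character strings only.
a_text = a.decode('latin-1')   # u'\xe5', i.e. 'å'
same = (a_text == b)           # False, and no exception
```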
Re: Newbie CGI problem
Not sure about the repeated 'hi'. But you are supposed to use \r\n\r\n, not just \n\n, according to the HTTP specification.

> #!/usr/bin/python
> import cgi
> print "Content-type: text/html\n\n"
> print "hi"
>
> Gives me the following in my browser:
>
> '''
> hi
>
> Content-type: text/html
>
> hi
> '''
>
> Why are there two 'hi's?
>
> Thanks,
> Rory
-- http://mail.python.org/mailman/listinfo/python-list
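The header block the spec asks for can be sketched like this (response is a hypothetical helper; in practice most CGI gateways tolerate bare \n as well):

```python
import sys

def response(body):
    # Each header line ends with CRLF; a blank CRLF line separates
    # the headers from the body.
    return 'Content-type: text/html\r\n\r\n' + body

sys.stdout.write(response('hi'))  # write() avoids print's extra newline
```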
Re: Newbie CGI problem
On Fri, 18 Feb 2005 18:36:10 +0100, Peter Otten <[EMAIL PROTECTED]> wrote:
> Rory Campbell-Lange wrote:
>> #!/usr/bin/python
>> import cgi
>> print "Content-type: text/html\n\n"
>> print "hi"
>>
>> Gives me the following in my browser:
>>
>> '''
>> hi
>>
>> Content-type: text/html
>>
>> hi
>> '''
>>
>> Why are there two 'hi's?
>
> You have chosen a bad name for your script: cgi.py. It is now
> self-importing. Rename it to something that doesn't clash with the
> standard library, and all should be OK.
>
> Peter

You are a genius.
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode encoding usablilty problem
On Fri, 18 Feb 2005 19:24:10 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> that's how you should do things in Python too, of course. a unicode string
> uses unicode internally. decode on the way in, encode on the way out, and
> things just work.
>
> the fact that you can mess things up by mixing unicode strings with binary
> strings doesn't mean that you have to mix unicode strings with binary
> strings in your program.

I don't want to mix them. But how could I find them? How do I know this statement can be a potential problem

    if a == b:

where a and b can be instantiated individually, far away from this line of code that puts them together? In Java they are distinct data types and the compiler would catch all incorrect usage. In Python, the interpreter seems to 'help' us by promoting binary strings to unicode. Things work fine, unit tests pass, all until the first non-ASCII characters come in, and then the program breaks. Is there a scheme for Python developers to use so that they are safe from incorrect mixing?
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode and socket
You could not. Unicode is an abstract data type. It must be encoded into octets in order to be sent via a socket, and the other end must decode the octets to retrieve the unicode string. Needless to say, the encoding scheme must be consistent and understood by both ends.

On 18 Feb 2005 11:03:46 -0800, <[EMAIL PROTECTED]> wrote:
> hello all,
>
> I am new in Python. And I have got a problem about unicode. I have got a
> unicode string, and when I was going to send it out through a socket by
> send(), I got an exception. How can I send the unicode string to the
> remote end of the socket as it is, without any encode conversion, so the
> remote end of the socket will receive a unicode string? Thanks
-- http://mail.python.org/mailman/listinfo/python-list
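The encode-then-send, recv-then-decode pattern can be sketched without a live socket ('utf-8' is an assumed encoding that both ends agree on):

```python
text = u'h\xe9llo'

# Sender side: encode to octets before sock.send(...)
wire = text.encode('utf-8')        # b'h\xc3\xa9llo'

# Receiver side: decode the octets obtained from sock.recv(...)
received = wire.decode('utf-8')

received == text  # True: the unicode string survives the round trip
```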
Re: unicode encoding usablilty problem
On Fri, 18 Feb 2005 20:18:28 +0100, Walter Dörwald <[EMAIL PROTECTED]> wrote:
> aurora wrote:
>> [...] In Java they are distinct data types and the compiler would catch
>> all incorrect usage. In Python, the interpreter seems to 'help' us by
>> promoting binary strings to unicode. Things work fine, unit tests pass,
>> all until the first non-ASCII characters come in and then the program
>> breaks. Is there a scheme for Python developers to use so that they are
>> safe from incorrect mixing?
>
> Put the following:
>
>     import sys
>     sys.setdefaultencoding("undefined")
>
> in a file named sitecustomize.py somewhere in your Python path and Python
> will complain whenever there's an implicit conversion between str and
> unicode.
>
> HTH,
> Walter Dörwald

That helps! Running the unit tests caught quite a few potential problems (as well as a lot of safe ASCII string promotions).
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode encoding usablilty problem
On Fri, 18 Feb 2005 21:16:01 +0100, Martin v. Löwis <[EMAIL PROTECTED]> wrote:
> I'd like to point out the historical reason: Python predates Unicode, so
> the byte string type has many convenience operations that you would only
> expect of a character string.
>
> We have come up with a transition strategy, allowing existing libraries to
> widen their support from byte strings to character strings. This isn't a
> simple task, so many libraries still expect and return byte strings, when
> they should process character strings. Instead of breaking the libraries
> right away, we have defined a transitional mechanism, which allows adding
> Unicode support to libraries as the need arises. This transition is still
> in progress.

I understand. So I wasn't yelling "why can't Python be more like Java". On the other hand I also want to point out that making an individual decision for each string isn't practical and is very error prone. The fact that unicode and 8-bit strings look alike and work alike in common situations, but only run into problems with non-ASCII data, is very confusing for most people.

> Eventually, the primary string type should be the Unicode string. If you
> are curious how far we are still off that goal, just try running your
> program with the -U option.

Lots of errors. Among them are gzip (binary?!) and strftime??

I actually quite appreciate Python's power in processing binary data as 8-bit strings. But perhaps we should transition to using unicode as the text string and treat binary strings as the exception. Right now we have

    ''  - 8-bit string; u'' - unicode string

How about

    b'' - 8-bit string; ''  - unicode string

and no automatic conversion? Perhaps this could be activated by something like the encoding declarations, so that the transition can happen module by module.

> Regards,
> Martin
-- http://mail.python.org/mailman/listinfo/python-list
Re: unicode and socket
On 18 Feb 2005 19:10:36 -0800, <[EMAIL PROTECTED]> wrote:
> It's really funny, I cannot send a unicode stream through a socket with
> python while all the other languages such as perl, c and java can do it.
> Then, how about converting the unicode string to a binary stream? Is it
> possible to send binary through a socket with python?

I was answering your specific question: "How can I send the unicode string to the remote end of the socket as it is without any conversion of encode?" The answer is you could not. Not that you cannot send unicode, but that you have to encode it. The same applies to perl, c or Java; the only difference is the detail of how strings get encoded. There are a few posts suggesting various means. Or you can check out codecs.getwriter(), which more closely resembles Java's way.
-- http://mail.python.org/mailman/listinfo/python-list
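codecs.getwriter() wraps a byte stream so unicode strings can be written directly, much like Java's OutputStreamWriter; a sketch using an in-memory stream in place of a socket file object:

```python
import codecs
import io

raw = io.BytesIO()   # stands in for e.g. sock.makefile('wb')

# getwriter returns a StreamWriter class for the codec; instantiating it
# around the byte stream gives a file-like object that accepts unicode.
writer = codecs.getwriter('utf-8')(raw)
writer.write(u'h\xe9llo')

raw.getvalue()  # the UTF-8 octets that would go over the wire
```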
Re: unicode encoding usablilty problem
On Sat, 19 Feb 2005 18:44:27 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote: "aurora" <[EMAIL PROTECTED]> wrote: I don't want to mix them. But how could I find them? How do I know this statement can be a potential problem if a==b: where a and b can be instantiated individually, far away from this line of code that puts them together? if you don't know where a and b come from, how can you be sure that your program works at all? how can you be sure they're both strings? ("a op b" can fail in many ways, depending on what "a", "b", and "op" are) a and b are both strings. The issue is 8-bit string versus unicode string. Things work fine and unit tests pass, all until the first non-ASCII characters come in, and then the program breaks. if you have unit tests, why don't they include Unicode tests? How do I structure the test cases to guarantee coverage? It is not practical to test every combination of unicode/8-bit strings. Adding non-ASCII characters to the test data probably makes problems pop up earlier. But it is arduous, and it is hard to spot whether you have left any out. -- http://mail.python.org/mailman/listinfo/python-list
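One practical way to structure the test cases is to make every string fixture include non-ASCII samples, so encoding bugs surface in ordinary unit tests instead of in production. A minimal sketch; normalize() is an illustrative stand-in for whatever string-handling code is under test:

```python
import unittest

# Shared fixtures: ASCII, Latin-1, and CJK samples, so every test
# exercises the non-ASCII path as a matter of course.
SAMPLES = ['ascii only', 'caf\u00e9', '\u65e5\u672c\u8a9e']

def normalize(s):
    # toy function under test: strips surrounding whitespace
    return s.strip()

class TestNormalize(unittest.TestCase):
    def test_all_samples(self):
        for s in SAMPLES:
            self.assertEqual(normalize('  %s  ' % s), s)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalize)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```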
Re: unicode encoding usablilty problem
On Sun, 20 Feb 2005 15:01:09 +0100, Martin v. Löwis <[EMAIL PROTECTED]> wrote: Nick Coghlan wrote: Having "", u"", and r"" be immutable, while b"" was mutable would seem rather inconsistent. Yes. However, this inconsistency might be desirable. It would, of course, mean that the literal cannot be a singleton. Instead, it has to be a display (?), similar to list or dict displays: each execution of the byte string literal creates a new object. An alternative would be to have "bytestr" be the immutable type corresponding to the current str (with b"" literals producing bytestr's), while reserving the "bytes" name for a mutable byte sequence. Indeed. This maze of options has caused the process to get stuck. People also argue that with such an approach, we could as well tell users to use array.array for the mutable type. But then, people complain that it doesn't have all the library support that strings have. The main point being, the replacement for 'str' needs to be immutable or the upgrade process is going to be a serious PITA. Somebody really needs to take this in his hands, completing the PEP, writing a patch, checking applications to find out what breaks. Regards, Martin What is the process of getting a PEP worked out? Does the work and discussion take place on the python-dev mailing list? I would be glad to help out, especially on this particular issue. -- http://mail.python.org/mailman/listinfo/python-list
Re: running a shell command from a python program
In Python 2.4, use the new subprocess module for this. It subsumes the popen* functions. Hi, I'm a newbie, so please be gentle :-) How would I run a shell command in Python? Here is what I want to do: I want to run a shell command that outputs some stuff, save it into a list and do stuff with the contents of that list. I started with a BASH script actually, until I realized I really needed better data structures :-) Is popen the answer? Also, where online would I get access to good sample code that I could peruse? I'm running 2.2.3 on Linux, and going strictly by online doc so far. Thanks! S C -- http://mail.python.org/mailman/listinfo/python-list
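A short sketch of the "run a command, collect its output lines into a list" pattern. check_output appeared in Python 2.7; on 2.4 the equivalent is subprocess.Popen([...], stdout=subprocess.PIPE).communicate()[0]. sys.executable is used here just to keep the example portable; any command works:

```python
import subprocess
import sys

# Run a command and split its captured output into a list of lines.
out = subprocess.check_output([sys.executable, '-c', "print('hello world')"])
lines = out.decode().splitlines()
assert lines == ['hello world']
```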
Re: Python and "Ajax technology collaboration"
It was discussed in the last Bay Area Python Interest Group meeting. Thursday, February 10, 2005 Agenda: Developing Responsive GUI Applications Using HTML and HTTP Speakers: Donovan Preston http://www.baypiggies.net/ The author has a component, LivePage, for this. You may find it at http://nevow.com/. It is a similar idea to the JavaScript stuff but very Python centric. Interesting GUI developments, it seems. Has anyone developed an "Ajax" application using Python? Very curious thx (Ajax stands for: XHTML and CSS; dynamic display and interaction using the Document Object Model; data interchange and manipulation using XML and XSLT; asynchronous data retrieval using XMLHttpRequest; and JavaScript binding everything together. E.g. Google has used these technologies to build Gmail, Google Maps etc. More info: http://www.adaptivepath.com/publications/essays/archives/000385.php) -- http://mail.python.org/mailman/listinfo/python-list
Re: Unit testing - one test class/method, or test class/class
I do something more or less like your option b. I don't think there is any orthodox structure to follow. You should use a style that fits your taste. What I really want to bring up is that you might want to look at refactoring your module in the first place. 348 test cases for one module sounds like a large number. That suggests you have a fairly complex module to be tested to start with. Often the biggest benefit of doing automated unit testing is that it forces the developers to modularize and decouple their code in order to make it testable. This action alone improves code quality a lot. If breaking up the module makes sense in your case, the test structure will follow. Hi, I just found py.test[1] and converted a large unit test module to py.test format (which is actually almost-no-format-at-all, but I won't get there now). Having 348 test cases in the module and huge test classes, I started to think about splitting classes. Basically you have at least three obvious choices, if you are going for consistency in your test modules: Choice a: Create a single test class for the whole module to be tested, whether it contains multiple classes or not. ...I don't think this method deserves closer inspection. It's probably a rather poor method to begin with. With py.test, where no subclassing is required (unlike Python unittest, where you have to subclass unittest.TestCase), you'd probably be better off with just writing a test method for each class and each class method in the module. Choice b: Create a test class for each class in the module, plus one class for any non-class methods defined in the module.
+ Feels clean, because each test class is mapped to one class in the module
+ It is rather easy to find all tests for a given class
+ Relatively easy to create a class skeleton automatically from the test module and the other way round
- Test classes get huge easily
- Missing test methods are not very easy to find[2]
- A test method may depend on other tests in the same class

Choice c: Create a test class for each non-class method and class method in the tested module.

+ Test classes are small, easy to find all tests for a given method
+ Helps in test isolation - having a separate test class for a single method makes the tested class less dependent on any other methods/classes
+ Relatively easy to create a test module from an existing class (but then you are not doing TDD!) but not vice versa
- A large number of classes results in more overhead; more typing, probably requires subclassing because of common test class setup methods etc.

What do you think, any important points I'm missing?

Footnotes:
[1] In reality, this is a secret plot to advertise py.test, see http://codespeak.net/py/current/doc/test.html
[2] However, this problem disappears if you start with writing your tests first: with TDD, you don't have untested methods, because you start by writing the tests first, and end up with a module that passes the tests

-- # Edvard Majakari Software Engineer # PGP PUBLIC KEY available Soli Deo Gloria! One day, when he was naughty, Mr Bunnsy looked over the hedge into Farmer Fred's field and it was full of fresh green lettuces. Mr Bunnsy, however, was not full of lettuces. This did not seem fair. --Mr Bunnsy has an adventure -- http://mail.python.org/mailman/listinfo/python-list
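A minimal sketch of "choice b" in py.test style: one test class per class in the module under test, plus one class for the module-level functions. Stack and top_level_helper are illustrative stand-ins, not anything from the original post:

```python
class Stack:
    """Class under test."""
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()

def top_level_helper(x):
    """Module-level function under test."""
    return x * 2

class TestStack:
    """Maps one-to-one to the Stack class (py.test needs no subclassing)."""
    def test_push_pop(self):
        s = Stack()
        s.push(1)
        assert s.pop() == 1

class TestModuleFunctions:
    """One catch-all class for the non-class functions."""
    def test_top_level_helper(self):
        assert top_level_helper(3) == 6

# py.test would collect these automatically; they also run by hand:
TestStack().test_push_pop()
TestModuleFunctions().test_top_level_helper()
```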
decode unicode string using 'unicode_escape' codecs
I have a unicode string with some characters encoded using Python notation, like '\n' for LF. I need to convert that to the actual LF character. There is a 'unicode_escape' codec that seems to suit my purpose.

>>> encoded = u'A\\nA'
>>> decoded = encoded.decode('unicode_escape')
>>> print len(decoded)
3

Note that both encoded and decoded are unicode strings. I'm trying to use the builtin codec because I assume it has better performance than writing pure Python decoding. But I'm not converting between byte strings and unicode strings. However it runs into problems in some cases:

encoded = u'€\\n€'
decoded = encoded.decode('unicode_escape')

Traceback (most recent call last):
  File "g:\bin\py_repos\mindretrieve\trunk\minds\x.py", line 9, in ?
    decoded = encoded.decode('unicode_escape')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

Reading the document more carefully, I found out what happened. decode('unicode_escape') takes a byte string as its operand and converts it into a unicode string. Since encoded is already unicode, it is first implicitly converted to a byte string using the 'ascii' encoding. In this case the conversion fails because of the '€' character. So I resigned myself to the fact that 'unicode_escape' doesn't do what I want.

But thinking about it more deeply, I came up with this Python source code. It runs OK and outputs 3.

# -*- coding: utf-8 -*-
print len(u'€\n€')   # 3

Think about what happens in the second line. First the parser decodes the bytes into a unicode string with the UTF-8 encoding. Then it applies the escape-sequence syntax rules to decode the characters '\n' into LF. The second step is what I want. There must be something available to the Python interpreter that is not available to the user. So is there something I have overlooked? Anyway, I just want to leverage the builtin codecs for performance. I figure this would be faster than encoded.replace('\\n', '\n') ...and so on...
If there are other suggestions they would be greatly appreciated :) wy -- http://mail.python.org/mailman/listinfo/python-list
Re: decode unicode string using 'unicode_escape' codecs
Cool, it works! I have also done some due diligence to check that the utf-8 encoding would not accidentally introduce any Python escapes. I have written a recipe in the Python cookbook: Efficient character escapes decoding http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/466293 wy

> Does this do what you want?
>
> >>> u'€\\n€'
> u'\x80\\n\x80'
> >>> len(u'€\\n€')
> 4
> >>> u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8')
> u'\x80\n\x80'
> >>> len(u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8'))
> 3
>
> Basically, I convert the unicode string to bytes, escape the bytes using
> the 'string_escape' codec, and then convert the bytes back into a
> unicode string.
>
> HTH,
>
> STeVe
-- http://mail.python.org/mailman/listinfo/python-list
ANN: HTMLTestRunner - generates HTML test report for unittest
Greetings, HTMLTestRunner is an extension to the Python standard library's unittest module. It generates easy-to-use HTML test reports. See a sample report at http://tungwaiyip.info/software/sample_test_report.html. Find more information and download it from http://tungwaiyip.info/software/#htmltestrunner Wai Yip Tung -- http://mail.python.org/mailman/listinfo/python-list
Re: HTMLTestRunner - generates HTML test report for unittest
On Fri, 27 Jan 2006 06:35:46 -0800, Paul McGuire <[EMAIL PROTECTED]> wrote:

> Nice! I just adapted my pyparsing unit tests to use this tool - took me
> about 3 minutes, and now it's much easier to run and review my unit test
> results. I especially like the pass/fail color coding, and the "drill-down"
> to the test output.
>
> -- Paul

Thank you! I'm glad that it is helpful to you :) -- http://mail.python.org/mailman/listinfo/python-list
Problem redirecting stdin on Windows
On Windows (XP) with the win32 extension installed, a Python script can be launched from the command line directly, since the .py extension is associated with python. However it fails if stdin is piped or redirected. Assume there is an echo.py that reads from stdin and echoes the input. Launched from the command line directly, this echoes input from the keyboard: echo.py But it causes an error if stdin is redirected: echo.py < input.txt -- http://mail.python.org/mailman/listinfo/python-list
win32clipboard.GetClipboardData() return string with null characters
I was using win32clipboard.GetClipboardData() to retrieve the Windows clipboard, using code similar to the message below: http://groups-beta.google.com/group/comp.lang.python/msg/3722ba3afb209314?hl=en Somehow I noticed the data returned includes \0 and some characters that shouldn't be there after the null character. It is easy enough to truncate them. But why do they get there in the first place? Is the data length somehow calculated wrong? I'm using Windows XP SP2 with Python 2.4 and pywin32-203. aurora -- http://mail.python.org/mailman/listinfo/python-list
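CF_TEXT clipboard data is NUL-terminated, and anything after the first '\0' is leftover buffer contents, which likely explains the stray characters. A defensive truncation sketch (clip_text is a hypothetical helper, raw stands for the string returned by win32clipboard.GetClipboardData()):

```python
def clip_text(raw):
    # Keep everything up to (but not including) the first NUL; data
    # beyond it is garbage from the underlying clipboard buffer.
    i = raw.find('\0')
    return raw if i < 0 else raw[:i]

assert clip_text('hello\0junk') == 'hello'
assert clip_text('clean') == 'clean'
```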
Design mini-language for data input
This is an entry I just added to ASPN. It is a somewhat novel technique I have employed quite successfully in my code. I repost it here for more exposure and discussion. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/475158 wy

Title: Design mini-language for data input

Description: Many programs need a set of initial data. For ease of use and flexibility, design a mini-language for your input data. Use Python's superb text handling capability to parse and build the data structure from the input text.

Source:

import string

# this is an example to demonstrate the programming technique
DATA = """
# data source: http://www.mongabay.com/igapo/world_statistics_by_pop.htm
# Country / Capital / Area [sq. km] / 2002 Population Estimate
China         / Beijing       /  9,596,960 / 1,284,303,705
India         / New Delhi     /  3,287,590 / 1,045,845,226
United States / Washington DC /  9,629,091 /   280,562,489
Indonesia     / Jakarta       /  1,919,440 /   231,328,092
Russia        / Moscow        / 17,075,200 /   144,978,573
"""

def initData():
    """ parse and return a country list of (name, capital, area, population) """
    countries = []
    for line in DATA.splitlines():
        # filter out blank lines/comment lines
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # 4 fields separated by '/'
        parts = map(string.strip, line.split('/'))
        country, capital, area, population = parts
        # remove commas in numbers
        area = int(area.replace(',',''))
        population = int(population.replace(',',''))
        countries.append((country, capital, area, population))
    return countries

def findLargestCountry(countries):
    # your algorithm here
    pass

def main():
    countries = initData()
    print findLargestCountry(countries)

Discussion:

Problem

Many programs need a set of initial data. The simplest way is to construct a Python data structure directly, as shown below. This is often not ideal. Algorithms and data structures tend to change, and Python program statements are likely to differ literally from the data source, which might be text pulled from web pages or elsewhere.
This means a great deal of effort is often needed to format and maintain the input as Python statements. This is a sample program that initializes some geographical data.

# map of country -> (capital, area, population)
COUNTRIES = {}
COUNTRIES['China'] = ('Beijing', 9596960, 1284303705)
COUNTRIES['India'] = ('New Delhi', 3287590, 1045845226)
COUNTRIES['United States'] = ('Washington DC', 9629091, 280562489)
COUNTRIES['Indonesia'] = ('Jakarta', 1919440, 231328092)
COUNTRIES['Russia'] = ('Moscow', 17075200, 144978573)

Mini-language

A more flexible approach is to define a mini-language to describe the data. This can be as simple as formatting the data into a multiple-line string.

1. Define the data format in text. It should mirror the data source and be designed for ease of human editing.
2. Define the data structure.
3. Write glue code to parse the input data and initialize the data structure.

In the example above we use one line for each record. Each record has four fields (country, capital, area and population) separated by slashes. One of the immediate benefits is that we no longer need to type so many quotes for every string literal. This concise data format is much easier to read and edit than Python statements. The parser simply breaks down the input text using splitlines() and then loops through it line by line. It is useful to tolerate some extra white space so that the format is more forgiving for human editors. In this case the numbers (area, population) from the data source contain commas. Rather than manually editing them out, they are copied into the text as is. Then they are parsed into integers using area = int(area.replace(',','')) Slash is chosen as the separator (rather than the more common comma) because it does not otherwise appear in the data.
A record is parsed into fields using line.split('/') Don't forget to remove extra white space using string.strip(). Finally it builds a list of country records, each a tuple of (country, capital, area, population). It is just as easy to turn them into objects or any other data structure as desired. The mini-language technique can be refined to represent more complex, more structured input. It makes transformation and maintenance of input data much easier. -- http://mail.python.org/mailman/listinfo/python-list
Re: datetime iso8601 string input
I agree. I just keep rewriting the parse method again and again. wy

def parse_iso8601_date(s):
    """ Parse a date in iso8601 format, e.g. 2003-09-15T10:34:54,
        and return a datetime object.
    """
    y = m = d = hh = mm = ss = 0
    if len(s) not in [10, 19, 20]:
        raise ValueError('Invalid timestamp length - "%s"' % s)
    if s[4] != '-' or s[7] != '-':
        raise ValueError('Invalid separators - "%s"' % s)
    if len(s) > 10 and (s[13] != ':' or s[16] != ':'):
        raise ValueError('Invalid separators - "%s"' % s)
    try:
        y = int(s[0:4])
        m = int(s[5:7])
        d = int(s[8:10])
        if len(s) >= 19:
            hh = int(s[11:13])
            mm = int(s[14:16])
            ss = int(s[17:19])
    except Exception, e:
        raise ValueError('Invalid timestamp - "%s": %s' % (s, str(e)))
    return datetime(y, m, d, hh, mm, ss)

> I was a little surprised to recently discover
> that datetime has no method to input a string
> value. PEP 321 does not appear to convey much
> information, but a timbot post from a couple
> years ago clarifies things:
>
> http://tinyurl.com/epjqc
>
>> You can stop looking: datetime doesn't
>> support any kind of conversion from string.
>> The number of bottomless pits in any datetime
>> module is unbounded, and Guido declared this
>> particular pit out-of-bounds at the start so
>> that there was a fighting chance to get
>> *anything* done for 2.3.
>
> I can understand why datetime can't handle
> arbitrary string inputs, but why not just
> simple iso8601 format -- i.e. the default
> output format for datetime?
>
> Given a datetime-generated string:
>
> >>> now = str(datetime.datetime.now())
> >>> print now
> '2006-02-23 11:03:36.762172'
>
> Why can't we have a function to accept it
> as string input and return a datetime object?
>
> datetime.parse_iso8601(now)
>
> Jeff Bauer
> Rubicon, Inc.
-- http://mail.python.org/mailman/listinfo/python-list
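For comparison, in later Python versions this whole hand-rolled parser collapses to a library call:

```python
from datetime import datetime

s = '2003-09-15T10:34:54'

# datetime.strptime was added in Python 2.5
dt = datetime.strptime(s, '%Y-%m-%dT%H:%M:%S')
assert dt == datetime(2003, 9, 15, 10, 34, 54)

# datetime.fromisoformat was added in Python 3.7
assert datetime.fromisoformat(s) == dt
```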
Re: Design mini-lanugage for data input
Yes. But they have different motivations. The mini-language concept is to design an input format that is convenient for human editors and that is close to the semi-structured data source. I think the benefit from ease of editing and flexibility would justify writing a little parsing code. JSON is mainly designed for data exchange between programs. You can hand edit JSON data (as well as XML or Python statements) but it is not the most convenient. Just consider: not having to enter two quotes for every string object is almost liberating. These quotes are only artifacts of a structured data format. The idea is to design a format convenient for humans and let code parse and build the data structure. wy > Hmm, > Do you know about JSON and YAML? > http://en.wikipedia.org/wiki/JSON > http://en.wikipedia.org/wiki/YAML > > They have the advantage of being maintained by a group of people and > being available for a number of languages. (as well as NOT being XML > :-) > > - Cheers, Paddy. > -- > http://paddy3118.blogspot.com/ > -- http://mail.python.org/mailman/listinfo/python-list
Re: Design mini-lanugage for data input
P.S. Also, it is a 'mini-language' because it is an ad-hoc design that is good enough and can be easily implemented for a given problem. This is as opposed to a general purpose solution like XML, which is one translation away from the original data format and carries too much baggage. > Just consider you don't have to enter two quotes for every string object > is almost liberating. These quotes are only artifacts for structured > data format. The idea to design a format convenient for human and let > code to parse and built the data structure. > > wy -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode question : turn "José" into u"José"
First of all, if you run this on the console, find out your console's encoding. In my case it is English Windows XP, which uses 'cp437'.

C:\>chcp
Active code page: 437

Then:

>>> s = "José"
>>> u = u"Jos\u00e9"        # same thing in unicode escape
>>> s.decode('cp437') == u  # use the encoding that matches your console
True
>>>

wy

> This is probably stupid and/or misguided but supposing I'm passed a
> byte-string value that I want to be unicode, this is what I do. I'm sure
> I'm missing something very important.
>
> Short version :
> s = "José"  # Start with non-unicode string
> unicoded = eval("u'%s'" % "José")
>
> Long version :
> s = "José"  # Start with non-unicode string
> s           # Let's look at it
> 'Jos\xe9'
> escaped = s.encode('string_escape')
> escaped
> 'Jos\\xe9'
> unicoded = eval("u'%s'" % escaped)
> unicoded
> u'Jos\xe9'
> test = u"José"    # What they should have passed me
> test == unicoded  # Am I really getting the same thing?
> True              # Yay!
>
-- http://mail.python.org/mailman/listinfo/python-list