On Sun, Nov 8, 2009 at 9:38 PM, Alan Harris-Reid <a...@baselinedata.co.uk> wrote:
> In the Python.org 3.1 documentation (section 20.4.6), there is a simple
> “Hello World” WSGI application which includes the following method...
>
> def hello_world_app(environ, start_response):
>     status = b'200 OK' # HTTP Status
>     headers = [(b'Content-type', b'text/plain; charset=utf-8')] # HTTP Headers
>     start_response(status, headers)
>
>     # The returned object is going to be printed
>     return [b"Hello World"]
>
> Question - Can anyone tell me why the 'b' prefix is present before each
> string? The method seems to work equally well with and without the prefix.
> From what I can gather from the documentation the b prefix represents a
> bytes literal, but can anyone explain (in simple english) what this means?
>
> Many thanks,
> Alan
The rather long version: read http://www.joelonsoftware.com/articles/Unicode.html

A somewhat shorter summary, along with how Python deals with this:

Once upon a time, someone decided to allocate 1 byte for each character. Since everything the Americans who made the computers needed fit into 7 bits, this was alright. And they called this the American Standard Code for Information Interchange (ASCII). Because a byte holds 8 bits, device manufacturers realized that they had 128 spare values that didn't mean anything, so they all made up their own characters for the upper 128. And when they started selling computers internationally, they used the upper 128 to store the characters they needed for the local language.

This had several problems.
1) Files made on one computer in one country wouldn't display right on a computer made by a different manufacturer or for a different country.
2) The 256 characters were enough for most Western languages, but Chinese and Japanese need a whole lot more.

To solve this problem, Unicode was created. Rather than thinking of each character as a distinct set of bits, it just assigns a number to each one (a code point). The bottom 128 characters are the original ASCII set, and everything else you could think of was added on top of that: other alphabets, mathematical symbols, music notes, cuneiform, dominoes, mahjong tiles, and more.

Unicode is harder to implement than a simple byte array, but it means strings are universal: every program will interpret them exactly the same. Unicode strings are the default ('') in Python 3.x, and in 2.x you create them by putting a u in front of the string literal (u'').

Unicode, however, is a concept, and a concept can't be sent through the network or stored on the hard drive as it is; only bytes can. So instead we deal with strings internally as Unicode and then give them an encoding when we send them back out.
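To make the difference between code points and encodings a bit more concrete, here is a minimal Python 3 sketch. The sample word and the variable names are my own choice for illustration, not something from the documentation.

```python
# A Python 3 str is a sequence of Unicode code points (plain numbers).
text = "caf\u00e9"  # "café"; the escape avoids depending on the file's own encoding

code_points = [ord(ch) for ch in text]     # [99, 97, 102, 233]

# Encoding turns those abstract numbers into actual bytes.
utf8_bytes = text.encode("utf-8")          # b'caf\xc3\xa9' (é takes 2 bytes)
latin1_bytes = text.encode("iso-8859-1")   # b'caf\xe9'     (é takes 1 byte)

# Decoding with the matching encoding gives back the same string.
round_trip = utf8_bytes.decode("utf-8")

print(code_points, utf8_bytes, latin1_bytes, round_trip == text)
```

The same four characters come out as five bytes in UTF-8 and four bytes in ISO-8859-1, which is why the receiver has to know which encoding was used.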
Some encodings, such as UTF-8, can use multiple bytes per character and, as such, can deal with the full range of Unicode characters. Other times, programs still expect the old 8-bit encodings like ISO-8859-1 or the Windows ANSI code pages. In Python, to declare that a string is a literal sequence of bytes that the program should not try to interpret, you use b'' in Python 3.x, or just declare it normally in Python 2.x ('').

------------------------------------------------------
What happens in your program:

When you print a Unicode string, Python has to decide what encoding to use. If you're printing to a terminal, Python looks for the terminal's encoding and uses that. In the event that it doesn't know what encoding to use, Python defaults to ASCII because that's compatible with almost everything.

Since the string you're sending to the web page only contains ASCII characters, the automatic conversion works fine if you don't specify the b''. Since the resulting page uses UTF-8 (which you declare in the header), and UTF-8 is compatible with ASCII, the output looks fine. If you try sending a string that has non-ASCII characters, the program might raise a UnicodeEncodeError because it doesn't know what bytes to use for those characters. It may be able to guess, but since I haven't used WSGI directly before, I can't say for sure.
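As a rough illustration of the ASCII point above, here is a small Python 3 sketch using plain str.encode(). It does not go through WSGI at all, so it only shows the encoding behaviour, not what a particular server does; the strings and variable names are assumptions made up for the example.

```python
ascii_only = "Hello World"
non_ascii = "Hello W\u00f6rld"  # contains ö, which ASCII cannot represent

# For pure-ASCII text, the ASCII and UTF-8 encodings produce identical bytes.
same_bytes = ascii_only.encode("ascii") == ascii_only.encode("utf-8")

# Encoding non-ASCII text as ASCII fails instead of guessing.
try:
    non_ascii.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing the encoding explicitly gives well-defined bytes to send.
body = non_ascii.encode("utf-8")  # b'Hello W\xc3\xb6rld'

print(same_bytes, ascii_failed, body)
```

Encoding explicitly to the charset you announce in the Content-type header avoids relying on whatever default conversion happens to be in effect.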