2010/8/3 Νίκος <nikos.the.gr...@gmail.com>:
>> On 3 Aug, 21:00, Dave Angel <da...@ieee.org> wrote:
>
>> A string is an object containing characters. A string literal is one of
>> the ways you create such an object. When you create it that way, you
>> need to make sure the compiler knows the correct encoding, by using the
>> encoding: line at beginning of file.
>
>
> mymessage = "καλημέρα"  <==== string
> mymessage = u"καλημέρα" <==== string literal?

Not quite. A literal is the actual text in the file, the letters between
the quotes:

"καλημέρα"  <=== String literal (a literal value of the string/str type)
u"καλημέρα" <=== Unicode literal (a literal value of the unicode type; the
                 bytes in the file are converted to unicode using the
                 file's encoding)
mymessage   <=== String (not a literal, because it's a name bound to a
                 value, not text between quotes)

>
> So, a string literal is one of the encodings i use to create a string
> object?
>
> Can the encoding of a python script file be in iso-8859-7 which means
> the file contents is saved to the hdd as greek-iso but the part of
> this variable value mymessage = u"καλημέρα" is saved as utf-8 or the
> opposite?
>

The compiler does not see u"καλημέρα" on the page. All it sees is the bytes

['0x75', '0x22', '0xea', '0xe1', '0xeb', '0xe7', '0xec', '0xdd', '0xf1', '0xe1', '0x22']

Now the compiler knows that the sequence 0x75 0x22 (stuff) 0x22 means to
create a unicode literal. So it takes the bytes in between ('0xea', '0xe1',
'0xeb', '0xe7', '0xec', '0xdd', '0xf1', '0xe1') and decodes them using the
file's encoding, in your case ISO-8859-7. Once decoded, they no longer have
an encoding. They aren't bytes as far as you are concerned, they are code
points. Internally, they're stored as either UTF-16 or UTF-32 depending on
how Python was compiled, but that doesn't matter. You can treat them as if
they are characters.

> have the file saved as utf-8 but one variable value as greek
> encoding?
>

Sure you can. A unicode literal is always decoded using the encoding of the
file. But a string is just a sequence of bytes (forget about the characters
that show up on the page for now). If you do

"\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1".decode('UTF-8')

then Python will take that sequence of bytes and interpret it as UTF-8,
whatever the encoding of the file is. That will give you the same unicode
string you started out with: u"καλημέρα"

> Encodings still give me headaches. I try to understand them as
> different ways to store data in a media.
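
That is roughly the right way to think about them. If it helps, here is a
small sketch that mimics by hand what I described above: the compiler sees
only bytes, and a unicode literal is those bytes decoded with the file's
encoding. The names source_bytes, file_encoding and payload are made up for
the illustration; this is not how the compiler is actually implemented. It
is written so that it should also run on Python 3.

```python
# -*- coding: utf-8 -*-
# Illustration only: imitate by hand what happens to a unicode literal.
# source_bytes, file_encoding and payload are hypothetical names.

# The bytes of  u"καλημέρα"  as an editor would save them in ISO-8859-7.
source_bytes = b'u"\xea\xe1\xeb\xe7\xec\xdd\xf1\xe1"'
file_encoding = "iso-8859-7"

# What the compiler "sees": plain byte values, no characters yet.
seen = [hex(b) for b in bytearray(source_bytes)]

# Drop the leading u" and the trailing ", then decode what is left
# using the encoding declared for the file.
payload = source_bytes[2:-1]
text = payload.decode(file_encoding)

# A byte string that holds UTF-8 data gives the same unicode object
# once it is decoded as UTF-8.
utf8_bytes = b"\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1"
same_text = utf8_bytes.decode("utf-8")

print(seen)
print(text == same_text)
```

Note that the bytes are always decoded into unicode; going the other way,
from unicode to bytes, is what encode is for.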
>
> Tell me something. What encoding should i pick for my scripts knowing
> that only contain english + greek chars??
> iso-8859-7 or utf-8 and why?
>
> Can i save the string lets say "Νίκος" in different encodings and still
> print out correctly in browser?
>
> ascii = the standard english character set only, right?
>

Yes.

>> The web server wraps a few characters before and after your html stream,
>> but it shouldn't touch the stream itself.
>
> So the python compiler using the cgi module is the one that is
> producing the html output that immediately after send to the web
> server, right?
>
>
>> > For example if i say mymessage = "καλημέρα" and the i say mymessage =
>> > u"καλημέρα" then the 1st one is a greek encoding variable while the
>> > 2nd its a utf-8 one?

No. They both are in whatever encoding your file is using. But the first
one will be interpreted as a sequence of bytes; the second one will be
interpreted as a sequence of characters. For a single-byte encoding like
ISO-8859-7, it doesn't make a difference to the length. But if you were to
save the file as UTF-8, the first one would have a length of 16 (because
each of the Greek characters takes 2 bytes) and the second one would have a
length of 8.

>>
>> No, the first is an 8 bit copy of whatever bytes your editor happened to
>> save.
>
> But since mymessage = "καλημέρα" is a string containing greek
> characters why the editor doesn't save it as such?
>

Because you don't save characters, you save bytes.

\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1 is your string in UTF-8
\xea\xe1\xeb\xe7\xec\xdd\xf1\xe1 is that exact same string in ISO-8859-7

They are two different ways of representing the same characters.

> It reminds me of variables and values where if you say
>
> a = 5 , a var becomes instantly an integer variable
> while
> a = 'hello' , become instantly a string variable
>
>
>> mymessage = u"καλημέρα"
>>
>> creates an object that is *not* encoded.
>
> Because it isn't saved by the editor yet?
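
The editor has nothing to do with it. The unicode object only exists in
memory while the program runs, as a sequence of code points, and it becomes
bytes only when something calls encode on it. A minimal sketch of the
lengths I mentioned above, with the string written as escapes so the result
does not depend on how this message or your file is saved (the variable
names are made up for the example, and it should run on Python 3 as well):

```python
# -*- coding: utf-8 -*-
# Illustration only; text, as_utf8 and as_greek are hypothetical names.

# The eight characters of καλημέρα, spelled as code point escapes.
text = u"\u03ba\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1"

# The "state" of the object before any encoding: a list of code points.
code_points = [hex(ord(ch)) for ch in text]

# Encoding turns the code points into bytes; the result depends on
# which encoding you ask for.
as_utf8 = text.encode("utf-8")        # 2 bytes per Greek character
as_greek = text.encode("iso-8859-7")  # 1 byte per Greek character

print(code_points)
print(len(text), len(as_utf8), len(as_greek))
```

Both byte strings decode back to the same eight characters, which is the
sense in which they are two representations of the same text.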
> In what state is this object
> in before it gets encoded?
> And it gets encoded the minute i tell the editor to save the file?
>
>> Encoding is taking the unicode
>> stream and representing it as a stream of bytes, which may or may not have
>> more bytes than the original has characters.
>
>
> So this line mymessage = u"καλημέρα" what it does is tell the browser
> that when its time to save the whole file to save this string as
> utf-8?
>
> If yes, then if were to save the above string as greek encoding how
> was i suppose to write it?
>
> Also if u use the 'coding line' in the beginning of the file is there
> a need for using the u literal?
>
>> I personally haven't done any cookie code. If I were debugging this, I'd
>> factor out the multiple parts of that if statement, and find out which
>> one isn't true. From here I can't guess.
>
> I did what you say and found out that both of the if condition parts
> were always false thats why the if code block never got executed.
>
> And it is always wrong because the cookie never gets set.
>
> So can you please tell me why this line
>
> cookie['visitor'] = ( 'nikos', time() + 60*60*24*365 ) #this cookie will expire in an year
>
> never created a cookie?
>
--
http://mail.python.org/mailman/listinfo/python-list