Unicode in cgi-script with apache2
Hi,

I've got a little script:

    #!/usr/bin/env python3
    print("Content-Type: text/html")
    print("Cache-Control: no-cache, must-revalidate")  # HTTP/1.1
    print("Expires: Sat, 26 Jul 1997 05:00:00 GMT")  # Date in the past
    print("")
    f = open("/var/www/cgi-data/index.html", "r")
    for line in f:
        print(line, end='')

If I run the script in the terminal, it nicely prints the web page 'index.html'. If I access the script through a web browser, Apache gives an error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1791: ordinal not in range(128)

I've spent a whole afternoon reading forums and blogs, and I still don't have a solution. Can anyone help me?

Greetings,
Dominique.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Unicode in cgi-script with apache2
I found my problem; I will describe it in more detail at the bottom of this message... But first...

Thanks, Alister, for the tips:

1) This evening I researched WSGI. I found that WSGI is more advanced than CGI, and I also think WSGI is more the Python way. I'm an amateur playing around with my imagination on a small virtual server (online at cloudserver.ramaekers-stassart.be). I'm trying to build something rather specific, and I like to keep things as basic as possible. My first thought was not to use a framework, because with a framework I wouldn't really know what the code is doing; for me, a framework would be a black box. But after inspecting WSGI, I decided not to make things more difficult than they have to be. I will work with a framework, and I think I'll take my chances on Falcon (for its speed, its small size, and because it doesn't seem too difficult)... There are a lot of frameworks, so if someone wants to point me to another one, I'm open to suggestions...

2) Your tip to use 'encode' did not solve the problem and created a new one. My lines were encapsulated in quotes and I got a lot of \b's and \n's... and I still got the same error.

3) I didn't get the message from JMF, so...

What seems to be the problem:
My script was OK. I know this because in the terminal I got the expected output. Python 3 uses UTF-8 encoding as a standard. The problem is that when Python 'prints' to the Apache interface, it translates the string to ASCII. (Why, I never found an answer.) Somewhere in the middle of my index.html file there are letters like ë and ü. If Python tries to translate these, it throws an error. If I delete these letters from the file, the script works perfectly in a browser!

In Python 2.7 the script can easily be tweaked so the translation to ASCII isn't done, but in Python 3 it's a real pain in the a... I've read about people who managed to force Python 3 to 'print' to Apache in UTF-8, but none of their solutions worked for me.
I think the Python developers don't want to focus on Python + Apache + CGI (I think it only happens with Apache and not with other HTTP servers). I don't think they do this intentionally, but I guess they assume that if you use Python to make a web application, you also use mod_wsgi or mod_python (in Apache)... So I'll use WSGI. It's a little more work, but it seems really neat...

Greetings

On 15-08-14 at 21:27, alister wrote:
> On Fri, 15 Aug 2014 20:10:25 +0200, Dominique Ramaekers wrote:
>> [original script and error message snipped]
>
> 1) This is not the way to get Python to generate a web page. If you don't want to use an existing framework (for example, if you are doing this as an educational exercise), I suggest you google WSGI.
>
> 2) You need to encode your output strings into a format Apache/HTML protocols can support; UTF-8 is probably best here. Change your print function to
>
>     print(line.encode('utf'), end='')
>
> 3) Ignore any subsequent advice from JMF; even when he is trying to help he is invariably wrong.
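alister's encode tip explains the quoted output Dominique saw: print() turns a bytes object into its repr instead of writing the raw bytes. A small sketch, with io.StringIO and io.BytesIO standing in (an assumption, so it runs outside Apache) for sys.stdout and sys.stdout.buffer. The tip also could not remove the exception itself, because the traceback posted later in the thread shows it is raised while reading the file (`for line in f`), before encode() is ever reached.

```python
import codecs
import io

line = "Lets try some characters: é ë ü\n"

# 'utf' is an alias for the UTF-8 codec, so this equals line.encode("utf-8").
assert codecs.lookup("utf").name == "utf-8"
encoded = line.encode("utf")

# print() converts every argument with str(); for bytes that is the repr,
# so the b'...' prefix, the quotes and the escaped \n end up in the page.
page = io.StringIO()  # stand-in for sys.stdout
print(encoded, end="", file=page)
print(page.getvalue())
# b'Lets try some characters: \xc3\xa9 \xc3\xab \xc3\xbc\n'

# Raw bytes have to go to a binary stream instead
# (sys.stdout.buffer in a CGI script).
raw = io.BytesIO()  # stand-in for sys.stdout.buffer
raw.write(encoded)
assert raw.getvalue().decode("utf-8") == line
```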
Re: Unicode in cgi-script with apache2
Hi John,

The error is in the line "print(line,end='')"... and it only happens when the script is started from a web browser. In the terminal, the script works fine.

See my previous mail for my findings after a lot of reading and trying...

Greetings

On 15-08-14 at 21:32, John Gordon wrote:
> Dominique Ramaekers writes:
>> [original script and error message snipped]
>
> The error traceback should display exactly where the error occurs within the script. Which line is it?
Re: Unicode in cgi-script with apache2
Hi Peter,

Your code seems interesting. I've tried using sys.stdout (in a slightly different form), but it gave the same error.

I also read about people who fixed the error by changing the server's locale to en_US.UTF-8. The people who posted these fixes also said that you can only use en_US.UTF-8 (and not, e.g., nl_BE.UTF-8)... Anyway, it didn't work for me. And I find this a dirty fix, because I don't want to use a US locale...

Please excuse me for not trying out your specific solutions. I've already started to implement WSGI instead of CGI. See my previous message...

Greetings

On 16-08-14 at 13:17, Peter Otten wrote:
> Dominique Ramaekers wrote:
>> [original script and error message snipped]
>
> If the input and output encoding are the same, you can avoid the byte-to-text (and subsequent text-to-byte) conversion and serve the binary contents of the index.html file directly:
>
>     #!/usr/bin/env python3
>     import sys
>
>     print("Content-Type: text/html")
>     print("Cache-Control: no-cache, must-revalidate")  # HTTP/1.1
>     print("Expires: Sat, 26 Jul 1997 05:00:00 GMT")  # Date in the past
>     print("")
>     sys.stdout.flush()
>     with open("/var/www/cgi-data/index.html", "rb") as f:
>         for line in f:
>             sys.stdout.buffer.write(line)
>
> The flush() is necessary to write pending data before accessing the low-level stdout.buffer.
> Instead of the loop you can use any of these:
>
>     sys.stdout.buffer.write(f.read())  # not for huge files, but should be OK for
>                                        # typical html file sizes
>     sys.stdout.buffer.writelines(f)
>     shutil.copyfileobj(f, sys.stdout.buffer)  # show off your knowledge
>                                               # of the stdlib ;)
>
> Alternatively you could choose an encoding via the locale:
>
>     #!/usr/bin/env python3
>     import locale
>     locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
>
>     print("Content-Type: text/html")
>     print("Cache-Control: no-cache, must-revalidate")  # HTTP/1.1
>     print("Expires: Sat, 26 Jul 1997 05:00:00 GMT")  # Date in the past
>     print("")
>     with open("/var/www/cgi-data/index.html") as f:
>         for line in f:
>             print(line, end='')
>
> Python should then use UTF-8 as the default for I/O, and the resulting script looks more familiar.
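Peter's remark that flush() is necessary can be checked without Apache. A minimal sketch, assuming a non-interactive stdout (a pipe, as under Apache CGI), which Python block-buffers rather than line-buffers: an io.TextIOWrapper over io.BytesIO plays sys.stdout, and `emit_page` is a hypothetical helper. The text layer is deliberately ASCII to show that the binary route never asks that codec about é.

```python
import io


def emit_page(stdout, body, flush_first=True):
    """Write CGI headers through the text layer, then the body as raw bytes.

    `stdout` is a text stream with a .buffer attribute, like sys.stdout.
    """
    stdout.write("Content-Type: text/html\n\n")
    if flush_first:
        # Push the buffered header text down to the byte layer first.
        stdout.flush()
    stdout.buffer.write(body)
    stdout.flush()


body = "<p>é ë ü</p>\n".encode("utf-8")  # bytes exactly as stored on disk

# Not line-buffered, like sys.stdout when it is a pipe rather than a terminal.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
emit_page(fake_stdout, body)
output = fake_stdout.buffer.getvalue()
print(output)
# b'Content-Type: text/html\n\n<p>\xc3\xa9 \xc3\xab \xc3\xbc</p>\n'
```

With `flush_first=False` the header text can still be sitting in the text layer's buffer when the body bytes are written, so it may end up after the body.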
Re: Unicode in cgi-script with apache2
Hi Denis,

This error is a Python error displayed in the Apache error log. The complete message is:

    [Sat Aug 16 23:12:42.158326 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: Traceback (most recent call last):
    [Sat Aug 16 23:12:42.158451 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: File "/var/www/cgi-python/index.html", line 12, in <module>
    [Sat Aug 16 23:12:42.158473 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: for line in f:
    [Sat Aug 16 23:12:42.158526 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    [Sat Aug 16 23:12:42.158569 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: return codecs.ascii_decode(input, self.errors)[0]
    [Sat Aug 16 23:12:42.158663 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1791: ordinal not in range(128)

If I access the file index.html directly from the browser, it renders fine...

I've done a lot of testing. I put my findings in a previous message.

Thanks anyway.

Greetings

On 16-08-14 at 18:40, Denis McMahon wrote:
> On Fri, 15 Aug 2014 20:10:25 +0200, Dominique Ramaekers wrote:
>> [original script and error message snipped]
>
> Is this a message appearing in the Apache error log or in the browser?
>
> If it is appearing in the browser, this is probably Apache passing through a Python error message.
>
> Is this the complete error message?
>
> What happens when you try and access http://[server]/cgi-data/index.html directly in a web browser? You may need to copy the file to a different directory to do this, depending on the Apache configuration.
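The traceback pins the failure on `for line in f:` inside encodings/ascii.py, so it is the reading side that breaks: open() only selects a codec, and decoding happens lazily as the loop pulls data in. A sketch that reproduces this without Apache, assuming (as the getfilesystemencoding test later in the thread suggests) that the CGI process runs under an ASCII locale; `encoding="ascii"` is passed explicitly to stand in for that default, and a temporary file stands in for index.html.

```python
import os
import tempfile

# Stand-in for index.html: UTF-8 text containing non-ASCII letters.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<p>é ë ü</p>\n".encode("utf-8"))

failure = None
# encoding="ascii" mimics what an ASCII (C/POSIX) locale makes the default.
f = open(tmp.name, "r", encoding="ascii")  # succeeds: nothing decoded yet
try:
    for line in f:  # decoding happens here, chunk by chunk
        pass
except UnicodeDecodeError as exc:
    failure = exc
finally:
    f.close()

print(failure)
# 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

# Naming the file's real encoding makes the read succeed.
with open(tmp.name, "r", encoding="utf-8") as f:
    assert f.read() == "<p>é ë ü</p>\n"
os.remove(tmp.name)
```

Byte 0xc3 is the first byte of the two-byte UTF-8 sequence for é (c3 a9), which is why the same byte value shows up in every error message in this thread.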
Re: Unicode in cgi-script with apache2
* My system is a Linux box.
* I've tried using encoding="utf-8". It didn't fix things.
* Since print uses sys.stdout, that would explain why using sys.stdout directly isn't better.
* My locale and the system-wide locale are UTF-8. Using SetEnv PYTHONIOENCODING utf-8 didn't fix things.
* The file is encoded in UTF-8...

I can't speak for anybody else, but in my search I don't believe I read about anyone who had the problem on a Windows system. They all used Linux (different flavors) or OS X... This is the first time I've encountered a situation where Windows is better at encoding issues :P +1 for Microsoft...

I think that Apache (*nix versions) doesn't tell Python that she's accepting UTF-8. Or Python doesn't listen properly... Maybe I should file a bug report with both projects?

On 17-08-14 at 04:50, Denis McMahon wrote:
> On Sun, 17 Aug 2014 00:36:14 +0200, Dominique Ramaekers wrote:
>> What seems to be the problem:
>> My script was OK. I know this because in the terminal I got the expected output. Python 3 uses UTF-8 encoding as a standard. The problem is that when Python 'prints' to the Apache interface, it translates the string to ASCII. (Why, I never found an answer.)
>
> Is the Apache server running on a Linux or a Windows platform? The problem may not be Python; it may be the underlying OS. I wonder if Apache is spawning a process for Python, though, and if so whether it is in some way constraining the character set available to stdout of the spawned process.
>
> From your other message, the error appears to be a Python error on reading the input file. For some reason Python seems to be trying to interpret the file it is reading as ASCII.
>
> I wonder if specifying binary mode and/or UTF-8 encoding when opening the file might help.
> For example:
>
>     f = open("/var/www/cgi-data/index.html", "rb")
>     f = open("/var/www/cgi-data/index.html", "rb", encoding="utf-8")
>     f = open("/var/www/cgi-data/index.html", "r", encoding="utf-8")
>
> I've managed to dig a bit further into the problem:
>
> print() goes to sys.stdout
>
> This is part of what the docs say about sys.stdout:
>
> """
> The character encoding is platform-dependent. Under Windows, if the stream is interactive (that is, if its isatty() method returns True), the console codepage is used, otherwise the ANSI code page. Under other platforms, the locale encoding is used (see locale.getpreferredencoding()).
>
> Under all platforms though, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python.
> """
>
> At this point, details of the OS become very significant. If your server is running on a Windows platform, you may need to figure out how to make Apache set the PYTHONIOENCODING environment variable to "utf-8" (or whatever else is appropriate) before calling the Python script.
>
> I believe that the following line in your httpd.conf may have the required effect:
>
>     SetEnv PYTHONIOENCODING utf-8
>
> Of course, if the file is not encoded as UTF-8 but rather in something else, then use that as the encoding in the above suggestions.
>
> If the server is not running Windows, then I'm not sure where the problem might be.
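The three open() variants suggested above behave quite differently in Python 3. A short sketch, with a temporary file standing in for index.html: binary mode returns undecoded bytes, binary mode combined with `encoding=` is rejected with a ValueError, and text mode with an explicit encoding decodes as UTF-8 regardless of the locale.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("é ë ü\n".encode("utf-8"))
path = tmp.name

# 1) Binary mode: no decoding at all; the result is bytes.
with open(path, "rb") as f:
    raw = f.read()
print(type(raw), raw)  # <class 'bytes'> b'\xc3\xa9 \xc3\xab \xc3\xbc\n'

# 2) Binary mode plus an encoding is an error in Python 3.
try:
    open(path, "rb", encoding="utf-8")
except ValueError as exc:
    rejected = exc
print(rejected)  # binary mode doesn't take an encoding argument

# 3) Text mode with an explicit encoding: str, independent of the locale.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
print(type(text), text, end="")
os.remove(path)
```

Variant 1 pairs naturally with writing to sys.stdout.buffer; variant 3 fixes the reading side but still leaves print() at the mercy of sys.stdout's encoding.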
Re: Unicode in cgi-script with apache2
Wow, everybody keeps on chewing on this problem. As a bonus, I've reconfigured my server to do some testing.

http://cloudserver.ramaekers-stassart.be/test.html => is the file I want to read. Going to this URL displays the file...
http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => is the CGI script for this test
http://cloudserver.ramaekers-stassart.be/wsgi => is the WSGI solution (but for now it just says 'Hello world'...)

--- Configuration ---

    dominique@cloudserver:/var/www/cgi-python$ cat /etc/default/locale
    LANG="en_US.UTF-8"
    LANGUAGE="en_US:"
    dominique@cloudserver:/var/www/cgi-python$ cat /etc/apache2/sites-enabled/000-default.conf
    ServerAdmin domini...@ramaekers-stassart.be
    WSGIScriptAlias /wsgi /var/www/wsgi/application
    Order allow,deny
    Allow from all
    DocumentRoot /var/www/html
    ScriptAlias /cgi-python /var/www/cgi-python/
    Options ExecCGI
    SetHandler cgi-script
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
    dominique@cloudserver:/var/www/cgi-python$ cat encoding1
    #!/usr/bin/env python3
    print("Content-Type: text/html")
    print("Cache-Control: no-cache, must-revalidate")  # HTTP/1.1
    print("Expires: Sat, 26 Jul 1997 05:00:00 GMT")  # Date in the past
    print("")
    f = open("/var/www/html/test.html", "r")
    for line in f:
        print(line, end='')
    dominique@cloudserver:/var/www/cgi-python$ cat ../html/test.html
    Testing my cgi... Ok, Testing my cgi... Lets try some characters: é ë ü
    dominique@cloudserver:/var/www/cgi-python$ file ../html/test.html
    ../html/test.html: HTML document, UTF-8 Unicode text

--- Start test ---

In the browser: http://cloudserver.ramaekers-stassart.be/test.html => page displays OK (try it yourself...)

In the terminal: => all goes well

    dominique@cloudserver:/var/www/cgi-python$ ./encoding1
    Content-Type: text/html
    Cache-Control: no-cache, must-revalidate
    Expires: Sat, 26 Jul 1997 05:00:00 GMT

    Testing my cgi... Ok, Testing my cgi...
    Lets try some characters: é ë ü

In the browser (Firefox): http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => gives a blank page! The error log says:

    root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 6
    [Sun Aug 17 11:09:21.102003 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: Traceback (most recent call last):
    [Sun Aug 17 11:09:21.102129 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: File "/var/www/cgi-python/encoding1", line 7, in <module>
    [Sun Aug 17 11:09:21.102149 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: for line in f:
    [Sun Aug 17 11:09:21.102201 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    [Sun Aug 17 11:09:21.102243 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: return codecs.ascii_decode(input, self.errors)[0]
    [Sun Aug 17 11:09:21.102318 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 162: ordinal not in range(128)

--- Conclusion ---

In my current configuration, the bug is reproduced!!!

--- Test 2: new configuration ---

I change the line

    f = open("/var/www/html/test.html", "r")

into

    f = open("/var/www/html/test.html", "r", encoding="utf-8")

and save the script as encoding2.

In the terminal: => all OK
In the browser: => blank page!!!
Error log in Apache:

    root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 4
    [Sun Aug 17 11:13:47.372353 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: Traceback (most recent call last):
    [Sun Aug 17 11:13:47.372461 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: File "/var/www/cgi-python/encoding2", line 8, in <module>
    [Sun Aug 17 11:13:47.372483 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: print(line,end='')
    [Sun Aug 17 11:13:47.372572 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: UnicodeEncodeError: 'ascii' codec can't encode character '\\xe9' in position 51: ordinal not in range(128)

--- Conclusion ---

Steven was right. It was a read error => with the encoding2 script the file is read as UTF-8. Though I find it strange: the file is in UTF-8 and Python 3 has UTF-8 as its standard. But reading the file is fixed. Now the writing is still broken.

Here are some tests hinted at before:

Tip from Steven => getting the encoding:

    dominique@cloudserver:/var/www/cgi-python$ cat readencoding
    #!/usr/bin/env python3
    import sys
    print("Content-Type: text/html")
    print("")
    print(sys.getfilesystemencoding())

Gives in the terminal: utf-8
Gives in the browser: ascii

Found the problem! No
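sys.getfilesystemencoding() reports the codec used for file names; it usually follows the locale, which is why it showed ascii under Apache, but it is not what print() or open() use. print() goes through sys.stdout.encoding, and open() without `encoding=` uses locale.getpreferredencoding(False). A sketch of a probe that reports all three; it runs in a child interpreter only so that the effect of PYTHONIOENCODING can be shown without Apache (`probe_encodings` is a hypothetical helper, and `capture_output`/`text` need Python 3.7+). On 3.7 and later, C-locale coercion (PEP 538) and UTF-8 mode (PEP 540) mean a bare C locale no longer necessarily yields ASCII, so results on a modern system may differ from the 3.4 server in this thread.

```python
import codecs
import os
import subprocess
import sys

# The three encodings that matter here, printed by a child interpreter.
PROBE = (
    "import locale, sys; "
    "print(sys.stdout.encoding, locale.getpreferredencoding(False), "
    "sys.getfilesystemencoding())"
)


def probe_encodings(**env_overrides):
    """Return (stdout encoding, open() default, file-name encoding) as seen
    by a child Python started with extra environment variables."""
    env = dict(os.environ, **env_overrides)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    return tuple(result.stdout.split())


print(probe_encodings())  # depends on the current environment
forced = probe_encodings(PYTHONIOENCODING="utf-8")
print(forced, codecs.lookup(forced[0]).name)  # stdout codec is now UTF-8
```

In a CGI script the same three values can simply be printed after the headers, which shows directly whether a `SetEnv PYTHONIOENCODING` line actually reaches the script.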
Re: Unicode in cgi-script with apache2
Yes, even a restart, not just a reload. I also put it in the section as well as in the main apache2.conf.

On 17-08-14 at 13:04, Peter Otten wrote:
> Dominique Ramaekers wrote:
>> Putting the lines in my Apache config:
>>
>>     AddDefaultCharset UTF-8
>>     SetEnv PYTHONIOENCODING utf-8
>>
>> Cleared my browser cache... No change.
>
> Did you restart Apache?
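If the SetEnv line does not seem to reach the script, the encodings can also be fixed inside the script itself, independent of Apache's environment. A minimal sketch, assuming the file is UTF-8: read with an explicit `encoding="utf-8"` and wrap the binary stdout in an io.TextIOWrapper that always encodes as UTF-8. `make_utf8_stdout` and `serve_file` are hypothetical names, the `charset=utf-8` parameter on the header is an addition not present in the original script, and an io.BytesIO plus a temporary file stand in for sys.stdout.buffer and index.html.

```python
import io
import os
import tempfile


def make_utf8_stdout(binary_stream):
    """Text stream that always encodes as UTF-8, whatever the locale says."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


def serve_file(path, out):
    """Send CGI headers and a UTF-8 file through the text stream `out`."""
    print("Content-Type: text/html; charset=utf-8", file=out)
    print("", file=out)
    with open(path, "r", encoding="utf-8") as f:  # explicit: no locale guess
        for line in f:
            print(line, end="", file=out)
    out.flush()


# In the CGI script itself this would be (keep the wrapper alive as sys.stdout):
#     sys.stdout = make_utf8_stdout(sys.stdout.buffer)
#     serve_file("/var/www/cgi-data/index.html", sys.stdout)
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<p>Lets try some characters: é ë ü</p>\n".encode("utf-8"))

out = make_utf8_stdout(io.BytesIO())
serve_file(tmp.name, out)
os.remove(tmp.name)
print(out.buffer.getvalue())
```

Assigning the wrapper to sys.stdout matters: a TextIOWrapper that is simply dropped gets closed when it is garbage-collected, and closing it closes the buffer underneath. On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` near the top of the script is a one-line alternative to the wrapper.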
Re: Unicode in cgi-script with apache2
As I suspected, if I check the encoding used in WSGI I get: ANSI_X3.4-1968

I found you can define the encoding of the script with a special comment: # -*- coding: utf-8 -*-

Now I don't get an error, but my special characters still don't display well. The script:

    # -*- coding: utf-8 -*-
    import sys

    def application(environ, start_response):
        status = '200 OK'
        output = 'Hello World! é ü à ũ'
        #output = sys.getfilesystemencoding()  #1
        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)
        return [output]

Gives in the browser as output: Hello World! é ü à ũ

And if I check the encoding with the Python script (uncommenting line #1), I still get ANSI_X3.4-1968.

This is really getting on my nerves.
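ANSI_X3.4-1968 is the name glibc reports for plain ASCII, so the WSGI process sees the same locale as the CGI one. A sketch of the application that does not depend on the locale at all, assuming mod_wsgi is running Python 3 (the thread does not say which Python it was built against): PEP 3333 requires the response body to be bytes, Content-Length counts bytes rather than characters, and a charset parameter on the Content-Type header tells the browser how to decode them. On Python 3 the coding comment is harmless but redundant, since UTF-8 is already the default source encoding. `fake_start_response` is a hypothetical stand-in for the server.

```python
def application(environ, start_response):
    status = "200 OK"
    text = "Hello World! é ü à ũ"
    # Encode explicitly: the WSGI body must be bytes, whatever the locale says.
    body = text.encode("utf-8")
    response_headers = [
        # Tell the browser which encoding the bytes are in.
        ("Content-Type", "text/plain; charset=utf-8"),
        # Content-Length counts bytes, not characters.
        ("Content-Length", str(len(body))),
    ]
    start_response(status, response_headers)
    return [body]


# Minimal stand-in for the server, to exercise the app outside mod_wsgi.
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)


response_body = b"".join(application({}, fake_start_response))
print(captured["headers"]["Content-Length"], len("Hello World! é ü à ũ"))  # 24 20
```

The two numbers show why the original header would be off under Python 3: `str(len(output))` counts 20 characters, while the UTF-8 body is 24 bytes.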