Unicode in cgi-script with apache2

2014-08-15 Thread Dominique Ramaekers

Hi,

I've got a little script:

#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")  # HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT")  # Date in the past
print("")
f = open("/var/www/cgi-data/index.html", "r")
for line in f:
    print(line, end='')

If I run the script in the terminal, it nicely prints the webpage 
'index.html'.


If I access the script through a web browser, Apache gives an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 
1791: ordinal not in range(128)


I've spent a whole afternoon reading forums and blogs, and I still don't
have a solution.


Can anyone help me?

Greetings,

Dominique.
--
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode in cgi-script with apache2

2014-08-16 Thread Dominique Ramaekers

I found my problem; I will describe it in more detail at the bottom of this message...

But first...

Thanks Alister for the tips:
1) This evening, I researched WSGI. I found that WSGI is more advanced
than CGI, and I also think WSGI is more the Python way. I'm an amateur
playing around with my imagination on a small virtual server (online at
cloudserver.ramaekers-stassart.be). I'm trying to build something rather
specific, and I like to keep things as basic as possible. My first
thought was not to use a framework, because with a framework I wouldn't
really know what the code is doing; for me, a framework would be a black
box. But after looking into WSGI, I decided not to make things more
difficult for myself than they have to be. I will work with a framework,
and I think I'll bet on Falcon (for its speed and small size, and because
it doesn't seem too difficult)... There are a lot of frameworks, so if
someone wants to point me to another framework, I'm open to suggestions...


2) Your tip to use 'encode' did not solve the problem and created a new
one. My lines were wrapped in quotes, I got a lot of b's and \n's... and
I still got the same error.


3) I didn't get the message from JMF, so...
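A minimal sketch of why the encode() suggestion in point 2 produced quoted output, using a made-up line of HTML: in Python 3, print() converts its argument with str(), and str() of a bytes object is its repr, complete with the b'...' prefix and escaped \n. It also cannot help with the UnicodeDecodeError, which the tracebacks later in this thread place on the `for line in f:` line, before anything is printed.

```python
import io
from contextlib import redirect_stdout

# A made-up line of HTML, as it would come out of a text-mode file read.
line = "<p>Lets try some characters: \u00e9</p>\n"

# Capture what print() writes instead of sending it to the real stdout.
captured = io.StringIO()
with redirect_stdout(captured):
    print(line.encode("utf-8"), end="")

output = captured.getvalue()
# print() wrote the repr of the bytes object, not the HTML itself.
print(output)
print(output == repr(line.encode("utf-8")))
```

To send already-encoded bytes, they have to go to the binary layer (sys.stdout.buffer) rather than through print().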

What seems to be the problem:
My script was OK. I know this because I got the expected output in the
terminal. Python 3 uses UTF-8 as its standard encoding. The problem is that
when Python 'prints' to the Apache interface, it translates the string to
ASCII (why, I never found out). Somewhere in the middle of my index.html
file there are letters like ë and ü. When Python tries to translate these,
it throws an error. If I delete these letters from the file, the script
works perfectly in a browser! In Python 2.7 the script can easily be
tweaked so that the translation to ASCII isn't done, but in Python 3 it's
a real pain in the a... I've read about people who managed to force
Python 3 to 'print' to Apache in UTF-8, but none of their solutions
worked for me.
I think the Python developers don't want to focus on Python + Apache +
CGI (I think it only happens with Apache and not with other HTTP
servers). I don't think they do this intentionally, but I guess they
assume that if you use Python to build a web application, you also use
mod_wsgi or mod_python (in Apache)...
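The effect described above can be reproduced without Apache. A text stream such as sys.stdout turns every str into bytes using one fixed encoding, chosen when the stream is created (for sys.stdout, at interpreter start-up from the environment). In the sketch below, an io.TextIOWrapper over an in-memory buffer stands in for the stdout of the CGI process; encode_through_text_stream() is a hypothetical helper, and "ascii" is assumed as the encoding that process ended up with.

```python
import io

def encode_through_text_stream(text, encoding):
    """Write `text` through a text stream with a fixed encoding and
    return the bytes that reach the underlying binary buffer."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)  # errors="strict" by default
    stream.write(text)   # write() and flush() turn the str into bytes using `encoding`
    stream.flush()
    return raw.getvalue()

sample = "Lets try some characters: \u00e9 \u00eb \u00fc"

# With UTF-8 the non-ASCII letters become multi-byte sequences.
print(encode_through_text_stream(sample, "utf-8"))

# With ASCII the same write fails, like print() in the CGI process.
ascii_error = None
try:
    encode_through_text_stream(sample, "ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print("ascii stream:", exc)
```

Nothing in the script itself changes between the two runs; only the encoding of the stream does, which is consistent with the script working in a terminal and failing under Apache.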

So I'll use WSGI. It's a little more work, but it seems really neat...

grtz


On 15-08-14 at 21:27, alister wrote:

On Fri, 15 Aug 2014 20:10:25 +0200, Dominique Ramaekers wrote:


Hi,

I've got a little script:

#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")# HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/cgi-data/index.html", "r")
for line in f:
  print(line,end='')

If I run the script in the terminal, it nicely prints the webpage
'index.html'.

If I access the script through a web browser, Apache gives an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
1791: ordinal not in range(128)

I've spent a whole afternoon reading forums and blogs, and I still don't
have a solution.

Can anyone help me?

Greetings,

Dominique.

1) This is not the way to get Python to generate a web page. If you don't
want to use an existing framework (for example, if you are doing this as
an educational exercise), I suggest you google WSGI.

2) You need to encode your output strings into a format the Apache/HTML
protocols can support; UTF-8 is probably best here.
Change your print function to
print(line.encode('utf-8'),end='')


3) Ignore any subsequent advice from JMF; even when he is trying to help,
he is invariably wrong.
  





Re: Unicode in cgi-script with apache2

2014-08-16 Thread Dominique Ramaekers

Hi John,

The error is in the line "print(line,end='')"... and it only happens 
when the script is started from a webbrowser. In the terminal, the 
script works fine.

See my previous mail for my findings after a lot of reading and trying...

grz



On 15-08-14 at 21:32, John Gordon wrote:

Dominique Ramaekers writes:


#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")# HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/cgi-data/index.html", "r")
for line in f:
  print(line,end='')

If I access the script through a web browser, Apache gives an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
1791: ordinal not in range(128)

The error traceback should display exactly where the error occurs within
the script.  Which line is it?





Re: Unicode in cgi-script with apache2

2014-08-16 Thread Dominique Ramaekers

Hi Peter,

Your code seems interesting.

I've tried using sys.stdout (in a slightly different form), but it gave
the same error.


I also read about people who fixed the error by changing the server's
locale to en_US.UTF-8. The people who posted these fixes also said that
you can only use en_US.UTF-8 (and not, e.g., nl_BE.UTF-8)... Anyway, it
didn't work for me. And I find this a dirty fix, because I don't want to
use a US locale...


Please excuse me for not trying out your specific solutions. I've already
started to implement WSGI instead of CGI. See my previous message...


grz

On 16-08-14 at 13:17, Peter Otten wrote:

Dominique Ramaekers wrote:


I've got a little script:

#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")# HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/cgi-data/index.html", "r")
for line in f:
  print(line,end='')

If I run the script in the terminal, it nicely prints the webpage
'index.html'.

If I access the script through a web browser, Apache gives an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
1791: ordinal not in range(128)

I've spent a whole afternoon reading forums and blogs, and I still don't
have a solution.

Can anyone help me?

If the input and output encoding are the same you can avoid the byte-to-text
(and subsequent text-to-byte conversion) and serve the binary contents of
the index.html file directly:

#!/usr/bin/env python3
import sys

print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")# HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
sys.stdout.flush()
with open("/var/www/cgi-data/index.html", "rb") as f:
    for line in f:
        sys.stdout.buffer.write(line)

The flush() is necessary to write pending data before accessing the low-level
stdout.buffer. Instead of the loop you can use any of these:

sys.stdout.buffer.write(f.read())  # not for huge files, but should be OK
                                   # for typical html file sizes
sys.stdout.buffer.writelines(f)
shutil.copyfileobj(f, sys.stdout.buffer)  # show off your knowledge of the
                                          # stdlib ;) (needs "import shutil")
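Peter's binary approach can be packaged as a small function. The sketch below is one possible arrangement, not code from the thread: serve_file() is a hypothetical name, and the charset=utf-8 in the Content-Type header is an added assumption that the file is UTF-8. Writing the headers as bytes to the same binary stream as the body removes the need to flush a text layer in between, and taking the stream as a parameter lets it be exercised with an in-memory io.BytesIO instead of sys.stdout.buffer.

```python
import io
import os
import shutil
import tempfile

def serve_file(path, out):
    """Write CGI headers and then the raw bytes of `path` to the binary stream `out`.

    Hypothetical helper; in a CGI script `out` would be sys.stdout.buffer.
    """
    headers = (
        "Content-Type: text/html; charset=utf-8\n"  # assumes the file is UTF-8
        "Cache-Control: no-cache, must-revalidate\n"
        "Expires: Sat, 26 Jul 1997 05:00:00 GMT\n"
        "\n"                                         # blank line ends the headers
    )
    out.write(headers.encode("ascii"))  # header lines are plain ASCII
    with open(path, "rb") as f:         # binary mode: no decoding at all
        shutil.copyfileobj(f, out)
    out.flush()

# Demonstration with a temporary UTF-8 file and an in-memory buffer.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<p>\u00e9 \u00eb \u00fc</p>\n".encode("utf-8"))

buf = io.BytesIO()
try:
    serve_file(tmp.name, buf)
finally:
    os.remove(tmp.name)

print(buf.getvalue().decode("utf-8"))
```

Because the bytes are never decoded, the locale of the CGI process plays no part in this path.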


Alternatively you could choose an encoding via the locale:

#!/usr/bin/env python3
import locale
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")

print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")# HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
with open("/var/www/cgi-data/index.html") as f:
    for line in f:
        print(line, end='')

Python should then use UTF-8 as the default for i/o, and the resulting
script looks more familiar.
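A caveat worth checking against the locale approach: sys.stdout is created when the interpreter starts, so a locale.setlocale() call inside the script may change the default encoding of later open() calls but does not change the encoding of the already-open sys.stdout. One widely used alternative, not suggested in the thread, is to wrap the underlying binary buffer in a new io.TextIOWrapper with an explicit encoding. The helper name utf8_text_stream() is hypothetical; the sketch exercises it on an in-memory buffer.

```python
import io

def utf8_text_stream(binary_stream):
    """Wrap a binary stream in a text stream that always encodes to UTF-8.

    Hypothetical helper. In a CGI script one might use it as:
        sys.stdout.flush()
        sys.stdout = utf8_text_stream(sys.stdout.buffer)
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)

# Demonstration on an in-memory buffer standing in for sys.stdout.buffer.
raw = io.BytesIO()
out = utf8_text_stream(raw)
print("Content-Type: text/html; charset=utf-8", file=out)
print("", file=out)
print("Lets try some characters: \u00e9 \u00eb \u00fc", file=out)
out.flush()

print(raw.getvalue())
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") does the same without replacing the object; it is not available in the Python 3.4 that appears in the tracebacks in this thread.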





Re: Unicode in cgi-script with apache2

2014-08-16 Thread Dominique Ramaekers

Hi Denis,

This error is a python error displayed in the apache error log. The 
complete message is:
[Sat Aug 16 23:12:42.158326 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: Traceback (most recent call last):
[Sat Aug 16 23:12:42.158451 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215:   File "/var/www/cgi-python/index.html", line 12, in <module>
[Sat Aug 16 23:12:42.158473 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: for line in f:
[Sat Aug 16 23:12:42.158526 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215:   File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
[Sat Aug 16 23:12:42.158569 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: return codecs.ascii_decode(input, self.errors)[0]
[Sat Aug 16 23:12:42.158663 2014] [cgi:error] [pid 29327] [client 119.63.193.196:0] AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1791: ordinal not in range(128)


If I access the file index.html directly from the browser, it renders fine...

I've done a lot of testing. I put my findings in a previous message.

Thanks anyway.

grz

On 16-08-14 at 18:40, Denis McMahon wrote:

On Fri, 15 Aug 2014 20:10:25 +0200, Dominique Ramaekers wrote:


#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")# HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/cgi-data/index.html", "r")
for line in f:
  print(line,end='')

If I run the script in the terminal, it nicely prints the webpage
'index.html'.

If I access the script through a web browser, Apache gives an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
1791: ordinal not in range(128)

Is this a message appearing in the apache error log or in the browser? If
it is appearing in the browser, this is probably apache passing through a
python error message.

Is this the complete error message?

What happens when you try and access http://[server]/cgi-data/index.html
directly in a web browser? You may need to copy the file to a different
directory to do this depending on the apache configuration.





Re: Unicode in cgi-script with apache2

2014-08-16 Thread Dominique Ramaekers

* My system is a Linux box.

* I've tried using encoding="utf-8". It didn't fix things.

* That print() uses sys.stdout would explain why using sys.stdout directly isn't any better.

* My locale and the system-wide locale are UTF-8. Using SetEnv
PYTHONIOENCODING utf-8 didn't fix things.


* The file is encoded in UTF-8...

I can't speak for anybody else, but in my search I don't believe I read
about anyone who had this problem on a Windows system. They all used
Linux (different flavors) or OS X... This is the first time I've
encountered a situation where Windows is better at encoding issues :P
+1 for Microsoft...


I think that Apache (*nix versions) doesn't tell Python that she's accepting
UTF-8. Or Python doesn't listen properly... Maybe I should file a bug
report with both projects?



On 17-08-14 at 04:50, Denis McMahon wrote:

On Sun, 17 Aug 2014 00:36:14 +0200, Dominique Ramaekers wrote:


What seems to be the problem:
My script was OK. I know this because I got the expected output in the
terminal. Python 3 uses UTF-8 as its standard encoding. The problem is that
when Python 'prints' to the Apache interface, it translates the string to
ASCII (why, I never found out).

Is the apache server running on a linux or a windows platform?

The problem may not be python, it may be the underlying OS. I wonder if
apache is spawning a process for python though, and if so whether it is
in some way constraining the character set available to stdout of the
spawned process.

From your other message, the error appears to be a python error on
reading the input file. For some reason python seems to be trying to
interpret the file it is reading as ascii.

I wonder if specifying the binary data parameter and / or utf-8 encoding
when opening the file might help.

eg:

f = open( "/var/www/cgi-data/index.html", "rb" )
f = open( "/var/www/cgi-data/index.html", "rb", encoding="utf-8" )  # ValueError: binary mode doesn't take an encoding argument
f = open( "/var/www/cgi-data/index.html", "r", encoding="utf-8" )
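The 0xc3 in the traceback is a clue: in UTF-8, characters such as é, ë and ü are stored as two bytes whose first byte is 0xc3, and 0xc3 is above the 0-127 range the ascii codec accepts. The sketch below writes a small UTF-8 file to a temporary location (a stand-in for index.html) and reads it back in text mode, first with the ascii codec, which is what a plain open(path, "r") ends up using when the process locale is ASCII, and then with encoding="utf-8". The "rb" plus encoding combination is left out because binary mode does not accept an encoding argument.

```python
import os
import tempfile

# A small UTF-8 file standing in for /var/www/cgi-data/index.html.
text = "Lets try some characters: \u00e9 \u00eb \u00fc\n"
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
path = tmp.name

# In UTF-8 each of these characters is two bytes, and the first one is 0xc3.
print([c.encode("utf-8") for c in "\u00e9\u00eb\u00fc"])

# Decoding with the ascii codec fails on the first byte above 127.
decode_error = None
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)

# Naming the encoding explicitly makes the read independent of the locale.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()
os.remove(path)

print(content == text)
```

This only fixes the reading side; printing the decoded text still depends on the encoding of sys.stdout.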

I've managed to drill down a bit further into the problem:

print() goes to sys.stdout

This is part of what the docs say about sys.stdout:

"""
The character encoding is platform-dependent. Under Windows, if the
stream is interactive (that is, if its isatty() method returns True), the
console codepage is used, otherwise the ANSI code page. Under other
platforms, the locale encoding is used (see locale.getpreferredencoding()).

Under all platforms though, you can override this value by setting the
PYTHONIOENCODING environment variable before starting Python.
"""

At this point, details of the OS become very significant. If your server
is running on a windows platform you may need to figure out how to make
apache set the PYTHONIOENCODING environment variable to "utf-8" (or
whatever else is appropriate) before calling the python script.

I believe that the following line in your httpd.conf may have the
required effect.

SetEnv PYTHONIOENCODING utf-8

Of course, if the file is not encoded as utf-8, but rather something
else, then use that as the encoding in the above suggestions. If the
server is not running windows, then I'm not sure where the problem might
be.
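Denis's PYTHONIOENCODING suggestion can be tested without Apache. The sketch below starts a fresh interpreter with the variable set in its environment, which is the role SetEnv plays for a CGI process, and has it print é. run_child() and CHILD_CODE are hypothetical names; subprocess.run(..., capture_output=True) needs Python 3.7 or later. If the variable works here but not under Apache, that suggests checking whether Apache actually passes it to the script.

```python
import os
import subprocess
import sys

# Child program: print one non-ASCII character, as the CGI script does.
# The \u escape keeps the command line itself pure ASCII.
CHILD_CODE = 'print("\\u00e9")'

def run_child(io_encoding):
    """Run CHILD_CODE in a new interpreter with PYTHONIOENCODING set in its
    environment, similar to what SetEnv does for a CGI process."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run([sys.executable, "-c", CHILD_CODE],
                          env=env, capture_output=True)

ok = run_child("utf-8")
print(ok.returncode, ok.stdout.strip())   # the é arrives as UTF-8 bytes

bad = run_child("ascii")
print(bad.returncode, b"UnicodeEncodeError" in bad.stderr)
```

Note that the -E and -I interpreter options make Python ignore PYTHON* variables, so they must not be used when relying on PYTHONIOENCODING.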





Re: Unicode in cgi-script with apache2

2014-08-17 Thread Dominique Ramaekers
Wow, everybody keeps on chewing on this problem. As a bonus, I've
reconfigured my server to do some testing.
http://cloudserver.ramaekers-stassart.be/test.html => is the file I want
to read. Going to this URL displays the file...
http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => is the
CGI script for this test
http://cloudserver.ramaekers-stassart.be/wsgi => is the WSGI solution
(but for now it just says 'Hello world'...)


This is the configuration:

dominique@cloudserver:/var/www/cgi-python$ cat /etc/default/locale
LANG="en_US.UTF-8"
LANGUAGE="en_US:"

dominique@cloudserver:/var/www/cgi-python$ cat 
/etc/apache2/sites-enabled/000-default.conf



ServerAdmin domini...@ramaekers-stassart.be
WSGIScriptAlias /wsgi /var/www/wsgi/application


Order allow,deny
Allow from all


DocumentRoot /var/www/html

ScriptAlias /cgi-python /var/www/cgi-python/

Options ExecCGI
SetHandler cgi-script


ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined



dominique@cloudserver:/var/www/cgi-python$ cat encoding1
#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")# HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/html/test.html", "r")
for line in f:
    print(line,end='')

dominique@cloudserver:/var/www/cgi-python$ cat ../html/test.html




Testing my cgi...


Ok, Testing my cgi... Lets try some characters: é ë ü



dominique@cloudserver:/var/www/cgi-python$ file ../html/test.html
../html/test.html: HTML document, UTF-8 Unicode text

--- Start test ---
In the browser: http://cloudserver.ramaekers-stassart.be/test.html => page
displays OK (try it yourself...)


In the terminal: => all goes well
dominique@cloudserver:/var/www/cgi-python$ ./encoding1
Content-Type: text/html
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT





Testing my cgi...


Ok, Testing my cgi... Lets try some characters: é ë ü



In the browser (firefox):
http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => gives a 
blank page!


The error log says:
root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 6
[Sun Aug 17 11:09:21.102003 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: Traceback (most recent call last):
[Sun Aug 17 11:09:21.102129 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215:   File "/var/www/cgi-python/encoding1", line 7, in <module>
[Sun Aug 17 11:09:21.102149 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: for line in f:
[Sun Aug 17 11:09:21.102201 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215:   File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
[Sun Aug 17 11:09:21.102243 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: return codecs.ascii_decode(input, self.errors)[0]
[Sun Aug 17 11:09:21.102318 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 162: ordinal not in range(128)


--- Conclusion ---
In my current configuration, the bug is reproduced!!!

--- Test 2: new configuration ---
I changed the line f = open("/var/www/html/test.html", "r") into
f = open("/var/www/html/test.html", "r", encoding="utf-8") and saved the
script as encoding2.


In the terminal: => All ok

In the browser: => blank page!!!

Error log in apache:
root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 4
[Sun Aug 17 11:13:47.372353 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: Traceback (most recent call last):
[Sun Aug 17 11:13:47.372461 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215:   File "/var/www/cgi-python/encoding2", line 8, in <module>
[Sun Aug 17 11:13:47.372483 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: print(line,end='')
[Sun Aug 17 11:13:47.372572 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: UnicodeEncodeError: 'ascii' codec can't encode character '\\xe9' in position 51: ordinal not in range(128)


--- Conclusion ---
Steven was right. It was a read error => with the encoding2 script, the
file is read as UTF-8. Though I find it strange: the file is in UTF-8 and
Python 3 has UTF-8 as standard. But reading the file is fixed.


Now the writing is still broken.

Here are some tests suggested earlier:

Tip from Steven => getting the encoding:
dominique@cloudserver:/var/www/cgi-python$ cat readencoding
#!/usr/bin/env python3
import sys
print("Content-Type: text/html")
print("")
print(sys.getfilesystemencoding())

Gives in the terminal: utf-8
Gives in the browser: ascii
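Since sys.getfilesystemencoding() is only one of several encodings involved, a slightly broader diagnostic can report them together: the default used by open(), the encoding of sys.stdout used by print(), and the environment variables they are derived from. This is a sketch; encoding_report() is a hypothetical name and the set of fields is a choice made for illustration. Run once from a terminal and once through Apache, it should show which of these values differ between the two.

```python
#!/usr/bin/env python3
import locale
import os
import sys

def encoding_report():
    """Collect the settings that decide how this process reads and writes text."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        # default encoding for open() without an encoding argument
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        # encoding used by print()
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # encoding used for file names
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
    }

if __name__ == "__main__":
    print("Content-Type: text/plain")
    print("")
    for name, value in encoding_report().items():
        print("{}: {}".format(name, value))
```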

Found the problem!

No

Re: Unicode in cgi-script with apache2

2014-08-17 Thread Dominique Ramaekers
Yes, even a full restart, not just a reload. I also put it in the section
as well as in the main apache2.conf.


On 17-08-14 at 13:04, Peter Otten wrote:

Dominique Ramaekers wrote:


Putting the lines in my apache config:
AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf-8

Cleared my browser cache... No change.

Did you restart the apache?






Re: Unicode in cgi-script with apache2

2014-08-17 Thread Dominique Ramaekers

As I suspected, if I check the encoding used under WSGI I get:
ANSI_X3.4-1968
(ANSI_X3.4-1968 is the formal name of ASCII.)

I found that you can declare the encoding of the script with a special comment:
# -*- coding: utf-8 -*-


Now I don't get an error, but my special characters still don't display correctly.
The script:
# -*- coding: utf-8 -*-
import sys
def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World! é ü à ũ'
    #output = sys.getfilesystemencoding() #1

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

This gives the following output in the browser:

Hello World! é ü à ũ

And if I check the encoding with the Python script (uncommenting the line
marked #1), I still get ANSI_X3.4-1968.


This is really getting on my nerves.
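For comparison, a minimal sketch of the same application written for Python 3. The thread does not say which Python version mod_wsgi was running; this assumes Python 3, where PEP 3333 requires the response body to be an iterable of bytes. So the text is encoded explicitly, Content-Length counts bytes rather than characters, and charset=utf-8 in the Content-Type header tells the browser how to decode the body instead of leaving it to guess. fake_start_response() is a hypothetical stand-in used only to call the application directly.

```python
def application(environ, start_response):
    text = "Hello World! \u00e9 \u00fc \u00e0 \u0169"
    body = text.encode("utf-8")  # Python 3 WSGI bodies must be bytes
    response_headers = [
        ("Content-Type", "text/plain; charset=utf-8"),  # tell the browser how to decode
        ("Content-Length", str(len(body))),             # a byte count, not a character count
    ]
    start_response("200 OK", response_headers)
    return [body]

# Calling the application directly with a stand-in start_response.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

chunks = application({}, fake_start_response)
print(captured["status"], captured["headers"])
print(b"".join(chunks).decode("utf-8"))
```

For this string the body is 24 bytes while the text is 20 characters long, which is why len() of the str gives the wrong Content-Length once non-ASCII characters are involved.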


On 17-08-14 at 13:04, Peter Otten wrote:

Dominique Ramaekers wrote:


Putting the lines in my apache config:
AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf-8

Cleared my browser cache... No change.

Did you restart the apache?



