Package: w3m
Version: 0.5.1-3
Severity: normal

Hi,

Say I have a test.html.utf-8 page on some web server:

    <body> test </body>

The web server correctly announces that it is a UTF-8 encoded page:

    Content-Type: text/html; charset=utf-8

But w3m simplifies this to US-ASCII, because the page indeed contains
nothing that cannot be encoded in plain ASCII:

    Information about current page
    Title              test.html
    Current URL        http://footwar/test.html
    Document Type      text/html
    Last Modified      Sat, 22 Jan 2005 20:54:00 GMT
    Document Charset   [Latin (US-ASCII) ] [Change]
    Number of lines    1
    Transferred bytes  20

    Header information
    HTTP/1.1 200 OK
    Date: Sat, 22 Jan 2005 20:54:09 GMT
    Server: Apache/1.3.33 (Debian GNU/Linux) PHP/4.3.10-2
    Content-Location: test.html.utf-8
    Vary: negotiate
    TCN: choice
    Last-Modified: Sat, 22 Jan 2005 20:54:00 GMT
    ETag: "dba40-14-41f2bd68;41f2bd6c"
    Accept-Ranges: bytes
    Content-Length: 20
    Connection: close
    Content-Type: text/html; charset=utf-8

The problem arises if I put a form in the page. Since the page is
announced as UTF-8 encoded, w3m should default to UTF-8 when encoding
the submitted values. But because w3m simplifies the charset to
US-ASCII, it defaults to US-ASCII instead (and then does not know how to
encode accented characters and the like).

w3m should *not* simplify the charset.

Regards,
Samuel

-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (50, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.10
Locale: [EMAIL PROTECTED], [EMAIL PROTECTED] (charmap=ISO-8859-15)

Versions of packages w3m depends on:
ii  libc6        2.3.2.ds1-20  GNU C Library: Shared libraries an
ii  libgc1       1:6.3-1       Conservative garbage collector for
ii  libgpmg1     1.19.6-19     General Purpose Mouse - shared lib
ii  libncurses5  5.4-4         Shared libraries for terminal hand
ii  libssl0.9.7  0.9.7e-3      SSL shared libraries
ii  zlib1g       1:1.2.2-4     compression library - runtime

-- no debconf information
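
To illustrate the form-encoding problem described above, here is a minimal Python sketch. It is not w3m's code. The helper names `declared_charset` and `encode_form` and the sample field value are made up for the illustration. It assumes a form value containing an accented character, and shows that encoding it with the charset announced in the Content-Type header works, while encoding it with the "simplified" US-ASCII charset fails.

```python
from email.message import Message
from urllib.parse import urlencode


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def encode_form(fields, charset):
    """Percent-encode form fields, converting each value to bytes with `charset`.

    Raises UnicodeEncodeError if a value cannot be represented in `charset`.
    """
    return urlencode(fields, encoding=charset, errors="strict")


if __name__ == "__main__":
    # Hypothetical form value containing an accented character.
    fields = {"comment": "café"}

    # Charset as announced by the server in the report above.
    charset = declared_charset("text/html; charset=utf-8")
    print(encode_form(fields, charset))

    # Charset after the simplification to US-ASCII: the accent cannot be encoded.
    try:
        print(encode_form(fields, "us-ascii"))
    except UnicodeEncodeError as exc:
        print("cannot encode with US-ASCII:", exc.reason)
```

The point of the sketch is only that the charset used for form submission has to be the one the server declared, not one inferred from the bytes that happen to be in the page.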