On 28 April 2012 05:13, Sunjoong Lee <sunjo...@gmail.com> wrote:
>
> Background;
> #:decode-body? keyword of http-get seems not to work properly; I should
> set #:decode-body? to false value and decode the contents body string
> manually. If a web page's charset be utf-8, there be no problem. If not, a
> problem occurs. decode-response-body of (web client) call decode-string with
> web page's charset. But real charset of bytevector is iso-8859-1, not web
> page's charset. If so, you should not let http-get use decode-response-body.

Hello

It seems you later made some headway on this, but just a note to clarify:

Bytevectors are raw data; they do not have an encoding. Web ports are set
to ISO-8859-1 because it is an 8-bit encoding in which every byte maps to
exactly one character, so the port can be read as raw data. The output of
http-get with '#:decode-body? #f' *should* be a bytevector of exactly the
bytes sent by the server.

This is mentioned in the docstring for read-request:

  > (use-modules (web request))
  > ,d read-request
  Read an HTTP request from @var{port}, optionally attaching the given
  metadata, @var{meta}.

  As a side effect, sets the encoding on @var{port} to ISO-8859-1
  (latin-1), so that reading one character reads one byte.  See the
  discussion of character sets in "HTTP Requests" in the manual, for more
  information.

Can you provide us with a couple of sites where http-get or decode-string
does not work properly? Or was something else at play here? This would
help us investigate what the issue is. (I am too lazy today to find some,
and I think you must know of a few :-)

>
> After getting response-body with bytevector form, you should decode it with
> "iso-8859-1" like decode-string's manner. Then you'll get web page's
> contents body string; it's charset is what you see in response header.
>

Note that ISO-8859-1 covers only a small part of Unicode, so decoding the
bytevector as ISO-8859-1 will garble any text that is not actually
Latin-1. The bytevector should be decoded with the charset named in the
response header instead.

Regards
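
To make the point about bytevectors and ISO-8859-1 concrete, here is a minimal sketch. It assumes a Guile recent enough to ship the (ice-9 iconv) module, and the two sample bytes are made up for illustration rather than taken from any real site; in practice the bytevector would be the body returned by http-get with '#:decode-body? #f'.

```scheme
(use-modules (rnrs bytevectors)   ; u8-list->bytevector, bytevector=?
             (ice-9 iconv))       ; bytevector->string, string->bytevector

;; Hypothetical body bytes: C3 A9 is the UTF-8 encoding of U+00E9 (e-acute).
(define body (u8-list->bytevector '(#xC3 #xA9)))

;; Decoding with the charset the server declared (assumed to be UTF-8 here)
;; gives the intended text.
(define as-utf8 (bytevector->string body "UTF-8"))

;; Decoding as ISO-8859-1 turns every byte into one character, so the same
;; bytes come out as a different, longer string.
(define as-latin1 (bytevector->string body "ISO-8859-1"))

(display (string-length as-utf8)) (newline)
(display (string-length as-latin1)) (newline)

;; Re-encoding the Latin-1 string as ISO-8859-1 recovers the original bytes,
;; which is why a Latin-1 port can be read as raw data.
(display (bytevector=? body (string->bytevector as-latin1 "ISO-8859-1")))
(newline)
```

The Latin-1 decoding keeps every byte recoverable, but the string it produces is only the right text when the page really is Latin-1.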