Re: HTTP Request/Response questions
"R. P. Dillon" writes: > I'm currently working on a project to gather RSS data using Guile. I've been I've done that. I highly recommend sxpath for this job. > working with both the stable 2.0.3 version and the latest git repository. I'm > fairly new to Guile, though, so I might be approaching this the wrong way. > > As a test, I wanted to make an HTTP request. This is a series of commands I > executed in the REPL to accomplish this (using Geiser in Emacs 24): > > (use-modules (web request) (web response) (web uri) (rnrs bytevectors)) > > (define port (socket PF_INET SOCK_STREAM 0)) > (define address (addrinfo:addr (car (getaddrinfo "www.google.com" "http" > (connect port address) > (define request (build-request (build-uri 'http #:host "www.google.com"))) > (write-request request port) > (define response (read-response port)) > > (read-response ...) consistently fails with Google: > > web/http.scm:754:6: In procedure parse-asctime-date: > web/http.scm:754:6: Throw to key `bad-header' with args `(date "-1")'. I can confirm this with (call-with-input-string "Date: -1\r\n\r\n" parse-headers) > > The expiration is set to -1 in the headers, and this seems to cause a problem > for the web libraries in Guile. This is not IIRC a valid Date header, but is this common value? If so, it may be worth making an exception for it. > This same request seems to work well for my own domain (killring.org). > > I attempted a very similar series of commands to get RSS data for Google News: > > (define port (socket PF_INET SOCK_STREAM 0)) > (define address (addrinfo:addr (car (getaddrinfo "news.google.com" "http" > (connect port address) > (define request (build-request (build-uri 'http #:host "news.google.com" > #:path "/news?pz=1&cf=all&ned=us&hl=en&output=rss"))) > (write-request request port) > (define response (read-response port)) > (define body-vec (read-response-body response)) > > In this case, the (read-response-body ...) 
> returns #f, although when I pulled the data manually, there was XML data
> present in the body of the response.

I have also experienced this problem. read-response-body returns #f if there
is no Content-Length header, which usually means chunked encoding. I have a
patch to deal with this, but I have not received any feedback on my proposed
functions, so I haven't posted it yet. Basically, I wanted to add 4 functions,
including a read-chunked-response-body, and to have (web client) handle
chunked encoding transparently.

>
> Similarly, when getting RSS information from Slashdot:
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "rss.slashdot.org" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "rss.slashdot.org"
>                                           #:path "/Slashdot/slashdot")))
> (write-request request port)
> (define response (read-response port))
>
> I get the following error when reading the response:
>
> web/http.scm:814:12: In procedure parse-entity-tag:
> web/http.scm:814:12: Throw to key `bad-header' with args `(qstring
> "F+oOJMkOlp2n1IUbAJmq+7qCGuk")'.
>
> which I haven't fully tracked down yet.

I came across this issue already, and in my case it was because some servers
(gws, I think) don't quote their ETags. Feedburner was a common culprit. All
in all, not common, but a nuisance.

Using declare-header! from the (web http) module, you can stop ETags from
being parsed by doing

  (declare-header! "Etag" values string? display)

Although, I'd think it much nicer if Guile were to expose
declare-opaque-header! directly for these sorts of circumstances.

>
> I have a feeling I'm using the API incorrectly, though I've pored over the
> documentation the best I can to figure out how to make these requests and
> parse the responses. Short of writing my own implementation, is there
> anything I should be doing to make this work?
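The declare-header! workaround can be tried without any network access by parsing a header block from a string port. This is a minimal sketch, assuming the (web http) procedures declare-header! and read-headers as documented in the Guile manual; the header values are made up to mimic the Feedburner case. Note that declare-header! replaces the ETag handling for the whole process, not just for one request.

```scheme
(use-modules (web http))

;; Treat ETag as an opaque header: parse with `values' (keep the raw
;; string), validate with `string?', and write back with `display'.
;; This replaces the entity-tag parser for the whole Guile process.
(declare-header! "Etag" values string? display)

;; A header block with an unquoted ETag, like the one that made
;; parse-entity-tag throw `bad-header' in the report above.
(define headers
  (call-with-input-string
   "ETag: F+oOJMkOlp2n1IUbAJmq+7qCGuk\r\nServer: gws\r\n\r\n"
   read-headers))

;; Header names become lower-case symbols in the resulting alist.
(write (assq-ref headers 'etag))
(newline)
```

Other headers in the same block are still parsed by their normal parsers; only ETag is affected.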
No no, you're using it right :) That said, the (web client) module will
usually be more convenient. For example,

scheme@(guile-user)> ,use (web client)
scheme@(guile-user)> http-get
$11 = #
scheme@(guile-user)> (http-get (string->uri "http://www.google.com"))
$12 = #<<response> version: (1 . 1) code: 302 reason-phrase: "Found" headers: ((location . #<<uri> scheme: http userinfo: #f host: "www.google.co.uk" port: #f path: "/" query: #f fragment: #f>) (cache-control private) (content-type text/html (charset . "UTF-8")) (set-cookie . "PREF=ID=3c2c9fc50c288823:FF=0:TM=1320578334:LM=1320578334:S=Gtrhd05V1tRopJyZ; expires=Tue, 05-Nov-2013 11:18:54 GMT; path=/; domain=.google.com") (date . #) (server . "gws") (content-length . 221) (x-xss-protection . "1; mode=block") (x-frame-options . "SAMEORIGIN") (connection close)) port: #>
$13 = " 302 Moved 302 Moved The document has moved http://www.google.co.uk/\">here.\r
\r
"
scheme@(guile-user)>

>
> Thanks,
> Rick

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"
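Since http-get returns two values, the response record and the body (the $12 and $13 above), the usual next step is to inspect the response. The sketch below is illustrative only: redirect-target is a hypothetical helper, not part of (web client), and the 302 response is built locally with build-response so the example runs offline.

```scheme
(use-modules (web response)
             (web uri))

;; Hypothetical helper: return the target of a 3xx redirect as a URI
;; object, or #f for any other status code.
(define (redirect-target response)
  (let ((code (response-code response)))
    (and (<= 300 code 399)
         (response-location response))))

;; A response built offline that mirrors the 302 from www.google.com,
;; so the helper can be tried without touching the network.
(define google-302
  (build-response
   #:code 302
   #:headers `((location . ,(string->uri "http://www.google.co.uk/")))))

(display (uri->string (redirect-target google-302)))
(newline)
```

With a live request, both values can be bound with call-with-values, e.g. `(call-with-values (lambda () (http-get uri)) (lambda (res body) (redirect-target res)))`, and the result fed back into http-get to follow the redirect.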
Re: HTTP Request/Response questions
Thanks for your response, Ian. I don't know how I missed the (web client)
module, but it's right there in my info page. I've been experimenting with
it, but am having problems similar to those outlined below.

I'm going to start reading some of the code, but my initial impression is
that there's a lot of loose interpretation (or at least execution) of the
specs in the servers I'm testing on (Google, CNN), and that is causing
errors, e.g.

(http-get (string->uri "http://www.cnn.com"))

yields:

web/client.scm:109:4: In procedure http-get:
web/client.scm:109:4: Throw to key `bad-response' with args `("EOF while reading response body: ~a bytes of ~a" (18576 106274))'.

In web/client.scm:
   109:4  0 (http-get #<<uri> scheme: http userinfo: #f host: "www.cnn.com" port: #f path: "" query: #f fragment: #f> #:port # …)

In your google.com web client example, the request seemed to return the body
of the document, but I'm still encountering the -1 expiration problem (Guile
2.0.3, though I think I'll go back to the git repo if I can work around a
recent compilation error that showed up).

It might be useful for me to see if I can make the parsing functions more
permissive, since they are (correctly) throwing errors for some common
servers. Unfortunately, I don't know that much about the innards of HTTP, but
I'm sure I can look at where the errors are generated, short-circuit some of
the logic, and see what happens. =)

Thanks for your help with this.

Rick
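Before making the parsers more permissive, one way to keep experimenting is to catch their errors instead. A minimal sketch, assuming only that the web modules signal problems with the `bad-header` and `bad-response` keys seen in the backtraces in this thread; call-with-http-error-handler is a hypothetical name, not part of Guile.

```scheme
(use-modules (web http))

;; Hypothetical helper: run THUNK, and if the web modules throw one of the
;; keys seen in this thread, hand the key and its arguments to ON-ERROR
;; instead of aborting.  Any other exception is re-thrown unchanged.
(define (call-with-http-error-handler thunk on-error)
  (catch #t
    thunk
    (lambda (key . args)
      (if (memq key '(bad-header bad-response))
          (on-error key args)
          (apply throw key args)))))

;; The "Date: -1" header from this thread, parsed offline from a string.
(define result
  (call-with-http-error-handler
   (lambda ()
     (call-with-input-string "Date: -1\r\n\r\n" read-headers))
   (lambda (key args)
     (list 'failed key args))))

(write result)
(newline)
```

Wrapping each feed fetch the same way lets a batch job log the offending header and move on, rather than stopping at the first server that bends the spec.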
load-from-path and compile-file question
Hi all,

In the LilyPond initialization code, we currently build a list of Scheme files
to load from our %load-path and interpret them as we go. I'm hacking some code
so that, if the load-from-path succeeds, we then call
(compile-file (%search-load-path "blah.scm") /blah.go).

However, one of these files uses (current-module) to validate some interpreted
code: basically, it looks for a symbol name that is defined in the current
module and is bound to a procedure. The validation works as expected when the
.scm file is loaded via load-from-path, but fails for exactly the same symbol
when the file is compiled with compile-file.

I investigated by putting trace commands, (format #f "~s" (current-module)),
in the file being compiled. These showed different results when the file was
interpreted after being loaded and when it was being compiled: load-from-path
showed the expected module '(lily), while compile-file showed an "anonymous"
module with an internally generated name.

Tested with Guile V2.0.3. Is this a bug, or are we doing something seriously
screwy in our code?

Cheers,
Ian Hulin
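A hedged sketch of what may be going on: as far as I know, compile-file in Guile 2.x accepts an `#:env` keyword naming the module to compile in, and when it is omitted it uses a fresh anonymous module, which would match the trace output described above. compile-file-in-module below is a hypothetical wrapper, and whether compiling against (lily) suits LilyPond's validation code is an assumption.

```scheme
(use-modules (system base compile))

;; Hypothetical wrapper: byte-compile FILE to OUTPUT using MODULE as the
;; compile-time environment, so code run during expansion (macros,
;; eval-when forms) sees MODULE as (current-module).  Without #:env,
;; compile-file compiles in a fresh anonymous module.
(define (compile-file-in-module file module output)
  (compile-file file #:output-file output #:env module))

;; Possible use from the LilyPond startup code (paths are illustrative):
;; (compile-file-in-module (%search-load-path "blah.scm")
;;                         (resolve-module '(lily))
;;                         "/tmp/blah.go")
```

Only compile-time evaluation is affected; when the resulting .go file is loaded, its top-level definitions still go into whatever module is current at load time.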
Re: HTTP Request/Response questions
"R. P. Dillon" writes: > (http-get (string->uri "http://www.cnn.com";)) > > yields: > > web/client.scm:109:4: In procedure http-get: > web/client.scm:109:4: Throw to key `bad-response' with args `("EOF while > reading response body: ~a bytes of ~a" (18576 106274))'. > > In web/client.scm: > 109:4 0 (http-get #< scheme: http userinfo: #f host: "www.cnn.com" > port: #f path: "" query: #f fragment: #f> #:port # …) I see, http-get by default sends a "Connection: close" header, which is probably responsible for this behaviour. Using the keep-alive keyword argument should rectify this. (http-get (string->uri "http://www.cnn.com";) #:keep-alive? #t) > In your google.com web client example, the request seemed to return the body > of the document, but I'm still encountering the -1 expiration problem. (Guile > 2.0.3, though I think I'll go back to the git repo if I can work around a > recent compilation error that showed up). If you aren't needing the date header, then I'd suggest doing the same for the date header as I did for the etag header. It's a band-aid, but I'm not really sure why you'd be getting a -1 date. > Thanks for your help with this. No problem. I've also attached a patch for _reading_ chunk-encoded data. It will also modify http-get to handle that for you. Other Guilers, If you use the web modules, _please_ comment on my suggestions for chunked encoding support. See http://article.gmane.org/gmane.lisp.guile.devel/12814 for details. -- Ian Price "Programming is like pinball. 
The reward for doing it well is the opportunity to do it again" - from "The Wizardy Compiled" >From f58482fcae11690b23924334f7b89ba136a7fddc Mon Sep 17 00:00:00 2001 From: Ian Price Date: Sun, 6 Nov 2011 20:42:25 + Subject: [PATCH] Add support for transfer-encoded responses --- module/web/client.scm |4 ++- module/web/response.scm| 46 test-suite/tests/web-response.test | 25 +++ 3 files changed, 74 insertions(+), 1 deletions(-) diff --git a/module/web/client.scm b/module/web/client.scm index 6a04497..78d5201 100644 --- a/module/web/client.scm +++ b/module/web/client.scm @@ -107,7 +107,9 @@ (if (not keep-alive?) (shutdown port 1)) (let* ((res (read-response port)) - (body (read-response-body res))) + (body (if (member '(chunked) (response-transfer-encoding res)) + (read-chunked-response-body res) + (read-response-body res (if (not keep-alive?) (close-port port)) (values res diff --git a/module/web/response.scm b/module/web/response.scm index 6283772..e24ac0b 100644 --- a/module/web/response.scm +++ b/module/web/response.scm @@ -20,6 +20,8 @@ ;;; Code: (define-module (web response) + #:use-module (srfi srfi-1) + #:use-module (rnrs control) #:use-module (rnrs bytevectors) #:use-module (ice-9 binary-ports) #:use-module (ice-9 rdelim) @@ -39,6 +41,7 @@ read-response-body write-response-body +read-chunked-response-body ;; General headers ;; response-cache-control @@ -230,6 +233,49 @@ on @var{port}, perhaps using some transfer encoding." response @var{r}." (put-bytevector (response-port r) bv)) + +(define (read-chunk-header port) + (let* ((str (read-line port)) + (extension-start (string-index str (lambda (c) (or (char=? c #\;) + (char=? c #\return) + (size (string->number (if extension-start ; unnecessary? 
+ (substring str 0 extension-start) + str) + 16))) +size)) + +(define (read-chunk port) + (let ((size (read-chunk-header port))) +(read-chunk-body port size))) + +(define (read-chunk-body port size) + (let ((bv (get-bytevector-n port size))) +(get-u8 port) ; CR +(get-u8 port) ; LF +bv)) + +(define (read-chunked-response-body r) + (let ((port (response-port r))) +(let loop ((chunks '())) + (let ((chunk (read-chunk port))) +(if (zero? (bytevector-length chunk)) +(bytevector-concatenate (reverse! chunks)) +(loop (cons chunk chunks))) + +(define (bytevector-concatenate bvs) + (let* ((total-length (fold (lambda (bv total) + (+ (bytevector-length bv) total)) + 0 + bvs)) + (result (make-bytevector total-length))) +(let loop ((start 0) (bvs bvs)) + (unless (null? bvs) +(let ((len (bytevector-length (car bvs + (bytevector-copy! (car bvs) 0 result start len) + (loop (+ start len) (cdr bvs) +result)) + + (define-syntax define-response-accessor (lambda (x) (syntax-case x () diff --git a/test-suite/tes
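For trying out chunked decoding without applying the patch, here is a standalone sketch of the same idea over any binary port. It is not the patch's code: decode-chunked and read-crlf-line are hypothetical names, the chunk-size line is read byte by byte, chunk extensions after `;` are ignored, and optional trailer headers after the last chunk are skipped.

```scheme
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

;; Read one LF-terminated line from binary PORT and return it as a string
;; without the trailing CR LF, or the EOF object if PORT is exhausted.
;; Bytes map one-to-one to characters, which is enough for the ASCII
;; chunk-size lines used by chunked encoding.
(define (read-crlf-line port)
  (let loop ((chars '()))
    (let ((b (get-u8 port)))
      (cond
       ((eof-object? b)
        (if (null? chars) b (list->string (reverse chars))))
       ((= b 10)
        ;; Drop a CR immediately before the LF, if there is one.
        (list->string
         (reverse (if (and (pair? chars) (char=? (car chars) #\return))
                      (cdr chars)
                      chars))))
       (else (loop (cons (integer->char b) chars)))))))

;; Decode a chunked body from PORT into a single bytevector.
(define (decode-chunked port)
  (call-with-values open-bytevector-output-port
    (lambda (out get-bytes)
      (let loop ()
        (let ((line (read-crlf-line port)))
          (when (eof-object? line)
            (error "unexpected EOF in chunked body"))
          (let* ((semi (string-index line #\;)) ; extensions start at ';'
                 (size (string->number
                        (string-trim-both (if semi (substring line 0 semi) line))
                        16)))
            (unless size
              (error "bad chunk-size line" line))
            (if (zero? size)
                (begin
                  ;; Last chunk: skip any trailer headers up to the blank line.
                  (let skip ((l (read-crlf-line port)))
                    (unless (or (eof-object? l) (string-null? l))
                      (skip (read-crlf-line port))))
                  (get-bytes))
                (let ((data (get-bytevector-n port size)))
                  (when (or (eof-object? data) (< (bytevector-length data) size))
                    (error "truncated chunk" size))
                  (put-bytevector out data)
                  (read-crlf-line port)   ; the CRLF that ends the chunk data
                  (loop)))))))))
```

Given a response whose port is positioned at the start of a chunked body, `(decode-chunked (response-port response))` should return the body bytes, which is roughly what the patch's read-chunked-response-body aims to do inside (web response).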
command line argument locale for a guile script
When Guile 2.0 is used to write scripts, one has to call setlocale manually at
the beginning of the script to enable non-ASCII character support (why isn't
this done by default?). My problem is that the command-line arguments seem to
be parsed before any code in the script is executed (including the
setlocale), so non-ASCII arguments are not read correctly. Am I missing
something, or can anybody tell me how to read the arguments correctly? My
locale is en_US.UTF-8. Guile 1.8 works just fine.

$ cat test.scm
#!/usr/bin/guile
!#
(setlocale LC_ALL "")
(write (command-line))
$ ./test.scm 跪了
("./test.scm" "??")
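To see what actually arrived, it can help to print each argument's code points: if a non-ASCII argument shows up as code point 63 (`#\?`), the characters were already replaced before the script ran, whereas a correctly decoded one shows code points above 127. A minimal diagnostic sketch; arg-code-points is a hypothetical helper, and the setlocale call is guarded so the script still runs if the environment names a locale that is not installed.

```scheme
;; Hypothetical helper: list the Unicode code points of a string.
(define (arg-code-points arg)
  (map char->integer (string->list arg)))

;; Install the user's locale as the original script does, but do not
;; abort if the environment names a locale that is not installed.
(false-if-exception (setlocale LC_ALL ""))

;; Print every argument next to its code points.
(for-each (lambda (arg)
            (write arg)
            (display " ")
            (write (arg-code-points arg))
            (newline))
          (command-line))
```

Run it like test.scm above (with the same `#!/usr/bin/guile` and `!#` header lines) to compare the output under Guile 1.8 and 2.0.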