"R. P. Dillon" <rpdil...@gmail.com> writes: > I'm currently working on a project to gather RSS data using Guile. I've been I've done that. I highly recommend sxpath for this job.
> working with both the stable 2.0.3 version and the latest git repository. I'm
> fairly new to Guile, though, so I might be approaching this the wrong way.
>
> As a test, I wanted to make an HTTP request. This is a series of commands I
> executed in the REPL to accomplish this (using Geiser in Emacs 24):
>
> (use-modules (web request) (web response) (web uri) (rnrs bytevectors))
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "www.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "www.google.com")))
> (write-request request port)
> (define response (read-response port))
>
> (read-response ...) consistently fails with Google:
>
> web/http.scm:754:6: In procedure parse-asctime-date:
> web/http.scm:754:6: Throw to key `bad-header' with args `(date "-1")'.

I can confirm this with

(call-with-input-string "Date: -1\r\n\r\n" parse-headers)

>
> The expiration is set to -1 in the headers, and this seems to cause a problem
> for the web libraries in Guile.

This is not IIRC a valid Date header, but is this a common value? If so,
it may be worth making an exception for it.

> This same request seems to work well for my own domain (killring.org).
>
> I attempted a very similar series of commands to get RSS data for Google News:
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "news.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "news.google.com"
>   #:path "/news?pz=1&cf=all&ned=us&hl=en&output=rss")))
> (write-request request port)
> (define response (read-response port))
> (define body-vec (read-response-body response))
>
> In this case, the (read-response-body ...) returns #f, although when I pulled
> the data manually, there was XML data present in the body of the response.

I have also experienced this problem.
read-response-body returns #f if there is no content-length header,
which usually means chunked encoding. I have a patch to deal with this,
but I have not received any feedback on my proposed functions, so I
haven't posted it yet. Basically, I wanted to add 4 functions, including
a read-chunked-response-body, and to have the (web client) handle
chunked-encoding transparently.

>
> Similarly, when getting RSS information from Slashdot:
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "rss.slashdot.org" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "rss.slashdot.org"
>   #:path "/Slashdot/slashdot")))
> (write-request request port)
> (define response (read-response port))
>
> I get the following error when reading the response:
>
> web/http.scm:814:12: In procedure parse-entity-tag:
> web/http.scm:814:12: Throw to key `bad-header' with args `(qstring
> "F+oOJMkOlp2n1IUbAJmq+7qCGuk")'.
>
> which I haven't fully tracked down yet.

I came across this issue already, and in my case it was because some
servers (gws, I think) don't quote their Etags. Feedburner was a common
culprit. All in all, not common, but a nuisance.

Using 'declare-header!' from the (web http) library, you can cause Etags
not to be parsed by doing

(declare-header! "Etag" values string? display)

Although, I'd think it much nicer if Guile were to expose
declare-opaque-header! directly for these sorts of circumstances.

>
> I have a feeling I'm using the API incorrectly, though I've pored over the
> documentation the best I can to figure out how to make these requests and
> parse the responses. Short of writing my own implementation, is there
> anything I should be doing to make this work?

No no, you're using it right :) Although the (web client) module will
usually be more convenient.
For example,

scheme@(guile-user)> ,use (web client)
scheme@(guile-user)> http-get
$11 = #<procedure http-get (uri #:key port version keep-alive? extra-headers decode-body?)>
scheme@(guile-user)> (http-get (string->uri "http://www.google.com"))
$12 = #<<response> version: (1 . 1) code: 302 reason-phrase: "Found" headers: ((location . #<<uri> scheme: http userinfo: #f host: "www.google.co.uk" port: #f path: "/" query: #f fragment: #f>) (cache-control private) (content-type text/html (charset . "UTF-8")) (set-cookie . "PREF=ID=3c2c9fc50c288823:FF=0:TM=1320578334:LM=1320578334:S=Gtrhd05V1tRopJyZ; expires=Tue, 05-Nov-2013 11:18:54 GMT; path=/; domain=.google.com") (date . #<date nanosecond: 0 second: 54 minute: 18 hour: 11 day: 6 month: 11 year: 2011 zone-offset: 0>) (server . "gws") (content-length . 221) (x-xss-protection . "1; mode=block") (x-frame-options . "SAMEORIGIN") (connection close)) port: #<closed: file 0>>
$13 = "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF=\"http://www.google.co.uk/\">here</A>.\r
</BODY></HTML>\r
"
scheme@(guile-user)>

>
> Thanks,
> Rick
>

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"
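
For the chunked-encoding case discussed above (no content-length header,
so read-response-body returns #f), here is a minimal sketch of what
decoding a chunked body by hand might look like. It is not the patch
mentioned in the message and not part of the (web ...) modules:
decode-chunked-body, read-chunk-size and bytevector-concatenate are
hypothetical names chosen for illustration. The sketch assumes the
response headers have already been consumed from the port, and it skips
any trailer headers instead of parsing them.

```scheme
;; Sketch only: decode an HTTP/1.1 chunked body from PORT into a bytevector.
;; All procedure names defined here are hypothetical, not Guile API.
(use-modules (ice-9 binary-ports)   ; get-bytevector-n, open-bytevector-input-port
             (ice-9 rdelim)         ; read-line
             (rnrs bytevectors))    ; bytevector-copy!, utf8->string, string->utf8

(define (read-chunk-size port)
  ;; A chunk starts with "<hex-size>[;extension...]\r\n".
  (let ((line (read-line port)))
    (if (eof-object? line)
        (error "unexpected end of input while reading chunk size")
        (let* ((end (or (string-index line #\;) (string-length line)))
               (hex (string-trim-both (substring line 0 end)))
               (n   (string->number hex 16)))
          (if (and (integer? n) (exact? n) (>= n 0))
              n
              (error "bad chunk size line" line))))))

(define (bytevector-concatenate bvs)
  ;; Join a list of bytevectors into one freshly allocated bytevector.
  (let* ((total (apply + (map bytevector-length bvs)))
         (out   (make-bytevector total)))
    (let loop ((bvs bvs) (pos 0))
      (if (null? bvs)
          out
          (let ((len (bytevector-length (car bvs))))
            (bytevector-copy! (car bvs) 0 out pos len)
            (loop (cdr bvs) (+ pos len)))))))

(define (decode-chunked-body port)
  (let loop ((chunks '()))
    (let ((size (read-chunk-size port)))
      (if (zero? size)
          (begin
            ;; Last chunk: skip optional trailer lines up to the blank line.
            (let skip ()
              (let ((line (read-line port)))
                (unless (or (eof-object? line)
                            (string-null? (string-trim-both line)))
                  (skip))))
            (bytevector-concatenate (reverse chunks)))
          (let ((data (get-bytevector-n port size)))
            (when (or (eof-object? data)
                      (< (bytevector-length data) size))
              (error "truncated chunk, expected bytes:" size))
            (read-line port)            ; consume the CRLF after the chunk data
            (loop (cons data chunks)))))))

;; Usage on canned input rather than a live socket:
(define sample
  (open-bytevector-input-port
   (string->utf8 "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")))
(display (utf8->string (decode-chunked-body sample)))
(newline)
```

With the socket code from the quoted message, one would presumably call
this on (response-port response) when read-response-body returns #f;
that wiring is untested here.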