Re: HTTP Request/Response questions

2011-11-06 Thread Ian Price
"R. P. Dillon"  writes:

> I'm currently working on a project to gather RSS data using Guile.
I've done that. I highly recommend sxpath for this job.

> I've been working with both the stable 2.0.3 version and the latest git
> repository.  I'm fairly new to Guile, though, so I might be approaching
> this the wrong way.
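Here is a minimal sketch of the sxpath approach for RSS. It assumes Guile's `(sxml simple)` and `(sxml xpath)` modules and the usual RSS 2.0 rss/channel/item/title layout. `rss-item-titles` is a hypothetical helper name, not part of Guile.

```scheme
(use-modules (sxml simple)   ; xml->sxml
             (sxml xpath))   ; sxpath

;; Hypothetical helper: return the text of every <item><title> in an
;; RSS 2.0 document given as a string.
(define (rss-item-titles xml-string)
  (let* ((doc (call-with-input-string xml-string xml->sxml))
         ;; Each symbol in the path selects child elements with that name,
         ;; starting from the (*TOP* ...) node that xml->sxml returns.
         (titles ((sxpath '(rss channel item title)) doc)))
    ;; A title node looks like (title "text"). Keep only its string
    ;; children so that an attribute list (@ ...) would be skipped.
    (map (lambda (node)
           (apply string-append (filter string? (cdr node))))
         titles)))
```

In practice the string would be the decoded body of the HTTP response. Selecting by path keeps channel-level elements such as the feed's own `<title>` out of the result.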
>
> As a test, I wanted to make an HTTP request.  This is a series of commands I
> executed in the REPL to accomplish this (using Geiser in Emacs 24):
>
> (use-modules (web request) (web response) (web uri) (rnrs bytevectors))
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "www.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "www.google.com")))
> (write-request request port)
> (define response (read-response port))
>
> (read-response ...) consistently fails with Google:
>
> web/http.scm:754:6: In procedure parse-asctime-date:
> web/http.scm:754:6: Throw to key `bad-header' with args `(date "-1")'.
I can confirm this with (call-with-input-string "Date: -1\r\n\r\n" 
parse-headers)

>
> The expiration is set to -1 in the headers, and this seems to cause a problem
> for the web libraries in Guile.
This is not, IIRC, a valid Date header, but is it a common value? If so, it
may be worth making an exception for it.

> This same request seems to work well for my own domain (killring.org).
>
> I attempted a very similar series of commands to get RSS data for Google News:
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "news.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "news.google.com"
> #:path "/news?pz=1&cf=all&ned=us&hl=en&output=rss")))
> (write-request request port)
> (define response (read-response port))
> (define body-vec (read-response-body response))
>
> In this case, the (read-response-body ...) returns #f, although when I pulled
> the data manually, there was XML data present in the body of the response.
I have also experienced this problem. read-response-body returns #f if
there is no content-length header, which usually means chunked
encoding.

I have a patch to deal with this, but I have not received any feedback on
my proposed functions, so I haven't posted it yet. Basically, I wanted to
add four functions, including a read-chunked-response-body, and to have
(web client) handle chunked encoding transparently.
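To make the chunked case concrete: with `Transfer-Encoding: chunked`, each chunk is a hexadecimal size line (optionally followed by `;extensions`), CRLF, that many bytes of data, and another CRLF. A chunk of size 0 ends the body and may be followed by trailer headers and a final empty line (RFC 2616, section 3.6.1). The following is a minimal standalone sketch of a decoder over a binary port. It assumes Guile 2.0's `(ice-9 binary-ports)` and `(rnrs bytevectors)`. The names `read-crlf-line`, `chunk-size`, `concatenate-bytevectors` and `read-chunked-body` are hypothetical and are not the functions from the patch mentioned here.

```scheme
(use-modules (rnrs bytevectors)      ; make-bytevector, string->utf8, ...
             (ice-9 binary-ports))   ; get-u8, get-bytevector-n

;; Read bytes up to and including LF; return them (minus CR/LF) as a string.
(define (read-crlf-line port)
  (let loop ((chars '()))
    (let ((b (get-u8 port)))
      (cond ((eof-object? b) (error "unexpected EOF in chunked body"))
            ((= b 10) (list->string (reverse chars)))   ; LF ends the line
            ((= b 13) (loop chars))                      ; drop CR
            (else (loop (cons (integer->char b) chars)))))))

;; Parse "1a;ext=foo" -> 26. Chunk extensions after ";" are ignored.
(define (chunk-size line)
  (let* ((semi (string-index line #\;))
         (hex (string-trim-both (if semi (substring line 0 semi) line)))
         (n (string->number hex 16)))
    (if (and n (integer? n) (exact? n) (>= n 0))
        n
        (error "bad chunk-size line" line))))

;; Append a list of bytevectors into one fresh bytevector.
(define (concatenate-bytevectors bvs)
  (let ((out (make-bytevector (apply + (map bytevector-length bvs)))))
    (let loop ((bvs bvs) (pos 0))
      (if (null? bvs)
          out
          (let ((len (bytevector-length (car bvs))))
            (bytevector-copy! (car bvs) 0 out pos len)
            (loop (cdr bvs) (+ pos len)))))))

;; Decode a whole chunked body from PORT and return it as one bytevector.
(define (read-chunked-body port)
  (let loop ((chunks '()))
    (let ((size (chunk-size (read-crlf-line port))))
      (if (zero? size)
          ;; Skip any trailer headers; they end with an empty line.
          (let skip-trailers ()
            (if (string-null? (read-crlf-line port))
                (concatenate-bytevectors (reverse chunks))
                (skip-trailers)))
          (let ((bv (get-bytevector-n port size)))
            (if (or (eof-object? bv) (< (bytevector-length bv) size))
                (error "truncated chunk"))
            (read-crlf-line port)          ; the CRLF after the data
            (loop (cons bv chunks)))))))

;; Example: two chunks, "Wiki" and "pedia", then the terminating 0 chunk.
(utf8->string
 (read-chunked-body
  (open-bytevector-input-port
   (string->utf8 "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))))
;; => "Wikipedia"
```

The decoder reads byte by byte rather than with read-line. This keeps the whole port binary, so the chunk data is never run through a character decoder.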

>
> Similarly, when getting RSS information from Slashdot:
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "rss.slashdot.org" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "rss.slashdot.org"
> #:path "/Slashdot/slashdot")))
> (write-request request port)
> (define response (read-response port))
>
> I get the following error when reading the response:
>
> web/http.scm:814:12: In procedure parse-entity-tag:
> web/http.scm:814:12: Throw to key `bad-header' with args `(qstring
> "F+oOJMkOlp2n1IUbAJmq+7qCGuk")'.
>
> which I haven't fully tracked down yet.
I came across this issue already, and in my case it was because some servers
(gws, I think) don't quote their Etags. Feedburner was a common
culprit. All in all, not common, but a nuisance. Using 'declare-header!'
from the (web http) library, you can cause Etags not to be parsed by doing

(declare-header! "Etag" values string? display)

That said, I think it would be much nicer if Guile exposed
declare-opaque-header! directly for these sorts of circumstances.
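Until something like declare-opaque-header! is exported, a tiny wrapper gives the same effect. This is a minimal sketch, assuming Guile 2.0's `(web http)` exports `declare-header!` (name, parser, validator, writer) and `read-headers`. `declare-raw-header!` is a hypothetical name and not part of Guile.

```scheme
(use-modules (web http))

;; Hypothetical helper: register NAME (a string such as "ETag") so that its
;; value is kept as the raw string instead of being parsed.
;;   parser:    values   -> return the string unchanged
;;   validator: string?  -> any string is acceptable
;;   writer:    display  -> called as (display value port)
(define (declare-raw-header! name)
  (declare-header! name values string? display))

(declare-raw-header! "ETag")

;; An unquoted ETag like the Slashdot/Feedburner one no longer throws
;; bad-header when the headers are read.
(call-with-input-string "ETag: F+oOJMkOlp2n1IUbAJmq+7qCGuk\r\n\r\n"
  read-headers)
```

The trade-off is that code consuming the header now receives a plain string rather than Guile's parsed entity-tag value.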

>
> I have a feeling I'm using the API incorrectly, though I've pored over the
> documentation the best I can to figure out how to make these requests and
> parse the responses.  Short of writing my own implementation, is there
> anything I should be doing to make this work?
No, no, you're using it right :) The (web client) module will usually be
more convenient, though. For example,

scheme@(guile-user)> ,use (web client)
scheme@(guile-user)> http-get
$11 = #
scheme@(guile-user)> (http-get (string->uri "http://www.google.com"))
$12 = #< version: (1 . 1) code: 302 reason-phrase: "Found" headers: 
((location . #< scheme: http userinfo: #f host: "www.google.co.uk" port: 
#f path: "/" query: #f fragment: #f>) (cache-control private) (content-type 
text/html (charset . "UTF-8")) (set-cookie . 
"PREF=ID=3c2c9fc50c288823:FF=0:TM=1320578334:LM=1320578334:S=Gtrhd05V1tRopJyZ; 
expires=Tue, 05-Nov-2013 11:18:54 GMT; path=/; domain=.google.com") (date . 
#) (server . "gws") (content-length . 221) (x-xss-protection . 
"1; mode=block") (x-frame-options . "SAMEORIGIN") (connection close)) port: 
#>
$13 = "
302 Moved
302 Moved
The document has moved
http://www.google.co.uk/\">here.\r
\r
"
scheme@(guile-user)> 

>
> Thanks,
> Rick
>

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"



Re: HTTP Request/Response questions

2011-11-06 Thread R. P. Dillon
Thanks for your response, Ian.  I don't know how I missed the (web client)
module, but it's right there in my info page.

I've been experimenting with it, but am having problems similar to those
outlined below.  I'm going to start reading some of the code, but my
initial impression is that there is a lot of loose interpretation (or at
least execution) of the specs in the servers I'm testing on (Google, CNN),
and that this is causing errors, e.g.

(http-get (string->uri "http://www.cnn.com"))

yields:

web/client.scm:109:4: In procedure http-get:
web/client.scm:109:4: Throw to key `bad-response' with args `("EOF while
reading response body: ~a bytes of ~a" (18576 106274))'.

In web/client.scm:
109:4  0 (http-get #< scheme: http userinfo: #f host: "www.cnn.com"
port: #f path: "" query: #f fragment: #f> #:port # …)

In your google.com web client example, the request seemed to return the
body of the document, but I'm still encountering the -1 expiration problem.
(Guile 2.0.3, though I think I'll go back to the git repo if I can work
around a recent compilation error that showed up).

It might be useful for me to see if I can make the parsing functions more
permissive, since they are (correctly) throwing errors for some common
servers.  Unfortunately, I don't know that much about the innards of HTTP,
but I'm sure I can look at where the errors are generated, short-circuit
some of the logic and see what happens.  =)

Thanks for your help with this.

Rick


load-from-path and compile-file question

2011-11-06 Thread Ian Hulin
Hi all,

In LilyPond initialization code, at the moment we build a list of
scheme files to load from our %load-path and interpret as we go.

I'm currently hacking some code so that if we do the load-from-path
successfully we then call
(compile-file (%search-load-path "blah.scm") /blah.go).

However, one of these files has code that uses (current-module) to validate
some interpreted code, essentially looking for a symbol name that is
defined in the current module and bound to a procedure.

The procedure works fine and validates as expected when the .scm file
is loaded via load-from-path, but fails validation of exactly the same
symbol when the file is being compiled using compile-file.

I investigated by putting trace commands, (format #f "~s" (current-module)),
in the file being compiled, and they showed different results when the file
was interpreted after being loaded and when it was being compiled.

load-from-path showed the expected module (lily); compile-file showed an
"anonymous" module with an internally generated name.

Tested with Guile V2.0.3.

Is this a bug, or are we doing something seriously screwy in our code?

Cheers,
Ian Hulin




Re: HTTP Request/Response questions

2011-11-06 Thread Ian Price
"R. P. Dillon"  writes:

> (http-get (string->uri "http://www.cnn.com"))
>
> yields:
>
> web/client.scm:109:4: In procedure http-get:
> web/client.scm:109:4: Throw to key `bad-response' with args `("EOF while
> reading response body: ~a bytes of ~a" (18576 106274))'.
>
> In web/client.scm:
>     109:4  0 (http-get #< scheme: http userinfo: #f host: "www.cnn.com"
> port: #f path: "" query: #f fragment: #f> #:port # …)
I see. By default, http-get sends a "Connection: close" header, which is
probably responsible for this behaviour. Using the #:keep-alive? keyword
argument should rectify this.

  (http-get (string->uri "http://www.cnn.com") #:keep-alive? #t)

> In your google.com web client example, the request seemed to return the body
> of the document, but I'm still encountering the -1 expiration problem. (Guile
> 2.0.3, though I think I'll go back to the git repo if I can work around a
> recent compilation error that showed up).
If you don't need the date header, then I'd suggest doing the same for it
as I did for the ETag header. It's a band-aid, but I'm not really sure why
you'd be getting a -1 date.
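A slightly less blunt band-aid than keeping the raw string is to keep Guile's own parser and fall back to the raw string only when it throws `bad-header`. Valid dates still parse normally. Rick describes the -1 as the expiration, so the header in question is presumably Expires. RFC 2616 (section 14.21) tells clients to treat invalid Expires dates as already in the past, so a caller can interpret a string value that way. This is a minimal sketch, assuming Guile 2.0's `(web http)` exports `string->header`, `header-parser`, `header-validator`, `header-writer`, `declare-header!` and `parse-header`. `declare-lenient-header!` is a hypothetical name.

```scheme
(use-modules (web http))

;; Hypothetical helper: keep Guile's parser for NAME, but if it throws
;; `bad-header', store the raw string instead of failing the whole response.
(define (declare-lenient-header! name)
  (let* ((sym (string->header name))
         ;; Capture the current (strict) procedures before redeclaring.
         (strict-parse (header-parser sym))
         (strict-valid? (header-validator sym))
         (strict-write (header-writer sym)))
    (declare-header! name
      (lambda (str)
        (catch 'bad-header
          (lambda () (strict-parse str))
          (lambda (key . args) str)))      ; fall back to the raw string
      (lambda (val)
        (or (string? val) (strict-valid? val)))
      (lambda (val port)
        (if (string? val)
            (display val port)
            (strict-write val port))))))

(declare-lenient-header! "Expires")

(parse-header 'expires "-1")   ; the raw string "-1" instead of a throw
```

Callers then need to handle two value types, a parsed date or a string, but a well-formed header behaves exactly as before.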

> Thanks for your help with this.
No problem.

I've also attached a patch for _reading_ chunk-encoded data. It will
also modify http-get to handle that for you.


Other Guilers,

If you use the web modules, _please_ comment on my suggestions for
chunked encoding support. See
http://article.gmane.org/gmane.lisp.guile.devel/12814 for details.

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"

From f58482fcae11690b23924334f7b89ba136a7fddc Mon Sep 17 00:00:00 2001
From: Ian Price 
Date: Sun, 6 Nov 2011 20:42:25 +
Subject: [PATCH] Add support for transfer-encoded responses

---
 module/web/client.scm              |    4 ++-
 module/web/response.scm            |   46
 test-suite/tests/web-response.test |   25 +++
 3 files changed, 74 insertions(+), 1 deletions(-)

diff --git a/module/web/client.scm b/module/web/client.scm
index 6a04497..78d5201 100644
--- a/module/web/client.scm
+++ b/module/web/client.scm
@@ -107,7 +107,9 @@
     (if (not keep-alive?)
         (shutdown port 1))
     (let* ((res (read-response port))
-           (body (read-response-body res)))
+           (body (if (member '(chunked) (response-transfer-encoding res))
+                     (read-chunked-response-body res)
+                     (read-response-body res))))
       (if (not keep-alive?)
           (close-port port))
       (values res
diff --git a/module/web/response.scm b/module/web/response.scm
index 6283772..e24ac0b 100644
--- a/module/web/response.scm
+++ b/module/web/response.scm
@@ -20,6 +20,8 @@
 ;;; Code:
 
 (define-module (web response)
+  #:use-module (srfi srfi-1)
+  #:use-module (rnrs control)
   #:use-module (rnrs bytevectors)
   #:use-module (ice-9 binary-ports)
   #:use-module (ice-9 rdelim)
@@ -39,6 +41,7 @@
             read-response-body
             write-response-body
 
+            read-chunked-response-body
             ;; General headers
             ;;
             response-cache-control
@@ -230,6 +233,49 @@ on @var{port}, perhaps using some transfer encoding."
 response @var{r}."
   (put-bytevector (response-port r) bv))
 
+
+(define (read-chunk-header port)
+  (let* ((str (read-line port))
+         (extension-start (string-index str (lambda (c) (or (char=? c #\;)
+                                                            (char=? c #\return)))))
+         (size (string->number (if extension-start ; unnecessary?
+                                   (substring str 0 extension-start)
+                                   str)
+                               16)))
+    size))
+
+(define (read-chunk port)
+  (let ((size (read-chunk-header port)))
+    (read-chunk-body port size)))
+
+(define (read-chunk-body port size)
+  (let ((bv (get-bytevector-n port size)))
+    (get-u8 port)   ; CR
+    (get-u8 port)   ; LF
+    bv))
+
+(define (read-chunked-response-body r)
+  (let ((port (response-port r)))
+    (let loop ((chunks '()))
+      (let ((chunk (read-chunk port)))
+        (if (zero? (bytevector-length chunk))
+            (bytevector-concatenate (reverse! chunks))
+            (loop (cons chunk chunks)))))))
+
+(define (bytevector-concatenate bvs)
+  (let* ((total-length (fold (lambda (bv total)
+                               (+ (bytevector-length bv) total))
+                             0
+                             bvs))
+         (result (make-bytevector total-length)))
+    (let loop ((start 0) (bvs bvs))
+      (unless (null? bvs)
+        (let ((len (bytevector-length (car bvs))))
+          (bytevector-copy! (car bvs) 0 result start len)
+          (loop (+ start len) (cdr bvs)))))
+    result))
+
+
 (define-syntax define-response-accessor
   (lambda (x)
 (syntax-case x ()
diff --git a/test-suite/tes

command line argument locale for a guile script

2011-11-06 Thread cong gu
When Guile 2.0 is used to write scripts, one has to manually call setlocale
at the beginning of the script to enable non-ASCII character support (why
is this not the default?).

My problem is that the command-line arguments seem to be parsed before any
code in the script is executed (including the setlocale call), so non-ASCII
arguments are not read correctly.  Am I missing something, or can anybody
tell me how to read the arguments correctly?

My locale is en_US.UTF-8.  Guile 1.8 works just fine.

$ cat test.scm
#!/usr/bin/guile
!#
(setlocale LC_ALL "")
(write (command-line))

$ ./test.scm 跪了
("./test.scm" "??")