In case no one offers a better library, enclosed is a small one that I
recently created for a web-scraping task.

Create a simulated browser session with `make-connection`, use `goto!` to
follow a link given as a relative URL (redirects are followed as well),
and use `back!` to return to the previous page. The `goto!` function
returns two values: the response headers as a string and the page
content as bytes.

Beware: my application accessed only a single site, so this library makes
no attempt to handle cookies correctly across multiple sites.
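For anyone who can't open the attachment, here is a minimal sketch of the underlying idea for a single site, built on `http-sendrecv` from `net/http-client`. It is not the code in connection.rkt, and the names `jar-request`, `jar-update`, `cookie-header`, and `set-cookie-pairs` are hypothetical. It assumes one host, keeps only each cookie's name and value, and ignores the Path, Domain, and Expires attributes as well as redirects.

```racket
#lang racket/base
;; Minimal single-site cookie jar over net/http-client.
;; A sketch for illustration: all names below are hypothetical and
;; this is NOT the code in the attached connection.rkt.
(require racket/list          ; filter-map
         racket/port          ; port->bytes
         racket/string        ; string-trim, string-join
         net/http-client      ; http-sendrecv
         net/uri-codec)       ; alist->form-urlencoded

;; Matches "Set-Cookie: name=value; ..." (header name case-insensitive)
;; and captures only the name and value; attributes are ignored.
(define set-cookie-rx #rx"^(?i:set-cookie):[ \t]*([^=;]*)=([^;]*)")

;; Header lines (bytes or strings) -> list of (name . value) pairs.
(define (set-cookie-pairs header-lines)
  (filter-map
   (lambda (line)
     (define m (regexp-match set-cookie-rx
                             (if (bytes? line)
                                 (bytes->string/latin-1 line)
                                 line)))
     (and m (cons (string-trim (cadr m)) (string-trim (caddr m)))))
   header-lines))

;; Returns a new immutable jar (hash of name -> value) with any cookies
;; set by the response headers added or overwritten.
(define (jar-update jar header-lines)
  (for/fold ([jar jar]) ([pair (in-list (set-cookie-pairs header-lines))])
    (hash-set jar (car pair) (cdr pair))))

;; Builds a "Cookie: a=1; b=2" request header, or #f if the jar is empty.
(define (cookie-header jar)
  (and (positive? (hash-count jar))
       (string-append
        "Cookie: "
        (string-join (for/list ([name (in-list (sort (hash-keys jar) string<?))])
                       (string-append name "=" (hash-ref jar name)))
                     "; "))))

;; Sends one request carrying the jar's cookies. `form` is an optional
;; alist of (symbol . string) pairs sent as an urlencoded body. Returns the
;; updated jar, the status line, the response header lines, and the body
;; bytes. Redirects are NOT followed.
(define (jar-request jar host path
                     #:method [method "GET"]
                     #:form [form #f]
                     #:ssl? [ssl? #f])
  (define cookie (cookie-header jar))
  (define-values (status headers in)
    (http-sendrecv host path
                   #:ssl? ssl?
                   #:method method
                   #:headers
                   (append (if cookie (list cookie) '())
                           (if form
                               (list "Content-Type: application/x-www-form-urlencoded")
                               '()))
                   #:data (and form (alist->form-urlencoded form))))
  (define body (port->bytes in))
  (close-input-port in)
  (values (jar-update jar headers) status headers body))

;; Example usage (performs network requests, so left commented out),
;; mirroring the login/get-page flow from the question:
;;
;; (define-values (jar status headers body)
;;   (jar-request (hash) "www.example.com" "/login"
;;                #:method "POST"
;;                #:form `((username . ,username) (password . ,password))))
;; (define-values (jar2 status2 headers2 page)
;;   (jar-request jar "www.example.com" "/data/1"))
```

Threading the jar through return values keeps each request's cookie state explicit; a mutable jar inside a connection object, as the attached library presumably uses, would be the more browser-like design.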

At Wed, 08 Jan 2014 03:48:44 -0800, Duncan Bayne wrote:
> Hi All,
> 
> I'm trying to re-write some Common Lisp web-scraping code in Racket.
> 
> In Common Lisp, I'm POSTing a login request, and storing the cookie-jar
> for subsequent GETs:
> 
> (defun login (username password)
>   "Logs in to www.example.com.  Returns a cookie-jar containing
>   authentication details."
>   (let ((cookie-jar (make-instance 'drakma:cookie-jar)))
>     (drakma:http-request "http://www.example.com/login"
>                          :method :post
>                          :parameters `(("username" . ,username)
>                                        ("password" . ,password))
>                          :cookie-jar cookie-jar)
>     cookie-jar))
> 
> ; snip
> 
> (defun get-page (page-num cookie-jar)
>   "Downloads a potentially invalid HTML page containing data to scrape.
>   Returns a string containing the HTML."
>   (let ((url (concatenate 'string "http://www.example.com/data/"
>                           (write-to-string page-num))))
>     (let ((body (drakma:http-request url :cookie-jar cookie-jar)))
>       (if (search "No data found." body)
>           nil
>           body))))
> 
> However, I can't find an equivalent in Racket. The latest HTTP
> library[1] makes no mention of cookies at all, and AFAICT the cookie
> library[2] seems more about correctly serializing and deserializing
> them.
> 
> Can anyone suggest a way of re-writing the above CL in Racket without
> having to implement a bunch of header-parsing stuff?
> 
> TIA for any help ...
> 
> [1] https://github.com/plt/racket/blob/master/racket/collects/net/http-client.rkt
> [2] http://docs.racket-lang.org/net/cookie.html
> 
> -- 
> Duncan Bayne
> ph: +61 420817082 | web: http://duncan-bayne.github.com/ | skype: duncan_bayne
> 
> I usually check my mail every 24 - 48 hours.  If there's something
> urgent going on, please send me an SMS or call me.
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users

Attachment: connection.rkt
