Andy,

On 10 January 2012 22:46, Andy Wingo <wi...@pobox.com> wrote:

> Hi Catonano,
>
> On Fri 30 Dec 2011 23:58, Catonano <caton...@gmail.com> writes:
>
> > I'm a beginner, I never wrote a single line of LISP or Scheme in my life
> > and I'm here for asking for directions and suggestions.
>
> Welcome! :-)
>

Thank you so much for your reply. I had been eagerly waiting for a signal
from the list, and I missed it! I'm sorry.

Gmail's learning mechanism still hasn't learned enough about my interest
in this topic, so it didn't promptly notify me of your reply. I had to
dig through the folder structure I had laid out in order to find it.
As for me, I haven't learned enough about the quirks of Gmail's learning
mechanism. I guess we're both learning now.

Well, I was attempting a joke ;-)



> > my boldness is such that I'd ask you to write for me an example
> > skeleton code.
>
>
> Hey, it's fair, I think; that is a new part of Guile, and there is not a
> lot of example code.
>
>
Thanks, Andy, I'm grateful for this. I actually managed to set up Geiser,
load a file, and get to a prompt in which that file is loaded.
Cool ;-) But there were still some things I didn't know that your post made
clear.


> Generally, we figure out how to solve problems at the REPL, so fire up
> your Guile:
>
>  $ guile
>  ...
>  scheme@(guile-user)>
>
> (Here I'm assuming you have guile 2.0.3.)
>


> Use the web modules.  Let's assume we're grabbing http://www.gnu.org/,
> for simplicity:
>
>  > (use-modules (web client) (web uri))
>  > (http-get (string->uri "http://www.gnu.org/software/guile/"))
>  [here the text of the web page gets printed out]
>

OK, I had managed to get this far (thanks to the help I received in the
Guile channel on IRC).

>
> Actually there are two return values: the response object, corresponding
> to the headers, and the body.  If you scroll your terminal up, you'll
> see that they get labels like $1 and $2.
>

I didn't know there were two values, thanks.
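
For the archive, here is a minimal sketch of how the two values could be
captured in a file rather than through the REPL's $1 and $2. It uses
`receive` from (ice-9 receive); `fake-http-get` is a made-up stand-in that
returns two values the way http-get is described above, so the snippet runs
without a network connection. This is only my understanding, not a
definitive recipe.

```scheme
(use-modules (ice-9 receive))

;; Hypothetical stand-in for http-get: returns two values,
;; a placeholder for the response object and a body string.
(define (fake-http-get)
  (values 'response-placeholder "<html><head></head></html>"))

;; Bind both return values to names and use the second one.
(receive (response body) (fake-http-get)
  (display body)
  (newline))
```

With the real thing, the call to `fake-http-get` would presumably be
replaced by the http-get call from Andy's message.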

>
> Now you need to parse the HTML.  The best way to do this is with the
> pragmatic HTML parser, htmlprag.  It's part of guile-lib.  So download
> and install guile-lib (it's at http://www.non-gnu.org/guile-lib/), and
> then, assuming the html is in $2:
>

I had seen those $i things, but I hadn't understood that stuff was "inside"
them and that I could use them, so I was using a lot of (define this that).
This is probably why I missed the two values returned by http-get.
Thanks!



>  > (use-modules (htmlprag))
>  > (define the-web-page (html->sxml $2))
>


And I didn't know about htmlprag, thanks.


>
> That parses the web page to s-expressions.  You can print the result
> nicely:
>
>  > ,pretty-print the-web-page
>

Thanks, I didn't know this either.


>
> Now you need to get something out of the web page.  The hackiest way to
> do it is just to match against the entire page.  Maybe someone else can
> come up with an example, but I'm short on time, so I'll proceed to The
> Right Thing -- the problem is that whitespace is significant, and maybe
> all you want is the contents of "the <title> in the <head> in the
> <html>."
>
> So in XML you'd use XPATH.  In SXML you'd use SXPATH.  It's hard to use
> right now; we really need to steal
> http://www.neilvandyke.org/webscraperhelper/ from Neil van Dyke.  But
> you can see from his docs that the thing would be
>
>  > (use-modules (sxml xpath))
>  > (define matcher (sxpath '(// html head title)))
>  > (matcher the-web-page)
>  $3 = ((title "GNU Guile (About Guile)"))
>
>
I was going to attempt something along these lines:

(sxml-match (xml->sxml page) [(div (@ (id "real_player") (rel ,url))) (str

but I'm going to explore your approach too. I hadn't gotten that far yet: I
had stumbled on something I thought was a bug, but I also had other things
to do (this is a pet project), so this had to wait.
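
As a note to myself, this is a small sketch of the sxpath approach from
your message, tried on a hand-written SXML tree instead of a real page. The
tree, `sample-page`, and `find-titles` are names I made up for illustration;
the output of html->sxml on an actual page will certainly look different.

```scheme
(use-modules (sxml xpath))

;; Hypothetical, hand-written SXML standing in for a parsed page.
(define sample-page
  '(*TOP* (html (head (title "Sample title"))
                (body (p "Some text")))))

;; Build a matcher that selects <title> elements anywhere in the tree.
(define find-titles (sxpath '(// title)))

(display (find-titles sample-page))
(newline)
```

The matcher returns a list of matching nodes, so the title string still has
to be pulled out of the first element.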

But I'll surely let you know.

Thanks again for your help
Bye
Cato
