I wrote this a while ago; here it is :-)

https://github.com/gtrak/betamore-clj/blob/master/src/betamore_clj/core.clj


On Thu, Jul 4, 2013 at 1:23 PM, Amir Fouladi <ah.foul...@gmail.com> wrote:

> Hi everybody,
>
> I'm a Clojure newbie, so I'm sorry if this is a dumb question.
> I'm trying to write a program that crawls a web page, finds all its links,
> and recursively does the same until every page on the website has been
> crawled (of course I skip external hosts to avoid infinite crawling). So
> basically I'm dealing with a tree data structure. The problem is that my
> knowledge of Clojure data structures is nowhere near good enough to
> implement this. I read a little about zippers and lots of other things,
> but they only made me more confused.
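Side note on the tree question: sites usually link back to themselves (every page links to the home page, say), so the link structure is a graph with cycles rather than a tree, and I don't think you need zippers for it. A common approach is a breadth-first traversal that keeps a set of already-visited URLs. Here is a minimal sketch; `crawl-bfs`, `links-of` and the `site` map are hypothetical names, and `site` stands in for real page fetching so the example runs without a network:

```clojure
(defn crawl-bfs
  "Breadth-first traversal starting at start-url. links-of is a function
  from a URL to the URLs linked from that page (nil entries are ignored).
  Returns the set of every URL reached."
  [links-of start-url]
  (loop [queue (conj clojure.lang.PersistentQueue/EMPTY start-url)
         seen  #{start-url}]
    (if-let [url (peek queue)]
      (let [new-links (->> (links-of url)
                           (remove nil?)   ; e.g. external links mapped to nil
                           (remove seen)   ; skip visited URLs, which breaks cycles
                           distinct)]      ; a page may link to the same URL twice
        (recur (into (pop queue) new-links)
               (into seen new-links)))
      seen)))

;; Hypothetical in-memory "site"; in a real crawler links-of would fetch
;; the page and extract its same-host links.
(def site {"/"  ["/a" "/b"]
           "/a" ["/" "/b"]   ; back edge to the home page
           "/b" ["/a" "/c"]
           "/c" []})

(= #{"/" "/a" "/b" "/c"} (crawl-bfs #(get site % []) "/"))
;; => true
```

Passing `links-of` in keeps the traversal separate from the HTTP/enlive code, so you can test it on a map like this before pointing it at a real site. Because it uses `loop`/`recur`, the stack depth stays constant no matter how deep the site is.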
>
> This is what I've got so far:
>
>
>
> (ns crawler.core
>   (:gen-class)
>   (:require [net.cgrand.enlive-html :as h])
>   (:import (java.net URL MalformedURLException)
>            (java.io FileNotFoundException)))
>
> (defn get-absolute-url-same-host
>   "Convert url to absolute form if it isn't already. Returns nil if
>   the url is not on the same host as parent (a java.net.URL)."
>   [url parent]
>   (try (let [u (URL. url)]
>          (when (= (.getHost u) (.getHost parent))
>            (.toString u)))
>     ;; url is relative: resolve it against the parent page
>     (catch MalformedURLException e (.toString (URL. parent url)))))
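One thing that could be simplified here: resolving a link against its parent page handles absolute and relative hrefs alike, so there is no need to use the exception as control flow. A sketch using java.net.URI instead of URL; `resolve-same-host` is a hypothetical name, its argument order mirrors your function, but it takes the parent as a string:

```clojure
(import '(java.net URI URISyntaxException))

(defn resolve-same-host
  "Resolves href (absolute or relative) against parent-url and returns the
  absolute URL string, or nil if it is on another host, is nil, or cannot
  be parsed."
  [href parent-url]
  (when href
    (try
      (let [parent   (URI. parent-url)
            resolved (.resolve parent ^String href)]
        (when (= (.getHost parent) (.getHost resolved))
          (str resolved)))
      ;; (URI. ...) throws URISyntaxException on bad input;
      ;; .resolve wraps the same error in IllegalArgumentException
      (catch URISyntaxException _ nil)
      (catch IllegalArgumentException _ nil))))

(resolve-same-host "page2.html" "http://example.com/dir/page1.html")
;; => "http://example.com/dir/page2.html"
(resolve-same-host "http://other.org/x" "http://example.com/")
;; => nil
```

Note that fragments are kept, so `/page` and `/page#top` would still count as different URLs unless you strip them.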
>
>
>
> (defn get-links
>   "Return all the links on the page at url"
>   [url links]
>   ;; I do this check to avoid back edges/already seen urls and stop when
>   ;; there are no links in the current page
>   (if-not (or (nil? url) (some #{url} links))
>     (try (let [j-url (URL. url)
>                page (h/html-resource j-url)]
>            (map #(get-absolute-url-same-host (:href (:attrs %)) j-url)
>                 (h/select page [(h/attr? :href)])))
>          (catch FileNotFoundException e (println "invalid URL: " url)))))
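Because `map` keeps every result, links to other hosts show up as `nil` entries in what get-links returns. `keep` drops them, and `distinct` removes links repeated on the same page. A sketch on plain maps shaped like enlive nodes (`{:tag ... :attrs {...}}`), with `page-links` and `toy-resolve` as hypothetical stand-ins for the real extraction and URL resolution:

```clojure
(defn page-links
  "Given enlive-style nodes and a resolve-fn that returns an absolute URL
  or nil (for links to skip), returns the distinct URLs with nils removed."
  [nodes resolve-fn]
  (->> nodes
       (keep #(get-in % [:attrs :href]))  ; nodes without an href are skipped
       (keep resolve-fn)                   ; unlike map, keep drops nil results
       distinct))

;; Hypothetical resolver for the example: only site-relative links count.
(defn toy-resolve [^String href]
  (when (.startsWith href "/")
    (str "http://example.com" href)))

(def nodes [{:tag :a :attrs {:href "/a"}}
            {:tag :a :attrs {:href "http://other.org/"}}
            {:tag :a :attrs {:href "/a"}}
            {:tag :a :attrs {:name "top"}}
            {:tag :a :attrs {:href "/b"}}])

(page-links nodes toy-resolve)
;; => ("http://example.com/a" "http://example.com/b")
```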
>
>
>
> (defn get-all-links
>   "Return a collection of all links"
>   [url]
>   (let [links '() children (get-links url links)]
>     (concat links (mapcat get-all-links children))))
>
>
> For small inputs I get an empty list, and for large inputs I just get a
> StackOverflowError.
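If I'm reading get-all-links right, both symptoms come from the same place. `links` is always bound to '() and never grows, so the result is only ever a concatenation of empty lists (hence the empty result). And because every recursive call starts with an empty `links` again, the already-seen check in get-links never fires, so as soon as two pages link to each other the recursion never ends and eventually overflows the stack. A minimal sketch of one fix that keeps the recursive shape but threads a set of visited URLs through the calls; `crawl-dfs` and its `links-of` parameter are hypothetical, added so the example runs without a network (with your code, I think something like `#(get-links % '())` could play the role of `links-of`):

```clojure
(defn crawl-dfs
  "Depth-first crawl from url, threading the set of visited URLs through
  the recursion. links-of returns the URLs linked from a page (nil
  entries are allowed). Returns the set of every URL reached."
  ([links-of url] (crawl-dfs links-of url #{}))
  ([links-of url seen]
   (if (or (nil? url) (contains? seen url))
     seen                                    ; nothing new here: stop
     (reduce (fn [acc child] (crawl-dfs links-of child acc))
             (conj seen url)                 ; mark url before visiting children
             (links-of url)))))

;; Hypothetical site with a cycle ("/" -> "/a" -> "/") and a nil entry,
;; like the ones get-links produces for external links.
(def site {"/"  ["/a" "/b"]
           "/a" ["/" nil]
           "/b" ["/a" "/c"]
           "/c" []})

(= #{"/" "/a" "/b" "/c"} (crawl-dfs #(get site % []) "/"))
;; => true
```

This still recurses once per level of link depth, so a very deep chain of pages could in principle still exhaust the stack; a `loop`/`recur` traversal with an explicit queue avoids that.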
>
> Thanks a lot in advance for your help.
>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.