Hi everybody, I'm a Clojure newbie, so I'm sorry if I'm asking a dumb question. I'm trying to write a program that crawls a webpage, finds all its links, and recursively repeats this until every link on the website has been crawled (of course I'm omitting external hosts to avoid infinite crawling). So basically I'm dealing with a tree data structure. The problem is that my knowledge of Clojure data structures is nowhere near enough to implement this. I read a little bit about zippers and lots of other stuff, but it only made me more confused.
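For concreteness, the kind of traversal described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `site`, `crawl` and `links-of` are made-up names, and the toy in-memory map stands in for real page fetching and link extraction. It walks the links breadth-first with `loop`/`recur` and keeps a set of pages already seen, so cycles between pages do not cause endless crawling.

```clojure
;; Hypothetical toy "website": page -> links found on that page.
;; A real crawler would fetch and parse each page instead.
(def site
  {"/"  ["/a" "/b"]
   "/a" ["/" "/c"]
   "/b" ["/c"]
   "/c" []})

(defn crawl
  "Breadth-first walk starting at `start`. `links-of` is any function
  (here, a map) from a page to the pages it links to. Returns the set
  of every page reached, including `start`."
  [links-of start]
  (loop [queue (conj clojure.lang.PersistentQueue/EMPTY start)
         seen  #{start}]
    (if (empty? queue)
      seen
      (let [page  (peek queue)
            ;; keep only links that have not been queued before
            fresh (remove seen (distinct (links-of page)))]
        (recur (into (pop queue) fresh)
               (into seen fresh))))))

(crawl site "/")
```

Because the loop uses `recur` rather than calling itself, the depth of the site does not grow the call stack.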
This is what I've got so far:

    (ns cralwer.core
      (:gen-class)
      (:require [net.cgrand.enlive-html :as h])
      (:import (java.net URL MalformedURLException))
      (:import java.io.FileNotFoundException))

    (defn get-absolute-url-same-host
      "Convert the URL to absolute form if it isn't already.
      Returns nil if the URL is not from the same host."
      [url parent]
      (try
        (let [u (URL. url)]
          (if (= (.getHost u) (.getHost parent))
            (.toString u)))
        (catch MalformedURLException e
          (.toString (URL. parent url)))))

    (defn get-links
      "Return all the links in a URI"
      [url links]
      ;; I do this check to avoid back edges/already seen urls and stop
      ;; when there are no links in the current page
      (if-not (or (nil? url) (some #{url} links))
        (try
          (let [j-url (java.net.URL. url)
                page (h/html-resource j-url)]
            (map #(get-absolute-url-same-host (:href (:attrs %)) j-url)
                 (h/select page [(h/attr? :href)])))
          (catch FileNotFoundException e
            (println "invalid URL: " url)))))

    (defn get-all-links
      "Return a collection of all links"
      [url]
      (let [links '()
            children (get-links url links)]
        (concat links (mapcat get-all-links children))))

For small inputs I get an empty list, and for large inputs I just get a stack overflow exception.

Thanks a lot for your help in advance.