Hi Mark,

Thanks for your suggestion. I modified the main function to call (line-seq rdr) directly in both places:

;================================
(defn parse-file
  ""
  [file n]
  (with-open [rdr (io/reader file)]
    (println "001 begin with open " (type rdr))
    (let [;lines (line-seq rdr)
          res (parse-recur (line-seq rdr)) ;lines)
          sorted (into (sorted-map-by (fn [key1 key2]
                                        (compare [(get res key2) key2]
                                                 [(get res key1) key1])))
                       res)]
      (println "Statistic result : " res)
      (println "Sorted result : " sorted)
      ;(println "..." (type rdr))
      ;(find-write-recur lines sorted n)
      (find-write-recur (line-seq rdr) sorted n)
      )))
;================================

But it's weird, I got this error:

com.util=> (parse-file "./log600w.log" 3)
001 begin with open  java.io.BufferedReader
com.util=>
OutOfMemoryError GC overhead limit exceeded  java.util.regex.Pattern.matcher (Pattern.java:1088)


2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>

> Looks like you're "holding on to the head" by giving a name (lines) to the
> result of line-seq. Don't do that. Try:
> (parse-recur (line-seq rdr))
>
>
> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>
>> Hi all,
>> I want to parse big log files using Clojure.
>> The structure of each line record is
>> "UserID,Lantitude,Lontitude,Timestamp".
>> My implemented steps are:
>> ----> Read the log file & get the top-n user list
>> ----> Find each top-n user's records and store them in a separate log file
>> (UserID.log).
>>
>> The implementation source code:
>> ;======================================================
>> (defn parse-file
>>   ""
>>   [file n]
>>   (with-open [rdr (io/reader file)]
>>     (println "001 begin with open ")
>>     (let [lines (line-seq rdr)
>>           res (parse-recur lines)
>>           sorted
>>           (into (sorted-map-by (fn [key1 key2]
>>                                  (compare [(get res key2) key2]
>>                                           [(get res key1) key1])))
>>                 res)]
>>       (println "Statistic result : " res)
>>       (println "Top-N User List : " sorted)
>>       (find-write-recur lines sorted n)
>>       )))
>>
>> (defn parse-recur
>>   ""
>>   [lines]
>>   (loop [ls lines
>>          res {}]
>>     (if ls
>>       (recur (next ls)
>>              (update-res res (first ls)))
>>       res)))
>>
>> (defn update-res
>>   ""
>>   [res line]
>>   (let [params (string/split line #",")
>>         id (if (> (count params) 1) (params 0) "0")]
>>     (if (res id)
>>       (update-in res [id] inc)
>>       (assoc res id 1))))
>>
>> (defn find-write-recur
>>   "Get each users' records and store into separate log file"
>>   [lines sorted n]
>>   (loop [x n
>>          sd sorted
>>          id (first (keys sd))]
>>     (if (and (> x 0) sd)
>>       (do (create-write-file id
>>                              (find-recur lines id))
>>           (recur (dec x)
>>                  (rest sd)
>>                  (nth (keys sd) 1))))))
>>
>> (defn find-recur
>>   ""
>>   [lines id]
>>   (loop [ls lines
>>          res []]
>>     (if ls
>>       (recur (next ls)
>>              (update-vec res id (first ls)))
>>       res)))
>>
>> (defn update-vec
>>   ""
>>   [res id line]
>>   (let [params (string/split line #",")
>>         id_ (if (> (count params) 1) (params 0) "0")]
>>     (if (= id id_ )
>>       (conj res line)
>>       res)))
>>
>> (defn create-write-file
>>   "Create a new file and write information into the file."
>>   ([file info-lines]
>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>      ))
>>   ([file info-lines append?]
>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>    ))
>> ;======================================================
>>
>> I tested this clj in the REPL with the command (parse-file "./DATA/log.log" 3)
>> and got these results:
>>
>> Records      Size     Time     Result
>> 1,000        42KB     <1s      OK
>> 10,000       420KB    <1s      OK
>> 100,000      4.3MB    3s       OK
>> 1,000,000    43MB     15s      OK
>> 6,000,000    258MB    >20min   "OutOfMemoryError Java heap space
>>                                java.lang.String.substring (String.java:1913)"
>>
>> ======================================================
>> Here are my questions:
>> 1. How can I fix the error when I try to parse a big log file, like > 200MB?
>> 2. How can I optimize the function to run faster?
>> 3. There are logs larger than 1GB. How can the function deal with them?
>>
>> I am still new to Clojure, so any suggestion or solution will be appreciated.
>> Thanks
>>
>> BR
>>
>> ------------------------------------
>>
>> 刘家齐 (Jacky Liu)
>>
>> Mobile: 15201091195  Email: liujiaq...@gmail.com
>>
>> Skype: jacky_liu_1987  QQ: 406229156
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>

--
------------------------------------
刘家齐 (Jacky Liu)
Mobile: 15201091195  Email: liujiaq...@gmail.com
Skype: jacky_liu_1987  QQ: 406229156
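
For reference, here is a minimal sketch of a two-pass variant of the approach discussed in this thread. It counts records per UserID in one streaming pass, then reopens the file and writes the top-n users' lines as they are read, so neither the line seq nor a per-user vector of lines is kept in memory. One thing worth noting about the modified parse-file: its second (line-seq rdr) reads from a reader that the first pass has already read to the end, so it returns no lines; the sketch opens a fresh reader instead. The names line->id, count-ids, top-n-ids, write-top-n! and the out-dir parameter are hypothetical (out-dir stands in for MAIN-PATH), and this has not been run against the 258MB log, so treat it as a starting point rather than a confirmed fix for the OutOfMemoryError.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn line->id
  "Returns the UserID field of a record line, or \"0\" when the line has
  fewer than two comma-separated fields (same fallback as update-res)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-ids
  "First pass: streams the file once and returns a map of UserID -> count.
  The line seq is consumed by reduce and never bound to a name."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [res line]
              (update-in res [(line->id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-ids
  "Returns a vector of the n most frequent UserIDs, highest count first,
  using the same [count id] descending order as the sorted-map-by comparator."
  [counts n]
  (->> counts
       (sort (fn [[id1 c1] [id2 c2]] (compare [c2 id2] [c1 id1])))
       (take n)
       (mapv first)))

(defn write-top-n!
  "Second pass: reopens the file and writes each line that belongs to one
  of ids to <out-dir>/<UserID>.log, keeping one open writer per user."
  [file out-dir ids]
  (let [writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        (doseq [line (line-seq rdr)]
          (when-let [^java.io.Writer wr (writers (line->id line))]
            (.write wr (str line "\n")))))
      (finally
        ;; Close every per-user writer even if reading fails part-way.
        (doseq [^java.io.Writer wr (vals writers)]
          (.close wr))))))

(defn parse-file
  "Finds the top-n users in file, writes their records under out-dir,
  and returns the vector of top-n UserIDs."
  [file out-dir n]
  (let [ids (top-n-ids (count-ids file) n)]
    (write-top-n! file out-dir ids)
    ids))
```

A call would look like (parse-file "./DATA/log.log" "./DATA" 3). Reading the file twice costs extra I/O, but it means the file is scanned twice in total rather than once per top-n user as with find-recur, and memory use should depend on the number of distinct UserIDs rather than on the number of lines.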