Sorry, I meant "weird"...

2013/11/20 Jiaqi Liu <liujiaq...@gmail.com>

> Hi Mark, thanks for your suggestion.
> I modified the main function to:
> ;================================
> (defn parse-file
>   ""
>   [file n]
>   (with-open [rdr (io/reader file)]
>     (println "001 begin with open " (type rdr))
>     (let [;lines (line-seq rdr)
>           *res (parse-recur (line-seq rdr))*;lines)
>           sorted
>           (into (sorted-map-by (fn [key1 key2]
>                                  (compare [(get res key2) key2]
>                                           [(get res key1) key1])))
>                 res)]
>       (println "Statistic result : " res)
>       (println "Sorted result : " sorted)
>       ;(println "..." (type rdr))
>       ;(find-write-recur lines sorted n)
>       *(find-write-recur (line-seq rdr) sorted n)*
>       )))
> ;================================
> But it's wired, I got this error:
>
> com.util=> (parse-file "./log600w.log" 3)
>
> 001 begin with open java.io.BufferedReader
>
> com.util=> *OutOfMemoryError GC overhead limit exceeded
> java.util.regex.Pattern.matcher (Pattern.java:1088)*
>
>
> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>
>> Looks like you're "holding on to the head" by giving a name (lines) to
>> the result of line-seq. Don't do that. Try:
>> (parse-recur (line-seq rdr))
>>
>>
>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>
>>> Hi all,
>>> I want to parse big log files using Clojure.
>>> The structure of each line record is
>>> "UserID,Latitude,Longitude,Timestamp".
>>> My implementation steps are:
>>> ----> Read the log file & get the top-n user list
>>> ----> Find each top-n user's records and store them in a separate log file
>>> (UserID.log).
>>>
>>> The implementation source code:
>>> ;======================================================
>>> (defn parse-file
>>>   ""
>>>   [file n]
>>>   (with-open [rdr (io/reader file)]
>>>     (println "001 begin with open ")
>>>     (let [lines (line-seq rdr)
>>>           res (parse-recur lines)
>>>           sorted
>>>           (into (sorted-map-by (fn [key1 key2]
>>>                                  (compare [(get res key2) key2]
>>>                                           [(get res key1) key1])))
>>>                 res)]
>>>       (println "Statistic result : " res)
>>>       (println "Top-N User List : " sorted)
>>>       (find-write-recur lines sorted n)
>>>       )))
>>>
>>> (defn parse-recur
>>>   ""
>>>   [lines]
>>>   (loop [ls lines
>>>          res {}]
>>>     (if ls
>>>       (recur (next ls)
>>>              (update-res res (first ls)))
>>>       res)))
>>>
>>> (defn update-res
>>>   ""
>>>   [res line]
>>>   (let [params (string/split line #",")
>>>         id (if (> (count params) 1) (params 0) "0")]
>>>     (if (res id)
>>>       (update-in res [id] inc)
>>>       (assoc res id 1))))
>>>
>>> (defn find-write-recur
>>>   "Get each users' records and store into separate log file"
>>>   [lines sorted n]
>>>   (loop [x n
>>>          sd sorted
>>>          id (first (keys sd))]
>>>     (if (and (> x 0) sd)
>>>       (do (create-write-file id
>>>                              (find-recur lines id))
>>>           (recur (dec x)
>>>                  (rest sd)
>>>                  (nth (keys sd) 1))))))
>>>
>>> (defn find-recur
>>>   ""
>>>   [lines id]
>>>   (loop [ls lines
>>>          res []]
>>>     (if ls
>>>       (recur (next ls)
>>>              (update-vec res id (first ls)))
>>>       res)))
>>>
>>> (defn update-vec
>>>   ""
>>>   [res id line]
>>>   (let [params (string/split line #",")
>>>         id_ (if (> (count params) 1) (params 0) "0")]
>>>     (if (= id id_ )
>>>       (conj res line)
>>>       res)))
>>>
>>> (defn create-write-file
>>>   "Create a new file and write information into the file."
>>>   ([file info-lines]
>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>>      ))
>>>   ([file info-lines append?]
>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>>    ))
>>> ;======================================================
>>>
>>> I tested this code in the REPL with the command
>>> (parse-file "./DATA/log.log" 3), and got these results:
>>>
>>> Records     Size    Time   Result
>>> 1,000       42KB    <1s    OK
>>> 10,000      420KB   <1s    OK
>>> 100,000     4.3MB   3s     OK
>>> 1,000,000   43MB    15s    OK
>>> 6,000,000   258MB   >20M   "OutOfMemoryError Java heap space
>>>                            java.lang.String.substring (String.java:1913)"
>>>
>>> ======================================================
>>> Here are my questions:
>>> 1. How can I fix the error when I try to parse a big log file, e.g. one
>>> larger than 200MB?
>>> 2. How can I optimize the function to run faster?
>>> 3. There are logs larger than 1GB; how can the function deal with them?
>>>
>>> I am still new to Clojure; any suggestion or solution will be appreciated.
>>> Thanks
>>>
>>> BR
>>>
>>> ------------------------------------
>>>
>>> 刘家齐 (Jacky Liu)
>>>
>>> Mobile: 15201091195  Email: liujiaq...@gmail.com
>>>
>>> Skype: jacky_liu_1987  QQ: 406229156
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clojure@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+unsubscr...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
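
For readers following the thread, here is a minimal sketch of one way to act on the "don't hold on to the head" advice. It is not code from either poster. It assumes the per-user count map fits in memory and that reading the file twice is acceptable. The names user-id, count-users, top-n-ids and write-top-users, and the out-dir parameter (standing in for MAIN-PATH), are hypothetical. The file is reopened for the second pass because a reader that has been read to the end yields no further lines, so a second (line-seq rdr) on the same reader returns nothing. Whether this resolves the specific OutOfMemoryError reported above has not been verified.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn user-id
  "First comma-separated field of a line, or \"0\" when the line has no
  second field (same fallback as update-res in the original post)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-users
  "First pass: stream the file once and count lines per user ID.
  The lazy seq is consumed by reduce and never bound to a name."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [counts line]
              (update-in counts [(user-id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-ids
  "IDs of the n users with the most records; ties are broken by ID,
  descending, like the comparator in the original post."
  [counts n]
  (->> counts
       (sort-by (fn [[id cnt]] [cnt id]) #(compare %2 %1))
       (take n)
       (map first)))

(defn write-top-users
  "Second pass: reopen the file and write each matching line straight to
  <out-dir>/<id>.log, so no per-user vector is accumulated in memory."
  [file out-dir ids]
  (let [writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        (doseq [line (line-seq rdr)]
          (when-let [^java.io.Writer wr (writers (user-id line))]
            (.write wr (str line "\n")))))
      (finally
        ;; close every output writer even if the pass fails midway
        (doseq [^java.io.Writer wr (vals writers)]
          (.close wr))))))

(defn parse-file
  "Hypothetical replacement for the original parse-file: two streaming
  passes over the same file. Returns the top-n IDs."
  [file out-dir n]
  (let [ids (top-n-ids (count-users file) n)]
    (write-top-users file out-dir ids)
    ids))
```

Called as, for example, (parse-file "./DATA/log.log" "./DATA" 3), this keeps n writers open at once, which is reasonable for a small n; a large n would need a different strategy.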