Hi all, I want to parse big log files using Clojure. Each line record has the structure "UserID,Latitude,Longitude,Timestamp". My implementation proceeds in two steps: ----> read the log file and compute the top-n user list ----> find each top-n user's records and store them in a separate log file (UserID.log).
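For context, step 1 above (counting records per user and taking the top n) can be sketched with Clojure's built-ins (`frequencies` plus `sort-by`). This is only a minimal illustration of the counting step, assuming comma-separated lines as described; the full implementation follows below:

;; Minimal sketch of step 1: count records per user and take the top n.
;; Assumes each line is "UserID,Latitude,Longitude,Timestamp" as above.
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn top-n-users
  "Return the n most frequent user IDs in the log file."
  [file n]
  (with-open [rdr (io/reader file)]
    (->> (line-seq rdr)
         (map #(first (string/split % #",")))
         frequencies
         (sort-by val >)
         (take n)
         (mapv key))))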
The implemented source code:

;======================================================
(ns log-parser
  (:require [clojure.java.io :as io]
            [clojure.string :as string]))

;; Forward declarations so the file loads with the functions in this order.
(declare parse-recur update-res find-write-recur find-recur
         update-vec create-write-file)

;; MAIN-PATH is the output directory for the per-user log files
;; (defined elsewhere in the actual code).

(defn parse-file
  "Parse the log file: print per-user counts and the sorted top-N list,
  then write each top-N user's records to a separate file."
  [file n]
  (with-open [rdr (io/reader file)]
    (println "001 begin with open ")
    (let [lines  (line-seq rdr)
          res    (parse-recur lines)
          sorted (into (sorted-map-by
                         (fn [key1 key2]
                           (compare [(get res key2) key2]
                                    [(get res key1) key1])))
                       res)]
      (println "Statistic result : " res)
      (println "Top-N User List : " sorted)
      (find-write-recur lines sorted n))))

(defn parse-recur
  "Count the number of records per user ID."
  [lines]
  (loop [ls lines
         res {}]
    (if ls
      (recur (next ls) (update-res res (first ls)))
      res)))

(defn update-res
  "Increment the count for the user ID found on this line."
  [res line]
  (let [params (string/split line #",")
        id     (if (> (count params) 1) (params 0) "0")]
    (if (res id)
      (update-in res [id] inc)
      (assoc res id 1))))

(defn find-write-recur
  "Get each user's records and store them in a separate log file."
  [lines sorted n]
  (loop [x  n
         sd sorted
         id (first (keys sd))]
    (if (and (> x 0) sd)
      (do
        (create-write-file id (find-recur lines id))
        (recur (dec x) (rest sd) (nth (keys sd) 1))))))

(defn find-recur
  "Collect all lines belonging to the given user ID."
  [lines id]
  (loop [ls lines
         res []]
    (if ls
      (recur (next ls) (update-vec res id (first ls)))
      res)))

(defn update-vec
  "Append the line to res if it belongs to the given user ID."
  [res id line]
  (let [params (string/split line #",")
        id_    (if (> (count params) 1) (params 0) "0")]
    (if (= id id_)
      (conj res line)
      res)))

(defn create-write-file
  "Create a new file and write the given lines into it."
  ([file info-lines]
   (with-open [wr (io/writer (str MAIN-PATH file))]
     (doseq [line info-lines]
       (.write wr (str line "\n")))))
  ([file info-lines append?]
   (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
     (doseq [line info-lines]
       (.write wr (str line "\n"))))))
;======================================================

I tested this in the REPL with (parse-file "./DATA/log.log" 3) and got these results:

Records     Size     Time    Result
1,000       42KB     <1s     OK
10,000      420KB    <1s     OK
100,000     4.3MB    3s      OK
1,000,000   43MB     15s     OK
6,000,000   258MB    >20min  OutOfMemoryError: Java heap space
                             java.lang.String.substring (String.java:1913)

======================================================

Here are my questions:
1. How can I fix the error when parsing a big log file, e.g. > 200MB?
2. How can I optimize the function to run faster?
3. Some logs are larger than 1GB in size; how can the function deal with those?

I am still new to Clojure; any suggestion or solution will be appreciated. Thanks!

BR,
Jacky Liu (刘家齐)
Mobile: 15201091195
Email: liujiaq...@gmail.com
Skype: jacky_liu_1987
QQ: 406229156

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en