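It looks like you're "holding on to the head" of the lazy sequence by giving a name (lines) to the result of line-seq. As long as lines is still referenced, every line already read stays reachable, so the garbage collector cannot reclaim it and the whole file ends up in memory. Don't do that. Try passing the sequence directly:

(parse-recur (line-seq rdr))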
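Note that parse-file also hands lines to find-write-recur, so the second pass would need its own fresh line-seq too. Below is a rough sketch of how the two passes might look, not a drop-in replacement. The names user-id, count-users, top-n-users and write-user-records, and the out-dir parameter (standing in for MAIN-PATH), are made up for illustration. It assumes the number of distinct users is small enough for the count map to fit in memory.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn user-id
  "First comma-separated field of a line, or \"0\" for malformed lines
  (same rule as update-res in the original post)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-users
  "Pass 1: stream the file once and count records per user ID."
  [file]
  (with-open [rdr (io/reader file)]
    ;; The lazy seq goes straight into reduce and is never bound to a name,
    ;; so lines that have been counted can be garbage-collected.
    (reduce (fn [res line] (update-in res [(user-id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-users
  "IDs of the n users with the most records (ties broken by ID, ascending;
  that tie-break is an arbitrary choice for this sketch)."
  [counts n]
  (->> counts
       (sort-by (fn [[id c]] [(- c) id]))
       (take n)
       (map first)))

(defn write-user-records
  "Pass 2: reopen the file and copy each selected user's lines to
  <out-dir>/<id>.log, one line at a time."
  [file ids out-dir]
  (let [writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        ;; doseq walks the seq without retaining its head.
        (doseq [line (line-seq rdr)]
          (when-let [wr (writers (user-id line))]
            (.write wr (str line "\n")))))
      (finally
        (doseq [wr (vals writers)] (.close wr))))))
```

With these, something like (write-user-records f (top-n-users (count-users f) 3) "./DATA") reads the file twice but never keeps more than the count map and the current line in memory, so the file size should no longer matter much.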

On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
> Hi, all
> I want to parse big log files using Clojure.
> The structure of each line record is
> "UserID,Latitude,Longitude,Timestamp".
> My implemented steps are:
> ----> Read log file & get top-n user list
> ----> Find each top-n user's records and store them in a separate log file (UserID.log).
>
> The implementation source code:
> ;======================================================
> (defn parse-file
>   ""
>   [file n]
>   (with-open [rdr (io/reader file)]
>     (println "001 begin with open ")
>     (let [lines (line-seq rdr)
>           res (parse-recur lines)
>           sorted
>           (into (sorted-map-by (fn [key1 key2]
>                                  (compare [(get res key2) key2]
>                                           [(get res key1) key1])))
>                 res)]
>       (println "Statistic result : " res)
>       (println "Top-N User List : " sorted)
>       (find-write-recur lines sorted n))))
>
> (defn parse-recur
>   ""
>   [lines]
>   (loop [ls lines
>          res {}]
>     (if ls
>       (recur (next ls)
>              (update-res res (first ls)))
>       res)))
>
> (defn update-res
>   ""
>   [res line]
>   (let [params (string/split line #",")
>         id (if (> (count params) 1) (params 0) "0")]
>     (if (res id)
>       (update-in res [id] inc)
>       (assoc res id 1))))
>
> (defn find-write-recur
>   "Get each users' records and store into separate log file"
>   [lines sorted n]
>   (loop [x n
>          sd sorted
>          id (first (keys sd))]
>     (if (and (> x 0) sd)
>       (do (create-write-file id
>                              (find-recur lines id))
>           (recur (dec x)
>                  (rest sd)
>                  (nth (keys sd) 1))))))
>
> (defn find-recur
>   ""
>   [lines id]
>   (loop [ls lines
>          res []]
>     (if ls
>       (recur (next ls)
>              (update-vec res id (first ls)))
>       res)))
>
> (defn update-vec
>   ""
>   [res id line]
>   (let [params (string/split line #",")
>         id_ (if (> (count params) 1) (params 0) "0")]
>     (if (= id id_ )
>       (conj res line)
>       res)))
>
> (defn create-write-file
>   "Create a new file and write information into the file."
>   ([file info-lines]
>    (with-open [wr (io/writer (str MAIN-PATH file))]
>      (doseq [line info-lines] (.write wr (str line "\n")))))
>   ([file info-lines append?]
>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>      (doseq [line info-lines] (.write wr (str line "\n"))))))
> ;======================================================
>
> I tested this clj in the REPL with the command (parse-file "./DATA/log.log" 3),
> and got these results:
>
> Records     Size    Time    Result
> 1,000       42KB    <1s     OK
> 10,000      420KB   <1s     OK
> 100,000     4.3MB   3s      OK
> 1,000,000   43MB    15s     OK
> 6,000,000   258MB   >20min  "OutOfMemoryError Java heap space
>                             java.lang.String.substring (String.java:1913)"
>
> ======================================================
> Here are my questions:
> 1. How can I fix the error when I try to parse a big log file, like 200MB?
> 2. How can I optimize the function to run faster?
> 3. There are logs larger than 1GB; how can the function deal with those?
>
> I am still new to Clojure; any suggestion or solution would be appreciated.
> Thanks
>
> BR
>
> ------------------------------------
>
> 刘家齐 (Jacky Liu)
>
> Mobile: 15201091195  Email: liujiaq...@gmail.com
>
> Skype: jacky_liu_1987  QQ: 406229156
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.