Yeah, I see now that you're still holding on to the head, because the functions you call give a name to the line sequence. One option would be to turn parse-recur and the related functions that take lines as an input into macros.
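As a rough illustration of never naming the line sequence, a sketch along these lines might work for the counting pass. It swaps the explicit loop for reduce rather than using a macro, count-ids is a hypothetical name that does not appear in the thread, and it assumes the same "first field is the user id, otherwise 0" rule as update-res:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; count-ids is a hypothetical helper, not part of the original code.
;; The lazy seq from line-seq is passed straight to reduce, so no local
;; name in this function keeps a reference to its first element.
(defn count-ids
  "Returns a map of user id -> number of records in file."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [res line]
              (let [params (string/split line #",")
                    ;; same fallback as update-res: lines without a comma count as "0"
                    id (if (> (count params) 1) (params 0) "0")]
                (update-in res [id] (fnil inc 0))))
            {}
            (line-seq rdr))))
```

Whether this alone avoids the OutOfMemoryError on the 6,000,000-record file is untested here; it only shows the shape of a pass that does not bind the sequence to a name.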
You could also try:

(defn parse-recur
  ""
  [ls res]
  (if ls
    (recur (next ls) (update-res res (first ls)))
    res))

and call it as

(parse-recur (line-seq rdr) {})

This way, the recur goes back to the main function entry point and ls is overwritten, so nothing is holding on to the head. Make similar changes to the other functions.

On Tue, Nov 19, 2013 at 8:07 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:

> sorry, i mean "weird"...
>
>
> 2013/11/20 Jiaqi Liu <liujiaq...@gmail.com>
>
>> hi,Mark ,thanks for your suggestion.
>> I modified the main function to :
>> ;================================
>> (defn parse-file
>>   ""
>>   [file n]
>>   (with-open [rdr (io/reader file)]
>>     (println "001 begin with open " (type rdr))
>>     (let [;lines (line-seq rdr)
>>           *res (parse-recur (line-seq rdr))*;lines)
>>           sorted
>>           (into (sorted-map-by (fn [key1 key2]
>>                                  (compare [(get res key2) key2]
>>                                           [(get res key1) key1])))
>>                 res)]
>>       (println "Statistic result : " res)
>>       (println "Sorted result : " sorted)
>>       ;(println "..." (type rdr))
>>       ;(find-write-recur lines sorted n)
>>       *(find-write-recur (line-seq rdr) sorted n)*
>>       )))
>> ;================================
>> But it's wired , i got this error:
>>
>> com.util=> (parse-file "./log600w.log" 3)
>>
>> 001 begin with open java.io.BufferedReader
>>
>> com.util=> *OutOfMemoryError GC overhead limit exceeded
>> java.util.regex.Pattern.matcher (Pattern.java:1088)*
>>
>> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>>
>>> Looks like you're "holding on to the head" by giving a name (lines) to
>>> the result of line-seq. Don't do that. Try:
>>> (parse-recur (line-seq rdr))
>>>
>>>
>>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>>
>>>> Hi,all
>>>> I want to parse big log files using Clojure.
>>>> And the structure of each line record is
>>>> "UserID,Lantitude,Lontitude,Timestamp".
>>>> My implemented steps are:
>>>> ----> Read log file & Get top-n user list
>>>> ----> Find each top-n user's records and store in separate log file
>>>> (UserID.log) .
>>>>
>>>> The implement source code :
>>>> ;======================================================
>>>> (defn parse-file
>>>>   ""
>>>>   [file n]
>>>>   (with-open [rdr (io/reader file)]
>>>>     (println "001 begin with open ")
>>>>     (let [lines (line-seq rdr)
>>>>           res (parse-recur lines)
>>>>           sorted
>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>                                  (compare [(get res key2) key2]
>>>>                                           [(get res key1) key1])))
>>>>                 res)]
>>>>       (println "Statistic result : " res)
>>>>       (println "Top-N User List : " sorted)
>>>>       (find-write-recur lines sorted n)
>>>>       )))
>>>>
>>>> (defn parse-recur
>>>>   ""
>>>>   [lines]
>>>>   (loop [ls lines
>>>>          res {}]
>>>>     (if ls
>>>>       (recur (next ls)
>>>>              (update-res res (first ls)))
>>>>       res)))
>>>>
>>>> (defn update-res
>>>>   ""
>>>>   [res line]
>>>>   (let [params (string/split line #",")
>>>>         id (if (> (count params) 1) (params 0) "0")]
>>>>     (if (res id)
>>>>       (update-in res [id] inc)
>>>>       (assoc res id 1))))
>>>>
>>>> (defn find-write-recur
>>>>   "Get each users' records and store into separate log file"
>>>>   [lines sorted n]
>>>>   (loop [x n
>>>>          sd sorted
>>>>          id (first (keys sd))]
>>>>     (if (and (> x 0) sd)
>>>>       (do (create-write-file id
>>>>                              (find-recur lines id))
>>>>           (recur (dec x)
>>>>                  (rest sd)
>>>>                  (nth (keys sd) 1))))))
>>>>
>>>> (defn find-recur
>>>>   ""
>>>>   [lines id]
>>>>   (loop [ls lines
>>>>          res []]
>>>>     (if ls
>>>>       (recur (next ls)
>>>>              (update-vec res id (first ls)))
>>>>       res)))
>>>>
>>>> (defn update-vec
>>>>   ""
>>>>   [res id line]
>>>>   (let [params (string/split line #",")
>>>>         id_ (if (> (count params) 1) (params 0) "0")]
>>>>     (if (= id id_ )
>>>>       (conj res line)
>>>>       res)))
>>>>
>>>> (defn create-write-file
>>>>   "Create a new file and write information into the file."
>>>>   ([file info-lines]
>>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>>>      ))
>>>>   ([file info-lines append?]
>>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>>>    ))
>>>> ;======================================================
>>>>
>>>> I tested this clj in REPL with command (parse-file "./DATA/log.log" 3),
>>>> and get the results:
>>>>
>>>> Records    Size   Time  Result
>>>> 1,000      42KB   <1s   OK
>>>> 10,000     420KB  <1s   OK
>>>> 100,000    4.3MB  3s    OK
>>>> 1,000,000  43MB   15s   OK
>>>> 6,000,000  258MB  >20M  "OutOfMemoryError Java heap space
>>>>                         java.lang.String.substring (String.java:1913)"
>>>>
>>>> ======================================================
>>>> Here is the question:
>>>> 1. how can i fix the error when i try to parse big log file , like > 200MB
>>>> 2. how can i optimize the function to run faster ?
>>>> 3. there are logs more than 1G size , how can the function deal with it.
>>>>
>>>> I am still new to Clojure, any suggestion or solution will be
>>>> appreciate~
>>>> Thanks
>>>>
>>>> BR
>>>>
>>>> ------------------------------------
>>>>
>>>> 刘家齐 (Jacky Liu)
>>>>
>>>> Mobile: 15201091195  Email: liujiaq...@gmail.com
>>>>
>>>> Skype:jacky_liu_1987 QQ:406229156
>>>>
>>>> --
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clojure@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient with
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+unsubscr...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to clojure+unsubscr...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.