Yeah, I see now that you're still holding on to the head, because the
functions you call give the line sequence a name (their lines parameter).
One option would be to turn parse-recur, and the related functions that take
lines as input, into macros.

You could also try:

(defn parse-recur
  "Folds update-res over the lines in ls, starting from the map res."
  [ls res]
  (if ls
    (recur (next ls)
           (update-res res (first ls)))
    res))

and calling (parse-recur (line-seq rdr) {})

This way, recur jumps back to the function's entry point and rebinds ls to
(next ls) on each iteration, so nothing holds on to the head. Make similar
changes to the other functions.
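
To make that concrete, here is a minimal, self-contained sketch of both
passes written this way. Treat it as an illustration rather than a drop-in
replacement: line-id, count-ids and lines-for are hypothetical names that are
not in your code, and update-res uses fnil in place of your explicit if.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; line-id is a hypothetical helper: it pulls the user ID out of a
;; "UserID,Latitude,Longitude,Timestamp" record, falling back to "0".
(defn line-id [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn update-res
  "Bumps the record count for the user ID found in line."
  [res line]
  (update-in res [(line-id line)] (fnil inc 0)))

(defn parse-recur
  "Counts records per user ID; ls is rebound on every recur."
  [ls res]
  (if ls
    (recur (next ls) (update-res res (first ls)))
    res))

(defn find-recur
  "Collects the lines belonging to id, written the same way."
  [ls id res]
  (if ls
    (recur (next ls)
           (if (= id (line-id (first ls))) (conj res (first ls)) res))
    res))

;; count-ids and lines-for are hypothetical wrappers. Each opens its own
;; reader and passes (line-seq rdr) straight in, without naming it.
(defn count-ids [file]
  (with-open [rdr (io/reader file)]
    (parse-recur (line-seq rdr) {})))

(defn lines-for [file id]
  (with-open [rdr (io/reader file)]
    (find-recur (line-seq rdr) id [])))
```

Each pass opens a fresh reader on purpose: once the first line-seq has read
rdr to the end, a second (line-seq rdr) on the same reader has nothing left
to return.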



On Tue, Nov 19, 2013 at 8:07 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:

> sorry, i mean "weird"...
>
>
> 2013/11/20 Jiaqi Liu <liujiaq...@gmail.com>
>
>> Hi Mark, thanks for your suggestion.
>> I modified the main function to:
>> ;================================
>> (defn parse-file
>>   ""
>>   [file n]
>>   (with-open [rdr (io/reader file)]
>>     (println "001 begin with open " (type rdr))
>>     (let [;lines (line-seq rdr)
>>           *res (parse-recur (line-seq rdr))*;lines)
>>            sorted
>>           (into (sorted-map-by (fn [key1 key2]
>>                                  (compare [(get res key2) key2]
>>                                           [(get res key1) key1])))
>>                 res)]
>>       (println "Statistic result : " res)
>>       (println "Sorted result : " sorted)
>>       ;(println "..." (type rdr))
>>       ;(find-write-recur lines sorted n)
>>       *(find-write-recur (line-seq rdr) sorted n)*
>>       )))
>> ;================================
>> But it's wired , i got this error:
>>
>> com.util=> (parse-file "./log600w.log" 3)
>>
>> 001 begin with open  java.io.BufferedReader
>>
>>
>> com.util=> *OutOfMemoryError GC overhead limit exceeded
>>  java.util.regex.Pattern.matcher (Pattern.java:1088)*
>>
>>
>>
>>
>>
>> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>>
>>> Looks like you're "holding on to the head" by giving a name (lines) to
>>> the result of line-seq.  Don't do that.  Try:
>>> (parse-recur (line-seq rdr))
>>>
>>>
>>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> I want to parse big log files using Clojure.
>>>> The structure of each line record is
>>>> "UserID,Latitude,Longitude,Timestamp".
>>>> My implementation steps are:
>>>> ----> Read the log file and get the top-n user list
>>>> ----> Find each top-n user's records and store them in a separate log
>>>> file (UserID.log).
>>>>
>>>> The implementation source code:
>>>> ;======================================================
>>>> (defn parse-file
>>>>   ""
>>>>   [file n]
>>>>   (with-open [rdr (io/reader file)]
>>>>     (println "001 begin with open ")
>>>>     (let [lines (line-seq rdr)
>>>>           res (parse-recur lines)
>>>>           sorted
>>>>           (into (sorted-map-by (fn [key1 key2]
>>>>                                  (compare [(get res key2) key2]
>>>>                                           [(get res key1) key1])))
>>>>                 res)]
>>>>       (println "Statistic result : " res)
>>>>       (println "Top-N User List : " sorted)
>>>>       (find-write-recur lines sorted n)
>>>>       )))
>>>>
>>>> (defn parse-recur
>>>>   ""
>>>>   [lines]
>>>>   (loop [ls  lines
>>>>          res {}]
>>>>     (if ls
>>>>       (recur (next ls)
>>>>                (update-res res (first ls)))
>>>>       res)))
>>>>
>>>> (defn update-res
>>>>   ""
>>>>   [res line]
>>>>   (let [params (string/split line #",")
>>>>         id     (if (> (count params) 1) (params 0) "0")]
>>>>     (if (res id)
>>>>       (update-in res [id] inc)
>>>>       (assoc res id 1))))
>>>>
>>>> (defn find-write-recur
>>>>   "Get each users' records and store into separate log file"
>>>>   [lines sorted n]
>>>>   (loop [x n
>>>>          sd sorted
>>>>          id (first (keys sd))]
>>>>     (if (and (> x 0) sd)
>>>>       (do (create-write-file id
>>>>                              (find-recur lines id))
>>>>           (recur (dec x)
>>>>                  (rest sd)
>>>>                  (nth (keys sd) 1))))))
>>>>
>>>> (defn find-recur
>>>>   ""
>>>>   [lines id]
>>>>   (loop [ls lines
>>>>            res []]
>>>>     (if ls
>>>>       (recur (next ls)
>>>>                (update-vec res id (first ls)))
>>>>       res)))
>>>>
>>>> (defn update-vec
>>>>   ""
>>>>   [res id line]
>>>>   (let [params (string/split line #",")
>>>>         id_        (if (> (count params) 1) (params 0) "0")]
>>>>         (if (= id id_ )
>>>>           (conj res line)
>>>>           res)))
>>>>
>>>> (defn create-write-file
>>>>   "Create a new file and write information into the file."
>>>>   ([file info-lines]
>>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>>>      ))
>>>>   ([file info-lines append?]
>>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>>>    ))
>>>> ;======================================================
>>>>
>>>> I tested this code in the REPL with the command
>>>> (parse-file "./DATA/log.log" 3), and got these results:
>>>>
>>>> Records      Size     Time    Result
>>>> 1,000        42KB     <1s     OK
>>>> 10,000       420KB    <1s     OK
>>>> 100,000      4.3MB    3s      OK
>>>> 1,000,000    43MB     15s     OK
>>>> 6,000,000    258MB    >20M    "OutOfMemoryError Java heap space
>>>>  java.lang.String.substring (String.java:1913)"
>>>>
>>>> ======================================================
>>>> Here are my questions:
>>>> 1. How can I fix the error when I try to parse a big log file, e.g. one
>>>> larger than 200MB?
>>>> 2. How can I optimize the function to run faster?
>>>> 3. Some logs are larger than 1GB; how can the function deal with them?
>>>>
>>>> I am still new to Clojure, so any suggestion or solution will be
>>>> appreciated.
>>>> Thanks
>>>>
>>>> BR
>>>>
>>>> ------------------------------------
>>>>
>>>> 刘家齐 (Jacky Liu)
>>>>
>>>>
>>>>
>>>> Mobile: 15201091195        Email: liujiaq...@gmail.com
>>>>
>>>> Skype:jacky_liu_1987   QQ:406229156
>>>>
>>>> --
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clojure@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient with
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+unsubscr...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to clojure+unsubscr...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>

