Hi Mark, thanks for your suggestion.
I modified the main function to:
;================================
(defn parse-file
  ""
  [file n]
  (with-open [rdr (io/reader file)]
    (println "001 begin with open " (type rdr))
    (let [;lines (line-seq rdr)
          res (parse-recur (line-seq rdr)) ;lines)
          sorted
          (into (sorted-map-by (fn [key1 key2]
                                 (compare [(get res key2) key2]
                                          [(get res key1) key1])))
                res)]
      (println "Statistic result : " res)
      (println "Sorted result : " sorted)
      ;(println "..." (type rdr))
      ;(find-write-recur lines sorted n)
      (find-write-recur (line-seq rdr) sorted n)
      )))
;================================
But it's weird, I got this error:

com.util=> (parse-file "./log600w.log" 3)

001 begin with open  java.io.BufferedReader


com.util=> OutOfMemoryError GC overhead limit exceeded
 java.util.regex.Pattern.matcher (Pattern.java:1088)





2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>

> Looks like you're "holding on to the head" by giving a name (lines) to the
> result of line-seq.  Don't do that.  Try:
> (parse-recur (line-seq rdr))
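
To illustrate the advice above, here is a minimal sketch of a counting pass that never gives the result of line-seq a name, so lines that have already been processed can be garbage collected. It is an illustration under assumptions, not code from this thread: `line->id`, `count-ids`, and `top-n-ids` are hypothetical names, and the id rule is copied from update-res.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn line->id
  "Returns the first comma-separated field of a line, or \"0\" when the
  line has fewer than two fields (same rule as update-res)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-ids
  "Streams the file once and returns a map of user id -> record count.
  The line-seq is passed straight to reduce and is never bound to a name."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [res line]
              (update-in res [(line->id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-ids
  "Returns the n ids with the highest counts, ordered by descending
  [count id], the same ordering as the sorted-map-by comparator."
  [counts n]
  (->> counts
       (sort-by (fn [[id cnt]] [cnt id]) #(compare %2 %1))
       (take n)
       (map key)))
```

Under these assumptions, `(top-n-ids (count-ids "./log600w.log") 3)` would return the ids of the three most frequent users without keeping the whole file in memory.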
>
>
> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>
>> Hi all,
>> I want to parse big log files using Clojure.
>> The structure of each line record is
>> "UserID,Latitude,Longitude,Timestamp".
>> My implementation steps are:
>> ----> Read the log file and get the top-n user list
>> ----> Find each top-n user's records and store them in a separate log file
>> (UserID.log).
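
The second step above could be sketched as a single streaming pass that keeps one open writer per top-n user, instead of rescanning all lines once per user. This is only a sketch under assumptions: `write-user-files` and its `out-dir` parameter (standing in for MAIN-PATH) are hypothetical names, and it opens its own reader because a reader that has already been read to the end yields no further lines.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn write-user-files
  "Streams `file` once and writes every line whose first comma-separated
  field is in `ids` to <out-dir>/<id>.log.
  `out-dir` is a hypothetical parameter standing in for MAIN-PATH."
  [file ids out-dir]
  (let [;; one writer per requested id, opened up front
        writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        ;; doseq walks the lazy seq without retaining earlier lines
        (doseq [line (line-seq rdr)]
          (let [id (first (string/split line #","))]
            (when-let [^java.io.Writer wr (writers id)]
              (.write wr (str line "\n"))))))
      (finally
        ;; close every writer even if reading fails part-way
        (doseq [^java.io.Writer wr (vals writers)]
          (.close wr))))))
```

With a sequence of top-n ids already computed, a call such as `(write-user-files "./DATA/log.log" ids "./DATA/")` would read the log only once regardless of n.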
>>
>> The implementation source code:
>> ;======================================================
>> (defn parse-file
>>   ""
>>   [file n]
>>   (with-open [rdr (io/reader file)]
>>     (println "001 begin with open ")
>>     (let [lines (line-seq rdr)
>>           res (parse-recur lines)
>>           sorted
>>           (into (sorted-map-by (fn [key1 key2]
>>                                  (compare [(get res key2) key2]
>>                                           [(get res key1) key1])))
>>                 res)]
>>       (println "Statistic result : " res)
>>       (println "Top-N User List : " sorted)
>>       (find-write-recur lines sorted n)
>>       )))
>>
>> (defn parse-recur
>>   ""
>>   [lines]
>>   (loop [ls  lines
>>          res {}]
>>     (if ls
>>       (recur (next ls)
>>                (update-res res (first ls)))
>>       res)))
>>
>> (defn update-res
>>   ""
>>   [res line]
>>   (let [params (string/split line #",")
>>         id     (if (> (count params) 1) (params 0) "0")]
>>     (if (res id)
>>       (update-in res [id] inc)
>>       (assoc res id 1))))
>>
>> (defn find-write-recur
>>   "Get each users' records and store into separate log file"
>>   [lines sorted n]
>>   (loop [x n
>>          sd sorted
>>          id (first (keys sd))]
>>     (if (and (> x 0) sd)
>>       (do (create-write-file id
>>                              (find-recur lines id))
>>           (recur (dec x)
>>                  (rest sd)
>>                  (nth (keys sd) 1))))))
>>
>> (defn find-recur
>>   ""
>>   [lines id]
>>   (loop [ls lines
>>            res []]
>>     (if ls
>>       (recur (next ls)
>>                (update-vec res id (first ls)))
>>       res)))
>>
>> (defn update-vec
>>   ""
>>   [res id line]
>>   (let [params (string/split line #",")
>>         id_        (if (> (count params) 1) (params 0) "0")]
>>         (if (= id id_ )
>>           (conj res line)
>>           res)))
>>
>> (defn create-write-file
>>   "Create a new file and write information into the file."
>>   ([file info-lines]
>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>      (doseq [line info-lines] (.write wr (str line "\n")))
>>      ))
>>   ([file info-lines append?]
>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>      (doseq [line info-lines] (.write wr (str line "\n"))))
>>    ))
>> ;======================================================
>>
>> I tested this code in the REPL with the command
>> (parse-file "./DATA/log.log" 3) and got these results:
>>
>> Records      Size     Time      Result
>> 1,000        42KB     <1s       OK
>> 10,000       420KB    <1s       OK
>> 100,000      4.3MB    3s        OK
>> 1,000,000    43MB     15s       OK
>> 6,000,000    258MB    >20 min   "OutOfMemoryError Java heap space
>>                                 java.lang.String.substring (String.java:1913)"
>>
>> ======================================================
>> Here are my questions:
>> 1. How can I fix the error when I parse a big log file, e.g. > 200MB?
>> 2. How can I optimize the function to run faster?
>> 3. Some logs are larger than 1GB. How can the function deal with them?
>>
>> I am still new to Clojure, so any suggestion or solution would be appreciated.
>> Thanks
>>
>> BR
>>
>> ------------------------------------
>>
>> 刘家齐 (Jacky Liu)
>>
>>
>>
>> Mobile: 15201091195        Email: liujiaq...@gmail.com
>>
>> Skype:jacky_liu_1987   QQ:406229156
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>



-- 

------------------------------------

刘家齐 (Jacky Liu)



Mobile: 15201091195        Email: liujiaq...@gmail.com

Skype:jacky_liu_1987   QQ:406229156

