Sorry, I meant "weird"...

2013/11/20 Jiaqi Liu <liujiaq...@gmail.com>

> Hi Mark, thanks for your suggestion.
> I modified the main function to:
> ;================================
> (defn parse-file
>   ""
>   [file n]
>   (with-open [rdr (io/reader file)]
>     (println "001 begin with open " (type rdr))
>     (let [;lines (line-seq rdr)
>           res (parse-recur (line-seq rdr)) ;lines)
>           sorted
>           (into (sorted-map-by (fn [key1 key2]
>                                  (compare [(get res key2) key2]
>                                           [(get res key1) key1])))
>                 res)]
>       (println "Statistic result : " res)
>       (println "Sorted result : " sorted)
>       ;(println "..." (type rdr))
>       ;(find-write-recur lines sorted n)
>       (find-write-recur (line-seq rdr) sorted n))))
> ;================================
> But it's wired, and I got this error:
>
> com.util=> (parse-file "./log600w.log" 3)
>
> 001 begin with open  java.io.BufferedReader
>
>
> com.util=> OutOfMemoryError GC overhead limit exceeded
>  java.util.regex.Pattern.matcher (Pattern.java:1088)
>
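Separately from the OutOfMemoryError above, one detail of the modified function is worth a sketch: both line-seq calls share the same rdr. A BufferedReader only reads forward, so once the first pass has consumed every line, a second (line-seq rdr) has nothing left to return. The example below shows this with an in-memory reader; the StringReader, the sample lines, and the name two-passes-one-reader are stand-ins for illustration, not part of the original code.

```clojure
(import '(java.io BufferedReader StringReader))

(defn two-passes-one-reader
  "Counts the lines via line-seq, then calls line-seq again on the same
  reader. Returns [line-count second-seq]."
  [text]
  (with-open [rdr (BufferedReader. (StringReader. text))]
    (let [n     (count (line-seq rdr)) ; first pass drains the reader
          again (line-seq rdr)]        ; second pass starts at end of input
      [n again])))

(two-passes-one-reader "u1,1.0,2.0,100\nu2,1.0,2.0,101\n")
;; => [2 nil]
```

If that holds for the file reader as well, find-write-recur would receive nil here, so a second pass over the data would presumably need a freshly opened reader.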
>
>
>
>
> 2013/11/20 Mark Engelberg <mark.engelb...@gmail.com>
>
>> Looks like you're "holding on to the head" by giving a name (lines) to
>> the result of line-seq.  Don't do that.  Try:
>> (parse-recur (line-seq rdr))
>>
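To make the head-retention point concrete: the lines already consumed from a lazy sequence can only be garbage-collected if nothing still refers to the start of that sequence. In the original parse-file, lines is used again by find-write-recur, so every line read during counting has to stay reachable. A minimal sketch of the counting step, assuming the same "UserID,..." line layout; count-user-ids is a hypothetical name, and reduce is swapped in for the original loop so the line-seq result is never bound to a local:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn count-user-ids
  "Returns a map of user ID -> number of lines. `source` is anything that
  clojure.java.io/reader accepts (a file path, a File, a Reader, ...)."
  [source]
  (with-open [rdr (io/reader source)]
    (reduce (fn [counts line]
              (let [params (string/split line #",")
                    ;; same rule as update-res: lines without a comma count as "0"
                    id     (if (> (count params) 1) (params 0) "0")]
                (update-in counts [id] (fnil inc 0))))
            {}
            (line-seq rdr))))
```

With the path from the original post, the call would look like (count-user-ids "./DATA/log.log"). Whether this alone avoids the error on the 6,000,000-line file has not been verified here.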
>>
>> On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:
>>
>>> Hi all,
>>> I want to parse big log files using Clojure.
>>> Each line of the log is a record of the form
>>> "UserID,Latitude,Longitude,Timestamp".
>>> My implementation works in two steps:
>>> ----> Read the log file and get the top-n user list
>>> ----> Find each top-n user's records and store them in a separate log file
>>> (UserID.log).
>>>
>>> The source code of the implementation:
>>> ;======================================================
>>> (defn parse-file
>>>   ""
>>>   [file n]
>>>   (with-open [rdr (io/reader file)]
>>>     (println "001 begin with open ")
>>>     (let [lines (line-seq rdr)
>>>           res (parse-recur lines)
>>>           sorted
>>>           (into (sorted-map-by (fn [key1 key2]
>>>                                  (compare [(get res key2) key2]
>>>                                           [(get res key1) key1])))
>>>                 res)]
>>>       (println "Statistic result : " res)
>>>       (println "Top-N User List : " sorted)
>>>       (find-write-recur lines sorted n))))
>>>
>>> (defn parse-recur
>>>   ""
>>>   [lines]
>>>   (loop [ls  lines
>>>          res {}]
>>>     (if ls
>>>       (recur (next ls)
>>>              (update-res res (first ls)))
>>>       res)))
>>>
>>> (defn update-res
>>>   ""
>>>   [res line]
>>>   (let [params (string/split line #",")
>>>         id     (if (> (count params) 1) (params 0) "0")]
>>>     (if (res id)
>>>       (update-in res [id] inc)
>>>       (assoc res id 1))))
>>>
>>> (defn find-write-recur
>>>   "Get each users' records and store into separate log file"
>>>   [lines sorted n]
>>>   (loop [x n
>>>          sd sorted
>>>          id (first (keys sd))]
>>>     (if (and (> x 0) sd)
>>>       (do (create-write-file id
>>>                              (find-recur lines id))
>>>           (recur (dec x)
>>>                  (rest sd)
>>>                  (nth (keys sd) 1))))))
>>>
>>> (defn find-recur
>>>   ""
>>>   [lines id]
>>>   (loop [ls lines
>>>          res []]
>>>     (if ls
>>>       (recur (next ls)
>>>              (update-vec res id (first ls)))
>>>       res)))
>>>
>>> (defn update-vec
>>>   ""
>>>   [res id line]
>>>   (let [params (string/split line #",")
>>>         id_    (if (> (count params) 1) (params 0) "0")]
>>>     (if (= id id_)
>>>       (conj res line)
>>>       res)))
>>>
>>> (defn create-write-file
>>>   "Create a new file and write information into the file."
>>>   ([file info-lines]
>>>    (with-open [wr (io/writer (str MAIN-PATH file))]
>>>      (doseq [line info-lines] (.write wr (str line "\n")))))
>>>   ([file info-lines append?]
>>>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>>>      (doseq [line info-lines] (.write wr (str line "\n"))))))
>>> ;======================================================
>>>
>>> I tested this code in the REPL with the command
>>> (parse-file "./DATA/log.log" 3) and got these results:
>>>
>>> Records      Size     Time     Result
>>> 1,000        42KB     <1s      OK
>>> 10,000       420KB    <1s      OK
>>> 100,000      4.3MB    3s       OK
>>> 1,000,000    43MB     15s      OK
>>> 6,000,000    258MB    >20M     "OutOfMemoryError Java heap space
>>>  java.lang.String.substring (String.java:1913)"
>>>
>>> ======================================================
>>> Here are my questions:
>>> 1. How can I fix the error when I try to parse a big log file, e.g. one
>>> larger than 200MB?
>>> 2. How can I optimize the function to run faster?
>>> 3. There are logs larger than 1GB; how can the function deal with them?
>>>
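One possible shape for an answer to the three questions above, offered as a sketch under assumptions rather than a fix verified against the 6,000,000-line file: read the file twice with a fresh reader each time, keep only the per-user counts in memory, and on the second pass write each matching line straight to that user's file instead of collecting lines in a vector. The names split-top-users, line-id, and out-dir are hypothetical; the ID rule, the tie-breaking order, and the UserID.log naming follow the original code.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn line-id
  "User ID of a line, or \"0\" when the line has no comma (as in update-res)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn split-top-users
  "Pass 1 counts lines per user. Pass 2 copies the lines of the n most
  frequent users into <out-dir>/<UserID>.log. Returns the top-n IDs."
  [in-file out-dir n]
  (let [counts  (with-open [rdr (io/reader in-file)]
                  (reduce (fn [m line]
                            (update-in m [(line-id line)] (fnil inc 0)))
                          {}
                          (line-seq rdr)))
        ;; highest count first; ties broken by ID in descending order
        top-ids (->> counts
                     (sort-by (fn [[id c]] [c id]) #(compare %2 %1))
                     (take n)
                     (mapv first))
        ;; one open writer per top user, kept for the whole second pass
        writers (into {} (for [id top-ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader in-file)]
        (doseq [line (line-seq rdr)
                :let [w (writers (line-id line))]
                :when w]
          (.write w (str line "\n"))))
      (finally
        ;; close every writer even if the second pass fails
        (doseq [w (vals writers)] (.close w))))
    top-ids))
```

A call would look like (split-top-users "./DATA/log.log" "./DATA" 3). The design reads the input twice in total, where the original scans it once per top user, and it assumes n is small enough that keeping n writers open at once is acceptable.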
>>> I am still new to Clojure; any suggestions or solutions would be appreciated.
>>> Thanks
>>>
>>> BR
>>>
>>> ------------------------------------
>>>
>>> 刘家齐 (Jacky Liu)
>>>
>>>
>>>
>>> Mobile: 15201091195        Email: liujiaq...@gmail.com
>>>
>>> Skype:jacky_liu_1987   QQ:406229156
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clojure@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+unsubscr...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>
>


