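Looks like you're "holding on to the head" by giving a name (lines) to the
result of line-seq. Because lines is still used after parse-recur returns
(it is passed on to find-write-recur), every line read so far stays
reachable, so the whole file ends up on the heap. Don't do that. Try:
(parse-recur (line-seq rdr))
and open a new reader for the step that needs the lines a second time.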
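For the counting step, something along these lines should keep memory flat. reduce consumes the seq returned by line-seq as it goes, and no local refers to its head, so lines already counted can be garbage collected. This is a sketch that assumes the "UserID,Latitude,Longitude,Timestamp" format from your message; line->user-id and count-users are names made up for illustration, not existing functions.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn line->user-id
  "Hypothetical helper: returns the UserID field of a
  \"UserID,Latitude,Longitude,Timestamp\" line, or \"0\" for lines without a
  comma (the same fallback your update-res uses)."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (params 0) "0")))

(defn count-users
  "Hypothetical helper: counts records per user in one streaming pass.
  source is anything clojure.java.io/reader accepts (path, File, Reader).
  The seq from line-seq is never bound to a name, so it is not retained."
  [source]
  (with-open [rdr (io/reader source)]
    (reduce (fn [counts line]
              (update-in counts [(line->user-id line)] (fnil inc 0)))
            {}
            (line-seq rdr))))
```

Calling (count-users "./DATA/log.log") should give the same kind of {UserID count} map as your res, while only holding the counts in memory.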


On Tue, Nov 19, 2013 at 7:27 PM, Jiaqi Liu <liujiaq...@gmail.com> wrote:

> Hi all,
> I want to parse big log files using Clojure.
> The structure of each line record is
> "UserID,Latitude,Longitude,Timestamp".
> My implementation steps are:
> ----> Read the log file & get the top-n user list
> ----> Find each top-n user's records and store them in a separate log file
> (UserID.log).
>
> The implementation source code:
> ;======================================================
> (require '[clojure.java.io :as io]
>          '[clojure.string :as string])
>
> ;; Output directory for the per-user files. Its value was not shown in
> ;; the original post; "./OUT/" is a placeholder.
> (def MAIN-PATH "./OUT/")
>
> ;; The functions below refer to each other before they are defined.
> (declare parse-recur update-res find-write-recur find-recur update-vec
>          create-write-file)
>
> (defn parse-file
>   "Counts records per user in file, then writes the records of the top n
>   users to one file per user."
>   [file n]
>   (with-open [rdr (io/reader file)]
>     (println "001 begin with open ")
>     (let [lines (line-seq rdr)
>           res (parse-recur lines)
>           ;; users ordered by record count (descending), ties by id
>           sorted
>           (into (sorted-map-by (fn [key1 key2]
>                                  (compare [(get res key2) key2]
>                                           [(get res key1) key1])))
>                 res)]
>       (println "Statistic result : " res)
>       (println "Top-N User List : " sorted)
>       (find-write-recur lines sorted n))))
>
> (defn parse-recur
>   "Returns a map of UserID -> number of records."
>   [lines]
>   (loop [ls  lines
>          res {}]
>     (if ls
>       (recur (next ls)
>              (update-res res (first ls)))
>       res)))
>
> (defn update-res
>   "Increments the record count of the user on line."
>   [res line]
>   (let [params (string/split line #",")
>         id     (if (> (count params) 1) (params 0) "0")]
>     (update-in res [id] (fnil inc 0))))
>
> (defn find-write-recur
>   "Get each user's records and store them in a separate log file."
>   [lines sorted n]
>   ;; take at most n ids, so an n larger than the number of users is safe
>   (doseq [id (take n (keys sorted))]
>     (create-write-file (str id ".log")
>                        (find-recur lines id))))
>
> (defn find-recur
>   "Returns a vector of the lines that belong to user id."
>   [lines id]
>   (loop [ls  lines
>          res []]
>     (if ls
>       (recur (next ls)
>              (update-vec res id (first ls)))
>       res)))
>
> (defn update-vec
>   "Appends line to res if it belongs to user id."
>   [res id line]
>   (let [params (string/split line #",")
>         id_    (if (> (count params) 1) (params 0) "0")]
>     (if (= id id_)
>       (conj res line)
>       res)))
>
> (defn create-write-file
>   "Create a new file and write information into the file."
>   ([file info-lines]
>    (create-write-file file info-lines false))
>   ([file info-lines append?]
>    (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
>      (doseq [line info-lines] (.write wr (str line "\n"))))))
> ;======================================================
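For picking the top N, sort-by on the counts map may be simpler than a sorted-map-by comparator. A sketch, assuming counts is a {UserID count} map like the one parse-recur returns; top-n-users is a hypothetical name.

```clojure
(defn top-n-users
  "Hypothetical helper: returns the ids of the n users with the most records,
  most records first. Ties are broken by ascending id so the order is
  deterministic. Returns all ids when there are fewer than n users."
  [counts n]
  (->> counts
       ;; sort key: negated count (so larger counts come first), then id
       (sort-by (fn [[id cnt]] [(- cnt) id]))
       (take n)
       (map key)))
```

Note that ties go to the smaller id here, whereas your comparator puts the larger id first.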
>
> I tested this in the REPL with (parse-file "./DATA/log.log" 3)
> and got these results:
>
> Records      Size     Time       Result
> 1,000        42KB     <1s        OK
> 10,000       420KB    <1s        OK
> 100,000      4.3MB    3s         OK
> 1,000,000    43MB     15s        OK
> 6,000,000    258MB    >20 min    "OutOfMemoryError Java heap space
>                                  java.lang.String.substring (String.java:1913)"
>
> ======================================================
> Here are my questions:
> 1. How can I fix the error when parsing a big log file (> 200MB)?
> 2. How can I optimize the functions to run faster?
> 3. Some logs are larger than 1GB; how can the function deal with them?
>
> I am still new to Clojure; any suggestions or solutions would be appreciated.
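For questions 1 and 3, one option is to make the writing step its own streaming pass over a freshly opened reader, instead of scanning the retained lines once per user. That should also help with question 2, since the file is then read twice in total regardless of n. A sketch under these assumptions: write-top-users is a hypothetical name, top-ids holds the top-n UserIDs, the output directory already exists, and the UserID is everything before the first comma.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn write-top-users
  "Hypothetical helper: reads file once and writes each line whose UserID is
  in top-ids to <out-dir>/<UserID>.log. One writer per user stays open for
  the whole pass, so nothing but the writers is kept in memory."
  [file top-ids out-dir]
  (let [writers (into {} (for [id (set top-ids)]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        ;; line-seq is consumed by doseq and never named, so it is not retained
        (doseq [line (line-seq rdr)]
          (let [id (first (string/split line #"," 2))]
            (when-let [w (writers id)]
              (.write w (str line "\n"))))))
      (finally
        ;; close (and flush) every per-user writer, even if reading failed
        (doseq [w (vals writers)] (.close w))))))
```

Its memory use depends on n (the number of open writers), not on the size of the log.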
> Thanks
>
> BR
>
> ------------------------------------
>
> 刘家齐 (Jacky Liu)
>
>
>
> Mobile: 15201091195        Email: liujiaq...@gmail.com
>
> Skype:jacky_liu_1987   QQ:406229156
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
