I mostly fall back on good ole loop/recur for these large-file processing exercises. Here's a template you could use (it includes a try/catch so you can see errors as you go):
(import '(java.io BufferedReader FileReader PrintWriter File))

(defn process-log-file
  "Read a log file, extracting lines matching regx."
  [in-fp out-fp regx]
  (with-open [rdr (BufferedReader. (FileReader. (File. in-fp)))
              wtr (PrintWriter. (File. out-fp))]
    ;; read one line at a time; earlier lines are never retained
    (loop [line (.readLine rdr)
           i    0]                      ; i counts the lines read so far
      (when line
        (try
          (when (re-matches regx line)  ; re-matches needs the whole line to match
            (.println wtr line))        ; or whatever
          (catch Exception e (prn line e)))
        ;; recur stays outside the try: Clojure does not allow recur across a try
        (recur (.readLine rdr) (inc i))))))

Regards,
Adrian.

On Mon, Aug 31, 2009 at 4:44 PM, wangzx <wangzaixi...@gmail.com> wrote:

> I just want to learn clojure by using it to parse log file and
> generate reports. and one question is: for a large text file, can we
> use it as a sequence effectively? for example, for a 100M log file, we
> need to check each line for some pattern match.
>
> I just using the (line-seq rdr) but it will cause
> OutOfMemoryException.
>
> demo code
>
> (defn buffered-reader [file]
>   (new java.io.BufferedReader
>     (new java.io.InputStreamReader
>       (new java.io.FileInputStream file))))
>
> (def -reader (buffered-reader "test.txt"))
> (filter #(= "some" %) -reader)
>
> even there is no lines match "some", the filter operation will cause
> OutOfMemoryException.
>
> Is there other APIs like the Sequence but provide stream-like API?
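On the original line-seq question: line-seq is itself lazy, so it can usually walk a large file in roughly constant memory, provided nothing keeps a reference to the head of the sequence. Holding the head (for example binding the seq to a top-level def) is a common cause of the kind of OutOfMemoryException described in the quoted message, because every line read stays reachable. Below is a minimal sketch, assuming Clojure 1.2 or later (for clojure.java.io); count-matching-lines and copy-matching-lines are hypothetical names for illustration. It uses re-find, so a line matches when the pattern occurs anywhere in it, whereas re-matches in the template above needs the whole line to match.

```clojure
(require '[clojure.java.io :as io])

(defn count-matching-lines
  "Hypothetical helper: count the lines of the file at path that contain
  a match for re. The lazy seq from line-seq is consumed inside with-open
  and never bound to a var, so lines already processed can be GC'd."
  [path re]
  (with-open [rdr (io/reader path)]
    (count (filter #(re-find re %) (line-seq rdr)))))

(defn copy-matching-lines
  "Hypothetical helper: write every line of in-path that contains a match
  for re to out-path, one line at a time."
  [in-path out-path re]
  (with-open [rdr (io/reader in-path)
              wtr (io/writer out-path)]
    (doseq [line (line-seq rdr)
            :when (re-find re line)]
      (.write wtr (str line "\n")))))
```

Usage would be something like (count-matching-lines "test.txt" #"some") at the REPL. Because the function returns a number rather than the seq itself, the REPL never ends up holding on to the lines.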