You can get a lazy sequence of all the lines in all the files by something like:
(for [file out-files
      line (with-open [r (io/reader file)]
             (doall (line-seq r)))]
  line)

(The doall makes sure each file's lines are read while its reader is still
open; the sequence stays lazy across files, so you only ever hold one
file's worth of lines in memory.)

If "StatusJSONImpl" is on a separate line, you can throw in a :when clause
to filter those lines out:

(for [file out-files
      line (with-open [r (io/reader file)]
             (doall (line-seq r)))
      :when (not= line "StatusJSONImpl")]
  line)

If it's a line prefix, you can remove it in the body:

(for [file out-files
      line (with-open [r (io/reader file)]
             (doall (line-seq r)))]
  (string/replace line "StatusJSONImpl" ""))

This is all assuming io is an alias for clojure.java.io, string for
clojure.string, and that getting your files line by line is useful.
(There's a sketch tying these pieces together at the bottom of this
message.)

Re OutOfMemoryError: if all the allocated heap memory really isn't
freeable, then there's nothing the JVM can do -- it's being asked to
allocate memory for a new object, and there's none available.

On Jul 26, 9:53 am, atucker <agjf.tuc...@googlemail.com> wrote:
> Hi all! I have been trying to use Clojure on a student project, but
> it's becoming a bit of a nightmare. I wonder whether anyone can
> help? I'm not studying computer science, and I really need to be
> getting on with the work I'm actually supposed to be doing :)
>
> I am trying to work from a lot of Twitter statuses that I saved to a
> text file. (Unfortunately I failed to escape quotes and such, so the
> JSON is not valid. Anyone know a good way of coping with that?)
>
> Here is my function:
>
> (defn json-seq []
>   (apply concat
>     (map #(do (print "f") (str/split (slurp %) #"\nStatusJSONImpl"))
>          out-files)))
>
> Now there are forty files and five thousand statuses per file, which
> sounds like a lot, and I don't suppose I can hope to hold them all in
> memory at the same time. But I had thought that my function might
> produce a lazy sequence that would be more manageable. However I
> typically get:
>
> twitter.core> (nth (json-seq dir-name) 5)
> ffff"{createdAt=Fri .... etc.          GOOD
>
> twitter.core> (nth (json-seq dir-name) 5000)
> ffff
> Java heap space
> [Thrown class java.lang.OutOfMemoryError]      BAD
>
> And at this point my REPL is done for. Any further instruction will
> result in another OutOfMemoryError. (Surely that has to be a bug just
> there? Has the garbage collector just given up?)
>
> Anyway I am thinking that the sequence is not behaving as lazily as I
> need it to. It's not reading one file at a time, and it's not reading
> thirty-two as I might expect from "chunks", but something in the
> middle. I did try the "dechunkifying" code from page 339 of "Joy of
> Clojure", but that doesn't compile at all :(
>
> I do seem to keep running into memory problems with Clojure. I have
> 2GB RAM and am using Snow Leopard, Aquamacs 2.0, Clojure 1.2.0 beta1
> and Leiningen 1.2.0.
>
> Cheers
> Alistair
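P.S. Putting the pieces together, here is a minimal sketch of what I have
in mind -- assuming out-files is already bound to the sequence of your
forty files (paths or java.io.File objects; io/reader takes either), and
with the aliases spelled out in the ns form. The all-lines name is just
for illustration:

(ns twitter.core
  (:require [clojure.java.io :as io]
            [clojure.string :as string]))

(defn all-lines
  "Lazy across files; each file's lines are read eagerly (doall) while
  its reader is still open, so only one file is held in memory at a time.
  Lines consisting of just \"StatusJSONImpl\" are dropped."
  [out-files]
  (for [file out-files
        line (with-open [r (io/reader file)]
               (doall (line-seq r)))
        :when (not= line "StatusJSONImpl")]
    line))

;; e.g. at the REPL:
;; twitter.core> (nth (all-lines out-files) 5000)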