On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs <da...@wit.io> wrote:
> I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file
> with Clojure.
>
> So far I've got:
>
> (defn multi-nth [values indices]
>   (map (partial nth values) indices))
>
> (defn read-lines [file indices]
>   (with-open [rdr (clojure.java.io/reader file)]
>     (let [lines (line-seq rdr)]
>       (multi-nth lines indices))))
>
> Now, (read-lines "my-file" [0]) works without a problem. However, passing in
> [0 1] gives me the following error: "java.lang.RuntimeException:
> java.io.IOException: Stream closed"
>
> It seems that the stream is being closed before I can read the second line
> from the file. Interestingly, if I manually pull out a line from the file
> with something like `(nth lines 200)`, the `multi-nth` call works for all
> values <= 200.
>
> Any idea what's going on?
>
> PS This question is on SO if someone wants points:
> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file

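The laziness of map is biting you. read-lines returns an unrealized
lazy sequence, and by the time anything walks it (the REPL printer,
for instance), with-open has already closed the reader. Index 0 only
works because line-seq reads the first line eagerly, and your
(nth lines 200) experiment works because it forces the first 201
lines into the seq while the file is still open. Also, calling nth
once per index walks the sequence from its head every time, which is
not going to do wonders for efficiency. Try this on for size: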


(ns nthlines.core
  (:require [clojure.java.io :as io]))

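(defn multi-nth
  "Returns the elements of values whose positions appear in indices,
  in the order they occur in values (duplicate indices are collapsed)."
  [values indices]
  (let [matches-index? (set indices)]
    ;; keep-indexed calls the fn with (index, item) and drops nil results
    (keep-indexed #(when (matches-index? %1) %2) values)))

(defn read-lines [file indices]
  (with-open [r (io/reader file)]
    ;; doall forces every line to be read while the reader is still open
    (doall (multi-nth (line-seq r) indices))))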

(comment

  (def words "/Users/bsmith/w/nthlines/words.txt")
  (def nlines 84918960) ;; 856MB with one word per line

  (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))

  ;;=> "Elapsed time: 18778.904 msecs"
  ;;   ("A" "a" "aa" "Zyzomys" "Zyzzogeton")

)
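
If the indices you need are well before the end of the file, a variant
that stops reading once the largest requested index has been passed
might help. This is a sketch, not something timed against the 856MB
file above, and it assumes: indices are non-negative integers, the
result should come back in the order the indices were given (the
keep-indexed version above returns file order), and an index past the
end of the file yields nil. The names nthlines.early-stop and
read-lines-early are hypothetical.

```clojure
(ns nthlines.early-stop
  (:require [clojure.java.io :as io]))

(defn read-lines-early
  "Hypothetical helper: returns the lines at the given 0-based indices,
  in the order requested. Stops realizing line-seq after the largest
  requested index. Indices past the end of the file map to nil."
  [file indices]
  (if (empty? indices)
    []
    (let [wanted   (set indices)
          last-idx (apply max indices)]
      (with-open [r (io/reader file)]
        (let [found (->> (line-seq r)
                         ;; take bounds how many lines line-seq will read
                         (take (inc last-idx))
                         ;; keep [index line] pairs for the wanted indices only
                         (keep-indexed (fn [i line] (when (wanted i) [i line])))
                         ;; into is eager, so all reading happens inside with-open
                         (into {}))]
          ;; look each index up in the map, preserving the caller's order
          (mapv found indices))))))
```

Usage would look like (read-lines-early words [0 1 2]). Note that for
the benchmark above, which asks for the last line, this still has to
read the whole file, so any gain only shows up when the largest index
is well before the end.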

// Ben

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
