Er... this version is better. It uses hasNext instead of catching the exception:

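(defn lazy-read-records
  "Returns a lazy seq of the records in file, split on regex."
  [file regex]
  (let [scanner (java.util.Scanner. file)
        get-next (fn get-next []
                   ;; stop cleanly at end of input instead of relying on
                   ;; NoSuchElementException
                   (if (not (.hasNext scanner))
                     ()
                     (cons (.next scanner)
                           (lazy-seq (get-next)))))]
    (.useDelimiter scanner regex)
    (get-next)))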
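
For what it's worth, here is a rough usage sketch. The helper name count-records and the sample input are made up for illustration. I'm passing a String as the source only to keep the snippet self-contained (java.util.Scanner scans a String's contents directly); a java.io.File should behave the same way. The function is repeated so the snippet runs on its own.

```clojure
;; Same definition as in the message above, repeated for a standalone snippet.
(defn lazy-read-records [file regex]
  (let [scanner (java.util.Scanner. file)
        get-next (fn get-next []
                   (if (not (.hasNext scanner))
                     ()
                     (cons (.next scanner)
                           (lazy-seq (get-next)))))]
    (.useDelimiter scanner regex)
    (get-next)))

;; Hypothetical helper: counts records without keeping the seq around.
;; reduce consumes the seq as it goes, and nothing else refers to its head,
;; so records already counted can be garbage collected.
(defn count-records [source regex]
  (reduce (fn [n _] (inc n)) 0 (lazy-read-records source regex)))

;; Made-up sample input: records separated by one or more semicolons.
(def sample "alpha;beta;;gamma")

(println (count-records sample #";+"))
(println (vec (lazy-read-records sample #";+")))
```

The thing to avoid is something like (def records (lazy-read-records ...)) on a huge file and then walking records to the end, since the var holds the head and every record read so far stays in memory.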

On Aug 17, 2010, at 2:29 PM, Jeff Palmucci wrote:

> I'm assuming your problem is with memory, and not multithreaded reading. 
> Given that:
> 
> I also work with files much too big to fit into memory.
> 
> You could just use java.util.Scanner. That has a useDelimiter method, so you 
> can set the pattern to break on:
> 
> (defn lazy-read-records [file regex]
>  (let [scanner (java.util.Scanner. file)
>        get-next (fn get-next []
>                   (try
>                     (cons (.next scanner)
>                           (lazy-seq (get-next)))
>                     (catch java.util.NoSuchElementException e ())))]
>    (.useDelimiter scanner regex)
>    (lazy-seq (get-next))))
> 
> The trick here is that the sequence is lazy. It won't read the file until it 
> needs to in order to return the next element.
> 
> If you don't hold onto the head of the sequence, the front part can be 
> garbage collected while you are working further down.
> 
> PS If, for some reason, you want the character indices rather than the actual 
> records, replace (.next scanner) with:
> 
> (do (.next scanner)
>     (.start (.match scanner)))
> 
> On Aug 16, 2010, at 5:22 PM, cej38 wrote:
> 
>> Hello,
>> I work with text files that are, at times, too large to read in all
>> at one time.  In searching for a way to read in only part of the file
>> I came across
>> http://meshy.org/2009/12/13/widefinder-2-with-clojure.html
>> 
>> I am only interested in the chunk-file and read-lines-range functions.
>> 
>> My problem is that I would like to change chunk-file, so that instead
>> of looking for the next line break, it would look for some regular
>> expression (to be given as part of the function call), and would then
>> report the position of the first character of every instance of that
>> regular expression.
>> 
>> After working on this for a couple of days I am raising the white
>> flag.  Is there someone that can help me with this?
>> 
>> Thanks.
>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with your 
>> first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
> 
