Re: Concatenating many lists (stack overflow problem)

Jason Wolfe Mon, 02 Mar 2009 12:47:09 -0800


On Mar 2, 2009, at 2:52 AM, Zededarian wrote:


>
> Ah, I see.  Out of curiosity, is there some design reason why clojure
> doesn't reduce my original function to the apply version?  I mean,
> take the function:
> (loop [x '(1 2 3) fin '()]
>  (if x
>    (recur (cdr x) (concat fin (list x)))
>    fin))
> Will, at least if I'm interpreting your responses correctly,
> on its first time through have an x of '(1 2 3) and a fin of '()
> then it will have an x of '(2 3) and a fin of (concat '() '(1)) (I'm
> using concat to indicate a lazy sequence)
> then it will have an x of '(3) and a fin of (concat (concat '() '(1))
> '(3)).

lazy-seq is (more or less) just a macro that stuffs its body into a  
closure, and stuffs that into a LazySeq object.  The first time  
someone calls "seq" on the LazySeq (or asks for its "count", or  
"first", etc.), this closure will be executed to yield the body.

In your example, after two steps x will be '(3) and fin will be the  
result of executing this:

(let [fin (lazy-seq
           (let [s (seq '())]
             (if s
               (cons (first s) (concat (rest s) '(1)))
               y)))]
   (lazy-seq
     (let [s (seq fin)]
       (if s
         (cons (first s) (concat (rest s) '(2)))
          y))))

In particular, the body of neither lazy-seq call will have executed yet.

> Why doesn't clojure reduce this to (concat '() '(1) '(3))?  It seems
> like if the compiler kept track of whether the first argument to
> concat was already a lazy sequence, and kept an eye on where it ended,
> the stack overflow thing could be avoided.

Well, I can't speak for Rich, but I doubt this will happen.  To do  
this, the compiler must know something specific about "concat", namely  
that (concat (concat a b) c) is equivalent to (concat a b c).  This  
not readily apparent to me when just glancing at the source below, and  
I doubt it would be readily apparent to a program analyzer.

I guess this is one of the cases where you are just supposed to know  
and do the right thing (e.g., use a vector).  Even if your original  
solution didn't cause a stack overflow, it would still be O(n^2).

> Ooh, also, what's that source command you're using?  When I type
> (source concat) into my REPL it says it can't evaluate the symbol.

It's in clojure.contrib, in the repl-utils namespace (?)

http://code.google.com/p/clojure-contrib/


> Thanks for all the help; laziness is still decidedly non-intuitive to
> me.

You're welcome.  Also, maybe see the wikibook, in particular

http://en.wikibooks.org/wiki/Clojure_Programming/Concepts#Lazy_Evaluation_of_Sequences

-Jason


>
>
> On Mar 2, 12:07 am, Jason Wolfe <jawo...@berkeley.edu> wrote:
>> No, it will be O(n), where the "cool magic" is in the use of lazy
>> sequences.
>>
>> Here's a very simple way to write an O(n) lazy concat:
>>
>> user> (defn my-concat [& seqs]
>>         (when-first [s seqs]
>>           (if (seq s)
>>               (lazy-seq (cons (first s) (apply my-concat (rest s)  
>> (rest
>> seqs))))
>>               (recur (rest seqs)))))
>>
>> user> (my-concat '(1 2 3) [4 5] '(6 7))
>> (1 2 3 4 5 6 7)
>>
>> This is not proper lazy programming style, but it should give you the
>> basic idea.
>>
>> The actual definition of concat is significantly more complicated:
>>
>> user> (source concat)
>> (defn concat
>>    "Returns a lazy seq representing the concatenation of the elements
>> in the supplied colls."
>>    ([] (lazy-seq nil))
>>    ([x] (lazy-seq x))
>>    ([x y]
>>       (lazy-seq
>>        (let [s (seq x)]
>>          (if s
>>            (cons (first s) (concat (rest s) y))
>>            y))))
>>    ([x y & zs]
>>       (let [cat (fn cat [xys zs]
>>                   (lazy-seq
>>                    (let [xys (seq xys)]
>>                      (if xys
>>                        (cons (first xys) (cat (rest xys) zs))
>>                        (when zs
>>                          (cat (first zs) (next zs)))))))]
>>             (cat (concat x y) zs))))
>>
>> -Jason
>>
>> On Mar 1, 2009, at 9:47 PM, Zededarian wrote:
>>
>>
>>
>>> Thanks!
>>
>>> One follow up question, though: will the first solution take O(n^2)
>>> time, or is clojure clever enough to keep track of where the end  
>>> of a
>>> sequence is on its own (or, alternately, to start by concatenating  
>>> the
>>> last two elements and working backward)?  Or does it do some other
>>> cool magic to make this quick?
>>
>>> On Mar 1, 9:07 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>>>> Hi,
>>
>>>> The problem is that you end up with the front of your worklist  
>>>> being
>>>> wrapped in <n-lines> nested calls to lazy-seq, one per each call to
>>>> concat.  Then, realizing the first element causes a stack overflow.
>>>> This wouldn't happen if you reversed the arguments to concat, but
>>>> then
>>>> you wouldn't get what you want.  Some possibilities (untested):
>>
>>>> ; probably the shortest, most idiomatic solution?
>>
>>>> (defn getstrlist [file]
>>>>   (apply concat
>>>>     (take-while identity
>>>>       (repeatedly #(getline file)))))
>>
>>>> ;or, using vectors, and written more like your solution:
>>
>>>> (defn getstrlist [file]
>>>>   (loop [worklst []]
>>>>    (if-let [x (getline file)]
>>>>       (recur (into worklst x))
>>>>      worklst)))
>>
>>>> Cheers,
>>>> Jason
>>
>>>> On Mar 1, 4:53 pm, Zededarian <zededar...@gmail.com> wrote:
>>
>>>>> My ultimate goal is to get a list of all the words in a file in
>>>>> order.  I have a function that will get a list of all the words on
>>>>> one
>>>>> line of this file in order, and I want to efficiently concatenate
>>>>> these lists to end up with one large list.  I'm working with about
>>>>> 12000 lines right now, but in a perfect world I'd like to scale  
>>>>> much
>>>>> higher.
>>
>>>>> I started off just using concat.  Basically my code looks like:
>>>>> (defn getstrlist
>>>>>   ([file]
>>>>>      (loop [x (getline file) worklst '()]
>>>>>        (if x
>>>>>          (recur (getline file) (concat worklst (splitstr x)))
>>>>>          worklst))))
>>
>>>>> Then in the REPL, if I type (def a (getstrlist FILE)), it works
>>>>> fine.
>>>>> But if I try to output a or take (first a), I get a stack overflow
>>>>> error.  I don't know why this is.  I remember hearing somewhere  
>>>>> that
>>>>> Clojure had lazy sequences, so my best guess is that it isn't
>>>>> actually
>>>>> concatenating anything, but is storing pointers to the start of  
>>>>> all
>>>>> the lists on a stack that overflows when I try to evaluate one of
>>>>> these pointers.
>>
>>>>> I know in Scheme I would write this using metalists that keep
>>>>> track of
>>>>> the location of their last element:
>>>>> (define (concat metalst1 metalst2)
>>>>>   (set-cdr! (car metalst1) (cdr metalst2))
>>>>>   (set-car! metalst1 (car metalst2))
>>>>>   metalst1)
>>
>>>>> (define (make-metalst lst)
>>>>>   (cons (get-end lst) lst))
>>
>>>>> (define (get-end lst)
>>>>>   (if (null? lst)
>>>>>       '()
>>>>>       (if (null? (cdr lst))
>>>>>           lst
>>>>>           (get-end (cdr lst)))))
>>
>>>>> Which would be fairly efficient.  But since Clojure doesn't seem  
>>>>> to
>>>>> have real lists, I'm not quite sure how I could port this over.  I
>>>>> suppose I could implement my own cons pairs like an idiot and do  
>>>>> all
>>>>> of this manually, but I figure I'm just not understanding the
>>>>> "Clojure
>>>>> solution" to this problem.
> >


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Concatenating many lists (stack overflow problem)

Reply via email to