On Wed, May 31, 2017 at 5:54 PM, Steve Byan's Lists
<steve-l...@byan-roper.org> wrote:
> Hi Matthias,
>
> Thanks for taking a look.
>
>> On May 31, 2017, at 4:13 PM, Matthias Felleisen <matth...@ccs.neu.edu> wrote:
>>
>>
>> Can you explain why you create a lazy stream instead of a plain list?
>
> The current size of a short binary trace file is about 10 GB, and I want to 
> scale to traces many hundreds of gigabytes in size. The expanded s-expression 
> form is about 10 times larger, so keeping the whole list in memory could 
> require up to many terabytes of memory.
>
> Aside from just handling large traces, I also parallelize the problem by 
> running analysis processes on different trace files concurrently. So the 
> amount of memory required for the parallel computation would be about 32 
> times the memory needed for a single trace analysis process.
>
> So, I don't want to try to fit all the records in memory at once. I thought 
> that the lazy stream would accomplish this --- am I wrong?

You're right, but using a lazy stream will still consume more memory than
just using `read` within the loop that actually processes the data.
So, for example:

(define (map-trace stat%-set in)
  ;; Read one s-expression at a time from the input port `in`, counting records.
  (for/fold ([sexp-count 0])
            ([trace-record (in-port read in)])
    (+ sexp-count 1)))

(I didn't try this, but I think it's right.)

This way, you don't build up a list or a lazy stream; you just process
each datum as it's read.
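
To make that concrete, here is a small self-contained sketch of the same
pattern. The name `count-records` and the sample records are made up for
illustration (they are not from your code), and a string port stands in
for a real trace file:

```racket
#lang racket/base

;; Hypothetical helper: counts the s-expressions readable from the port `in`.
;; Only the datum currently bound to `trace-record` is held by the loop.
(define (count-records in)
  (for/fold ([sexp-count 0])
            ([trace-record (in-port read in)])
    (+ sexp-count 1)))

;; Toy input standing in for an expanded trace file (assumed record shape).
(define sample
  (open-input-string "(read 0 4096) (write 4096 512) (read 8192 4096)"))

(count-records sample)
```

For a real file, something like `(call-with-input-file "trace.sexp"
count-records)` should work the same way, where "trace.sexp" is a
placeholder path.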

-Jon
