On Nov 10, 2012, at 7:29 AM, Robby Findler wrote:

> If you're calling from Racket to TR then you have the contract
> checking and probably the floats flowing thru there need boxing.
If I understand you correctly, the contract checking would just be "is
this a float?", which I imagine wouldn't require additional memory.
Certainly the return value would have to be boxed, and that would be
eight bytes, I'm guessing.

> Can you put the loop itself into TR?

Sadly, no; the loops themselves are written by students. Or, more
specifically, written by students as a "network" form that expands
into a function that's called in a loop. I think that expanding into
TR would be incredibly hard to get right.

John

> Robby
>
> On Sat, Nov 10, 2012 at 9:22 AM, John Clements
> <cleme...@brinckerhoff.org> wrote:
>> I'm trying to implement some simple comb filters for a reverb, using
>> racket and/or typed racket. I have six of these running in parallel;
>> each one has a vector, and each time a sample arrives, each comb
>> needs to perform two floating-point multiplies and two
>> floating-point adds, increment a counter with possible reset, and
>> store/mutate two locations in memory to prepare for next time.
>>
>> The problem for code like this isn't runtime, directly; it's all the
>> GC. Adding this filter to a simple playback was observed to generate
>> an additional 1.6 GB of garbage for a 60-second session[*], which
>> sounds like a lot until you divide by 60 seconds and the 44.1K
>> sample rate to get about 606 bytes per sample frame. Regardless, you
>> could definitely do it with zero garbage in C, so I set out to try
>> to reduce this.
>>
>> I guessed that most of the garbage in this case was related to
>> boxing of floats, so I decided to use TR to try to eliminate it. I
>> hauled my code over to TR, and it worked completely without
>> modification, which was a joy. Also, the optimization coach tells me
>> that everything is green and staying in the Float realm.
>> Unfortunately, it didn't improve the memory use much; after some
>> experiments, it looks like it reduces the memory overhead per comb
>> filter by roughly half, to 278 bytes/sample frame, *but* imposes its
>> own fixed overhead of 240 bytes/sample frame, which pretty much
>> negates the benefit of the reduction.
>>
>> So, my question is this: should making a call from racket to this TR
>> code
>>
>> (: dummy2 (Float -> Float))
>> (define (dummy2 in)
>>   (* 0.1 in))
>>
>> ... generate about 240 bytes of garbage?
>>
>> FWIW, here's what a comb filter function looks like:
>>
>> (: comb1 (Float -> Float))
>> (define (comb1 in)
>>   (define delayed1 (flvector-ref v1 c1))
>>   (define midnode1 (fl+ delayed1 (fl* g11 m1)))
>>   (define out1 (fl+ (fl* g21 midnode1) in))
>>   (flvector-set! v1 c1 out1)
>>   (define next-c1 (add1 c1))
>>   (set! c1 (cond [(<= d1 next-c1) 0]
>>                  [else next-c1]))
>>   (set! m1 midnode1)
>>   out1)
>>
>> I can't see anything in this that would cause allocation.
>>
>> Maybe the next step is to take a look at the compiled bytecode....
>>
>> John
>>
>> [*] FWIW, I'm observing this by running at the command line with
>> -W debug and then parsing the GC output that appears on the console.
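A minimal, self-contained sketch of the kind of boundary measurement
the dummy2 question above is asking about. The submodule layout, the
iteration count, and the use of current-memory-use are illustrative
choices, not code from the thread, and the figure it prints is only a
rough estimate (a minor collection during the loop will make it an
underestimate):

#lang racket/base

;; Typed submodule exporting the same one-float-in, one-float-out
;; function as in the message above.
(module typed-part typed/racket/base
  (provide dummy2)
  (: dummy2 (Float -> Float))
  (define (dummy2 in)
    (* 0.1 in)))

(require (submod "." typed-part))

;; Call across the untyped->typed boundary many times, discarding each
;; result so that any boxed return value becomes garbage.
(define iterations 10000)

(collect-garbage)
(define before (current-memory-use))
(for ([i (in-range iterations)])
  (dummy2 1.0))
(define after (current-memory-use))

;; Rough bytes-per-call estimate.
(printf "approx. bytes per call: ~a\n"
        (exact->inexact (/ (- after before) iterations)))

If each boundary crossing really does cost on the order of 240 bytes,
the printed figure should land somewhere in that neighborhood.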
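On the [*] measurement note: instead of running with -W debug and
parsing console output, the collector's messages can be received
in-process, since each collection is logged at the 'debug level on the
'GC topic. A sketch; the background-thread arrangement and the
box-allocating demo loop are just illustrative:

#lang racket/base

;; Receive the collector's log messages directly rather than parsing
;; the console output produced by -W debug.
(define gc-receiver
  (make-log-receiver (current-logger) 'debug 'GC))

;; Background thread that prints each GC message; syncing on a log
;; receiver yields a vector of level, message string, data, and topic.
(void
 (thread
  (lambda ()
    (let loop ()
      (define msg (sync gc-receiver))
      (eprintf "~a\n" (vector-ref msg 1))
      (loop)))))

;; Demo workload: allocate lots of boxed floats so some collections
;; actually happen.
(define last-box
  (for/fold ([b (box 0.0)]) ([i (in-range 1000000)])
    (box (exact->inexact i))))
(printf "last box: ~a\n" (unbox last-box))

(collect-garbage)
(sleep 0.1) ; give the receiver thread a chance to drain its messages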