Following up on my own question: here are two memory-profiling programs that I should already have known about. Both were already installed on my Mac as part of the standard Java installation from Apple (who are just passing them on from Sun, I'm sure):
jconsole -- Good for seeing how fast a Java process is allocating memory and garbage collecting.

jmap -- Good for a quick summary of the above or, with the "-histo" option, a much more detailed list of which kinds of objects are taking up the most memory.

I also learned that Clojure's cons and lazy cons structures take up 48 bytes per element (at least on a Mac with 'java -client' and java 1.6.0_<foo>), which becomes significant when your program has sequences of several million elements.

I've updated my github repo with a pretty decent version of the reverse-complement benchmark in Clojure. It isn't as sequence-y as it could be, but the more sequence-y version generates and collects garbage so fast that it slows things down significantly.

Same lesson as from other flavors of Lisp, I guess: you can write the straightforward, easy-to-write-and-test-and-understand code that conses a lot (i.e. quickly allocates memory that typically becomes garbage quite soon), or you can write the more loopy code that doesn't, but that typically merges many things you'd otherwise prefer to separate into different functions. Just compare revcomp.clj-5.clj and revcomp.clj-6.clj in my git repo for an example.

The nice thing is that when you don't need the "uglier" code, Clojure and other Lisps usually let you write code much more concisely than lower-level languages. Get it working first, then optimize it. Since I'm comparing run times of the Clojure programs against those submitted to the language shootout benchmark web site, some of which appear quite contorted in order to gain performance, I wanted to do some optimizations that you wouldn't necessarily want to do otherwise.

git://github.com/jafingerhut/clojure-benchmarks.git

My latest run time results are at the RESULTS link below. I've got 4 benchmarks written in Clojure so far, with my current versions using 6x, 8x, 12x, and 15x more CPU time than the Java programs submitted to the language shootout benchmark web site.
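To make the "conses a lot" versus "loopy" trade-off concrete, here is a minimal sketch in Clojure. It is not the code from revcomp.clj-5.clj or revcomp.clj-6.clj: the names complement-of, revcomp-seq, and revcomp-loop are made up for illustration, and the complement table covers only A, C, G, and T.

```clojure
;; Illustrative only: NOT the functions from the benchmark repository.
;; The complement table is deliberately tiny (assumption: upper-case A/C/G/T only).
(def complement-of {\A \T, \T \A, \C \G, \G \C})

;; Sequence-style: short and easy to test, but `reverse` builds a list and
;; `map` builds a lazy seq, each with one cell per character, before the
;; final string is assembled.
(defn revcomp-seq [^String dna]
  (apply str (map complement-of (reverse dna))))

;; Loop-style: walks the string backwards by index and appends to a single
;; pre-sized StringBuilder, without building intermediate sequences.
(defn revcomp-loop [^String dna]
  (let [n  (.length dna)
        sb (StringBuilder. n)]
    (loop [i (dec n)]
      (if (neg? i)
        (.toString sb)
        (do (.append sb (char (complement-of (.charAt dna i))))
            (recur (dec i)))))))
```

Both functions return the same string for the same input; for example, (revcomp-seq "AACG") and (revcomp-loop "AACG") both give "CGTT". The difference is in how much short-lived garbage they allocate along the way, which is what jmap -histo makes visible.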
http://github.com/jafingerhut/clojure-benchmarks/blob/20d21bc169d52ca52d6a8281536838662c54e854/RESULTS

I could make some of these significantly closer in speed to the Java versions, but I suspect that if I do, they will start looking more and more like the Java versions, except with Clojure syntax for Java calls. I'm happy to be proved wrong on that if someone finds better Clojure versions than I've got.

Thanks,
Andy

On Jul 30, 11:00 am, Andy Fingerhut <andy_finger...@alum.wustl.edu> wrote:
> I'm gradually adding a few more Clojure benchmark programs to my
> repository here:
>
> git://github.com/jafingerhut/clojure-benchmarks.git
>
> The one I wrote for the "reverse-complement" benchmark is here:
>
> http://github.com/jafingerhut/clojure-benchmarks/blob/4ab4f41c6f96344...
>
> revcomp.clj-4.clj is the best I've got so far, but it runs out of
> memory on the full size benchmark.
>
> If you clone the repository, and successfully run the init.sh script
> to generate the big input and expected output files, the file
> rcomp/long-input.txt contains 3 DNA sequences in FASTA format. The
> first is 50,000,000 characters long, the second is 75,000,000
> characters long, and the third is 125,000,000 characters long. Each
> needs to be reversed, have each character replaced with a different
> one, and printed out, so we need to store each of the strings one at a
> time, but it is acceptable to deallocate/garbage-collect the previous
> one when starting on the next. I think my code should be doing that,
> but I don't know how to verify that.
>
> I've read that a Java String takes 2 bytes per character, plus about
> 38 bytes of overhead per string. That is about 250 Mbytes for the
> longest one. I also read in a seq of lines, and these long strings are
> split into lines with 60 characters (plus a newline) each. Thus the
> string's data needs to be stored at least twice temporarily -- once
> for the many 60-character strings, plus the final long one.
> Also, the Java StringBuilder that Clojure's (str ...) function uses
> probably needs to be copied and reallocated periodically as it
> outgrows its current allocation. So I could imagine needing about
> 3 * 250 Mbytes temporarily, but that doesn't explain why my 1536
> Mbytes of JVM memory are being exhausted.
>
> It would be possible to improve things by not creating all of the
> separate strings, one for each line, and then concatenating them
> together. But first I'd like to explain why it is using so much,
> because I must be missing something.
>
> Thanks,
> Andy
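The estimate in the quoted message can be checked with a few lines of arithmetic. This is a rough sketch that only reuses the figures mentioned in this thread (2 bytes per character, about 38 bytes of overhead per String, 48 bytes per seq element). Nothing here is measured, the names are made up for illustration, and it assumes that newlines are not stored and that each line occupies exactly one seq cell.

```clojure
;; All constants are figures quoted in this thread, not measurements.
(def bytes-per-char 2)    ; Java String data, per the quoted message
(def string-overhead 38)  ; approximate per-String overhead, per the quoted message
(def cell-overhead 48)    ; per seq element, per the jmap observation in the follow-up

(defn string-bytes
  "Estimated bytes for one String holding n-chars characters."
  [n-chars]
  (+ string-overhead (* bytes-per-char n-chars)))

(defn lines-bytes
  "Estimated bytes for n-chars of data held as separate Strings of at most
  line-len characters each, plus one seq cell per line (an assumption)."
  [n-chars line-len]
  (let [full    (quot n-chars line-len)   ; number of full-length lines
        extra   (rem n-chars line-len)    ; characters on a final short line
        n-lines (+ full (if (pos? extra) 1 0))]
    (+ (* full (string-bytes line-len))
       (if (pos? extra) (string-bytes extra) 0)
       (* n-lines cell-overhead))))

(def longest 125000000)   ; length of the third sequence in long-input.txt
(println "one long string:" (string-bytes longest) "bytes")
(println "60-char lines  :" (lines-bytes longest 60) "bytes")
```

With these figures the longest sequence comes to roughly 250 Mbytes as one string and roughly 430 Mbytes as a seq of 60-character lines, so the per-line overhead is not negligible. Whether that accounts for the memory exhaustion would still need checking with jmap.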