I thought I'd follow up my own question with some memory profiling
programs that I should already have known about.  They were already
installed on my Mac as part of the standard Java installation from
Apple (who are just passing them on from Sun, I'm sure):

jconsole -- Good for seeing how fast a Java process is allocating
memory and garbage collecting.

jmap -- Good for a quick summary of the above, or with the "-histo"
option, a much more detailed list of what kinds of objects are taking
up the most memory.

I also learned that Clojure's cons and lazy cons structures take up 48
bytes per element (at least on a Mac with 'java -client' and java
1.6.0_<foo>), which becomes significant when your program keeps
sequences of several million elements around.
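
To put a rough number on that, here is a small sketch of the
arithmetic.  It assumes the 48 bytes per element measured above, which
may well differ on other JVMs and Clojure versions, and the function
name is made up for illustration.  It counts only the seq cells, not
the elements they point to.

```clojure
;; Assumption: 48 bytes per seq element, the figure measured above on
;; one particular JVM.  Other setups may give a different number.
(def bytes-per-seq-element 48)

(defn seq-overhead-mbytes
  "Approximate Mbytes taken by the cells of a seq with n elements,
  not counting the elements themselves."
  [n]
  (/ (* n bytes-per-seq-element) 1e6))

;; A seq of 5 million elements needs about 240 Mbytes for its cells.
(seq-overhead-mbytes 5000000)   ; => 240.0
```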

I've updated my github repo with a pretty decent version of the
reverse-complement benchmark in Clojure.  It isn't as sequence-y as it
could be, but the more sequence-y version generates and collects
garbage so fast that it slows things down significantly.  Same
lesson as in other flavors of Lisp, I guess -- you can write the
straightforward easy-to-write-and-test-and-understand code that conses
a lot (i.e. quickly allocates memory that typically becomes garbage
quite soon), or you can write the more loopy code that doesn't, but
that typically merges many things you'd otherwise prefer to separate
into different functions.  Compare revcomp.clj-5.clj and
revcomp.clj-6.clj in my git repo for an example.
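
For a feel of the difference, here is a minimal sketch of the two
styles.  It is not the code from either file in the repo; the function
names and the four-base complement table are assumptions for
illustration only, and the real benchmark also has to handle FASTA
headers, line breaks, and more character codes.

```clojure
;; Hypothetical complement table covering only the four plain bases.
(def complement-of {\A \T, \C \G, \G \C, \T \A})

(defn revcomp-seq
  "Sequence-y version: short and easy to test, but it allocates seq
  cells for every character, which soon become garbage."
  [^String s]
  (apply str (map complement-of (reverse s))))

(defn revcomp-loop
  "Loopy version: walks the string backwards and appends into a single
  StringBuilder, so it allocates far fewer intermediate objects."
  [^String s]
  (let [n  (.length s)
        sb (StringBuilder. n)]
    (loop [i (dec n)]
      (if (neg? i)
        (.toString sb)
        (do (.append sb (char (complement-of (.charAt s i))))
            (recur (dec i)))))))

(revcomp-seq "AACG")    ; => "CGTT"
(revcomp-loop "AACG")   ; => "CGTT"
```

Both return the same string.  The second already mixes indexing,
lookup, and output building in one function, which is the kind of
merging described above.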

The nice thing is that when you don't need the "uglier" code, Clojure
and other Lisps usually let you write code much more concisely than
lower level languages.  Get it working first, then optimize it.  Since
I'm comparing run times of the Clojure programs versus those submitted
to the language shootout benchmark web site, some of which appear
quite contorted in order to gain performance, I wanted to do some
optimizations that you wouldn't necessarily want to do otherwise.

git://github.com/jafingerhut/clojure-benchmarks.git

You can see my latest run time results at the URL below.  I've got 4
benchmarks written in Clojure so far, with my current versions taking
6x, 8x, 12x, and 15x more CPU time than the Java programs submitted to
the language shootout benchmark web site.

http://github.com/jafingerhut/clojure-benchmarks/blob/20d21bc169d52ca52d6a8281536838662c54e854/RESULTS

I could make some of these significantly closer in speed to the Java
versions, but I suspect that they will start looking more and more
like the Java versions if I do, except with Clojure syntax for Java
calls.  I'm happy to be proved wrong on that, if someone finds better
Clojure versions than I've got.

Thanks,
Andy


On Jul 30, 11:00 am, Andy Fingerhut <andy_finger...@alum.wustl.edu>
wrote:
> I'm gradually adding a few more Clojure benchmark programs to my
> repository here:
>
> git://github.com/jafingerhut/clojure-benchmarks.git
>
> The one I wrote for the "reverse-complement" benchmark is here:
>
> http://github.com/jafingerhut/clojure-benchmarks/blob/4ab4f41c6f96344...
>
> revcomp.clj-4.clj is the best I've got so far, but it runs out of
> memory on the full size benchmark.
>
> If you clone the repository, and successfully run the init.sh script
> to generate the big input and expected output files, the file rcomp/
> long-input.txt contains 3 DNA sequences in FASTA format. The first is
> 50,000,000 characters long, the second is 75,000,000 characters long,
> and the third is 125,000,000 characters long. Each needs to be
> reversed, have each character replaced with a different one, and
> printed out, so we need to store each of the strings one at a time,
> but it is acceptable to deallocate/garbage-collect the previous one
> when starting on the next. I think my code should be doing that, but I
> don't know how to verify that.
>
> I've read that a Java String takes 2 bytes per character, plus about
> 38 bytes of overhead per string. That is about 250 Mbytes for the
> longest one. I also read in a seq of lines, and these long strings are
> split into lines with 60 characters (plus a newline) each. Thus the
> string's data needs to be stored at least twice temporarily -- once
> for the many 60-character strings, plus the final long one.  Also, the
> Java StringBuilder that Clojure's (str ...) function uses probably
> needs to be copied and reallocated periodically as it outgrows its
> current allocation. So I could imagine needing about 3 * 250 Mbytes
> temporarily, but that doesn't explain why my 1536 Mbytes of JVM memory
> are being exhausted.
>
> It would be possible to improve things by not creating all of the
> separate strings, one for each line, and then concatenating them
> together. But first I'd like to explain why it is using so much,
> because I must be missing something.
>
> Thanks,
> Andy
