I'm gradually adding a few more Clojure benchmark programs to my
repository here:

git://github.com/jafingerhut/clojure-benchmarks.git

The one I wrote for the "reverse-complement" benchmark is here:

http://github.com/jafingerhut/clojure-benchmarks/blob/4ab4f41c6f963445e1e0973a668ac64939be1c7f/rcomp/revcomp.clj-4.clj

revcomp.clj-4.clj is the best I've got so far, but it runs out of
memory on the full-size benchmark.

If you clone the repository and successfully run the init.sh script
to generate the big input and expected output files, the file
rcomp/long-input.txt contains 3 DNA sequences in FASTA format. The first is
50,000,000 characters long, the second is 75,000,000 characters long,
and the third is 125,000,000 characters long. Each needs to be
reversed, have each character replaced with its complement, and
printed out. So we need to hold each sequence in memory in full, one
at a time, but it is acceptable to deallocate/garbage-collect the
previous one when starting on the next. I think my code is doing that,
but I don't know how to verify it.
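
One rough way to check this might be to print the heap in use after each
sequence is finished. The sketch below assumes that requesting a collection
with System/gc and then reading the java.lang.Runtime counters gives a usable
estimate; the JVM treats System/gc only as a hint, so the numbers are
approximate. The helper name used-heap-mb is made up for illustration and is
not part of revcomp.clj-4.clj.

```clojure
;; Rough sketch: report the heap in use after requesting a GC.
;; used-heap-mb is a hypothetical helper, not part of revcomp.clj-4.clj.
(defn used-heap-mb
  "Returns the approximate number of megabytes of heap in use,
  after asking the JVM to garbage-collect (System/gc is only a hint)."
  []
  (let [rt (Runtime/getRuntime)]
    (System/gc)
    (/ (- (.totalMemory rt) (.freeMemory rt))
       (* 1024.0 1024.0))))

;; Example: call this after finishing each DNA sequence.
;; Printing to *err* keeps it out of the benchmark's normal output.
(binding [*out* *err*]
  (println (format "heap in use: %.1f MB" (used-heap-mb))))
```

If the figure does not drop back down after the first and second sequences,
something is probably still holding a reference to the old data. Running the
JVM with the -verbose:gc option is another way to watch collections happen.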

I've read that a Java String takes 2 bytes per character, plus about
38 bytes of overhead per string. That is about 250 Mbytes for the
longest one. I also read in a seq of lines, and these long strings are
split into lines with 60 characters (plus a newline) each. Thus the
string's data needs to be stored at least twice temporarily -- once
for the many 60-character strings, plus the final long one.  Also, the
Java StringBuilder that Clojure's (str ...) function uses probably
needs to be copied and reallocated periodically as it outgrows its
current allocation. So I could imagine needing about 3 * 250 Mbytes
temporarily, but that doesn't explain why my 1536 Mbytes of JVM memory
are being exhausted.
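
The estimate above can be written out as a small calculation. This is only a
back-of-the-envelope sketch using the figures quoted in this message (2 bytes
per character, about 38 bytes of overhead per String, 60 characters per
line). It ignores the memory used by the seq that holds the lines and by any
StringBuilder copies, which I don't have numbers for. All of the names are
made up for illustration.

```clojure
;; Back-of-the-envelope estimate for the longest sequence, using the
;; figures quoted above. These are assumptions, not measurements.
(def seq-chars 125000000)   ; characters in the longest sequence
(def line-chars 60)         ; characters per input line
(def bytes-per-char 2)      ; Java char size
(def string-overhead 38)    ; approximate per-String overhead in bytes

(defn mbytes [n] (/ n 1e6))

;; The final long string.
(def long-string-bytes
  (+ string-overhead (* seq-chars bytes-per-char)))

;; Number of 60-character lines, rounding up for a short last line.
(def num-lines
  (quot (+ seq-chars (dec line-chars)) line-chars))

;; All of the per-line strings, each paying its own overhead.
(def line-strings-bytes
  (+ (* seq-chars bytes-per-char)
     (* num-lines string-overhead)))

(println "long string:  " (mbytes long-string-bytes) "MB")
(println "line strings: " (mbytes line-strings-bytes) "MB")
(println "both at once: " (mbytes (+ long-string-bytes line-strings-bytes)) "MB")
```

Under these assumptions the per-line overhead adds noticeably to the line
strings' total, but still nowhere near enough to reach 1536 Mbytes.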

It would be possible to improve things by not creating all of the
separate strings, one for each line, and then concatenating them
together. But first I'd like to understand why it is using so much
memory, because I must be missing something.
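
For reference, a minimal sketch of that improvement might look like the
following. It assumes the sequence data can be read through a java.io.Reader
and copied straight into a StringBuilder whose capacity is set up front, so no
per-line strings are created and the builder does not need to grow. The
function name slurp-without-newlines and the 8192-character buffer size are
made up for illustration, and FASTA header lines are not handled here.

```clojure
;; Hypothetical helper: not part of revcomp.clj-4.clj.
(defn slurp-without-newlines
  "Copies every character except \\newline from rdr into a StringBuilder
  created with the given initial capacity, and returns the StringBuilder."
  [^java.io.Reader rdr initial-capacity]
  (let [sb (StringBuilder. (int initial-capacity))
        ^chars buf (char-array 8192)]
    (loop []
      ;; .read returns the number of chars read, or -1 at end of stream.
      (let [n (.read rdr buf)]
        (when (pos? n)
          (dotimes [i n]
            (let [c (aget buf i)]
              (when-not (= c \newline)
                (.append sb c))))
          (recur))))
    sb))

;; Small usage example with an in-memory reader.
(println (str (slurp-without-newlines
               (java.io.StringReader. "ACGT\nTTGA\n") 16)))
;; prints ACGTTTGA
```

Note that calling str on the StringBuilder makes one more copy of the
characters, so working on the StringBuilder directly would save that as well.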

Thanks,
Andy
