On Feb 3, 12:22 am, Andy Fingerhut <andy.finger...@gmail.com> wrote: > I've done a pass through most of the Clojure programs on the shootout > web site recently, making some of them faster, and choosing -Xmx > command line arguments when running them to keep the memory usage down > to a reasonable level -- not always the smallest heap size that works, > mind you -- just one that avoids exorbitantly large memory usage. > > http://shootout.alioth.debian.org > > The Clojure program for the "fasta" problem, with source code, AOT > compilation command, and execution command given on this web page: > > http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=clo... > > still takes about 6x to 8x more time than the best Java 6 -server > program here, depending upon which of the four machines it is run on: > > http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=jav... > > I'm sure the Clojure program can be made faster, e.g. by doing fewer > calls to write to the output file, with more bytes per call. Most of > the time seems to be file writing and generating random numbers in gen- > random!, at least on my systems where I've done testing and > profiling. I'm also seeing a fair amount of time spent in calls to > java.lang.Double.valueOf, according to the built-in profiler that > comes with the Hotspot JVM. > > Note: The web site is Clojure 1.2 only right now, so don't expect a > tweaked-out program using things that only work in Clojure 1.3 to work > there yet.
This program is considerably faster on my computer: (set! *warn-on-reflection* true) (def *width* 60) (def *lookup-size* 222000) (def *alu* (str "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG" "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA" "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT" "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA" "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG" "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC" "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA")) (def *codes* "acgtBDHKMNRSVWY") (def *iub* [0.27 0.12 0.12 0.27 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02]) (def *homosapiens* [0.3029549426680 0.1979883004921 0.1975473066391 0.3015094502008]) (defn find-index [f coll] (first (keep-indexed #(if (f %2) %1) coll))) (def random-seed (int-array [42])) (let [ IM (int 139968) IA (int 3877) IC (int 29573) scale (double (/ *lookup-size* IM)) ] (defn gen-random-fast [] (let [ new-seed (unchecked-remainder (unchecked-add (unchecked- multiply (aget (ints random-seed) 0) IA) IC) IM) ] (aset (ints random-seed) 0 new-seed) (int (* new-seed scale))))) ;; Takes a vector of probabilities. (defn make-cumulative [v] (vec (map #(reduce + (subvec v 0 %)) (range 1 (inc (count v)))))) ;; Takes a vector of cumulative probabilities. (defn make-lookup-table [v] (let [ lookup-scale (- *lookup-size* 0.0001) tmp (map (fn [n] (find-index #(<= (/ n lookup-scale) %) v)) (range *lookup-size*)) ] (int-array tmp))) (defn cycle-bytes [source source-size n ^java.io.BufferedOutputStream ostream] (let [ source-size (int source-size) width (int *width*) width+1 (int (inc width)) buffer-size (int (* width+1 4096)) buffer (byte-array buffer-size (byte 10)) next-i (fn[i] (unchecked-remainder (unchecked-add (int i) width) source- size)) next-j (fn[j] (let [j (+ j width+1)] (if (= j buffer-size) (do (.write ostream buffer) 0) j))) ] (loop [i (int 0) j (int 0) n (int n)] (System/arraycopy source i buffer j width) (if (> n width) (recur (int (next-i i)) (int (next-j j)) (- n width)) (do (aset buffer (+ j n) (byte 10)) (.write ostream buffer 0 (+ j n 1))))))) (defn fasta-repeat [n ^java.io.BufferedOutputStream ostream] (let [ source (.getBytes (str *alu* *alu*)) ] (cycle-bytes source (count *alu*) n ostream))) (defn fasta-random [probs n ^java.io.BufferedOutputStream ostream] (let [ codes (.getBytes (str *codes*)) lookup-table (ints (make-lookup-table (make-cumulative probs))) width (int *width*) buffer (byte-array 222000) seeds (int-array 222000) first-seed (aget (ints random-seed) 0) ] (loop [i (int 0)] (aset seeds i (aget (ints random-seed) 0)) (aset buffer i (aget codes (aget lookup-table (gen-random-fast)))) (if (= (aget (ints random-seed) 0) first-seed) (do (System/arraycopy buffer 0 buffer (inc i) *width*) (cycle-bytes buffer (inc i) n ostream) (aset (ints random-seed) 0 (aget seeds (mod n (inc i))))) (recur (unchecked-inc i)))))) (defn write-line [s ^java.io.BufferedOutputStream stream] (.write stream (.getBytes (str s "\n")))) (let [ ostream (java.io.BufferedOutputStream. System/out) arg (first *command-line-args*) n (read-string (or arg "25000000")) start-time (System/currentTimeMillis) ] (write-line ">ONE Homo sapiens alu" ostream) (fasta-repeat (* n 2) ostream) (write-line ">TWO IUB ambiguity codes" ostream) (fasta-random *iub* (* n 3) ostream) (write-line ">THREE Homo sapiens frequency" ostream) (fasta-random *homosapiens* (* n 5) ostream) (.flush ostream) (binding [*out* *err*] (prn n) (print (/ (- (System/currentTimeMillis) start-time) 1000.0)) (println " seconds")) ) -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en