Nice. I get about 1/3 of the run time on the two systems I've tested on, too, including one that isn't terribly far off in hardware from the 64-bit benchmark machine used by the shootout web site.

I'll contact you off list about how this gets submitted to the web site.

Thanks!
Andy

On Feb 6, 2011, at 5:01 PM, Bill James wrote:

On Feb 3, 12:22 am, Andy Fingerhut <andy.finger...@gmail.com> wrote:
I've done a pass through most of the Clojure programs on the shootout
web site recently, making some of them faster, and choosing -Xmx
command line arguments when running them to keep the memory usage down to a reasonable level -- not always the smallest heap size that works,
mind you -- just one that avoids exorbitantly large memory usage.

http://shootout.alioth.debian.org

The Clojure program for the "fasta" problem, with source code, AOT
compilation command, and execution command given on this web page:

http://shootout.alioth.debian.org/u32/program.php? test=fasta&lang=clo...

still takes about 6x to 8x more time than the best Java 6 -server
program here, depending upon which of the four machines it is run on:

http://shootout.alioth.debian.org/u32/program.php? test=fasta&lang=jav...

I'm sure the Clojure program can be made faster, e.g. by doing fewer
calls to write to the output file, with more bytes per call.  Most of
the time seems to be file writing and generating random numbers in gen-
random!, at least on my systems where I've done testing and
profiling.  I'm also seeing a fair amount of time spent in calls to
java.lang.Double.valueOf, according to the built-in profiler that
comes with the Hotspot JVM.

Note: The web site is Clojure 1.2 only right now, so don't expect a
tweaked-out program using things that only work in Clojure 1.3 to work
there yet.


This program is considerably faster on my computer:


(set! *warn-on-reflection* true)


(def *width* 60)
(def *lookup-size* 222000)


(def *alu* (str "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"
               "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"
               "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"
               "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"
               "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"
               "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"
               "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"))

(def *codes* "acgtBDHKMNRSVWY")

(def *iub* [0.27 0.12 0.12 0.27 0.02 0.02 0.02 0.02
           0.02 0.02 0.02 0.02 0.02 0.02 0.02])

(def *homosapiens* [0.3029549426680 0.1979883004921
                   0.1975473066391 0.3015094502008])



(defn find-index [f coll]
 (first (keep-indexed #(if (f %2) %1) coll)))




(def random-seed (int-array [42]))
(let [ IM (int 139968)
      IA (int 3877)
      IC (int 29573)
      scale (double (/ *lookup-size* IM))
    ]
 (defn gen-random-fast []
   (let [ new-seed (unchecked-remainder (unchecked-add (unchecked-
multiply
            (aget (ints random-seed) 0) IA) IC) IM) ]
     (aset (ints random-seed) 0 new-seed)
     (int (* new-seed scale)))))



;; Takes a vector of probabilities.
(defn make-cumulative [v]
 (vec (map #(reduce + (subvec v 0 %))  (range 1 (inc (count v))))))

;; Takes a vector of cumulative probabilities.
(defn make-lookup-table [v]
 (let [ lookup-scale (- *lookup-size* 0.0001)
        tmp (map
              (fn [n] (find-index #(<= (/ n lookup-scale) %) v))
              (range *lookup-size*)) ]
   (int-array tmp)))



(defn cycle-bytes [source source-size n  ^java.io.BufferedOutputStream
ostream]
 (let [ source-size (int source-size)
        width (int *width*)
        width+1 (int (inc width))
        buffer-size (int (* width+1 4096))
        buffer (byte-array buffer-size (byte 10))
        next-i (fn[i]
          (unchecked-remainder (unchecked-add (int i) width) source-
size))
        next-j (fn[j]
           (let [j (+ j width+1)]
             (if (= j buffer-size)
               (do  (.write ostream buffer)  0)
               j)))
      ]
   (loop [i (int 0)  j (int 0)  n (int n)]
     (System/arraycopy  source i buffer j width)
     (if (> n width)
       (recur (int (next-i i)) (int (next-j j)) (- n width))
       (do
         (aset buffer (+ j n) (byte 10))
         (.write ostream buffer 0 (+ j n 1)))))))



(defn fasta-repeat [n  ^java.io.BufferedOutputStream ostream]
 (let [ source (.getBytes (str *alu* *alu*)) ]
   (cycle-bytes source (count *alu*) n ostream)))



(defn fasta-random [probs n  ^java.io.BufferedOutputStream ostream]
 (let [ codes (.getBytes (str *codes*))
        lookup-table (ints (make-lookup-table (make-cumulative
probs)))
        width (int *width*)
        buffer (byte-array 222000)
        seeds  (int-array  222000)
        first-seed (aget (ints random-seed) 0)
      ]
   (loop [i (int 0)]
     (aset seeds i (aget (ints random-seed) 0))
     (aset buffer i
       (aget codes
         (aget lookup-table
           (gen-random-fast))))
     (if (= (aget (ints random-seed) 0) first-seed)
       (do
         (System/arraycopy  buffer 0  buffer (inc i)  *width*)
         (cycle-bytes buffer (inc i) n ostream)
         (aset (ints random-seed) 0 (aget seeds (mod n (inc i)))))
       (recur (unchecked-inc i))))))



(defn write-line [s ^java.io.BufferedOutputStream stream]
 (.write stream (.getBytes (str s "\n"))))



(let [ ostream (java.io.BufferedOutputStream. System/out)
      arg (first *command-line-args*)
      n (read-string (or arg "25000000"))

      start-time (System/currentTimeMillis)
    ]

 (write-line ">ONE Homo sapiens alu" ostream)
 (fasta-repeat (* n 2) ostream)

 (write-line ">TWO IUB ambiguity codes" ostream)
 (fasta-random *iub* (* n 3) ostream)

 (write-line ">THREE Homo sapiens frequency" ostream)
 (fasta-random *homosapiens* (* n 5) ostream)

 (.flush ostream)

 (binding [*out* *err*]
   (prn n)
   (print (/ (- (System/currentTimeMillis) start-time) 1000.0))
   (println " seconds"))

)

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to