On Feb 3, 12:22 am, Andy Fingerhut <andy.finger...@gmail.com> wrote:
> I've done a pass through most of the Clojure programs on the shootout  
> web site recently, making some of them faster, and choosing -Xmx  
> command line arguments when running them to keep the memory usage down  
> to a reasonable level -- not always the smallest heap size that works,  
> mind you -- just one that avoids exorbitantly large memory usage.
>
> http://shootout.alioth.debian.org
>
> The Clojure program for the "fasta" problem, with source code, AOT  
> compilation command, and execution command given on this web page:
>
> http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=clo...
>
> still takes about 6x to 8x more time than the best Java 6 -server  
> program here, depending upon which of the four machines it is run on:
>
> http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=jav...
>
> I'm sure the Clojure program can be made faster, e.g. by doing fewer  
> calls to write to the output file, with more bytes per call.  Most of  
> the time seems to be file writing and generating random numbers in gen-
> random!, at least on my systems where I've done testing and  
> profiling.  I'm also seeing a fair amount of time spent in calls to  
> java.lang.Double.valueOf, according to the built-in profiler that  
> comes with the Hotspot JVM.
>
> Note: The web site is Clojure 1.2 only right now, so don't expect a  
> tweaked-out program using things that only work in Clojure 1.3 to work  
> there yet.


This program is considerably faster on my computer:


(set! *warn-on-reflection* true)


(def *width* 60)
(def *lookup-size* 222000)


(def *alu* (str "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"
                "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"
                "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"
                "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"
                "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"
                "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"
                "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"))

(def *codes* "acgtBDHKMNRSVWY")

(def *iub* [0.27 0.12 0.12 0.27 0.02 0.02 0.02 0.02
            0.02 0.02 0.02 0.02 0.02 0.02 0.02])

(def *homosapiens* [0.3029549426680 0.1979883004921
                    0.1975473066391 0.3015094502008])



(defn find-index [f coll]
  (first (keep-indexed #(if (f %2) %1) coll)))




(def random-seed (int-array [42]))
(let [ IM (int 139968)
       IA (int 3877)
       IC (int 29573)
       scale (double (/ *lookup-size* IM))
     ]
  (defn gen-random-fast []
    (let [ new-seed (unchecked-remainder (unchecked-add (unchecked-
multiply
             (aget (ints random-seed) 0) IA) IC) IM) ]
      (aset (ints random-seed) 0 new-seed)
      (int (* new-seed scale)))))



;; Takes a vector of probabilities.
(defn make-cumulative [v]
  (vec (map #(reduce + (subvec v 0 %))  (range 1 (inc (count v))))))

;; Takes a vector of cumulative probabilities.
(defn make-lookup-table [v]
  (let [ lookup-scale (- *lookup-size* 0.0001)
         tmp (map
               (fn [n] (find-index #(<= (/ n lookup-scale) %) v))
               (range *lookup-size*)) ]
    (int-array tmp)))



(defn cycle-bytes [source source-size n  ^java.io.BufferedOutputStream
ostream]
  (let [ source-size (int source-size)
         width (int *width*)
         width+1 (int (inc width))
         buffer-size (int (* width+1 4096))
         buffer (byte-array buffer-size (byte 10))
         next-i (fn[i]
           (unchecked-remainder (unchecked-add (int i) width) source-
size))
         next-j (fn[j]
            (let [j (+ j width+1)]
              (if (= j buffer-size)
                (do  (.write ostream buffer)  0)
                j)))
       ]
    (loop [i (int 0)  j (int 0)  n (int n)]
      (System/arraycopy  source i buffer j width)
      (if (> n width)
        (recur (int (next-i i)) (int (next-j j)) (- n width))
        (do
          (aset buffer (+ j n) (byte 10))
          (.write ostream buffer 0 (+ j n 1)))))))



(defn fasta-repeat [n  ^java.io.BufferedOutputStream ostream]
  (let [ source (.getBytes (str *alu* *alu*)) ]
    (cycle-bytes source (count *alu*) n ostream)))



(defn fasta-random [probs n  ^java.io.BufferedOutputStream ostream]
  (let [ codes (.getBytes (str *codes*))
         lookup-table (ints (make-lookup-table (make-cumulative
probs)))
         width (int *width*)
         buffer (byte-array 222000)
         seeds  (int-array  222000)
         first-seed (aget (ints random-seed) 0)
       ]
    (loop [i (int 0)]
      (aset seeds i (aget (ints random-seed) 0))
      (aset buffer i
        (aget codes
          (aget lookup-table
            (gen-random-fast))))
      (if (= (aget (ints random-seed) 0) first-seed)
        (do
          (System/arraycopy  buffer 0  buffer (inc i)  *width*)
          (cycle-bytes buffer (inc i) n ostream)
          (aset (ints random-seed) 0 (aget seeds (mod n (inc i)))))
        (recur (unchecked-inc i))))))



(defn write-line [s ^java.io.BufferedOutputStream stream]
  (.write stream (.getBytes (str s "\n"))))



(let [ ostream (java.io.BufferedOutputStream. System/out)
       arg (first *command-line-args*)
       n (read-string (or arg "25000000"))

       start-time (System/currentTimeMillis)
     ]

  (write-line ">ONE Homo sapiens alu" ostream)
  (fasta-repeat (* n 2) ostream)

  (write-line ">TWO IUB ambiguity codes" ostream)
  (fasta-random *iub* (* n 3) ostream)

  (write-line ">THREE Homo sapiens frequency" ostream)
  (fasta-random *homosapiens* (* n 5) ostream)

  (.flush ostream)

  (binding [*out* *err*]
    (prn n)
    (print (/ (- (System/currentTimeMillis) start-time) 1000.0))
    (println " seconds"))

)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to