As you suspect, your Clojure code isn't very performant. First, you're making a very large number of updates to an immutable map, which is inefficient because every assoc produces a new version of the map. Clojure lets you temporarily turn an immutable data structure into a mutable one with the transient function, and turn it back with persistent!. Alternatively, it's sometimes worth falling back on mutable Java structures such as a HashMap.
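To illustrate the transient approach, here is a minimal sketch. The function name count-patterns and its max-len parameter are hypothetical and not part of the original code; it counts every substring of length 1 through max-len in a single line, and is meant as an illustration rather than a tuned implementation.

```clojure
;; Hypothetical helper (not from the original post): counts every substring
;; of length 1..max-len in `line`, accumulating into a transient map.
(defn count-patterns
  [^String line max-len]
  (let [n (count line)]
    (persistent!                      ; freeze back into an immutable map
      (reduce
        (fn [m [start end]]
          (let [ptn (subs line start end)]
            ;; assoc! updates the transient in place; always use its return value
            (assoc! m ptn (inc (get m ptn 0)))))
        (transient {})
        ;; all [start end) index pairs whose substring fits inside the line
        (for [start (range n)
              len   (range 1 (inc max-len))
              :let  [end (+ start len)]
              :while (<= end n)]
          [start end])))))
```

For example, (count-patterns "aba" 2) counts "a" twice and "ab", "b" and "ba" once each. The transient never escapes the function, so callers still only see an ordinary immutable map.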
You can work out the frequencies of the patterns a little more concisely with:

    (require '[clojure.string :as str])

    (defn pattern-frequencies [line]
      (->> (range 12)
           (mapcat #(partition (inc %) 1 line))
           (map str/join)
           (frequencies)))

And then combine them with:

    ;; freqs is a collection of frequency maps
    (defn merge-frequencies [freqs]
      (apply merge-with + freqs))

There's plenty of room for optimisation there, but making more use of Clojure's core functions is an easy way to make things quicker.

On 12 September 2017 at 16:54, <darren92....@gmail.com> wrote:

> Hi,
>
> I am a researcher of Natural Language Processing.
> My team want to know how well does Clojure parallelize and how much time
> is reduced compared by Java single thread version.
>
> The problem we want to solve is,
> there is a big corpus file (just now 500MB).
> Reading sentences line by line, find all patterns and their occurrence
> count on length 1 through 12.
>
> It is a very simple problem and It doesn’t care of order of processing.
> We want to make just a big hash-map. (Key is a pattern string, Value is a
> occurrence count.)
> Ex) { “father” 10000000 “mother” 10000000 … }
>
> Comparing performance between Java and Clojure, if Clojure version is
> better than Java,
> then we’ll change our code base to Clojure, if not, we cannot help staying
> Java.
>
> Anyway my first prototype is very very slow. I’m a novice. :(
>
> Please give me some advices.
> Thanks.
>
> (ns parallel-test.core
>   (:require [clojure.java.io :as jio]
>             [clojure.core.reducers :refer [fold]])
>   (:gen-class))
>
> (def corpus-file-url "resources/korean.txt")
> (def OC (atom nil))
> (def MPL 12)
> (def each-size 10000)
>
> (defn add-pattern-to-hashmap
>   [h-map ^String ptn ^Integer ptn-oc]
>   (let [h-ptn-oc (get h-map ptn)
>         n-ptn-oc (if (nil? h-ptn-oc)
>                    ptn-oc
>                    (+ h-ptn-oc ptn-oc))]
>     (assoc h-map ptn n-ptn-oc)))
>
> (defn merge-hash-map
>   ([] (hash-map))
>   ([& hs]
>    (reduce (fn [l-map r-map]
>              (reduce (fn [[ptn ptn-oc]]
>                        (add-pattern-to-hashmap l-map ptn ptn-oc))
>                      r-map))
>            hs)))
>
> (defn cal-line-oc
>   ([] (hash-map))
>   ([h-map ^String line]
>    (let [line-length (count line)]
>      (loop [i 0
>             i-map h-map]
>        (if (>= i line-length)
>          i-map
>          (recur (inc i)
>                 (loop [j 1
>                        j-map i-map]
>                   (let [end-index (+ i j)]
>                     (if (or (> j MPL) (> end-index line-length))
>                       j-map
>                       (recur (inc j)
>                              (add-pattern-to-hashmap j-map (subs line i end-index)
>                                                      1)))))))))))
>
> (defn parallel-process
>   [combine-fn reduce-fn input-file]
>   (with-open [rdr (jio/reader input-file)]
>     (fold each-size
>           combine-fn
>           reduce-fn
>           (line-seq rdr))))
>
> (defn -main [& args]
>   (println "start")
>   (reset! OC (parallel-process merge-hash-map cal-line-oc corpus-file-url))
>   (println "end"))
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
James Reeves
booleanknot.com