Working with a huge graph - how can I make Clojure performant?

Balint Erdi Mon, 11 Mar 2013 10:37:43 -0700

 

Hey,



I got an assignment to implement an algorithm to calculate strongly 
connected components in a graph (
http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather 
big, it has ~900.000 vertices.


In its first pass, it needs to do a depth-first search on the graph and 
calculate the finishing time for each node (the finishing time for a node 
is a number from 0…V-1 where V is the number of vertices). Potentially 
several depth-first search need to be launched (done in the finishing-times 
function below) to discover all components.


The version I pasted below is the most performant. It discovers ~600.000 
vertices (2/3 of all vertices). However, on subsequent 
dfs-by-finishing-times it becomes extremely slow (exploring a handful of 
additional nodes/second) and I'm not sure why.


Here are a few things I tried to speed up the algorithm:


* rewrite to a recursive function (not tail-recursive) and use lazy-seqs

* define dfs-by-finishing-times in finishing-times so that the whole graph 
(G) does not have to be passed at each call.

* use persistent data structures everywhere instead of transients (I know 
this would only slow things down but it did not hurt to try)

* use a profiler (VisualVM) to learn where the bottlenecks are. I'm not 
very proficient with profilers but I could not extract any valuable 
information


Here are the code snippets in question:

(I also pasted it 
here: https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better 
readability)

(defn dfs-by-finishing-times

  ([G u]

     (dfs-by-finishing-times G u #{}))

  ([G u explored]

     (loop [[v & vs :as stack] (list u), explored (transient explored), 
lhalf [], rhalf [],  iter-cnt 0]

         (if (seq stack)

          (let [neighbors (persistent!

                           (reduce

                            (fn [c u] (if (explored u) c (conj! c u)))

                            (transient [])

                            (G v)))]

            (cond

             (explored v) (recur vs explored lhalf rhalf (inc iter-cnt))

             (empty? neighbors) (recur vs (conj! explored v) (conj lhalf v) 
rhalf (inc iter-cnt))

             :else (recur (reduce (fn [stack e] (cons e stack)) vs 
neighbors)

                       (conj! explored v)

                       lhalf

                       (cons v rhalf)

                       (inc iter-cnt))))

          (concat lhalf rhalf)))

     ))


(defn finishing-times [G vertices]

  "The first pass of Kosaraju's algorithm.

   Scan the transpose graph of G, and mark the finishing time for each.

   G should already be the transposed graph"

  (loop [[u & vs :as stack] (seq vertices)

          explored #{},

          finished []]

     (if (nil? u)

       finished

       (let [path (dfs-by-finishing-times G u explored)

             new-explored (into explored path)]

         (recur (remove new-explored vs)

                new-explored

                (into finished path))))))

Do you have any insights into what technique I could use to speed up my 
algorithm? I'm pretty sure I'm missing a key point but I'm not sure 
what. Presumably the whole algorithm runs under 10 seconds in C# on the 
same graph so this is rather embarrassing :)

Appreciate your help,

Balint

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Working with a huge graph - how can I make Clojure performant?

Reply via email to