ReadyForZero is open-sourcing our library for easily gathering data and computing summary measures in a declarative way:
https://github.com/ReadyForZero/babbage

The summary-measure functionality lets you compute multiple measures over arbitrary partitions of your input data simultaneously, in a single pass. You just say what you want to compute:

> (def my-fields {:y    (stats :y count)
                  :x    (stats :x count)
                  :both (stats #(+ (or (:x %) 0) (or (:y %) 0)) count sum mean)})

and which sets are of interest:

> (def my-sets (-> (sets {:has-y #(contains? % :y)})
                   (complement :has-y))) ;; could also take intersections, unions

and then run it on some data:

> (calculate my-sets my-fields [{:x 1 :y 2} {:x 10} {:x 4 :y 3} {:x 5}])
{:not-has-y {:y {:count 0}, :x {:count 2}, :both {:mean 7.5, :sum 15, :count 2}},
 :has-y     {:y {:count 2}, :x {:count 2}, :both {:mean 5.0, :sum 10, :count 2}},
 :all       {:y {:count 2}, :x {:count 4}, :both {:mean 6.25, :sum 25, :count 4}}}

The field functions :x, :y, and #(+ (or (:x %) 0) (or (:y %) 0)) defined in my-fields are each called once per input element, no matter how many sets that element contributes to. Likewise, the predicate #(contains? % :y) is called once per input element, no matter how many unions, intersections, complements, etc. the set :has-y contributes to.

A variety of measure functions, and structured ways of combining them, are supplied; it's also easy to define additional measures.

babbage also supplies a way to run computations structured as dependency graphs, which can make the initial data gathering (before any summarizing) simpler to express.
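The single-pass behaviour described for calculate can be pictured with a small plain-Clojure sketch. It does not use babbage and is not its implementation; the names fields, memberships, add-value, step, finish, and summarize are hypothetical, and unlike my-fields above it simply computes count, sum, and mean for every field. Each field function is called once per element, and the resulting values are then folded into every set the element belongs to, so overlapping sets such as :all and :has-y cost no extra calls.

```clojure
;; Hypothetical, babbage-free sketch: several measures over several
;; overlapping sets, computed in one pass using clojure.core only.

(def fields
  "Field name -> function extracting a value from one input element."
  {:x    :x
   :y    :y
   :both #(+ (or (:x %) 0) (or (:y %) 0))})

(defn memberships
  "Names of the sets an element belongs to. The base predicate is
  evaluated once; :not-has-y is derived from its result, not re-tested."
  [elem]
  (let [has-y? (contains? elem :y)]
    [:all (if has-y? :has-y :not-has-y)]))

(defn add-value
  "Fold one field value into a {:count :sum} accumulator; nil is skipped
  but still creates an empty accumulator, so missing fields report count 0."
  [acc v]
  (let [acc (or acc {:count 0 :sum 0})]
    (if (nil? v)
      acc
      (-> acc (update :count inc) (update :sum + v)))))

(defn step
  "Process one element: call each field function exactly once, then fold
  the resulting values into every set the element belongs to."
  [acc elem]
  (let [values (into {} (for [[k f] fields] [k (f elem)]))]
    (reduce (fn [acc set-name]
              (reduce-kv (fn [acc k v] (update-in acc [set-name k] add-value v))
                         acc
                         values))
            acc
            (memberships elem))))

(defn finish
  "Add a :mean to every accumulator that saw at least one value."
  [acc]
  (into {}
        (for [[set-name by-field] acc]
          [set-name (into {}
                          (for [[k {n :count total :sum :as st}] by-field]
                            [k (if (pos? n)
                                 (assoc st :mean (double (/ total n)))
                                 st)]))])))

(defn summarize
  "Single reduce over the data, followed by a cheap finishing step."
  [data]
  (finish (reduce step {} data)))
```

Running (summarize [{:x 1 :y 2} {:x 10} {:x 4 :y 3} {:x 5}]) gives the same :count, :sum, and :mean figures for :both as the calculate output above, for example {:count 2, :sum 10, :mean 5.0} under :has-y. Deriving the complement set from the stored predicate result, rather than testing a second predicate, is what keeps the membership test to one call per element.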
To give an example that's probably familiar from another context:

> (defgraphfn sum [xs] (apply + xs))
> (defgraphfn sum-squared [xs] (sum (map #(* % %) xs)))
> (defgraphfn count-input :count [xs] (count xs))
> (defgraphfn mean [count sum] (double (/ sum count)))
> (defgraphfn mean2 [count sum-squared] (double (/ sum-squared count)))
> (defgraphfn variance [mean mean2] (- mean2 (* mean mean)))
> (run-graph {:xs [1 2 3 4]} sum variance sum-squared count-input mean mean2)
{:sum 10 :count 4 :sum-squared 30 :mean 2.5 :variance 1.25 :mean2 7.5 :xs [1 2 3 4]}

Options are provided for parallel, sequential, and lazy computation of the elements of the result map, and for resolving the dependency graph ahead of running the computation on a given input, either at runtime or at compile time.

Please see the README at the GitHub repo for more details. Enjoy!

-- 
Ben Wolfson

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en