ANN: babbage 1.0.0, a library for easily gathering data and computing summary measures in a declarative way

Ben Wolfson Fri, 01 Feb 2013 16:12:37 -0800

ReadyForZero is open-sourcing our library for easily gathering data
and computing summary measures in a declarative way:


https://github.com/ReadyForZero/babbage

The summary measure functionality allows you to compute multiple
measures over arbitrary partitions of your input data simultaneously
and in a single pass. You just say what you want to compute:

> (def my-fields {:y (stats :y count)
                  :x (stats :x count)
                  :both (stats #(+ (or (:x %) 0) (or (:y %) 0))
                               count sum mean)})

and the sets that are of interest:

> (def my-sets (-> (sets {:has-y #(contains? % :y)})
                   ;; could also take intersections, unions
                   (complement :has-y)))

And then run it with some data:

> (calculate my-sets my-fields [{:x 1 :y 2} {:x 10} {:x 4 :y 3} {:x 5}])
{:not-has-y
 {:y {:count 0}, :x {:count 2}, :both {:mean 7.5, :sum 15, :count 2}},
 :has-y
 {:y {:count 2}, :x {:count 2}, :both {:mean 5.0, :sum 10, :count 2}},
 :all
 {:y {:count 2}, :x {:count 4}, :both {:mean 6.25, :sum 25, :count 4}}}

The functions :x, :y, and #(+ (or (:x %) 0) (or (:y %) 0)) defined in
the fields map are called once per input element no matter how many
sets the element contributes to. The function #(contains? % :y) is also
called once per input element, no matter how many unions,
intersections, complements, etc. the set :has-y contributes to.
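
To make the "once per input element" point concrete, here is a rough
sketch in plain Clojure. It is not how babbage is implemented and it
uses none of babbage's API; the name single-pass and the shape of its
arguments are made up for illustration, and it only tracks :count and
:sum. Each extractor and each predicate is called exactly once per
element, and the results are then fanned out to every set the element
belongs to:

```clojure
;; Sketch only: plain Clojure, not babbage's implementation.
;; single-pass and its argument shapes are hypothetical.

(defn single-pass
  "Reduce over data once. extractors maps field name -> fn of element;
  predicates maps set name -> fn of element. Every fn is called exactly
  once per element. Returns {set {field {:count n, :sum s}}}."
  [extractors predicates data]
  (reduce
   (fn [acc elem]
     (let [;; call each extractor once; drop fields that are absent (nil)
           values    (into {} (for [[k f] extractors
                                    :let [v (f elem)]
                                    :when (not (nil? v))]
                                [k v]))
           ;; call each predicate once; every element belongs to :all
           member-of (into [:all] (for [[s p] predicates
                                        :when (p elem)]
                                    s))]
       ;; fan the already-computed values out to each matching set
       (reduce (fn [acc [s [k v]]]
                 (-> acc
                     (update-in [s k :count] (fnil inc 0))
                     (update-in [s k :sum] (fnil + 0) v)))
               acc
               (for [s member-of, kv values] [s kv]))))
   {}
   data))

(def data [{:x 1 :y 2} {:x 10} {:x 4 :y 3} {:x 5}])

(single-pass {:x :x, :y :y}
             {:has-y #(contains? % :y)}
             data)
;; equal to:
;; {:all   {:x {:count 4, :sum 20}, :y {:count 2, :sum 5}}
;;  :has-y {:x {:count 2, :sum 5},  :y {:count 2, :sum 5}}}
```

Unlike the babbage output above, this sketch leaves out fields with no
values rather than reporting a zero count, and it does not handle
complements, unions, or intersections.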

A variety of measure functions, and structured means of combining
them, are supplied; it's also easy to define additional measures.
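
As a loose illustration of what "combining measures" can mean, the
following plain-Clojure sketch treats a measure as an initial value, a
step function, and a finishing function. This representation and the
names count-m, sum-m, max-m, combine, and measure are assumptions made
for this example, not babbage's API; see the README for the real
mechanism.

```clojure
;; Sketch only: a hypothetical representation of measures,
;; not babbage's API.

(def count-m {:init 0 :step (fn [acc _] (inc acc)) :finish identity})
(def sum-m   {:init 0 :step (fn [acc v] (+ acc v)) :finish identity})
(def max-m   {:init   nil
              :step   (fn [acc v] (if (nil? acc) v (max acc v)))
              :finish identity})

(defn combine
  "Build one measure from a map of named measures, so that a single
  reduce updates all of them together."
  [named]
  {:init   (into {} (for [[k m] named] [k (:init m)]))
   :step   (fn [acc v]
             (into {} (for [[k m] named] [k ((:step m) (acc k) v)])))
   :finish (fn [acc]
             (into {} (for [[k m] named] [k ((:finish m) (acc k))])))})

;; A derived measure: reuse count and sum, change only the finishing step.
(def mean-m
  (assoc (combine {:count count-m :sum sum-m})
         :finish (fn [{n :count s :sum}]
                   (when (pos? n) (double (/ s n))))))

(defn measure
  "Run a measure over a collection in one pass."
  [{:keys [init step finish]} coll]
  (finish (reduce step init coll)))

(measure (combine {:count count-m :sum sum-m :max max-m}) [3 10 7 5])
;; equal to {:count 4, :sum 25, :max 10}

(measure mean-m [3 10 7 5])
;; equal to 6.25
```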

babbage also supplies a method for running computations structured as
dependency graphs; this can make gathering the initial data for
summarizing simpler to express. To give an example that's probably
familiar from another context:

> (defgraphfn sum [xs]
    (apply + xs))
> (defgraphfn sum-squared [xs]
    (sum (map #(* % %) xs)))
> (defgraphfn count-input :count [xs]
    (count xs))
> (defgraphfn mean [count sum]
    (double (/ sum count)))
> (defgraphfn mean2 [count sum-squared]
    (double (/ sum-squared count)))
> (defgraphfn variance [mean mean2]
    (- mean2 (* mean mean)))
> (run-graph {:xs [1 2 3 4]} sum variance sum-squared count-input mean mean2)
{:sum 10
 :count 4
 :sum-squared 30
 :mean 2.5
 :variance 1.25
 :mean2 7.5
 :xs [1 2 3 4]}

Options are provided for parallel, sequential, and lazy computation of
the elements of the result map, and for resolving the dependency graph
in advance of running the computation for a given input, either at
runtime or at compile time.
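
For readers who want a feel for what "resolving the dependency graph
in advance" involves, here is a minimal plain-Clojure sketch. It is
not babbage's implementation: the node map format and the functions
plan and run-plan are hypothetical, and it only covers sequential
evaluation. The ordering is computed once from the declared
dependencies and can then be reused for any number of inputs:

```clojure
;; Sketch only: plain Clojure, not babbage's implementation.
;; The node format and the names plan/run-plan are hypothetical.

(def nodes
  {:sum         {:deps [:xs]                 :f (fn [xs] (apply + xs))}
   :sum-squared {:deps [:xs]                 :f (fn [xs] (apply + (map #(* % %) xs)))}
   :count       {:deps [:xs]                 :f (fn [xs] (count xs))}
   :mean        {:deps [:count :sum]         :f (fn [n s] (double (/ s n)))}
   :mean2       {:deps [:count :sum-squared] :f (fn [n s2] (double (/ s2 n)))}
   :variance    {:deps [:mean :mean2]        :f (fn [m m2] (- m2 (* m m)))}})

(defn plan
  "Order the node keys so that each node comes after its dependencies.
  inputs is the set of keys the caller will supply. Throws if some
  dependency can never be satisfied (a missing input or a cycle)."
  [nodes inputs]
  (loop [done (set inputs), order [], pending (set (keys nodes))]
    (if (empty? pending)
      order
      (let [ready (filter #(every? done (get-in nodes [% :deps])) pending)]
        (when (empty? ready)
          (throw (ex-info "unsatisfiable dependencies" {:pending pending})))
        (recur (into done ready)
               (into order (sort ready)) ; sort for a deterministic order
               (reduce disj pending ready))))))

(defn run-plan
  "Evaluate nodes in the given order, threading results through a map."
  [nodes order input]
  (reduce (fn [result k]
            (let [{:keys [deps f]} (nodes k)]
              (assoc result k (apply f (map result deps)))))
          input
          order))

(def order (plan nodes #{:xs}))
;; order is [:count :sum :sum-squared :mean :mean2 :variance]

(run-plan nodes order {:xs [1 2 3 4]})
;; equal to {:xs [1 2 3 4], :count 4, :sum 10, :sum-squared 30,
;;           :mean 2.5, :mean2 7.5, :variance 1.25}
```

Nodes that become ready in the same round of plan do not depend on one
another, which is the property a parallel strategy could exploit.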

Please see the README in the GitHub repo for more details.

Enjoy!

-- 
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks,
which may be sweet, aromatic, fermented or spirit-based. ... Family
and social life also offer numerous other occasions to consume drinks
for pleasure." [Larousse, "Drink" entry]
