Hi, I am pleased to announce the initial Clojure Contrib release of data.avl (previously known as avl.clj), a library implementing drop-in replacements for Clojure(Script)'s sorted maps and sets with faster lookups, support for the transient API (leading to unambiguously improved performance for many usage patterns) and log-time rank queries (via clojure.core/nth and clojure.data.avl/rank-of).
There are two relevant trade-offs: higher memory usage (a pointer and two ints per key) and somewhat slower single "updates". See below for an extended discussion and some benchmark results. For users of avl.clj 0.0.9: this release fixes a bug in rank-of and a single reflective call I missed previously. Upgrading requires no changes other than to the namespace name, which is now clojure.data.avl. The Maven artefact is now available from Maven Central; dependency information is as follows (Leiningen / Maven / Gradle): [org.clojure/data.avl "0.0.10"] <dependency> <groupId>org.clojure</groupId> <artifactId>data.avl</artifactId> <version>0.0.10</version> </dependency> compile "org.clojure:data.avl:0.0.10" The repository for this project is located here: https://github.com/clojure/data.avl data.avl uses persistent AVL trees as the underlying data structure. AVL trees tend to be significantly shallower than red-black trees; since the built-in sorted collections use red-black trees, data.avl maps and sets offer noticeably improved expected lookup times. The trade-off here is that "updates" (assoc / dissoc / conj / disj) must perform more work in rebalancing the trees, so they are typically slower than with the built-ins. However, data.avl maps and sets support the transient API, which can be used to speed up long chains of "updates". Thanks to this, data.avl types tend to outperform the built-ins for batch "updates". One particularly important case of this type is that of the initial construction of large instances; this tends to be noticeably faster with data.avl types (on a par with the built-ins worst-case in my benchmarking so far). (Of course the time to complete all operations varies with tree structure. When comparing lookup times for nodes occupying the same positions in both trees, the built-in red-black trees have a very slight edge; see for example (get avl 0) and (get rb 0) in the benchmark results below. "Updates" involving identically placed nodes may be faster in either tree type depending on the overall shape of the tree; see for example (dissoc avl 131071) and (dissoc rb 131071) below, where the AVL map completes the operation twice as fast. On average, however, performance is as described above.) Besides supporting transients, data.avl types offer two additions to the sorted collection API: (require '[clojure.data.avl :as avl]) ;; nth can be used to look up the nth smallest item in a sorted ;; collection, as determined by the comparator being used: (nth (avl/sorted-set 0 2 4) 2) ;= 4 (nth (avl/sorted-set-by > 0 2 4) 2) ;= 0 ;; for maps, nth returns map entry objects: (nth (avl/sorted-map 0 0 2 2 4 4) 2) ;= [4 4] ;; clojure.data.avl/rank-of can be used to query for the index of a ;; given key in a data.avl collection: (avl/rank-of (avl/sorted-set 3 4 5) 4) ;= 1 ;; if the key is not present in the collection, -1 is returned: (avl/rank-of (avl/sorted-set 3 4 5) 0) ;= -1 As mentioned above, the new functionality comes at a cost in memory consumption. To support transients, each node in a data.avl tree must carry an extra reference field -- that's one pointer per key. To support efficient rank queries, each pointer needs to carry an extra int field. Finally, another int field is used by the rebalancing algorithm itself. The library carries its own test suite used in both Clojure and ClojureScript. Additionally, the Clojure version is stress-tested with Zach Tellman's collection-check library. I attach some Criterium benchmark results below. Cheers, Michał Benchmark environment: OpenJDK 1.7, Clojure 1.5.1, Criterium 0.4.2. Relevant defs: (def ks (range 300000)) (def ksks (interleave ks ks)) (def rb (apply sorted-map ksks)) (def avl (apply avl/sorted-map ksks)) (defn rb-rank-of [rb-map k] (if (contains? rb k) (count (take-while #(not= k (key %)) (seq rb))) -1)) Benchmark results: (lookup-benchmarks) =================== (c/bench (get avl 0)) WARNING: Final GC required 17.30960512486499 % of runtime WARNING: Final GC required 1.487642242862071 % of runtime Evaluation count : 184965780 in 60 samples of 3082763 calls. Execution time mean : 320.595187 ns Execution time std-deviation : 2.710112 ns Execution time lower quantile : 316.756609 ns ( 2.5%) Execution time upper quantile : 325.833642 ns (97.5%) Overhead used : 2.128084 ns (c/bench (get rb 0)) WARNING: Final GC required 1.481744139127142 % of runtime Evaluation count : 192022740 in 60 samples of 3200379 calls. Execution time mean : 308.423733 ns Execution time std-deviation : 3.135581 ns Execution time lower quantile : 304.385589 ns ( 2.5%) Execution time upper quantile : 315.500500 ns (97.5%) Overhead used : 2.128084 ns (c/bench (get avl 131071)) WARNING: Final GC required 1.594644811795316 % of runtime Evaluation count : 1833957600 in 60 samples of 30565960 calls. Execution time mean : 30.406098 ns Execution time std-deviation : 0.317429 ns Execution time lower quantile : 29.896215 ns ( 2.5%) Execution time upper quantile : 31.120450 ns (97.5%) Overhead used : 2.128084 ns Found 3 outliers in 60 samples (5.0000 %) low-severe 3 (5.0000 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (get rb 131071)) WARNING: Final GC required 1.623586044372867 % of runtime Evaluation count : 1976983260 in 60 samples of 32949721 calls. Execution time mean : 27.838439 ns Execution time std-deviation : 0.287299 ns Execution time lower quantile : 27.346813 ns ( 2.5%) Execution time upper quantile : 28.375493 ns (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (get avl 299999)) WARNING: Final GC required 1.433452208207925 % of runtime Evaluation count : 156641220 in 60 samples of 2610687 calls. Execution time mean : 386.680734 ns Execution time std-deviation : 3.648226 ns Execution time lower quantile : 381.171378 ns ( 2.5%) Execution time upper quantile : 394.259276 ns (97.5%) Overhead used : 2.128084 ns (c/bench (get rb 299999)) WARNING: Final GC required 1.458498405132945 % of runtime Evaluation count : 87642960 in 60 samples of 1460716 calls. Execution time mean : 683.691293 ns Execution time std-deviation : 6.293332 ns Execution time lower quantile : 676.164098 ns ( 2.5%) Execution time upper quantile : 697.942500 ns (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (construction-benchmarks) ========================= (c/bench (apply avl/sorted-map ksks)) WARNING: Final GC required 6.475014797576352 % of runtime Evaluation count : 300 in 60 samples of 5 calls. Execution time mean : 196.151940 ms Execution time std-deviation : 2.434096 ms Execution time lower quantile : 193.202175 ms ( 2.5%) Execution time upper quantile : 202.256218 ms (97.5%) Overhead used : 2.128084 ns Found 3 outliers in 60 samples (5.0000 %) low-severe 3 (5.0000 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (apply sorted-map ksks)) WARNING: Final GC required 4.8414421771766305 % of runtime Evaluation count : 240 in 60 samples of 4 calls. Execution time mean : 314.285107 ms Execution time std-deviation : 2.687500 ms Execution time lower quantile : 310.117205 ms ( 2.5%) Execution time upper quantile : 319.825282 ms (97.5%) Overhead used : 2.128084 ns (assoc-benchmarks) ================== (c/bench (assoc avl -1 -1)) WARNING: Final GC required 1.443316131643841 % of runtime Evaluation count : 85057080 in 60 samples of 1417618 calls. Execution time mean : 704.100540 ns Execution time std-deviation : 5.756555 ns Execution time lower quantile : 696.151863 ns ( 2.5%) Execution time upper quantile : 716.721543 ns (97.5%) Overhead used : 2.128084 ns Found 2 outliers in 60 samples (3.3333 %) low-severe 2 (3.3333 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (assoc rb -1 -1)) WARNING: Final GC required 1.473615175697951 % of runtime Evaluation count : 94522860 in 60 samples of 1575381 calls. Execution time mean : 638.321250 ns Execution time std-deviation : 5.147774 ns Execution time lower quantile : 631.959478 ns ( 2.5%) Execution time upper quantile : 648.923646 ns (97.5%) Overhead used : 2.128084 ns Found 2 outliers in 60 samples (3.3333 %) low-severe 2 (3.3333 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (assoc avl 131071.5 131071.5)) WARNING: Final GC required 1.475403685555393 % of runtime Evaluation count : 87600660 in 60 samples of 1460011 calls. Execution time mean : 694.815846 ns Execution time std-deviation : 6.267862 ns Execution time lower quantile : 685.491226 ns ( 2.5%) Execution time upper quantile : 707.824187 ns (97.5%) Overhead used : 2.128084 ns (c/bench (assoc rb 131071.5 131071.5)) WARNING: Final GC required 1.500960406817446 % of runtime Evaluation count : 96543600 in 60 samples of 1609060 calls. Execution time mean : 617.554662 ns Execution time std-deviation : 6.103529 ns Execution time lower quantile : 609.156583 ns ( 2.5%) Execution time upper quantile : 631.065400 ns (97.5%) Overhead used : 2.128084 ns Found 2 outliers in 60 samples (3.3333 %) low-severe 2 (3.3333 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (assoc avl 300000 300000)) WARNING: Final GC required 1.515899576717984 % of runtime Evaluation count : 51857640 in 60 samples of 864294 calls. Execution time mean : 1.142891 µs Execution time std-deviation : 10.147119 ns Execution time lower quantile : 1.129993 µs ( 2.5%) Execution time upper quantile : 1.165111 µs (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (assoc rb 300000 300000)) WARNING: Final GC required 1.508336024198179 % of runtime Evaluation count : 65808300 in 60 samples of 1096805 calls. Execution time mean : 909.241190 ns Execution time std-deviation : 8.308224 ns Execution time lower quantile : 899.482479 ns ( 2.5%) Execution time upper quantile : 927.906490 ns (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (dissoc-benchmarks) =================== (c/bench (dissoc avl 0)) WARNING: Final GC required 1.423418517945197 % of runtime Evaluation count : 93948600 in 60 samples of 1565810 calls. Execution time mean : 646.364053 ns Execution time std-deviation : 4.752838 ns Execution time lower quantile : 638.311265 ns ( 2.5%) Execution time upper quantile : 655.962057 ns (97.5%) Overhead used : 2.128084 ns Found 2 outliers in 60 samples (3.3333 %) low-severe 2 (3.3333 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (dissoc rb 0)) WARNING: Final GC required 1.584824677744148 % of runtime Evaluation count : 94848360 in 60 samples of 1580806 calls. Execution time mean : 631.947940 ns Execution time std-deviation : 3.817958 ns Execution time lower quantile : 625.144366 ns ( 2.5%) Execution time upper quantile : 638.920567 ns (97.5%) Overhead used : 2.128084 ns (c/bench (dissoc avl 131071)) WARNING: Final GC required 1.605081849882345 % of runtime Evaluation count : 220124040 in 60 samples of 3668734 calls. Execution time mean : 266.365232 ns Execution time std-deviation : 2.013280 ns Execution time lower quantile : 263.177469 ns ( 2.5%) Execution time upper quantile : 270.351565 ns (97.5%) Overhead used : 2.128084 ns (c/bench (dissoc rb 131071)) WARNING: Final GC required 1.571888258342256 % of runtime Evaluation count : 112493340 in 60 samples of 1874889 calls. Execution time mean : 533.037294 ns Execution time std-deviation : 3.334672 ns Execution time lower quantile : 527.605873 ns ( 2.5%) Execution time upper quantile : 539.581328 ns (97.5%) Overhead used : 2.128084 ns (c/bench (dissoc avl 299999)) WARNING: Final GC required 1.483980334798602 % of runtime Evaluation count : 53342520 in 60 samples of 889042 calls. Execution time mean : 1.132967 µs Execution time std-deviation : 7.353171 ns Execution time lower quantile : 1.120979 µs ( 2.5%) Execution time upper quantile : 1.148367 µs (97.5%) Overhead used : 2.128084 ns (c/bench (dissoc rb 299999)) WARNING: Final GC required 1.508118289184663 % of runtime Evaluation count : 69107160 in 60 samples of 1151786 calls. Execution time mean : 881.091241 ns Execution time std-deviation : 6.621161 ns Execution time lower quantile : 868.480766 ns ( 2.5%) Execution time upper quantile : 892.546645 ns (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (reduce-kv-benchmarks) ====================== (c/bench (reduce-kv + 0 avl)) WARNING: Final GC required 1.561011180350695 % of runtime Evaluation count : 1620 in 60 samples of 27 calls. Execution time mean : 37.223528 ms Execution time std-deviation : 284.200552 µs Execution time lower quantile : 36.733677 ms ( 2.5%) Execution time upper quantile : 37.700421 ms (97.5%) Overhead used : 2.128084 ns (c/bench (reduce-kv + 0 rb)) WARNING: Final GC required 1.498782422077256 % of runtime Evaluation count : 1500 in 60 samples of 25 calls. Execution time mean : 41.694365 ms Execution time std-deviation : 332.607807 µs Execution time lower quantile : 41.083362 ms ( 2.5%) Execution time upper quantile : 42.348164 ms (97.5%) Overhead used : 2.128084 ns (select-benchmarks) =================== (c/bench (nth avl 0)) WARNING: Final GC required 1.479982375647741 % of runtime Evaluation count : 207875520 in 60 samples of 3464592 calls. Execution time mean : 289.801393 ns Execution time std-deviation : 2.361106 ns Execution time lower quantile : 286.809191 ns ( 2.5%) Execution time upper quantile : 293.622950 ns (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (nth (seq rb) 0)) WARNING: Final GC required 1.46387748103801 % of runtime Evaluation count : 280530660 in 60 samples of 4675511 calls. Execution time mean : 214.456442 ns Execution time std-deviation : 1.838107 ns Execution time lower quantile : 211.716850 ns ( 2.5%) Execution time upper quantile : 217.151388 ns (97.5%) Overhead used : 2.128084 ns Found 3 outliers in 60 samples (5.0000 %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (nth avl 131071)) WARNING: Final GC required 1.630328987905956 % of runtime Evaluation count : 2479221180 in 60 samples of 41320353 calls. Execution time mean : 22.273116 ns Execution time std-deviation : 0.193119 ns Execution time lower quantile : 21.913883 ns ( 2.5%) Execution time upper quantile : 22.674207 ns (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (nth (seq rb) 131071)) WARNING: Final GC required 1.501022611379085 % of runtime Evaluation count : 15480 in 60 samples of 258 calls. Execution time mean : 3.868847 ms Execution time std-deviation : 30.435022 µs Execution time lower quantile : 3.815580 ms ( 2.5%) Execution time upper quantile : 3.919490 ms (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (nth avl 299999)) WARNING: Final GC required 1.590021994376285 % of runtime Evaluation count : 68943060 in 60 samples of 1149051 calls. Execution time mean : 846.178065 ns Execution time std-deviation : 5.350532 ns Execution time lower quantile : 835.947697 ns ( 2.5%) Execution time upper quantile : 855.535318 ns (97.5%) Overhead used : 2.128084 ns (c/bench (nth (seq rb) 299999)) WARNING: Final GC required 1.4559626406394761 % of runtime Evaluation count : 3540 in 60 samples of 59 calls. Execution time mean : 17.253646 ms Execution time std-deviation : 137.868346 µs Execution time lower quantile : 17.043361 ms ( 2.5%) Execution time upper quantile : 17.566532 ms (97.5%) Overhead used : 2.128084 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 4 (6.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (rank-of-benchmarks) ==================== (c/bench (avl/rank-of avl 0)) WARNING: Final GC required 1.435081391872642 % of runtime Evaluation count : 206515440 in 60 samples of 3441924 calls. Execution time mean : 288.829471 ns Execution time std-deviation : 2.436205 ns Execution time lower quantile : 284.627916 ns ( 2.5%) Execution time upper quantile : 293.684025 ns (97.5%) Overhead used : 2.128084 ns (c/bench (rb-rank-of rb 0)) WARNING: Final GC required 1.5237700419512759 % of runtime Evaluation count : 116885160 in 60 samples of 1948086 calls. Execution time mean : 515.495174 ns Execution time std-deviation : 4.405742 ns Execution time lower quantile : 509.372306 ns ( 2.5%) Execution time upper quantile : 525.137634 ns (97.5%) Overhead used : 2.128084 ns (c/bench (avl/rank-of avl 131071)) WARNING: Final GC required 1.89767944295473 % of runtime Evaluation count : 2006788020 in 60 samples of 33446467 calls. Execution time mean : 27.768944 ns Execution time std-deviation : 0.316862 ns Execution time lower quantile : 27.307884 ns ( 2.5%) Execution time upper quantile : 28.547999 ns (97.5%) Overhead used : 2.128084 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 4 (6.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (rb-rank-of rb 131071)) WARNING: Final GC required 1.557940388859877 % of runtime Evaluation count : 2700 in 60 samples of 45 calls. Execution time mean : 22.387215 ms Execution time std-deviation : 172.796714 µs Execution time lower quantile : 22.054730 ms ( 2.5%) Execution time upper quantile : 22.716453 ms (97.5%) Overhead used : 2.128084 ns (c/bench (avl/rank-of avl 299999)) WARNING: Final GC required 1.58577957292816 % of runtime Evaluation count : 71499360 in 60 samples of 1191656 calls. Execution time mean : 846.167011 ns Execution time std-deviation : 6.919658 ns Execution time lower quantile : 834.322212 ns ( 2.5%) Execution time upper quantile : 860.323062 ns (97.5%) Overhead used : 2.128084 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers (c/bench (rb-rank-of rb 299999)) WARNING: Final GC required 1.518370722058118 % of runtime Evaluation count : 1200 in 60 samples of 20 calls. Execution time mean : 51.116182 ms Execution time std-deviation : 384.558939 µs Execution time lower quantile : 50.562968 ms ( 2.5%) Execution time upper quantile : 51.927406 ms (97.5%) Overhead used : 2.128084 ns -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.