Hi, I am pleased to announce the 0.0.2 release of psq.clj, a Clojure priority search queue library based on Ralf Hinze's priority search pennants (see Ralf Hinze, "A Simple Implementation Technique for Priority Search Queues"):
https://github.com/michalmarczyk/psq.clj https://clojars.org/psq.clj Priority search queues simultaneously behave like sorted maps with respect to their keys and like priority queues with respect to their entries, with the priority queue ordering based on the values, or priorities, attached to the keys. Additionally they support several operations that blend the sorted map and priority queue aspects together, such as "peek in key range" (retrieve a minimum-priority entry – NB. ties on priorities are supported – within the given key range) and optimized subseq-like traversals of key ranges that only return those entries whose priorities fall below a certain threshold (this is much more efficient than subseq + filter on a regular sorted map). psq.clj priority search queues support the full data.avl abstract data type (with the exception of the transient API which I am not sure would bring a significant return on the additional memory cost in this context). The full public API is as follows: ;;; there is a single public namespace: (require '[psq.clj :as psq]) ;;; 1. factory functions ;; the psq.clj counterpart to clojure.core/sorted-map: (psq/psqueue key priority …) ;= {key priority …} ;; a version of the above that takes keys+priorities in a seqable: (psq/psqueue* [key priority …]) ;; a factory that accepts a map or seqable of map entries: (psq/psq {key priority …}) (psq/psq [[key priority] …]) ;; versions of the above that take *two* custom comparators: the ;; first one determines the PSQ's key order, the second one is used ;; for priorities: (psq/psqueue-by > > key priority …) ; reverse numeric order on keys ; and priorities psq/psqueue-by*, psq/psq-by ; like psq/psqueue*, psq/psq, but with ; custom comparators ;;; 2. regular sorted map API; NB. (r)(sub)seq use key order (seq (psq/psqueue 0 10 1 9)) ;= ([0 10] [1 9]) ;; also supported: assoc, dissoc, conj, rseq, subseq, rsubseq ;;; 3. nearest neighbour lookups (psq/nearest (psq/psq {0 1 4 5 9 10}) > 3) ;= [4 5] ;;; 4. nth, rank in key order (nth (psq/psqueue 0 3 6 -3) 0) ;= [0 3] (psq/rank (psq/psqueue 0 3 6 -3) 6) ; returns long, -1 for not found ;= 1 ;;; 5. priority queue API based on values/priorities (peek (psq/psqueue 0 3 1 -3)) ;= [1 -3] (pop (psq/psqueue 0 3 1 -3)) ;= {0 3} ;; seq over the PSQ in order of non-decreasing priorities (psq/priority-seq (psq/psqueue 0 3 1 -3)) ;= ([1 -3] [0 3]) ;;; 6. splits, subranges – with structural sharing in the common ;;; parts, but no holding on to keys outside the stated range for ;;; GC purposes ;; return a vector of ;; ;; 1. a fully independent PSQ comprising the entries of the ;; input PSQ to the left of the given key, ;; ;; 2. the entry at the given key, or nil if not present, ;; ;; 3. a fully independent PSQ comprising the entries of the ;; input PSQ to the right of the given key: (psq/split (psq/psqueue* (range 10)) 4) ;= [{0 1 2 3} [4 5] {6 7 8 9}] ;; like subseq, but returns an independent PSQ (psq/subrange (psq/psqueue* (range 10)) >= 4 < 8) ;= {4 5 6 7} ;;; 7. priority-bounded traversals: (psq/seq<= a-psq priority-upper-bound) ;; also supported: rseq<=, subseq<=, rsubseq<= (the latter two ;; with subseq/rsubseq-like arguments to specify key bounds) and ;; < variants of all of the above (seq< etc.) All of the above operations can be performed in O(log n) time, with the exception of peek, which takes constant time, and seq-like operations, which are sensitive to the number of entries actually produced. Priority-bounded traversals take O(r(log n - log r) + r) time, where r is the number of entries actually returned. In practice, psq.clj priority search queues are slower than Clojure's built-in sorted maps for pure sorted-map operations, typically by a factor of 2 to 4. In return, the PSQ operations are very performant; subseq<= can be over three orders of magnitude faster than the equivalent combination of subseq and filter applied to a regular sorted map, subrange takes tens of microseconds when the size of the input is on the order of hundreds of thousands, peek takes less than 10 ns etc. See end of this message for some benchmark results. Mark Engelberg's excellent data.priority-map is another natural point of comparison. In short, whereas data.priority-map is hash-map-like, with extremely fast lookups and no particular ordering of keys, psq.clj is sorted-map-like, with a much richer abstract data type at the cost of noticeably slower lookups. assoc is often faster with psq.clj, although there are edge cases where data.priority-map is faster; dissoc is generally faster with data.priority-map. In concrete usage scenarios, comparative benchmarks are advisable, as either data structure may come out ahead depending on the specific mix of operations. Of course psq.clj will be most beneficial in those use cases which benefit from the expanded abstract data type, and any lookup-heavy workloads where the common operations suffice will tend to be better served by data.priority-map. I had the pleasure of announcing the initial 0.0.1 release of psq.clj at EuroClojure 2016 in Bratislava: https://www.youtube.com/watch?v=pZ9gBKO_qCA To those of you who were there – many thanks for helping make this year's EuroClojure such a great conference! Also, I have since addressed a few points from my TODO list from the final slide (notably psq.clj/nearest is there now, and so psq.clj really supports the full data.avl abstract data type). Finally I should say that at this early stage both the public API and the implementation strategy are subject to revision. The features that are there, however, I do expect to work – a thorough generative test suite based on collection-check (https://github.com/ztellman/collection-check) and test.check (https://github.com/clojure/test.check) is in place to ensure this. To include this version in your project: [psq.clj "0.0.2"] <dependency> <groupId>psq.clj</groupId> <artifactId>psq.clj</artifactId> <version>0.0.2</version> </dependency> compile "psq.clj:psq.clj:0.0.2" Cheers, Michał Benchmark results ================= PSQ – psq.clj PM – data.priority-map SM – clojure.core/sorted-map Results obtained using Hugo Duncan's Criterium: https://github.com/hugoduncan/criterium NB. some extreme results are due to benchmarks hitting the best/worst cases for each data structure. Where Criterium reported the results in different units for different data structures, these have been changed to make results comparable at first glance. N/A is indicated where an operation cannot be reasonably implemented/emulated for a given data structure. Basic map operations -------------------- Except where otherwise noted, these benchmarks use 5000 key/value pairs inserted in order of increasing keys and values: (psq/psqueue* (range 10000)) (apply pm/priority-map (range 10000)) (apply sorted-map (range 10000)) 1. (get input-map 0) PSQ: 1.367226 ns PM: 62.910936 ns SM: 159.694336 ns 2. (get input-map 5000) PSQ: 268.306762 ns PM: 73.988311 ns SM: 179.033148 ns 3. (get input-map 10000) PSQ: 552.692926 ns PM: 56.753372 ns SM: 248.005636 ns 4. (assoc input-map 0 -1) PSQ: 1.288109 µs PM: 2.344063 µs SM: 0.607151958 µs 5. (assoc input-map 5000 -1) PSQ: 1.822005 µs PM: 2.573408 µs SM: 0.646016659 µs 6. (assoc input-map 10000 -1) PSQ: 3.000240 µs PM: 1.440096 µs SM: 0.725093232 µs 7. (assoc input-map 0 10000) PSQ: 1.309970 µs PM: 2.746458 µs SM: 0.621439055 µs 8. (assoc input-map 5000 10000) PSQ: 1.711392 µs PM: 2.834123 µs SM: 0.663170561 µs 9. (assoc input-map 10000 10000) PSQ: 2.969847 µs PM: 1.746856 µs SM: 0.715915433 µs 10. (dissoc input-map 0) PSQ: 1.398323 µs PM: 1.211380 µs SM: 0.653975123 µs 11. (dissoc input-map 5000) PSQ: 1.668061 µs PM: 1.367344 µs SM: 0.740665097 µs 12. (dissoc input-map 10000) PSQ: 1.509873 µs PM: 0.055210586 µs SM: 0.335146431 µs 13. (peek input-map) PSQ: 4.254087 ns PM: 487.922381 ns SM: N/A 14. (pop input-map) PSQ: 0.762549879 µs PM: 1.758735 µs SM: N/A 15. (last (psq/priority-seq psq)) (last (seq pm)) NB. data.priority-map implements seq in order of non-decreasing priorities; psq.clj's seq respects key order and exposes priority-ordered traversals using a separate function, psq.clj/priority-seq. PSQ: 3.603032 ms PM: 3.359997 ms SM: N/A 16. (last (psq/subseq<= psq 2001 >= 2000 < 70000)) (last (psq/seq<= (psq/subrange psq >= 2000 < 70000) 2001)) (last (f sm 2001 >= 2000 < 700000)) (last (g sm 2001 >= 2000 < 700000)) Here f and g are defined as (fn f [m ubound start-test start end-test end] (filter #(not (pos? (.compare ^java.util.Comparator c (val %) ubound))) (subseq m start-test start end-test end))) (fn g [m ubound start-test start end-test end] (filter #(<= (val %) ubound) (subseq m start-test start end-test end))) PSQ: 26.714850 µs (subseq<=) 25.213709 µs (seq<= + subrange) PM: 147542.945 µs (f) 146825.592 µs (g) SM: N/A 17. Using (psq/psqueue* (range 1000000)) as input: (psq/split psq 500000) PSQ: 3.587069 µs PM: N/A SM: N/A 18. Using (psq/psqueue* (range 1000000)) as input: (psq/subrange psq > 250000 <= 750000) PSQ: 22.776483 µs PM: N/A SM: N/A -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.