Great work. I love the Clojure state for data-structures. People often underestimate how much access to quality data-structures is important for a language. By the way, www.data.avl is not a valid link.
On Sunday, 20 November 2016 12:32:31 UTC-8, Michał Marczyk wrote: > > Hi, > > I am pleased to announce the 0.0.2 release of psq.clj, a Clojure > priority search queue library based on Ralf Hinze's priority search > pennants (see Ralf Hinze, "A Simple Implementation Technique for > Priority Search Queues"): > > https://github.com/michalmarczyk/psq.clj > > https://clojars.org/psq.clj > > Priority search queues simultaneously behave like sorted maps with > respect to their keys and like priority queues with respect to their > entries, with the priority queue ordering based on the values, or > priorities, attached to the keys. Additionally they support several > operations that blend the sorted map and priority queue aspects > together, such as "peek in key range" (retrieve a minimum-priority > entry – NB. ties on priorities are supported – within the given key > range) and optimized subseq-like traversals of key ranges that only > return those entries whose priorities fall below a certain threshold > (this is much more efficient than subseq + filter on a regular sorted > map). > > psq.clj priority search queues support the full data.avl abstract data > type (with the exception of the transient API which I am not sure > would bring a significant return on the additional memory cost in this > context). The full public API is as follows: > > ;;; there is a single public namespace: > (require '[psq.clj :as psq]) > > > ;;; 1. factory functions > > ;; the psq.clj counterpart to clojure.core/sorted-map: > (psq/psqueue key priority …) > ;= {key priority …} > > ;; a version of the above that takes keys+priorities in a seqable: > (psq/psqueue* [key priority …]) > > ;; a factory that accepts a map or seqable of map entries: > (psq/psq {key priority …}) > (psq/psq [[key priority] …]) > > ;; versions of the above that take *two* custom comparators: the > ;; first one determines the PSQ's key order, the second one is used > ;; for priorities: > (psq/psqueue-by > > key priority …) ; reverse numeric order on keys > ; and priorities > > psq/psqueue-by*, psq/psq-by ; like psq/psqueue*, psq/psq, but with > ; custom comparators > > > ;;; 2. regular sorted map API; NB. (r)(sub)seq use key order > > (seq (psq/psqueue 0 10 1 9)) > ;= ([0 10] [1 9]) > > ;; also supported: assoc, dissoc, conj, rseq, subseq, rsubseq > > > ;;; 3. nearest neighbour lookups > (psq/nearest (psq/psq {0 1 4 5 9 10}) > 3) > ;= [4 5] > > > ;;; 4. nth, rank in key order > > (nth (psq/psqueue 0 3 6 -3) 0) > ;= [0 3] > > (psq/rank (psq/psqueue 0 3 6 -3) 6) ; returns long, -1 for not found > ;= 1 > > > ;;; 5. priority queue API based on values/priorities > > (peek (psq/psqueue 0 3 1 -3)) > ;= [1 -3] > > (pop (psq/psqueue 0 3 1 -3)) > ;= {0 3} > > ;; seq over the PSQ in order of non-decreasing priorities > (psq/priority-seq (psq/psqueue 0 3 1 -3)) > ;= ([1 -3] [0 3]) > > > ;;; 6. splits, subranges – with structural sharing in the common > ;;; parts, but no holding on to keys outside the stated range for > ;;; GC purposes > > ;; return a vector of > ;; > ;; 1. a fully independent PSQ comprising the entries of the > ;; input PSQ to the left of the given key, > ;; > ;; 2. the entry at the given key, or nil if not present, > ;; > ;; 3. a fully independent PSQ comprising the entries of the > ;; input PSQ to the right of the given key: > (psq/split (psq/psqueue* (range 10)) 4) > ;= [{0 1 2 3} [4 5] {6 7 8 9}] > > ;; like subseq, but returns an independent PSQ > (psq/subrange (psq/psqueue* (range 10)) >= 4 < 8) > ;= {4 5 6 7} > > > ;;; 7. priority-bounded traversals: > (psq/seq<= a-psq priority-upper-bound) > > ;; also supported: rseq<=, subseq<=, rsubseq<= (the latter two > ;; with subseq/rsubseq-like arguments to specify key bounds) and > ;; < variants of all of the above (seq< etc.) > > All of the above operations can be performed in O(log n) time, with > the exception of peek, which takes constant time, and seq-like > operations, which are sensitive to the number of entries actually > produced. Priority-bounded traversals take O(r(log n - log r) + r) > time, where r is the number of entries actually returned. > > In practice, psq.clj priority search queues are slower than Clojure's > built-in sorted maps for pure sorted-map operations, typically by a > factor of 2 to 4. In return, the PSQ operations are very performant; > subseq<= can be over three orders of magnitude faster than the > equivalent combination of subseq and filter applied to a regular > sorted map, subrange takes tens of microseconds when the size of the > input is on the order of hundreds of thousands, peek takes less than > 10 ns etc. See end of this message for some benchmark results. > > Mark Engelberg's excellent data.priority-map is another natural point > of comparison. In short, whereas data.priority-map is hash-map-like, > with extremely fast lookups and no particular ordering of keys, > psq.clj is sorted-map-like, with a much richer abstract data type at > the cost of noticeably slower lookups. assoc is often faster with > psq.clj, although there are edge cases where data.priority-map is > faster; dissoc is generally faster with data.priority-map. In concrete > usage scenarios, comparative benchmarks are advisable, as either data > structure may come out ahead depending on the specific mix of > operations. Of course psq.clj will be most beneficial in those use > cases which benefit from the expanded abstract data type, and any > lookup-heavy workloads where the common operations suffice will tend > to be better served by data.priority-map. > > I had the pleasure of announcing the initial 0.0.1 release of psq.clj > at EuroClojure 2016 in Bratislava: > > https://www.youtube.com/watch?v=pZ9gBKO_qCA > > To those of you who were there – many thanks for helping make this > year's EuroClojure such a great conference! Also, I have since > addressed a few points from my TODO list from the final slide (notably > psq.clj/nearest is there now, and so psq.clj really supports the full > data.avl abstract data type). > > Finally I should say that at this early stage both the public API and > the implementation strategy are subject to revision. The features that > are there, however, I do expect to work – a thorough generative test > suite based on collection-check > (https://github.com/ztellman/collection-check) and test.check > (https://github.com/clojure/test.check) is in place to ensure this. > > To include this version in your project: > > [psq.clj "0.0.2"] > > <dependency> > <groupId>psq.clj</groupId> > <artifactId>psq.clj</artifactId> > <version>0.0.2</version> > </dependency> > > compile "psq.clj:psq.clj:0.0.2" > > Cheers, > Michał > > > Benchmark results > ================= > > PSQ – psq.clj > PM – data.priority-map > SM – clojure.core/sorted-map > > Results obtained using Hugo Duncan's Criterium: > > https://github.com/hugoduncan/criterium > > NB. some extreme results are due to benchmarks hitting the best/worst > cases for each data structure. Where Criterium reported the results in > different units for different data structures, these have been changed > to make results comparable at first glance. > > N/A is indicated where an operation cannot be reasonably > implemented/emulated for a given data structure. > > > Basic map operations > -------------------- > > Except where otherwise noted, these benchmarks use 5000 key/value > pairs inserted in order of increasing keys and values: > > (psq/psqueue* (range 10000)) > (apply pm/priority-map (range 10000)) > (apply sorted-map (range 10000)) > > 1. (get input-map 0) > > PSQ: 1.367226 ns > PM: 62.910936 ns > SM: 159.694336 ns > > 2. (get input-map 5000) > > PSQ: 268.306762 ns > PM: 73.988311 ns > SM: 179.033148 ns > > 3. (get input-map 10000) > > PSQ: 552.692926 ns > PM: 56.753372 ns > SM: 248.005636 ns > > 4. (assoc input-map 0 -1) > > PSQ: 1.288109 µs > PM: 2.344063 µs > SM: 0.607151958 µs > > 5. (assoc input-map 5000 -1) > > PSQ: 1.822005 µs > PM: 2.573408 µs > SM: 0.646016659 µs > > 6. (assoc input-map 10000 -1) > > PSQ: 3.000240 µs > PM: 1.440096 µs > SM: 0.725093232 µs > > 7. (assoc input-map 0 10000) > > PSQ: 1.309970 µs > PM: 2.746458 µs > SM: 0.621439055 µs > > 8. (assoc input-map 5000 10000) > > PSQ: 1.711392 µs > PM: 2.834123 µs > SM: 0.663170561 µs > > 9. (assoc input-map 10000 10000) > > PSQ: 2.969847 µs > PM: 1.746856 µs > SM: 0.715915433 µs > > 10. (dissoc input-map 0) > > PSQ: 1.398323 µs > PM: 1.211380 µs > SM: 0.653975123 µs > > 11. (dissoc input-map 5000) > > PSQ: 1.668061 µs > PM: 1.367344 µs > SM: 0.740665097 µs > > 12. (dissoc input-map 10000) > > PSQ: 1.509873 µs > PM: 0.055210586 µs > SM: 0.335146431 µs > > 13. (peek input-map) > > PSQ: 4.254087 ns > PM: 487.922381 ns > SM: N/A > > 14. (pop input-map) > > PSQ: 0.762549879 µs > PM: 1.758735 µs > SM: N/A > > 15. (last (psq/priority-seq psq)) > (last (seq pm)) > > NB. data.priority-map implements seq in order of non-decreasing > priorities; psq.clj's seq respects key order and exposes > priority-ordered traversals using a separate function, > psq.clj/priority-seq. > > PSQ: 3.603032 ms > PM: 3.359997 ms > SM: N/A > > 16. (last (psq/subseq<= psq 2001 >= 2000 < 70000)) > (last (psq/seq<= (psq/subrange psq >= 2000 < 70000) 2001)) > (last (f sm 2001 >= 2000 < 700000)) > (last (g sm 2001 >= 2000 < 700000)) > > Here f and g are defined as > > (fn f [m ubound start-test start end-test end] > (filter #(not (pos? (.compare ^java.util.Comparator c > (val %) ubound))) > (subseq m start-test start end-test end))) > > (fn g [m ubound start-test start end-test end] > (filter #(<= (val %) ubound) > (subseq m start-test start end-test end))) > > PSQ: 26.714850 µs (subseq<=) > 25.213709 µs (seq<= + subrange) > > PM: 147542.945 µs (f) > 146825.592 µs (g) > > SM: N/A > > 17. Using (psq/psqueue* (range 1000000)) as input: > > (psq/split psq 500000) > > PSQ: 3.587069 µs > PM: N/A > SM: N/A > > 18. Using (psq/psqueue* (range 1000000)) as input: > > (psq/subrange psq > 250000 <= 750000) > > PSQ: 22.776483 µs > PM: N/A > SM: N/A > > -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.