Hi,

I am pleased to announce the 0.0.2 release of psq.clj, a Clojure
priority search queue library based on Ralf Hinze's priority search
pennants (see Ralf Hinze, "A Simple Implementation Technique for
Priority Search Queues"):

  https://github.com/michalmarczyk/psq.clj

  https://clojars.org/psq.clj

Priority search queues simultaneously behave like sorted maps with
respect to their keys and like priority queues with respect to their
entries, with the priority queue ordering based on the values, or
priorities, attached to the keys. Additionally they support several
operations that blend the sorted map and priority queue aspects
together, such as "peek in key range" (retrieve a minimum-priority
entry – NB. ties on priorities are supported – within the given key
range) and optimized subseq-like traversals of key ranges that only
return those entries whose priorities fall below a certain threshold
(this is much more efficient than subseq + filter on a regular sorted
map).

psq.clj priority search queues support the full data.avl abstract data
type (with the exception of the transient API which I am not sure
would bring a significant return on the additional memory cost in this
context). The full public API is as follows:

  ;;; there is a single public namespace:
  (require '[psq.clj :as psq])


  ;;; 1. factory functions

  ;; the psq.clj counterpart to clojure.core/sorted-map:
  (psq/psqueue key priority …)
  ;= {key priority …}

  ;; a version of the above that takes keys+priorities in a seqable:
  (psq/psqueue* [key priority …])

  ;; a factory that accepts a map or seqable of map entries:
  (psq/psq {key priority …})
  (psq/psq [[key priority] …])

  ;; versions of the above that take *two* custom comparators: the
  ;; first one determines the PSQ's key order, the second one is used
  ;; for priorities:
  (psq/psqueue-by > > key priority …) ; reverse numeric order on keys
                                      ; and priorities

  psq/psqueue-by*, psq/psq-by ; like psq/psqueue*, psq/psq, but with
                              ; custom comparators


  ;;; 2. regular sorted map API; NB. (r)(sub)seq use key order

  (seq (psq/psqueue 0 10 1 9))
  ;= ([0 10] [1 9])

  ;; also supported: assoc, dissoc, conj, rseq, subseq, rsubseq


  ;;; 3. nearest neighbour lookups
  (psq/nearest (psq/psq {0 1 4 5 9 10}) > 3)
  ;= [4 5]


  ;;; 4. nth, rank in key order

  (nth (psq/psqueue 0 3 6 -3) 0)
  ;= [0 3]

  (psq/rank (psq/psqueue 0 3 6 -3) 6) ; returns long, -1 for not found
  ;= 1


  ;;; 5. priority queue API based on values/priorities

  (peek (psq/psqueue 0 3 1 -3))
  ;= [1 -3]

  (pop (psq/psqueue 0 3 1 -3))
  ;= {0 3}

  ;; seq over the PSQ in order of non-decreasing priorities
  (psq/priority-seq (psq/psqueue 0 3 1 -3))
  ;= ([1 -3] [0 3])


  ;;; 6. splits, subranges – with structural sharing in the common
  ;;;    parts, but no holding on to keys outside the stated range for
  ;;;    GC purposes

  ;; return a vector of
  ;;
  ;;   1. a fully independent PSQ comprising the entries of the
  ;;      input PSQ to the left of the given key,
  ;;
  ;;   2. the entry at the given key, or nil if not present,
  ;;
  ;;   3. a fully independent PSQ comprising the entries of the
  ;;      input PSQ to the right of the given key:
  (psq/split (psq/psqueue* (range 10)) 4)
  ;= [{0 1 2 3} [4 5] {6 7 8 9}]

  ;; like subseq, but returns an independent PSQ
  (psq/subrange (psq/psqueue* (range 10)) >= 4 < 8)
  ;= {4 5 6 7}


  ;;; 7. priority-bounded traversals:
  (psq/seq<= a-psq priority-upper-bound)

  ;; also supported: rseq<=, subseq<=, rsubseq<= (the latter two
  ;; with subseq/rsubseq-like arguments to specify key bounds) and
  ;; < variants of all of the above (seq< etc.)

All of the above operations can be performed in O(log n) time, with
the exception of peek, which takes constant time, and seq-like
operations, which are sensitive to the number of entries actually
produced. Priority-bounded traversals take O(r(log n - log r) + r)
time, where r is the number of entries actually returned.

In practice, psq.clj priority search queues are slower than Clojure's
built-in sorted maps for pure sorted-map operations, typically by a
factor of 2 to 4. In return, the PSQ operations are very performant;
subseq<= can be over three orders of magnitude faster than the
equivalent combination of subseq and filter applied to a regular
sorted map, subrange takes tens of microseconds when the size of the
input is on the order of hundreds of thousands, peek takes less than
10 ns etc. See end of this message for some benchmark results.

Mark Engelberg's excellent data.priority-map is another natural point
of comparison. In short, whereas data.priority-map is hash-map-like,
with extremely fast lookups and no particular ordering of keys,
psq.clj is sorted-map-like, with a much richer abstract data type at
the cost of noticeably slower lookups. assoc is often faster with
psq.clj, although there are edge cases where data.priority-map is
faster; dissoc is generally faster with data.priority-map. In concrete
usage scenarios, comparative benchmarks are advisable, as either data
structure may come out ahead depending on the specific mix of
operations. Of course psq.clj will be most beneficial in those use
cases which benefit from the expanded abstract data type, and any
lookup-heavy workloads where the common operations suffice will tend
to be better served by data.priority-map.

I had the pleasure of announcing the initial 0.0.1 release of psq.clj
at EuroClojure 2016 in Bratislava:

  https://www.youtube.com/watch?v=pZ9gBKO_qCA

To those of you who were there – many thanks for helping make this
year's EuroClojure such a great conference! Also, I have since
addressed a few points from my TODO list from the final slide (notably
psq.clj/nearest is there now, and so psq.clj really supports the full
data.avl abstract data type).

Finally I should say that at this early stage both the public API and
the implementation strategy are subject to revision. The features that
are there, however, I do expect to work – a thorough generative test
suite based on collection-check
(https://github.com/ztellman/collection-check) and test.check
(https://github.com/clojure/test.check) is in place to ensure this.

To include this version in your project:

  [psq.clj "0.0.2"]

  <dependency>
    <groupId>psq.clj</groupId>
    <artifactId>psq.clj</artifactId>
    <version>0.0.2</version>
  </dependency>

  compile "psq.clj:psq.clj:0.0.2"

Cheers,
Michał


Benchmark results
=================

PSQ – psq.clj
PM – data.priority-map
SM – clojure.core/sorted-map

Results obtained using Hugo Duncan's Criterium:

  https://github.com/hugoduncan/criterium

NB. some extreme results are due to benchmarks hitting the best/worst
cases for each data structure. Where Criterium reported the results in
different units for different data structures, these have been changed
to make results comparable at first glance.

N/A is indicated where an operation cannot be reasonably
implemented/emulated for a given data structure.


Basic map operations
--------------------

Except where otherwise noted, these benchmarks use 5000 key/value
pairs inserted in order of increasing keys and values:

  (psq/psqueue* (range 10000))
  (apply pm/priority-map (range 10000))
  (apply sorted-map (range 10000))

 1. (get input-map 0)

    PSQ:   1.367226 ns
    PM:   62.910936 ns
    SM:  159.694336 ns

 2. (get input-map 5000)

    PSQ: 268.306762 ns
    PM:   73.988311 ns
    SM:  179.033148 ns

 3. (get input-map 10000)

    PSQ: 552.692926 ns
    PM:   56.753372 ns
    SM:  248.005636 ns

 4. (assoc input-map 0 -1)

    PSQ: 1.288109 µs
    PM:  2.344063 µs
    SM:  0.607151958 µs

 5. (assoc input-map 5000 -1)

    PSQ: 1.822005 µs
    PM:  2.573408 µs
    SM:  0.646016659 µs

 6. (assoc input-map 10000 -1)

    PSQ: 3.000240 µs
    PM:  1.440096 µs
    SM:  0.725093232 µs

 7. (assoc input-map 0 10000)

    PSQ: 1.309970 µs
    PM:  2.746458 µs
    SM:  0.621439055 µs

 8. (assoc input-map 5000 10000)

    PSQ: 1.711392 µs
    PM:  2.834123 µs
    SM:  0.663170561 µs

 9. (assoc input-map 10000 10000)

    PSQ: 2.969847 µs
    PM:  1.746856 µs
    SM:  0.715915433 µs

10. (dissoc input-map 0)

    PSQ: 1.398323 µs
    PM:  1.211380 µs
    SM:  0.653975123 µs

11. (dissoc input-map 5000)

    PSQ: 1.668061 µs
    PM:  1.367344 µs
    SM:  0.740665097 µs

12. (dissoc input-map 10000)

    PSQ: 1.509873 µs
    PM:  0.055210586 µs
    SM:  0.335146431 µs

13. (peek input-map)

    PSQ:   4.254087 ns
    PM:  487.922381 ns
    SM:  N/A

14. (pop input-map)

    PSQ: 0.762549879 µs
    PM:  1.758735 µs
    SM:  N/A

15. (last (psq/priority-seq psq))
    (last (seq pm))

    NB. data.priority-map implements seq in order of non-decreasing
    priorities; psq.clj's seq respects key order and exposes
    priority-ordered traversals using a separate function,
    psq.clj/priority-seq.

    PSQ: 3.603032 ms
    PM:  3.359997 ms
    SM:  N/A

16. (last (psq/subseq<= psq 2001 >= 2000 < 70000))
    (last (psq/seq<= (psq/subrange psq >= 2000 < 70000) 2001))
    (last (f sm 2001 >= 2000 < 700000))
    (last (g sm 2001 >= 2000 < 700000))

    Here f and g are defined as

      (fn f [m ubound start-test start end-test end]
        (filter #(not (pos? (.compare ^java.util.Comparator c
                              (val %) ubound)))
          (subseq m start-test start end-test end)))

      (fn g [m ubound start-test start end-test end]
        (filter #(<= (val %) ubound)
          (subseq m start-test start end-test end)))

    PSQ:     26.714850 µs (subseq<=)
             25.213709 µs (seq<= + subrange)

    PM:  147542.945 µs (f)
         146825.592 µs (g)

    SM:  N/A

17. Using (psq/psqueue* (range 1000000)) as input:

    (psq/split psq 500000)

    PSQ: 3.587069 µs
    PM:  N/A
    SM:  N/A

18. Using (psq/psqueue* (range 1000000)) as input:

    (psq/subrange psq > 250000 <= 750000)

    PSQ: 22.776483 µs
    PM:  N/A
    SM:  N/A

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to