Re: 'contains?' on 'sorted-set-by' based on the comparator only? (possible bug?)

mnicky Sun, 25 Mar 2012 07:03:39 -0700

Wow. Thanks, Cedric. I'll definitely look into those options...

On Sunday, March 25, 2012 4:28:04 AM UTC+2, Cedric Greevey wrote:
>
> On Sat, Mar 24, 2012 at 9:39 PM, Cedric Greevey <cgree...@gmail.com> 
> wrote:
> > In increasing order of difficulty...
> >
> > Option 1:
> >
> > Extend your comparator to sort first on the key you're actually
> > interested in, then if that key isn't different on the others
> > more-or-less arbitrarily.
>
> Or use this:
>
> (defn po->to [po]
>   (let [m (atom #{})]
>     (fn [a b]
>       (or
>         (po a b)
>         (and
>           (not (po b a))
>           (or
>             (< (hash a) (hash b))
>             (and
>               (== (hash a) (hash b))
>               (not= a b)
>               (or (contains? @m [a b])
>                 (and
>                   (not (contains? @m [b a]))
>                   (swap! m conj [a b]))))))))))
>
> (defn sorted-set-po [po & keys]
>   (apply sorted-set-by (po->to po) keys))
>
> Input: a partial order (may be true for a b, true for b a, or false for 
> both).
>
> Output: a sorted-set that uses a total order built from the partial
> order. For things where the po says neither a < b nor b < a, it will
> first compare their hashes. If those don't collide, the order of a and
> b is decided that way. If they do, and a and b are not actually equal,
> it will arbitrarily pick one to be "less than" the other and remember
> its decision so that the ordering of unequal, hash-colliding items
> will remain consistent.
>
> Notes:
> 1. The memory use can grow over time due to the memoization-like
> behavior using the atom. However, this only occurs for items with both
> order and hash collisions, which should be extraordinarily infrequent.
> (If hashes were uniform on [0,2^32-1] you'd need an average of around
> 65,000 items before, per the birthday paradox, a single collision was
> likely within your collection. And then it would probably be for a
> pair of items that po considered unequal.) So the memory use growth
> should be small (for collections of thousands to millions of items) to
> nonexistent (for smaller ones).
>
> 2. Two sorted-set-pos with the same items, where they were added in
> different sequences, may not sort them in the same order, though this
> should be rare (again tied to the frequency of order+hash collisions).
> But this is not really a problem since, if you're sorting with a
> partial order, it makes sense to consider the ordering within equality
> buckets of the partial order to be undefined (the way ordering of an
> entire hash-set is undefined) and thus does not make sense to make
> program semantics depend on the ordering. Also, all sets compare equal
> purely by having the same contents (I checked APersistentSet.java to
> be sure) so the two sorted-set-pos would still compare equal despite
> traversing their elements in slightly different sequences.
>
> Here's a clooj REPL session testing po->to. Note that the last tests
> intentionally induce an order+hash collision of unequal items and
> check that the same instance of a po->to produces consistent results
> when given the colliding items in either order (that is, it always
> returns logical true when given the items in one order and always
> returns logical false when given them in the opposite order). The very
> last tests verify that it always returns false for equal items.
>
> user=>
> (def foo (po->to #(< (:a %1) (:a %2))))
> #'user/foo
>
> user=>
> (foo {:a 1 :b 1} {:a 2 :b 1})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 2 :b 1})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 2 :b 1})
> true
>
> user=>
> (foo {:a 2 :b 1} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 2 :b 1} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 1 :b 1} {:a 1 :b 2})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 1 :b 2})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 1 :b 2})
> true
>
> user=>
> (foo {:a 1 :b 2} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 1 :b 2} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 1 :b 2} {:a 1 :b 1})
> false
>
> user=>
> (hash {:a 1 :b 1 :c 1})
> -1253235521
>
> user=>
> (hash {:a 1 :b 1 :c 2})
> -1253235522
>
> user=>
> (hash {:a 1 :b 2 :c 1})
> -1253235520
>
> user=>
> (hash {:a 1 :b 0 :c 1})
> -1253235522
>
> user=>
> (foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> #{[{:a 1, :c 2, :b 1} {:a 1, :c 1, :b 0}]}
>
> user=>
> (foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> true
>
> user=>
> (foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> true
>
> user=>
> (foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (def bar (po->to #(< (:a %1) (:a %2))))
> #'user/bar
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> #{[{:a 1, :c 1, :b 0} {:a 1, :c 2, :b 1}]}
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> true
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> false
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> false
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 0 :c 1})
> false
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 0 :c 1})
> false
>


On Sunday, March 25, 2012 4:28:04 AM UTC+2, Cedric Greevey wrote:
>
> On Sat, Mar 24, 2012 at 9:39 PM, Cedric Greevey <cgree...@gmail.com> 
> wrote:
> > In increasing order of difficulty...
> >
> > Option 1:
> >
> > Extend your comparator to sort first on the key you're actually
> > interested in, then if that key isn't different on the others
> > more-or-less arbitrarily.
>
> Or use this:
>
> (defn po->to [po]
>   (let [m (atom #{})]
>     (fn [a b]
>       (or
>         (po a b)
>         (and
>           (not (po b a))
>           (or
>             (< (hash a) (hash b))
>             (and
>               (== (hash a) (hash b))
>               (not= a b)
>               (or (contains? @m [a b])
>                 (and
>                   (not (contains? @m [b a]))
>                   (swap! m conj [a b]))))))))))
>
> (defn sorted-set-po [po & keys]
>   (apply sorted-set-by (po->to po) keys))
>
> Input: a partial order (may be true for a b, true for b a, or false for 
> both).
>
> Output: a sorted-set that uses a total order built from the partial
> order. For things where the po says neither a < b nor b < a, it will
> first compare their hashes. If those don't collide, the order of a and
> b is decided that way. If they do, and a and b are not actually equal,
> it will arbitrarily pick one to be "less than" the other and remember
> its decision so that the ordering of unequal, hash-colliding items
> will remain consistent.
>
> Notes:
> 1. The memory use can grow over time due to the memoization-like
> behavior using the atom. However, this only occurs for items with both
> order and hash collisions, which should be extraordinarily infrequent.
> (If hashes were uniform on [0,2^32-1] you'd need an average of around
> 65,000 items before, per the birthday paradox, a single collision was
> likely within your collection. And then it would probably be for a
> pair of items that po considered unequal.) So the memory use growth
> should be small (for collections of thousands to millions of items) to
> nonexistent (for smaller ones).
>
> 2. Two sorted-set-pos with the same items, where they were added in
> different sequences, may not sort them in the same order, though this
> should be rare (again tied to the frequency of order+hash collisions).
> But this is not really a problem since, if you're sorting with a
> partial order, it makes sense to consider the ordering within equality
> buckets of the partial order to be undefined (the way ordering of an
> entire hash-set is undefined) and thus does not make sense to make
> program semantics depend on the ordering. Also, all sets compare equal
> purely by having the same contents (I checked APersistentSet.java to
> be sure) so the two sorted-set-pos would still compare equal despite
> traversing their elements in slightly different sequences.
>
> Here's a clooj REPL session testing po->to. Note that the last tests
> intentionally induce an order+hash collision of unequal items and
> check that the same instance of a po->to produces consistent results
> when given the colliding items in either order (that is, it always
> returns logical true when given the items in one order and always
> returns logical false when given them in the opposite order). The very
> last tests verify that it always returns false for equal items.
>
> user=>
> (def foo (po->to #(< (:a %1) (:a %2))))
> #'user/foo
>
> user=>
> (foo {:a 1 :b 1} {:a 2 :b 1})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 2 :b 1})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 2 :b 1})
> true
>
> user=>
> (foo {:a 2 :b 1} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 2 :b 1} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 1 :b 1} {:a 1 :b 2})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 1 :b 2})
> true
>
> user=>
> (foo {:a 1 :b 1} {:a 1 :b 2})
> true
>
> user=>
> (foo {:a 1 :b 2} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 1 :b 2} {:a 1 :b 1})
> false
>
> user=>
> (foo {:a 1 :b 2} {:a 1 :b 1})
> false
>
> user=>
> (hash {:a 1 :b 1 :c 1})
> -1253235521
>
> user=>
> (hash {:a 1 :b 1 :c 2})
> -1253235522
>
> user=>
> (hash {:a 1 :b 2 :c 1})
> -1253235520
>
> user=>
> (hash {:a 1 :b 0 :c 1})
> -1253235522
>
> user=>
> (foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> #{[{:a 1, :c 2, :b 1} {:a 1, :c 1, :b 0}]}
>
> user=>
> (foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> true
>
> user=>
> (foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> true
>
> user=>
> (foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (def bar (po->to #(< (:a %1) (:a %2))))
> #'user/bar
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> #{[{:a 1, :c 1, :b 0} {:a 1, :c 2, :b 1}]}
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
> true
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> false
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
> false
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (bar {:a 1 :b 1 :c 2} {:a 1 :b 1 :c 2})
> false
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 0 :c 1})
> false
>
> user=>
> (bar {:a 1 :b 0 :c 1} {:a 1 :b 0 :c 1})
> false
>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: 'contains?' on 'sorted-set-by' based on the comparator only? (possible bug?)

Reply via email to