On Wed, Apr 28, 2010 at 2:39 PM, Douglas Philips <d...@mac.com> wrote: > If some function I call uses seq-contains? (so that it can get all the > wonderful seq-ness abstractness clojure offers) I should have to know that > internal detail and not pass in a map or set? or worse, fail to get an > optimization in that case? C'mon, this should be an easy protocol win (or > maybe I don't really understand the limits on what protocols can do.)
Doug, I also am not particularly enamored of the name seq-contains? I agree that it conjures up notions of Scheme's proliferation of names like list-ref, vector-ref, string-ref, etc. But I think you're not fully appreciating the complexity of the situation. This is not just about performance, but a question of two different possible semantics for "contains?", which is why it can't *only* be addressed with a protocol. The fundamental problem is that some Clojure data structures have a dual nature. For example, a vector can be thought of as a collection of items, or you can think of it as a mapping from integers to items. A hash map can be thought of as a map, or as a sequence of key-value pairs. The current "contains?" function is really a "contains-key?" function. It treats the input as a data structure with keys (a map or a set), and asks whether it contains a given key. This has been a tremendous source of confusion to some people who apply the contains? function to a vector, and are startled to discover that it is only asking whether the item is a *key* in the vector, (i.e., in the case of a vector, a valid index between 0 and length-1). Arguably, contains-key? would have been a much better name, and made the semantics much more clear. There are some people who believe that Clojure would benefit from a variant of "contains?" that views the input as a collection. But how to make this clear? That's why the proposal is for a new function called "seq-contains?". It's not so much a warning of "this thing has performance like a linear search", as much as, "it views your data as a sequence". So for example, calling seq-contains? on a vector will actually test whether the item is an element of the vector collection. Calling seq-contains? on a map will actually test whether a given [key value] pair is in the map (rather than just testing for the existence of a key). So what are some other options?: 1. Don't include seq-contains? The behavior you want can usually be achieved by using (some #{item} coll). Disadvantage - if you're testing to see if the collection contains nil, that won't work. 2. Rename contains? to contains-key? and make a function with the new semantics called contains? Disadvantage - breaks existing code. 3. Keep one function called contains?, but use protocols to achieve different semantics for different kind of data structures. So for sets and maps, contains? would work like a contains-key? function, but for vectors and other sequences, it would be a more intuitive contains-item? function. Disadvantage - semantics depends on knowing what kind of thing you're passing to the function, also might break existing code if anyone is actually currently using contains? on vectors. I'm not crazy about the name seq-contains? (and I'm skeptical that it even belongs in the core), but all the other solutions I can think of also have major disadvantages. That said, you make an excellent point about performance. Even though the new seq-contains? is intended to treat the underlying data structure as a sequence from a semantics standpoint, there is absolutely no reason at all why its performance should be bound by this. Clearly contains? and seq-contains? have identical semantics for a set, so for a set data structure the new seq-contains? should also use protocols to achieve the same speed, so seq-contains? performs identically on a set as contains? Similarly, even though the new seq-contains? will have a new meaning for hash maps (testing for a key-value pair rather than just a key), clearly it is possible to use protocols to perform this test far faster than a linear sequential search. So, if things do move forward with seq-contains? to create a new semantic notion in the clojure core of how to test for containment, I absolutely agree that protocols should be used to make this new notion to continue to have good performance on those sorts of collections for which it is possible to have superior performance than a sequential search. --Mark -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en