> I am very fond of the relational functions in Clojure. That was one of
> the first things that started winning me over actually.
Indeed, they're very nice to have!
> Forgive me if this is an obvious question, but what exactly is the
> disadvantage of the add-an-id approach?
It's largely aesthetic for me: I don't like the idea of having to
generate some identifier and decorate my data with it. From my
perspective it's a hack to turn a set into a multiset, which is the
concept I'm really working with (an unordered collection that may
contain duplicates). One could argue that choosing a name for the ID
is not obviously easy, and that this approach only works well for
maps/structs, but those problems don't apply in my case, so I won't
argue those points!
I haven't done any timing to determine if it's an expensive hack: this
is not time-critical code, so it doesn't matter much to me. For that
reason I'll probably stick with this approach, albeit well-commented
to explain to my future self why I'm temporarily introducing an
otherwise-unused ID!
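To make the distinction concrete, here is a small illustrative sketch (not the original code). A Clojure set collapses identical records, `frequencies` gives one possible multiset representation (a map from record to count), and a hypothetical `with-ids` helper shows the decorate-with-an-ID workaround using `map-indexed`.

```clojure
;; Two identical charges, as they might arrive from an event source.
(def raw-charges [{:charge 10 :identifier "12345abcdef"}
                  {:charge 10 :identifier "12345abcdef"}])

;; Set semantics: the duplicate is silently discarded.
(count (set raw-charges))
;; => 1

;; One multiset representation: a map from element to multiplicity.
(frequencies raw-charges)
;; => {{:charge 10, :identifier "12345abcdef"} 2}

(defn with-ids
  "Hypothetical helper: attach a distinct :id to each record so that
  otherwise-identical records survive conversion to a set."
  [records]
  (set (map-indexed (fn [i r] (assoc r :id i)) records)))

(count (with-ids raw-charges))
;; => 2
```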
I raised this whole issue not because I can't work around it, but
because I like to use the right tool for the job if it exists, and
maybe other people have already built that tool. Who knows? Perhaps
Rich has been considering spending an afternoon adding multisets to
core, and this is additional motivation. After all, we now have
sorted-sets, which cover the other axis of set-hood (ordering)...
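For illustration, `sorted-set` adds ordering but keeps set semantics, so duplicates are still dropped. A sorted map of counts is one possible stand-in for an ordered multiset; it is just a sketch, not a core multiset type.

```clojure
;; Ordering is added, but the duplicate 1 is still discarded.
(sorted-set 3 1 2 1)
;; => #{1 2 3}

;; Ordered and duplicate-preserving: a sorted map from value to count.
(into (sorted-map) (frequencies [3 1 2 1]))
;; => {1 2, 2 1, 3 1}
```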
> Or, another way, what would be
> substantially better about having multisets over just doing what
> you're doing? My understanding of relational theory and SQL (thanks
> largely to Joe Celko's books) makes me suspicious of needing
> cardinality—it sounds a lot like wanting access to the physical
> ordering on disk. Then again, a lot of my database tables wind up with
> a sort-order column or an auto-incrementing ID, I admit.
It depends on how "pure" your experience with relational algebra is :)
I've spent a lot of time with SPARQL, the RDF query language. It's
relational (much like SQL for the web), and it preserves cardinality
by default, though not ordering. (Its REDUCED and DISTINCT keywords
respectively permit or require the elimination of duplicates.)
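As a rough analogy in Clojure (not SPARQL's actual evaluation model): `clojure.set/project` always returns a set, so identical projected rows merge, whereas mapping `select-keys` over a sequence keeps one row per input, like a query without DISTINCT; `distinct` then plays the role of DISTINCT.

```clojure
(require '[clojure.set :as set])

(def rows [{:client "Foocorp" :charge 10 :id 0}
           {:client "Foocorp" :charge 10 :id 1}])

;; Set semantics: the two projected rows collapse into one.
(set/project (set rows) [:client :charge])
;; => #{{:client "Foocorp", :charge 10}}

;; Bag semantics: one projected row per input row, duplicates kept.
(map #(select-keys % [:client :charge]) rows)
;; => ({:client "Foocorp", :charge 10} {:client "Foocorp", :charge 10})

;; Discarding duplicates explicitly, in the spirit of SELECT DISTINCT.
(distinct (map #(select-keys % [:client :charge]) rows))
;; => ({:client "Foocorp", :charge 10})
```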
Some people think preserving cardinality is an odd choice, given that
RDF is defined in terms of sets, not bags, but it has its uses.
Modeling event-like things (charges, in my case) in a pure relational
system -- one with set semantics -- typically requires the addition of
two things: a unique identifier to preserve otherwise-identical
events; and some ordering attribute, to preserve sequentiality in an
unordered system. Removing the "set-ness" (loss of cardinality,
unorderedness, or both) is another way to resolve the impedance mismatch.
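A minimal sketch of the two approaches, assuming hypothetical `:id` (identity) and `:at` (ordering) attributes on each charge event: under set semantics both must be carried as data, while a plain vector keeps duplicates and sequence by itself.

```clojure
;; Set semantics: identity (:id) and sequence (:at) are stored as data.
;; :id and :at are hypothetical attribute names for illustration.
(def charge-events
  #{{:id 0 :at 1 :charge 10 :identifier "12345abcdef"}
    {:id 1 :at 2 :charge 10 :identifier "12345abcdef"}})

;; Recover the original sequence by sorting on the ordering attribute.
(map :charge (sort-by :at charge-events))
;; => (10 10)

;; Sequence semantics: a vector keeps both duplicates and order as-is.
(def charge-log
  [{:charge 10 :identifier "12345abcdef"}
   {:charge 10 :identifier "12345abcdef"}])

(map :charge charge-log)
;; => (10 10)
```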
> Of course, just because it violates relational theory doesn't mean it
> wouldn't be a great addition to the language. I'm curious.
>
> Would you mind sharing the code with the error for the calculation
> you're doing?
I'm afraid I can't share the exact code, but the simplified relational
part is something like:
(use 'clojure.set)

(defn example-charges
  "Take a relation between charge and identifier, and a relation
  between identifier and client, and sum the charges for each client."
  [charges-rel clients]
  ;; 5. Produce a sum charge for each client in a single map.
  ;;    No need to apply merge-with: the index has unique keys.
  (into {}
        (map
         ;; 4. Turn the index into a numeric sum for each client.
         (fn [[k v]]
           [(:client k)
            (reduce + (map :charge v))])
         (index
          (project
           ;; 1. Note that any identifiers not in the clients relation
           ;;    will simply disappear at this point.
           (join
            charges-rel
            clients)
           ;; 2. Include :id in the projection to prevent set semantics.
           [:client :charge :id])
          ;; 3. Now index from client to the projected relations.
          #{:client}))))
E.g.,
(example-charges
#{{:charge 10 :identifier "12345abcdef" :id 0}
{:charge 10 :identifier "67890ghijkl" :id 1}
{:charge 15 :identifier "12345poiuyt" :id 2}}
#{{:identifier "12345abcdef" :client "Foocorp"}
{:identifier "67890ghijkl" :client "Foocorp"}
{:identifier "12345poiuyt" :client "Barcorp"}})
=> {"Foocorp" 20, "Barcorp" 15}
Omit the :id and we get this:
(example-charges
#{{:charge 10 :identifier "12345abcdef"}
{:charge 10 :identifier "67890ghijkl"}
{:charge 15 :identifier "12345poiuyt"}}
#{{:identifier "12345abcdef" :client "Foocorp"}
{:identifier "67890ghijkl" :client "Foocorp"}
{:identifier "12345poiuyt" :client "Barcorp"}})
=> {"Barcorp" 15, "Foocorp" 10}
Oops! We're going to undercharge Foocorp!
You get the same result if the data keeps its :id but you omit :id
from the projection vector.
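For comparison, here is a sketch of the "remove the set-ness" route: a hypothetical `example-charges-bag` that takes the charges as an ordinary sequence, so no :id is needed. It assumes each identifier maps to at most one client (a lookup map keeps only one client per identifier, whereas `join` would produce a row for each).

```clojure
(defn example-charges-bag
  "Hypothetical bag-semantics variant of example-charges: charges is
  any sequence (duplicates allowed), so no :id is needed."
  [charges clients]
  ;; Lookup map from identifier to client (assumes one client per identifier).
  (let [client-of (into {} (map (juxt :identifier :client)) clients)]
    (reduce (fn [totals {:keys [charge identifier]}]
              ;; Like the join, charges whose identifier has no client are dropped.
              (if-let [client (client-of identifier)]
                (update totals client (fnil + 0) charge)
                totals))
            {}
            charges)))

(example-charges-bag
 [{:charge 10 :identifier "12345abcdef"}
  {:charge 10 :identifier "67890ghijkl"}
  {:charge 15 :identifier "12345poiuyt"}]
 #{{:identifier "12345abcdef" :client "Foocorp"}
   {:identifier "67890ghijkl" :client "Foocorp"}
   {:identifier "12345poiuyt" :client "Barcorp"}})
;; => {"Foocorp" 20, "Barcorp" 15}
```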
Thanks,
-R
You received this message because you are subscribed to the Google
Groups "Clojure" group.
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en