All,

I believe that there is some room for adding several new convenience operators or functions to Perl 6 that are used with Mapping and Hash values.
Or getting more to the point, I believe that the need for the 
relational data model concept of a tuple (a "tuple" where elements 
are addressed by name not position) would be satisfied by the 
existing Perl 6 data types of Mapping (immutable variant) and Hash 
(mutable variant), but that some common relational operations would 
be a lot easier to express if Perl 6 had a few more operators that 
make them concise.

Below I will name some of these operators that, AFAIK, don't exist 
yet in some form; since they are all pure functions, I will use the 
Mapping type in their pseudo-Perl-6 signatures, but Hash versions 
should exist too.  Or specifically, these should be part of the 
Mapping role, so anything that .does Mapping, such as a Hash, does 
them too?  Some of these operators are like those for sets, but 
aren't exactly the same due to plain set ops not working for mappings 
or hashes as a whole.

I want to emphasize that the operator names are those that are used 
in DBMS contexts, but you can of course name them something else in 
order for them to fit better into Perl 6; the importance is having 
some concise way to get the desired semantics.  Also, this 
functionality doesn't have to be with new operators, but could 
utilize existing ones if there is a concise way to do so.  Likewise, 
some could conceivably be macros, if it wouldn't impair performance.

I also want to emphasize that I see this functionality being 
generally useful, and that it shouldn't just be shunted off to a 
third-party module.

1.  join() aka natural_join():

        function join of Mapping (Mapping $m1, Mapping $m2) { ... }

This binary operator is conceptually like a set-union operator, in that it derives a Mapping that has all of the distinct keys and values of its 2 arguments, assuming any matching keys also have matching values. (Note that "matching" specifically means that === returns true, or if users get a choice, then that is its default meaning.)
	But if any matching keys have mismatching values, this is a 
failure condition (the arguments are incompatible), and the function 
returns undef instead (or fails, though given the anticipated use 
case, undef is more appropriate).  Two arguments can be incompatible 
only if they have keys in common; if they have none, the result is 
guaranteed to be defined/successful.  If the 2 arguments have all 
keys in common, they are compatible only if they are equal, and the 
result is then equal to either.
	This join() function is both commutative and associative, and 
can generalize to N arguments.  Any equal arguments are redundant, 
so duplicates can be ignored.  Given 2 or more arguments, they are 
joined pairwise until 1 remains.  Given 1 argument, the result is 
that argument.  Given zero arguments, the result is a Mapping with 
zero elements.  A zero-element Mapping is its identity value.
	So join() can be used as a reduction operator, with the empty 
Mapping as its identity, except that it returns undef (or fails) if 
any 2 arguments share a key but associate different values with it.
	For example:

        join( { :a<1>, :b<2> }, { :b<2>, :c<3> } )
                # returns { :a<1>, :b<2>, :c<3> }
        join( { :a<1>, :b<2> }, { :b<4>, :c<3> } )
                # returns undef
        join( { :a<1>, :b<2> }, { :c<3>, :d<4> } )
                # returns { :a<1>, :b<2>, :c<3>, :d<4> }
        join( { :a<1>, :b<2> }, { :a<1> } )
                # returns { :a<1>, :b<2> }
        join( { :a<1> } )
                # returns { :a<1> }
        join( { :a<1> }, {} )
                # returns { :a<1> }
        join()
                # returns {}
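
To make the intended semantics concrete, here is a rough sketch in Python rather than Perl 6. It rests on assumptions of mine: a dict stands in for Mapping, None stands in for undef, and Python's == stands in for ===. It illustrates the rules above and is not a proposed implementation.

```python
from functools import reduce


def join(*mappings):
    """N-ary tuple join; returns None (standing in for undef) on a conflict."""

    def join2(m1, m2):
        # Once any pair is incompatible, the whole reduction stays undefined.
        if m1 is None or m2 is None:
            return None
        # Keys present in both arguments must carry matching values.
        for key in m1.keys() & m2.keys():
            if m1[key] != m2[key]:
                return None
        return {**m1, **m2}

    # The empty mapping is the identity value of the reduction.
    return reduce(join2, mappings, {})


print(join({"a": 1, "b": 2}, {"b": 2, "c": 3}))  # {'a': 1, 'b': 2, 'c': 3}
print(join({"a": 1, "b": 2}, {"b": 4, "c": 3}))  # None
print(join())                                    # {}
```

Seeding the reduction with the empty mapping is what makes the zero-argument and one-argument cases fall out without special handling.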

In practice, if a relation were implemented as, say, a set of Mapping, the relational (natural) join could be implemented roughly like this:

        function join of Relation (Relation $r1, Relation $r2) {
                return Relation( grep <-- $r1.values XjoinX $r2.values );
        }

That is, the relational (natural) join could simply be implemented by invoking the tuple join on every pair of tuples drawn one from each relation, keeping only the results that are defined.
	In this wider sense, a relational (natural) join is both an 
intersection in one dimension and a union in the other dimension.
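
The same idea can be sketched in Python, again only as an illustration. Here a relation is assumed to be a plain list of dicts (dicts are not hashable, so a real set is not used), None stands in for undef, and the names tuple_join and relation_join are my own.

```python
from itertools import product


def tuple_join(m1, m2):
    """Join two tuples (dicts); None if a shared key has differing values."""
    if any(m1[k] != m2[k] for k in m1.keys() & m2.keys()):
        return None
    return {**m1, **m2}


def relation_join(r1, r2):
    """Join every pair of tuples, keeping only the defined, distinct results."""
    result = []
    for t1, t2 in product(r1, r2):
        joined = tuple_join(t1, t2)
        if joined is not None and joined not in result:
            result.append(joined)
    return result


suppliers = [{"s": "S1", "city": "London"}, {"s": "S2", "city": "Paris"}]
parts = [{"p": "P1", "city": "London"}, {"p": "P2", "city": "Oslo"}]
print(relation_join(suppliers, parts))
# [{'s': 'S1', 'city': 'London', 'p': 'P1'}]
```

Only the pair that agrees on "city" survives, which is the intersection-in-one-dimension, union-in-the-other behaviour described above.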
	Now, I'm not currently asking for Relation to be implemented 
as a Perl 6 feature (it is actually more complicated than "set of 
mapping"), but if Mapping|Hash had an operator like I mentioned, it 
would be easier to make one on top of it; moreover, the Mapping|Hash 
could also implement the "heading" of a relation (a 
name-to-declared-type map), not just its "body" composed of tuples 
(each being a name-to-value map).

2.  semijoin() aka matching():

        function semijoin of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is like join() except that it simply returns $m1 if the arguments are compatible, rather than a new mapping. (This assumes we're dealing with actual Mapping values, which are immutable; with a Hash, perhaps making a new Hash is desired?) Therefore, in the wider context of a relational semijoin(), we are simply filtering $r1 by $r2. Note that a normal join() such that $m2 is a subset of $m1 is functionally a semijoin() anyway. Also, unlike join(), semijoin() is *not* commutative.

3.  semidifference() aka not_matching():

        function semidifference of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is the complement of semijoin(), in that, given the same 2 Mapping arguments, it would return $m1 when semijoin() would return undef, and return undef when semijoin() would return $m1.
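
Since semijoin() and semidifference() are complements, one compatibility test serves both. A minimal Python sketch follows, assuming a dict for Mapping and None for undef; the helper name compatible is hypothetical.

```python
def compatible(m1, m2):
    """True when every key shared by m1 and m2 has the same value in both."""
    return all(m1[k] == m2[k] for k in m1.keys() & m2.keys())


def semijoin(m1, m2):
    # Return m1 itself (not a new mapping) when the arguments are compatible.
    return m1 if compatible(m1, m2) else None


def semidifference(m1, m2):
    # Defined exactly when semijoin() is undefined, and vice versa.
    return None if compatible(m1, m2) else m1


m1 = {"a": 1, "b": 2}
print(semijoin(m1, {"b": 2, "c": 3}))        # {'a': 1, 'b': 2}
print(semijoin(m1, {"b": 4}))                # None
print(semidifference(m1, {"b": 4}))          # {'a': 1, 'b': 2}
print(semidifference(m1, {"b": 2, "c": 3}))  # None
```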

4.  rename():

        function rename of Mapping (Mapping $m, Str $old_k, Str $new_k) { ... }

This operator takes one Mapping argument and derives another that is identical except that one of its existing keys is replaced with a different, previously nonexistent key; the old key's value moves over to the new key, so only the key changes. Renaming a key to match another existing key is invalid and should fail (not return undef). Renaming a key to itself is a no-op.
	This operator can be generalized so that it renames N keys 
rather than just one, in which case the old_k/new_k args can, e.g., be 
replaced by a Str<Str> hash argument; in that generalized form, it is 
valid to swap 2 key names for each other.
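
The generalized N-key form might be sketched like this in Python. The assumptions are mine: a dict of old-to-new names stands in for the hash argument, exceptions stand in for fail, and naming an old key that does not exist is treated as a failure (the text above implies this but does not state it).

```python
def rename(m, renames):
    """Return a copy of m with keys renamed per the old -> new dict."""
    missing = [old for old in renames if old not in m]
    if missing:
        raise KeyError(f"cannot rename nonexistent keys: {missing}")
    result = {renames.get(key, key): value for key, value in m.items()}
    # If two source keys landed on the same name, an entry was lost.
    if len(result) != len(m):
        raise ValueError("rename would make two keys collide")
    return result


print(rename({"a": 1, "b": 2}, {"a": "x"}))            # {'x': 1, 'b': 2}
print(rename({"a": 1, "b": 2}, {"a": "b", "b": "a"}))  # {'b': 1, 'a': 2}
```

Because all renames are applied in one pass, swapping two names works, while renaming "a" to an untouched existing "b" is rejected.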

5.  project() aka select():

        function project of Mapping (Mapping $m, @keys_to_keep) { ... }

This operator essentially takes a slice of the Mapping, except that the result is itself a Mapping, pairing each projected key with its value. @keys_to_keep can name zero elements or all of the source Mapping's elements, but specifying a key that isn't in the source is a failure condition.
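
As a small Python illustration of project(), assuming a dict for Mapping and a KeyError standing in for the failure condition:

```python
def project(m, keys_to_keep):
    """Return a new dict holding only the named keys and their values."""
    keep = set(keys_to_keep)
    missing = [key for key in keep if key not in m]
    if missing:
        raise KeyError(f"cannot project nonexistent keys: {missing}")
    return {key: value for key, value in m.items() if key in keep}


print(project({"a": 1, "b": 2, "c": 3}, ["a", "c"]))  # {'a': 1, 'c': 3}
print(project({"a": 1, "b": 2, "c": 3}, []))          # {}
```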

6.  remove() aka delete() aka project_all_but():

        function remove of Mapping (Mapping $m, @keys_to_remove) { ... }

This operator is like project(), but it keeps all source elements *except* those specified in @keys_to_remove.
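
A matching Python sketch of remove(), assuming a dict for Mapping. Treating a nonexistent key as a failure is my assumption, carried over from project(), since the text does not say so explicitly.

```python
def remove(m, keys_to_remove):
    """Return a new dict holding every element except the named keys."""
    drop = set(keys_to_remove)
    missing = [key for key in drop if key not in m]
    if missing:
        # Assumption: mirrors project(), where unknown keys are a failure.
        raise KeyError(f"cannot remove nonexistent keys: {missing}")
    return {key: value for key, value in m.items() if key not in drop}


print(remove({"a": 1, "b": 2, "c": 3}, ["b"]))  # {'a': 1, 'c': 3}
```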

7.  compose():

        function compose of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is to join() what symmetric_difference() on a set is to union() on a set. It is like a macro that first joins $m1 and $m2, then projects the result so that only keys that were in just one of the source mappings are in the final result, and any keys in common are not.
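
The join-then-project reading of compose() can be sketched in Python as follows, assuming a dict for Mapping and None for undef when the join step finds incompatible arguments:

```python
def compose(m1, m2):
    """Join m1 and m2, then drop the keys they had in common."""
    common = m1.keys() & m2.keys()
    # Same compatibility rule as join(): shared keys need matching values.
    if any(m1[key] != m2[key] for key in common):
        return None
    joined = {**m1, **m2}
    return {key: value for key, value in joined.items() if key not in common}


print(compose({"a": 1, "b": 2}, {"b": 2, "c": 3}))  # {'a': 1, 'c': 3}
print(compose({"a": 1, "b": 2}, {"b": 4, "c": 3}))  # None
```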

8.  wrap():

        function wrap of Mapping (Mapping $m, Str @old_k, Str $new_k) { ... }

This operator takes one Mapping argument and derives another that is the same except that the 0..N elements with keys named by @old_k are removed and then re-inserted as a single Mapping-typed element value whose key is $new_k. This operator fails if @old_k names any nonexistent keys, or if $new_k matches an existing key that isn't in @old_k.
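
A Python sketch of wrap() under the usual assumptions (a dict for Mapping, exceptions standing in for fail):

```python
def wrap(m, old_keys, new_key):
    """Move the elements named by old_keys into a nested dict under new_key."""
    old = list(old_keys)
    missing = [key for key in old if key not in m]
    if missing:
        raise KeyError(f"cannot wrap nonexistent keys: {missing}")
    if new_key in m and new_key not in old:
        raise ValueError(f"{new_key!r} already names an unwrapped element")
    result = {key: value for key, value in m.items() if key not in old}
    result[new_key] = {key: m[key] for key in old}
    return result


print(wrap({"a": 1, "b": 2, "c": 3}, ["b", "c"], "bc"))
# {'a': 1, 'bc': {'b': 2, 'c': 3}}
```

With an empty list of old keys the result simply gains an empty nested mapping under the new key, which matches the 0..N wording above.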

9.  unwrap():

        function unwrap of Mapping (Mapping $m, Str $old_k) { ... }

This operator is the inverse of wrap(); $old_k is the key of a Mapping-typed element value, and that element is replaced by the elements from the value. This operator fails if any element keys of the value for $old_k are the same as any other keys in $m besides $old_k.
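
And the inverse, sketched in Python with a dict for Mapping and exceptions standing in for fail:

```python
def unwrap(m, old_key):
    """Replace the nested dict stored under old_key with its own elements."""
    inner = m[old_key]  # raises KeyError if old_key is absent
    rest = {key: value for key, value in m.items() if key != old_key}
    clash = [key for key in inner if key in rest]
    if clash:
        raise ValueError(f"unwrapped keys collide with existing keys: {clash}")
    return {**rest, **inner}


print(unwrap({"a": 1, "bc": {"b": 2, "c": 3}}, "bc"))
# {'a': 1, 'b': 2, 'c': 3}
```

An inner key equal to $old_k itself is allowed here, since that element is removed before the inner elements are merged in.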

Okay, so that's more or less it.

It's possible that additional operators may be useful, but I haven't thought them through yet. (Also, some relational operators don't make sense just applied to individual tuples, and so they aren't mentioned above.)

Any feedback is appreciated, including appropriate names for the 
semantics of the operators I mentioned and/or comparably concise 
syntax for doing the same with existing Perl 6 operators.

Thank you. -- Darren Duncan