As I carry on in my spare time to implement a Relation type for Perl
6, I would like to use some of the simpler types that were added to
the Synopsis recently and seem to lack a lot of explanatory details
that older types have, and moreover they don't seem to be implemented
yet in Pugs.
So I have a few questions whose answers should clarify the intended
meaning and features of these newer types, as well as the syntax for
declaring them.
Some relevant example types from Synopsis 6:
Immutable types
Objects with these types behave like values, i.e. C<$x === $y> is true
if and only if their types and contents are identical.
List Lazy Perl list (composed of Seq and Range parts)
Seq Completely evaluated (hence immutable) sequence
Range Incrementally generated (hence lazy) sequence
Set Unordered Seqs that allow no duplicates
Junction Sets with additional behaviours
Pair Seq of two elements that serves as an one-element Mapping
Mapping Pairs with no duplicate keys
Signature Function parameters (left-hand side of a binding)
Capture Function call arguments (right-hand side of a binding)
Mutable types
Objects with these types have distinct C<.id> values.
Array Perl array
Hash Perl hash
The intended new Relation type could be described like this, if I
correctly understand the meaning of the existing types:
Immutable types
Relation Set of Mappings where all Mappings have the same keys
Speaking a little more technically, a Relation has 2 main components,
its heading and its body. The heading is a set of 0..N keys (called
"attributes" in relation-land), and the body is a set of 0..N
Mappings (called "tuples" in relation-land), where they set of keys
of each Mapping is identical to the Relation's heading. Its very
likely that a language-embedded Relation implementation would
actually not repeat the keys for each member Mapping, but we can
conceptualize as if they were present for simplicity.
The operations that you can do with a Relation are a proper super-set
of those you can do with a Set. So, the Relation type supports all
the same Set operators, with the same meanings, such as: equal(),
subset(), superset(), union(), intersection(), difference(),
symmetric_difference(), none(), any(), all(), member_exists(),
members(), member_count(). Moreover, the Relation type has these
operators that the Set type doesn't have: rename(), project(),
restrict(), extend(), join(), divide(), summarize(), group(),
ungroup(), wrap(), unwrap(), matching(), etc. Moreover, there would
probably be convenience wrapper functions over combinations of the
above operators such as insert(), update(), delete(), etc, though
they aren't essential (those examples are not mutators, despite their
name-sakes). Some extra operators like sort() would also be
provided, which convert Relations to Seqs or Arrays.
Now, some of the questions:
1. Are Sets or Junctions allowed to contain undefined elements? Can
undef be a key of a Mapping or Hash?
2. What actually is the practical distinction between a Set and a
Junction? Why would someone use one over the other? I recognize
that the use of Junctions is supposed to make parallelism easier, as
iterating through one is known to be order independent. But,
conceptually a Set and a Relation are exactly the same; you could
process their members in any order and/or in parallel as well. So is
the use of a Junction effectively like a compiler flag to make
certain kinds of Set ops faster at the expense of others?
3. Is a Signature like the keys of a Mapping but that it has extra
stuff like associated types and such? Can one declare and use a
Signature separately from declaring a function?
4. What is the syntax for declaring anonymous Sets and Mappings? I
am already aware of these syntax for other types (correct me if I'm
wrong):
$a = [1,2,3]; # Array
$b = {'x'=>2,'y'=>4}; # Hash
$c = (1=>2); # Pair
$d = (1,2,3); # Seq
$e = 1..5; # Range
$f = all(1,2,3); # Junction
If this hasn't yet been decided, might I suggest the following?:
$g = set(1,2,3); # Set
$h = ('x'=>2,'y'=>4); # Mapping
If that works, then perhaps an anonymous Relation declartion could look like:
$r = relation( set( 'x', 'y' ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );
I'm not particular with the exact syntax; it just needs to be something good.
Note that a terse form of this could leave out the heading
declaration if at least one Mapping/tuple is provided, since that
contains the same key list.
$r = relation( ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );
Then the heading declaration is only needed if the Relation has no
Mappings/tuples.
$r = relation( set( 'x', 'y' ): );
5. What is the syntax for subscripting or extracting Mapping
components? Eg, can we use the same .keys, .values, .pairs, etc that
we use for Hashes? Also, is it possible to directly get the keys of
a Mapping and/or a Hash as a Set, or is it more ideal to do the likes
of all($mapping.keys) to get that behaviour?
6. Can I declare with named Set (or Junction) and Mapping typed
variables and/or parameters that their members are restricted to
particular types, such as Str, as I can with Arrays and Hashes, so
that Perl itself will catch violations? Eg, can I say as a parameter
"Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so
Perl will check that arguments are suchwise correct?
7. Can we add some operators to Mapping that are like the Relation
ones, so that implementing a Relation over Mappings is easier (or,
see the end of #8)? Eg, these would be useful: rename(), project(),
extend(), join(). In particular, implementing join() into Mapping
would help save CPU cycles:
a. join() is an N-ary operator taking 0..N Mappings and returning 1 Mapping.
b. If given zero Mappings, it returns an empty Mapping (no keys or values).
c. If given any undefined arguments, the result is undef;
otherwise, continue as follows:
d. If given one Mapping, it returns that Mapping.
e. If given 2..N Mappings, it merges each pair in turn (order
doesn't matter) as follows.
f. If the two input Mappings have no keys in common, the new
Mapping, consists of all keys and values of the sources.
g. If the two input mappings have any keys in common, and their
corresponding values are also the same (as defined by the identity
operator ===), then the new Mapping contains one copy of each
key/value in common, plus, the key/values where the keys were
different; as a trivial case of this, if all keys and values are the
same, aka the 2 Mappings as a whole are identical, the result is
identical to either of the inputs.
h. If the two input mappings have any keys in common, but any
corresponding values are different, the output is undef.
8. While I avoided it so far for simplicity, I would like for it to
be possible to declare that each attribute of a Relation-valued
variable or parameter, or anonymous value for that matter, is
restricted to a specific type (eg: Str, Int, ...) as we can do for
Arrays and Hashes; until now, the above descriptions assumed that
type was implicitly Any. In that case, such a declaration could look
like this, where the heading is a Mapping rather than a Set:
$r = relation( ( 'x' => Int, 'y' => Int ): ('x'=>2,'y'=>4),
('x'=>5,'y'=>6) );
But if we do this, then I'm not sure that a Mapping would be
appropriate any more as a separable Relation element, since I'm not
aware that you can declare a Mapping variable or parameter with a
predefined set of typed keys; eg:
Mapping('x'=>Int,'y'=>Int) $x;
$x = ('x'=>2,'y'=>4); # succeeds
$x = ('x'=>'Hello'); # fails
$x = ('z'=>5); # fails
So I'm wondering whether this may be a good excuse to have a
relation-land Tuple type, which is like the Relation but whose body
has just one element:
Tuple('x'=>Int,'y'=>Int) $x;
$x = ('x'=>2,'y'=>4); # succeeds
$x = ('x'=>'Hello'); # fails
$x = ('z'=>5); # fails
Or:
$t = tuple( set( 'x'=>Int,'y'=>Int ): 'x'=>2,'y'=>4 );
$t = tuple( 'x'=>2,'y'=>4 );
If we have an actual relation-land Tuple type, then Mapping can be
kept a lot simpler and/or the way it is now.
I'm inclined to rule out the Signature and Capture types to be used
here, since while they have some similar properties to what I'm
looking for (a Relation header could be a Signature and its body a
set of Captures), they seem function specific. But maybe you think
differently?
So, any answers to my questions and/or feedback on my ideas is appreciated.
Thank you in advance.
-- Darren Duncan