As I carry on in my spare time to implement a Relation type for Perl 6, I would like to use some of the simpler types that were added to the Synopsis recently and seem to lack a lot of explanatory details that older types have, and moreover they don't seem to be implemented yet in Pugs.

So I have a few questions whose answers should clarify the intended meaning and features of these newer types, as well as the syntax for declaring them.

Some relevant example types from Synopsis 6:

  Immutable types

  Objects with these types behave like values, i.e. C<$x === $y> is true
  if and only if their types and contents are identical.

    List        Lazy Perl list (composed of Seq and Range parts)
    Seq         Completely evaluated (hence immutable) sequence
    Range       Incrementally generated (hence lazy) sequence
    Set         Unordered Seqs that allow no duplicates
    Junction    Sets with additional behaviours
    Pair        Seq of two elements that serves as an one-element Mapping
    Mapping     Pairs with no duplicate keys
    Signature   Function parameters (left-hand side of a binding)
    Capture     Function call arguments (right-hand side of a binding)

  Mutable types

  Objects with these types have distinct C<.id> values.

    Array       Perl array
    Hash        Perl hash

The intended new Relation type could be described like this, if I correctly understand the meaning of the existing types:

  Immutable types

    Relation    Set of Mappings where all Mappings have the same keys

Speaking a little more technically, a Relation has 2 main components, its heading and its body. The heading is a set of 0..N keys (called "attributes" in relation-land), and the body is a set of 0..N Mappings (called "tuples" in relation-land), where they set of keys of each Mapping is identical to the Relation's heading. Its very likely that a language-embedded Relation implementation would actually not repeat the keys for each member Mapping, but we can conceptualize as if they were present for simplicity.

The operations that you can do with a Relation are a proper super-set of those you can do with a Set. So, the Relation type supports all the same Set operators, with the same meanings, such as: equal(), subset(), superset(), union(), intersection(), difference(), symmetric_difference(), none(), any(), all(), member_exists(), members(), member_count(). Moreover, the Relation type has these operators that the Set type doesn't have: rename(), project(), restrict(), extend(), join(), divide(), summarize(), group(), ungroup(), wrap(), unwrap(), matching(), etc. Moreover, there would probably be convenience wrapper functions over combinations of the above operators such as insert(), update(), delete(), etc, though they aren't essential (those examples are not mutators, despite their name-sakes). Some extra operators like sort() would also be provided, which convert Relations to Seqs or Arrays.

Now, some of the questions:

1. Are Sets or Junctions allowed to contain undefined elements? Can undef be a key of a Mapping or Hash?

2. What actually is the practical distinction between a Set and a Junction? Why would someone use one over the other? I recognize that the use of Junctions is supposed to make parallelism easier, as iterating through one is known to be order independent. But, conceptually a Set and a Relation are exactly the same; you could process their members in any order and/or in parallel as well. So is the use of a Junction effectively like a compiler flag to make certain kinds of Set ops faster at the expense of others?

3. Is a Signature like the keys of a Mapping but that it has extra stuff like associated types and such? Can one declare and use a Signature separately from declaring a function?

4. What is the syntax for declaring anonymous Sets and Mappings? I am already aware of these syntax for other types (correct me if I'm wrong):

  $a = [1,2,3];         # Array
  $b = {'x'=>2,'y'=>4}; # Hash
  $c = (1=>2);          # Pair
  $d = (1,2,3);         # Seq
  $e = 1..5;            # Range
  $f = all(1,2,3);      # Junction

If this hasn't yet been decided, might I suggest the following?:

  $g = set(1,2,3);      # Set
  $h = ('x'=>2,'y'=>4); # Mapping

If that works, then perhaps an anonymous Relation declartion could look like:

  $r = relation( set( 'x', 'y' ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

I'm not particular with the exact syntax; it just needs to be something good.

Note that a terse form of this could leave out the heading declaration if at least one Mapping/tuple is provided, since that contains the same key list.

  $r = relation( ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

Then the heading declaration is only needed if the Relation has no Mappings/tuples.

  $r = relation( set( 'x', 'y' ): );

5. What is the syntax for subscripting or extracting Mapping components? Eg, can we use the same .keys, .values, .pairs, etc that we use for Hashes? Also, is it possible to directly get the keys of a Mapping and/or a Hash as a Set, or is it more ideal to do the likes of all($mapping.keys) to get that behaviour?

6. Can I declare with named Set (or Junction) and Mapping typed variables and/or parameters that their members are restricted to particular types, such as Str, as I can with Arrays and Hashes, so that Perl itself will catch violations? Eg, can I say as a parameter "Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so Perl will check that arguments are suchwise correct?

7. Can we add some operators to Mapping that are like the Relation ones, so that implementing a Relation over Mappings is easier (or, see the end of #8)? Eg, these would be useful: rename(), project(), extend(), join(). In particular, implementing join() into Mapping would help save CPU cycles:

  a. join() is an N-ary operator taking 0..N Mappings and returning 1 Mapping.
  b. If given zero Mappings, it returns an empty Mapping (no keys or values).
c. If given any undefined arguments, the result is undef; otherwise, continue as follows:
  d. If given one Mapping, it returns that Mapping.
e. If given 2..N Mappings, it merges each pair in turn (order doesn't matter) as follows. f. If the two input Mappings have no keys in common, the new Mapping, consists of all keys and values of the sources. g. If the two input mappings have any keys in common, and their corresponding values are also the same (as defined by the identity operator ===), then the new Mapping contains one copy of each key/value in common, plus, the key/values where the keys were different; as a trivial case of this, if all keys and values are the same, aka the 2 Mappings as a whole are identical, the result is identical to either of the inputs. h. If the two input mappings have any keys in common, but any corresponding values are different, the output is undef.

8. While I avoided it so far for simplicity, I would like for it to be possible to declare that each attribute of a Relation-valued variable or parameter, or anonymous value for that matter, is restricted to a specific type (eg: Str, Int, ...) as we can do for Arrays and Hashes; until now, the above descriptions assumed that type was implicitly Any. In that case, such a declaration could look like this, where the heading is a Mapping rather than a Set:

$r = relation( ( 'x' => Int, 'y' => Int ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

But if we do this, then I'm not sure that a Mapping would be appropriate any more as a separable Relation element, since I'm not aware that you can declare a Mapping variable or parameter with a predefined set of typed keys; eg:

  Mapping('x'=>Int,'y'=>Int) $x;
  $x = ('x'=>2,'y'=>4); # succeeds
  $x = ('x'=>'Hello'); # fails
  $x = ('z'=>5); # fails

So I'm wondering whether this may be a good excuse to have a relation-land Tuple type, which is like the Relation but whose body has just one element:

  Tuple('x'=>Int,'y'=>Int) $x;
  $x = ('x'=>2,'y'=>4); # succeeds
  $x = ('x'=>'Hello'); # fails
  $x = ('z'=>5); # fails

Or:

  $t = tuple( set( 'x'=>Int,'y'=>Int ): 'x'=>2,'y'=>4 );
  $t = tuple( 'x'=>2,'y'=>4 );

If we have an actual relation-land Tuple type, then Mapping can be kept a lot simpler and/or the way it is now.

I'm inclined to rule out the Signature and Capture types to be used here, since while they have some similar properties to what I'm looking for (a Relation header could be a Signature and its body a set of Captures), they seem function specific. But maybe you think differently?

So, any answers to my questions and/or feedback on my ideas is appreciated.

Thank you in advance.

-- Darren Duncan

Reply via email to