using the newer collection types

Darren Duncan Thu, 04 May 2006 14:46:08 -0700

As I carry on in my spare time implementing a Relation type for Perl6, I would like to use some of the simpler types that were added to the Synopsis recently. They seem to lack a lot of the explanatory detail that the older types have, and moreover they don't seem to be implemented yet in Pugs.

So I have a few questions whose answers should clarify the intended meaning and features of these newer types, as well as the syntax for declaring them.


Some relevant example types from Synopsis 6:

  Immutable types

  Objects with these types behave like values, i.e. C<$x === $y> is true
  if and only if their types and contents are identical.

    List        Lazy Perl list (composed of Seq and Range parts)
    Seq         Completely evaluated (hence immutable) sequence
    Range       Incrementally generated (hence lazy) sequence
    Set         Unordered Seqs that allow no duplicates
    Junction    Sets with additional behaviours
    Pair        Seq of two elements that serves as an one-element Mapping
    Mapping     Pairs with no duplicate keys
    Signature   Function parameters (left-hand side of a binding)
    Capture     Function call arguments (right-hand side of a binding)

  Mutable types

  Objects with these types have distinct C<.id> values.

    Array       Perl array
    Hash        Perl hash

The intended new Relation type could be described like this, if I correctly understand the meaning of the existing types:


  Immutable types

    Relation    Set of Mappings where all Mappings have the same keys

Speaking a little more technically, a Relation has 2 main components, its heading and its body. The heading is a set of 0..N keys (called "attributes" in relation-land), and the body is a set of 0..N Mappings (called "tuples" in relation-land), where the set of keys of each Mapping is identical to the Relation's heading. It's very likely that a language-embedded Relation implementation would actually not repeat the keys for each member Mapping, but we can conceptualize as if they were present for simplicity.
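To make the heading/body split concrete, here is a minimal sketch in Python (not Perl 6, and not a proposal for the real implementation). The class name and the methods member_count() and members() only echo the operator names used in this message; everything else is an assumption made for illustration. It stores the keys once in the heading and keeps each tuple as a plain row of values, which is the "don't repeat the keys" layout mentioned above.

```python
from typing import Any, Dict, FrozenSet, Iterable, List, Mapping, Tuple


class Relation:
    """Hypothetical immutable relation: a heading plus a set of tuples.

    Assumption: attribute values are hashable, so rows can live in a frozenset.
    """

    def __init__(self, heading: Iterable[str],
                 body: Iterable[Mapping[str, Any]] = ()) -> None:
        self.heading: FrozenSet[str] = frozenset(heading)
        # Fix one attribute order so each row can be stored without its keys.
        self._attrs: Tuple[str, ...] = tuple(sorted(self.heading))
        rows = set()
        for mapping in body:
            if frozenset(mapping) != self.heading:
                raise ValueError("tuple keys do not match the heading")
            rows.add(tuple(mapping[a] for a in self._attrs))
        # A set of rows: duplicates collapse and order is not significant.
        self._rows: FrozenSet[Tuple[Any, ...]] = frozenset(rows)

    def member_count(self) -> int:
        return len(self._rows)

    def members(self) -> List[Dict[str, Any]]:
        # Rebuild key/value mappings on demand, as if keys were stored per tuple.
        return [dict(zip(self._attrs, row)) for row in self._rows]

    def __eq__(self, other: object) -> bool:
        return (isinstance(other, Relation)
                and self.heading == other.heading
                and self._rows == other._rows)

    def __hash__(self) -> int:
        return hash((self.heading, self._rows))
```

A relation with a heading and no tuples is simply Relation(["x", "y"]); a tuple whose keys differ from the heading is rejected at construction time.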

The operations that you can do with a Relation are a proper superset of those you can do with a Set. So the Relation type supports all the same Set operators, with the same meanings, such as: equal(), subset(), superset(), union(), intersection(), difference(), symmetric_difference(), none(), any(), all(), member_exists(), members(), member_count(). Moreover, the Relation type has these operators that the Set type doesn't have: rename(), project(), restrict(), extend(), join(), divide(), summarize(), group(), ungroup(), wrap(), unwrap(), matching(), etc. There would also probably be convenience wrapper functions over combinations of the above operators, such as insert(), update(), delete(), etc, though they aren't essential (those examples are not mutators, despite their namesakes). Some extra operators like sort() would also be provided, which convert Relations to Seqs or Arrays.
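As a rough illustration of how a few of the Relation-only operators sit on top of plain set operations, here is a sketch in Python (again, not Perl 6). It models a relation body as a frozenset of tuples, and each tuple as a frozenset of (key, value) pairs; that representation, the helper tup(), and the sample data are assumptions for the example only. Only project(), rename() and restrict() are shown.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, Mapping, Tuple

Tup = FrozenSet[Tuple[str, Any]]   # one tuple: a set of (key, value) pairs
Body = FrozenSet[Tup]              # relation body: a set of tuples


def tup(**attrs: Any) -> Tup:
    """Hypothetical helper that builds one tuple from keyword arguments."""
    return frozenset(attrs.items())


def project(body: Body, keep: Iterable[str]) -> Body:
    """Keep only the named attributes; resulting duplicates collapse."""
    wanted = frozenset(keep)
    return frozenset(frozenset((k, v) for k, v in t if k in wanted) for t in body)


def rename(body: Body, names: Mapping[str, str]) -> Body:
    """Rename attributes according to an old-name to new-name mapping."""
    return frozenset(frozenset((names.get(k, k), v) for k, v in t) for t in body)


def restrict(body: Body, predicate: Callable[[Dict[str, Any]], bool]) -> Body:
    """Keep only the tuples for which the predicate is true."""
    return frozenset(t for t in body if predicate(dict(t)))


# Sample data for the example only.
body: Body = frozenset({tup(x=2, y=4), tup(x=2, y=6), tup(x=5, y=6)})
```

Because the body is an ordinary set, union, intersection, difference and so on need no extra code here: Python's |, & and - on frozenset already do the job. Note that project(body, ["x"]) yields two tuples rather than three, since the result is itself a set.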


Now, some of the questions:

1. Are Sets or Junctions allowed to contain undefined elements? Can undef be a key of a Mapping or Hash?

2. What actually is the practical distinction between a Set and a Junction? Why would someone use one over the other? I recognize that the use of Junctions is supposed to make parallelism easier, as iterating through one is known to be order independent. But conceptually a Set and a Relation are exactly the same in this respect; you could process their members in any order and/or in parallel as well. So is the use of a Junction effectively like a compiler flag to make certain kinds of Set ops faster at the expense of others?

3. Is a Signature like the keys of a Mapping, except that it has extra stuff like associated types and such? Can one declare and use a Signature separately from declaring a function?

4. What is the syntax for declaring anonymous Sets and Mappings? I am already aware of this syntax for other types (correct me if I'm wrong):


  $a = [1,2,3];         # Array
  $b = {'x'=>2,'y'=>4}; # Hash
  $c = (1=>2);          # Pair
  $d = (1,2,3);         # Seq
  $e = 1..5;            # Range
  $f = all(1,2,3);      # Junction

If this hasn't yet been decided, might I suggest the following?:

  $g = set(1,2,3);      # Set
  $h = ('x'=>2,'y'=>4); # Mapping

If that works, then perhaps an anonymous Relation declaration could look like:

  $r = relation( set( 'x', 'y' ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

I'm not particular with the exact syntax; it just needs to be something good.

Note that a terse form of this could leave out the heading declaration if at least one Mapping/tuple is provided, since that contains the same key list.


  $r = relation( ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

Then the heading declaration is only needed if the Relation has no Mappings/tuples.


  $r = relation( set( 'x', 'y' ): );

5. What is the syntax for subscripting or extracting Mapping components? Eg, can we use the same .keys, .values, .pairs, etc that we use for Hashes? Also, is it possible to directly get the keys of a Mapping and/or a Hash as a Set, or is it better to do something like all($mapping.keys) to get that behaviour?

6. Can I declare, for named Set (or Junction) and Mapping typed variables and/or parameters, that their members are restricted to particular types, such as Str, as I can with Arrays and Hashes, so that Perl itself will catch violations? Eg, can I say as a parameter "Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so Perl will check that arguments are correct in this way?

7. Can we add some operators to Mapping that are like the Relation ones, so that implementing a Relation over Mappings is easier (or, see the end of #8)? Eg, these would be useful: rename(), project(), extend(), join(). In particular, implementing join() into Mapping would help save CPU cycles:


  a. join() is an N-ary operator taking 0..N Mappings and returning 1 Mapping.
  b. If given zero Mappings, it returns an empty Mapping (no keys or values).
  c. If given any undefined arguments, the result is undef; otherwise, continue as follows:
  d. If given one Mapping, it returns that Mapping.
  e. If given 2..N Mappings, it merges each pair in turn (order doesn't matter) as follows.
  f. If the two input Mappings have no keys in common, the new Mapping consists of all keys and values of the sources.
  g. If the two input Mappings have any keys in common, and their corresponding values are also the same (as defined by the identity operator ===), then the new Mapping contains one copy of each key/value in common, plus the key/values where the keys were different; as a trivial case of this, if all keys and values are the same, aka the 2 Mappings as a whole are identical, the result is identical to either of the inputs.
  h. If the two input Mappings have any keys in common, but any corresponding values are different, the output is undef.
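Rules a through h can be sketched as follows, in Python rather than Perl 6 and purely as an illustration. Python's None stands in for undef, a dict stands in for a Mapping, and the helper identical() is an assumed approximation of === (same type and equal contents); none of these names come from the Synopses.

```python
from functools import reduce
from typing import Any, Dict, Optional

MaybeMapping = Optional[Dict[Any, Any]]


def identical(a: Any, b: Any) -> bool:
    """Assumed stand-in for ===: same type and equal contents."""
    return type(a) is type(b) and a == b


def join_pair(a: MaybeMapping, b: MaybeMapping) -> MaybeMapping:
    """Merge two mappings per rules f, g and h; None propagates as undef."""
    if a is None or b is None:
        return None
    for key in a.keys() & b.keys():
        if not identical(a[key], b[key]):
            return None              # rule h: a shared key with different values
    return {**a, **b}                # rules f and g: one copy of each key/value


def join(*mappings: MaybeMapping) -> MaybeMapping:
    """N-ary join over 0..N mappings (rule a)."""
    if any(m is None for m in mappings):
        return None                  # rule c: any undefined argument gives undef
    # Starting from the empty mapping covers rule b (zero arguments) and
    # rule d (one argument); rule e is the pairwise fold itself.
    return reduce(join_pair, mappings, {})
```

For example, join({'x': 2, 'y': 4}, {'y': 4, 'z': 6}) gives {'x': 2, 'y': 4, 'z': 6}, while join({'x': 2}, {'x': 3}) gives None. With one argument this sketch returns an equal copy rather than the same object, which is harmless only if Mappings are treated as immutable values.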

8. While I avoided it so far for simplicity, I would like for it to be possible to declare that each attribute of a Relation-valued variable or parameter, or anonymous value for that matter, is restricted to a specific type (eg: Str, Int, ...) as we can do for Arrays and Hashes; until now, the above descriptions assumed that the type was implicitly Any. In that case, such a declaration could look like this, where the heading is a Mapping rather than a Set:

  $r = relation( ( 'x' => Int, 'y' => Int ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

But if we do this, then I'm not sure that a Mapping would be appropriate any more as a separable Relation element, since I'm not aware that you can declare a Mapping variable or parameter with a predefined set of typed keys; eg:


  Mapping('x'=>Int,'y'=>Int) $x;
  $x = ('x'=>2,'y'=>4); # succeeds
  $x = ('x'=>'Hello'); # fails
  $x = ('z'=>5); # fails

So I'm wondering whether this may be a good excuse to have a relation-land Tuple type, which is like the Relation but whose body has just one element:


  Tuple('x'=>Int,'y'=>Int) $x;
  $x = ('x'=>2,'y'=>4); # succeeds
  $x = ('x'=>'Hello'); # fails
  $x = ('z'=>5); # fails

Or:

  $t = tuple( set( 'x'=>Int,'y'=>Int ): 'x'=>2,'y'=>4 );
  $t = tuple( 'x'=>2,'y'=>4 );

If we have an actual relation-land Tuple type, then Mapping can be kept a lot simpler and/or the way it is now.
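The check that the typed Tuple declarations above are meant to perform can be sketched like this, in Python and only as an illustration of the intended semantics. The function name check_tuple() and the use of a dict of Python types as the heading are assumptions for the example, not proposed Perl 6 API.

```python
from typing import Any, Mapping


def check_tuple(heading: Mapping[str, type], value: Mapping[str, Any]) -> bool:
    """Return True if value has exactly the heading's keys, each of the right type.

    Assumption: 'right type' is approximated with Python's isinstance().
    """
    if value.keys() != heading.keys():
        return False                 # missing or extra attribute
    return all(isinstance(value[name], kind) for name, kind in heading.items())


# Hypothetical heading matching Tuple('x'=>Int,'y'=>Int).
xy_heading = {"x": int, "y": int}
```

Here check_tuple(xy_heading, {'x': 2, 'y': 4}) succeeds, while {'x': 'Hello'} and {'z': 5} are both rejected, matching the succeeds/fails comments in the examples above.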

I'm inclined to rule out using the Signature and Capture types here, since while they have some properties similar to what I'm looking for (a Relation header could be a Signature and its body a set of Captures), they seem function specific. But maybe you think differently?


So, any answers to my questions and/or feedback on my ideas are appreciated.

Thank you in advance.

-- Darren Duncan
