Definitions ===========
set (n) : A data container that can hold many elements simultaneously. The order of elements in a set is meaningless. No two elements in a set may hold the same value.
junction (n) : The combination of several independent values and an explicit boolean predicate, to form an object that can behave as a single entity, until evaluation, when threading will occur.
thread (v) : To take a single statement or expression, and execute it one or more times, with each iteration using a different value or element from a junction or set that is participating in that expression. The results of each execution will be recombined in some fashion.
boolean expression (n) : An expression evaluated in a boolean context. Some notable boolean expressions: 1. Any of the comparison operators taken with their operands. 2. The test expression of an C< if > or C< while > statement. 3. The distant connection between a C< given > and it's associated C< when >s. 4. Type checking.
implicit threading (n) : Threading that is done on the user's behalf with no special syntax in the code declaring when and where it is happening. It is entirely at the implementation's judgment of the given circumstances when threading will take place.
explicit threading (n) : Threading done for the user's, on his request. The form of this request may vary, but there is some syntax that tells the implementation to thread a given section of code.
function (n) : Any subroutine in capable of returning a value. In Perl 6, this includes: operators, subs, methods, multi-methods, and more.
Junctions =========
The purpose of a junction is to allow for performing several tests at a given time, with the testing code needing no knowledge of that junctions are present. While a junction can represent several values at the same time, such notions as "hold" and "contain" should be avoided, and they can be very misleading, they betray the fact that a junction is a single entity. As such, junctions are best thought of as Scalar. (Like other scalars, it may reference to something much more complex, but that's beside the point). Therefore the result of a junction evaluation should be Scalar. (We will see later that this scalar value is in fact boolean).
But what is a "junction evaluation"? Simply put, it is the results of threading all values of a junction through an expression and the recombination of these results back into a scalar, via the logic of the associated boolean predicate.
Note the fact that it's a _boolean_ predicate. That means that the end result of any junction evaluation is a single boolean value. Therefore, the only place that it makes sense to evaluate a junction is in a place where a boolean value is expected. This brings us to our "Prime Junction Rule":
Junctions can only be evaluated as part of a boolean expression. Attempting to evaluate a junction anywhere else is an error.
Some comments about what this does for us.
- Threading in this case is purely implicit. This allows a user to drop a junction into any place a test parameter is called for. Asking for explicit threading here makes the concept useless.
- Nesting of junctions is not an issue, since each threaded evaluation is yet another boolean expression.
- Things like C< for $j .. 10 {...} >, are illegal. They didn't make much sense in the first place.
- In fact, most code written without junctions in mind will execute cleanly and as expected, since the result of the threading operation will be exactly what is expected: boolean.
- As such there should be no limitation as to where a junction can be stored and passed. Functions will have the ability to explicitly reject junctions as arguments by typing their parameters with something like C< all(any(Int, Str), none(Junction)) >. Yes, that's using junctions to reject junctions.
- There is still a limited case of "Nasty Side Effect Syndrome" that can be caused by having an assignment, output, or other undesirable type of function/operator as part of a boolean expression. There will therefore need to be a "is unautothreadable" trait to be had for functions. All builtin I/O and assignment functions should have this on. This protection is shallow. If the call is not in the text of the evaluated expression, it is not guaranteed to be tested.
- If you wanted to thread a junction in a non-boolean way, you probably didn't want a junction. You likely either wanted an array or a set, and some combination of hyper operators. See below.
Boolean Expressions ===================
Some common boolean expressions in context, with valid junction usage:
if $x == any(1,2,3,4) {...}
given $string { when any(/a .* q/, 'santa') {...} when all(&is_all_uppercase(), /^ \D+ $/) {...} }
if is_prime(sqrt(any(25,14,32))) {...}
Note that in the last case, it's the entire boolean expression that's
threaded. The breakdown would look like:
is_prime(sqrt(any(25,14,32))) --->
any(is_prime(sqrt(25)),
is_prime(sqrt(14)),
is_prime(sqrt(32))
)
The rule for what to thread is to take the smallest enclosing boolean
expression around the junction as possible. This expansion cannot extend
past a single statement.
Sets ====
Despite several surface similarities between a set and a junction, they are quite different. Sets actually contain other things. Junctions hold several values at once, and can even hold the same value more than once at once. Sets are inherently plural in thought process. Junctions are inherently singular. The operations one wishes to perform on a set are radically different from those performed on a junction.
Sets are also rather different from arrays. Arrays allow any given value to be duplicated, imply a certain order that remain, and other such usage issues. Not to say that an array can't _hold_ a set. In fact I'll be assuming that an array is used internally a bit later on.
Sets and hashes are an odd pair. Hashes are not a good basis for Sets due to the restriction that the keys are strings. Even if this is not the case, Sets do not need the related data value that hashes give you. But more on hashes and sets later.
So we need a new Set class. We'll give it a constructor called C< set() > (creative, huh?) which takes a list as input. And we'll redefine several operators around it:
my $x = set(1..3); my $y = set(1,3,5,7,9); my $n = 2;
$x + $y # set(1,2,3,5,7,9) $x + $n # set(1,2,3,5,7,9) $x * $y # set(1,3) $x - $y # set(2) $x < $y # $x is a proper subset of $y $x <= $y # $x is a subset of $y $x == $y # $x is the set $y $x ~~ $y # $x is the set $y $n < $y # $n is an element of $y $n ~~ $y # $n is an element of $y set() # empty set $x.powerset # power set of $x $x.elems # count of elements in set $x
(If this list looks familiar, it's because I stole off Damian, and made several edits.) In addition, a set evaluated in a list context returns it's members.
The proposed implementation is that of an internal sorted array. Yes, sorted. This makes the processing time of virtually every set operation dramatically faster. Membership is O(lg n) with a binary search; unions, intersections, set difference and others are all O(n).
However, since a Set can contain elements, we need to consider the sorting function. I don't care how it works, as long as the following holds:
Objects of the same class are grouped together. Objects of the same class are ordered by their traditional sort, if existent. Numbers increasing, Strings alphabetically, etc. It's recreatable. The same code running on any platform will sort any combination of the same data the same way. If the object in question supports a C< .key > method, that is used for the sort.
In addition to C< ~~ > for membership, we will borrow the {} subscripters from hashes (not sure about the <> or «» variants). In this form, a C< $set{$key} > reference will look for the element that matches the place in the sort identified by $key (in other words, the element with the matching .key() or value). If something is there, it then returns C< .value||$_ >, assuming $_ is the element at that place.
This seemingly bizarre set of semantics gives us something very powerful. One can make a Set of Pairs, which behaves much like a hash, only with the ability to use any data type as the key.
We can now completely eliminate Hashes and replace them with a Set of (String => Any) pairs. But we won't. Hashes are a special case. They do lookups in roughly linear time, not logarithmic. Not to mention they're just too useful and common as they are to mess with them much.
Note that we can also create a "sparse array" in a similar manner.
Sets will borrow at least the following methods from Arrays: grep, map, elems and these from hashes: keys, values, each, delete, exists
Since the set is internally sorted, any iteration of the set is sorted (by .key||$_) as well.
HyperQuotes ===========
Just to round out any corner cases that the existing hyper operators do not cover, and to add a clearer way to do some of them, we will allow any list surrounded with » «, called hyperquotes, to thread the immediate function, treating each element of the list as a scalar, and have the expression return a list of the results. Some samples:
@s = 'item' _ [EMAIL PROTECTED];
$x = 'abcdabc'; @x = split »/a/, /c/, /d/«, $x; # (['','bcd','bc'],['ab','dab',''],['abc','abc'])
@x = func($a, [EMAIL PROTECTED]);
With these and all the other hyper operator, one should be able to easy perform the equivalent threading operation that they lost with the Prime Junction Rule.
Since all the »'s and «'s floating around make this explicit threading, there are no nasty surprises laying around. You never have a scalar suddenly explode into a list, causing no end of havoc in unsuspecting code.
These are also infinitely more useful, since you can be guaranteed that they won't short circuit, and they are in exactly the order you specified.
So that's how I feel about things. The Prime Junction Rule was the big revelation I had the other night. The rest of this was the logical consequence spawned off that single thought to make it a complete idea.
I think that overall in this process I've done the following: - Kept the real power of Junctions intact. - Provided fairly strong protection for newbies, without sacrificing power. - Kept Nasty surprises to a minimum. - Got rid of the need for "half on" features. - Provided back any power that the Prime Rule removed through sets and expanded hyper ops.
-- Rod Adams