I just implemented Bag to the point where it passes the spectests.
(https://github.com/masonk/rakudo/commit/2668178c6ba90863538ea74cfdd287684a20c520)
However, in doing so, I discovered that I'm not really sure what Bags are
for, anymore.
The more I think about Bags and Sets, the more my brain hurts. They're half
an EnumMap and half an Iterable that does Associative but not Positional.
However, I'm starting to believe that they are more like Iterables than
EnumMaps. When I imagine using them, I think of Sets as a cute way to operate
on the unique elements of an Iterable. I think of Bags / KeyBags as a way to
remove ordering, which is a generally useful thing (everything that I'm about
to say applies to both Bags and KeyBags, but I'm going to only talk about Bags
for the rest of this post). This is because, most of the time, we don't care
about ordering, and imposing an order on every collection, even when we don't
need one, increases a program's complexity in time, much as unnecessarily
global variables increased the space complexity of Perl 5 programs.
I want to propose one major change to the Bag spec: When a Bag is used as an
Iterable, you get an Iterator that yields each key as many times as it appears
in the Bag, in no particular order.
With this one change to Bags, I could use them whenever I don't need ordering
in my lists - which is usually. Even though there are some side effects that
don't rely on ordering (e.g., incrementation), the majority of them do - so by
using this new kind of Bag, I would be reducing the complexity of my programs.
Now, since Sets already give us the distinct values, having Bags do the same
thing seems like redundant functionality, where we could be getting novel
functionality.
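To make the difference concrete, here is a rough sketch in Python rather than
Perl 6, purely as an analogy. It assumes Python's collections.Counter as a
stand-in for a Bag; the variable names are mine, and the sorting is only there
to make the unordered results printable in a stable way.

```python
from collections import Counter

items = ["a", "a", "b", "c"]
bag = Counter(items)  # stand-in for a Bag: key -> number of occurrences

# Current spec: iterating the bag yields each distinct key once, like a Set.
keys_once = sorted(bag)

# Proposed: iterating yields each key as many times as it occurs.
keys_repeated = sorted(bag.elements())

# A set already provides the "distinct values" behavior.
distinct = sorted(set(items))

print(keys_once)      # ['a', 'b', 'c']
print(keys_repeated)  # ['a', 'a', 'b', 'c']
print(distinct)       # ['a', 'b', 'c']
```

Under the current spec the first and third results are the same, which is the
redundancy I mean; only the second gives something a Set can't.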
I'd like to anticipate one objection to this - the existence of the 'hyper'
operator/keyword. The hyper operator says, "I am taking responsibility for
this particular code block and promising that it can execute out of order and
concurrently". Creating a Bag instead of an Array says, "there is no meaning
to the ordering of this group of things, ever". Basically, if I know at
declaration time that my collection has no sense of ordering, then I shouldn't
have to annotate every iteration of that collection as having no sense of
ordering, which is nearly what hyper does (though, I readily admit, not quite,
because there are unordered ways to create race conditions).
I also have some convenience syntax suggestions. I do think this is important
because Bags and Sets are competing with Arrays. If they aren't as convenient
as Arrays to use, they won't get used - even though they're closer,
semantically, to what the developer wants in a lot of cases. First, we should
besigil Bags and Sets with @ instead of $. Without this convenience, I'm not
likely to replace my Arrays with Bags, because going through them in a loop or
map would be a pain compared to Arrays. If I have to say $bag.keys every
single time, forgettaboutit.
This, however, probably requires a change to S03, which says that the @ sigil
is a means of coercing the object to the "Positional (or Iterable?)" role. It
seems to me, based on the guiding principle that Perl 6 should support
functional idioms and side-effect-free computing, that the more fundamental
and important aspect of things with @ in front is that you can go through them
one by one, not that they're ordered (since ordering is irrelevant in
functional computing, but iterating is not). My feeling is that we should
reserve the special syntax for the more fundamental of the two operations, so
as not to bias the programmer towards rigid sequentiality through syntax.
Second, I would be even more likely to replace my ordered lists with Bags if
there were a convenient operator for constructing Bags. I can't think of any
good non-letter symbols that aren't taken right now (suggestions welcome), but,
at least, &b and &s as aliases to bag and set would be convenient.
Bags and Sets thus updated would look like this in use:
C<
my @array = < a a b c >;
my @set = s...@array;
my @bag = bag @array;   # assumed declaration: @bag holds < a a b c >, unordered
for s...@array { say $_ };
for @set { say $_ }; # same thing
# b«»a«»c«»
# ordering undefined
# most common use case for sets, I think, is "unique elements of @array",
# isn't it?
hyper for @bag { ... };
# a«»b«»c«» a«»
# ordering undefined => less-thinking-required hyper
b< a b c c > === b< c c b a >
# Wouldn't this be the best way to make a comparison with these semantics?
# By the way, this useful idiom works as currently specced, but doesn't work
# in my implementation
@bag<a>
# 2
@bag{<a b z>}
# 2, 1, 0
[+] (bag @array){<a b z>}
# 3
# this is also neat for "How many a's, b's, and z's do I have?"
+...@bag
# 4
@bag[2]
# I can't think of a meaning for this - not Positional - S03 needs a change?
@bag.WHAT
# Bag()
@bag.pairs
# a => 2, b => 1, c => 1
# ordering undefined
@bag.values
# 2, 1, 1
# ordering undefined
>
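For anyone who wants to poke at these semantics today, the lookups and the
order-insensitive comparison above can be approximated in Python. This is
only an analogy under the assumption that collections.Counter behaves like
the Bag I'm describing; it is not Perl 6 and says nothing about how Rakudo
should implement it.

```python
from collections import Counter

array = ["a", "a", "b", "c"]
bag = Counter(array)

count_a = bag["a"]                           # like @bag<a>
counts = [bag[k] for k in ("a", "b", "z")]   # like @bag{<a b z>}; absent key counts as 0
subtotal = sum(counts)                       # like [+] over that slice
total = sum(bag.values())                    # like +@bag under the proposal
pairs = sorted(bag.items())                  # like @bag.pairs (sorted for display)

# Two bags built from the same items in a different order compare equal.
same = Counter("abcc") == Counter("ccba")

print(count_a, counts, subtotal, total)  # 2 [2, 1, 0] 3 4
print(pairs)                             # [('a', 2), ('b', 1), ('c', 1)]
print(same)                              # True
```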
Junctions:
Junctions seem like one time when we care more about the values than the keys,
because C<any|all|none|one> on @array and b...@array will have the same
behavior (if my suggestion above is taken with respect to @bag holding < a a b
c > out of order instead of < a b c > out of order), and for Sets, it's the
same story, with the added proviso that C<one> degenerates to C<any>. But
@bag.any > $x seems like a pretty useful idiom. It would feel inconsistent for
any(@bag) and @bag.any to do different things, however.
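Here is a small Python sketch of the junction claim, again only as an analogy.
The helpers any_eq and one_eq are hypothetical names I made up to mimic
any(...) and one(...) under an equality test; Counter.elements() stands in
for the proposed Bag iteration.

```python
from collections import Counter

def any_eq(items, x):
    """True if at least one item equals x."""
    return any(i == x for i in items)

def one_eq(items, x):
    """True if exactly one item equals x."""
    return sum(1 for i in items if i == x) == 1

array = ["a", "a", "b", "c"]
bag_items = list(Counter(array).elements())  # proposed Bag iteration: a a b c
set_items = set(array)                       # distinct values only

# Array and Bag (with repeats) answer every question the same way.
array_vs_bag = all(
    any_eq(array, x) == any_eq(bag_items, x)
    and one_eq(array, x) == one_eq(bag_items, x)
    for x in "abcz"
)

# On a Set, an equality test matches at most one element, so one == any.
set_one_is_any = all(
    one_eq(set_items, x) == any_eq(set_items, x) for x in "abcz"
)

# With repeats the two differ: two "a"s make any true but one false.
bag_any_a = any_eq(bag_items, "a")
bag_one_a = one_eq(bag_items, "a")

print(array_vs_bag, set_one_is_any, bag_any_a, bag_one_a)
# True True True False
```

Note that the Set case only collapses for equality tests; with a comparison
like > $x, one and any can still differ on a Set.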
On Oct 26, 2010, at 12:57 AM, [email protected] wrote:
> Branch: refs/heads/master
> Home: http://github.com/perl6/specs
>
> Commit: 32511f7db34905c740ed1030a70995239f7cfb66
>
> http://github.com/perl6/specs/commit/32511f7db34905c740ed1030a70995239f7cfb66
> Author: TimToady <[email protected]>
> Date: 2010-10-25 (Mon, 25 Oct 2010)
>
> Changed paths:
> M S02-bits.pod
>
> Log Message:
> -----------
> [S02] be more explicit about iterating sets/bags
>
> The intent has always been that when you use a set or bag as a list,
> it behaves as a list of its keys, regardless of any underlying hash
> interface it might also respond to. You must use .pairs explicitly
> to get the hash pairs out of a set or bag as a list.
>
>