Re: relational data models and Perl 6

2005-12-15 Thread Darren Duncan

At 2:54 AM + 12/15/05, Luke Palmer wrote:

On 12/15/05, Darren Duncan <[EMAIL PROTECTED]> wrote:

 I propose, perhaps redundantly, that Perl 6 include a complete set of
 native


Okay, I'm with you here.  Just please stop saying "native" and "core".
 Everyone.


Yes, of course.  What I meant was that I considered relational data 
important enough for common programming to be considered by the Perl 
6 language designers, so that the language allows for it to be 
elegantly represented and processed.  The implementation details 
aren't that important.



I would like to hear from Ovid and Dave Rolsky on this issue too, as
they seem to have been researching pure relational models.


As am I now.  My own database access framework in development is 
evolving to be centered more around an ideal relational model rather 
than simply what SQL or existing databases define.  It does any 
serious database developer good to be familiar with what the 
relational model actually says, and not just what tangential things 
have actually been implemented by various vendors.  The sources I 
cited are good reference and/or explanatory materials.



 > Essentially it comes down to better handling of data sets.

Cool.  I've recently been taken by list comprehensions, and I keep
seeing "set comprehensions" in my math classes.  Maybe we can steal
some similar notation.


You probably could; the terms used in relational theory are mostly or 
entirely from mathematics.  (I could stand to learn more about those 
maths too.)



Hmm.  I would say it's a hash not so much.  For instance, the
difference between an array and a tuple in many languages is that an
array is homogeneously-typed--that's what allows you to access it
using runtime values (integers).  Tuples are heterogeneously-typed, so
you can't say

my $idx = get_input();
say $tuple[$idx];

(Pretend that Perl 6 is some other language :-), because the compiler
can't know what type it's going to say.

In the same way, I see a hash as homogeneously-typed, because you can
index it by strings.  What you're referring to as a tuple here would
be called a "record" or a "struct" in most languages.


Yes, you are right; a Tuple is very much a "record" or a "struct"; I 
just didn't use those because Perl doesn't have them per se; the 
closest thing that Perl has is the "object", which you could say is 
exactly equivalent.



 >   * a Relation is an unordered set of Tuples, where every Tuple has

 the same definition, as if the Relation were akin to a specific Perl
 class and every Tuple in it were akin to a Perl object of that class


When you say "unordered set" (redundantly, of course), can this set be
infinite?  That is, can I consider this relation (using made-up set
comprehension notation):

{ ($x,$y) where $x & $y (in) Int, $x <= $y }

And do stuff with it?


Yes you can.  A set can be infinite.  For example, the set of INTEGER 
contains every whole number from negative infinity to positive 
infinity.  At the same time, this set excludes all fractional numbers 
and all data that is not a number, such as characters.  This only 
becomes finite when you place bounds on the range, such as saying it 
has to be between +/- 2 billion.



 > Specifically what I would like to see added to Perl, if that doesn't

 already exist, is a set of operators that work on Relations, like set
 operations, such as these (these bulleted definitions from "Database
 in Depth", 1.3.3, some context excluded):

   * Restrict - Returns a relation containing all tuples from a
 specified relation that satisfy a specified condition. For example,
 we might restrict relation EMP to just the tuples where the DNO value
 is D2.


Well, if we consider a relation to be a set, then we can use the set 
operations:


my $newrel = $emp.grep: { .DNO === 'D2' };

I don't know what EMP, DNO, and D2 are...


Part of the context I excluded before, from section 1.3.1, is that 
the author is talking about hypothetical DEPT (Department) and EMP 
(Employee) relations (tables); DEPT has the attributes [DNO, DNAME, 
BUDGET], and EMP has the attributes [ENO, ENAME, DNO, SALARY]; 
DEPT.DNO is referenced by EMP.DNO; DEPT.DNO and EMP.ENO are primary 
keys in their respective relations.


So the restrict example is like, as you said, but with EMP an object:

  my $NEWREL = $EMP.grep:{ $.DNO eq 'D2' };

A SQLish equivalent would be:

  INSERT INTO NEWREL SELECT FROM EMP WHERE DNO = 'D2';


 >   * Project - Returns a relation containing all (sub)tuples that

 remain in a specified relation after specified attributes have been
 removed. For example, we might project relation EMP on just the ENO
 and SALARY attributes.


Hmm...  Well, if we pretend that records and hashes are the same thing
for the moment, then:

my $newrel = $emp.map: { .: };

(See the new S06 for a description of the .: syntax)


Or with EMP an object:

  my $NEWREL = $EMP.map:{ $_.class.new( ENO => $_.ENO, SALARY => $.SALARY ) };

SQLish:

  INSERT INT

Re: relational data models and Perl 6

2005-12-15 Thread Darren Duncan

As an addendum to what I said before ...

The general kind of thing I am proposing for Perl 6 to have is a 
declarative syntax for more kinds of tasks, where you can simply 
specify *what* you want to happen, and you don't have to tell Perl 
how to perform that task.


An example of declaratives that is already specified is 
hyper-operators; you don't have to tell Perl how to iterate through 
various lists or divide up tasks.


I would want the set operations for tuples to be like that, but the 
example code that Luke and I expressed already, with maps and greps 
etc, seems to smack too much of telling Perl how to do the job.


I don't want to have to use maps or greps or whatever, to express the 
various relational operations.


-- Darren Duncan


Re: relational data models and Perl 6

2005-12-15 Thread Xavier Noria

On Dec 15, 2005, at 2:19, Darren Duncan wrote:

 * a Tuple is an associative array having one or more Attributes,  
and each Attribute has a name or ordinal position and it is typed  
according to a Domain;
this is like a restricted Hash in a way, where each key has a  
specific type


 * a Relation is an unordered set of Tuples, where every Tuple has  
the same definition, as if the Relation were akin to a specific  
Perl class and every Tuple in it were akin to a Perl object of that  
class


Something that puzzled me in "Database in Depth" is that jargon,  
supposedly math-based. A relation in math is just a subset of a  
Cartesian product, and a tuple is an element of a relation. So it's  
standard for a Relation type to be a set of Tuples, but a tuple  
itself is not a set (as are "tuples" in the book, argh). So if  
something unordered like that goes into the language to mimick that  
model I wouldn't call it Tuple.


Math conventions there are well established, the jargon in "Database  
in Depth" departs from them and I don't think it is a good idea to  
adopt it.


-- fxn



Transliteration preferring longest match

2005-12-15 Thread Brad Bowman


Hi,

S05 describes an array version of trans for transliteration:
 ( http://dev.perl.org/perl6/doc/design/syn/S05.html#Transliteration )

  The array version can map one-or-more characters to one-or-more characters:

 $str.=trans( [' ',  '<','>','&'] =>
  [' ', '<', '>', '&' ]);

  In the case that more than one sequence of input characters matches,
  the longest one wins. In the case of two identical sequences the first
  in order wins.

Why does the longest input sequence win? 
  Is it for some consistency that that I'm not seeing? Some exceedingly

common use case?  The rule seems unnecessarily restrictive.

The "first in order" rule is more flexible, the user can sort their
arrays to produce the longest input rule, or use another order if that is
preferred.  The first transliteration example even uses sort in
the pair-wise form:

  $str.trans( %mapping.pairs.sort );

Can we drop the longest preference?

Brad

--
 By inconsistency and frivolity we stray from the Way and show ourselves
 to be beginners. In this we do much harm.
-- Hagakure http://bereft.net/hagakure/


Re: relational data models and Perl 6

2005-12-15 Thread Ruud H.G. van Tol
Darren Duncan schreef:

> If you take ...
>
>   +-+-+
>   |a|x|
>   |a|y|
>   |a|z|
>   |b|x|
>   |c|y|
>   +-+-+
>
> ... and divide it by ...
>
>   +-+
>   |x|
>   |z|
>   +-+
>
> ... the result is ...
>
>   +-+
>   |a|
>   +-+
>
> I'm not sure if Divide has an equivalent in SQL.


A verbose way to do it:

SELECTC_abc
FROM  T_abc_xyz NATURAL INNER JOIN T_xz
GROUP BY  C_abc
HAVINGCount(T_abc_xyz.C_xyz)
 =(SELECT Count(*) FROM T_xz);

This basically filters the INNER JOIN result-set to only keep those
subsets that have the required number of rows.

It requires that the rows of each table are unique, so there can not be
another (b,x) in T_abc_xyz.
That is a normal requirement.

-- 
Grtz, Ruud



Re: relational data models and Perl 6

2005-12-15 Thread Dave Whipp

Darren Duncan wrote:

As an addendum to what I said before ...

...
I would want the set operations for tuples to be like that, but the 
example code that Luke and I expressed already, with maps and greps etc, 
seems to smack too much of telling Perl how to do the job.


I don't want to have to use maps or greps or whatever, to express the 
various relational operations.


I think you're reading too many semantics into C and C: they 
don't tell perl *how* to implement the search, any more than 
C would. The example was:


  INSERT INTO NEWREL SELECT FROM EMP WHERE DNO = 'D2';
Vs
  my $NEWREL = $EMP.grep:{ $.DNO eq 'D2' };

The implementation of $EMP.grep depends very much on the class of $EMP. 
If this is an array-ref, then it is reasonable to think that the grep 
method would iterate the array in-order. However, if the class is 
"unordered set", then there is no such expectation on the implementation.


The deeper problem is probably the use of the "eq" operator in the test. 
Without knowing a-priori what operations (greps) will be performed on 
the relation, it is not possible to optimize the data structure for 
those specific operations. For example, if we knew that $EMP should 
store its data based on the {$.DNO eq 'D2'} equivalence class then this 
grep would have high performance (possibly at the expense of its creation).


In theory, a sufficiently magical module could examine the parse tree 
(post type-inference), and find all the calls to C on everything 
that's a tuple -- and use that to attempt optimizations of a few special 
cases (e.g. a code block that contains just an "eq" test against an 
attribute). I'm not sure how practical this would be, but I don't see 
how a different syntax (e.g. s/grep/where/) would be more more 
declarative in a way that makes this task any easier.


Re: relational data models and Perl 6

2005-12-15 Thread Rob Kinyon
[snip entire conversation so far]

(Please bear with me - I'm going to go in random directions.)

Someone please correct me if I'm wrong, but it seems that there's only
a few things missing in P6:
1) An elegant way of creating a tuple-type (the "table", so to speak)
2) A way of providing constraints across the actual tuples of a
given tuple-type
3) Syntactic sugar for performing the relational calculus

To me, a tuple-type is more than a class in the standard OO. It has to
be able to apply any constraints that might be upon the tuple-type,
such as uniqueness of a given element across all tuples or foreign-key
constraints. While this is certainly possible using the P6 OO
constructs, it would make sense for a baseclass to provide this
functionality.

Actually, this is a really great place for metaclasses to shine. The
actual tuple-type needs to be constructed from some class-constructor
(which would be, in the metamodel, itself a class). This is so that it
has the appropriate types for the elements of the tuple along with any
necessary constraints upon the tuples / elements of the tuples.

In addition, you're going to want to take actions not just on the
tuple, but on the entire tuple-type. That screams class-level methods
that operate across all instances of the class. Maybe, a set of roles
would be good for organizing this kind of across-all-instances
behavior that the tuple-type can take advantage of. I'm sure that this
wouldn't be limited to just the relational calculus.

As for the syntactic sugar, I'm not quite sure what should be done
here. And, with macros, it's not clear that there needs to be an
authoritative answer. Personally, I'd simply overload + for union, -
for difference, * for cross-product, / for divide, and so forth.
There's been some discussion with sets as to creating new operators
using the set-operators that come in Unicode. As tuples and relations
among tuples aren't necessarily sets, those might not be appropriate.

It also seems clear that junctionish iterators may be of use here. For
example, "Give me all the tuples that match this criteria" might
return an iterator that also acts as an any-junction.

It could also return a class object that has a different set of
instances marked as created from it. Though, I'm not too sure how that
would work when asking a given instance "who is the class object that
created you?" ... maybe it returns the initial one or maybe it returns
them all? I think the initial one is more correct, as the others are
just subsets. When dealing with SQL, I don't care about the subsets
that a given row belongs to - I only care about the table. So, maybe
the subset class objects delegate all methods to the original class
object except for those that deal with "Who do you have?" and "Give me
a subset where ..."

Also, joins between tuple-types would have to create a new tuple-type,
with the tuples within being delegators to the underlying tuples? I'm
not sure that this (or any other) derived tuple-type class object
should be allowed to create new tuples (though I'm sure someone can
think of a good reason why I'm wrong).

Again, just a bunch of meandering thoughts. Bonus points to whomever
can help me bridge the gap between what I just blathered and an
elegant solution to Sudoku.

Rob


Re: Transliteration preferring longest match

2005-12-15 Thread Luke Palmer
On 12/15/05, Brad Bowman <[EMAIL PROTECTED]> wrote:
> Why does the longest input sequence win?
>Is it for some consistency that that I'm not seeing? Some exceedingly
> common use case?  The rule seems unnecessarily restrictive.

Hmm.  Good point.  You see, the longest token wins because that's an
exceedingly common rule in lexers, and you can't sort regular
expressions the way you can sort strings, so there needs to be special
machinery in there.

There are two rather weak arguments to keep the longest token rule:

* We could compile the transliteration into a DFA and make it
fast.  Premature optimization.
* We could generalize transliteration to work on rules as well.

In fact, I think the first Perl module I ever wrote was
Regexp::Subst::Parallel, which did precisely the second of these. 
That's one of the easy things that was hard in Perl (but I guess
that's what CPAN is for).  Hmm.. none of these is really a compelling
argument either way.

Luke


Re: relational data models and Perl 6

2005-12-15 Thread Dr.Ruud
Ruud H.G. van Tol schreef:

> [RD-interface]

See also these Haskell Hierarchical Libraries (base package)
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Set.html
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: Transliteration preferring longest match

2005-12-15 Thread Larry Wall
On Thu, Dec 15, 2005 at 06:50:19PM +0100, Brad Bowman wrote:
: 
: Hi,
: 
: S05 describes an array version of trans for transliteration:
:  ( http://dev.perl.org/perl6/doc/design/syn/S05.html#Transliteration )
: 
:   The array version can map one-or-more characters to one-or-more 
:   characters:
: 
:  $str.=trans( [' ',  '<','>','&'] =>
:   [' ', '<', '>', '&' ]);
: 
:   In the case that more than one sequence of input characters matches,
:   the longest one wins. In the case of two identical sequences the first
:   in order wins.
: 
: Why does the longest input sequence win? 
:   Is it for some consistency that that I'm not seeing? Some exceedingly
: common use case?  The rule seems unnecessarily restrictive.

On the contrary, it frees the user from having to worry about the order.

: The "first in order" rule is more flexible, the user can sort their
: arrays to produce the longest input rule, or use another order if that is
: preferred.

What possible use is a user-ordered rule set?  If you put the shorter
entry first, the longer one can never be reached.  It's not like you
can backtrack into a transliteration and pick a different entry.

: The first transliteration example even uses sort in
: the pair-wise form:
: 
:   $str.trans( %mapping.pairs.sort );

That seems like a useless use of sort, and probably defeats the optimizer
as well.

: Can we drop the longest preference?

Doesn't hurt anything, and can probably help.  Plus we already have
the longest token rule in there for magical hash matching in rules,
so it's likely the optimizer will already know how to handle it,
or something like it.

Larry


Re: Transliteration preferring longest match

2005-12-15 Thread John Macdonald
On Thu, Dec 15, 2005 at 09:56:09PM +, Luke Palmer wrote:
> On 12/15/05, Brad Bowman <[EMAIL PROTECTED]> wrote:
> > Why does the longest input sequence win?
> >Is it for some consistency that that I'm not seeing? Some exceedingly
> > common use case?  The rule seems unnecessarily restrictive.
> 
> Hmm.  Good point.  You see, the longest token wins because that's an
> exceedingly common rule in lexers, and you can't sort regular
> expressions the way you can sort strings, so there needs to be special
> machinery in there.
> 
> There are two rather weak arguments to keep the longest token rule:
> 
> * We could compile the transliteration into a DFA and make it
> fast.  Premature optimization.
> * We could generalize transliteration to work on rules as well.
> 
> In fact, I think the first Perl module I ever wrote was
> Regexp::Subst::Parallel, which did precisely the second of these. 
> That's one of the easy things that was hard in Perl (but I guess
> that's what CPAN is for).  Hmm.. none of these is really a compelling
> argument either way.

If a shorter rule is allowed to match first, then the longer
rule can be removed from the match set, at least for constant
string matches.  If, for example, '=' can match without
preferring to try first for '==' then you'll never match '=='
without syntactic help to force a backtracking retry.

--