Re: relational data models and Perl 6
At 2:54 AM + 12/15/05, Luke Palmer wrote: On 12/15/05, Darren Duncan <[EMAIL PROTECTED]> wrote: I propose, perhaps redundantly, that Perl 6 include a complete set of native Okay, I'm with you here. Just please stop saying "native" and "core". Everyone. Yes, of course. What I meant was that I considered relational data important enough for common programming to be considered by the Perl 6 language designers, so that the language allows for it to be elegantly represented and processed. The implementation details aren't that important. I would like to hear from Ovid and Dave Rolsky on this issue too, as they seem to have been researching pure relational models. As am I now. My own database access framework in development is evolving to be centered more around an ideal relational model rather than simply what SQL or existing databases define. It does any serious database developer good to be familiar with what the relational model actually says, and not just what tangential things have actually been implemented by various vendors. The sources I cited are good reference and/or explanatory materials. > Essentially it comes down to better handling of data sets. Cool. I've recently been taken by list comprehensions, and I keep seeing "set comprehensions" in my math classes. Maybe we can steal some similar notation. You probably could; the terms used in relational theory are mostly or entirely from mathematics. (I could stand to learn more about those maths too.) Hmm. I would say it's a hash not so much. For instance, the difference between an array and a tuple in many languages is that an array is homogeneously-typed--that's what allows you to access it using runtime values (integers). Tuples are heterogeneously-typed, so you can't say my $idx = get_input(); say $tuple[$idx]; (Pretend that Perl 6 is some other language :-), because the compiler can't know what type it's going to say. In the same way, I see a hash as homogeneously-typed, because you can index it by strings. What you're referring to as a tuple here would be called a "record" or a "struct" in most languages. Yes, you are right; a Tuple is very much a "record" or a "struct"; I just didn't use those because Perl doesn't have them per se; the closest thing that Perl has is the "object", which you could say is exactly equivalent. > * a Relation is an unordered set of Tuples, where every Tuple has the same definition, as if the Relation were akin to a specific Perl class and every Tuple in it were akin to a Perl object of that class When you say "unordered set" (redundantly, of course), can this set be infinite? That is, can I consider this relation (using made-up set comprehension notation): { ($x,$y) where $x & $y (in) Int, $x <= $y } And do stuff with it? Yes you can. A set can be infinite. For example, the set of INTEGER contains every whole number from negative infinity to positive infinity. At the same time, this set excludes all fractional numbers and all data that is not a number, such as characters. This only becomes finite when you place bounds on the range, such as saying it has to be between +/- 2 billion. > Specifically what I would like to see added to Perl, if that doesn't already exist, is a set of operators that work on Relations, like set operations, such as these (these bulleted definitions from "Database in Depth", 1.3.3, some context excluded): * Restrict - Returns a relation containing all tuples from a specified relation that satisfy a specified condition. For example, we might restrict relation EMP to just the tuples where the DNO value is D2. Well, if we consider a relation to be a set, then we can use the set operations: my $newrel = $emp.grep: { .DNO === 'D2' }; I don't know what EMP, DNO, and D2 are... Part of the context I excluded before, from section 1.3.1, is that the author is talking about hypothetical DEPT (Department) and EMP (Employee) relations (tables); DEPT has the attributes [DNO, DNAME, BUDGET], and EMP has the attributes [ENO, ENAME, DNO, SALARY]; DEPT.DNO is referenced by EMP.DNO; DEPT.DNO and EMP.ENO are primary keys in their respective relations. So the restrict example is like, as you said, but with EMP an object: my $NEWREL = $EMP.grep:{ $.DNO eq 'D2' }; A SQLish equivalent would be: INSERT INTO NEWREL SELECT FROM EMP WHERE DNO = 'D2'; > * Project - Returns a relation containing all (sub)tuples that remain in a specified relation after specified attributes have been removed. For example, we might project relation EMP on just the ENO and SALARY attributes. Hmm... Well, if we pretend that records and hashes are the same thing for the moment, then: my $newrel = $emp.map: { .: }; (See the new S06 for a description of the .: syntax) Or with EMP an object: my $NEWREL = $EMP.map:{ $_.class.new( ENO => $_.ENO, SALARY => $.SALARY ) }; SQLish: INSERT INT
Re: relational data models and Perl 6
As an addendum to what I said before ... The general kind of thing I am proposing for Perl 6 to have is a declarative syntax for more kinds of tasks, where you can simply specify *what* you want to happen, and you don't have to tell Perl how to perform that task. An example of declaratives that is already specified is hyper-operators; you don't have to tell Perl how to iterate through various lists or divide up tasks. I would want the set operations for tuples to be like that, but the example code that Luke and I expressed already, with maps and greps etc, seems to smack too much of telling Perl how to do the job. I don't want to have to use maps or greps or whatever, to express the various relational operations. -- Darren Duncan
Re: relational data models and Perl 6
On Dec 15, 2005, at 2:19, Darren Duncan wrote: * a Tuple is an associative array having one or more Attributes, and each Attribute has a name or ordinal position and it is typed according to a Domain; this is like a restricted Hash in a way, where each key has a specific type * a Relation is an unordered set of Tuples, where every Tuple has the same definition, as if the Relation were akin to a specific Perl class and every Tuple in it were akin to a Perl object of that class Something that puzzled me in "Database in Depth" is that jargon, supposedly math-based. A relation in math is just a subset of a Cartesian product, and a tuple is an element of a relation. So it's standard for a Relation type to be a set of Tuples, but a tuple itself is not a set (as are "tuples" in the book, argh). So if something unordered like that goes into the language to mimick that model I wouldn't call it Tuple. Math conventions there are well established, the jargon in "Database in Depth" departs from them and I don't think it is a good idea to adopt it. -- fxn
Transliteration preferring longest match
Hi, S05 describes an array version of trans for transliteration: ( http://dev.perl.org/perl6/doc/design/syn/S05.html#Transliteration ) The array version can map one-or-more characters to one-or-more characters: $str.=trans( [' ', '<','>','&'] => [' ', '<', '>', '&' ]); In the case that more than one sequence of input characters matches, the longest one wins. In the case of two identical sequences the first in order wins. Why does the longest input sequence win? Is it for some consistency that that I'm not seeing? Some exceedingly common use case? The rule seems unnecessarily restrictive. The "first in order" rule is more flexible, the user can sort their arrays to produce the longest input rule, or use another order if that is preferred. The first transliteration example even uses sort in the pair-wise form: $str.trans( %mapping.pairs.sort ); Can we drop the longest preference? Brad -- By inconsistency and frivolity we stray from the Way and show ourselves to be beginners. In this we do much harm. -- Hagakure http://bereft.net/hagakure/
Re: relational data models and Perl 6
Darren Duncan schreef: > If you take ... > > +-+-+ > |a|x| > |a|y| > |a|z| > |b|x| > |c|y| > +-+-+ > > ... and divide it by ... > > +-+ > |x| > |z| > +-+ > > ... the result is ... > > +-+ > |a| > +-+ > > I'm not sure if Divide has an equivalent in SQL. A verbose way to do it: SELECTC_abc FROM T_abc_xyz NATURAL INNER JOIN T_xz GROUP BY C_abc HAVINGCount(T_abc_xyz.C_xyz) =(SELECT Count(*) FROM T_xz); This basically filters the INNER JOIN result-set to only keep those subsets that have the required number of rows. It requires that the rows of each table are unique, so there can not be another (b,x) in T_abc_xyz. That is a normal requirement. -- Grtz, Ruud
Re: relational data models and Perl 6
Darren Duncan wrote: As an addendum to what I said before ... ... I would want the set operations for tuples to be like that, but the example code that Luke and I expressed already, with maps and greps etc, seems to smack too much of telling Perl how to do the job. I don't want to have to use maps or greps or whatever, to express the various relational operations. I think you're reading too many semantics into C and C: they don't tell perl *how* to implement the search, any more than C would. The example was: INSERT INTO NEWREL SELECT FROM EMP WHERE DNO = 'D2'; Vs my $NEWREL = $EMP.grep:{ $.DNO eq 'D2' }; The implementation of $EMP.grep depends very much on the class of $EMP. If this is an array-ref, then it is reasonable to think that the grep method would iterate the array in-order. However, if the class is "unordered set", then there is no such expectation on the implementation. The deeper problem is probably the use of the "eq" operator in the test. Without knowing a-priori what operations (greps) will be performed on the relation, it is not possible to optimize the data structure for those specific operations. For example, if we knew that $EMP should store its data based on the {$.DNO eq 'D2'} equivalence class then this grep would have high performance (possibly at the expense of its creation). In theory, a sufficiently magical module could examine the parse tree (post type-inference), and find all the calls to C on everything that's a tuple -- and use that to attempt optimizations of a few special cases (e.g. a code block that contains just an "eq" test against an attribute). I'm not sure how practical this would be, but I don't see how a different syntax (e.g. s/grep/where/) would be more more declarative in a way that makes this task any easier.
Re: relational data models and Perl 6
[snip entire conversation so far] (Please bear with me - I'm going to go in random directions.) Someone please correct me if I'm wrong, but it seems that there's only a few things missing in P6: 1) An elegant way of creating a tuple-type (the "table", so to speak) 2) A way of providing constraints across the actual tuples of a given tuple-type 3) Syntactic sugar for performing the relational calculus To me, a tuple-type is more than a class in the standard OO. It has to be able to apply any constraints that might be upon the tuple-type, such as uniqueness of a given element across all tuples or foreign-key constraints. While this is certainly possible using the P6 OO constructs, it would make sense for a baseclass to provide this functionality. Actually, this is a really great place for metaclasses to shine. The actual tuple-type needs to be constructed from some class-constructor (which would be, in the metamodel, itself a class). This is so that it has the appropriate types for the elements of the tuple along with any necessary constraints upon the tuples / elements of the tuples. In addition, you're going to want to take actions not just on the tuple, but on the entire tuple-type. That screams class-level methods that operate across all instances of the class. Maybe, a set of roles would be good for organizing this kind of across-all-instances behavior that the tuple-type can take advantage of. I'm sure that this wouldn't be limited to just the relational calculus. As for the syntactic sugar, I'm not quite sure what should be done here. And, with macros, it's not clear that there needs to be an authoritative answer. Personally, I'd simply overload + for union, - for difference, * for cross-product, / for divide, and so forth. There's been some discussion with sets as to creating new operators using the set-operators that come in Unicode. As tuples and relations among tuples aren't necessarily sets, those might not be appropriate. It also seems clear that junctionish iterators may be of use here. For example, "Give me all the tuples that match this criteria" might return an iterator that also acts as an any-junction. It could also return a class object that has a different set of instances marked as created from it. Though, I'm not too sure how that would work when asking a given instance "who is the class object that created you?" ... maybe it returns the initial one or maybe it returns them all? I think the initial one is more correct, as the others are just subsets. When dealing with SQL, I don't care about the subsets that a given row belongs to - I only care about the table. So, maybe the subset class objects delegate all methods to the original class object except for those that deal with "Who do you have?" and "Give me a subset where ..." Also, joins between tuple-types would have to create a new tuple-type, with the tuples within being delegators to the underlying tuples? I'm not sure that this (or any other) derived tuple-type class object should be allowed to create new tuples (though I'm sure someone can think of a good reason why I'm wrong). Again, just a bunch of meandering thoughts. Bonus points to whomever can help me bridge the gap between what I just blathered and an elegant solution to Sudoku. Rob
Re: Transliteration preferring longest match
On 12/15/05, Brad Bowman <[EMAIL PROTECTED]> wrote: > Why does the longest input sequence win? >Is it for some consistency that that I'm not seeing? Some exceedingly > common use case? The rule seems unnecessarily restrictive. Hmm. Good point. You see, the longest token wins because that's an exceedingly common rule in lexers, and you can't sort regular expressions the way you can sort strings, so there needs to be special machinery in there. There are two rather weak arguments to keep the longest token rule: * We could compile the transliteration into a DFA and make it fast. Premature optimization. * We could generalize transliteration to work on rules as well. In fact, I think the first Perl module I ever wrote was Regexp::Subst::Parallel, which did precisely the second of these. That's one of the easy things that was hard in Perl (but I guess that's what CPAN is for). Hmm.. none of these is really a compelling argument either way. Luke
Re: relational data models and Perl 6
Ruud H.G. van Tol schreef: > [RD-interface] See also these Haskell Hierarchical Libraries (base package) http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Set.html http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html -- Affijn, Ruud "Gewoon is een tijger."
Re: Transliteration preferring longest match
On Thu, Dec 15, 2005 at 06:50:19PM +0100, Brad Bowman wrote: : : Hi, : : S05 describes an array version of trans for transliteration: : ( http://dev.perl.org/perl6/doc/design/syn/S05.html#Transliteration ) : : The array version can map one-or-more characters to one-or-more : characters: : : $str.=trans( [' ', '<','>','&'] => : [' ', '<', '>', '&' ]); : : In the case that more than one sequence of input characters matches, : the longest one wins. In the case of two identical sequences the first : in order wins. : : Why does the longest input sequence win? : Is it for some consistency that that I'm not seeing? Some exceedingly : common use case? The rule seems unnecessarily restrictive. On the contrary, it frees the user from having to worry about the order. : The "first in order" rule is more flexible, the user can sort their : arrays to produce the longest input rule, or use another order if that is : preferred. What possible use is a user-ordered rule set? If you put the shorter entry first, the longer one can never be reached. It's not like you can backtrack into a transliteration and pick a different entry. : The first transliteration example even uses sort in : the pair-wise form: : : $str.trans( %mapping.pairs.sort ); That seems like a useless use of sort, and probably defeats the optimizer as well. : Can we drop the longest preference? Doesn't hurt anything, and can probably help. Plus we already have the longest token rule in there for magical hash matching in rules, so it's likely the optimizer will already know how to handle it, or something like it. Larry
Re: Transliteration preferring longest match
On Thu, Dec 15, 2005 at 09:56:09PM +, Luke Palmer wrote: > On 12/15/05, Brad Bowman <[EMAIL PROTECTED]> wrote: > > Why does the longest input sequence win? > >Is it for some consistency that that I'm not seeing? Some exceedingly > > common use case? The rule seems unnecessarily restrictive. > > Hmm. Good point. You see, the longest token wins because that's an > exceedingly common rule in lexers, and you can't sort regular > expressions the way you can sort strings, so there needs to be special > machinery in there. > > There are two rather weak arguments to keep the longest token rule: > > * We could compile the transliteration into a DFA and make it > fast. Premature optimization. > * We could generalize transliteration to work on rules as well. > > In fact, I think the first Perl module I ever wrote was > Regexp::Subst::Parallel, which did precisely the second of these. > That's one of the easy things that was hard in Perl (but I guess > that's what CPAN is for). Hmm.. none of these is really a compelling > argument either way. If a shorter rule is allowed to match first, then the longer rule can be removed from the match set, at least for constant string matches. If, for example, '=' can match without preferring to try first for '==' then you'll never match '==' without syntactic help to force a backtracking retry. --