> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 11, 2005 2:22 PM
> To: Dave Held
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oracle Style packages on postgres
>
> "Dave Held" <[EMAIL PROTECTED]> writes:
> > /*
> >  * We check the catalog name and then ignore it.
> >  */
> > if (!isValidNamespace(name[0]))
> > {
> >     if (strcmp(name[0], get_database_name(MyDatabaseId)) != 0)
> >         ereport(ERROR,
>
> Which more or less proves my point: the syntax is fundamentally
> ambiguous.
Not at all. Ambiguity means that there are two equally valid parses. Under the semantics I proposed, schema names take precedence. That is, given:

    db:     foo
    schema: bar
    schema: foo.bar

the expression foo.bar.rel.col refers to schema foo.bar, and not to db foo, schema bar (see the pseudo-DDL sketch below). If by "fundamentally ambiguous" you mean "there is no a priori reason to choose one set of semantics over another", I would tend to disagree, but the syntax as I proposed it is not ambiguous. We use precedence to eliminate otherwise valid parses all the time.

> I suppose people would learn not to use schema names that
> match the database they are in, but that doesn't make it a
> good idea to have sensible behavior depend on non-overlap of
> those names.

There's nothing wrong with using a schema name that matches the db. The only confusion comes when you put nested elements with the same names at both the db level and the schema level. Since I presume most people don't specify db names in their queries, having schemas take precedence makes the most sense to me.

> [ thinks for awhile ... ]
>
> OTOH, what if we pretended that two-level-nested schemas ARE
> catalogs in the sense that the SQL spec expects?  Then we could
> get rid of the pro-forma special case here, which isn't ever
> likely to do anything more useful than throw an error anyway.
> Thus, we'd go back to the pre-7.3 notion that the current
> Postgres DB's name isn't part of the SQL naming scheme at all,
> and instead handle the spec's syntax requirements by setting up
> some conventions that make a schema act like what the spec says
> is a catalog.
> [...]

I think this would be worse than not having nested schemas at all. It looks, feels, and smells like a hack. I do think there should be a limit on schema nesting depth, but it should be much larger than 2; 8 seems far more reasonable.

One can argue that nested schemas are nothing more than syntactic sugar, and that is most definitely true. But as programming language design teaches us, syntactic sugar is everything. The better our tools can model our problem spaces, the better they can help us solve our problems. One way in which nested schemas are more than syntactic sugar is that they provide a convenient means of additional security management: rather than twiddling the privileges on groups of objects within a schema, objects that should have similar privileges can be put in the same subschema.

However, returning to the original topic of the thread, nested schemas are not nearly as interesting to me as the encapsulation provided by a package-like feature. To be honest, though, what tantalizes me is not the prospect of a package feature but an expansion of the type system. As a reasonably popular production system, Postgres must necessarily be conservative. But its roots lie in experimentation, and vestiges of those roots can still be seen in its structure. Because of its maturity, Postgres is well positioned to implement some rather advanced concepts, though perhaps the most radical of them should be implemented in a fork rather than in the main system.

Traditionally, a database is seen as a warehouse of raw data. ODBMSes position themselves as the next generation by viewing a database as a collection of persistent, richly structured objects. Both views have strengths and weaknesses. Postgres takes an interesting middle-ground position within the ORDBMS space: it is heavily relational, with strong support for standard SQL and numerous query-tuning options.
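Coming back briefly to the naming question above: here is the precedence example as pseudo-DDL. To be clear, this is purely hypothetical syntax; nested schemas do not exist in Postgres today, and the CREATE SCHEMA foo.bar form is my own shorthand for the proposal.

    -- Hypothetical syntax; assume the current database is named foo.
    CREATE SCHEMA bar;         -- top-level schema "bar"
    CREATE SCHEMA foo;         -- top-level schema "foo"
    CREATE SCHEMA foo.bar;     -- schema "bar" nested inside schema "foo"

    -- Under the proposed rule the schema chain wins, so this resolves to
    -- nested schema foo.bar, never to db foo, schema bar:
    SELECT col FROM foo.bar.rel;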
Returning to that middle ground: Postgres also features a number of interesting non-relational concepts, like custom operator definitions, operator classes, and user-defined conversions and types. However, it seems to me that these features are very underutilized, probably for two reasons:

1) Most programmers aren't used to being able to define custom operators in their favorite programming language, so the concept isn't familiar enough for them to try it in their DBMS.

2) The other features which support this aren't designed or presented in a cohesive manner that persuades the programmer that this is a compelling and superior way to go about things.

The fact is, operator overloading is a *very* powerful way to program. In particular, it is one of the key factors in supporting generic programming in a natural way. People who are unsure of this claim should examine the Spirit parser generator or Blitz++, among numerous other examples. It's something of a tragedy that such a powerful feature sits languishing in a database system while most mainstream PLs have yet to catch up! But that is exactly the problem: operator overloading is a concept ahead of its time, and it was ahead of its time when it was invented way back in the early days of Lisp.

Similarly, user-defined conversions speak to the chimeric nature of Postgres, namely its attitude toward typing. In some areas Postgres is strongly typed; in others it is weakly or dynamically typed; in still others it is polymorphically typed. But Postgres isn't alone in this conundrum. Languages that are very strongly typed tend to require casting between types, while languages that are weakly or dynamically typed are more prone to type errors (just ask Java programmers about ClassCastException).

The trend in PL design is that strong typing is better. In fact, the type system is the essence of a PL. The grammar determines what is a legal program in a given language; the type system determines what is *not* a legal program. By eliminating nonsensical programs, a type system enforces good programming behavior. So the power of a type system is not in what it enables, but rather in what it *disables* or disallows. That is why encapsulation is good: it does not add a powerful new form of computation to a programming language. Rather, it *restricts* what can be done, so that dangerous operations can be eliminated, or at least localized to an easily inspected region of the code.

Postgres has an opportunity to revolutionize database design by introducing a strongly typed DBMS that offers some of the features of a pure ODBMS while providing the traditional RDBMS capabilities. I don't know exactly what such a beast would look like, but I do see a lot of the pieces already in Postgres, waiting to be chipped and molded and fit together into a more elegant whole.

Let me give an illustration of how this might come about. The canonical example is the Employee table. In a flat-file world, this table would be a simple set of records, each containing possibly redundant information about a set of employees: names, SSN, phones, position, salary, etc. To access the employees, we fetch the records of interest from the DB and manipulate them mostly in our client or middleware code. The DB does nothing more than load and save our data, and assist with searches. In the relational world, we factor out the common data, put it into separate relations, and link to it with foreign keys, as in the sketch below.
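For instance, the factored Employee data might look something like this (the table and column names are my own illustration, not anything from the thread):

    -- One way to factor the flat Employee record (illustrative names only).
    CREATE TABLE position (
        position_id  serial PRIMARY KEY,
        title        text NOT NULL,
        pay_grade    integer NOT NULL
    );

    CREATE TABLE employee (
        employee_id  serial PRIMARY KEY,
        name         text NOT NULL,
        ssn          numeric(9,0) NOT NULL UNIQUE,
        salary       numeric(10,2) NOT NULL,
        position_id  integer NOT NULL REFERENCES position (position_id)
    );

    -- Phones get their own relation because an employee may have several.
    CREATE TABLE employee_phone (
        employee_id  integer NOT NULL REFERENCES employee (employee_id),
        phone        text NOT NULL
    );

Reassembling the original flat record is then a three-way join.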
We basically break the record up into pieces and reassemble the pieces as needed. For compiling reports that look at the data in the aggregate in different ways, this is a very powerful and efficient way to go about things. For dealing with individual employees as singular entities, it is not so efficient: it is very space-efficient, but it pays for that space efficiency with time. When we want our time back, we flatten the relations, materialize our views, and basically tread back toward the flat-file world. When a client wants a tuple, the DBMS has to resurrect the tuple from its component relations, as if breathing life into it from the clay of the earth. However, the DBMS can be clever and offer server-side triggers or procedures to assist the client in manipulating the object.

In the ODBMS world, we return partway to the flat-file world by serializing objects in a more or less flat way. However, the objects can refer to each other in a relational way, and those relations are captured as well. But the normalization typical of the relational world is by no means typical of the OOP world, so the data factoring is much less aggressive, which has both drawbacks and benefits, depending on the application. Once again, the DBMS is just a data server, serializing and unserializing objects to disk. The process is very fast and clean for object-oriented data, but not so fast for flat data.

The problem in the database world is the same as the problem in the PL world: people are conflating types and representations. A representation is a raw data format, like a 16-bit two's-complement signed integer or an 8-bit ASCII character. A type is a set of values, an instance of which is stored in some representation. An Integer may be stored in a 16-bit signed int, or it may be represented by some 128-bit memory location. But really, most real-world problems do not have Integers as a natural type in the model. Going back to our Employee, the natural types will be more like SSN: the values are always 9 digits long, there are rules about allowed prefixes, and so on. The set of values in the SSN type is smaller than the set of values in the 9-digit-Integer type. So a 9-digit Integer can be a representation of an SSN, but the SSN *type* implies more. It doesn't make sense to do arithmetic on an SSN, so arithmetic operations should be neither defined nor allowed on SSNs. To do arithmetic on one, you should be forced to cast to another type, and the cast should check whether that is a sensible or valid operation in the current context.

Now, the thing about the relational model is that it is essentially about *representations*. So the things we call "attribute types" are really "attribute representations", which is why there are so few of them by default. In reality, every problem domain brings a large host of richly defined types that map onto those representations to a greater or lesser degree. The reason people want packages is that functions in Postgres, and indeed in most RDBMSes, operate on representations, which allows you to do something like take the square root of an SSN; an operation that makes perfect sense if the SSN is merely a numeric(9,0). Programmers who want greater correctness see that type safety is the way to get there, and type safety can only be enforced in a strongly typed environment. The interesting thing is that domains are essentially the formal notion of a type in the type-versus-representation distinction.
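In fact, Postgres's existing CREATE DOMAIN already gestures in this direction. A minimal sketch (the CHECK rule is deliberately simplified; real SSN prefix rules are more involved):

    -- A domain pins an SSN *type* onto a numeric(9,0) *representation*.
    CREATE DOMAIN ssn AS numeric(9,0)
        CHECK (VALUE > 0 AND VALUE < 900000000);   -- stand-in for real rules

    CREATE TABLE employee_ids (emp_ssn ssn);

    INSERT INTO employee_ids VALUES (123456789);   -- accepted
    INSERT INTO employee_ids VALUES (987654321);   -- rejected by the CHECK

    -- But this still type-checks today, because the domain silently
    -- coerces back to its numeric representation:
    SELECT sqrt(emp_ssn) FROM employee_ids;

The domain buys us the restricted value set, but as the last query shows, nothing yet stops arithmetic on the representation; that is precisely the gap a strongly typed system would close.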
In a strongly typed ORDBMS, tuples would be defined as sets of domains rather than sets of "types" (representations), and operations on tuples would be strictly type-checked. Note that the other features of OOP, inheritance and polymorphism, are really just ways to define exceptions or loopholes in the type system: inheritance allows you to say that one type is substitutable for another, and polymorphism says that an operation knows how to act on more than one type, even when the types aren't related.

So, back to the future: in an ORDBMS world, you should not ask for a collection of representations; you should ask for a collection of objects. You should not ask for transformations of tuples, but rather transformations of objects and sets of objects. And the user should be able to define a sufficiently rich interface to those objects that the ORDBMS can easily fulfill such requests. Say you want to increase the salary of every employee at pay grade 3. You should not be able to directly manipulate the salary field of the Employee relation. Rather, you should define a method like RaiseSalary() that operates on Employees (or, better yet, is a member of the Employee type), and then define the set of Employees over which that method should be applied. To use a somewhat mathematical notation:

    RaiseSalary({Employee e | e.payGrade() == 3});

That's not to say that SQL should go away. But it should become a language of the representations, an implementation detail hidden behind a more powerful, type-safe facade that helps ensure program correctness.

Of course, I'm not making any specific proposal to change Postgres at this time. I'm just giving the community some food for thought to chew on when considering future directions of DBMSes in general and Postgres in particular. I could go on about generic programming and its relation to the ideas above and to the current features of Postgres, but I will save that discussion for a rainy day.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129